Ricardo Dias

Mar 09 2016

Digital Forensics and Incident Response, Malware Analysis, Threat Intelligence

Unleashing YARA – Part 3

In the second post of this series we introduced an incident response challenge based on the static analysis of a suspicious executable file. The challenge featured 6 indicators that needed to be extracted from the analysis in order to create a YARA rule to match the suspicious file. In part 3 we will step through YARA’s PE, Hash and Math modules functions and how they can help you to meet the challenge objectives. Lets recap the challenge objectives and map it with the indicators we extracted from static analysis:

a suspicious string that seems to be related with debug information
- dddd.pdb
the MD5 hash of the .text section
- 2a7865468f9de73a531f0ce00750ed17
the .rsrc section with high entropy
- .rsrc entropy is 7.98
the symbol GetTickCount import
- Kernel32.dll GetTickCount is present in the IAT
the rich signature XOR key
- 2290058151
must be a Windows executable file
- 0x4D5A (MZ) found at file offset zero

In part 2 we created a YARA rule file named rule.yar, with the following content:

import "pe"

If you remember the exercise, we needed the PE module in order to parse the sample and extract the Rich signature XOR key. We will use this rule file to develop the remaining code.

The debug information string

In part 1 I have introduced YARA along with the rule format, featuring the strings and condition sections. When you add the dddd.pdb string condition the rule code should be something like:

yara_3_1

The code above depicts a simple rule object made of a single string variable named $str01 with the value set to the debug string we found.

The section hash condition

Next item to be added to the condition is the .text section hash, using both PE and HASH modules. To do so we will iterate over the PE file sections using two PE module functions: the number_of_sections and sections. The former will be used to iterate over the PE sections, the latter will allow us to fetch section raw_data_offset, or file offset, and raw_data_size, that will be passed as arguments to md5 hash function, in order to compute the md5 hash of the section data:

yara_3_2

The condition expression now features the for operator comprising two conditions: the section md5 hash and the section name. In essence, YARA will loop through every PE section until it finds a match on the section hash and name.

The resource entropy value

Its now time to add the resource entropy condition. To do so, we will rely on the math module, which will allow us to calculate the entropy of a given size of bytes. Again we will need to iterate over the PE sections using two conditions: the section entropy and the section name (.rsrc):

yara_3_3

Again we will loop until we find a match, that is a section named .rsrc with entropy above or equal to 7.0. Remember that entropy minimum value is 0.0 and maximum is 8.0, therefore 7.0 is considered high entropy and is frequently associated with packing [1]. Bear in mind that compressed data like images and other types of media can display high entropy, which might result in some false positives [2].

The GetTickCount import

Lets continue improving our YARA rule by adding the GetTickCount import to the condition. For this purpose lets use the PE module imports function that will take two arguments: the library and the DLL name. The GetTickCount function is exported by Kernel32.DLL, so when we passe these arguments to the pe.imports function the rule condition becomes:

yara_3_4

Please note that the DLL name is case insensitive [3].

The XOR key

Our YARA rule is almost complete, we now need to add the rich signature key to the condition. In this particular case the PE module provides the rich_signature function which allow us to match various attributes of the rich signature, in this case the key. The key will be de decimal value of dword used to encode the contents with XOR:

yara_3_5

Remember that the XOR key can be obtained either by inspecting the file with a hexdump of the PE header or using YARA PE module parsing capabilities, detailed in part 2 of this series.

The PE file type

Ok, we are almost done. The last condition will ensure that the file is a portable executable file. In part two of this series we did a quick hex dump of the samples header, which revealed the MZ (ASCII) at file offset zero, a common file signature for PE files. We will use the YARA int## functions to access data at a given position. The int## functions read 8, 16 and 32 bits signed integers, whereas the uint## reads unsigned integers. Both 16 and 32 bits are considered to be little-endian, for big-endian use int##be or uint##be.

Since checking only the first two bytes of the file can lead to false positives we can use a little trick to ensure the file is a PE, by looking for particular PE header values. Specifically we will check for the IMAGE_NT_HEADER Signature member, a dword with value “PE\0\0”. Since the signature file offset is variable we will need to rely on the IMAGE_DOS_HEADER e_lfanew field. e_lfanew value is the 4 byte physical offset of the PE Signature and its located at physical offset 0x3C [4].

With the conditions “MZ” and “PE\0\0” and respective offsets we will use uint16 and uint32 respectively:

yara_3_6

Note how we use the e_lfanew value to pivot the PE Signature, the first uint32 function output, the 0x3C offset, is used as argument in the second uint32 function, which must match the expected value “PE\0\0”.

Conclusion

Ok! We are done, last step is to test the rule against the file using the YARA tool and our brand new rule file rule.yar:

yara_3_7

YARA scans the file and, as expected, outputs the rule matched rule ID, in our case malware001.

A final word on YARA performance

While YARA performance might be of little importance if you are scanning a dozen of files, poorly written rules can impact significantly when scanning thousands or millions of files. As a rule of thumb you are advised to avoid using regex statements. Additionally you should ensure that false conditions appear first in the rules condition, this feature is named short-circuit evaluation and it was introduced in YARA 3.4.0 [5]. So how can we improve the rule we just created, in order to leverage YARA performance? In this case we can move the last condition, the PE file check signature, to the top of the statement, by doing so we will avoid checking for the PE header conditions if the file is an executable (i.e. PDF, DOC, etc). Lets see how the new rule looks like:

yara_3_8

If you like to learn more about YARA performance, check the Yara performance guidelines by Florian Roth, as it features lots of tips to keep your YARA rules resource friendly.

References

Structural Entropy Analysis for Automated Malware Classification
Practical Malware Analysis, The Hands-On Guide to Dissecting Malicious Software, Page 283.
YARA Documentation v.3.4.0, PE Module
The Portable Executable File Format
YARA 3.4.0 Release notes

Tagged Malware Analysis, REMnux, Yar

Feb 18 2016

2 Comments

Digital Forensics and Incident Response, Malware Analysis

Unleashing YARA – Part 2

In the first post of this series we uncovered YARA and demonstrated couple of use case that that can be used to justify the integration of this tool throughout the enterprise Incident Response life-cycle. In this post we will step through the requirements for the development of YARA rules specially crafted to match patterns in Windows portable executable “PE” files. Additionally, we will learn how to take advantage of Yara modules in order to create simple but effective rules. Everything will be wrapped-up in a use case where an incident responder, that will be you, will create YARA rules based on the static analysis of a PE file.

Specifically, the use case scenario will be split into two posts. In part 2 we will start with an incident report that will introduce a simple rule development challenge, solely based on static analysis. In the part 3, will cover rule creation, performance tuning and troubleshooting.

Prerequisites

Before we begin you will need a Linux distribution with the following tools:

YARA 3.4.0 (get it here)
pescanner.py (get it here)

If you are in a hurry I advise you to pick REMnux, Lenny Zeltser’s popular Linux distro for malware analysis, which include a generous amount of tools and frameworks used in the dark art of malware analysis and reverse engineering. REMnux is available for download here.

Additionally you will need a piece of malware to analyse, you can get your own copy of the sample from Malwr.com:

Malwr.com report link here

Sample MD5: f38b0f94694ae861175436fcb3981061

WARNING: this is real malware, ensure you will do your analysis in a controlled, isolated and safe environment, like a temporary virtual machine.

Incident Report

Its Wednesday 4:00PM when a incident report notification email drops on your mailbox. It seems that a Network IPS signature was triggered by a suspicious HTTP file download (f38b0f94694ae861175436fcb3981061) hash of a file. You check the details of the IPS alert to see if it stored the sample in a temporary repository for in-depth analysis. You find that the file was successfully stored and its of type PE (executable file), definitely deserves to be look at. After downloading the file you do the usual initial static analysis: Google for the MD5, lookup the hash in Virustotal, analyse the PE header of the file looking for malicious intent. Right of the bat the sample provides a handful of indicators that will help you to understanding how the file will behave during execution. Just what you needed to start developing your own YARA rules.

The challenge

Create a YARA rule that matches the following conditions:

a suspicious string that seems to be related with debug information
the MD5 hash of the .text section
the .rsrc section with high entropy
the symbol GetTickCount import
the rich signature XOR key
must be a Windows executable file

Static Analysis

Before we continue let me write that the details concerning the structure of the PE file are omitted for the sake of brevity. Please see here and here for more information on PE header structure. Onward!

The first challenge is to find a string related with debug information left by the linker [1], specifically we will be looking for a program database file path (i.e. PDB). Lets run the strings command to output the ASCII strings:

strings output

Amid the vast output the dddd.pdb string stands out. This is probably what we are looking for. Note that is important to output the file offset in decimal with -t d suffix so that you can pinpoint the string location within the file structure. If the string is indeed related to debug information it should be part of the RSDS header. Let’s dump a few bytes of the sample using the 99136 offset as a pivot:

xxd output

The presence of RSDS string gives us the confidence to select the string dddd.pdb as the string related to the debug information.

Next we need to compute the hash of the .text section, that typically contains the executable code [2], for this task we will use hiddenillusion’s version of pescanner.py [3] using the sample name as argument:

pescanner.py initial report

pescanner.py report on sections, resources and imports on the PE file

pescanner.py outputs an extensive report about the PE header structure, on which it includes the list of sections along with the hash. Take note of the .text section MD5 hash (2a7865468f9de73a531f0ce00750ed17) as we will need to use it later when creating the YARA rule.

Also in the pescanner.py report we are informed that the .rsrc section as high entropy. This is a suspicious indicator for the presence of heavily obfuscated code. Please keep this in mind when creating the rule, as this info will help us answering the third item in the challenge. Lastly the report also features the list of imported symbols, in which we can see the presence of GetTickCount, a well known anti-debugging timing function [4]. This will be required to answer the fourth entry of the challenge. By the way, the report also mentions the file type, indicating we are in the presence of a PE32 file, which matches the sixth item of the challenge.

Lastly we need to get our hands on the XOR key used to encode the Rich signature, read more about the Rich signature here. You can check existence of this key in two ways: traditionally you would dump the first bytes of the sample, enough to cover all the DOS Header in the PE file, the Rich signature starts at file offset 0x80, and the XOR key will be located in the dword that follows the Rich ASCII string:

yara_2.1_5

Bear in mind that the x86 byte-order is little-endian [5], therefore you need to byte-swap the dword value, so the XOR key value is 0x887f83a7 or 2290058151 in decimal.

Now for the easy way. Remember when I have mentioned in the first post of this series that the YARA scan engine is modular and feature rich? This is because you can use YARA pretty much like pescanner.py, in order to obtain valuable information on the PE header structure. Let’s start by creating the YARA rule file named rule.yar with the following content:

import “pe”

Next execute YARA as follows:

strings command output

By using the –print-module-data argument YARA will output the report of the PE module, on which will include the rich_signature section along with the XOR key decimal value.

Ok, we now have gathered all the info required to start creating the YARA rule and finish the challenge. In the part 3 of this series, we will cover the YARA rule creation process, featuring the information gathered from static analysis. Stay tuned!

References

http://www.godevtool.com/Other/pdb.htm
Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software, (page 22)
https://github.com/hiddenillusion/AnalyzePE/blob/master/pescanner.py
http://antukh.com/blog/2015/01/19/malware-techniques-cheat-sheet
http://teaching.idallen.com/cst8281/10w/notes/110_byte_order_endian.html

Tagged Malware Analysis, pescanner.py, REMnux, YARA

Feb 10 2016

Unleashing YARA – Part 1

[Editor’s Note: In the article below, Ricardo Dias who is a SANS GCFA gold certified and a seasoned security professional demonstrates the usefulness of Yara – the Swiss Army knife for Incident Responders. This way you can get familiar with this versatile tool and develop more proactive and mature response practices against threats. ~Luis]

Intro

yara_logo I remember back in 2011 when I’ve first used YARA. I was working as a security analyst on an incident response (IR) team, doing a lot of intrusion detection, forensics and malware analysis. YARA joined the tool set of the team with the purpose to enhance preliminary malware static analysis of portable executable (PE) files. Details from the PE header, imports and strings derived from the analysis resulted in YARA rules and shared within the team. It was considerably faster to check new malware samples against the rule repository when compared to lookup analysis reports. Back then concepts like the kill chain, indicator of compromise (IOC) and threat intelligence where still at its dawn.

In short YARA is an open-source tool capable of searching for strings inside files (1). The tool features a small but powerful command line scanning engine, written in pure C, optimized for speed. The engine is multi-platform, running on Windows, Linux and MacOS X. The tool also features a Python extension providing access to the engine via python scripts. Last but not least the engine is also capable of scanning running processes. YARA rules resemble C code, generally composed of two sections: the strings definition and a, mandatory, boolean expression (condition). Rules can be expressed as shown:

rule evil_executable
{
    strings:
        $ascii_01 = "mozart.pdb"
        $byte_01  = { 44 65 6d 6f 63 72 61 63 79 }
    condition:
        uint16(0) == 0x5A4D and
        1 of ( $ascii_01, $byte_01 )
}

The lexical simplicity of a rule and its boolean logic makes it a perfect IOC. In fact ever since 2011 the number of security vendors supporting YARA rules is increasing, meaning that the tool is no longer limited to the analyst laptop. It is now featured in malware sandboxes, honey-clients, forensic tools and network security appliances (2). Moreover, with the growing security community adopting YARA format to share IOCs, one can easily foresee a wider adoption of the format in the cyber defence arena.

In the meantime YARA became a feature rich scanner, particularly with the integration of modules. In essence modules enable very fine grained scanning while maintaining the rule readability. For example the PE module, specially crafted for handling Windows executable files, one can create a rule that will match a given PE section name. Similarly, the Hash module allows the creation on hashes (i.e. MD5) based on portions of a file, say for example a section of a PE file.

YARA in the incident response team

So how does exactly a tool like YARA integrate in the incident response team? Perhaps the most obvious answer is to develop and use YARA rules when performing malware static analysis, after all this is when the binary file is dissected, disassembled and understood. This gives you the chance to cross-reference the sample with previous analysis, thus saving time in case of a positive match, and creating new rules with the details extracted from the analysis. While there is nothing wrong with this approach, it is still focused on a very specific stage of the incident response. Moreover, if you don’t perform malware analysis you might end up opting to rule out YARA from your tool set.

Lets look at the SPAM analysis use case. If your team analyses suspicious email messages as part of their IR process, there is great chance for you to stumble across documents featuring malicious macros or websites redirecting to exploit kits. A popular tool to analyse suspicious Microsoft Office documents Tools is olevba.py, part of the oletools package (3), it features YARA when parsing OLE embedded objects in order to identify malware campaigns (read more about it here). When dealing with exploit kits, thug (4), a popular low-interaction honey-client that emulates a web browser, also features YARA for exploit kit family identification. In both cases YARA rule interchanging between the IR teams greatly enhances both triage and analysis of SPAM.

Another use case worth mentioning is forensics. Volatility, a popular memory forensics tool, supports YARA scanning (5) in order to pinpoint suspicious artefacts like processes, files, registry keys or mutexes. Traditionally YARA rules created to parse memory file objects benefit from a wider range of observables when compared to a static file rules, which need to deal with packers and cryptors. On the network forensics counterpart, yaraPcap (6), uses YARA for scan network captures (PCAP) files. Like in the SPAM analysis use case, forensic analysts will be in advantage when using YARA rules to leverage the analysis.

Finally, another noteworthy use case is endpoint scanning. That’s right, YARA scanning at the client computer. Since YARA scanning engine is multi-platform, it poses no problems to use Linux developed signatures on a Windows operating system. The only problem one needs to tackle is on how to distribute the scan engine, pull the rules and push the positive matches to a central location. Hipara, a host intrusion prevention system developed in C, is able to perform YARA file based scans and report results back to a central server (7). Another solution would be to develop an executable python script featuring the YARA module along with REST libraries for pull/push operations. The process have been documented, including conceptual code, in the SANS paper “Intelligence-Driven Incident Response with YARA” (read it here). This use case stands as the closing of the circle in IOC development, since it enters the realm of live IR, delivering and important advantage in the identification of advanced threats.

Conclusion

The key point lies in the ability for the IR teams to introduce the procedures for YARA rule creation and use. Tier 1 analysts should be instructed on how to use YARA to enhance incident triage, provide rule feedback, concerning false positives, and fine tuning to Tier 2 analyst. Additionally a repository should be created in order to centralize the rules and ensure the use of up-to-date rules. Last but not least teams should also agree on the rule naming scheme, preferably reflecting the taxonomy used for IR. These are some of the key steps for integrating YARA in the IR process, and to prepare teams for the IOC sharing process.

References:

Tagged Incident Response, Malware Analysis, Threat Intelligence, YARA

Count Upon Security

Increase security awareness. Promote, reinforce and learn security skills.

Author Archives: Ricardo Dias

Unleashing YARA – Part 3

The section hash condition

The resource entropy value

The GetTickCount import

The XOR key

The PE file type

Conclusion

A final word on YARA performance

References

Unleashing YARA – Part 2

Prerequisites

Incident Report

The challenge

Static Analysis

References

Unleashing YARA – Part 1

Intro

YARA in the incident response team

Conclusion