Unleashing YARA – Part 3

In the second post of this series we introduced an incident response challenge based on the static analysis of a suspicious executable file. The challenge featured 6 indicators that needed to be extracted from the analysis in order to create a YARA rule to match the suspicious file. In part 3 we will step through the functions of YARA’s PE, Hash and Math modules and how they can help you meet the challenge objectives. Let’s recap the challenge objectives and map them to the indicators we extracted from static analysis:

  1. a suspicious string that seems to be related with debug information
    • dddd.pdb
  2. the MD5 hash of the .text section
    • 2a7865468f9de73a531f0ce00750ed17
  3. the .rsrc section with high entropy
    • .rsrc entropy is 7.98
  4. the symbol GetTickCount import
    • Kernel32.dll GetTickCount is present in the IAT
  5. the rich signature XOR key
    • 2290058151
  6. must be a Windows executable file
    • 0x4D5A (MZ) found at file offset zero

In part 2 we created a YARA rule file named rule.yar, with the following content:

import "pe"

If you remember the exercise, we needed the PE module in order to parse the sample and extract the Rich signature XOR key. We will use this rule file to develop the remaining code.

The debug information string

In part 1 I introduced YARA along with the rule format, featuring the strings and condition sections. When you add the dddd.pdb string to the strings section and reference it in the condition, the rule code should look something like:

[Image: yara_3_1]
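The screenshot that originally appeared here is not reproduced, so the following is only a minimal sketch of how the rule might look at this stage. It assumes the rule ID malware001 that YARA reports later in this post; the exact layout of the original listing is not known.

```yara
import "pe"

rule malware001
{
    strings:
        // debug path string recovered during static analysis
        $str01 = "dddd.pdb"

    condition:
        // match when the string is present anywhere in the file
        $str01
}
```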

The code above depicts a simple rule object made of a single string variable named $str01 with the value set to the debug string we found.

The section hash condition

The next item to be added to the condition is the .text section hash, using both the PE and Hash modules. To do so we will iterate over the PE file sections using two PE module fields: number_of_sections and sections. The former gives the upper bound for iterating over the PE sections; the latter lets us fetch each section’s raw_data_offset (its file offset) and raw_data_size, which are passed as arguments to the md5 hash function in order to compute the MD5 hash of the section data:

[Image: yara_3_2]
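As the original screenshot is missing, here is a hedged sketch of the section hash check, isolated in its own rule for readability. The rule name text_section_md5 is hypothetical; in rule.yar the same for expression would be joined to the string condition with and.

```yara
import "pe"
import "hash"

rule text_section_md5
{
    condition:
        // walk every section index, from 0 to number_of_sections - 1
        for any i in (0..pe.number_of_sections - 1) : (
            pe.sections[i].name == ".text" and
            // hash.md5(offset, size) returns a lowercase hex string
            hash.md5(pe.sections[i].raw_data_offset,
                     pe.sections[i].raw_data_size) == "2a7865468f9de73a531f0ce00750ed17"
        )
}
```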

The condition expression now features the for operator comprising two conditions: the section md5 hash and the section name. In essence, YARA will loop through every PE section until it finds a match on the section hash and name.

The resource entropy value

It’s now time to add the resource entropy condition. To do so, we will rely on the math module, which allows us to calculate the entropy of a given range of bytes. Again we will iterate over the PE sections, this time using two conditions: the section entropy and the section name (.rsrc):

[Image: yara_3_3]
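The original listing is not available here, so the entropy check is sketched below as a standalone rule. The rule name rsrc_high_entropy is hypothetical, and the 7.0 threshold is the one discussed in this post rather than a universal cut-off.

```yara
import "pe"
import "math"

rule rsrc_high_entropy
{
    condition:
        for any i in (0..pe.number_of_sections - 1) : (
            pe.sections[i].name == ".rsrc" and
            // math.entropy(offset, size) returns a float between 0.0 and 8.0
            math.entropy(pe.sections[i].raw_data_offset,
                         pe.sections[i].raw_data_size) >= 7.0
        )
}
```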

Again we will loop until we find a match, that is, a section named .rsrc with entropy greater than or equal to 7.0. Remember that the minimum entropy value is 0.0 and the maximum is 8.0 (bits per byte), so 7.0 and above is considered high entropy and is frequently associated with packing [1]. Bear in mind that compressed data like images and other types of media can display high entropy, which might result in some false positives [2].

The GetTickCount import

Let’s continue improving our YARA rule by adding the GetTickCount import to the condition. For this purpose we will use the PE module imports function, which takes two arguments: the DLL name and the function name. The GetTickCount function is exported by Kernel32.dll, so when we pass these arguments to the pe.imports function the rule condition becomes:

[Image: yara_3_4]
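In place of the missing screenshot, a minimal sketch of the import check follows. The rule name imports_gettickcount is hypothetical, and the clause is shown on its own rather than combined with the earlier conditions.

```yara
import "pe"

rule imports_gettickcount
{
    condition:
        // first argument: DLL name, second argument: imported function name
        pe.imports("kernel32.dll", "GetTickCount")
}
```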

Please note that the DLL name is case insensitive [3].

The XOR key

Our YARA rule is almost complete; we now need to add the rich signature key to the condition. For this the PE module provides the rich_signature structure, which allows us to match various attributes of the rich signature, in this case the key. The key is the decimal value of the dword used to encode the contents with XOR:

[Image: yara_3_5]
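Since the screenshot is not reproduced, the key comparison is sketched below in isolation. The rule name rich_xor_key is hypothetical; the decimal value is the one extracted in part 2.

```yara
import "pe"

rule rich_xor_key
{
    condition:
        // XOR key of the Rich signature, written as a decimal integer
        pe.rich_signature.key == 2290058151
}
```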

Remember that the XOR key can be obtained either by inspecting the file with a hexdump of the PE header or using YARA PE module parsing capabilities, detailed in part 2 of this series.

The PE file type

OK, we are almost done. The last condition will ensure that the file is a portable executable file. In part two of this series we did a quick hex dump of the sample’s header, which revealed the MZ (ASCII) at file offset zero, a common file signature for PE files. We will use the YARA int## functions to access data at a given position. The int## functions read 8, 16 and 32 bit signed integers, whereas the uint## functions read unsigned integers. Both the 16 and 32 bit variants read little-endian values; for big-endian use int##be or uint##be.
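To illustrate the byte-order point, here is a small sketch that checks only the first two bytes of the file. The rule name mz_at_offset_zero is hypothetical. Because uint16 reads little-endian, the bytes 4D 5A on disk are compared against 0x5A4D.

```yara
rule mz_at_offset_zero
{
    condition:
        // bytes on disk: 4D 5A ("MZ"); read as little-endian they become 0x5A4D
        uint16(0) == 0x5A4D
        // the big-endian form reads the bytes in file order instead:
        // uint16be(0) == 0x4D5A
}
```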

Since checking only the first two bytes of the file can lead to false positives, we can use a little trick to ensure the file is a PE by looking for particular PE header values. Specifically, we will check the IMAGE_NT_HEADERS Signature member, a dword with value “PE\0\0”. Since the file offset of the signature is variable, we need to rely on the IMAGE_DOS_HEADER e_lfanew field. The e_lfanew value is the 4 byte physical offset of the PE Signature, and it is located at physical offset 0x3C [4].

To test for “MZ” and “PE\0\0” at their respective offsets we will use uint16 and uint32:

[Image: yara_3_6]
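The original screenshot is missing, so the two header checks are sketched here as a standalone rule. The rule name is_pe_file is hypothetical; the constants are the little-endian readings of “MZ” and “PE\0\0”.

```yara
rule is_pe_file
{
    condition:
        // "MZ" at file offset zero
        uint16(0) == 0x5A4D and
        // inner uint32 reads e_lfanew at 0x3C; outer uint32 reads the
        // dword at that offset, which must be "PE\0\0"
        uint32(uint32(0x3C)) == 0x00004550
}
```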

Note how we use the e_lfanew value to pivot to the PE Signature: the inner uint32 call reads e_lfanew at offset 0x3C, and its output is used as the argument of the outer uint32 call, whose result must match the expected value “PE\0\0”.

Conclusion

OK! We are done. The last step is to test the rule against the file using the YARA tool and our brand new rule file rule.yar:

[Image: yara_3_7]

YARA scans the file and, as expected, outputs the ID of the matched rule, in our case malware001.

A final word on YARA performance

While YARA performance might be of little importance if you are scanning a dozen files, poorly written rules can have a significant impact when scanning thousands or millions of files. As a rule of thumb you are advised to avoid using regex statements. Additionally, you should ensure that the conditions most likely to be false appear first in the rule’s condition, so that YARA can skip the rest; this feature is named short-circuit evaluation and it was introduced in YARA 3.4.0 [5]. So how can we improve the rule we just created in order to leverage YARA performance? In this case we can move the last condition, the PE file signature check, to the top of the statement. By doing so we avoid evaluating the remaining PE module conditions if the file is not an executable (e.g. PDF, DOC, etc.). Let’s see what the new rule looks like:

[Image: yara_3_8]
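With the screenshot unavailable, the following is a hedged reconstruction of what the complete, reordered rule could look like. It combines the six indicators from this post, with the file signature checks placed first; the order of the remaining clauses and the formatting are assumptions.

```yara
import "pe"
import "hash"
import "math"

rule malware001
{
    strings:
        $str01 = "dddd.pdb"

    condition:
        // cheap file type checks first, so non-PE files are rejected early
        uint16(0) == 0x5A4D and
        uint32(uint32(0x3C)) == 0x00004550 and
        $str01 and
        // .text section MD5
        for any i in (0..pe.number_of_sections - 1) : (
            pe.sections[i].name == ".text" and
            hash.md5(pe.sections[i].raw_data_offset,
                     pe.sections[i].raw_data_size) == "2a7865468f9de73a531f0ce00750ed17"
        ) and
        // .rsrc section with high entropy
        for any i in (0..pe.number_of_sections - 1) : (
            pe.sections[i].name == ".rsrc" and
            math.entropy(pe.sections[i].raw_data_offset,
                         pe.sections[i].raw_data_size) >= 7.0
        ) and
        pe.imports("kernel32.dll", "GetTickCount") and
        pe.rich_signature.key == 2290058151
}
```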

If you would like to learn more about YARA performance, check the YARA performance guidelines by Florian Roth, which feature lots of tips to keep your YARA rules resource friendly.

References

  1. Structural Entropy Analysis for Automated Malware Classification
  2. Practical Malware Analysis, The Hands-On Guide to Dissecting Malicious Software, Page 283.
  3. YARA Documentation v.3.4.0, PE Module
  4. The Portable Executable File Format
  5. YARA 3.4.0 Release notes

7 thoughts on “Unleashing YARA – Part 3”

  1. Dror says:

    Thanks for the write-up!

    you have a small typo in:
    “The formed will be used to ..”

    meant former I guess 🙂

    • mattjang96 says:

      Hello. Thank you for your post on YARA. I have a few questions that I hope you can help me with.
      I have been studying YARA and have been trying to incorporate it into my project.
      1) How can YARA be used to block future malware invasion? That is, I understand that we can write rules to block common malware via reverse engineering. But what if a new malware is introduced? How do we know what strings (hexa, text, etc.) to put into our yara rules file?
      2) How can I use YARA in real time (stateful protocol)? That is, is there a way that I can use YARA to report malware/virus immediately? Or is YARA only available through manual compilation?
      3) If YARA doesn’t support real time (stateful protocol), is YARA used to scan files in a sequence/order? Can YARA be used to scan every file at once?
      I will appreciate your reply greatly.

      • Ricardo Dias says:

        Hello. “How can YARA be used to block future malware invasion?” – You can use YARA modules to find oddities in your samples. For example, you can use the PE module to find suspicious indicators in the PE header (e.g. entropy, section names, imphash, etc.). However, most of the time you will have to write new rules to match new malware.
        “How can I use YARA in real time?” Sure, YARA is supported by many tools and frameworks. Many of them use the Python library to enable YARA scanning of HTTP, SMTP, etc. You even have HIDS projects based on YARA, like Procfilter.
        “Can YARA be used to scan every file at once?”
        Yes, by using the recursive option with “-r”, just pass a directory as path.

      • mattjang96 says:

        Dear Ricardo,

        Thank you for your response, and time. It means a lot to me.
        My last question is:
        I have been studying YARA documentation for quite a while now, but I still don’t understand how to use it 100%. How can I become an expert at YARA like you are? The YARA documentation page doesn’t have as clear explanations as you provide. My goal is to somehow to use YARA in real-time so that it reports malware/virus automatically (without manually checking for it). Thank you again so much!
        -MJ

      • Ricardo Dias says:

        Hi Mathew.
        Please note that automatic malware detection with YARA is very unlikely. YARA is a pattern match tool, so it all depends on the rules you create, which will inevitably become obsolete as time goes by. So there will always be a manual component in the process.

  2. Vish says:

    Hello,

    Thanks for writing this blog, It is really helpful. I do have one question.
    How can we look for sections using regex ?
    I have tried to look but it always gives an error as “mismatch”

    Thanks
