Tag Archives: REMnux

Malicious Documents – PDF Analysis in 5 steps


Mass-mailing and targeted campaigns that use common file types to host or exploit code have been, and still are, a very popular attack vector: for example, a malicious PDF or MS Office document received via e-mail or opened through a browser plug-in. For malicious PDF files in particular, the security industry saw a significant increase in vulnerabilities after the second half of 2008, which might be related to Adobe Systems’ release of the specifications, format structure and functionality of PDF files.

Most enterprise network perimeters are protected by several security filters and mechanisms that block threats. However, a malicious PDF or MS Office document can be very successful at passing through firewalls, intrusion prevention systems, anti-spam, anti-virus and other security controls. Once it reaches the victim’s mailbox, this attack vector leverages social engineering techniques to lure the user into clicking/opening the document. Then, for example, if the user opens a malicious PDF file, it typically executes JavaScript that exploits a vulnerability when Adobe Reader parses the crafted file. This might cause the application to corrupt memory on the stack or heap, causing it to run arbitrary code known as shellcode. This shellcode normally downloads and executes a malicious file from the Internet. The Internet Storm Center handler Bojan Zdrnja wrote a good summary about one of these shellcodes. In some circumstances the vulnerability can be exploited without opening the file, just by having the malicious file on the hard drive, as described by Didier Stevens.

From a 100-foot view, a PDF file is composed of a header, body, cross-reference table and trailer. One key component is the body, which might contain all kinds of content-type objects that make parsing attractive for vulnerability researchers and exploit developers. The language is very rich and complex, which means the same information can be encoded and obfuscated in many ways. For example, within objects there are streams that can be used to store data of any type or size. These streams can be compressed or encoded, and the PDF standard supports several algorithms, called filters, including ASCIIHexDecode, ASCII85Decode, LZWDecode, FlateDecode, RunLengthDecode, CCITTFaxDecode and DCTDecode. PDF files can contain multimedia content and support JavaScript, as well as ActionScript through Flash objects. Usage of JavaScript is a popular vector of attack because it can be hidden in the streams using different techniques, making detection harder. When the PDF file contains JavaScript, the malicious code is used to trigger a vulnerability and to execute shellcode. All these features and capabilities translate into a huge attack surface!
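To make the idea of filters more concrete, here is a minimal Python sketch of how a stream that went through two filters could be decoded again. It only covers ASCIIHexDecode and FlateDecode, the function names are my own, and the sample data is synthetic; it is an illustration, not a full PDF parser.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, optional whitespace, '>' ends the data."""
    digits = data.split(b">")[0].translate(None, b" \t\r\n\x0c\x00")
    if len(digits) % 2:
        # An odd number of digits is treated as if a trailing 0 were present.
        digits += b"0"
    return binascii.unhexlify(digits)


def flate_decode(data: bytes) -> bytes:
    """Decode FlateDecode data (zlib/deflate)."""
    return zlib.decompress(data)


# Only two of the filters named above are implemented in this sketch.
DECODERS = {"ASCIIHexDecode": ascii_hex_decode, "FlateDecode": flate_decode}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply the filters in the order they are listed, as a PDF reader would."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Synthetic example: a harmless script that was deflated and then hex-encoded.
payload = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(payload)) + b">"
decoded = decode_stream(encoded, ["ASCIIHexDecode", "FlateDecode"])
print(decoded)
```

The same script looks completely different on disk depending on which filters were chained, which is one reason why simple string matching on the raw file tends to miss embedded JavaScript.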

From a security incident response perspective, knowing how to do a detailed analysis of such malicious files can be quite useful. When analyzing this kind of file, an incident handler can determine the worst it can do, its capabilities and its key characteristics. Furthermore, it helps the handler to be better prepared to identify future security incidents and to contain, eradicate and recover from those threats.

So, which steps could an incident handler or malware analyst perform to analyze such files?

In the case of malicious PDF files there are 5 steps. Using the REMnux distro, the steps are described by Lenny Zeltser as:

  1. Find and extract JavaScript
  2. Deobfuscate JavaScript
  3. Extract the shellcode
  4. Create a shellcode executable
  5. Analyze the shellcode and determine what it does

A summary of tools and techniques for analyzing malicious documents with REMnux is given in the cheat sheet compiled by Lenny, Didier and others. To practice these skills and to introduce the tools and techniques, below is the analysis of a malicious PDF using these steps.

The other day I received one of those emails that was part of a mass mailing campaign. The email contained an attachment with a malicious PDF file that took advantage of the Adobe Reader JavaScript engine to exploit CVE-2013-2729. This vulnerability, found by Felipe Manzano, is an integer overflow in several versions of Adobe Reader that is triggered when parsing RLE8-compressed BMP files embedded in PDF forms. The file was detected by only 6 of the 55 AV engines on VirusTotal. Let’s go through each of the mentioned steps to find information on the malicious PDF’s key characteristics and its capabilities.

1st Step – Find and extract JavaScript

One technique is to use Didier Stevens’ suite of tools to analyze the content of the PDF and look for suspicious elements. One of those tools is pdfid, which counts several keywords used in PDF files that could be used to exploit vulnerabilities. The previously mentioned cheat sheet contains some of these keywords. In this case the first observation shows that the PDF file contains 6 objects and 2 streams. No JavaScript is reported, but the file contains /AcroForm and /XFA elements. This means the PDF file contains XFA forms, which might indicate it is malicious.
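The keyword-counting idea can be sketched in a few lines of Python. This is not pdfid’s implementation: the keyword list is an illustrative subset, the function names are mine, and the sample bytes are synthetic rather than taken from the analyzed file. The sketch also undoes `#xx` hex escapes in names first, since a name such as `/J#61vaScript` is equivalent to `/JavaScript`.

```python
import re

# Illustrative subset of names worth a closer look in a suspicious PDF.
KEYWORDS = ["/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm", "/XFA"]


def normalize_names(data: bytes) -> bytes:
    """Replace #xx hex escapes, a common way to disguise PDF names.

    Simplification: this is applied to the whole file, including binary streams.
    """
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword in the (normalized) file contents."""
    data = normalize_names(data)
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead stops /JS from matching longer names such as /JSFoo.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts


# Synthetic sample, not the PDF analyzed in this post.
sample = (b"1 0 obj\n<< /Type /Catalog /AcroForm 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /XFA 3 0 R >>\nendobj\n"
          b"4 0 obj\n<< /S /J#61vaScript /JS (app.alert(1)) >>\nendobj\n")
counts = count_keywords(sample)
for name, value in counts.items():
    print(f"{name:12} {value}")
```

To try it on a real file, `count_keywords(open(path, "rb").read())` would be enough, preferably inside an isolated analysis VM.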



Looking deeper, we can use pdf-parser.py to display the contents of the 6 objects. The output was reduced for the sake of brevity, but in this case Object 2 is the /XFA element, which references Object 1, which contains a compressed and rather suspicious stream.


Following this indicator, pdf-parser.py allows us to show the contents of an object and pass the stream through one of the supported filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only) with the --filter switch. The --raw switch shows the output in a way that is easier to read. The output of the command is redirected to a file, which then holds the decompressed stream. When inspecting this file you will see several lines of JavaScript that weren’t visible in the original PDF file. If this document is opened by a victim, the XFA form referenced by the /XFA keyword will execute this malicious code.
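For readers who want to see what happens under the hood, the sketch below locates objects in the raw bytes of a PDF and inflates any stream marked with /FlateDecode. It is a rough approximation based on regular expressions, not how pdf-parser.py works internally; the function name and the sample document are assumptions made for illustration.

```python
import re
import zlib

# "N G obj ... endobj" blocks and the "stream ... endstream" part inside them.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf: bytes) -> dict:
    """Return {object number: inflated stream} for FlateDecode streams."""
    result = {}
    for match in OBJ_RE.finditer(pdf):
        number, body = int(match.group(1)), match.group(3)
        stream = STREAM_RE.search(body)
        if stream is None or b"/FlateDecode" not in body:
            continue
        try:
            # decompressobj tolerates the end-of-line bytes before 'endstream'.
            result[number] = zlib.decompressobj().decompress(stream.group(1))
        except zlib.error:
            pass  # not valid deflate data, or more filters are chained
    return result


# Synthetic one-object document for demonstration only.
content = b"<xdp>var a = 1;</xdp>"
compressed = zlib.compress(content)
sample_pdf = (b"%PDF-1.5\n1 0 obj\n"
              + b"<< /Filter /FlateDecode /Length %d >>\n" % len(compressed)
              + b"stream\n" + compressed + b"\nendstream\nendobj\n")
streams = inflate_streams(sample_pdf)
print(streams)
```

A regex-based approach like this breaks on object streams, encrypted files and malformed documents, so a dedicated parser remains the better choice for real samples.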


Another fast way to find out whether the PDF file contains JavaScript and other malicious elements is to use the peepdf.py tool written by Jose Miguel Esparza. Peepdf is a tool to analyze PDF files: it helps to show objects/streams, encode/decode streams, modify all of them, obtain different versions, show and modify metadata, and execute JavaScript and shellcode. When run against the malicious PDF file, the latest version of the tool shows very useful information about the PDF structure and its contents, and even detects which vulnerability it triggers if it has a signature for it.


2nd Step – Deobfuscate JavaScript

The second step is to deobfuscate the JavaScript. JavaScript can contain several layers of obfuscation. In this case quite some manual cleanup of the extracted code was needed just to get the code isolated. The object.raw file contained 4 JavaScript elements between <script xxxx contentType="application/x-javascript"> tags and 1 image in base64 format in an <image> tag. The JavaScript code between the tags needs to be extracted and placed into a separate file. The same can be done for the chunk of base64 data, which when decoded produces a 67 MB BMP file. The JavaScript in this case was rather cryptic, but there are tools and techniques that help to interpret and execute the code. In this case I used another tool called js-didier.pl, which is Didier Stevens’ version of the JavaScript interpreter SpiderMonkey. It is essentially a JavaScript interpreter without the browser plugins that you can run from the command line. This allows you to run and analyze malicious JavaScript in a safe and controlled manner. The js-didier tool, just like SpiderMonkey, executes the code, and it writes the results into files named eval.00x.log. I got some errors on one of the variables due to the manual cleanup, but the run was enough to produce several eval log files with interesting results.
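Part of the manual cleanup can be scripted. The following Python sketch pulls the script bodies and the base64 image out of a decoded XFA stream with regular expressions. It assumes the simple tag layout described above (no CDATA wrappers or nested tags), the function name is hypothetical, and the sample input is synthetic.

```python
import base64
import re

SCRIPT_RE = re.compile(
    rb"<script\b[^>]*contentType=[\"']application/x-javascript[\"'][^>]*>"
    rb"(.*?)</script>",
    re.DOTALL | re.IGNORECASE,
)
IMAGE_RE = re.compile(rb"<image\b[^>]*>(.*?)</image>", re.DOTALL | re.IGNORECASE)


def split_xfa(raw: bytes):
    """Return (list of script bodies, list of decoded image payloads)."""
    scripts = [m.group(1).strip() for m in SCRIPT_RE.finditer(raw)]
    # b64decode discards characters outside the base64 alphabet (e.g. newlines).
    images = [base64.b64decode(m.group(1)) for m in IMAGE_RE.finditer(raw)]
    return scripts, images


# Synthetic stand-in for the decoded stream; "Qk0=" is base64 for "BM",
# the magic bytes at the start of a BMP file.
raw = (b'<subform><script name="a" contentType="application/x-javascript">'
       b"var x = 1;</script>"
       b'<image contentType="image/bmp">Qk0=\n</image></subform>')
scripts, images = split_xfa(raw)
print(scripts, images)
```

Each extracted script can then be written to its own file and fed to a JavaScript interpreter, while the image payload can be saved and inspected separately.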


3rd Step – Extract the shellcode

The third step is to extract the shellcode from the deobfuscated JavaScript. In this case the eval.005.log file contained the deobfuscated JavaScript. The file among other things contains 2 variables encoded as Unicode strings. This is one trick used to hide or obfuscate shellcode. Typically you find shellcode in JavaScript encoded in this way.


These Unicode encoded strings need to be converted into binary. To do this, isolate the Unicode encoded strings in a separate file and convert them from Unicode (\u) to hex (\x) notation. This is done with a series of Perl regular expressions, using a REMnux script called unicode2hex-escaped. The resulting file will contain the shellcode in hex format (“\xeb\x06\x00\x00..”), which will be used in the next step to convert it into a binary.
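The conversion itself is simple enough to sketch in Python. Each `\uXXXX` (or `%uXXXX`) escape holds one 16-bit value, and because x86 stores such values in little-endian order, the two bytes come out swapped: `\u06eb` becomes `\xeb\x06`. The function names below are my own and this is not the REMnux script, only an illustration of the same transformation.

```python
import re


def unicode_escapes_to_bytes(text: str) -> bytes:
    """Turn \\uXXXX or %uXXXX escapes into raw bytes, low byte first."""
    out = bytearray()
    for word in re.findall(r"(?:%u|\\u)([0-9A-Fa-f]{4})", text):
        out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


def to_hex_escaped(data: bytes) -> str:
    """Render raw bytes in \\xNN notation."""
    return "".join(f"\\x{byte:02x}" for byte in data)


sample = r"\u06eb\u0000"
shellcode = unicode_escapes_to_bytes(sample)
print(to_hex_escaped(shellcode))
```

Writing `shellcode` to a file in binary mode gives the raw bytes directly, which is handy for tools that accept raw input instead of the hex-escaped string.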



4th Step – Create a shellcode executable

Next, with the shellcode encoded in hexadecimal format, we can produce a Windows binary that runs the shellcode. This is achieved using a script called shellcode2exe.py written by Mario Vilas and later tweaked by Anand Sastry. As Lenny states: “The shellcode2exe.py script accepts shellcode encoded as a string or as raw binary data, and produces an executable that can run that shellcode. You load the resulting executable file into a debugger to examine it. This approach is useful for analyzing shellcode that’s difficult to understand without stepping through it with a debugger.”



5th Step – Analyze shellcode and determine what it does

The final step is to determine what the shellcode does. To analyze the shellcode you could use a disassembler or a debugger. In this case a static analysis of the shellcode using the strings command shows several API calls used by the shellcode. It also shows a URL pointing to an executable that will be downloaded if this shellcode gets executed.
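What the strings command does here can be approximated in Python with a single regular expression that extracts runs of printable ASCII characters. The buffer below is synthetic: the API name and URL are placeholders I made up for the example and are not the values found in the analyzed shellcode.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Placeholder buffer: a few opcode-like bytes around two readable strings.
blob = (b"\xeb\x06\x90\x90URLDownloadToFileA\x00\x01\x02"
        b"http://example.invalid/a.exe\x00\xff")
found = ascii_strings(blob)
urls = [s for s in found if s.startswith(("http://", "https://"))]
print(found)
print(urls)
```

This only finds plain ASCII. Shellcode that builds its strings at run time, or stores them XOR-encoded or as UTF-16, will not give them up this easily, which is where a debugger becomes necessary.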



We now have a strong IOC that can be used to take additional steps in order to hunt for evil and defend the network. This URL can be used as evidence and to identify machines that have been compromised and attempted to download the malicious executable. At the time of this analysis the file was no longer there, but it is known to be a variant of the Gameover Zeus malware.
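As a rough illustration of how such an indicator could be used, the Python sketch below scans proxy-style log lines for the URL and collects the clients that requested it. The log format (client address as the first whitespace-separated field), the addresses and the indicator are all hypothetical; a real hunt would be adapted to whatever the proxy or DNS logs actually look like.

```python
def clients_contacting(log_lines, indicator: str) -> set:
    """Return the first field of every log line that mentions the indicator."""
    needle = indicator.lower()
    clients = set()
    for line in log_lines:
        if needle in line.lower():
            fields = line.split()
            if fields:
                clients.add(fields[0])
    return clients


# Hypothetical log lines and indicator, for demonstration only.
sample_log = [
    "10.0.0.5 GET http://malicious.example/payload.exe 200",
    "10.0.0.7 GET http://intranet.example/index.html 200",
    "10.0.0.9 GET http://MALICIOUS.example/payload.exe 404",
]
hits = clients_contacting(sample_log, "malicious.example/payload.exe")
print(sorted(hits))
```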

The steps followed are manual, but with practice they are repeatable. They represent just a short introduction to the multifaceted world of analyzing malicious documents. Many other techniques and tools exist and much deeper analysis can be done. The focus was to demonstrate the 5 steps that can be used as a framework to discover indicators of compromise that will reveal machines that have been compromised by the same bad guys. However, many other questions could be answered using these 5 steps. Using the tools and techniques mentioned, and others, within the 5 steps, we can gain a better practical understanding of how malicious documents work and which methods are used by Evil. Two great resources for this type of analysis are the book Malware Analyst’s Cookbook: Tools and Techniques for Fighting Malicious Code by Michael Ligh and the SANS course FOR610: Reverse-Engineering Malware: Malware Analysis Tools and Techniques authored by Lenny Zeltser.

Download link for the malicious PDF file: https://0x0.st/sZyY.zip . MD5: 4f275c936b0772c969b2daf4688b7fc9
Password: infected


Malware Analysis

Malware analysis is a very interesting topic that will continue to evolve in size, density and specialization. It is also intellectually challenging. One goal of this activity might be to analyze malware in order to determine its actions and gain insight into its behavior and inner workings by analyzing its code. By doing this we can find answers to pertinent questions such as:

  • What are the malware capabilities?
  • What is the worst it can do?
  • Which indicators of compromise (IOC) could be used to identify this malware in motion (network), at rest (file system) or in use (memory)? These IOCs can then be used across our defense systems.
  • What tactics, techniques and procedures (TTP) are used?
  • Which category does it fall into, i.e., criminal, commodity malware or targeted attacks?

To find answers to these and other questions there are several processes, procedures and tools. One well-established process is called dynamic or behavioral analysis. This process consists of executing the malware specimen in a safe, secure, isolated and controlled environment in order to determine its actions, its behavior and how it interacts with the host system at the network, file system, registry and other levels. The instruments used during this process allow us to gain a better understanding of the malicious code and its capabilities, and are mainly based on monitoring and capturing system changes at the network, memory, I/O level, etc. Different tools exist to accomplish this. The choice of tool depends on the operating system the malware runs on, individual experience/preference and company culture. One great toolbox is the REMnux Linux distro, which brings together a great number of tools for analyzing malicious executables. Among other things it can also emulate a variety of networking services that assist during the behavioral analysis.

Another process is called static or code analysis and consists of analyzing the code or structure of the executable to determine its function. In contrast to behavioral analysis, static analysis does not execute the malware. Static analysis is normally a much more complex process that requires an understanding of several techniques, ideally supported by knowledge of operating system internals and software development. This process might consist of disassembling, debugging and decompiling the executable. Different tools exist to assist this process, and it might take you to complex topics such as unpacking and decrypting. As such, it might be overwhelming to find the needle in the haystack when going through these techniques. To start shaping an understanding, you might want to focus on the execution flow and the code blocks: where does it start, and what does it call?

So, which process should I use? Which tool should I execute first? There is no right or wrong answer! Several approaches exist and a combination of both processes is normally used. Proceed step by step in an incremental and controlled fashion. Use more than one tool to substantiate evidence. Use the Internet, books and research papers to gather knowledge about operating systems, networking, programming or security. How well educated you are in such topics will help you during the malware analysis. Jump from the behavioral analysis process to the static code analysis and vice versa in order to move forward. If you get stuck, don’t give up!

As you look at more samples and handle the tools better, you slowly train yourself to determine what is normal and what is unusual. Soon you start recognizing differences and deviations from the norm. Whether you are doing malware analysis as part of a forensic analysis, incident response or just for fun, this is a fascinating journey!

For further reference you may want to look into the following books: Malware Analyst’s Cookbook and DVD: Tools and Techniques for Fighting Malicious Code, Practical Malware Analysis, and Malware Forensics: Investigating and Analyzing Malicious Code. More formal training is available from SANS with the GREM course authored by Lenny Zeltser. Free resources include the malware analysis tutorials on Dr. Fu’s Security blog and the Binary Auditing site, which contains free IDA Pro training material. Finally, the malware analysis track on the Open Security Training site is awesome. It contains several training videos and material for free!


Behavioral Android Malware Analysis with REMnux and Mobisec

[Editor’s Note: In the article below, Angel Alonso-Parrizas, who is SANS GSE certified, illustrates a series of very useful tools and techniques that security and malware analysts can apply to analyze mobile malware. This way you can get familiar with malware specimens and analyze them on your own. Using free tool suites like REMnux or MobiSec you can put malware in a controlled environment and determine its purpose and functions. ~Luis]

Last week Lenny Zeltser released version 4 of REMnux, a Linux distribution designed for malware analysis.
REMnux includes a set of tools that facilitate and speed up the analysis and, although REMnux is not designed to perform Android malware analysis, it is possible to use some of its tools for this purpose.
On the other hand, the guys from Secure Ideas have developed a Linux distribution named MobiSec, which is designed to evaluate and analyze the security of mobile devices and the applications running on them. It also includes many tools, emulators, etc.
The idea of this post is to show both distributions and combine their functionality in a simple example for Android. The purpose isn’t to do an exhaustive malware analysis, but to explain how these distributions can be used to perform part of the behavioral analysis.
REMnux can be downloaded as an OVF/OVA image, which can be imported into VMware or VirtualBox. MobiSec, on the other hand, requires downloading the ISO and installing it manually. Once both operating systems have been installed, and given that we are dealing with malware, it is key to set up an isolated network without Internet access, hence ‘Host only’ mode. To facilitate the analysis, we set the DNS server and gateway of MobiSec to REMnux’s IP.
MobiSec, besides other tools, includes several versions of the Android emulator, which makes the work much easier when you need to test something on Android quickly, because there is no need to set anything up. For this example we will use version 4.0.3.
The next step is to push the malware into the emulator running in MobiSec. In this example we download the sample from http://contagiominidump.blogspot.com.es to the physical machine and afterwards copy it to MobiSec through SCP. The malware chosen is Android.Exprespam. The last step is to install the file through the ‘adb’ command (adb install).

In the meantime, and before executing the malware in the emulator, let’s prepare REMnux. REMnux allows us to run services (like a honeypot) that interact with the malware. For example, we can run a DNS server to resolve domains, an IRC server, an SMTP server or an HTTP server.

The first step is to run ‘fakedns’, which will resolve any domain to REMnux’s IP, and at the same time launch Wireshark to have visibility of what’s going on.
In this case we can see that the malware is asking for the domain ftukguhilcom.globat.com and REMnux is replying with its own IP.
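The idea behind a fake DNS responder is easy to sketch in Python: take whatever question arrives and answer it with a single A record that points to the analysis machine. The code below is not the fakedns tool shipped with REMnux; the function names and the address 192.168.56.10 are assumptions for the example, and for simplicity every query is answered as if it were an A query.

```python
import socket
import struct


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (used here only for the demo)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split("."))
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def build_response(query: bytes, ip: str) -> bytes:
    """Answer any query with one A record pointing to ip."""
    txid = query[:2]
    flags = struct.pack("!H", 0x8180)          # standard response, no error
    counts = struct.pack("!HHHH", 1, 1, 0, 0)  # 1 question, 1 answer
    # The question starts at offset 12: QNAME ends with a zero byte,
    # followed by QTYPE and QCLASS (2 bytes each).
    end = query.index(b"\x00", 12) + 1 + 4
    question = query[12:end]
    answer = (b"\xc0\x0c"                       # pointer to the name at offset 12
              + struct.pack("!HHIH", 1, 1, 60, 4)  # A, IN, TTL 60, 4 data bytes
              + socket.inet_aton(ip))
    return txid + flags + counts + question + answer


def serve(ip: str, bind=("0.0.0.0", 53)) -> None:
    """Reply to every UDP DNS query with ip (binding port 53 usually needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind)
    while True:
        query, client = sock.recvfrom(512)
        sock.sendto(build_response(query, ip), client)


# Offline demonstration with a hypothetical host-only address.
query = build_query("ftukguhilcom.globat.com")
response = build_response(query, "192.168.56.10")
print(response.hex())
```

Calling `serve("192.168.56.10")` on the analysis machine would start answering queries; it is left out of the example so that the snippet runs without network privileges.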
We can observe in Wireshark that the malware is trying to make an HTTPS connection, which is reset (RST) by REMnux as there is no service running on that port.
The next step is to emulate the HTTPS server in order to see the traffic. REMnux includes an HTTP server and stunnel, which can be combined to provide an HTTPS server. To do that it is necessary to launch the server with ‘httpd start’ and set up stunnel with a self-signed certificate.
If we look at the HTTP logs we can see that there isn’t any HTTP request. This is because the self-signed certificate has to be accepted when accessing through HTTPS. This might be because the malware checks the certificate, as an additional control, or because the malware is using the Android web browser and the certificate has not been imported into its certificate store.

If it were the second case, it would be possible to import the certificate into Android’s certificate store and avoid the SSL error. As the purpose of this post isn’t to perform a full malware analysis but to show the tools in REMnux and MobiSec, we are going to continue with the analysis of another malware sample which performs HTTP requests. For this second case we will use Chuli.A.

The steps are the same as before: install the malware and run it.


In this case the malware accesses an IP directly, instead of resolving a hostname first. Luckily, REMnux is able to reply to such requests automatically, so looking at Wireshark it is possible to see that the HTTP resource requested is ‘android.php’, which doesn’t exist on the server.


In the meantime we can see in the HTTP logs all the requests to android.php. Given that the file doesn’t exist, we are going to create it in order to interact with the malware and see what happens when the resource is requested.
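An alternative to creating each requested file by hand is a catch-all HTTP service that answers 200 OK to everything and records what was sent. The Python sketch below uses only the standard library; it is not the web server bundled with REMnux, and the handler and function names are my own.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Every request seen by the server: (method, path, body).
captured = []


class CatchAllHandler(BaseHTTPRequestHandler):
    """Answer 200 OK to any GET or POST and keep a copy of the request."""

    def _reply(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        captured.append((self.command, self.path, body))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"OK")

    do_GET = _reply
    do_POST = _reply

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; requests are stored in `captured`


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), CatchAllHandler)
```

In the lab, something like `make_server("0.0.0.0", 80).serve_forever()` would accept the malware’s requests (binding port 80 usually needs root), and `captured` could then be reviewed alongside the Wireshark capture.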


Now that the resource exists, it is possible to see the replies from the server with code 200 OK.


Probably in this POST request the malware has informed the C&C that a new device has been compromised and has sent some information about it. The string ‘phone1365842571243’ sent in the POST is likely a unique ID.

Given that the reply is now 200, the malware is able to perform other requests, as can be seen in the logs of the HTTP server. To be precise, the requested resource is ‘POST /data/phone1365842571243/process.php’. In the same way as before, and in order to interact with the malware, we are going to create that resource.
It seems to be working, as there are several POST requests which can be analyzed with Wireshark. These requests contain various pieces of information.
For example, in one of them the information sent looks like GPS coordinates encoded somehow.
In another one it looks like the contacts are being sent (but in this case the contact list is empty, so the information sent is short).
We could continue analyzing all the requests and check which information is being sent through the different POST requests, but this is not the objective of this post.
The important part is to keep in mind that it is possible to use MobiSec and REMnux to interact dynamically with malware by creating fake DNS replies, HTTP services, web objects, etc., while advancing in the malware analysis. Also, MobiSec integrates several tools to perform malware reverse engineering, but this will be explained in another post.