Lenny Zeltser | Count Upon Security

Malicious Documents – PDF Analysis in 5 steps

Mass-mailing and targeted campaigns that use common file types to host or exploit code have long been, and remain, a very popular attack vector: a malicious PDF or MS Office document is received via e-mail or opened through a browser plug-in. For malicious PDF files in particular, the security industry saw a significant increase in vulnerabilities after the second half of 2008, which might be related to Adobe Systems’ release of the specifications, format structure and functionality of PDF files.

Most enterprise network perimeters are protected by several security filters and mechanisms that block threats. However, a malicious PDF or MS Office document can be very successful at passing through firewalls, intrusion prevention systems, anti-spam, anti-virus and other security controls. Once it reaches the victim’s mailbox, the attack relies on social engineering to lure the user into clicking on or opening the document. If the user opens a malicious PDF file, for example, it typically executes JavaScript that exploits a vulnerability when Adobe Reader parses the crafted file. This may cause the application to corrupt memory on the stack or heap and run arbitrary code known as shellcode. The shellcode normally downloads and executes a malicious file from the Internet. The Internet Storm Center handler Bojan Zdrnja wrote a good summary of one of these shellcodes. In some circumstances the vulnerability can be exploited without opening the file, just by having the malicious file on the hard drive, as described by Didier Stevens.

From a high-level view, a PDF file is composed of a header, body, cross-reference table and trailer. One key component is the body, which may contain all kinds of content objects, making parsing an attractive target for vulnerability researchers and exploit developers. The language is very rich and complex, which means the same information can be encoded and obfuscated in many ways. For example, objects can contain streams that store data of any type or size. These streams can be encoded or compressed, and the PDF standard supports several algorithms, called filters, including ASCIIHexDecode, ASCII85Decode, LZWDecode, FlateDecode, RunLengthDecode, CCITTFaxDecode and DCTDecode. PDF files can contain multimedia content and support JavaScript, as well as ActionScript through Flash objects. JavaScript is a popular attack vector because it can be hidden in streams using different techniques, which makes detection harder. When a PDF file contains JavaScript, the malicious code is used to trigger a vulnerability and execute shellcode. All these features and capabilities translate into a huge attack surface!
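To make the filter idea concrete, here is a minimal Python sketch, not taken from any of the tools mentioned in this post. It decodes a stream that was first compressed with FlateDecode and then wrapped in ASCIIHexDecode, which is one way the same content can be layered to hide it. The payload and the helper names (`ascii_hex_decode`, `flate_decode`, `apply_filters`) are illustrative assumptions; only two of the standard filters are covered.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, whitespace ignored, '>' marks the end."""
    hex_digits = bytes(b for b in data.split(b">", 1)[0] if not chr(b).isspace())
    if len(hex_digits) % 2:
        # An odd number of digits means the last digit is followed by an implied 0.
        hex_digits += b"0"
    return binascii.unhexlify(hex_digits)


def flate_decode(data: bytes) -> bytes:
    """Decode FlateDecode data (zlib/deflate)."""
    return zlib.decompress(data)


# Only two filters are implemented in this sketch.
DECODERS = {"ASCIIHexDecode": ascii_hex_decode, "FlateDecode": flate_decode}


def apply_filters(data: bytes, filters: list) -> bytes:
    """Apply the decoders in the order the filter names are listed."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Illustrative payload, encoded the way a PDF writer would: Flate first, then hex.
payload = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(payload)) + b">"

decoded = apply_filters(encoded, ["ASCIIHexDecode", "FlateDecode"])
print(decoded)
```

The filter names are applied in the listed order when decoding, so the outermost encoding comes first. A scanner that only looks at the raw bytes of the file would see nothing but hex digits here.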

From a security incident response perspective, knowing how to perform a detailed analysis of such malicious files can be quite useful. By analyzing this kind of file, an incident handler can determine the worst it can do, its capabilities and its key characteristics. Furthermore, the analysis helps the handler be better prepared to identify future security incidents and to contain, eradicate and recover from those threats.

So, which steps could an incident handler or malware analyst perform to analyze such files?

In the case of malicious PDF files there are 5 steps. Using the REMnux distro, Lenny Zeltser describes them as:

Find and extract JavaScript
Deobfuscate JavaScript
Extract the shellcode
Create a shellcode executable
Analyze the shellcode and determine what it does

A summary of the tools and techniques for analyzing malicious documents with REMnux is given in the cheat sheet compiled by Lenny, Didier and others. To practice these skills and to introduce the tools and techniques, below is the analysis of a malicious PDF using these steps.

The other day I received an email that was part of a mass-mailing campaign. The email contained an attachment with a malicious PDF file that took advantage of the Adobe Reader JavaScript engine to exploit CVE-2013-2729. This vulnerability, found by Felipe Manzano, is an integer overflow in several versions of Adobe Reader that is triggered when parsing RLE8-compressed BMP files embedded in PDF forms. On VirusTotal the file was detected by only 6 of the 55 AV engines. Let’s go through each of the steps above to find information about the malicious PDF’s key characteristics and capabilities.

1st Step – Find and extract JavaScript

One technique is to use Didier Stevens’ suite of tools to analyze the content of the PDF and look for suspicious elements. One of those tools is pdfid, which shows several keywords used in PDF files that could be used to exploit vulnerabilities. The previously mentioned cheat sheet contains some of these keywords. In this case the first observation shows that the PDF file contains 6 objects and 2 streams. No JavaScript is mentioned, but the file contains /AcroForm and /XFA elements. This means the PDF file contains XFA forms, which might indicate it is malicious.
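The idea behind this kind of triage can be sketched in a few lines of Python. This is a simplified illustration of keyword counting, not how pdfid itself is implemented: the keyword list is a small assumed subset, the sample bytes are made up, and the sketch does not handle names obfuscated with hex escapes (for example /J#61vaScript).

```python
import re

# Assumed subset of keywords worth flagging during triage.
KEYWORDS = [b"/JS", b"/JavaScript", b"/AcroForm", b"/XFA",
            b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def count_keywords(pdf_bytes: bytes) -> dict:
    """Count how often each keyword appears as a complete PDF name."""
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead stops /JS from matching inside a longer name such as /JSfoo.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


# Made-up miniature document with the same kind of layout as the sample.
sample = (
    b"%PDF-1.6\n"
    b"1 0 obj\n<< /Length 4 >>\nstream\nxxxx\nendstream\nendobj\n"
    b"2 0 obj\n<< /XFA 1 0 R >>\nendobj\n"
    b"3 0 obj\n<< /AcroForm 2 0 R >>\nendobj\n"
)

counts = count_keywords(sample)
for name, hits in counts.items():
    print(f"{name:15} {hits}")
```

A non-zero count for /XFA or /AcroForm with a zero count for /JS and /JavaScript is the pattern seen in this sample: the script is not declared directly but hidden inside the form data.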

Looking deeper, we can use pdf-parser.py to display the contents of the 6 objects. The output was reduced for the sake of brevity, but in this case object 2 is the /XFA element, which references object 1, and object 1 contains a compressed and rather suspicious stream.

Following this indicator, pdf-parser.py allows us to show the contents of an object and pass the stream through one of the supported filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only) using the --filter switch. The --raw switch shows the output in a form that is easier to read. The output of the command is redirected to a file, which then holds the decompressed stream. When inspecting this file you will see several lines of JavaScript that were not visible in the original PDF file. If a victim opens this document, the XFA form referenced by the /XFA keyword will cause this malicious code to execute.
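What happens to the stream in this step can be approximated with a short Python sketch. It is a naive illustration under several assumptions: the object has generation number 0, the stream uses only FlateDecode, and neither `endobj` nor `endstream` occurs by chance inside the compressed data. The function name `extract_stream` and the miniature PDF are made up for the example; a real parser such as pdf-parser.py is far more robust.

```python
import re
import zlib


def extract_stream(pdf_bytes: bytes, obj_num: int) -> bytes:
    """Return the raw (still encoded) stream of object `obj_num`, generation 0."""
    pattern = rb"(?<!\d)%d 0 obj\b(.*?)endobj" % obj_num
    obj = re.search(pattern, pdf_bytes, re.DOTALL)
    if obj is None:
        raise ValueError(f"object {obj_num} not found")
    stream = re.search(rb"stream\r?\n(.*?)\r?\nendstream", obj.group(1), re.DOTALL)
    if stream is None:
        raise ValueError(f"object {obj_num} has no stream")
    return stream.group(1)


# Made-up miniature PDF: object 2 points at object 1, which holds a Flate stream.
script = b"<script contentType='application/x-javascript'>var a = 1;</script>"
pdf = (
    b"%PDF-1.6\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(script)
    + b"\nendstream\nendobj\n2 0 obj\n<< /XFA 1 0 R >>\nendobj\n"
)

raw = extract_stream(pdf, 1)
# decompressobj() is used because it tolerates a damaged or truncated trailer.
decoded = zlib.decompressobj().decompress(raw)
print(decoded.decode("ascii"))
```

The decoded output is what would be redirected to a file for the next step.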

Another fast way to find out whether the PDF file contains JavaScript and other malicious elements is to use the peepdf.py tool written by Jose Miguel Esparza. Peepdf is a tool for analyzing PDF files: it can show objects and streams, encode and decode streams, modify all of them, obtain different versions, show and modify metadata, and execute JavaScript and shellcode. When run against the malicious PDF file, the latest version of the tool shows very useful information about the PDF structure and its contents, and even detects which vulnerability the file triggers if it has a signature for it.

2nd Step – Deobfuscate JavaScript

The second step is to deobfuscate the JavaScript, which can contain several layers of obfuscation. In this case quite some manual cleanup of the extracted code was needed just to isolate it. The object.raw file contained 4 JavaScript elements between <script xxxx contentType="application/x-javascript"> tags and 1 image in base64 format in an <image> tag. The JavaScript code between the tags needs to be extracted and placed into a separate file. The same can be done for the chunk of base64 data, which when decoded produces a 67 MB BMP file.

The JavaScript in this case was rather cryptic, but there are tools and techniques that help interpret and execute the code. Here I used a tool called js-didier.pl, which is Didier Stevens’ version of the SpiderMonkey JavaScript interpreter. It is essentially a JavaScript interpreter without the browser plugins that you can run from the command line. This makes it possible to run and analyze malicious JavaScript in a safe and controlled manner. The js-didier tool, just like SpiderMonkey, executes the code and writes the results to files named eval.00x.log. I got some errors on one of the variables due to the manual cleanup, but the run was enough to produce several eval log files with interesting results.
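The manual extraction of the script blocks and the base64 image can be partly automated. The following Python sketch assumes the decoded stream is plain text in which the scripts sit between <script ...> and </script> tags and the image data between <image ...> and </image> tags, with no XML entity escaping. The function name `split_xfa` and the sample form are hypothetical stand-ins for the real object.raw.

```python
import base64
import re


def split_xfa(xfa_text: str):
    """Return (script bodies, decoded image payloads) found in the XFA text."""
    flags = re.DOTALL | re.IGNORECASE
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", xfa_text, flags)
    images = re.findall(r"<image\b[^>]*>(.*?)</image>", xfa_text, flags)
    # b64decode ignores whitespace and line breaks inside the encoded data.
    return scripts, [base64.b64decode(data) for data in images]


# Hypothetical stand-in for object.raw: two scripts and a tiny fake "BMP".
fake_bmp = b"BM" + bytes(10)
xfa = (
    '<subform>'
    '<script contentType="application/x-javascript">var a = 1;</script>'
    '<script contentType="application/x-javascript">var b = a + 1;</script>'
    '<image contentType="image/bmp">'
    + base64.b64encode(fake_bmp).decode("ascii")
    + '</image></subform>'
)

scripts, images = split_xfa(xfa)
for index, body in enumerate(scripts, start=1):
    print(f"script {index}: {body}")
print("image starts with:", images[0][:2])
```

Each script body can then be saved to its own file and handed to the JavaScript interpreter, and the decoded image can be written out for inspection.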

3rd Step – Extract the shellcode

The third step is to extract the shellcode from the deobfuscated JavaScript. In this case the eval.005.log file contained the deobfuscated JavaScript. Among other things, the file contains 2 variables encoded as Unicode strings. This is a common trick for hiding or obfuscating shellcode, and shellcode in JavaScript is typically encoded this way.

These Unicode-encoded strings need to be converted into binary. To do this, isolate the Unicode-encoded strings in a separate file and convert them from Unicode (\u) to hex (\x) notation. This is done with a REMnux script called unicode2hex-escaped, which applies a series of Perl regular expressions. The resulting file contains the shellcode in hex format ("\xeb\x06\x00\x00..") and is used in the next step to convert it into a binary.
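The conversion itself is simple enough to sketch in Python. This is an illustration of the idea, not the REMnux script: each \uXXXX (or %uXXXX) escape is one 16-bit value that ends up in memory in little-endian order, so the two bytes are swapped when written out. The function names and the input string are assumptions made for the example.

```python
import re


def unicode_to_bytes(text: str) -> bytes:
    """Convert \\uXXXX or %uXXXX escapes into raw bytes, low byte first."""
    out = bytearray()
    for match in re.finditer(r"(?:%|\\)u([0-9a-fA-F]{4})", text):
        value = int(match.group(1), 16)
        out += value.to_bytes(2, "little")
    return bytes(out)


def to_hex_escaped(data: bytes) -> str:
    """Render bytes in \\xNN notation."""
    return "".join(f"\\x{byte:02x}" for byte in data)


# Illustrative input: the raw string keeps the backslashes literal.
encoded = r"\u06eb\u0000\u9090"

shellcode = unicode_to_bytes(encoded)
print(to_hex_escaped(shellcode))
```

Note how \u06eb becomes \xeb\x06: the byte order is reversed within each pair, which is easy to get wrong when converting by hand.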

4th Step – Create a shellcode executable

Next, with the shellcode encoded in hexadecimal format, we can produce a Windows binary that runs it. This is achieved with a script called shellcode2exe.py, written by Mario Vilas and later tweaked by Anand Sastry. As Lenny states: "The shellcode2exe.py script accepts shellcode encoded as a string or as raw binary data, and produces an executable that can run that shellcode. You load the resulting executable file into a debugger to examine it. This approach is useful for analyzing shellcode that’s difficult to understand without stepping through it with a debugger."

5th Step – Analyze shellcode and determine what it does

The final step is to determine what the shellcode does. To analyze the shellcode you could use a disassembler or a debugger. In this case a static analysis of the shellcode with the strings command shows several API calls used by the shellcode. It also shows a URL pointing to an executable that will be downloaded if the shellcode is executed.
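For readers without the strings command at hand, the same static check can be approximated in Python. The sketch below pulls runs of printable ASCII out of a byte blob and flags anything that looks like a URL. The sample bytes, API names and URL are placeholders chosen for illustration, not the contents of the analyzed shellcode.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters at least `min_len` long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def find_urls(strings: list) -> list:
    """Keep only the strings that contain an http or https URL."""
    return [s for s in strings if re.search(r"https?://", s)]


# Placeholder blob: a few non-printable bytes around readable strings.
blob = (
    b"\xeb\x06\x90\x90"
    + b"LoadLibraryA" + b"\x00"
    + b"URLDownloadToFileA" + b"\x00\x01\x02"
    + b"http://example.invalid/payload.exe" + b"\x00"
)

strings = ascii_strings(blob)
urls = find_urls(strings)
print(strings)
print(urls)
```

This only works when the shellcode stores its strings in the clear. Shellcode that decodes itself at run time shows nothing useful this way and has to be stepped through in a debugger.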

We now have a strong IOC that can be used to take additional steps to hunt for evil and defend the network. This URL can be used as evidence and to identify machines that have been compromised and attempted to download the malicious executable. At the time of this analysis the file was no longer there, but it is known to be a variant of the Gameover Zeus malware.
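As a rough illustration of how such a URL can be used for hunting, the Python sketch below scans web proxy log lines for requests to the host named in the IOC. Everything specific here is hypothetical: the log format (timestamp, client IP and URL separated by whitespace), the addresses, and the placeholder URL, which stands in for the real indicator.

```python
from urllib.parse import urlsplit


def find_ioc_hits(log_lines, ioc_url):
    """Return (timestamp, client_ip, url) for requests to the IOC's host."""
    ioc_host = urlsplit(ioc_url).hostname
    hits = []
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        timestamp, client_ip, url = fields[:3]
        if urlsplit(url).hostname == ioc_host:
            hits.append((timestamp, client_ip, url))
    return hits


# Placeholder indicator and made-up proxy log entries.
IOC_URL = "http://malicious.example.invalid/payload.exe"
proxy_log = [
    "2014-06-02T09:14:07Z 10.0.0.21 http://intranet.example.invalid/index.html",
    "2014-06-02T09:15:33Z 10.0.0.42 http://malicious.example.invalid/payload.exe",
    "2014-06-02T09:16:01Z 10.0.0.42 http://updates.example.invalid/check",
]

hits = find_ioc_hits(proxy_log, IOC_URL)
for timestamp, client_ip, url in hits:
    print(client_ip, "requested", url, "at", timestamp)
```

Matching on the host rather than the full URL also catches requests for other files on the same server, at the cost of more false positives if the host is shared.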

The steps followed are manual, but with practice they are repeatable. They represent just a short introduction to the multifaceted world of analyzing malicious documents. Many other techniques and tools exist, and much deeper analysis can be done. The focus here was to demonstrate the 5 steps, which can be used as a framework to discover indicators of compromise that reveal machines compromised by the same bad guys. These 5 steps can, however, answer many other questions. Using the tools and techniques mentioned here, and others, within the 5 steps gives us a better practical understanding of how malicious documents work and which methods are used by Evil. Two great resources for this type of analysis are the book Malware Analyst’s Cookbook: Tools and Techniques for Fighting Malicious Code by Michael Ligh and the SANS course FOR610: Reverse-Engineering Malware: Malware Analysis Tools and Techniques, authored by Lenny Zeltser.

Download link for the malicious PDF file: https://0x0.st/sZyY.zip . MD5: 4f275c936b0772c969b2daf4688b7fc9
Password: infected

Reverse-Engineering and Malware Analysis

Last year I had the chance to go to SANS Orlando 2013 in Orlando, Florida (thank you Wes!), one of the biggest yearly SANS conferences, only outpaced in size by SANS FIRE in Baltimore, Maryland. I went there to take the 5-day course FOR610 Reverse-Engineering Malware: Malware Analysis Tools and Techniques with Lenny Zeltser. Besides the course itself, the main reason for my choice was the instructor. Lenny is a brilliant fellow and a top-rated SANS instructor, an awesome writer and a fantastic lecturer.

I was very enthusiastic about getting the most out of it. One reason was that I had read the book Malware: Fighting Malicious Code by Ed Skoudis, for which Lenny wrote chapters 2 and 4. This book is 10 years old and it’s still a classic, a historical object and definitely a must-read for anyone who is part of the security community. The other reason was that I wanted to gain the skills to securely analyze, debug and disassemble malicious programs in order to translate this capability into actionable threat intelligence.

On the first day of the training we were introduced to 2 approaches to examining malicious programs: behavior analysis and static/code analysis. To perform these we started by setting up a controlled and isolated environment, a simple and inexpensive malware analysis lab running on VMware. In this lab we used a set of free tools that allowed us to determine what the malicious program does and how it interacts with the file system, network, registry and memory. We were also introduced to REMnux, a lightweight Linux distribution for assisting malware analysts with reverse-engineering malicious software. The distribution is based on Ubuntu and is maintained by Lenny Zeltser. Using a set of Windows tools, the REMnux distro and a variety of techniques, we got a better understanding of how to analyze malware and determine its capabilities.

Then we went deeper, performing a detailed analysis of the malware with reverse-engineering tools and different methods. By finding strings in the executable, running a disassembler (IDA Pro), loading the executable into a debugger (OllyDbg), executing it and looking at the API calls being made, we got a glimpse into the world of code analysis. Once the lab was set up and we understood the processes we would follow, the fun started! Through several hands-on labs with different specimens we observed what the malware does, documented the findings and translated them into indicators of compromise and actionable intelligence that can be used to proactively detect and monitor threats.

Day two started with additional malware analysis approaches. We were introduced to packed executables and to what patching means, and we unpacked malicious executables that used simple packing techniques. This is where we began the journey into x86 Intel assembly. In the second half of day two we covered browser malware and Flash-based malware, and how to use REMnux with behavioral and code analysis techniques to analyze web malware. It was impressive to see the number of ingenious techniques employed by the bad guys to deliver malicious stuff.

Day three is a deep dive into malicious code analysis. It starts with core reverse-engineering concepts, and you spend the rest of the day playing with malicious code at the assembly level. It’s a whole day of looking at a disassembler and a debugger. Throughout the material and the exercises you get more and more exposed to x86 assembly. We managed to use the debugger to control malicious program execution (step in, step over, breakpoints) and to monitor or change its state (registers and memory). On this day we also covered user-mode rootkits, key loggers, sniffers, DLL injection and downloaders. Great stuff!

Day four: even after 10 hours of sleep I doubt I had enough processing power in my neurons to absorb all Lenny had to say. As a complementary strategy I made heavy use of my pencil and wrote as many notes as possible in my courseware material. During the first half of the day we were shown the techniques that malware writers use to protect their programs. Packing was one, but more complex techniques such as anti-disassembly, anti-debugging, anti-VMware and others were also demonstrated. It’s an extraordinary arms race between good and evil. We did a huge number of hands-on exercises to reinforce all these concepts and techniques. It was also amazing to see Lenny describe how different malware specimens use mazes of code and junk code to frustrate and mislead the analyst. With these techniques in place, an analyst who does not have enough resources (time/money) will soon stop the analysis and move on to something else, and Evil will win. An interesting trade-off.

Besides these protections, we were taught different techniques to bypass those malware defenses. One example was to infect a system with a piece of malware that was packed/obfuscated. When executed, the malware loaded its unpacked code into memory, which allowed us to examine it. Because the code was encrypted on the file system but resident in memory in unpacked form, we used techniques to dump it from memory. To do this we used ChimpREC to extract the process from memory and then rebuild its PE header import table so that it could be executed. Another technique was using a debugger to patch an executable to avoid anti-debugging mechanisms. Other tools such as LordPE and OllyDump are also used. In the second half of the day, shellcode analysis and web malware anti-deobfuscation techniques were described and practiced.

Finally, on day five we spent the first half of the day learning techniques and tools for analyzing malicious Microsoft Office (Word, Excel, PowerPoint) and Adobe PDF documents. The second half of the day was spent on memory forensics with the help of the Volatility Framework and associated plug-ins. The course ends with an explanation of the deceptive techniques used by rootkit infections and of how you can use memory and code analysis to uncover their capabilities.

The course is extremely technical, deep and very hands-on. I was overwhelmed by the amount of information; after day 3 I felt like I was drinking from a fire hose. The course is part of the SANS Digital Forensics and Incident Response curriculum. It is very well structured and the sequence of steps it follows is very well thought out.

This particular security field is a very interesting and challenging one, and it will continue to evolve. Also, as the security industry continues to progress from a reactive approach to a more proactive one, malware analysis skills will be in increasing demand. More and more companies are funding their own threat intelligence operations with this kind of capability in-house.

If you are an incident handler, sysadmin or researcher, or simply want to be the next digital Sherlock Holmes, you may also want to look into the books Malware Analyst’s Cookbook and DVD: Tools and Techniques for Fighting Malicious Code and Practical Malware Analysis. Other relevant and free resources are the malware analysis tutorials on Dr. Fu’s Security Blog and the Binary Auditing site, which contains free IDA Pro training material. Finally, the malware analysis track on the Open Security Training site is awesome. It contains several training videos and material for free.
