
Intro to Linux Forensics

This article is a quick exercise and a small introduction to the world of Linux forensics. Below, I perform a series of steps to analyze a disk obtained from a compromised system running a Red Hat operating system. I start by identifying the file system, mounting the different partitions, and creating a super timeline and a file system timeline. I also take a quick look at the artifacts and then unmount the different partitions. The process of obtaining the disk will be skipped, but here are some old but good notes on how to obtain a disk image from a VMware ESX host.

When obtaining the different disk files from the ESX host, you will need the VMDK files. Then you move them to your lab, which could be as simple as your laptop running a VM with the SIFT workstation. To analyze the VMDK files you could use the "libvmdk-utils" package, which contains tools to access data stored in VMDK files. However, another approach is to convert the VMDK file format into RAW format. This way, it will be easier to run the different tools – such as the tools from The Sleuth Kit, which will be heavily used – against the image. To perform the conversion, you could use the QEMU disk image utility. The picture below shows this step.
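
For reference, a minimal sketch of the conversion step, assuming the VMDK file is named disk.vmdk and that the qemu-utils package is installed:

  # convert the VMDK into a RAW image that The Sleuth Kit tools can handle
  qemu-img convert -f vmdk -O raw disk.vmdk disk.raw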

Following that, you can list the partition table from the disk image and obtain information about where each partition starts (in sectors) using the "mmls" utility. Then, use the starting sector to query the details associated with the file system using the "fsstat" utility. As you can see in the image, the "mmls" and "fsstat" utilities are able to identify the first partition, "/boot", which is of type 0x83 (ext4). However, "fsstat" does not recognize the second partition that starts at sector 1050624.
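
A minimal sketch of these two steps; the 2048 offset for /boot is an assumption and should be taken from the mmls output of your own image:

  mmls disk.raw                 # list the partition table and starting sectors
  fsstat -o 2048 disk.raw       # file system details of the /boot partition (assumed offset)
  fsstat -o 1050624 disk.raw    # not recognized: partition type 0x8e (LVM) is not a plain file system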

This is due to the fact that this partition is of type 0x8e (Logical Volume Manager). Nowadays, many Linux distributions use the LVM (Logical Volume Manager) scheme by default. LVM uses an abstraction layer that allows a hard drive, or a set of hard drives, to be allocated to a physical volume. The physical volumes are combined into volume groups, which in turn can be divided into logical volumes that have mount points and a file system type such as ext4.

With the "dd" utility you can easily see that you are in the presence of LVM2 volumes. To make them usable for our different forensic tools, we need to create device maps from the LVM partition table. To perform this operation, we start with "kpartx", which automates the creation of the partition devices by creating loopback devices and mapping them. Then, we use the different utilities that manage LVM volumes, such as "pvs", "vgscan" and "vgchange". The figure below illustrates the necessary steps to perform this operation.
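
A hedged sketch of those steps; the LVM2 label check assumes the label sits in the second sector of the physical volume, and the device names created under /dev/mapper will differ per image:

  dd if=disk.raw bs=512 skip=1050625 count=1 | xxd | head    # look for the "LABELONE ... LVM2" label
  kpartx -av disk.raw    # create loop devices and partition mappings under /dev/mapper
  pvs                    # list the physical volumes seen by LVM
  vgscan                 # scan for volume groups
  vgchange -ay           # activate the logical volumes so they appear as devices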

After activating the LVM volume group, we have six devices that map to six mount points that make up the file system structure of this disk. The next step is to mount the different volumes as read-only, as we would mount a normal device for forensic analysis. For this, it is important to create a folder structure that matches the partition scheme.
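
A minimal sketch of the read-only mounts; the volume group and logical volume names below are hypothetical and should be replaced with the ones reported by "lvs", as should the kpartx mapping for /boot:

  mkdir -p /mnt/case /mnt/case/boot /mnt/case/var /mnt/case/home
  mount -o ro,noexec,noload /dev/vg_system/lv_root /mnt/case
  mount -o ro,noexec        /dev/mapper/loop0p1    /mnt/case/boot    # the ext4 /boot partition mapped by kpartx
  mount -o ro,noexec,noload /dev/vg_system/lv_var  /mnt/case/var
  mount -o ro,noexec,noload /dev/vg_system/lv_home /mnt/case/home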

After mounting the disk, you normally start your forensic analysis and investigation by creating a timeline. This is a crucial and very useful step because it includes information about files that were modified, accessed, changed and created, in a human-readable format, known as MAC time evidence (Modified, Accessed, Changed). This activity helps find the particular time an event took place and in which order.

Before we create our timeline, it is noteworthy that on Linux file systems like ext2 and ext3 there is no timestamp for the creation/birth time of a file; there are only 3 timestamps. The creation timestamp was introduced with ext4. The book "Forensic Discovery" (1st Edition) by Dan Farmer and Wietse Venema outlines the different timestamps:

  • Last Modification time. For directories, the last time an entry was added, renamed or removed. For other file types, the last time the file was written to.
  • Last Access (read) time. For directories, the last time it was searched. For other file types, the last time the file was read.
  • Last Status Change time. Examples of status change are: change of owner, change of access permission, change of hard link count, or an explicit change of any of the MAC times.
  • Deletion time. Ext2fs and Ext3fs record the time a file was deleted in the dtime stamp at the file system layer, but not all tools support it.
  • Creation time. Ext4fs records the time the file was created in the crtime stamp, but not all tools support it.

The different timestamps are stored in the metadata contained in the inodes. Inodes are the equivalent of the MFT entry number in the Windows world. One way to read file metadata on a Linux system is to first get the inode number using, for example, the command "ls -i file", and then run "istat" against the partition device, specifying the inode number. This will show you the different metadata attributes, which include the timestamps, the file size, the owner's group and user ID, the permissions and the blocks that contain the actual data.
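
As an illustration, and again assuming the hypothetical lv_root device mounted at /mnt/case, with an example target file and inode number:

  ls -i /mnt/case/etc/passwd              # prints the inode number, e.g. 915716
  istat /dev/vg_system/lv_root 915716     # timestamps, size, owner, permissions and data blocks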

Ok, so let's start by creating a super timeline. We will use Plaso to create it. For context, Plaso is a Python-based rewrite of the Perl-based log2timeline initially created by Kristinn Gudjonsson and enhanced by others. The creation of a super timeline is an easy process and it applies to different operating systems; the interpretation, however, is hard. The latest version of the Plaso engine is able to parse EXT4 file systems and also different types of artifacts such as syslog messages, audit logs, utmp and others. To create the super timeline we launch log2timeline against the mounted disk folder and use the Linux parsers. This process will take some time, and when it is finished you will have a timeline with the different artifacts in Plaso storage format. You can then convert it to CSV format using the "psort.py" utility. The figure below outlines the steps necessary to perform this operation.
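
A sketch of the two commands; the exact argument order varies slightly between Plaso versions, so treat this as an assumption to verify against your installed release:

  log2timeline.py --parsers linux timeline.plaso /mnt/case/    # collect events using the Linux parser preset
  psort.py -o l2tcsv -w supertimeline.csv timeline.plaso       # sort and export to CSV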

Before you start looking at the super timeline, which combines different artifacts, you can also create a traditional timeline for the ext file system layer, containing data about allocated and deleted files and unallocated inodes. This is done in two steps. First you generate a body file using the "fls" tool from TSK. Then you use "mactime" to sort its contents and present the results in a human-readable format. You can perform this operation against each one of the device maps that were created with "kpartx". For the sake of brevity the image below only shows this step for the "/" partition. You will need to do it for each one of the other mapped devices.
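
For the "/" partition it could look like the following, again using the hypothetical lv_root device name:

  fls -r -m / /dev/vg_system/lv_root > root.body    # body file with allocated, deleted and unallocated entries
  mactime -b root.body -d > root_timeline.csv       # sort into a human-readable CSV timeline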

Before we start the analysis, it is important to mention that on Linux systems there is a wide range of files and logs that could be relevant for an investigation. The amount of data available to collect and investigate will vary depending on the configured settings and on the function/role performed by the system. Also, the different flavors of Linux operating systems follow a filesystem structure that arranges the different files and directories in a common standard. This is known as the Filesystem Hierarchy Standard (FHS) and is maintained here. It is beneficial to be familiar with this structure in order to spot anomalies. There is too much to cover in terms of what to look for, but one thing you might want to do is run the "chkrootkit" tool against the mounted disk. Chkrootkit is a collection of scripts created by Nelson Murilo and Klaus Steding-Jessen that allows you to check the disk for the presence of known kernel-mode and user-mode rootkits. The latest version is 0.52 and contains an extensive list of known bad files.
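
Chkrootkit accepts an alternate root directory, so it can be pointed at the mounted evidence:

  chkrootkit -r /mnt/case    # check the mounted disk for known rootkit artifacts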

Now, with the super timeline and the file system timeline produced, we can start the analysis. In this case, we go directly to timeline analysis, with a hint that something might have happened in the first days of April.

During the analysis, it helps to be meticulous and patient, and it helps to have comprehensive knowledge of file system and operating system artifacts. One thing that facilitates the analysis of a (super) timeline is to have some kind of lead about when the event happened. In this case, we got a hint that something might have happened in the beginning of April. With this information, we can reduce the time frame of the (super) timeline and narrow it down. Essentially, we will be looking for artifacts of interest that have a temporal proximity with that date. The goal is to be able to recreate what happened based on the different artifacts.

After going back and forth with the timelines, we found some suspicious activity. The figure below illustrates the timeline output that was produced using "fls" and "mactime". Someone deleted a folder named "/tmp/k", renamed common binaries such as "ping" and "ls", and files with the same names were placed in the "/usr/bin" folder.

This needs to be investigated further. Looking at the timeline, we can see that the output of "fls" shows the entry has been deleted. Because the inode wasn't reallocated, we can try to see whether a backup of the file still resides in the journal. The journaling concept was introduced with the ext3 file system. On ext4, journaling is enabled by default and uses the mode "data=ordered". You can see the different modes here. In this case, we could also check the options used to mount the file system. To do this, just look at "/etc/fstab". In this case we could see that the defaults were used. This means we might have a chance of recovering data from deleted files, provided the gap between the time when the directory was deleted and the time the image was obtained is short. File system journaling is a complex topic, but it is well explained in books like File System Forensic Analysis by Brian Carrier. This SANS GCFA paper by Gregorio Narváez also covers it well. One way you could attempt to recover deleted data is with the tool "extundelete". The image below shows this step.
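
A hedged sketch of the recovery attempt, run against the unmounted (or read-only) device; by default extundelete writes recovered files into a RECOVERED_FILES directory under the current working directory:

  extundelete /dev/vg_system/lv_root --restore-directory /tmp/k                           # try to recover the deleted folder
  extundelete /dev/vg_system/lv_root --restore-all --after $(date -d '2017-04-01' +%s)    # or everything deleted after a date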

The recovered files can be very useful to understand more about what happened and to further help the investigation. We can compute the files' MD5s, verify their contents and check whether they are known to the NSRL database or VirusTotal. If it's a binary, we can run strings against it and deduce functionality with tools like "objdump" and "readelf". Moving on, we also obtain and look at the different files that were created in "/usr/sbin", as seen in the timeline. Checking their MD5s, we found that they are legitimate operating system files distributed with Red Hat. However, the files in the "/bin" folder such as "ping" and "ls" are not, and they match the MD5s of the files recovered from "/tmp/k". Because some of the files are ELF binaries, we copy them to an isolated system in order to perform a quick analysis. The topic of Linux ELF binary analysis is for another time, but we can easily launch the binary using "ltrace -i" and "strace -i", which will intercept and record the different library/system calls. Looking at the output, we can easily spot that something is wrong. This binary doesn't look like the normal "ping" command: it calls the fopen() function to read the file "/usr/include/a.h" and writes to a file in the /tmp folder whose name is generated with tmpnam(). Finally, it generates a segmentation fault. The figure below shows this behavior.
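
A sketch of that quick triage, to be run only on an isolated analysis VM; the recovered path is the extundelete output directory from the previous step:

  md5sum RECOVERED_FILES/tmp/k/* /mnt/case/bin/ping /mnt/case/bin/ls    # hashes to check against NSRL/VirusTotal
  strings -a RECOVERED_FILES/tmp/k/ping | less                          # quick look for readable strings
  ltrace -i -f ./ping 127.0.0.1                                         # record library calls (isolated VM only)
  strace -i -f ./ping 127.0.0.1                                         # record system calls (isolated VM only)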

Provided with this information, we go back and see that this "/usr/include/a.h" file was modified moments before the "ping" file was moved/deleted. So, we can check when this "a.h" file was created – the crtime timestamp introduced with the ext4 file system – using the "stat" command. By default, "stat" doesn't show the crtime timestamp, but you can use it in conjunction with "debugfs" to get it. We also checked that the contents of this strange file are gibberish.
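
One way to pull the crtime out of the file system metadata, again assuming the hypothetical lv_root device (the path given to debugfs is relative to the root of that file system):

  stat /mnt/case/usr/include/a.h    # inode number and MAC times (no crtime on older stat versions)
  debugfs -R "stat /usr/include/a.h" /dev/vg_system/lv_root | grep -i crtime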

So, now we know that someone created this "a.h" file on April 8, 2017 at 16:34, and we were able to recover several other files that were deleted. In addition, we found that some system binaries seem to be misplaced and that at least the "ping" command expects to read something from this "a.h" file. With this information we go back and look at the super timeline in order to find other events that might have happened around this time. As I mentioned, the super timeline is able to parse different artifacts from the Linux operating system. In this case, after some cleanup, we could see that we have artifacts from audit.log and WTMP at the time of interest. The Linux audit.log tracks security-relevant information on Red Hat systems. Based on pre-configured rules, the audit daemon generates log entries to record as much information as possible about the events happening on the system. The WTMP file records information about logins and logouts to the system.
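
Both artifacts can also be read directly from the mounted evidence; a sketch, assuming the standard Red Hat log locations:

  last -f /mnt/case/var/log/wtmp                                     # login/logout history from the WTMP file
  ausearch -if /mnt/case/var/log/audit/audit.log -m USER_LOGIN -i    # interpreted login events from the audit log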

The logs show that someone logged into the system from the IP 213.30.114.42 (fake IP) using root credentials, moments before the file "a.h" was created and the "ping" and "ls" binaries were misplaced.

And now we have a network indicator. As a next step, we would start looking at our proxy and firewall logs for traces of that IP address. In parallel, we could continue our timeline analysis to find additional artifacts of interest, perform in-depth binary analysis of the files found, and create IOCs and, for example, YARA signatures, which will help find more compromised systems in the environment.

That's it for today. After you finish the analysis and forensic work you can unmount the partitions, deactivate the volume group and delete the device mappings. The picture below shows these steps.
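
The teardown mirrors the setup; a sketch with the same hypothetical names as above:

  umount /mnt/case/home /mnt/case/var /mnt/case/boot /mnt/case    # unmount in reverse order
  vgchange -an vg_system                                          # deactivate the volume group
  kpartx -dv disk.raw                                             # remove the loop devices and mappings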

Linux forensics is a different and fascinating world compared with Microsoft Windows forensics. The interesting part of the investigation is getting familiar with Linux system artifacts. Install a pristine Linux system, obtain the disk and look at the different artifacts. Then compromise the machine using some tool or exploit, obtain the disk and analyze it again. This allows you to get practice. Practice these kinds of skills, share your experiences, get feedback, repeat the practice, and improve until you are satisfied with your performance. If you want to look further into this topic, you can read "The Law Enforcement and Forensic Examiner's Introduction to Linux" written by Barry J. Grundy. It is not being updated anymore, but it is a good overview. In addition, Hal Pomeranz has several presentations here and a series of articles on the SANS blog, especially the 5 articles written about EXT4.

 

References:
Carrier, Brian (2005) File System Forensic Analysis
Nikkel, Bruce (2016) Practical Forensic Imaging


Extract and use Indicators of Compromise from Security Reports

Every now and then a new security report is released, and with it comes a wealth of information about different threat actors, offering insight into the attackers' tactics, techniques and procedures. For example, in this article I wrote back in 2014, you have a short summary of some of the reports that were released publicly throughout the year. These reports allow security companies to advertise their capabilities, but on the other hand they are a great resource for network defenders. Some reports are more structured than others and they might contain different technical data. Nonetheless, almost all of them have IOCs (Indicators of Compromise).

For those who have never heard about indicators of compromise, let me give a brief summary. IOCs are pieces of information that can be used to search for and identify compromised systems. These pieces of information have been around for ages, but the security industry is now using them in a more structured and consistent fashion. All types of enterprises are moving from the traditional way of handling security incidents – wait for an alert to come in and then respond to it – to a more proactive approach, which consists of taking the necessary steps to hunt for evil in order to defend their networks. In this new strategy, the IOCs are a key technical component. When someone compromises a system, they leave evidence behind. That evidence, artifact or remnant piece of information left by an attacker can be used to identify the intrusion, the threat group or the malicious actor. Examples of IOCs are IP addresses, domain names, URLs, email addresses, file hashes, HTTP user agents, registry keys, a service configuration change, a deleted file, etc. With this information one can sweep the network and endpoints and look for indicators that a system might have been compromised. For more background you can read Lenny Zeltser's summary. Will Gragido from RSA explained it well in his 3-part blog here, here and here. Mandiant also has two great articles about it: this and this.

So, if almost all the published reports contain IOCs, is there a central place that gathers all the reports released by the different security organizations? Kiran Bandla created a repository on GitHub named APTnotes where he does exactly that. He maintains a list of all security reports released by the different vendors – "APTnotes is a repository of publicly-available papers and blogs (sorted by year) related to malicious campaigns/activity/software that have been associated with vendor-defined APT (Advanced Persistent Threat) groups and/or tool-sets." Kiran, among other methods, relies on the community to share with him when a new report is released, and he adds it to the APTnotes repository. The list of reports is available in CSV and JSON format.

So, what can you do with all these reports? How can we as network defenders use this information?

Three example use cases follow. The first is the most likely and common one; the other two are more often found in larger organizations with a higher budget and a bigger security appetite.

  • An organization that has a Security Operations Center in place could use the IOCs to augment its existing monitoring and detection capabilities.
  • An organization that has in-house CSIRT capabilities could leverage the IOCs in a proactive manner in order to have a higher probability of discovering something bad and, as such, reduce the business impact that a security incident might have on the organization.
  • An organization that has an in-house Cyber Threat Intelligence capability could collect, process, exploit and analyze these reports, and then disseminate actionable information to the threat intelligence consumers throughout the organization.

In a simple manner, the process for the first scenario would look something like the following diagram:

ioc-loop

How would this work in practical terms? Normally, you can split the IOCs into host-based and network-based. For example, a DNS name or an IP address will be more effective to search for across your network infrastructure, whereas a registry key or an MD5 hash is more likely to be searched for across the endpoints.

For this article, I will focus on MD5s. Some reports offer file hashes using SHA-1 or SHA-256, but not many organizations have the capability to search for these; MD5 is more common. It is noteworthy that the value of MD5 hashes of malicious files might be considered low. A great article about the value of IOCs and TTPs was written back in 2014 by David Bianco, titled The Pyramid of Pain. Following that, there is an article from Harlan Carvey with additional thoughts about it. Another point to take into consideration about the MD5s from the reports is that some might belong to legitimate files, due to the use of DLL hijacking, or be Windows built-in commands used by threat actors. In addition, it is likely that malware used by the different threat actors is only used once, and you might never see a given MD5 hash a second time. Nonetheless, the MD5s are a starting point.

So, how can we collect the reports, extract the IOCs, and convert and use them?

First you can use a Python script to download all the reports to a central place, separated by year. Then you can use the IOC parser tool written by Armin Buescher. This tool is able to parse PDF reports and extract IOCs into CSV format. From there you can extract the relevant IOCs. For this example, I want to extract the MD5s and then use ioc_writer to create IOCs in OpenIOC 1.0/1.1 format, which can be used with a tool such as Redline. ioc_writer is a Python library written by William Gibb that allows you to manipulate IOCs in OpenIOC 1.1 and 1.0 format and was released at Black Hat 2013.

The steps necessary to perform this are illustrated below – for sure there are other, better ways to perform this, but this was a quick way to get the job done.

extract-create-ioc
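
A rough sketch of that pipeline; the exact iocp.py flags are an assumption to verify against the ioc-parser README, and the grep simply keeps anything that looks like an MD5:

  for f in reports/2016/*.pdf; do python iocp.py -o csv "$f" >> iocs-2016.csv; done    # assumed invocation
  grep -oiE '\b[a-f0-9]{32}\b' iocs-2016.csv | sort -u > md5-2016.txt                  # unique MD5-looking values

The resulting MD5 list can then be fed to ioc_writer to build the OpenIOC files.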

With this we created a series of files with a .ioc extension that can be further edited with the ioc_writer Python library. However, not many tools support OpenIOC, so you might just take the MD5s and feed them into whatever tool or format you are using. You can download below the Excel file with all the MD5s separated by year. If you use them, please bear in mind that you might have false positives and you will need to go back to the CSV files to understand which report an MD5 came from. For example, I already removed half a dozen MD5s that belong to legitimate files seen in the reports, as well as the MD5 of an empty file, "d41d8cd98f00b204e9800998ecf8427e".

That's it. With this you can create a custom IOC set containing MD5s of different tools, malware families and files, compiled by extracting the MD5s from the public reports about targeted attacks. From here you can start building more complex IOCs with different artifacts based on a specific report or threat actor. Maybe you get lucky and find evil on your network!

The year 2010 contains 81 unique MD5s.
The year 2011 contains 96 unique MD5s.
The year 2012 contains 718 unique MD5s.
The year 2013 contains 2149 unique MD5s.
The year 2014 contains 1306 unique MD5s.
The year 2015 contains 1553 unique MD5s.
The year 2016 contains 1173 unique MD5s.

Excel : md5-aptnotes-2010-2016


Digital Forensics – DLL Search Order

Following our series of posts on Digital Forensics, we will continue our journey of analyzing a compromised system. In the last two articles we covered Windows Prefetch and ShimCache. Among other things, we wrote that the Windows Prefetch and ShimCache artifacts are useful to find evidence about executed files and about executables that were on the system but weren't executed. While doing our investigation and looking at these artifacts, the Event Logs and the SuperTimeline, we found evidence that REGEDIT.EXE was executed. In addition, from the Prefetch artifacts we saw this execution invoked a DLL called CLB.DLL from the wrong path. On Windows operating systems CLB.DLL is located under %SYSTEMROOT%\System32. In this case CLB.DLL was invoked from %SYSTEMROOT%.

However, when we looked inside the %SYSTEMROOT% folder, we could not find any trace of the CLB.DLL file. This raised the following questions:

  • How did this file get loaded from the wrong path?
  • Did the file get deleted by the attacker?

Let’s answer the first question.

Inside PE files there is a structure called the Import Address Table (IAT) that contains the addresses of the library routines that are imported from DLLs. When an application is launched, the operating system will check this table to understand which routines are needed and from which DLLs. For example, when I execute REGEDIT.EXE, the binary has a set of dependencies needed in order to execute. To see these dependencies, you can look at the IAT. On Windows you could use dumpbin.exe /IMPORTS, or on REMnux you could use pedump, as illustrated below.

dllsearchorder-regiat
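
For reference, a sketch of both invocations against a copy of the binary taken from the evidence; the pedump switch is an assumption to check against its help output:

  pedump --imports regedit.exe    # on REMnux, dump the import table
  # on a Windows analysis box: dumpbin.exe /IMPORTS regedit.exe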

But from where will these DLLs be loaded? The operating system will locate the required DLLs by searching a specific set of directories in a particular order. This is known as the DLL Search Order and is explained here. This mechanism can be, and frequently has been, abused by attackers to plant a malicious DLL inside a directory that is part of the DLL Search Order. This tricks the Windows operating system into loading the malicious DLL instead of the real one. The default DLL Search Order on Windows XP and above is the following:

  • The directory from which the application loaded.
  • The current directory.
  • The system directory.
  • The 16-bit system directory.
  • The Windows directory.
  • The directories that are listed in the PATH environment variable.

Not all DLLs will be found using the DLL Search Order. There is a mechanism known as the KnownDLLs registry key, which contains a list of important DLLs that will be loaded without consulting the DLL Search Order. This list is stored in the registry location HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\KnownDLLs.

Throughout the years Microsoft patched some of the problems with the DLL Search Order mechanism and also introduced some improvements. One is the Safe DLL Search Order setting, which changes the order and moves the search of the current directory further down, making it harder for an attacker without admin rights to plant a DLL in a place that will be searched first. This feature is controlled by the SafeDllSearchMode value under the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager.

Bottom line, this technique is known as DLL pre-loading, side-loading or hijacking, and it is an attack vector used to take over a DLL and escalate privileges or achieve persistence by taking advantage of the way DLLs are searched for. This technique can be pulled off by launching an executable that is not in %SYSTEMROOT%\System32 – like our REGEDIT.EXE – or by leveraging weak Directory Access Control Lists (DACLs) and dropping a malicious DLL in the appropriate folder. In addition, for this technique to work, the malicious DLL should contain the same exported functions and functionality as the hijacked DLL, or work as a proxy, in order to ensure the executed program runs properly. The picture below shows the routines that are exported by the malicious DLL. As you can see, these are the same functions as the ones required by REGEDIT.EXE from CLB.DLL.

dllsearchorder-iat

To further understand the details, you might want to read a write-up on leveraging this technique to escalate privileges by Parvez Anwar here and to achieve persistence by Nick Harbour here. Microsoft also gives guidance to developers on how to make applications more resistant to these attacks here.

Considering the REGEDIT.EXE example, we can see from where the DLLs are loaded on a pristine system using a Microsoft Windows debugger like CDB.EXE. Here we can see that CLB.DLL is loaded from %SYSTEMROOT%\System32.

dllsearchorder-regedit

We now have an understanding of how that DLL file might have been loaded. DLL side-loading is a clever technique to load malicious code and is often abused to either escalate privileges or achieve persistence. We found evidence of it using the Prefetch artifacts, but without Prefetch, e.g., on a Windows Server, this won't be so easy to find and we might need to rely on other sources of evidence, as we saw in previous articles. Based on the evidence we observed, we consider that the attacker used the DLL side-loading technique to hijack CLB.DLL and execute malicious code when REGEDIT.EXE was invoked. However, we could not find this DLL file on our system. We will need to look deeper and use different tools and techniques that help us find evidence about it and answer the question we raised in the beginning. This will be the topic of the upcoming article!

 

References:
Luttgens, J., Pepe, M., Mandia, K. (2014) Incident Response & Computer Forensics, 3rd Edition
Carvey, H. (2014) Windows Forensic Analysis Toolkit, 4th Edition
Russinovich, M. E., Solomon, D. A., & Ionescu, A. (2012). Windows internals: Part 1
Russinovich, M. E., Solomon, D. A., & Ionescu, A. (2012). Windows internals: Part 2


Digital Forensics – ShimCache Artifacts

Following our last article about the Prefetch artifacts, we will now move into the Windows Registry. When conducting incident response and digital forensics on Windows operating systems, one of the sources of evidence that is normally part of every investigation is the Windows Registry. The Windows Registry is an important component of the OS and application functionality, maintains many aspects of the system configuration and plays a key role in its performance. As written by Jerry Honeycutt in his books, the Windows Registry is the heart and soul of modern Windows operating systems. The Windows Registry is a topic for a book on its own, either from a systems or a forensics perspective. One great example is the book "Windows Registry Forensics, 2nd Edition" by Harlan Carvey.

In any case, from a forensics perspective, the Windows Registry is a treasure trove of valuable artifacts. Among these artifacts you might be looking at system and configuration registry keys, common auto-run registry keys, user hive registry keys, or the Application Compatibility Cache, a.k.a. ShimCache.

In this article we will look into the Application Compatibility Cache, a.k.a. ShimCache. When performing live response or dead-box forensics on Windows operating systems, one of the many artifacts of interest when determining which files have been executed or accessed is the ShimCache. In our last article we covered the Prefetch, where you can get evidence about a specific file being executed on the system. However, on Windows Server operating systems, Prefetch is disabled by default. This makes the ShimCache a great alternative and a valuable source of evidence.

Let's start with some background about the ShimCache. Microsoft introduced the ShimCache in Windows 95, and it remains today a mechanism to ensure backward compatibility of older binaries with new versions of Microsoft operating systems. When new Microsoft operating systems are released, some old and legacy applications might break. To fix this, Microsoft has the ShimCache, which acts as a proxy layer between the old application and the new operating system. A good overview of what the ShimCache is can be found in an article on the Microsoft blog written by Tim Newton, "Demystifying Shims – or – Using the App Compat Toolkit to make your old stuff work with your new stuff".

The interesting part, from a forensics perspective, is that the cache tracks metadata for binaries that were executed and stores it in the ShimCache. Nevertheless, it wasn't until 2012, when Andrew Davis wrote "Leveraging the Application Compatibility Cache in Forensic Investigations" and released the ShimCache Parser tool, that the value of this evidence source became widely known. It was a novel paper because Andrew made available a tool that could extract ShimCache information from the registry that is valuable for an investigation. The paper outlines the internals of the ShimCache and where the data resides on the different Windows operating systems.

On Windows XP this data structure is stored under the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\AppCompatibility\AppCompatCache. On recent Windows versions the ShimCache data is stored under the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\AppCompatCache\AppCompatCache.

From the ShimCache we can obtain information about all binaries that have been executed on the system since it was last rebooted, along with their size and last modified date. In addition, the ShimCache tracks executables that have not been executed but were browsed, for example through explorer.exe. This makes it a valuable source of evidence, for example to track executables that were on the system but weren't executed – consider an attacker that used a directory on a system to move around his toolkit.

On Windows XP the ShimCache maintains up to 96 entries, but on Windows 7 and later it can maintain up to 1024 entries. Using the ShimCache Parser we can parse and view its contents. We only need to point it to the SYSTEM registry hive file on our mounted evidence, as illustrated below.

shimcache-parser
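
A sketch of that invocation; the hive path depends on where the evidence is mounted, and the option names are assumptions to verify against the tool's help:

  python ShimCacheParser.py --hive /mnt/evidence/Windows/System32/config/SYSTEM -o shimcache.csv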

Nonetheless, the ShimCache has one drawback. The information is retained in memory and is only written to the registry when the system is shut down. This impacts the ability to get this source of evidence when conducting live response. To address this limitation, Fred House, Claudiu Teodorescu and Andrew Davis wrote a Volatility plugin to read the ShimCache from memory. The plugin supports Windows XP SP2 through Windows 2012 R2 on both 32 and 64 bit architectures, and it won the Volatility plugin contest of 2015. A write-up about it is available here and here. The plugin can be downloaded from the Volatility community plugins page. The picture below illustrates the usage of Volatility with the ShimCacheMem plugin against the memory of the analyzed system.

shimcache-volatility
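
For reference, a minimal sketch; the memory image name and profile are assumptions that must match the analyzed system:

  vol.py -f memory.raw --profile=Win7SP1x86 shimcachemem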

By looking at the ShimCache, either directly from memory or by querying the registry after system shutdown, we can – in this case – confirm the evidence found in the Prefetch artifacts. On a Windows Server system, where Prefetch is disabled by default, the ShimCache becomes an even more valuable artifact.

Given the availability of this artifact across all Windows operating systems, the information obtained from the ShimCache can be valuable to an investigation. In this case, the ShimCache supported the Prefetch findings of regedit.exe and rundll32.exe being executed on the system.

There are more artifacts associated with this feature. In 2013, Corey Harrell wrote on his blog about his findings on the Windows 7 RecentFileCache.bcf file. Essentially, this file is maintained in the %SYSTEMROOT%\AppCompat\Programs\ directory and keeps metadata (path and filename) about executables that are new to the system since the last time the Application Experience service was run. Yogesh Khatri continued to research Corey's findings and found that on Windows 8 this file has been replaced with a registry hive called amcache.hve, which contains more metadata. From this file you can retrieve, for every executable that ran on the system, the path, last modification and creation times, SHA1 and PE properties. Meanwhile, Yogesh noted that on Windows 7 you could also have the amcache.hve if you have installed KB2952664. To read the amcache hive you could use RegRipper or Willi Ballenthin's stand-alone script.
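
With RegRipper, a quick look could be as simple as the following; the hive path reflects the mounted evidence and the plugin name is an assumption to check against your RegRipper plugin list:

  rip.pl -r /mnt/evidence/Windows/AppCompat/Programs/Amcache.hve -p amcache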

The ShimCache has not only been used from a defensive perspective. From an offensive perspective, the application compatibility infrastructure has been abused several times by attackers. One of the best resources I've come across on this topic is the website "sdb.tools", created by Sean Pierce, dedicated to application compatibility database research, where he maintains his research and lists different tools, papers and talks.

That's it. We went over a brief explanation of what the ShimCache is, its artifacts, where to find it in memory and in the registry, and which tools to use to obtain information from it. Next, we will go back to our SuperTimeline and continue our investigation.

 

References:
Luttgens, J., Pepe, M., Mandia, K. (2014) Incident Response & Computer Forensics, 3rd Edition
SANS 508 – Advanced Computer Forensics and Incident Response


Digital Forensics – Prefetch Artifacts

It has been a while since my last post on digital forensics about an investigation on a Windows host, but it's never too late to pick up where we left off. In this post we will continue our investigation and look into other digital artifacts of interest. First, a quick summary of where we stand in this series of posts.

Following the last post: based on the different Event Log IDs across our super timeline, we found evidence that someone used Remote Desktop to log into the system as Administrator. Moments after, the Administrator made a modification to a "svchosted" system service, changing it to interactive, and ran it. There is also evidence that this service crashed. Based on the time when all these events occurred, we will look into other artifacts in our Super Timeline around the same timeframe.

The leads obtained from the Windows Event Logs will facilitate the analysis of our Super Timeline going forward. They allow us to reduce the time frame and narrow it down. Essentially, we will be looking for artifacts of interest that have a temporal proximity with the event of the administrator logging in. The goal is to investigate further, understand what happened, and determine which actions the attacker performed on our system.

That being written, which artifacts should we look at next? We will start with Prefetch.

Let's start with a brief explanation of Windows Prefetch. To improve customer experience, Microsoft introduced a memory management technology called Prefetch. This functionality was introduced in Windows XP and Windows Server 2003. The mechanism analyzes the applications that are most frequently used and preloads them in advance in order to speed up operating system booting and application launching. On Windows Vista, Microsoft enhanced the algorithm and introduced SuperFetch, which is an improved version of Prefetch.

The Prefetch files are stored in the %SYSTEMROOT%\Prefetch directory and have a .pf extension. The Prefetch settings that are enforced can be retrieved from the Windows registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters. You can read this setting easily with the brilliant RegRipper tool created by Harlan Carvey, using the prefetch plugin.

prefetch-rip
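
A sketch of that RegRipper invocation, pointing at the SYSTEM hive from the mounted evidence (the mount path is hypothetical):

  rip.pl -r /mnt/evidence/Windows/System32/config/SYSTEM -p prefetch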

On Windows XP and 7, there is a maximum of 128 .pf files. On Windows 8 this value can reach a maximum of 1024 .pf files. The file names are stored using the convention <executable filename>-<Prefetch hash>.pf. It is worth mentioning that this mechanism is disabled by default on server operating systems. In addition, there have been cases where systems with SSD drives have this mechanism disabled.

The internal format of Windows Prefetch files has been documented extensively by Joachim Metz here. This documentation is part of libscca, a library created by Joachim to access the Windows Prefetch File (SCCA) format, which can be used to parse and read Prefetch files from different operating systems. Matt Bromiley created a readily available script based on this library to read Windows 10 Prefetch files.

So, other than the Prefetch internals, what is the forensic value of the Prefetch artifacts? The answer is that the Prefetch files keep track of programs that have been executed on the system, even if the original file is no longer present. In addition, Prefetch files can tell you when the program was executed, how many times, and from which path.

Now that we have a basic understanding of Prefetch, we will go back to our Super Timeline analysis. Plaso supports Windows Prefetch events, which can be corroborated with FILE events and the Windows Event Logs. From the previous blog post analysis we were able to see evidence of when the Administrator logged into the system. And we know that when a user logs into the system, the initial Session Manager (smss.exe) creates a new instance of itself to configure the new session. The new smss.exe process starts a Windows subsystem process (csrss.exe) and a Winlogon process (winlogon.exe) for the new session. This sequence of Windows Event ID 4688 records – you need audit process tracking enabled to view this one – can be corroborated with the WinPrefetch and FILE artifacts based on temporal proximity. The picture below illustrates these events, retrieved from the Super Timeline moments after the logon events.

prefetch-supertimeline

When combining the Event Logs, Windows Prefetch and FILE artifacts, we start to bring together the different pieces of the puzzle and to form a picture of what happened. After looking at the consolidated artifacts of interest in a single view and removing the noise, we can conclude the following: on the 9th of October, around 22:42:22, the user account Administrator was used to log into the system. The login was performed from a system with the IP 189.62.10.9 (Brazil). Then, based on the Prefetch artifacts, we have evidence that the user Administrator executed the application rundll32.exe followed by regedit.exe. Next, the Windows system service FastUserSwitchingCompatibility was changed and started. Finally, a file cache.txt was created on the system. All these events are supported by evidence retrieved from our Super Timeline and can be seen in the following picture.

prefetch-supertimeline2

Now, with the consolidated view of the different artifacts, we continue pursuing the investigation and look further into those Prefetch files. To determine the first and most recent times the binary was executed, you can use istat from The Sleuth Kit to read the NTFS metadata entry for the file and get the timestamps.

prefetch-metadata
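
A sketch of how this could be done with TSK, assuming the evidence image is disk.raw, the Windows partition starts at sector 2048, and using a hypothetical Prefetch hash and MFT entry number:

  ifind -o 2048 -n "Windows/Prefetch/REGEDIT.EXE-1B606482.pf" disk.raw    # find the MFT entry number
  istat -o 2048 disk.raw 48231                                            # read the NTFS metadata entry (example number)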

In addition to the execution times, we still need to read the contents of the Prefetch files themselves. These are useful because, besides the last execution time and the number of times the program was run, they include the full path and the files loaded by the application in the first 10 seconds of its execution. An exception to this is NTOSBOOT-B00DFAAD.pf, which exists on all systems and keeps track of the files accessed during the first 120 seconds of the boot process. The full path and files are valuable because they might show evidence of files being loaded from unusual paths or of unknown binaries being invoked.

To read the contents of the Prefetch files, you could run strings with little-endian encoding (strings -e l) as a quick and dirty method to view their contents. A more robust method would be to use Joachim Metz's libscca and create a Python script to read them.

For a readily available Python tool, you could use the script written by the researcher who goes by the handle PoorBillionaire, which is available here.

prefetch-contents
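
Two quick options; the .pf file name is hypothetical and the prefetch.py flag is an assumption to check against the project's README:

  strings -e l /mnt/evidence/Windows/Prefetch/REGEDIT.EXE-1B606482.pf | sort -u    # quick and dirty
  prefetch.py -f /mnt/evidence/Windows/Prefetch/REGEDIT.EXE-1B606482.pf            # parsed output (assumed flag)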

If you are a Windows user, you could compile and use Eric Zimmerman's Windows Prefetch parser, which supports all known versions from Windows XP to Windows 10. In conjunction with the library, Eric also released PECmd, which is at version 0.6 at the time of this writing. Another method is to use the Windows freeware tool from NirSoft called WinPrefetchView. With the /folder switch you can point it to the Prefetch folder of your mounted evidence and see its contents.

prefetch-winprefetch
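
For reference, a sketch of both tools run from a Windows analysis box against the mounted evidence (drive letters and output paths are hypothetical):

  PECmd.exe -d E:\Windows\Prefetch --csv C:\Cases\output    # parse a whole Prefetch directory to CSV
  WinPrefetchView.exe /folder E:\Windows\Prefetch           # browse the same folder in the GUI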

In this case, when looking at the Prefetch file for regedit.exe, we found evidence that regedit.exe invoked a DLL called clb.dll from a location it is not supposed to be loaded from. This gives us another lead to pursue our investigation goals.

That's it! In this article we reviewed some introductory concepts about the Windows Prefetch files and their forensic value, and gave good references for tools to parse them. Next step: continue the investigation and review more artifacts.

 

References:
Luttgens, J., Pepe, M., Mandia, K. (2014) Incident Response & Computer Forensics, 3rd Edition
Carvey, H. (2014) Windows Forensic Analysis Toolkit, 4th Edition
SANS 508 – Advanced Computer Forensics and Incident Response


Unleashing YARA – Part 3

In the second post of this series we introduced an incident response challenge based on the static analysis of a suspicious executable file. The challenge featured 6 indicators that needed to be extracted from the analysis in order to create a YARA rule to match the suspicious file. In part 3 we will step through the functions of YARA's PE, Hash and Math modules and how they can help you meet the challenge objectives. Let's recap the challenge objectives and map them to the indicators we extracted from static analysis:

  1. a suspicious string that seems to be related with debug information
    • dddd.pdb
  2. the MD5 hash of the .text section
    • 2a7865468f9de73a531f0ce00750ed17
  3. the .rsrc section with high entropy
    • .rsrc entropy is 7.98
  4. the symbol GetTickCount import
    • Kernel32.dll GetTickCount is present in the IAT
  5. the rich signature XOR key
    • 2290058151
  6. must be a Windows executable file
    • 0x4D5A (MZ) found at file offset zero

In part 2 we created a YARA rule file named rule.yar, with the following content:

import "pe"

If you remember the exercise, we needed the PE module in order to parse the sample and extract the Rich signature XOR key. We will use this rule file to develop the remaining code.

The debug information string

In part 1 I introduced YARA along with the rule format, featuring the strings and condition sections. When you add the dddd.pdb string condition, the rule code should look something like this:

yara_3_1

The code above depicts a simple rule object made of a single string variable named $str01 with the value set to the debug string we found.

The section hash condition

The next item to be added to the condition is the .text section hash, using both the PE and Hash modules. To do so we will iterate over the PE file sections using two PE module functions: number_of_sections and sections. The former will be used to iterate over the PE sections; the latter will allow us to fetch the section's raw_data_offset (its file offset) and raw_data_size, which will be passed as arguments to the md5 hash function in order to compute the MD5 hash of the section data:

yara_3_2

The condition expression now features the for operator comprising two conditions: the section md5 hash and the section name. In essence, YARA will loop through every PE section until it finds a match on the section hash and name.

The resource entropy value

It's now time to add the resource entropy condition. To do so, we will rely on the Math module, which allows us to calculate the entropy of a given range of bytes. Again we will iterate over the PE sections using two conditions: the section entropy and the section name (.rsrc):

yara_3_3

Again we will loop until we find a match, that is, a section named .rsrc with entropy above or equal to 7.0. Remember that the minimum entropy value is 0.0 and the maximum is 8.0, therefore 7.0 is considered high entropy and is frequently associated with packing [1]. Bear in mind that compressed data like images and other types of media can display high entropy, which might result in some false positives [2].

The GetTickCount import

Let's continue improving our YARA rule by adding the GetTickCount import to the condition. For this purpose we use the PE module's imports function, which takes two arguments: the DLL name and the function name. The GetTickCount function is exported by KERNEL32.DLL, so when we pass these arguments to pe.imports the rule condition becomes:

yara_3_4

Please note that the DLL name is case insensitive [3].

The XOR key

Our YARA rule is almost complete; we now need to add the rich signature key to the condition. In this particular case the PE module provides the rich_signature function, which allows us to match various attributes of the rich signature, in this case the key. The key is the decimal value of the dword used to encode the contents with XOR:

yara_3_5

Remember that the XOR key can be obtained either by inspecting the file with a hexdump of the PE header or using YARA PE module parsing capabilities, detailed in part 2 of this series.

The PE file type

Ok, we are almost done. The last condition will ensure that the file is a portable executable file. In part two of this series we did a quick hex dump of the sample's header, which revealed MZ (ASCII) at file offset zero, a common file signature for PE files. We will use the YARA int## functions to access data at a given position. The int## functions read 8, 16 and 32 bit signed integers, whereas the uint## functions read unsigned integers. Both the 16 and 32 bit variants are considered to be little-endian; for big-endian use int##be or uint##be.

Since checking only the first two bytes of the file can lead to false positives, we can use a little trick to ensure the file is a PE by looking for particular PE header values. Specifically, we will check for the IMAGE_NT_HEADERS Signature member, a dword with the value "PE\0\0". Since the signature's file offset is variable, we will need to rely on the IMAGE_DOS_HEADER e_lfanew field. The e_lfanew value is the 4-byte physical offset of the PE signature, and it is located at physical offset 0x3C [4].

With the conditions “MZ” and “PE\0\0” and respective offsets we will use uint16 and uint32 respectively:

yara_3_6

Note how we use the e_lfanew value to pivot to the PE signature: the output of the first uint32 function (the value read at offset 0x3C) is used as the argument of the second uint32 function, which must match the expected value "PE\0\0".

Conclusion

Ok! We are done. The last step is to test the rule against the file using the YARA tool and our brand new rule file rule.yar:

yara_3_7
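
A minimal sketch of such a test run, where the sample file name is hypothetical:

  yara -s rule.yar suspicious.exe    # -s also prints the strings that matched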

YARA scans the file and, as expected, outputs the ID of the matched rule, in our case malware001.

A final word on YARA performance

While YARA performance might be of little importance if you are scanning a dozen files, poorly written rules can have a significant impact when scanning thousands or millions of files. As a rule of thumb, you are advised to avoid regex statements. Additionally, you should place the conditions most likely to evaluate to false first in the rule's condition; this takes advantage of short-circuit evaluation, which was introduced in YARA 3.4.0 [5]. So how can we improve the rule we just created in order to leverage YARA performance? In this case we can move the last condition, the PE file signature check, to the top of the statement. By doing so we avoid evaluating the remaining conditions if the file is not an executable (e.g. PDF, DOC, etc.). Let's see what the new rule looks like:

yara_3_8

If you would like to learn more about YARA performance, check the YARA performance guidelines by Florian Roth, which feature lots of tips to keep your YARA rules resource friendly.

References

  1. Structural Entropy Analysis for Automated Malware Classification
  2. Practical Malware Analysis, The Hands-On Guide to Dissecting Malicious Software, Page 283.
  3. YARA Documentation v.3.4.0, PE Module
  4. The Portable Executable File Format
  5. YARA 3.4.0 Release notes