
Faster Malware Triage with YARA

As folks get into malware analysis, they naturally develop their own triage process based on the data that matters most to them. For example, I first determine what kind of file I have in front of me and which identifying hashes I can derive from it, so I can look up the sample, or similar ones, in services like VirusTotal and MalwareBazaar. Once you do this enough, you’ll get a little unhappy with having to visit all the different tools that generate this output, and you’ll want to consolidate your triage process down to a minimum number of tools. I revisit this stage periodically and think of ways to get the same details with things like Python scripts. Today, I want to introduce you to a fast way to perform some triage using YARA.

My Triage Process is Good But it Can Be Better

YARA for the Impatient

YARA is an awesome tool and language developed and open-sourced by VirusTotal on GitHub. Most folks know it as a tool that quickly determines whether files match byte or string patterns predefined in rules. A basic rule of this kind looks something like this:

rule ForensicITGuyString
{
    meta:
        description = "This is just an example"

    strings:
        $a = "ForensicITGuy"

    condition:
        $a
}

And you can use this rule in a command like this:

$ yara basic_rule.yar File_To_Test.exe

If the file you test matches the pattern defined in the rule, the yara utility prints the name of the matching rule followed by the name of the file. This is the simplest use case, and from here malware analysts can make all sorts of amazing rules that match byte patterns to help them identify what is within the scanned file.

Expanding Use Cases to Generating Metadata

As you begin to tinker more with YARA, you’ll eventually learn about YARA modules, which provide interesting functionality beyond simple byte patterns. For example, the “pe” YARA module defines an is_pe value you can use to determine whether a file is a Windows Portable Executable, instead of manually checking the first few bytes of the file. While the functionality is not as rich as what you get from languages like Python or Ruby, you can use these modules to perform triage actions such as hashing a file and determining its file type, and print the results as output.
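For a sense of what pe.is_pe saves you, here is a minimal Python sketch of that manual check on the first few bytes. The function name looks_like_pe is hypothetical, and the check is deliberately simplistic: it only confirms the MZ magic and the PE signature at the offset stored at 0x3C, which is far less than a real PE parser validates.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Rough manual PE check: MZ magic plus PE signature at e_lfanew."""
    # A DOS header is 64 bytes long and starts with the "MZ" magic.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew (little-endian uint32 at offset 0x3C) points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields an empty slice, which compares unequal.
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

With the module, all of that collapses into pe.is_pe inside a rule condition.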

The best part is that you don’t need much code, and it runs very fast even on large binaries. In this example, I want to determine the file type, mimetype, MD5, SHA-1, SHA-256, Import Table Hash (if present), and Rich Header Hash (if present) for a Windows Portable Executable file. I can do this in fewer than 25 lines of code in YARA:

import "pe"
import "console"
import "hash"
import "magic"

rule WhatIsIt {
    condition:
        console.log("File type:\t", magic.type()) and
        console.log("Mimetype:\t", magic.mime_type())        
}

rule BasicHashes {
    condition:
        console.log("MD5:\t", hash.md5(0,filesize)) and
        console.log("SHA-1:\t", hash.sha1(0,filesize)) and
        console.log("SHA-256:\t", hash.sha256(0,filesize))
}

rule PeHashes {
    condition:
        pe.is_pe and
        console.log("Imphash:\t", pe.imphash()) and
        console.log("Rich Header Hash:\t", hash.md5(pe.rich_signature.clear_data))
}

Using the “hash”, “magic”, and “pe” modules, we can calculate all the data we need, and we can output it using “console”. In practice, I get this output with yara:

$ yara triage.yar sample.bin

File type:	PE32+ executable (DLL) (console) x86-64, for MS Windows
Mimetype:	application/x-dosexec
MD5:	7684a97f903ad72843cc1202b9700415
SHA-1:	64f3448fdba042bd2de11cbaffe0ddd8ab778903
SHA-256:	912cc2a3592b3b7835205d275cbf92bb66effc99cbd5cc338a223888de1b0d35
Imphash:	d8cf501f2ead6a968abf3df1e5f5d366
Rich Header Hash:	18a9047e952c4a05e803e12340dd45fd
WhatIsIt sample.bin
BasicHashes sample.bin
PeHashes sample.bin

To get equivalent data outside of this, I either have to write a tool or run a bunch of utilities on my REMnux machine using a shell script. Even then, I still have to write a script to get the Rich Header Hash. This is an awesome upgrade to my experience.
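To make the "write a tool" option concrete, here is a minimal Python sketch that covers only the three basic hashes. The function name triage_hashes and the chunk size are my own assumptions for illustration; file type, imphash, and Rich Header Hash would each need more code or third-party libraries on top of this.

```python
import hashlib


def triage_hashes(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Compute MD5, SHA-1, and SHA-256 of a file in a single pass."""
    digests = {
        "MD5": hashlib.md5(),
        "SHA-1": hashlib.sha1(),
        "SHA-256": hashlib.sha256(),
    }
    with open(path, "rb") as handle:
        # Read in chunks so a large sample never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Calling triage_hashes("sample.bin") returns a dictionary keyed by hash name that you can print in whatever layout you like. It works, but it is already about as long as the three YARA rules combined and does less.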

But The Speed?

What’s that you say? How fast does it go? Can it possibly compare to getting details from all the shell utilities in REMnux?

Yes, it can. I timed this using a 645 MB PE file and a shell script that gave me all the type details and hashes (minus the imphash and rich header hash).

$ ll -h husky.bin 
-rw-rw-r-- 1 remnux remnux 645M Jul 14 16:44 husky.bin

$ time ./triage.sh husky.bin 
husky.bin: PE32 executable (GUI) Intel 80386, for MS Windows
husky.bin: application/x-dosexec; charset=binary
b79a40df8c002fb6b97e3626d9250ec1  husky.bin
4dcd8458b06eeb9f96b2fc4bdabb6bac125436e5  husky.bin
46a3ce80e86ae93e888d57581cbf5c57eb9e1b26dc9b298ab9c9a4427ff53913  husky.bin

real	0m2.788s
user	0m1.785s
sys	0m0.991s

The time for the shell script was about 2.8 seconds, running in a REMnux VM. Now let’s hit the gas on YARA.

$ ll -h husky.bin 
-rw-rw-r-- 1 remnux remnux 645M Jul 14 16:44 husky.bin

$ time yara triage.yar husky.bin 
File type:	PE32 executable (GUI) Intel 80386, for MS Windows
Mimetype:	application/x-dosexec
MD5:	b79a40df8c002fb6b97e3626d9250ec1
SHA-1:	4dcd8458b06eeb9f96b2fc4bdabb6bac125436e5
SHA-256:	46a3ce80e86ae93e888d57581cbf5c57eb9e1b26dc9b298ab9c9a4427ff53913
Imphash:	e81c04337118138a69c6d64241de5089
WhatIsIt husky.bin
BasicHashes husky.bin

real	0m2.388s
user	0m1.679s
sys	0m0.700s

By the way, the missing Rich Header Hash line is not an error. This sample has no rich header, so the PeHashes rule simply didn’t match, and YARA handled that gracefully without me having to code around it. More data via YARA in less time: about 2.4 seconds versus 2.8 for the shell script. As with anything, there are a few caveats. Not all modules are available by default when you install yara, so you have to ensure your build of yara includes them. My version in REMnux has all the ones I’ve needed so far.
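For contrast, here is roughly what "coding around it" looks like when you script the Rich Header Hash yourself. This is a hedged Python sketch, not a reference implementation: the function name rich_header_hash is hypothetical, and it assumes the hash is the MD5 of the XOR-decoded bytes from the DanS marker up to, but not including, the Rich marker, which is my understanding of what pe.rich_signature.clear_data holds. Compare it against YARA’s output on a known sample before relying on it.

```python
import hashlib
import struct
from typing import Optional


def rich_header_hash(data: bytes) -> Optional[str]:
    """Return the MD5 of the decoded rich header, or None if there is none."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # The rich header sits between the DOS header and the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    end = data.find(b"Rich", 0x40, e_lfanew)
    if end == -1:
        return None  # No rich header: the case that has to be handled explicitly.
    # The 4-byte XOR key immediately follows the "Rich" marker.
    key = data[end + 4:end + 8]
    if len(key) < 4:
        return None
    # Walk backwards one dword at a time until the decoded "DanS" marker appears.
    start = end - 4
    while start >= 0x40:
        dword = bytes(a ^ b for a, b in zip(data[start:start + 4], key))
        if dword == b"DanS":
            break
        start -= 4
    else:
        return None  # Found "Rich" but never found a matching "DanS".
    # Decode everything from "DanS" up to (not including) "Rich" and hash it.
    clear = bytes(byte ^ key[i % 4] for i, byte in enumerate(data[start:end]))
    return hashlib.md5(clear).hexdigest()
```

Every early return above is a case the YARA rule handled for free by simply not matching.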

Where To Go From Here

Do you find yourself triaging a lot of Linux or .NET Framework malware? Check out what you can do with the “elf” and “dotnet” modules, and see if you can produce output with most of the data you’d get from a VirusTotal “Details” page before you even upload the sample.

Thanks for joining in, I hope it’s been helpful!

This post is licensed under CC BY 4.0 by the author.