Bad Guys Hate This Trick for Malware Weight Loss!

Lately I’ve had to work with multiple malware samples that are extremely large, usually about 300 MB and above. A sample this size can significantly hinder analysis: sandboxes often reject it because of upload size restrictions, and local analysis tools slow down while processing it. In this post I’ll go over one trick I use to reduce the sample size so it’s easier to analyze or submit to sandboxes.

Oh no, anyway

Triaging the large sample

For this post I’m using this 300 MB+ sample from VT: 218efc289854e3ef9086e9c3db36cf627d2171ceaece2c26085250c6203b31cd. Once it’s downloaded to our analysis machine, we can run a few triage steps:

remnux@remnux:~/cases/heavyweight$ diec GoogleDrive.exe 
PE32
    Compiler: Microsoft Visual C/C++(2008)[libcmtd,wWinMain]
    Linker: Microsoft Linker(9.0)[GUI32]

remnux@remnux:~/cases/heavyweight$ exiftool GoogleDrive.exe 
ExifTool Version Number         : 12.42
File Name                       : GoogleDrive.exe
Directory                       : .
File Size                       : 321 MB
File Modification Date/Time     : 2022:10:15 16:01:45-04:00
File Access Date/Time           : 2022:10:15 16:02:27-04:00
File Inode Change Date/Time     : 2022:10:15 16:01:59-04:00
File Permissions                : -rw-rw-r--
File Type                       : Win32 EXE
File Type Extension             : exe
MIME Type                       : application/octet-stream
Machine Type                    : Intel 386 or later, and compatibles
Time Stamp                      : 2021:11:28 23:02:35-05:00
Image File Characteristics      : No relocs, Executable, 32-bit
PE Type                         : PE32
Linker Version                  : 9.0
Code Size                       : 200192
Initialized Data Size           : 4456960
Uninitialized Data Size         : 0
Entry Point                     : 0xabf0
OS Version                      : 5.0
Image Version                   : 0.0
Subsystem Version               : 5.0
Subsystem                       : Windows GUI
File Version Number             : 66.0.0.0
Product Version Number          : 4.0.0.0
File Flags Mask                 : 0x003f
File Flags                      : (none)
File OS                         : Windows NT 32-bit
Object File Type                : Executable application
File Subtype                    : 0

When running scanning tools like Detect-It-Easy (diec), you’ll probably notice a significant delay before getting output due to the sheer size of the sample. Usually when a malicious binary is this size, some form of garbage data has been appended to the end of the binary. The simplest way to do this is to append a bunch of zeroes/null bytes. We can inspect the end of the file using xxd and tail.

remnux@remnux:~/cases/heavyweight$ xxd GoogleDrive.exe | tail
13222ce0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222cf0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d00: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d10: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d20: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d30: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d40: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d50: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d60: 0000 0000 0000 0000 0000 0000 0000 0000  ................
13222d70: 0000 0000 0000 0000 0000 0000 0000 0000  ................
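
The tail of the hex dump only shows the last few rows. As a rough way to measure how far back the zeroes go, here is a minimal Python sketch that counts the null bytes at the very end of a file. The function name count_trailing_nulls and the chunk size are my own choices for illustration; nothing here comes from xxd or diec.

```python
import os


def count_trailing_nulls(path, chunk_size=1 << 20):
    """Return how many 0x00 bytes sit at the very end of the file."""
    total = 0
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # start at the end of the file
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            stripped = chunk.rstrip(b"\x00")
            total += len(chunk) - len(stripped)
            if stripped:
                # Hit a non-zero byte, so the zero padding starts after it.
                break
    return total
```

Calling count_trailing_nulls("GoogleDrive.exe") reads the file backwards one chunk at a time, so it never loads the whole 300 MB into memory. Keep in mind the count can be slightly larger than the appended data, since the last legitimate section of a binary may itself end in a few zero bytes.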

In addition, we can get some extra verification that the garbage data is zero-filled by measuring entropy. Shannon entropy measures the “randomness” of the bytes in a file on a scale from 0.0 to 8.0. As a rule of thumb, values approaching 7.9 to 8.0 usually indicate encrypted or heavily compressed data, values in roughly the 5.0 to 7.0 range are more typical of ordinary code and lightly compressed data, and extremely low values indicate a lot of repeating data in a file. We can perform a simple entropy measurement using diec.

remnux@remnux:~/cases/heavyweight$ diec --entropy GoogleDrive.exe 
Total 0.0212382: not packed
  0|PE Header|0|1024|2.2973: not packed
  1|Section(0)['.text']|1024|200192|6.11761: not packed
  2|Section(1)['.data']|201216|192000|7.70243: packed
  3|Section(2)['.rsrc']|393216|40960|5.76035: not packed
  4|Overlay|434176|320572800|4.50037e-09: not packed

The total entropy of the 300 MB binary is 0.02, which indicates there is a lot of repeated data in the file. Looking specifically at the last row of output, the “Overlay”, the entropy for that portion is 0.00000000450037 (converted from scientific notation). This is a ridiculously low entropy value, and it indicates the data in the overlay appended to the binary is almost certainly a single repeated value: zeroes.
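
To make the entropy numbers a bit less abstract, here is a small Python sketch of byte-level Shannon entropy. It is only an illustration of the formula, not how diec or pecheck compute their values, and the function name shannon_entropy is my own.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # how often each byte value occurs
    # Sum p * log2(1/p) over every byte value that appears in the data.
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

A buffer holding a single repeated byte, such as b"\x00" * 4096, scores 0.0 because there is no uncertainty about the next byte, while a buffer in which all 256 byte values appear equally often scores 8.0. An overlay made entirely of zeroes therefore lands at or extremely close to zero, which is what the diec output shows.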

Working hard for that weight loss with pecheck

Thankfully we don’t have to count calories to get this binary down to size! This post from Didier Stevens for SANS got me on track to using pecheck for lowering binary size. If you’re dealing with a signed, heavy binary, I highly recommend reading that post for the extra steps on removing the signature as well as the zeroes. To get this specific binary down to a manageable size, we can take a couple of easy steps:

remnux@remnux:~/cases/heavyweight$ pecheck GoogleDrive.exe 
PE check for 'GoogleDrive.exe':
Entropy: 0.021238 (Min=0.0, Max=8.0)
Size: 321006976
MD5     hash: 7a35755f3d17f119d7138c602a5842d1
SHA-1   hash: 632b2b4dcd42f45c8dc8108886319d93c01ac48d
SHA-256 hash: 218efc289854e3ef9086e9c3db36cf627d2171ceaece2c26085250c6203b31cd
SHA-512 hash: b7ede71413fe28c4aa44aba4527150f077fbb5a1f123e2f8b777c80a71b1408b8ce9acce70688d8b0f9ff2d92c6502ac232ce0483b597a274246d32cce4aee1c
.text entropy: 6.117600 (Min=0.0, Max=8.0)
.data entropy: 7.702422 (Min=0.0, Max=8.0)
.rsrc entropy: 5.760316 (Min=0.0, Max=8.0)
Dump Info:
----------Parsing Warnings----------

Byte 0x00 makes up 99.8878% of the file's contents. This may indicate truncation / malformation.

...

Overlay:
 Start offset:   0x0006a000
 Size:           0x131b8d80 305.7 MB 99.86%
 MD5:            a208289ffeff2be05f4489e9dfa9cd9e
 SHA-256:        80452ba8e588be2b7ed517c045403c7da638df0c02ac8d880330d7e41dc424e5
 MAGIC:          00000000 ....

...

Running pecheck with no options besides the binary name shows that byte 0x00 makes up 99.89% of the file and that the overlay is about 305 MB, starting at offset 0x0006a000. We can easily remove that overlay with another pecheck command:

remnux@remnux:~/cases/heavyweight$ pecheck -g s -D GoogleDrive.exe > lighter_GoogleDrive.exe

remnux@remnux:~/cases/heavyweight$ exiftool lighter_GoogleDrive.exe 
ExifTool Version Number         : 12.42
File Name                       : lighter_GoogleDrive.exe
Directory                       : .
File Size                       : 434 kB
File Modification Date/Time     : 2022:10:15 16:24:47-04:00
File Access Date/Time           : 2022:10:15 16:23:15-04:00
File Inode Change Date/Time     : 2022:10:15 16:24:47-04:00
File Permissions                : -rw-rw-r--
File Type                       : Win32 EXE
File Type Extension             : exe
MIME Type                       : application/octet-stream
Machine Type                    : Intel 386 or later, and compatibles
Time Stamp                      : 2021:11:28 23:02:35-05:00
Image File Characteristics      : No relocs, Executable, 32-bit
PE Type                         : PE32
Linker Version                  : 9.0
Code Size                       : 200192
Initialized Data Size           : 4456960
Uninitialized Data Size         : 0
Entry Point                     : 0xabf0
OS Version                      : 5.0
Image Version                   : 0.0
Subsystem Version               : 5.0
Subsystem                       : Windows GUI
File Version Number             : 66.0.0.0
Product Version Number          : 4.0.0.0
File Flags Mask                 : 0x003f
File Flags                      : (none)
File OS                         : Windows NT 32-bit
Object File Type                : Executable application
File Subtype                    : 0
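
If you’re curious what stripping an overlay amounts to, here is a hedged Python sketch of the general idea: find where the last section’s raw data ends and keep only the bytes before that point. This is not pecheck’s implementation. The names overlay_offset and strip_overlay are hypothetical, and the sketch assumes an unsigned PE with well-formed headers (a signed binary keeps its signature past the last section, which is the case the Didier Stevens post covers).

```python
import struct


def overlay_offset(data: bytes) -> int:
    """Return the file offset where data past the last PE section begins."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF header fields, relative to the PE signature.
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size       # first section header
    end = table + 40 * num_sections        # end of the headers themselves
    for i in range(num_sections):
        # SizeOfRawData and PointerToRawData sit at +16 and +20
        # inside each 40-byte section header.
        raw_size, raw_ptr = struct.unpack_from("<II", data, table + 40 * i + 16)
        if raw_size:
            end = max(end, raw_ptr + raw_size)
    return min(end, len(data))


def strip_overlay(data: bytes) -> bytes:
    """Return the file contents with everything past the last section removed."""
    return data[:overlay_offset(data)]
```

For this sample the numbers line up with the tool output: the .rsrc section starts at offset 393216 and is 40960 bytes long, so it ends at 434176, which is the 0x0006a000 overlay start that pecheck reported and matches the 434 kB size of the lighter file. The sketch takes the whole file as bytes for simplicity; only the headers are actually needed to find the offset.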

The lighter_GoogleDrive.exe binary is much smaller than the original, and it’s still recognized as a Windows application. Now you can put the binary into sandboxes or any other analysis tools that have size limits! There’s just one caveat, as pointed out by David here:

The file hash for the lighter sample will indeed change from that of the original sample because the data within the sample has changed. The functionality of the sample will stay the same, just the hashes will be different. If you’re sharing the lighter sample with others, be kind and consider sharing the original hashes of the heavier sample as well.
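
One simple way to keep both hashes on hand is to compute them side by side before sharing anything. Below is a minimal Python sketch using the standard hashlib module; the helper name sha256_of is my own, and the file names in the note that follows are the ones used in this post.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB pieces so a 300 MB sample never sits fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Running sha256_of("GoogleDrive.exe") should return the 218efc28... value shown by pecheck earlier, while sha256_of("lighter_GoogleDrive.exe") returns a different digest. Recording both in your notes makes it easy to tie the lighter sample back to the original.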

Thanks for reading!

This post is licensed under CC BY 4.0 by the author.