Bad Guys Hate This Trick for Malware Weight Loss!
Lately I’ve had to work with multiple malware samples that are extremely heavyweight, usually about 300 MB or more. A sample that large can significantly hinder sandbox analysis because of upload size restrictions, and it can even slow down analysis tools on your local system while they process the file. In this post I’ll go over one trick I use to reduce the sample size so it’s easier to analyze or submit to sandboxes.
Triaging the large sample
For this post I’m using this 300 MB+ sample from VT: 218efc289854e3ef9086e9c3db36cf627d2171ceaece2c26085250c6203b31cd. Once it’s downloaded to our analysis machine, we can take the first triage steps:
remnux@remnux:~/cases/heavyweight$ diec GoogleDrive.exe
PE32
Compiler: Microsoft Visual C/C++(2008)[libcmtd,wWinMain]
Linker: Microsoft Linker(9.0)[GUI32]
remnux@remnux:~/cases/heavyweight$ exiftool GoogleDrive.exe
ExifTool Version Number : 12.42
File Name : GoogleDrive.exe
Directory : .
File Size : 321 MB
File Modification Date/Time : 2022:10:15 16:01:45-04:00
File Access Date/Time : 2022:10:15 16:02:27-04:00
File Inode Change Date/Time : 2022:10:15 16:01:59-04:00
File Permissions : -rw-rw-r--
File Type : Win32 EXE
File Type Extension : exe
MIME Type : application/octet-stream
Machine Type : Intel 386 or later, and compatibles
Time Stamp : 2021:11:28 23:02:35-05:00
Image File Characteristics : No relocs, Executable, 32-bit
PE Type : PE32
Linker Version : 9.0
Code Size : 200192
Initialized Data Size : 4456960
Uninitialized Data Size : 0
Entry Point : 0xabf0
OS Version : 5.0
Image Version : 0.0
Subsystem Version : 5.0
Subsystem : Windows GUI
File Version Number : 66.0.0.0
Product Version Number : 4.0.0.0
File Flags Mask : 0x003f
File Flags : (none)
File OS : Windows NT 32-bit
Object File Type : Executable application
File Subtype : 0
While running scanning tools like Detect-It-Easy (diec), you’ll probably notice a significant delay in getting output due to the sheer size of the sample. Usually when a malicious binary is this size, some form of garbage data has been appended to the end of the binary. The simplest way to do this is to append a bunch of zeroes/null bytes. We can inspect the end of the file using xxd and tail.
remnux@remnux:~/cases/heavyweight$ xxd GoogleDrive.exe | tail
13222ce0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222cf0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d00: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d10: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d20: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d30: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d40: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d50: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d60: 0000 0000 0000 0000 0000 0000 0000 0000 ................
13222d70: 0000 0000 0000 0000 0000 0000 0000 0000 ................
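The tail of the hex dump only shows the last few rows, so here is a minimal Python sketch for measuring how far back the zero padding actually runs. The function name count_trailing_nulls and the 1 MiB chunk size are my own choices for illustration and are not part of any tool used in this post.

```python
import os


def count_trailing_nulls(path, chunk_size=1 << 20):
    """Return how many consecutive 0x00 bytes sit at the very end of the file."""
    size = os.path.getsize(path)
    trailing = 0
    with open(path, "rb") as f:
        pos = size
        # Walk backwards through the file one chunk at a time.
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            stripped = chunk.rstrip(b"\x00")
            trailing += len(chunk) - len(stripped)
            if stripped:
                # A non-null byte was found, so the padding starts after it.
                break
    return trailing
```

Calling count_trailing_nulls("GoogleDrive.exe") should return a count in the hundreds of millions for a sample like this one. Keep in mind the count can also include legitimate zero bytes at the end of the last section, so treat it as an estimate of the padding rather than an exact overlay size.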
In addition, we can get some extra verification that the garbage data is zero-filled by performing an entropy measure. Shannon entropy measures the “randomness” of bytes in a file on a scale from 0.0 to 8.0. As a rough rule of thumb, scores approaching 7.9 - 8.0 usually indicate encrypted data, scores around 5.0 - 7.0 usually indicate compressed data, and extremely low values indicate a lot of repeating data in a file. We can perform a simple entropy measure using diec.
remnux@remnux:~/cases/heavyweight$ diec --entropy GoogleDrive.exe
Total 0.0212382: not packed
0|PE Header|0|1024|2.2973: not packed
1|Section(0)['.text']|1024|200192|6.11761: not packed
2|Section(1)['.data']|201216|192000|7.70243: packed
3|Section(2)['.rsrc']|393216|40960|5.76035: not packed
4|Overlay|434176|320572800|4.50037e-09: not packed
The total entropy of the 300 MB binary is 0.02, which indicates there is a lot of repeated data in the file. Looking specifically at the last row of output, the “Overlay”, the entropy for that portion is 0.00000000450037 (converted from scientific notation). This is a ridiculously low entropy value, and it indicates the data in the overlay appended to the binary is likely all a single value: zeroes.
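To make the entropy numbers a bit less magical, here is a small Python sketch of a byte-level Shannon entropy calculation. It is my own illustration of the formula, not the code diec or pecheck actually runs, and the function name shannon_entropy is hypothetical, so expect small differences from the tools’ output.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (one repeated value) up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    # Sum p * log2(1/p) over every byte value that appears in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A buffer made of a single repeated byte, such as b"\x00" * 1024, scores 0.0, while bytes(range(256)), where every byte value appears exactly once, scores 8.0. A huge zero-filled overlay drags the whole-file score toward the low end, which is the effect visible in the Total row above.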
Working hard for that weight loss with pecheck
Thankfully we don’t have to count calories to get this binary down to size! This post from Didier Stevens for SANS got me on track to using pecheck for lowering binary size. If you’re dealing with a signed, heavy binary, I highly recommend reading that post for the extra parts on removing the signature as well as the zeroes. To get this specific binary down to a manageable size, we can take a couple of easy steps:
remnux@remnux:~/cases/heavyweight$ pecheck GoogleDrive.exe
PE check for 'GoogleDrive.exe':
Entropy: 0.021238 (Min=0.0, Max=8.0)
Size: 321006976
MD5 hash: 7a35755f3d17f119d7138c602a5842d1
SHA-1 hash: 632b2b4dcd42f45c8dc8108886319d93c01ac48d
SHA-256 hash: 218efc289854e3ef9086e9c3db36cf627d2171ceaece2c26085250c6203b31cd
SHA-512 hash: b7ede71413fe28c4aa44aba4527150f077fbb5a1f123e2f8b777c80a71b1408b8ce9acce70688d8b0f9ff2d92c6502ac232ce0483b597a274246d32cce4aee1c
.text entropy: 6.117600 (Min=0.0, Max=8.0)
.data entropy: 7.702422 (Min=0.0, Max=8.0)
.rsrc entropy: 5.760316 (Min=0.0, Max=8.0)
Dump Info:
----------Parsing Warnings----------
Byte 0x00 makes up 99.8878% of the file's contents. This may indicate truncation / malformation.
...
Overlay:
Start offset: 0x0006a000
Size: 0x131b8d80 305.7 MB 99.86%
MD5: a208289ffeff2be05f4489e9dfa9cd9e
SHA-256: 80452ba8e588be2b7ed517c045403c7da638df0c02ac8d880330d7e41dc424e5
MAGIC: 00000000 ....
...
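If you’re curious where an overlay start offset like the one above comes from, here is a simplified Python sketch that reads the PE section table with the standard struct module and returns the offset where the last section’s raw data ends. This is only my illustration of the idea, not how pecheck computes it. The function name overlay_start is hypothetical, and the sketch ignores edge cases such as signed binaries, where a certificate table sits after the sections.

```python
import struct


def overlay_start(pe_bytes: bytes) -> int:
    """Return the file offset where the last section's raw data ends."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: NumberOfSections at +2, SizeOfOptionalHeader at +16.
    coff = e_lfanew + 4
    (num_sections,) = struct.unpack_from("<H", pe_bytes, coff + 2)
    (opt_size,) = struct.unpack_from("<H", pe_bytes, coff + 16)
    # The section table follows the 20-byte COFF header and optional header.
    table = coff + 20 + opt_size
    end = 0
    for i in range(num_sections):
        # Each section header is 40 bytes; SizeOfRawData is at +16 and
        # PointerToRawData is at +20.
        raw_size, raw_ptr = struct.unpack_from("<II", pe_bytes, table + 40 * i + 16)
        if raw_size:
            end = max(end, raw_ptr + raw_size)
    return end
```

Only the headers are needed, so passing the first few kilobytes of the file is enough, assuming the headers fit in that range. For the sample in this post, the diec output lists .rsrc at offset 393216 with a size of 40960, so the last section ends at 434176 (0x6a000), which matches the start offset pecheck reports for the overlay.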
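Running pecheck with no options besides the binary name confirms that about 305 MB of this file is overlay data, and the parsing warning shows that nearly all of the file is 0x00 bytes. We can easily remove the overlay with another pecheck command, where -g s selects the stripped PE (the file without its overlay) and -D dumps it to standard output so we can redirect it into a new file: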
remnux@remnux:~/cases/heavyweight$ pecheck -g s -D GoogleDrive.exe > lighter_GoogleDrive.exe
remnux@remnux:~/cases/heavyweight$ exiftool lighter_GoogleDrive.exe
ExifTool Version Number : 12.42
File Name : lighter_GoogleDrive.exe
Directory : .
File Size : 434 kB
File Modification Date/Time : 2022:10:15 16:24:47-04:00
File Access Date/Time : 2022:10:15 16:23:15-04:00
File Inode Change Date/Time : 2022:10:15 16:24:47-04:00
File Permissions : -rw-rw-r--
File Type : Win32 EXE
File Type Extension : exe
MIME Type : application/octet-stream
Machine Type : Intel 386 or later, and compatibles
Time Stamp : 2021:11:28 23:02:35-05:00
Image File Characteristics : No relocs, Executable, 32-bit
PE Type : PE32
Linker Version : 9.0
Code Size : 200192
Initialized Data Size : 4456960
Uninitialized Data Size : 0
Entry Point : 0xabf0
OS Version : 5.0
Image Version : 0.0
Subsystem Version : 5.0
Subsystem : Windows GUI
File Version Number : 66.0.0.0
Product Version Number : 4.0.0.0
File Flags Mask : 0x003f
File Flags : (none)
File OS : Windows NT 32-bit
Object File Type : Executable application
File Subtype : 0
The lighter_GoogleDrive.exe binary is much smaller than the original (434 kB versus 321 MB), and it’s still recognized as a Windows application. Now you can put the binary into sandboxes or any other analysis tools that have size limits! There’s just one caveat, as pointed out by David here:
“The downside is the file hash will change. So you may want to add a comment with the original hash before cleaning the null padding. 🤔”
David Ledbetter (@Ledtech3), October 13, 2022
The file hash for the lighter sample will indeed differ from that of the original sample because the bytes in the file have changed. The functionality of the sample should stay the same, since only the zero padding was removed; only the hashes will be different. If you’re sharing the lighter sample with others, be kind and consider sharing the original hashes of the heavier sample as well.
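One easy way to follow that advice is to record both hashes before you share anything. Here is a minimal Python sketch using the standard hashlib module; the helper names sha256_of and hash_note are hypothetical, and the note format is just one I made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a 300 MB sample never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_note(original_path, stripped_path):
    """Build a short note pairing the original hash with the stripped one."""
    return (
        f"original SHA-256: {sha256_of(original_path)}\n"
        f"stripped SHA-256: {sha256_of(stripped_path)}"
    )
```

Running print(hash_note("GoogleDrive.exe", "lighter_GoogleDrive.exe")) gives you two lines you can paste into a sandbox comment or a ticket, so anyone who receives the lighter sample can still pivot on the original hash.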
Thanks for reading!