About the Project
Today, we’re going to analyze a malicious binary recently identified by Arch Cloud Labs’ malware collection system, “Archie”. This binary uses the LoadLibraryA function to load DLLs at run time for additional functionality. Malware samples typically do this to keep the import table sparse, in an attempt to avoid triggering static detection rules or to evade EDR products. This particular sample struck me as interesting because of its stack string obfuscation method, which Ghidra did not disassemble correctly. To grab your interest, the image below shows a quick comparison of the disassembly in radare2 versus Ghidra, where the code “breaks” (it is not recognized by Ghidra’s AutoAnalysis).
This is not a complete analysis of the binary. Rather, it is an isolated look at how the malware author implemented calls to LoadLibraryA, and at why understanding assembly matters when tools break, as shown in the image above.
If you’re following along at home, this binary can be downloaded via Malshare here.
Let’s get started!
Triaging a Binary with Radare2
When looking at a daily dump of malware samples from malshare.com, I typically start by triaging binaries with just radare2. It’s easy to quickly look at functions, dump strings, and disassemble interesting sections before loading everything into Ghidra or IDA. Also, since radare2 is a command line utility, it lets me sort through samples quickly to find one worth spending time on. After all, this is a hobbyist website, and malware analysis is a hobby here. How you spend your ever-shrinking free time is just as critical as what you spend it on; more on that in this previous blog post. With that out of the way, let’s open and analyze the binary via: r2 malware.exe
Next, we’ll run radare2’s analysis commands to identify functions, xrefs, symbols, etc.
Now that analysis has finished, let’s perform some initial triage such as inspecting file sections.
Typically with commodity Linux malware, this is where you’d see an indicator that UPX was used, such as a section named “UPX0”. While we don’t see any indication of packing, we do see a relatively small .text section and a large resource section. In normal situations, this could just be a small hello-world program with a few PNGs embedded in it, which would explain the size disparity. However, a sneak peek at VirusTotal shows that this file is indeed malicious, with 57 out of 75 vendors flagging it. Let’s stop looking at VirusTotal for now and see if further analysis can identify its core functionality.
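The section check above can be written down as a quick heuristic. This is a minimal Python sketch, assuming you have already copied section names and raw sizes out of radare2's section listing; the 4x ratio threshold and the example sizes are made-up illustrations, not values from this sample.

```python
# Hypothetical triage heuristic; thresholds and example sizes are illustrative only.
PACKER_SECTION_NAMES = {"UPX0", "UPX1", "UPX2"}


def triage_sections(sections, ratio_threshold=4.0):
    """sections: list of (name, raw_size) tuples, e.g. copied from radare2's section listing."""
    sizes = dict(sections)
    findings = []

    # Packer indicator: UPX-packed files typically carry sections named UPX0/UPX1.
    packer_hits = sorted(set(sizes) & PACKER_SECTION_NAMES)
    if packer_hits:
        findings.append("possible UPX packing: " + ", ".join(packer_hits))

    # Size disparity: a large .rsrc next to a small .text is worth a closer look.
    text, rsrc = sizes.get(".text", 0), sizes.get(".rsrc", 0)
    if text and rsrc > ratio_threshold * text:
        findings.append(f".rsrc is {rsrc / text:.1f}x larger than .text")
    return findings


# Made-up sizes, not taken from this sample.
print(triage_sections([(".text", 0x8000), (".rdata", 0x2000), (".rsrc", 0x40000)]))
```

A size disparity alone proves nothing (the hello-world-with-PNGs case above would trip it too); it only flags where to look next.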
Strings & Things
Radare2 can show strings via the iz command. Piping this output to more lets you scroll through large amounts of data like you would with any other file on Linux. The first string we see in the file is a reference to deflate. A quick Google search of the copyright string below takes us to the zlib source code. This artifact suggests there is likely compressed data within the binary waiting to be decompressed. Compressed data has higher entropy than ordinary code or text, and the radare2 command p== prints the entropy of the entire binary as a bar graph. The output is shown below, after the strings listing. There’s clearly some compressed data in this binary, so let’s keep that in mind as we continue our analysis.
[Strings]
nth paddr vaddr len size section type string
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x000090b8 0x004090b8 52 53 .rdata ascii deflate 1.2.2 Copyright 1995-2004 Jean-loup Gailly
1 0x00009168 0x00409168 5 6 .rdata ascii 1.2.2
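For readers without radare2 handy, a rough stand-in for pulling strings out of a binary is a regex over runs of printable ASCII, similar to the Unix strings utility. This is a minimal sketch: unlike radare2's iz, it scans the whole buffer rather than only data sections, and the 5-character minimum is an arbitrary choice.

```python
import re


def ascii_strings(data: bytes, min_len: int = 5):
    """Yield (offset, text) for every run of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Small hand-made buffer mimicking the zlib copyright string seen above.
blob = b"\x00\x01 deflate 1.2.2 Copyright 1995-2004 Jean-loup Gailly \x00\xff1.2.2\x00ab\x00"
for offset, text in ascii_strings(blob):
    print(hex(offset), text)
```

On a real sample you would pass open("malware.exe", "rb").read() instead of the hand-made buffer.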
[0x0040e000]> p==
[Figure: radare2 p== entropy bar graph for malware.exe]
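The idea behind p== can be sketched in a few lines of Python. This is a minimal per-block Shannon entropy calculation, not radare2's implementation, and the 1024-byte block size is an assumption. Values near 8 bits per byte suggest compressed or encrypted data, while ordinary code and text sit noticeably lower.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for a constant block, 8.0 for perfectly uniform bytes."""
    if not block:
        return 0.0
    total = len(block)
    return sum((n / total) * math.log2(total / n) for n in Counter(block).values())


def entropy_profile(data: bytes, block_size: int = 1024):
    """Entropy of each fixed-size block, similar in spirit to radare2's p== graph."""
    return [shannon_entropy(data[i:i + block_size]) for i in range(0, len(data), block_size)]


sample = bytes(2048) + bytes(range(256)) * 4   # two all-zero blocks, then one uniform block
print(entropy_profile(sample))
```

Running entropy_profile over the bytes of malware.exe, runs of blocks close to 8.0 would be the regions expected to match the tall bars in the p== graph.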
Looking At Symbols
Symbols tell us which functions are imported by the binary, and these imports are key indicators of the sample’s underlying capabilities. Listing symbols with the is command shows a handful of interesting functions imported from KERNEL32.dll, as seen in the radare2 output below:
[0x00408616]> is
[Symbols]
nth paddr vaddr bind type size lib name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1 0x00009000 0x00409000 NONE FUNC 0 KERNEL32.dll imp.CreateDirectoryA
2 0x00009004 0x00409004 NONE FUNC 0 KERNEL32.dll imp.CloseHandle
3 0x00009008 0x00409008 NONE FUNC 0 KERNEL32.dll imp.WriteFile
4 0x0000900c 0x0040900c NONE FUNC 0 KERNEL32.dll imp.CreateFileA
5 0x00009010 0x00409010 NONE FUNC 0 KERNEL32.dll imp.GetTempPathA
6 0x00009014 0x00409014 NONE FUNC 0 KERNEL32.dll imp.GetModuleFileNameA
7 0x00009018 0x00409018 NONE FUNC 0 KERNEL32.dll imp.ReadFile
8 0x0000901c 0x0040901c NONE FUNC 0 KERNEL32.dll imp.GetFileSize
9 0x00009020 0x00409020 NONE FUNC 0 KERNEL32.dll imp.GetProcAddress
10 0x00009024 0x00409024 NONE FUNC 0 KERNEL32.dll imp.LoadLibraryA
11 0x00009028 0x00409028 NONE FUNC 0 KERNEL32.dll imp.GetModuleHandleA
12 0x0000902c 0x0040902c NONE FUNC 0 KERNEL32.dll imp.GetStartupInfoA
See anything interesting? What hypotheses can we start forming from the imports that DO exist within this binary? Perhaps a temporary directory is created, contents are written to a file, and data is then loaded from that file? Hmm, that resource section was large. Maybe something from it is getting dumped to a file? Who knows! Let’s inspect further.
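One way to make this hypothesis-forming step explicit is a small lookup from imported API names to capability hints. The mapping below is a hypothetical, hand-written helper that only covers imports from the table above; it is not radare2 output and not an exhaustive capability database.

```python
# Hypothetical, hand-written hints for the KERNEL32 imports listed above.
CAPABILITY_HINTS = {
    "CreateDirectoryA": ("filesystem", "creates directories"),
    "CreateFileA": ("filesystem", "opens or creates files"),
    "WriteFile": ("filesystem", "writes files (possible dropper)"),
    "ReadFile": ("filesystem", "reads files"),
    "GetTempPathA": ("filesystem", "locates the temp directory"),
    "GetModuleFileNameA": ("self-reference", "finds its own path on disk"),
    "LoadLibraryA": ("dynamic loading", "loads DLLs at run time"),
    "GetProcAddress": ("dynamic loading", "resolves functions at run time"),
}


def group_capabilities(symbol_names):
    """Group imports such as 'imp.WriteFile' by the capability they hint at."""
    grouped = {}
    for symbol in symbol_names:
        name = symbol[4:] if symbol.startswith("imp.") else symbol
        if name in CAPABILITY_HINTS:
            category, detail = CAPABILITY_HINTS[name]
            grouped.setdefault(category, []).append(f"{name}: {detail}")
    return grouped


imports = ["imp.CreateDirectoryA", "imp.CloseHandle", "imp.WriteFile",
           "imp.GetTempPathA", "imp.LoadLibraryA", "imp.GetProcAddress"]
for category, items in group_capabilities(imports).items():
    print(category, items)
```

Imports without a hint (CloseHandle here) are simply skipped; the output is a starting point for questions, not a verdict.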
XRefs w/ Radare2
Now that we have functions we’re interested in, let’s analyze where they are called within the binary and see if anything in the surrounding code blocks reveals more about this malware’s functionality.
First, let’s jump to KERNEL32’s WriteFile to see what is getting written to disk.
BOOL WriteFile(
[in] HANDLE hFile,
[in] LPCVOID lpBuffer,
[in] DWORD nNumberOfBytesToWrite,
[out, optional] LPDWORD lpNumberOfBytesWritten,
[in, out, optional] LPOVERLAPPED lpOverlapped
);
// https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
The MSDN documentation shows that the second argument, lpBuffer, is a buffer of data to write to disk. This enables us to work backwards to that data and identify what the content will be. The overall analysis flow here is:
- Find interesting function.
- Find XREF to the function in the code.
- Understand parameters to the function, and how parameters are passed to the function (x86 vs x64 calling conventions).
The radare2 output below “seeks” (s) to the WriteFile offset (note that you can tab-autocomplete, e.g. sym.imp.<tab>). Next, we list references TO this symbol via the axt command. Here we see four different places where WriteFile is called in this application.
[0x00409008]> s sym.imp.KERNEL32.dll_WriteFile
[0x00409008]> axt
(nofunc) 0x402467 [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
(nofunc) 0x4024ce [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
(nofunc) 0x402a79 [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
(nofunc) 0x402ac9 [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
//radare2 tip
// If you're ever curious about a given radare2 function or command line flag,
// you can always use ? after a command to get more information.
Now, let’s seek to these addresses and then switch to a visual display mode to see the disassembly.
[0x00409008]> s 0x402467
[0x00409008]> v!
Neat-o burrito, we’re smack dab in the middle of some disassembly making WriteFile calls. Radare2 is even nice enough to annotate the WriteFile arguments in the disassembly. As a quick refresher on x86 calling conventions: Win32 API functions such as WriteFile use stdcall, where arguments are pushed right to left, so the first argument is pushed last.
The argument containing the data to be written (lpBuffer) is an offset from register ebp. Jumping there without running the program shows us no useful data, because it references a local variable on the stack. Thus, to identify what’s getting written to disk, we’ll have to run this malware sample.
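To make the push-order point concrete, here is a small Python sketch that maps the values pushed before a 32-bit stdcall call (the convention Win32 APIs such as WriteFile use) back onto parameter names. The pushed operand labels are hypothetical placeholders, not the actual operands at 0x402467.

```python
WRITEFILE_PARAMS = ["hFile", "lpBuffer", "nNumberOfBytesToWrite",
                    "lpNumberOfBytesWritten", "lpOverlapped"]


def stdcall_args(pushes_in_order, param_names):
    """Map values pushed before a 32-bit stdcall call onto parameter names.

    Arguments are pushed right to left, so the last push is the first parameter.
    """
    if len(pushes_in_order) != len(param_names):
        raise ValueError("number of pushes does not match number of parameters")
    return dict(zip(param_names, reversed(pushes_in_order)))


# Hypothetical operands in the order they appear in the disassembly (first push first).
pushes = ["0", "&bytes_written", "size", "ebp-relative buffer", "file handle"]
for name, operand in stdcall_args(pushes, WRITEFILE_PARAMS).items():
    print(f"{name:24} <- {operand}")
```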
However, we’re still in the static analysis phase for this sample, so let’s explore a bit more before we execute the binary in an isolated VM environment.
XRefs to LoadLibraryA
As previously mentioned, the LoadLibrary functions load a DLL at run time. With the data we’ve seen so far, we understand the following:
- The binary calls WriteFile from four places.
- There are string references to a compression library.
- There are high-entropy portions of the binary, indicating compression.
- There’s a very large .rsrc section.
At this point it’s possible to start putting together some likely hypotheses about what’s going on. However, the proof is in the pudding, or in this case, in the output of actually running the binary. For now, let’s continue our xref analysis with LoadLibraryA.
[0x0040344d]> s sym.imp.KERNEL32.dll_LoadLibraryA
[0x00409024]> axt
fcn.0040343b 0x40346e [DATA] mov ebx, dword [sym.imp.KERNEL32.dll_LoadLibraryA]
Okay, there is one reference to LoadLibraryA, inside a function defined at address 0x0040343b. Note that it is a data reference: the address of the import is moved into register ebx rather than called directly. This is important to note. Let’s seek to this function and analyze the disassembly.
[0x00409024]> s fcn.0040343b
[0x0040343b]> pdf
; CALL XREF from main @ +0x406
┌ 534: fcn.0040343b ();
│ ; var int32_t var_144h @ esp+0x90
│ ; var int32_t var_140h @ esp+0x94
│ ; var int32_t var_138h @ esp+0x9c
│ ; var int32_t var_12ch @ esp+0xa8
│ ; var int32_t var_fch @ esp+0xd8
│ ; var int32_t var_f4h @ esp+0xe0
│ ; var int32_t var_ech @ esp+0xe8
│ ; var int32_t var_e4h @ esp+0xf0
│ ; var int32_t var_d8h @ esp+0xfc
│ ; var int32_t var_cch @ esp+0x108
│ 0x0040343b 81ec30010000 sub esp, 0x130
│ 0x00403441 53 push ebx
│ 0x00403442 55 push ebp
│ 0x00403443 56 push esi
│ 0x00403444 57 push edi
│ 0x00403445 6a00 push 0
......................................abbreviated output ...............................................................
Now here’s where things get interesting!
In the disassembly below you’ll see numerous ASCII characters getting pushed onto the stack. Reading upward from address 0x004034a1, you’ll see something that looks like ShellExecuteA, but some characters are missing.
If we look closer at 0x00403486, the value at the top of the stack is popped off (pop ebp) into the ebp register. This value is hex 65, which is ASCII “e”. Anywhere we see push ebp, we’re actually pushing the hex value of “e” onto the stack to build a “stack string”.
0x0040346e 8b1d24904000 mov ebx, dword [sym.imp.KERNEL32.dll_LoadLibraryA] ; [0x409024:4]=0xc7b2 reloc.KERNEL32.dll_LoadLibraryA
0x00403474 83c434 add esp, 0x34
0x00403477 8d442414 lea eax, dword [var_12ch]
0x0040347b 50 push eax
0x0040347c ffd3 call ebx
0x0040347e 6a00 push 0
0x00403480 6a41 push 0x41 ; 'A' ; 65
0x00403482 6a65 push 0x65 ; 'e' ; 101
0x00403484 8bf8 mov edi, eax
0x00403486 5d pop ebp
0x00403487 8d8424800000. lea eax, dword [var_cch]
0x0040348e 55 push ebp
0x0040348f 6a74 push 0x74 ; 't' ; 116
0x00403491 6a75 push 0x75 ; 'u' ; 117
0x00403493 6a63 push 0x63 ; 'c' ; 99
0x00403495 55 push ebp
0x00403496 6a78 push 0x78 ; 'x' ; 120
0x00403498 6a45 push 0x45 ; 'E' ; 69
0x0040349a 6a6c push 0x6c ; 'l' ; 108
0x0040349c 6a6c push 0x6c ; 'l' ; 108
0x0040349e 55 push ebp
0x0040349f 6a68 push 0x68 ; 'h' ; 104
0x004034a1 6a53 push 0x53 ; 'S' ; 83 ; int32_t arg_8h
0x004034a3 50 push eax ; int32_t arg_4h
0x004034a4 e8ecdfffff call fcn.00401495
The output below is the same as above, only annotated for ease of reading.
0x0040346e 8b1d24904000 mov ebx, dword [sym.imp.KERNEL32.dll_LoadLibraryA] ; [0x409024:4]=0xc7b2 reloc.KERNEL32.dll_LoadLibraryA
0x00403474 83c434 add esp, 0x34
0x00403477 8d442414 lea eax, dword [var_12ch]
0x0040347b 50 push eax
0x0040347c ffd3 call ebx
0x0040347e 6a00 push 0
0x00403480 6a41 push 0x41 ; 'A' ; 65
0x00403482 6a65 push 0x65 ; 'e' ; 101
0x00403484 8bf8 mov edi, eax
0x00403486 5d pop ebp ; // put 'e' into EBP
0x00403487 8d8424800000. lea eax, dword [var_cch]
0x0040348e 55 push ebp ; e
0x0040348f 6a74 push 0x74 ; 't' ; 116
0x00403491 6a75 push 0x75 ; 'u' ; 117
0x00403493 6a63 push 0x63 ; 'c' ; 99
0x00403495 55 push ebp ; e
0x00403496 6a78 push 0x78 ; 'x' ; 120
0x00403498 6a45 push 0x45 ; 'E' ; 69
0x0040349a 6a6c push 0x6c ; 'l' ; 108
0x0040349c 6a6c push 0x6c ; 'l' ; 108
0x0040349e 55 push ebp ; e
0x0040349f 6a68 push 0x68 ; 'h' ; 104
0x004034a1 6a53 push 0x53 ; 'S' ; 83 ; int32_t arg_8h
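The push/pop trick can be checked by emulating only the instructions involved. This is a minimal Python sketch that models push imm, push reg, and pop reg for the sequence from 0x0040347e to 0x004034a1 above (the mov, the lea, and the final push eax of the destination pointer are left out). It then reads the stack from the top down, which is the order in which the called function, fcn.00401495, consumes the characters.

```python
def decode_stack_string(instructions):
    """Emulate push imm / push reg / pop reg, then read characters from the top of the
    stack downward (lowest address first) until a zero value is reached."""
    stack, regs = [], {}
    for op, operand in instructions:
        if op == "push":
            # A string operand is a register name; an int is an immediate.
            stack.append(regs[operand] if isinstance(operand, str) else operand)
        elif op == "pop":
            regs[operand] = stack.pop()
    chars = []
    for value in reversed(stack):  # last value pushed = lowest address = first char
        if value & 0xFF == 0:
            break
        chars.append(chr(value & 0xFF))
    return "".join(chars)


# 0x0040347e..0x004034a1 from the listing above (mov/lea/push eax omitted).
SEQUENCE = [
    ("push", 0x00), ("push", 0x41), ("push", 0x65), ("pop", "ebp"),
    ("push", "ebp"), ("push", 0x74), ("push", 0x75), ("push", 0x63),
    ("push", "ebp"), ("push", 0x78), ("push", 0x45), ("push", 0x6C),
    ("push", 0x6C), ("push", "ebp"), ("push", 0x68), ("push", 0x53),
]
print(decode_stack_string(SEQUENCE))  # ShellExecuteA
```

Note how the pop ebp both captures the “e” and removes it from the stack, which is why the raw push listing appears to be missing characters.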
Huzzah! We have now discovered how the malware author implemented a neat assembly trick to build strings such as ShellExecuteA on the stack, keeping the names of what it loads at run time via LoadLibraryA out of plain sight. What about that mov instruction that placed LoadLibraryA into ebx? If we continue looking through this code block, we’ll see the stack string trick implemented a few times, along with calls to another function at address 0x00401495. The image below shows USER32.DLL being loaded via the call to ebx, but without any stack string obfuscation.
Looking at the disassembly in Ghidra, we see that Ghidra fails to recognize where this function begins.
Even after we define the function, Ghidra’s decompilation falls short of recognizing the values passed in via the stack strings. To fix this, we’d have to go through and modify the data types Ghidra auto-recognizes.
I think this highlights an important piece of reverse engineering: no tool is perfect, and knowing the limits of each tool and how it can fail will enable you to troubleshoot effectively. You simply cannot always rely on the decompilation to be 100% accurate. Now that we feel warm and fuzzy about the assembly tricks being played here, what is this call to function 0x00401495? Let’s explore.
Analyzing Unknown Function
Aside from the LoadLibraryA call, the disassembly has numerous calls to a function at address 0x401495.
Let’s seek to this unknown function, and see what the disassembly looks like.
[0x00401495]> s fcn.00401495
[0x00401495]> pdf
; XREFS(21)
┌ 23: fcn.00401495 (int32_t arg_4h, int32_t arg_8h);
│ ; arg int32_t arg_4h @ esp+0x4
│ ; arg int32_t arg_8h @ esp+0x8
│ 0x00401495 8b4c2404 mov ecx, dword [arg_4h] ; destination buffer pointer
│ 0x00401499 8d542408 lea edx, dword [arg_8h] ; address of the first stacked character
│ ; CODE XREF from fcn.00401495 @ 0x4014a9
│ ┌─> 0x0040149d 8a02 mov al, byte [edx]
│ ╎ 0x0040149f 84c0 test al, al
│ ╎ 0x004014a1 8801 mov byte [ecx], al
│ ┌──< 0x004014a3 7406 je 0x4014ab
│ │╎ 0x004014a5 41 inc ecx
│ │╎ 0x004014a6 83c204 add edx, 4
│ │└─< 0x004014a9 ebf2 jmp 0x40149d
│ │ ; CODE XREF from fcn.00401495 @ 0x4014a3
└ └──> 0x004014ab c3 ret
radare2 can produce pseudocode of the assembly with pdc, which helps when annotating our analysis.
[0x00401495]> pdc
function fcn.00401495 () {
// 4 basic blocks
loc_0x401495:
//XREFS(21)
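ecx = dword [arg_4h] // destination buffer pointer
edx = dword [arg_8h] // the assembly is lea: edx = address of arg_8h, not its value
do
{
loc_0x40149d:
//CODE XREF from fcn.00401495 @ 0x4014a9
al = byte [edx] // load the byte edx points to (low byte of a stacked dword)
var = al & al // test al, al: sets ZF when al == 0; eax is not modified
byte [ecx] = al // copy the byte into the destination buffer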
if (!var) goto 0x4014ab //unlikely
} while (?);
return;
loc_0x4014a5:
ecx++
edx += 4
goto 0x40149d
}
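Alright, this pseudocode is somewhat helpful, but not super useful on its own.
Reading the assembly, the function takes a destination buffer (arg_4h) and walks the stacked arguments starting at arg_8h, 4 bytes at a time.
It copies the low byte of each 4-byte slot into the destination buffer and stops after copying a 0 byte (the test al, al / je pair), producing a NUL-terminated string.
We’ll jump ahead here and step through this function in x32dbg (x64dbg’s 32-bit debugger) to get a better idea of what’s going on.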
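Translated back into Python, the loop in fcn.00401495 looks roughly like the sketch below. It is a hand-written model, not decompiler output: the destination buffer is a bytearray, and the example stack memory is built by hand, assuming each pushed character occupies one little-endian 4-byte slot as on 32-bit x86.

```python
import struct


def copy_stack_string(stack_memory: bytes, start: int = 0) -> bytes:
    """Model of fcn.00401495: copy the low byte of each 4-byte stack slot into a
    destination buffer, stopping after a 0 byte has been copied."""
    dest = bytearray()             # stands in for the buffer ecx points to
    edx = start                    # lea edx, [arg_8h]
    while True:
        al = stack_memory[edx]     # mov al, byte [edx]
        dest.append(al)            # mov byte [ecx], al (the NUL terminator is copied too)
        if al == 0:                # test al, al ; je -> ret
            break
        edx += 4                   # add edx, 4 (append already advanced "ecx")
    return bytes(dest)


# Hand-built stack: one little-endian dword per character, lowest address first.
slots = [ord(c) for c in "ShellExecuteA"] + [0]
memory = b"".join(struct.pack("<I", value) for value in slots)
print(copy_stack_string(memory))  # b'ShellExecuteA\x00'
```

Because the stride is 4, the three upper bytes of each pushed dword are skipped, which is what turns a run of push imm8 instructions into a contiguous C string.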
The first image below shows that, after stepping through the unknown function several times, the string Shell32.dll has been built on the stack.
The second image shows LoadLibraryA being moved into ebx and the stack pointer being adjusted by adding to esp (which actually shrinks the stack, cleaning up the pushed stack-string arguments). Next, eax is loaded with the address of the buffer the unknown function filled, which holds “Shell32.DLL”. Finally, eax is pushed onto the stack as the argument before ebx (LoadLibraryA) is called.
At this point we’ve successfully reverse engineered the LoadLibraryA functionality within this binary and its associated obfuscation mechanisms. Now, let’s move on to dynamic analysis to identify files being written to disk.
Dynamic Analysis - Working Through Decompression
After copying the sample of interest as “malware.exe” to a Windows 10 VM and running it, we see that the resource section did indeed contain a PNG used by the executable. The running program can be seen in the second image below.
Launching malware.exe in x32dbg lets an analyst place breakpoints wherever needed. Given our interest in LoadLibraryA, we’ll place a breakpoint on this function.
After continuing the binary’s execution, we can see that the stack strings do indeed resolve to specific DLLs, as previously discussed.
The two images below show a couple of these DLLs being loaded.
Running Sysinternals’ Process Explorer and Process Monitor reveals two binaries being written to disk and spawned by this process. Thus our original hypothesis of additional files being written to disk appears to be correct.
Placing a breakpoint on the WriteFile call and referencing the MSDN documentation for its arguments, we can identify the contents being written to disk, as shown in the image below.
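When breaking on the first instruction of WriteFile in a 32-bit process, the return address sits at [esp] and the five stdcall arguments follow it in parameter order. The sketch below pulls them out of a memory snapshot. read_memory is a hypothetical callback standing in for whatever memory-read facility your debugger or scripting setup provides, and the stack values in the example are made up.

```python
import struct

WRITEFILE_PARAMS = ["hFile", "lpBuffer", "nNumberOfBytesToWrite",
                    "lpNumberOfBytesWritten", "lpOverlapped"]


def writefile_args_at_entry(esp, read_memory):
    """Decode WriteFile's arguments at a breakpoint on its first instruction (32-bit).

    read_memory(address, size) -> bytes is a hypothetical debugger callback.
    """
    raw = read_memory(esp, 4 * (1 + len(WRITEFILE_PARAMS)))
    return_address, *values = struct.unpack("<6I", raw)
    args = dict(zip(WRITEFILE_PARAMS, values))
    args["return_address"] = return_address
    # The bytes about to be written to disk.
    args["data"] = read_memory(args["lpBuffer"], args["nNumberOfBytesToWrite"])
    return args


# Made-up memory snapshot: stack at 0x10, buffer at 0x80.
memory = bytearray(0x100)
struct.pack_into("<6I", memory, 0x10, 0xDEADBEEF, 0x1C, 0x80, 4, 0x40, 0)
memory[0x80:0x84] = b"MZ\x90\x00"
args = writefile_args_at_entry(0x10, lambda addr, size: bytes(memory[addr:addr + size]))
print(hex(args["hFile"]), args["data"])
```

The offsets only hold at function entry; once WriteFile's own prologue runs, esp moves and the arguments are no longer at [esp+4].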
Now the question is, where did these files come from?
Let’s search for compression-related references within the binary.
The commands below find the string “unknown compression method” and identify references to it.
[0x0040d1bc]> iz | grep 'compress'
17 0x0000d1bc 0x0040d1bc 26 27 .data ascii unknown compression method
[0x0040d1bc]> s str.unknown_compression_method
[0x0040d1bc]> axt
(nofunc) 0x404a17 [DATA] mov dword [eax + 0x18], str.unknown_compression_method
(nofunc) 0x404aee [DATA] mov dword [ecx + 0x18], str.unknown_compression_method
Let’s seek to these offsets and explore further.
[0x0040d1bc]> s 0x404a17
[0x0040d1bc]> v!
Immediately, more string artifacts indicating that some type of compression routine is being used can be identified within the function at offset 0x004048b0.
Setting a breakpoint at function 0x004048b0 and running until we hit the function’s ret instruction, we see that an MZ header now miraculously appears! A few quick Google searches for the compression string artifacts reveal that the compression library is zlib.
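To reproduce the MZ discovery offline, one approach is to try zlib decompression at every offset that starts with a common zlib header and keep results beginning with MZ. This is a hedged sketch: it assumes the payload is stored as a zlib-wrapped stream, which this post does not confirm. Raw deflate data would need zlib.decompressobj(-15) instead, and the exhaustive scan is slow on large files.

```python
import zlib

# Two-byte headers produced by common zlib compression levels (CMF = 0x78).
ZLIB_HEADERS = {b"\x78\x01", b"\x78\x5e", b"\x78\x9c", b"\x78\xda"}


def find_embedded_pe(blob: bytes):
    """Yield (offset, data) for zlib streams in blob that inflate to something starting with MZ."""
    for offset in range(len(blob) - 1):
        if blob[offset:offset + 2] not in ZLIB_HEADERS:
            continue
        try:
            # decompressobj tolerates trailing bytes after the end of the stream.
            data = zlib.decompressobj().decompress(blob[offset:])
        except zlib.error:
            continue  # false-positive header
        if data.startswith(b"MZ"):
            yield offset, data


# Synthetic test blob: padding, a compressed fake PE header, more padding.
payload = b"MZ" + b"\x90\x00" * 64 + b"This program cannot be run in DOS mode."
blob = b"\x00" * 37 + zlib.compress(payload) + b"\xff" * 16
for offset, data in find_embedded_pe(blob):
    print(offset, data[:2], len(data))
```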
Embedded Files
This blog is getting pretty long, so let’s execute the embedded files and cross-reference their outbound connections with VirusTotal to identify any leads of malicious activity. Executing these embedded files led to the discovery of the domain “dns3-domain[.]com”.
This domain is associated with numerous malicious files indicating that this malware sample is a trojan whose malicious payload embeds itself in other applications. Based on what we’ve seen here, this makes sense. According to this Microsoft report, its underlying functionality allows the binary to execute remote commands. Neat! The VirusTotal relations table above shows numerous communicating files over the past couple of years. Notably, most of the subdomains have zero detections for malicious activity. While the embedded payloads weren’t the most interesting thing in the world, the reverse engineering journey was worth the ride. The journey, and the skills you learn along the way, are what build your skill set.
To wrap up this blog, I’ll leave the hashes in the IOC section should you, dear reader, choose to recreate and improve upon the analysis in this blog. If you made it this far, thank you for reading!
IOCs
585197476537724e04a6ba334f68458e malware.exe (original file)
dc24611ea25fda4c3f1e6b1cfde2f319 <unicode name>.exe (upx unpacked)
d2f69c1ade1686668ec85ecb19c7384a smss.exe
dns3[-]domain[.]com (https://www.virustotal.com/gui/domain/dns3-domain.com/relations)
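To confirm that a file you downloaded matches one of the IOCs above, hash it and compare against the list. The IOC hashes are 32 hex characters, i.e. MD5, so this minimal sketch uses hashlib.md5; the helper names are illustrative.

```python
import hashlib

# MD5 hashes from the IOC list above.
IOC_MD5 = {
    "585197476537724e04a6ba334f68458e": "malware.exe (original file)",
    "dc24611ea25fda4c3f1e6b1cfde2f319": "<unicode name>.exe (upx unpacked)",
    "d2f69c1ade1686668ec85ecb19c7384a": "smss.exe",
}


def md5_file(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large samples don't need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_ioc(md5_hex):
    """Return the IOC description for a hash, or None if it is not on the list."""
    return IOC_MD5.get(md5_hex.lower())


# Usage (the path is just an example): print(match_ioc(md5_file("malware.exe")))
```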
Beyond The Blog
Arch Cloud Labs is currently revamping its malware analysis collection system. This system ingests malicious samples from malware providers such as VX-Underground, Malshare, and Hybrid Analysis. These projects enable Arch Cloud Labs both to build a system to collect data and to analyze emerging and trending threats. In this age of employment uncertainty, I use these projects to keep my skills sharp across a variety of areas, and I hope these silly blog posts inspire you to do the same. Thank you for reading, and please consider sharing if you found this write-up useful.