Analysis of a LoadLibraryA Stack String Obfuscation Technique with Radare2 & x86dbg
2022-11-14 00:33:36 Author: www.archcloudlabs.com

About the Project

Today, we’re going to analyze a malicious binary recently identified by Arch Cloud Labs’ malware collection system, “Archie”. This binary leverages the LoadLibraryA function to resolve DLLs at run time for additional functionality. Malware samples typically do this to keep the import table sparse, in an attempt to avoid triggering static detection rules or to evade EDR products. This particular sample struck me as interesting because of the stack string obfuscation method it uses, which Ghidra did not disassemble correctly. To grab your interest, the image below compares radare2’s disassembly with Ghidra “breaking” on the same code (the function is not recognized by Ghidra’s auto-analysis).

ghidravsr2.png

This is not a complete analysis of the binary. Rather, it is an isolated look at how the malware author implemented calls to LoadLibraryA, and at why understanding assembly matters when tools break, as shown in the image above.

If you’re following along at home, this binary can be downloaded via Malshare here.

Let’s get started!

Triaging a Binary with Radare2

When looking at a daily dump of malware samples from malshare.com, I typically start by triaging binaries with radare2 alone. It makes it easy to quickly look at functions, dump strings, and disassemble interesting sections before loading everything into Ghidra or IDA. Also, since radare2 is a command line utility, it lets me sort through samples quickly to find one worth spending time on. After all, this is a hobbyist website performing malware analysis as a hobby. How one spends their ever-shrinking free time is just as critical as what one spends it on; more on that in this previous blog post. With that out of the way, let’s open the binary in radare2 via: r2 malware.exe.

Next, we’ll run radare2’s analysis commands to identify functions, xrefs, symbols, etc.

Now that analysis has finished, let’s perform some initial triage such as inspecting file sections.

r2-section-size.png

With commodity malware, this is typically where you’d spot an indicator that UPX was used, such as a section named “UPX0”. While we don’t see any indication of packing, we do see a relatively small .text section and a large resource section. In normal situations, this could just be a small hello-world program with a few PNGs embedded in it, resulting in the section size disparity. However, sneaking a peek at VirusTotal shows that this file is indeed malicious, with 57 out of 75 vendors flagging it. Let’s stop looking at VirusTotal for now and see if we can do some further analysis to identify core functionality.

virustotal.png

Strings & Things

Radare2 can show strings via the iz command. Piping this output to more lets you scroll through large amounts of output like you would with any other file on Linux. The first string we see in the file is a reference to deflate. A quick Google search for the copyright string below takes us to the zlib source code. This string artifact suggests that there is likely data to be decompressed within the binary. Compressed data has higher entropy than typical code or text, and radare2’s p== command prints the entropy of the entire binary as a graph, as seen in the output below. There’s clearly some compressed data in this binary, so let’s keep that in mind as we continue our analysis.

[Strings]
nth  paddr      vaddr      len size section type    string
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0    0x000090b8 0x004090b8 52  53   .rdata  ascii    deflate 1.2.2 Copyright 1995-2004 Jean-loup Gailly 
1    0x00009168 0x00409168 5   6    .rdata  ascii   1.2.2
[0x0040e000]> p==
                                                                              
                                                       █          █           
                                                       █          █           
                                                       █          █           
                                 █                     █          █           
                                 █                     █     █    █           
            █                    ██                    █     █    █          █
            █                    ██                    █     █    █          █
           ██                    ██                    █     █    █          █
           ██                    ██                    █     █    █          █
           ██                    ██                    █     █    █          █
██████████████████████████████████████████████████████████████████████████████
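The p== graph is radare2’s built-in view of this; the underlying idea can be sketched in a few lines of Python. The snippet below is a minimal illustration, not radare2’s implementation: it computes Shannon entropy in bits per byte (0.0 for a single repeated byte, 8.0 for perfectly uniform bytes) over fixed-size blocks. The function names, the block count, and the sample input are illustrative choices.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def block_entropy(data: bytes, blocks: int = 64) -> list[float]:
    """Split data into roughly equal blocks and return each block's entropy,
    similar in spirit to the columns of radare2's p== graph."""
    size = max(1, math.ceil(len(data) / blocks))
    return [shannon_entropy(data[i:i + size]) for i in range(0, len(data), size)]


if __name__ == "__main__":
    # Illustrative input: low-entropy text versus the same text after zlib.
    text = " ".join(str(i) for i in range(2000)).encode()
    packed = zlib.compress(text)
    print(f"plain text: {shannon_entropy(text):.2f} bits/byte")
    print(f"compressed: {shannon_entropy(packed):.2f} bits/byte")
```

Running block_entropy over a whole file and looking for runs of blocks close to 8 bits/byte is a quick way to spot compressed or encrypted regions such as the ones in the graph above.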

Looking At Symbols

Symbols tell us which functions are imported by the binary. These functions are key indicators of what a given malware sample is capable of. Listing symbols with the is command shows a handful of interesting functions imported from KERNEL32.dll, as seen in the radare2 output below:

[0x00408616]> is
[Symbols]

nth paddr       vaddr      bind type size lib          name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
1    0x00009000 0x00409000 NONE FUNC 0    KERNEL32.dll imp.CreateDirectoryA
2    0x00009004 0x00409004 NONE FUNC 0    KERNEL32.dll imp.CloseHandle
3    0x00009008 0x00409008 NONE FUNC 0    KERNEL32.dll imp.WriteFile
4    0x0000900c 0x0040900c NONE FUNC 0    KERNEL32.dll imp.CreateFileA
5    0x00009010 0x00409010 NONE FUNC 0    KERNEL32.dll imp.GetTempPathA
6    0x00009014 0x00409014 NONE FUNC 0    KERNEL32.dll imp.GetModuleFileNameA
7    0x00009018 0x00409018 NONE FUNC 0    KERNEL32.dll imp.ReadFile
8    0x0000901c 0x0040901c NONE FUNC 0    KERNEL32.dll imp.GetFileSize
9    0x00009020 0x00409020 NONE FUNC 0    KERNEL32.dll imp.GetProcAddress
10   0x00009024 0x00409024 NONE FUNC 0    KERNEL32.dll imp.LoadLibraryA
11   0x00009028 0x00409028 NONE FUNC 0    KERNEL32.dll imp.GetModuleHandleA
12   0x0000902c 0x0040902c NONE FUNC 0    KERNEL32.dll imp.GetStartupInfoA

See anything interesting? What hypotheses can we start forming from the imports that DO exist within this binary? Perhaps a temporary directory is created, contents are written to a file, and data is then loaded from that file? Hmm, that resource section was large. Maybe something from there is getting dumped to a file? Who knows! Let’s inspect further.

XRefs w/ Radare2

Now that we have functions we’re interested in, let’s analyze where they get called within the binary and see if anything in the surrounding code blocks reveals further information about this malware’s functionality.

First, let’s jump to Kernel32’s WriteFile to see what is getting written to disk.

BOOL WriteFile(
  [in]                HANDLE       hFile,
  [in]                LPCVOID      lpBuffer,
  [in]                DWORD        nNumberOfBytesToWrite,
  [out, optional]     LPDWORD      lpNumberOfBytesWritten,
  [in, out, optional] LPOVERLAPPED lpOverlapped
);
// https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile

The MSDN documentation shows that the second argument, lpBuffer, is the buffer of data to write to disk. This lets us work backwards to that data and identify what the content will be. The overall analysis flow here is:

  1. Find interesting function.
  2. Find XREF to the function in the code.
  3. Understand parameters to the function, and how parameters are passed to the function (x86 vs x64 calling conventions).

The radare2 output below “seeks” (s) to the WriteFile import (note that you can tab-complete, e.g. sym.imp.<tab>). Next, we list cross-references TO this symbol via the axt command. Here we see there are four different places where WriteFile is called in this application.

[0x00409008]> s sym.imp.KERNEL32.dll_WriteFile 
[0x00409008]> axt
(nofunc) 0x402467 [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
(nofunc) 0x4024ce [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
(nofunc) 0x402a79 [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]
(nofunc) 0x402ac9 [CALL] call dword [sym.imp.KERNEL32.dll_WriteFile]

//radare2 tip
// If you're ever curious about a given radare2 function or command line flag, 
// you can always use ? after a command to get more information.

Now, let’s seek to the first of these addresses and then switch to a visual display mode to see the disassembly.

[0x00409008]> s  0x402467 
[0x00409008]> v!

Neat-o burrito, we’re smack dab in the middle of some disassembly making WriteFile calls. Radare2 is even nice enough to annotate the arguments being passed to these WriteFile calls. As a quick refresher on x86 calling conventions: Win32 API functions such as WriteFile use stdcall, where arguments are pushed right to left, so the first argument is pushed last, and the callee cleans up the stack before returning.

writefiles.png

The argument containing the data to be written is an offset from the ebp register. Jumping there statically shows no useful data, because it references a local variable on the stack that is only filled in at run time. Thus, to identify what’s getting written to disk we’ll have to run this malware sample. However, we’re still in the static analysis portion for this sample, so let’s explore a bit more before we execute this binary in an isolated VM environment.
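To make the stdcall push order concrete, and to show where the lpBuffer pointer will sit once the sample is running, here is a toy Python model. It is an illustration only: the starting esp value and the placeholder strings standing in for argument values are made up, not taken from this binary. At the first instruction of WriteFile, [esp] holds the return address pushed by call, and the arguments follow at esp+4, esp+8, and so on, in declaration order.

```python
# Toy model of a 32-bit x86 stdcall call site. The stack grows downward:
# each push lowers esp by 4 and stores the value at the new esp.
def push(stack: dict, esp: int, value) -> int:
    esp -= 4
    stack[esp] = value
    return esp


def call_stdcall(args: list, esp: int = 0x0019FF00, ret_addr="<return address>"):
    """Push args right to left, then push the return address like `call` does.
    The starting esp is an arbitrary example value."""
    stack = {}
    for value in reversed(args):   # last argument first, first argument last
        esp = push(stack, esp, value)
    esp = push(stack, esp, ret_addr)
    return esp, stack


# Placeholder names for WriteFile(hFile, lpBuffer, nNumberOfBytesToWrite,
# lpNumberOfBytesWritten, lpOverlapped); no values come from the sample.
args = ["hFile", "lpBuffer", "nNumberOfBytesToWrite",
        "lpNumberOfBytesWritten", "lpOverlapped"]
esp, stack = call_stdcall(args)

# On entry to the callee: [esp] is the return address, [esp+4] the first argument.
for offset in range(0, 4 * (len(args) + 1), 4):
    print(f"[esp+{offset:#04x}] = {stack[esp + offset]}")
```

In practice this means that when a debugger stops on the first instruction of WriteFile, the pointer to the data being written is the DWORD at esp+8, and its length is at esp+0xc.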

XRefs to LoadLibraryA

As previously mentioned, the LoadLibrary functions load a DLL at run time. With the data we’ve seen so far, we understand the following:

  1. The binary calls WriteFile from four places.
  2. There are string references to a compression library.
  3. There are high-entropy portions of the binary, indicating compression.
  4. There’s a very large .rsrc section.

At this point it’s possible to start putting together some likely hypotheses for what’s going on. However, the proof is in the pudding, or in this case in the output of actually running the binary. For now, let’s continue our xref analysis with LoadLibraryA.

[0x0040344d]> s sym.imp.KERNEL32.dll_LoadLibraryA 
[0x00409024]> axt
fcn.0040343b 0x40346e [DATA] mov ebx, dword [sym.imp.KERNEL32.dll_LoadLibraryA]

Okay, there is a single reference to LoadLibraryA, inside the function defined at address 0x0040343b. Note that it is a DATA reference rather than a direct call: the import’s address is moved into register ebx. This is important to note. Let’s seek to this function and analyze the disassembly.

[0x00409024]> s fcn.0040343b                                                                   
[0x0040343b]> pdf                                                                              
            ; CALL XREF from main @ +0x406                                                     
┌ 534: fcn.0040343b ();                                                                        
│           ; var int32_t var_144h @ esp+0x90                                                  
│           ; var int32_t var_140h @ esp+0x94                                                  
│           ; var int32_t var_138h @ esp+0x9c                                                                                                                                                 
│           ; var int32_t var_12ch @ esp+0xa8                                                  
│           ; var int32_t var_fch @ esp+0xd8                                                   
│           ; var int32_t var_f4h @ esp+0xe0                                                   
│           ; var int32_t var_ech @ esp+0xe8                                                                                                                                                  
│           ; var int32_t var_e4h @ esp+0xf0                                                   
│           ; var int32_t var_d8h @ esp+0xfc                                                   
│           ; var int32_t var_cch @ esp+0x108                                                  
│           0x0040343b      81ec30010000   sub esp, 0x130                                      
│           0x00403441      53             push ebx                                                                                                                                           
│           0x00403442      55             push ebp                                                                                                                                           
│           0x00403443      56             push esi                                            
│           0x00403444      57             push edi                                                                                                                                           
│           0x00403445      6a00           push 0                                              

......................................abbreviated output ...............................................................

Now here’s where things get interesting! In the disassembly below you’ll see numerous ASCII characters getting pushed onto the stack. Reading upward from address 0x004034a1, this looks a lot like ShellExecuteA, but some characters are missing. If we look closer at 0x00403486, the value at the top of the stack is popped off (pop ebp) into the ebp register. This value is hex 65, which is ASCII “e”. Anywhere we see push ebp, we’re actually pushing the hex value of “e” onto the stack to build a “stack string”.

  0x0040346e      8b1d24904000   mov ebx, dword [sym.imp.KERNEL32.dll_LoadLibraryA] ; [0x409024:4]=0xc7b2 reloc.KERNEL32.dll_LoadLibraryA                                           
  0x00403474      83c434         add esp, 0x34                                                                                                                                      
  0x00403477      8d442414       lea eax, dword [var_12ch]                                                                                                                          
  0x0040347b      50             push eax                                                                                                                                           
  0x0040347c      ffd3           call ebx                                                                                                                                           
  0x0040347e      6a00           push 0                                                                                                                                             
  0x00403480      6a41           push 0x41                   ; 'A' ; 65                                                                                                             
  0x00403482      6a65           push 0x65                   ; 'e' ; 101                                                                                                            
  0x00403484      8bf8           mov edi, eax                                                                                                                                       
  0x00403486      5d             pop ebp                                                                                                                                            
  0x00403487      8d8424800000.  lea eax, dword [var_cch]                                                                                                                           
  0x0040348e      55             push ebp                                                                                                                                           
  0x0040348f      6a74           push 0x74                   ; 't' ; 116                                                                                                            
  0x00403491      6a75           push 0x75                   ; 'u' ; 117                                                                                                            
  0x00403493      6a63           push 0x63                   ; 'c' ; 99                                                                                                             
  0x00403495      55             push ebp                                                                                                                                           
  0x00403496      6a78           push 0x78                   ; 'x' ; 120                                                                                                            
  0x00403498      6a45           push 0x45                   ; 'E' ; 69                                                                                                             
  0x0040349a      6a6c           push 0x6c                   ; 'l' ; 108                                                                                                            
  0x0040349c      6a6c           push 0x6c                   ; 'l' ; 108                                                                                                            
  0x0040349e      55             push ebp                                                                                                                                           
  0x0040349f      6a68           push 0x68                   ; 'h' ; 104                                                                                                            
  0x004034a1      6a53           push 0x53                   ; 'S' ; 83 ; int32_t arg_8h                                                                                            
  0x004034a3      50             push eax                    ; int32_t arg_4h                                                                                                       
  0x004034a4      e8ecdfffff     call fcn.00401495                                                                                                                                  

The output below is the same as above, annotated for ease of reading.

      0x0040346e      8b1d24904000   mov ebx, dword [sym.imp.KERNEL32.dll_LoadLibraryA] ; [0x409024:4]=0xc7b2 reloc.KERNEL32.dll_LoadLibraryA
      0x00403474      83c434         add esp, 0x34
      0x00403477      8d442414       lea eax, dword [var_12ch]
      0x0040347b      50             push eax
      0x0040347c      ffd3           call ebx
      0x0040347e      6a00           push 0
      0x00403480      6a41           push 0x41                 ; 'A' ; 65
      0x00403482      6a65           push 0x65                 ; 'e' ; 101
      0x00403484      8bf8           mov edi, eax
      0x00403486      5d             pop ebp                   ; // put 'e' into EBP
      0x00403487      8d8424800000.  lea eax, dword [var_cch]
      0x0040348e      55             push ebp                  ; e
      0x0040348f      6a74           push 0x74                 ; 't' ; 116
      0x00403491      6a75           push 0x75                 ; 'u' ; 117
      0x00403493      6a63           push 0x63                 ; 'c' ; 99
      0x00403495      55             push ebp                  ; e
      0x00403496      6a78           push 0x78                 ; 'x' ; 120
      0x00403498      6a45           push 0x45                 ; 'E' ; 69
      0x0040349a      6a6c           push 0x6c                 ; 'l' ; 108
      0x0040349c      6a6c           push 0x6c                 ; 'l' ; 108
      0x0040349e      55             push ebp                  ; e
      0x0040349f      6a68           push 0x68                 ; 'h' ; 104
      0x004034a1      6a53           push 0x53                 ; 'S' ; 83 ; int32_t arg_8h
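The push/pop trick can be checked without a debugger. The Python sketch below is a hand-written emulation (not generated by radare2) of only the stack-touching instructions above, from push 0 at 0x0040347e through push 0x53 at 0x004034a1; the final push eax (the destination buffer) and the call are left out. It tracks ebp so the reused “e” is visible, then reads the 4-byte slots from the top of the stack the way the routine at 0x00401495 consumes them: the low byte of each slot, stopping at the zero.

```python
# Stack-touching instructions from 0x0040347e to 0x004034a1, in order.
# An operand of "ebp" means "push the current value of ebp".
INSTRUCTIONS = [
    ("push", 0x00), ("push", 0x41), ("push", 0x65),
    ("pop", "ebp"),                      # 0x00403486: ebp = 0x65 ('e')
    ("push", "ebp"), ("push", 0x74), ("push", 0x75), ("push", 0x63),
    ("push", "ebp"), ("push", 0x78), ("push", 0x45), ("push", 0x6C),
    ("push", 0x6C), ("push", "ebp"), ("push", 0x68), ("push", 0x53),
]


def emulate(instructions):
    """Return the stack as a list of DWORD values; stack[-1] is the top."""
    stack = []
    regs = {"ebp": 0}
    for op, operand in instructions:
        if op == "push":
            stack.append(regs[operand] if isinstance(operand, str) else operand)
        elif op == "pop":
            regs[operand] = stack.pop()
    return stack


def read_stack_string(stack):
    """Walk from the top of the stack toward higher addresses, one DWORD per
    character, and stop at the first zero byte."""
    chars = []
    for slot in reversed(stack):
        byte = slot & 0xFF
        if byte == 0:
            break
        chars.append(chr(byte))
    return "".join(chars)


print(read_stack_string(emulate(INSTRUCTIONS)))  # ShellExecuteA
```

Because pushes lower the stack pointer, the last character pushed ('S') ends up at the lowest address, which is why the string reads correctly from the top of the stack even though the code pushes it backwards.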

Huzzah! We have now discovered how the malware author uses a neat assembly trick to build strings at run time, such as the DLL names LoadLibraryA takes as an argument to load into the malicious process. What about that mov instruction that placed LoadLibraryA into ebx? Keeping the import’s address in ebx lets the code invoke LoadLibraryA repeatedly with a short call ebx. If we continue looking through this code block, we’ll see this stack string trick implemented a few times, each followed by a call to the function at address 0x00401495. The image below shows USER32.DLL being loaded via the call to ebx, but without any stack string obfuscation.

callebx.png

Looking at the disassembly in Ghidra, we see that Ghidra fails to recognize where this function begins.

ghidra_1.png

When we define the function manually, Ghidra’s decompilation also falls short of recognizing the values being passed from the stack strings. To fix this, we’d have to go through and modify the data types Ghidra auto-recognizes.

ghidra_2.png

I think this highlights an important piece of reverse engineering. No tool is perfect, and knowing the limits of each, and how each can fail, will enable you to troubleshoot effectively. You simply cannot always rely on the decompilation to be 100% accurate. Now that we feel warm and fuzzy about the assembly tricks being played here, what is this function at 0x00401495 that keeps getting called? Let’s explore.

Analyzing Unknown Function

Aside from the LoadLibraryA call, the disassembly has numerous calls to a function at address 0x401495. Let’s seek to this unknown function and see what the disassembly looks like.

[0x00401495]> s fcn.00401495
[0x00401495]> pdf
            ; XREFS(21)
┌ 23: fcn.00401495 (int32_t arg_4h, int32_t arg_8h);
│           ; arg int32_t arg_4h @ esp+0x4
│           ; arg int32_t arg_8h @ esp+0x8
│           0x00401495      8b4c2404       mov ecx, dword [arg_4h]   ; ecx = destination buffer
│           0x00401499      8d542408       lea edx, dword [arg_8h]   ; edx = address of the first pushed character
│           ; CODE XREF from fcn.00401495 @ 0x4014a9
│       ┌─> 0x0040149d      8a02           mov al, byte [edx]
│       ╎   0x0040149f      84c0           test al, al
│       ╎   0x004014a1      8801           mov byte [ecx], al
│      ┌──< 0x004014a3      7406           je 0x4014ab
│      │╎   0x004014a5      41             inc ecx
│      │╎   0x004014a6      83c204         add edx, 4
│      │└─< 0x004014a9      ebf2           jmp 0x40149d
│      │    ; CODE XREF from fcn.00401495 @ 0x4014a3
└      └──> 0x004014ab      c3             ret

radare2 can also produce pseudocode from the assembly to help with our analysis. Let’s generate it with pdc and annotate it.

[0x00401495]> pdc
function fcn.00401495 () {
    //  4 basic blocks

    loc_0x401495:

         //XREFS(21)
       ecx = dword [arg_4h]     // destination buffer pointer
       edx = dword [arg_8h]     // really lea: edx = address of arg_8h (first character slot)
   do
   {
        loc_0x40149d:

           //CODE XREF from fcn.00401495 @ 0x4014a9
           al = byte [edx]          // low byte of the current 4-byte stack slot
           var = al & al            // test al, al: sets ZF when al == 0; eax is unchanged
           byte [ecx] = al          // copy the byte into the destination buffer
           if (!var) goto 0x4014ab  // stop once the NUL terminator has been copied
       } while (?);
  return;

    loc_0x4014a5:

       ecx++                    // advance the destination by one byte
       edx += 4                 // advance to the next pushed DWORD
       goto 0x40149d
}

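Alright, this output is somewhat helpful, but not super useful on its own. Reading it alongside the disassembly, the function copies bytes out of the values on the stack: each iteration reads the low byte of a 4-byte stack slot, writes it to the destination buffer, then advances the destination by one byte and the source by four, stopping once a zero byte has been copied. We’ll jump ahead here and step through this function in x86dbg to get a better idea of what’s going on.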
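Because the loop is only a handful of instructions, it can be modelled directly. The sketch below is a Python re-implementation written from the disassembly of fcn.00401495 (it is not decompiler output); build_stack_slots is a hypothetical helper that lays out a string the way the pushes leave it in memory, one little-endian DWORD per character followed by a zero DWORD. "Shell32.dll" is used purely as test input.

```python
import struct


def build_stack_slots(text: str) -> bytes:
    """Hypothetical helper: lay out `text` the way the pushes leave it in
    memory, one little-endian DWORD per character (first character at the
    lowest address), followed by a zero DWORD acting as the terminator."""
    values = [ord(c) for c in text] + [0]
    return struct.pack(f"<{len(values)}I", *values)


def copy_stack_string(slots: bytes) -> bytes:
    """Python model of the loop in fcn.00401495.

    `slots` starts at the first character slot (the address held in edx);
    the return value is what ends up in the destination buffer (ecx).
    """
    dest = bytearray()
    src = 0
    while True:
        al = slots[src]        # mov al, byte [edx]: low byte of the DWORD slot
        dest.append(al)        # mov byte [ecx], al: stored before the branch
        if al == 0:            # test al, al / je 0x4014ab: NUL copied, stop
            return bytes(dest)
        src += 4               # add edx, 4 (inc ecx is implicit in append)


print(copy_stack_string(build_stack_slots("Shell32.dll")))  # b'Shell32.dll\x00'
```

Note that the store happens before the branch (mov does not change the flags set by test), so the terminating zero is copied too and the destination ends up as a proper C string that can be handed to LoadLibraryA.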

The first image below shows that, after stepping through the unknown function a number of times, the string Shell32.dll has been built on the stack.

The second image shows LoadLibraryA being moved into ebx and the stack pointer being adjusted by adding 32 to esp (on x86 the stack grows downward, so adding to esp actually releases the space used by the pushed stack-string arguments), then the address of “Shell32.DLL”, the unknown function’s result, being loaded into eax. Finally, eax, the pointer to Shell32.DLL, is pushed onto the stack before ebx (LoadLibraryA) is called.

loadlibstrings.png loadlibsshell32.png

At this point we’ve successfully reverse engineered the LoadLibraryA functionality within this binary and its associated obfuscation mechanism. Now, let’s continue with dynamic analysis to identify files being written to disk.

Dynamic Analysis - Working Through Decompression

Copying the sample of interest as “malware.exe” to a Windows 10 VM, we see that the resource section did indeed contain a PNG for the executable, shown in the first image below. The launched program can be seen in the second image below.

icon.png

malware_exe_qq_number_tool.png

Launching malware.exe with x86dbg lets an analyst place breakpoints wherever needed. Given our interest in LoadLibraryA, we’ll place a breakpoint on this function. After continuing the binary’s execution, we can see that the stack string trick does indeed resolve to specific DLLs, as previously discussed. The two images below show a couple of these DLLs being loaded.

breakpoint_1.png breakpoint_2.png

Spawning Sysinternals’ Process Explorer and Process Monitor reveals two binaries being written to disk and spawned by this process. Thus our original hypothesis of additional files being written to disk appears to be confirmed.

malware_exe_file_writes.png

Placing a breakpoint on the WriteFile call and referencing the MSDN documentation for its arguments, we can identify the contents being written to disk, as shown in the image below.

writing_file_to_disk_1.png

Now the question is, where did these files come from? Let’s search for compression-related references within the binary. The commands below find the string “unknown compression method” and identify references to it.

[0x0040d1bc]> iz | grep 'compress'
17   0x0000d1bc 0x0040d1bc 26  27   .data   ascii   unknown compression method
[0x0040d1bc]> s str.unknown_compression_method 
[0x0040d1bc]> axt
(nofunc) 0x404a17 [DATA] mov dword [eax + 0x18], str.unknown_compression_method
(nofunc) 0x404aee [DATA] mov dword [ecx + 0x18], str.unknown_compression_method

Let’s seek to these offsets and explore further.

[0x0040d1bc]> s 0x404a17
[0x0040d1bc]> v!

Immediately, more string artifacts indicating some type of compression routine is being used can be identified within the function at offset 0x004048b0.

compression_artifacts.png

Setting a breakpoint at function 0x004048b0 and running until we hit the function’s ret instruction, we see that an MZ header now miraculously appears! A few quick Google searches for the compression string artifacts reveal that the compression library is zlib.

new_mz_header
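Here the payload was recovered dynamically, by breaking on the decompression routine. A complementary static approach, offered as a generic sketch rather than what was done in this analysis, is to scan the file for zlib stream headers and try to inflate from each candidate offset. It assumes the payload is stored as a standard zlib stream (two-byte header as described in RFC 1950); raw deflate data without that header would need zlib.decompressobj(wbits=-15) instead, and data that is additionally encrypted or encoded will not be found this way. The function names and the min_output threshold are illustrative choices.

```python
import zlib


def find_zlib_streams(blob: bytes, min_output: int = 64) -> list[tuple[int, bytes]]:
    """Return (offset, inflated_bytes) for each complete zlib stream in blob."""
    found = []
    off = 0
    while off + 1 < len(blob):
        cmf, flg = blob[off], blob[off + 1]
        # zlib header check (RFC 1950): compression method 8 (deflate),
        # window-size field <= 7, and CMF*256 + FLG divisible by 31.
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and (cmf * 256 + flg) % 31 == 0:
            inflater = zlib.decompressobj()
            try:
                out = inflater.decompress(blob[off:])
            except zlib.error:
                out = None
            # Only accept streams that end cleanly and produce enough output.
            if out is not None and inflater.eof and len(out) >= min_output:
                found.append((off, out))
                off = len(blob) - len(inflater.unused_data)  # skip past the stream
                continue
        off += 1
    return found


def scan_file(path: str) -> None:
    """Print every zlib stream found in the file at `path`."""
    with open(path, "rb") as fh:
        data = fh.read()
    for off, out in find_zlib_streams(data):
        kind = "PE image (MZ)" if out.startswith(b"MZ") else "data"
        print(f"{off:#x}: {len(out)} bytes inflated, looks like {kind}")
```

Requiring the inflater to reach the end of the stream (inflater.eof) filters out most false positives, since random bytes that happen to look like a zlib header rarely decode into a complete, checksummed stream.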

Embedded Files

This blog is getting pretty long, so let’s execute the embedded files and cross-reference their outbound connections with VirusTotal to identify any leads on malicious activity. Executing these embedded files led to the discovery of the domain “dns3-domain[.]com”.

wireshark-dns.png

vt-embeddedfiles.png

This domain is associated with numerous malicious files, indicating that this malware sample is a trojan whose malicious payload embeds itself in other applications. Based on what we’ve seen here, this makes sense. According to this Microsoft report, its underlying functionality allows the binary to execute remote commands. Neat! The VirusTotal relations table above shows numerous communicating files over the past couple of years. Notably, most of the subdomains have zero detections for malicious activity. While the embedded payloads weren’t the most interesting thing in the world, the reverse engineering journey was worth the ride: the skills you pick up along the way are what build your skill set.

To wrap up this blog, I’ll leave the hashes in the IOC section should you, dear reader, choose to recreate and improve upon the analysis in this blog. If you made it this far, thank you for reading!

IOCs

585197476537724e04a6ba334f68458e   malware.exe (original file)
dc24611ea25fda4c3f1e6b1cfde2f319  <unicode name>.exe (upx unpacked)
d2f69c1ade1686668ec85ecb19c7384a  smss.exe
dns3[-]domain[.]com (https://www.virustotal.com/gui/domain/dns3-domain.com/relations)
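The file hashes above are 32 hex characters long, which is consistent with MD5. If you download the sample to follow along, a quick check like the sketch below (standard-library hashlib, reading in chunks so large files are fine) confirms you have the same file before analysing it. The IOC_MD5 table simply copies the hashes listed above; the file path is whatever you saved the sample as.

```python
import hashlib

# MD5 hashes copied from the IOC list above.
IOC_MD5 = {
    "585197476537724e04a6ba334f68458e": "malware.exe (original file)",
    "dc24611ea25fda4c3f1e6b1cfde2f319": "<unicode name>.exe (upx unpacked)",
    "d2f69c1ade1686668ec85ecb19c7384a": "smss.exe",
}


def md5_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks and return the hex MD5 digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_ioc(path: str):
    """Return the IOC label for the file at `path`, or None if it doesn't match."""
    return IOC_MD5.get(md5_file(path))
```

Only hash samples inside the isolated analysis VM; reading the file is harmless, but it keeps the sample off your host in the first place.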

Beyond The Blog

Arch Cloud Labs is currently revamping their malware analysis collection system. This system ingests malicious samples from malware providers such as VX-Underground, Malshare, and hybrid-analysis. These projects enable Arch Cloud Labs to both build a system to collect data and analyze emerging and trending threats. In the age of employment uncertainty, I use these projects to keep my skills sharp across a variety of areas, and I hope these silly blog posts inspire you to do the same. Thank you for reading, and please consider sharing if you found this write-up useful.


Source: https://www.archcloudlabs.com/projects/loadlibrary-analysis/