As we explained in previous parts of our blog series, we originally exploited CVE-2018-8611 without having access to any deep details about how the exploit worked since we only used the Kaspersky blog published in December 2018. However, after we built our exploit and wrote our blog series, we (admittedly very late) became aware of the BlueHat 2019 Shanghai presentation delivered by Kaspersky in May 2019. This talk detailed more about how the in-the-wild exploit actually worked.
In the end, we found this presentation extremely interesting, because it shows where our methodology deviated from that of the original exploit authors. We are also happy we found it much later, because it led us to come up with tricks that are still useful in spite of other situationally more powerful approaches used by the original 0day exploit.
For someone new to the joys of exploit development, seeing a comparative analysis of approaches hopefully elucidates how this type of development process progresses. It is often driven by intuition, rather than by a standard approach to all vulnerability exploitation problems.
We now endeavor to review and analyze Kaspersky’s explanations in light of our newfound understanding, and provide insights and corrections where applicable. We will also analyze the PreviousMode
overwrite approach used by the 0day exploit. We will see why it is a very powerful trick on 64-bit and why it does not work on 32-bit hosts, and therefore why the read/write primitives we discovered and described in part 4 of this series come to the rescue.
Before diving into Kaspersky’s details, we will detail how to safely detect if the vulnerability is present with zero risk of crashing the machine. Due to the way Microsoft chose to patch this vulnerability it is possible to do so. We do this by triggering code in the TmRecoverResourceManager()
loop but without triggering any race condition. Instead of trying to free an enlistment, we simply put the transaction manager into the offline state.
We set the transaction manager offline by closing the only handle we have opened to it from userland using CloseHandle()
, which triggers the TmpCloseTransactionManager()
function in the kernel:
void __fastcall TmpCloseTransactionManager(__int64 a1, _KTM *pKTM, __int64 a3, __int64 a4)
{
void *hRMHandle; // rsi MAPDST
struct _KRESOURCEMANAGER *pRM; // rax MAPDST
if ( a4 == 1 )
{
hRMHandle = 0i64;
pRM = 0i64;
TmpTmOffline(pKTM);
[...]
}
What is important for us is that TmpTmOffline()
sets the _KTM.State
to KKtmOffline
:
__int64 __fastcall TmpTmOffline(_KTM *pKTM)
{
...
pKTM->State = KKtmOffline;
Let’s recall that the patch (among other changes) added the following code in the TmRecoverResourceManager()
loop:
Tm_ = pResMgr->Tm;
if ( !Tm_ || Tm_->State != KKtmOnline )
{
ret = STATUS_TRANSACTIONMANAGER_NOT_ONLINE;
goto b_release_mutex;
}
pEnlistment_shifted = EnlistmentHead_addr->Flink;
After sending a notification, if TmRecoverResourceManager()
detects the transaction manager has gone offline it will exit early with the STATUS_TRANSACTIONMANAGER_NOT_ONLINE
error. In a vulnerable system the function will continue and RecoverResourceManager()
will eventually exit without error.
This is significantly easier to trigger than the actual race condition vulnerability, as we only need to suspend the recovery thread at any point in the TmRecoverResourceManager()
loop when it is handling one of what can be many thousands of enlistments. We do not care which enlistment it is touching. We also do not care where the recovery thread is suspended in the loop as we are not interested in winning any race condition. We only care whether or not the particular STATUS_TRANSACTIONMANAGER_NOT_ONLINE
error code is returned from calling NtRecoverResourceManager()
:
enum MACRO_STATUS
{
...
STATUS_TRANSACTIONMANAGER_NOT_ONLINE = 0xC0190052,
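The decision the detection makes can be sketched as a small classifier over the returned status. This is a minimal sketch, not our actual detection code: the `ktm_probe_result` type and the `classify_recovery_status()` name are hypothetical, and the steps that suspend the recovery thread and close the transaction manager handle are assumed to have already happened.

```c
#include <stdint.h>

/* The first value is quoted from the enum above; STATUS_SUCCESS is the standard 0. */
#define STATUS_SUCCESS                        0x00000000u
#define STATUS_TRANSACTIONMANAGER_NOT_ONLINE  0xC0190052u

/* Hypothetical result type for this sketch. */
typedef enum {
    KTM_PATCHED,      /* the early exit added by the patch was taken */
    KTM_VULNERABLE,   /* recovery completed as if the TM were still online */
    KTM_INCONCLUSIVE  /* some unrelated failure: draw no conclusion */
} ktm_probe_result;

/* Classify the NTSTATUS returned by NtRecoverResourceManager() after the
 * transaction manager was taken offline while the recovery thread sat
 * somewhere inside the TmRecoverResourceManager() loop. */
ktm_probe_result classify_recovery_status(uint32_t status)
{
    if (status == STATUS_TRANSACTIONMANAGER_NOT_ONLINE)
        return KTM_PATCHED;
    if ((int32_t)status >= 0)   /* same test as the NT_SUCCESS() macro */
        return KTM_VULNERABLE;
    return KTM_INCONCLUSIVE;
}
```

Treating every other failure code as inconclusive is a deliberate choice in this sketch: only the specific error code and a clean exit are described above, so nothing else is interpreted.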
From a defense perspective this type of detection is useful for testing. Conversely, from an attack perspective, you could abuse this detection mechanism before downloading a full-blown exploit that you do not want detected, or to confirm that a known vulnerability is exploitable rather than burning another 0day.
The BlueHat presentation was written by Boris Larin and Anton Ivanov from Kaspersky. They discuss CVE-2018-8611 as the third case study of in-the-wild 0day exploits they discovered. Their analysis is covered on pages 50-73 of the PDF.
Page 50 notes the in-the-wild exploit worked on Windows 7 through Windows 10 1803. They don’t mention whether this applies to both 64-bit and 32-bit; however, the examples are 64-bit, and based on our subsequent analysis we suspect that this version of the exploit does not support 32-bit at all.
That said, through our own research we have independently confirmed that all vulnerable versions of Windows from Vista through Windows 10 1809 on x86 and x64 are exploitable for the purposes of elevating privileges.
If you have read the previous parts of this blog series, then the missing KTM and triggering details from slide 50 to 56 should be fairly clear. Note that, as discussed in part 3, it is likely the in-the-wild exploit approach to detecting the race condition was different from ours. We unfortunately are not able to tell for sure from the slides.
On pages 56-57, Kaspersky discuss the race condition vulnerability. The description of the actual vulnerability is, in our opinion, slightly incorrect. Detection of the race condition could not be done by a successful execution of NtRecoverResourceManager()
since we need the recovery thread to be stuck in the TmRecoverResourceManager()
loop to trigger the write primitives. This is clearly used by the in-the-wild exploit based on slides 59+. Successful execution of NtRecoverResourceManager()
simply indicates that exploitation of the bug has completed. The vulnerability also does not technically have anything to do with a resource manager going offline, but rather with an enlistment becoming finalized. As we demonstrated in our earlier posts, it is possible to finalize an enlistment by committing it after it is in the necessary state.
On page 58, they note the patch’s changes. The second bullet point states that a check was added to see if the resource manager is online. In our patch analysis in part 2, we showed that this check corresponds to the transaction manager, and not the resource manager. We also believe that it is not directly indicative of the vulnerability itself, as the main culprit is the KENLISTMENT_FINALIZED
flag changes. The check for the transaction manager being offline is likely just an optimization change, which allows early exit from the recovery loop. (Of course, it is also possible that it fixes the vulnerability in a way we don’t understand.) As discussed earlier in this post, we know that this new check for the transaction manager being offline allows us to benignly detect a vulnerable system prior to an exploitation attempt.
On page 60, Kaspersky note that there are a limited number of abusable functions inside the TmRecoverResourceManager()
loop. This matches our own analysis. Now things get quite interesting, because the approach of the in-the-wild exploit deviates from our approach!
Next, they discuss how the 0day exploit apparently triggers a 0 overwrite primitive (instead of an increment primitive that we used) and targets the _KTHREAD
PreviousMode
field to build a powerful arbitrary kernel read/write primitive. This is particularly interesting so we will explain this a bit more in the next section.
From slides 60 to 70, they explain that the 0day exploit crafts a fake userland dispatcher object of type EventNotificationObject
that goes into a wait state when KeWaitForSingleObject()
is called. It allows them to modify the object before the following KeReleaseMutex()
call. The modification results in some code at the end of KiTryUnwaitThread()
being reached. This code triggers a 0
value overwrite (32-bit size on 32-bit and 64-bit size on 64-bit):
_InterlockedAnd64(&OwnerThread->ThreadLock, 0i64);
++WaitBlock->BlockState;
return result;
}
On page 65, Kaspersky explain that introducing a fake userland EventNotificationObject
causes the TmRecoverResourceManager()
thread to become stuck in a wait state. While the thread is waiting, they modify the dispatcher object to become a SemaphoreObject
.
The presentation does not mention how to detect from userland when the thread is blocked, but it appears the exploit uses a similar trick to our own check for the change in the KENLISTMENT_IS_NOTIFIABLE
flags. It is also possible that they check from userland for the modification of a lock value in the dispatcher object, or another value in some other object.
On page 67, they note that GetThreadContext()
is used to wake up the blocked thread, which likely explains their use of the EventNotificationObject
prior to modifying the dispatcher header.
As detailed in part 3 of our series, it is interesting to note that due to the absence of SMAP, the 0day exploit also detects the race win and coaxes the vulnerable function to start parsing controlled structures in userland. This is quite similar to our own approach, even if we used a different write primitive.
On page 71, Kaspersky finish explaining how the 0
value overwrite is used to overwrite the PreviousMode
in the leaked _KTHREAD
structure. This trick has some subtleties and quirks.
Instead of purposefully hitting the logic for OwnerThread->State == Suspended
in KiTryUnwaitThread()
and entering the corresponding code like we did, they exit the function immediately. The trick here is that they can point the controlled _KMUTANT.OwnerThread
pointer anywhere. They choose to do so in a way that allows them to overlap the OwnerThread.ThreadLock
field with the PreviousMode
field of the recovery thread associated _KTHREAD
structure, with the ultimate goal of setting PreviousMode
to 0
.
The code they abuse is here:
char __fastcall KiTryUnwaitThread(struct _KPRCB *CurrentPrcb, PKWAIT_BLOCK WaitBlock, PVOID WaitStatus, _QWORD *pOutputVar)
{
//...
OwnerThread = WaitBlock->Thread;
result = 0;
[1] if ( _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64) )
{
i = 0;
do
{
if ( ++i & HvlLongSpinCountMask || !(HvlEnlightenments & 0x40) )
_mm_pause();
else
HvlNotifyLongSpinWait(i);
}
while ( OwnerThread->ThreadLock || _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64) );
}
[2] if ( OwnerThread->State == Suspended )
{
//...
_InterlockedAdd(&ThreadQueue->CurrentCount, 1u);// our increment primitive
}
[3] _InterlockedAnd64(&OwnerThread->ThreadLock, 0i64); // their zero value overwrite primitive
++WaitBlock->BlockState;
return result;
}
There are three important points that happen in this order in the code: the initial locking of the ThreadLock
at [1]
, the State
test at [2]
, and the unlocking of the ThreadLock
at [3]
. We will analyze these slightly out of order.
The 0 value overwrite approach we are about to describe in more detail for both 32-bit and 64-bit doesn’t work on 64-bit Vista because of some code differences.
There are other 0 value overwrite primitives that appear to exist in the code. It is unclear to us if the author of the 0day exploit actually used such different 0 value overwrite primitives to support Vista 64-bit.
From our perspective, we didn’t investigate them because we already had our increment primitive. It was easier for us to simply wrap the PreviousMode
value to 0 using our increment primitive, to enable the more powerful and convenient arbitrary read/write primitive based on PreviousMode
being 0
.
First, we discuss the State
test at [2]
. Since they are providing an OwnerThread
pointer that overlaps with an existing _KTHREAD
structure in the kernel, they run the risk of the _KTHREAD.State
byte field holding a value of 5
. This corresponds to the Suspended
value:
.text:0000000140032E9E movzx eax, byte ptr [rbx+164h] ; _KTHREAD.State
.text:0000000140032EA5 cmp al, 5 ; Suspended
This could result in entering the if
condition at [2]
that they don’t want to go into. In practice their OwnerThread
pointer appears to overlap with a pointer in the target _KTHREAD
, so the chances are reasonably low this would happen. We’ve successfully tested writing PreviousMode
with this method on Windows 7 through Windows 10 1809 with no issues.
We know from the Kaspersky analysis that they are pointing the fake thread to a legitimate _KTHREAD
base address and adding 0x1eb
(more on why later). We see from the assembly above that the State
field is being pulled from offset 0x164
. This tells us that the State
variable will be tested from a 0x1eb + 0x164 = 0x34f
offset from the _KTHREAD
base. On Windows 10 1809 x64 this falls into an array of _KLOCK_ENTRY
structures.
struct _KLOCK_ENTRY LockEntries[6]; //0x320
struct _SINGLE_LIST_ENTRY PropagateBoostsEntry; //0x560
Since we know this array starts at 0x320
, and the above State
test offset is 0x34f
, we check what is at offset 0x2f
inside of the _KLOCK_ENTRY
structure.
union
{
struct _KLOCK_ENTRY_LOCK_STATE LockState; //0x20
VOID* volatile LockUnsafe; //0x20
struct
{
volatile UCHAR CrossThreadReleasableAndBusyByte; //0x20
UCHAR Reserved[6]; //0x21
volatile UCHAR InTreeByte; //0x27
union
{
VOID* SessionState; //0x28
struct
{
ULONG SessionId; //0x28
ULONG SessionPad; //0x2c
};
};
};
};
This is a union with quite a few fields, so it is hard to tell at a glance whether the byte at that offset could hold the unwanted value 5. Of course what it overlaps can also differ across Windows versions, so there is still the chance this type of blind overlapping could cause problems on some systems. In practice we’ve never seen the State == 5
case actually get hit during exploitation.
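The offset arithmetic for the State test can be checked mechanically. The sketch below uses only the offsets quoted in the text; the helper names are hypothetical, and the entry size is derived on the assumption that the six `_KLOCK_ENTRY` elements fill the space up to `PropagateBoostsEntry` with no padding.

```c
#include <stdint.h>

/* Offsets quoted in the text; nothing here is read from a live system. */
enum {
    PROBE_FAKE_OWNER_DELTA   = 0x1eb, /* added to the real _KTHREAD base */
    PROBE_STATE_FIELD_OFFSET = 0x164, /* _KTHREAD.State as read in the disassembly */
    PROBE_LOCK_ENTRIES_START = 0x320, /* _KTHREAD.LockEntries[6] */
    PROBE_LOCK_ENTRIES_END   = 0x560, /* next field: PropagateBoostsEntry */
    PROBE_LOCK_ENTRY_COUNT   = 6
};

typedef struct {
    uint32_t index;  /* which _KLOCK_ENTRY in the array */
    uint32_t offset; /* byte offset inside that entry */
} lock_entry_pos;

/* Offset from the real _KTHREAD base of the byte tested as State. */
uint32_t state_probe_offset(void)
{
    return PROBE_FAKE_OWNER_DELTA + PROBE_STATE_FIELD_OFFSET;
}

/* Map an offset inside LockEntries to (entry index, offset within entry). */
lock_entry_pos locate_in_lock_entries(uint32_t kthread_offset)
{
    uint32_t entry_size =
        (PROBE_LOCK_ENTRIES_END - PROBE_LOCK_ENTRIES_START) / PROBE_LOCK_ENTRY_COUNT;
    uint32_t rel = kthread_offset - PROBE_LOCK_ENTRIES_START;
    lock_entry_pos pos = { rel / entry_size, rel % entry_size };
    return pos;
}
```

With the listing above, offset 0x2f inside the entry is the last byte of `SessionPad` (0x2c through 0x2f), which is also the most significant byte of the overlapping `SessionState` pointer (0x28 through 0x2f).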
Next, let’s discuss the actual locking at [1]
. Wherever _KTHREAD.ThreadLock
happens to lie in memory will be locked using _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64)
. The underlying assembly for this particular locking macro is very important for this trick to work. The test on Windows 7 x64 looks like the following:
.text:0000000140032E88 lock bts qword ptr [rbx+40h], 0 ; _KTHREAD.ThreadLock
This bts
instruction tests and sets only the 0th
bit (the second operand) of the qword (the first operand); the other 63 bits are neither tested nor modified.
Had the instruction at [1]
been a larger bit-field width comparison, it could have included the PreviousMode
field that they are attempting to overwrite at [3]
. If that were the case, the lock at [1]
would never have been obtained, so the 0
overwrite at [3]
would never be reached
.
As Kaspersky allude to but do not actually explain, this lower bit at rbx+40h
that must be 0 is why the exploit used the target _KTHREAD
address, added to the weird offset 0x1eb
as the address for OwnerThread
. Let’s try to understand why.
Based on the offset they used being 0x1eb
, we use Windows 10 1809 offsets as an example. In _KTHREAD
, the PreviousMode
field is at 0x232
. Remember, 0x40
is being added to the OwnerThread
pointer in order to lock OwnerThread
. This gives us 0x1eb + 0x40 = 0x22b
. 0x22b
corresponds to one of the bytes of the UserAffinityFill
array, which is immediately before the PreviousMode
field they want to overwrite:
//0x5f0 bytes (sizeof)
struct _KTHREAD
{
...
union
{
struct _GROUP_AFFINITY UserAffinity; //0x228
struct
{
UCHAR UserAffinityFill[10]; //0x228
CHAR PreviousMode; //0x232
CHAR BasePriority; //0x233
union
{
CHAR PriorityDecrement; //0x234
struct
{
UCHAR ForegroundBoost:4; //0x234
UCHAR UnusualBoost:4; //0x234
};
};
UCHAR Preempted; //0x235
UCHAR AdjustReason; //0x236
CHAR AdjustIncrement; //0x237
};
This is presumably done because this particular offset in UserAffinityFill
is always 0, and this allows them to obtain the lock at [1]
.
Now let’s discuss the unlocking instruction at [3]
triggering the 0 value overwrite: _InterlockedAnd64(&OwnerThread->ThreadLock, 0i64)
. The corresponding assembly for this function is interesting:
.text:0000000140032F18 lock and qword ptr [rbx+40h], 0 ; _KTHREAD.ThreadLock
Because it unlocks the lock, it does not operate on a single bit, but rather sets the entire 64-bit value (qword
) to 0
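. In this case, the eight bytes starting at 0x22b
, that is offsets 0x22b through 0x22b+0x7=0x232
inclusive, will be zeroed. The last of these bytes is PreviousMode
; the adjacent BasePriority
byte at 0x233 lies just outside the write and is left untouched.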
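The lock and unlock steps can be simulated on a plain byte buffer standing in for the target `_KTHREAD`. This is only a model under stated assumptions: it uses the Windows 10 1809 x64 offsets quoted above, it ignores the State test at [2] and all atomicity, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Offsets quoted in the text (Windows 10 1809 x64). */
enum {
    SIM_THREADLOCK_OFFSET   = 0x40,  /* _KTHREAD.ThreadLock */
    SIM_FAKE_OWNER_DELTA    = 0x1eb, /* delta added to the real _KTHREAD base */
    SIM_PREVIOUSMODE_OFFSET = 0x232,
    SIM_BASEPRIORITY_OFFSET = 0x233
};

/* Model of `lock bts qword ptr [p], 0`: returns the old bit 0, then sets it.
 * On little-endian x64, bit 0 of the qword is bit 0 of its first byte. */
bool bts_bit0(uint8_t *p)
{
    bool was_set = (p[0] & 1) != 0;
    p[0] |= 1;
    return was_set;
}

/* Model of `lock and qword ptr [p], 0`: clears all eight bytes. */
void and_qword_zero(uint8_t *p)
{
    memset(p, 0, 8);
}

/* Run [1] then [3] against the fake OwnerThread. Returns false when the
 * lock bit was already set, i.e. the real code would spin at [1]. */
bool lock_then_unlock(uint8_t *kthread)
{
    uint8_t *lock = kthread + SIM_FAKE_OWNER_DELTA + SIM_THREADLOCK_OFFSET; /* base + 0x22b */
    if (bts_bit0(lock))
        return false;
    and_qword_zero(lock);
    return true;
}
```

Starting from a buffer with `PreviousMode` set to 1, the call clears the byte at 0x232 while the byte at 0x233 keeps its value, because the eight-byte write ends at 0x232.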
From this point, the chain of functions is exited without many additional constraints (compared to our increment primitive), and we assume the TmRecoverResourceManager()
loop is broken out of the same way we broke out of it by abusing the leaked _KRESOURCEMANAGER
address.
On pages 72-73, Kaspersky explain that the 0day exploit abused PreviousMode
being 0
to read and write arbitrary kernel memory, as most sanity checks in kernel space are predicated on PreviousMode
being set to 1
for userland processes.
This is very interesting, because the PreviousMode
persisting at 0
after an overwrite directly contradicts the official documentation from Microsoft which states that the trap handler sets it for every system call when it originates from userland. This is quoted in the Kaspersky presentation.
Now, we will look at why the general 0
value overwrite trick abused in KiTryUnwaitThread()
does not actually work on all 32-bit versions. Then, we will return to the abuse of PreviousMode
and show it only works easily on 64-bit, and how that contrasts to 32-bit where it actually works the way the Microsoft documentation suggests.
Our exploit uses the increment primitive to exploit 32-bit systems, so we knew we could exploit the vulnerability to elevate privileges on 32-bit.
However, we were curious if the same 0
value overwrite primitive used by the in-the-wild exploit would also work on 32-bit. We discovered that it does not, so let’s take a look at why. The code examples are from Windows 7 32-bit.
We already know we can enter the KiTryUnwaitThread()
function, as this is what our exploit did, so let’s start there:
char KiTryUnwaitThread(_KWAIT_BLOCK *WaitBlock, struct _KPRCB *a2, int a3, _KTHREAD **a4)
{
//...
i = 0;
OwnerThread = WaitBlock->Thread;
v12 = 0;
OwnerThreadLock = (volatile signed __int32 *)&OwnerThread->ThreadLock;
while ( _InterlockedExchange(OwnerThreadLock, 1) )
{
do
{
if ( ++i & HvlLongSpinCountMask || !(HvlEnlightenments & 0x40) )
_mm_pause();
else
HvlNotifyLongSpinWait(i);
}
while ( *OwnerThreadLock );
}
if ( OwnerThread->State == Suspended )
{
//...
The first thing that stands out is that on 64-bit the while
loop was bounded by the macro _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64)
call, whereas here on 32-bit the call is _InterlockedExchange(OwnerThreadLock, 1)
. The corresponding assembly is:
.text:00478F4A xor eax, eax ; eax = 0
.text:00478F4C mov ecx, ebx ; _KTHREAD.ThreadLock
.text:00478F4E inc eax ; eax = 1
.text:00478F4F xchg eax, [ecx] ; *_KTHREAD.ThreadLock = 1, eax = old ThreadLock
.text:00478F51 test eax, eax ; old value == 0?
.text:00478F53 jnz short b_wait_loop
Above, the value 1
is written to the _KTHREAD.ThreadLock
and the old value is tested to see if it was non-zero. If the old value is non-zero, then the while
loop is entered to wait on the lock being available. The fact that the test is using the entire 32-bit value means that we are unable to use the unaligned lock pointer trick to overwrite PreviousMode
. This is because if there is a non-zero value in the lock that we wish to overwrite, we are prevented from locking it in the first place!
Had the macro been different, then the 0 value overwrite primitive could have been used the same way it was on 64-bit.
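The difference between the two acquisition styles can be shown on a single 32-bit value. In this sketch the lock dword is assumed to be positioned so that its most significant byte overlaps a `PreviousMode` of 1, by analogy with the 64-bit layout; that positioning and the function names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Windows 7 x86 style: exchange the whole dword with 1 and inspect the old
 * value. Any nonzero byte in the dword sends the caller into the wait loop. */
bool xchg_lock_acquired(uint32_t *lock)
{
    uint32_t old = *lock;
    *lock = 1;
    return old == 0;
}

/* Bit-test-and-set style: only bit 0 decides whether the lock is taken. */
bool bts_lock_acquired(uint32_t *lock)
{
    uint32_t old = *lock;
    *lock = old | 1u;
    return (old & 1u) == 0;
}
```

For a dword holding 0x01000000, the exchange-based check reports the lock as busy, while the bit-test variant acquires it because bit 0 is clear.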
This then raises the question: is the macro always different on x86? After checking some x86 Windows versions, it appears Microsoft started using _interlockedbittestandset()
on Windows 8 and above. We checked Windows 8 and Windows 10 1809 x86. So this likely means that at least on Windows 8 and later 32-bit systems, it is possible to abuse an analogous 0 value overwrite primitive.
Even on systems like Windows 7 x86 where we don’t have the 0 value overwrite primitive, since we know the increment primitive works, we still have the opportunity to modify PreviousMode
by incrementing the value 255 times and wrapping it to 0 to achieve the same effect.
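The wraparound count is simple modular arithmetic. The sketch below treats `PreviousMode` as an isolated 8-bit field, which is a simplification: a wider add would carry into the neighbouring byte, and that is not modelled here. The helper names are hypothetical.

```c
#include <stdint.h>

/* Number of +1 steps needed to bring an 8-bit field to 0 by wraparound. */
unsigned increments_to_zero(uint8_t value)
{
    return (256u - value) & 0xffu;
}

/* Apply `count` single increments to an 8-bit field, wrapping at 256. */
uint8_t apply_increments(uint8_t value, unsigned count)
{
    while (count--)
        value = (uint8_t)(value + 1);
    return value;
}
```

Starting from `UserMode` (1), `increments_to_zero()` gives the 255 steps mentioned above.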
In theory, the ability to write 0 to PreviousMode
on 32-bit should give us an analogously powerful kernel read/write primitive on 32-bit. In practice this does not appear to be the case. We will describe why as we delve into PreviousMode
further.
As discussed earlier, the in-the-wild exploit leveraged a powerful technique to achieve an arbitrary read/write primitive, which is to set the PreviousMode
field of the _KTHREAD
to 0, which corresponds to the KernelMode
value from an enum like this:
typedef enum _MODE {
KernelMode = 0,
UserMode,
} MODE;
PreviousMode
indicates whether a syscall was invoked from user mode or from the kernel, which we will look at in more detail shortly. If this value is set to 0
, then functions like NtReadVirtualMemory()
or NtWriteVirtualMemory()
can be abused to read or write to kernel memory, as address validation checks are skipped:
__int64 __fastcall NtWriteVirtualMemory(HANDLE ProcessHandle, PVOID BaseAddress, PVOID Buffer, __int64 BufferSize, __int64 *NumberOfBytesWritten)
{
pCurrentThread = KeGetCurrentThread();
PreviousMode = pCurrentThread->PreviousMode;
// Check only if called from UserMode
if ( PreviousMode )
{
EndAddress = BaseAddress + BufferSize;
if ( BaseAddress + BufferSize < BaseAddress )
return STATUS_ACCESS_VIOLATION;
BufferEnd = Buffer + BufferSize;
if ( BufferEnd < Buffer || EndAddress > MmHighestUserAddress || BufferEnd > MmHighestUserAddress )
return STATUS_ACCESS_VIOLATION;
if ( NumberOfBytesWritten )
{
NumberOfBytesWritten_ = NumberOfBytesWritten;
if ( NumberOfBytesWritten >= MmUserProbeAddress )
NumberOfBytesWritten_ = MmUserProbeAddress;
*NumberOfBytesWritten_ = *NumberOfBytesWritten_;
}
}
...
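The effect of the check can be modelled in a few lines. This is a simplified sketch of the gating logic only, not the kernel's code: `range_check_passes()` is a hypothetical name, and `HIGHEST_USER_ADDRESS` is an assumed stand-in for `MmHighestUserAddress` on x64.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { KernelMode = 0, UserMode = 1 } MODE;

/* Assumed stand-in for MmHighestUserAddress on x64. */
#define HIGHEST_USER_ADDRESS 0x00007FFFFFFEFFFFull

/* Returns true when a [address, address + size) range would be accepted. */
bool range_check_passes(MODE previous_mode, uint64_t address, uint64_t size)
{
    uint64_t end;

    if (previous_mode == KernelMode)
        return true;                 /* validation is skipped entirely */

    end = address + size;
    if (end < address)
        return false;                /* arithmetic wrapped around */
    return end <= HIGHEST_USER_ADDRESS;
}
```

A kernel address is rejected when the thread's mode is `UserMode` and accepted unchanged when it is `KernelMode`, which is the whole basis of the read/write primitive.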
The earliest reference we could find to the PreviousMode
field being used as an exploit target was from Tarjei Mandt at Infiltrate 2011 in his Modern Kernel Pool Exploitation: Attacks and Techniques talk (see slide 124).
There have been other interesting ideas around abusing PreviousMode
logic. In March 2019, James Forshaw discussed similar ideas about finding code paths that make incorrect assumptions about how calls were made.
To us, the fact that overwriting PreviousMode
with 0
actually persisted across syscalls was unexpected, as the official Microsoft documentation states the exact opposite. The relevant excerpt is:
When a user-mode application calls the Nt or Zw version of a native system
services routine, the system call mechanism traps the calling thread to kernel
mode. To indicate that the parameter values originated in user mode, the trap
handler for the system call sets the PreviousMode field in the thread object of
the caller to UserMode.
If "the trap handler for the syscall sets the PreviousMode
field in the thread object of the caller to UserMode", then why is it that modifying PreviousMode
from inside one syscall leads to subsequent syscalls being able to abuse PreviousMode
? Shouldn’t it be reset on the entry of the next syscall? We decided to analyse why and found some interesting results.
We will start by looking at Windows 7 x64.
We identify which function is used as the entry point for syscalls by checking the MSR_LSTAR
value in WinDbg:
0: kd> rdmsr 0xC0000082
msr[c0000082] = fffff800`029d4bc0
0: kd> u fffff800`029d4bc0
nt!KiSystemCall64Shadow:
fffff800`029d4bc0 0f01f8 swapgs
...
So we started reverse engineering KiSystemCall64Shadow
and different labels/functions it calls into. Note that we generally look at the assembly instead of the decompiled code: this kind of function was written in assembly in the first place, so it is easier to read without Hex-Rays.
One interesting thing we noted is that at no time does it actually set PreviousMode
on entry. There is, however, a check in KiSystemServiceExit
of the saved CS
register that dictates if it should restore a saved PreviousMode
value on exit from one of the functions:
.text:00000001400A19DB KiSystemServiceExit: ; CODE XREF: KiSystemCall64+6B0↓j
.text:00000001400A19DB ; KiSystemCall64+6BB↓j
.text:00000001400A19DB ; DATA XREF: KiCallUserMode+25A↑o
.text:00000001400A19DB ; KiSystemServiceHandler+48↑o
.text:00000001400A19DB mov rbx, [rbp+0C0h]
.text:00000001400A19E2 mov rdi, [rbp+0C8h]
.text:00000001400A19E9 mov rsi, [rbp+0D0h]
.text:00000001400A19F0 mov r11, gs:188h
.text:00000001400A19F9 test byte ptr [rbp+0F0h], 1 ; Test if bit 0 (the lowest bit) of _KTRAP_FRAME.SegCs is set
.text:00000001400A1A00 jz b_swap_previousmode_ret
...
.text:00000001400A1BA3 swapgs
.text:00000001400A1BA6 sysret ; return back to userland
...
.text:00000001400A1BA9 b_swap_previousmode_ret: ; CODE XREF: KiSystemCall64+480↑j
.text:00000001400A1BA9 mov rdx, [rbp+0B8h]
.text:00000001400A1BB0 mov [r11+1D8h], rdx
.text:00000001400A1BB7 mov dl, [rbp-58h] ; _KTRAP_FRAME.PreviousMode
.text:00000001400A1BBA mov [r11+1F6h], dl ; _KTHREAD.PreviousMode
.text:00000001400A1BC1 cli
.text:00000001400A1BC2 mov rsp, rbp
.text:00000001400A1BC5 mov rbp, [rbp+0D8h]
.text:00000001400A1BCC mov rsp, qword ptr [rsp+88h+anonymous_36]
.text:00000001400A1BD4 sti
.text:00000001400A1BD5 retn ; return back to kernel caller
On all of the versions of Windows we checked, for both 32-bit and 64-bit, the trap handler will use the lower bit of the saved CS
selector (from KTRAP_FRAME
SegCs
) as a means of indicating if a caller into a trap was from userland or from the kernel. As seen above, only if the saved CS
indicates a kernel caller will it reuse a previously saved PreviousMode
. If the caller is from userland, this code will never be executed due to the sysret
instruction making it return to userland.
If you are familiar with the Windows kernel’s Nt
and Zw
function prefix semantics, this should make sense to you. The difference above is hinted at by the use of the retn
instruction when the caller is from the kernel, rather than sysret
. The retn
instruction implies that the return will not transition between privilege modes, but rather return to some other kernel function. This reflects the case where a kernel function calls a syscall’s Zw
wrapper function.
The Zw
wrappers all jump into KiServiceInternal
:
KiServiceInternal
saves the old PreviousMode
and sets the new one to KernelMode
. This allows the kernel calling into syscalls to avoid expensive security checks enforced against userland:
.text:00000001400A1500 KiServiceInternal proc near
.text:00000001400A1500
.text:00000001400A1500 sub rsp, 8
.text:00000001400A1504 push rbp
.text:00000001400A1505 sub rsp, 158h ; _KTRAP_FRAME
.text:00000001400A150C lea rbp, [rsp+80h] ; Offset into _KTRAP_FRAME
.text:00000001400A1514 mov [rbp+0E8h+var_28], rbx
.text:00000001400A151B mov [rbp+0E8h+var_20], rdi
.text:00000001400A1522 mov [rbp+0E8h+var_18], rsi
.text:00000001400A1529 sti
.text:00000001400A152A mov rbx, gs:188h
.text:00000001400A1533 prefetchw byte ptr [rbx+1D8h]
.text:00000001400A153A movzx edi, byte ptr [rbx+1F6h] ; Fetch old _KTHREAD.PreviousMode value
.text:00000001400A1541 mov [rbp-58h], dil ; Preserve in _KTRAP_FRAME.PreviousMode
.text:00000001400A1545 mov byte ptr [rbx+1F6h], 0 ; Override with KernelMode as caller was ZwXXX
.text:00000001400A154C mov r10, [rbx+1D8h]
.text:00000001400A1553 mov [rbp+0E8h+var_30], r10
.text:00000001400A155A lea r11, KiSystemServiceStart
.text:00000001400A1561 jmp r11 ; Continue syscall as normal
In the code excerpt above, rbp-58h
corresponds to the same rbp-58h
used earlier in the code labeled b_swap_previousmode_ret
to restore PreviousMode
when exiting the syscall (without transitioning privilege mode).
It is fairly easy to understand what rbp
is by looking at the KTRAP_FRAME
structure below. If we assume that after sub rsp, 158h
executes, rsp points to KTRAP_FRAME
, then rbp
should be pointing to the Xmm1
field (lea rbp, [rsp+80h]
). Then, we do relative variable references from there, so rbp-58h
is really rsp+28h
, which is PreviousMode
:
//0x190 bytes (sizeof)
struct _KTRAP_FRAME
{
ULONGLONG P1Home; //0x0
ULONGLONG P2Home; //0x8
ULONGLONG P3Home; //0x10
ULONGLONG P4Home; //0x18
ULONGLONG P5; //0x20
CHAR PreviousMode; //0x28
...
struct _M128A Xmm1; //0x80
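The rbp-relative arithmetic can be double-checked against the leading fields of the structure. The sketch below declares only the fields quoted in the listing (everything after `PreviousMode` is left out), and `frame_offset()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the leading fields quoted in the listing; the remainder of
 * _KTRAP_FRAME is intentionally omitted. */
struct trap_frame_prefix {
    uint64_t P1Home;
    uint64_t P2Home;
    uint64_t P3Home;
    uint64_t P4Home;
    uint64_t P5;
    int8_t   PreviousMode;
};

/* rbp = rsp + rbp_bias, so [rbp + disp] is frame offset rbp_bias + disp. */
ptrdiff_t frame_offset(ptrdiff_t rbp_bias, ptrdiff_t disp)
{
    return rbp_bias + disp;
}
```

With a bias of 0x80, a displacement of -0x58 lands on offset 0x28, matching `offsetof(struct trap_frame_prefix, PreviousMode)`. The `[rbp+0F0h]` reference annotated as `SegCs` in the earlier disassembly resolves the same way, to frame offset 0x170.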
The main takeaway from all of this is that on all the 64-bit Windows versions we looked at, the kernel appears to always assume that the _KTHREAD
tracking a userland thread doing a syscall has a PreviousMode
value of UserMode
and will not bother trying to preserve it. It only bothers actually preserving PreviousMode
if a Zw
-based function path is taken (as it temporarily overrides it with KernelMode
to avoid certain checks). This means that if you ever get the ability to change the PreviousMode
of a userland _KTHREAD
to KernelMode
, it will never change the PreviousMode
field back to UserMode
for that thread. This behavior does not match what the documentation indicates should happen.
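The 64-bit behaviour described above can be condensed into a toy state model. It is a sketch of our reading of the disassembly rather than kernel code: the structure and function names are hypothetical, and the single byte stands in for `_KTHREAD.PreviousMode`.

```c
#include <stdint.h>

enum { MODEL_KERNEL_MODE = 0, MODEL_USER_MODE = 1 };

typedef struct { uint8_t previous_mode; } x64_thread_model;           /* _KTHREAD stand-in */
typedef struct { uint8_t saved_previous_mode; } x64_trap_frame_model; /* _KTRAP_FRAME stand-in */

/* Userland syscall entry: per the analysis above, PreviousMode is not written. */
void x64_user_syscall_entry(x64_thread_model *t)
{
    (void)t;
}

/* Zw path (KiServiceInternal): save the old value, then force KernelMode. */
void x64_zw_entry(x64_thread_model *t, x64_trap_frame_model *f)
{
    f->saved_previous_mode = t->previous_mode;
    t->previous_mode = MODEL_KERNEL_MODE;
}

/* Kernel-caller exit (b_swap_previousmode_ret): restore the saved value. */
void x64_zw_exit(x64_thread_model *t, const x64_trap_frame_model *f)
{
    t->previous_mode = f->saved_previous_mode;
}
```

In this model a thread whose `previous_mode` was corrupted to 0 keeps that value across any number of `x64_user_syscall_entry()` calls, while the Zw pair always returns the thread to whatever value it had on entry.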
In light of this, the primitive is quite powerful on 64-bit. This failure to properly set PreviousMode
is not a vulnerability per se, as it works as intended in normal circumstances, but it seems like a highly abusable oversight in the kernel that could be changed.
After leveraging the PreviousMode
trick on 64-bit, we took a look at 32-bit. We found that it doesn’t work on 32-bit, because 32-bit behaves as the Microsoft documentation suggests it should. Let’s take a look.
On Windows 7 32-bit, the entry to the syscall routine is checked in WinDbg with the !idt command.
1: kd> !idt 2e
2e: 82a3f6be nt!KiSystemService
If we take a look at this function in ntkrnlpa.exe
, we immediately see something different:
.text:0043D6BE _KiSystemService
.text:0043D6BE
.text:0043D6BE push 0
.text:0043D6C0 push ebp
.text:0043D6C1 push ebx
.text:0043D6C2 push esi
.text:0043D6C3 push edi
.text:0043D6C4 push fs
.text:0043D6C6 mov ebx, 30h ; '0'
.text:0043D6CB mov fs, bx
.text:0043D6CE mov ebx, 23h ; '#'
.text:0043D6D3 mov ds, ebx
.text:0043D6D5 mov es, ebx
.text:0043D6D7 mov esi, large fs:124h
.text:0043D6DE push large dword ptr fs:0
.text:0043D6E5 mov large dword ptr fs:0, 0FFFFFFFFh
.text:0043D6F0 push dword ptr [esi+13Ah] ; Save old _KTHREAD.PreviousMode
.text:0043D6F6 sub esp, 48h ; _KTRAP_FRAME
.text:0043D6F9 mov ebx, [esp+6Ch] ; _KTRAP_FRAME.SegCS value
.text:0043D6FD and ebx, 1 ; Lower bit of CS selector
.text:0043D700 mov [esi+13Ah], bl ; Override _KTHREAD.PreviousMode using CS
One of the first things the function does is save the old _KTHREAD
PreviousMode
. It then sets the PreviousMode
that will be used while inside the syscall to a value based on the CS
segment register check. Since the CS
selector cannot be changed from userland, this will always indicate to the kernel that the entry came from userland. As a result, even if we used a write primitive to change the PreviousMode
value, upon syscall entry the modified value would be saved, and a safe value indicating userland would actually be used.
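The 32-bit entry logic reduces to one assignment derived from the saved `CS` selector. The model below is a sketch under that reading of the listing; the names are hypothetical, and the selector values in the usage note (0x1b for user code, 0x08 for kernel code) are the typical x86 Windows values, stated here as an assumption.

```c
#include <stdint.h>

typedef struct { uint8_t previous_mode; } x86_thread_model; /* _KTHREAD stand-in */

/* Model of the KiSystemService prologue: the old PreviousMode is saved
 * (returned here instead of pushed), then replaced by bit 0 of the saved CS. */
uint8_t x86_syscall_entry(x86_thread_model *t, uint16_t saved_cs)
{
    uint8_t old = t->previous_mode;
    t->previous_mode = (uint8_t)(saved_cs & 1);
    return old;
}
```

Calling `x86_syscall_entry()` with a user selector such as 0x1b on a thread whose `previous_mode` was forced to 0 returns the corrupted 0 as the saved value and leaves the thread running with 1 for the duration of the syscall.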
As we saw earlier when discussing the 0 value write primitive, the functionality differs between Windows 7 and Windows 8 or above, so we also checked for the PreviousMode
logic. We confirmed the same CS
-based check exists on Windows 8 x86 and Windows 10 1809 x86. This means this technique cannot easily be used on 32-bit Windows. We say "easily" because we believe it is still technically possible to abuse PreviousMode
, albeit with greater difficulty.
One way we can still abuse PreviousMode
on 32-bit would be to use one kernel write primitive (like the 0 value overwrite primitive) in a loop to constantly reset the PreviousMode
of another _KTHREAD
to 0. This target thread could in turn keep looping on a call to NtReadVirtualMemory()
to read some kernel memory address until the read actually works. You end up having to exploit a fabricated race condition for the PreviousMode
trick to work in the first place.
One drawback of this approach is that you need to be able to write to a _KTHREAD whose address you know, which is not the _KTHREAD that you are using in order to achieve the write primitive. This means you need to leverage the increment-based read/write primitives to find the target address for the 0 value overwrite primitive. Yet another drawback is related to a requirement we saw for hitting the write 0 primitive. If you want to use a single fake userland enlistment that is in an infinite loop writing 0 to some address, then this userland enlistment must have the KENLISTMENT_IS_NOTIFIABLE flag set. We also know that the kernel will unset this flag each time the loop is executed. To counter this problem, we must have a userland thread constantly resetting the KENLISTMENT_IS_NOTIFIABLE flag. These drawbacks make this approach fairly inconvenient to implement.
In the case of the CVE-2018-8611 vulnerability, the _KTHREAD we naturally leak is that of the recovery thread that is stuck inside TmRecoverResourceManager(). We would therefore still need to rely on the increment primitive in order to build an arbitrary kernel read primitive, which would then allow us to find another _KTHREAD to target, to which we would write the PreviousMode value. It is also worth noting that if we tried to set the PreviousMode using the increment primitive (keeping in mind its requirements explained in part 4), it would become significantly more difficult to win this secondary race due to the number of increments required to wrap the value to 0 for each successful win. Actually implementing such a technique for e.g. Windows 10 1809 x86 is left as an exercise for the reader.
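The cost of wrapping a value to 0 with increments is simple modular arithmetic, sketched below in C. The function name is hypothetical, and the sketch assumes each increment adds exactly one to a counter of the given width, ignoring any carry into neighbouring bytes; the width the real primitive operates on is not restated here and is left as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of +1 steps needed to wrap `value` around to 0 in a counter that is
 * `width_bits` wide. Assumption: each use of the primitive adds exactly one.
 */
uint64_t increments_to_zero(uint64_t value, unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 32);
    uint64_t modulus = (uint64_t)1 << width_bits;
    assert(value < modulus);
    return (modulus - value) % modulus;
}
```

For example, under these assumptions a byte-wide field holding 1 needs `increments_to_zero(1, 8)`, that is 255, increments per successful win, compared with a single write for the 0 value overwrite primitive.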
The following table shows the status of the PreviousMode trick and the ability to use the 0 value overwrite primitive across Windows versions. Please note that this may be approximate, as not every intermediate Windows version was tested.
Windows versions | Arch | Increment primitive | 0 value overwrite primitive | PreviousMode permanent? | PreviousMode usage |
---|---|---|---|---|---|
Vista to 7 | x86 | Yes | No | No | Raceable only with increment primitive |
8 to 10 1809 | x86 | Yes | Yes | No | Raceable with 0 value overwrite primitive |
Vista | x64 | Yes | No | Yes | Direct |
7 to 10 1809 | x64 | Yes | Yes | Yes | Direct |
On page 73 of the Kaspersky presentation, they suggest it may be worth using a secret cookie for encoding the PreviousMode value or similar. This is interesting insofar as our increment primitive shows that we can escalate privileges using CVE-2018-8611 without abusing the PreviousMode value at all. So in the case of this vulnerability, that mitigation would not have been effective. A better and easier mitigation would be to explicitly set the PreviousMode to UserMode on a syscall trap from userland, as Microsoft's own documentation describes.
We agree with the other suggested mitigation. Further hardening of kernel dispatcher objects could help close these primitives. The most effective way to prevent these types of problems, however, is to start using a mitigation like SMAP.
This vulnerability would be very hard to exploit if SMAP were in use, since the whole approach of pointing to a userland _KENLISTMENT would have failed from the beginning. As a reminder, both the in-the-wild exploit and ours use this technique, even though we use different write primitives.
SMAP would mean that a prerequisite to exploitation would be that we already have the ability to introduce fully controlled data at some known location in kernel memory. On earlier versions of Windows this was possible by abusing the Desktop Heap, which is heavily abused by win32k exploits. Unfortunately for exploit developers, to our knowledge there are no publicly known ways to leak the kernel address of the Desktop Heap in the latest versions of Windows without having a kernel read primitive.
We did most of this work without seeing the in-the-wild exploit for this vulnerability. This resulted in a pretty interesting and long research experience to get a working exploit on all versions, as well as a lot of time spent exploring various failed approaches. It also forced us to improve our tooling such as being able to heavily document our idbs in HexRays.
It may still be that the approaches we took and the 0day exploit took are not necessarily the very best ways to do it, but as with many things, you work with the ideas that come to mind. Hopefully you found this article informative!
Don’t hesitate to contact us:
Aaron Adams – aaron(dot)adams(at)nccgroup(dot)com – @fidgetingbits
Cedric Halbronn – cedric(dot)halbronn(at)nccgroup(dot)com – @saidelike
It appears that Microsoft is considering deprecating KTM. The following quote is found on their page about Transactional NTFS (TxF):
Microsoft strongly recommends developers utilize alternative means to achieve
your application’s needs. Many scenarios that TxF was developed for can be
achieved through simpler and more readily available techniques. Furthermore,
TxF may not be available in future versions of Microsoft Windows. For more
information, and alternatives to TxF, please see Alternatives to using
Transactional NTFS: https://technet.microsoft.com/fr-fr/office/hh802690(v=vs.80)
It therefore seems likely that if more vulnerabilities start to be discovered in KTM, its removal will be expedited.
Thanks to the readers who made it this far!
We would like to acknowledge that this work would be much more difficult without relying on the mountain of previous work done by the larger security community. Massive thanks to everyone that is willing to share their work and research with the rest of us.
Thanks to NCC Group for allowing us to do this type of interesting research as our day jobs. We would like to thank Nick Galloway for his review on the blogs of this series.
Thanks to the @poc_crew and to @offensive_con for letting us speak about it at POC2019 and OffensiveCon2020.
We want to give a big shoutout to Kaspersky for their great analysis and sharing the results of their findings, even if we happened to not come across them until 5 months later. We hope the readers have gained a greater appreciation for the beauty and nuance of exploiting non-trivial vulnerabilities.
Lastly, a random shoutout to whoever originally found and developed an exploit for this vulnerability! You found a really cool vulnerability, and presumably went through a similar maze while trying to figure out how to exploit it. We hope to look at the rest of your exploit some day to understand what other approaches we took that were different from yours!