In this post you will learn the fundamentals of user-mode asynchronous procedure calls (APCs), as well as how to use them to inject shellcode into a thread of a remote process to obtain a reverse shell.
Hello World! I haven't touched the Windows API exploitation series on process injection in a very long time. Today I'll try to make up for that gap by discussing a different method that was once considered stealthier than the CreateRemoteThread API. If you haven't read that earlier post, I covered the fundamentals of process injection there.
APCs are very simple to understand if you have a high-level language background or are familiar with callbacks and asynchronous programming. If you're not, think about a scenario in which your program performs an action such as reading from a file or waiting for network packets. How would you find out whether the action succeeded or failed, or simply how much progress it has made so far? That is asynchronous programming in a nutshell: instead of blocking, you register a callback that is run for you while the rest of the program carries on with its work.
On Windows this is done with APC queues. Each queued function is executed in the context of the thread it was queued to. When queuing a callback, you pass the target function, a handle to the thread, and a ULONG_PTR that carries the parameter (typically a pointer). Casting a pointer to ULONG_PTR does not change the address; it only turns it into a plain integer, so the value that reads 0x00007FF656E51352 in hexadecimal is printed as 140695996535634 in decimal. In this demonstration I pass the thread ID as the parameter, but I wanted you to know this in case you use a different callback, such as LoadLibraryA, whose parameter is a pointer.
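To make the hexadecimal versus decimal point concrete, here is a minimal sketch in portable C. It assumes `uintptr_t` as a stand-in for `ULONG_PTR` so that it builds outside Windows as well, and the helper names `pointer_to_integer` and `print_both_notations` are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Cast a pointer to an integer wide enough to hold it.
   uintptr_t stands in for ULONG_PTR here (assumption for portability). */
static uintptr_t pointer_to_integer(const void *p)
{
    uintptr_t raw = (uintptr_t)p;
    assert((const void *)raw == p); /* the round trip preserves the address */
    return raw;
}

/* Print one value in both notations and return it unchanged. */
static uint64_t print_both_notations(uint64_t value)
{
    printf("hex: 0x%016" PRIX64 "\n", value);
    printf("dec: %" PRIu64 "\n", value);
    return value;
}
```

Calling `print_both_notations(UINT64_C(0x00007FF656E51352))` prints `hex: 0x00007FF656E51352` followed by `dec: 140695996535634`. It is the same number, only written differently.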
The function I am talking about is QueueUserAPC, from the processthreadsapi.h header file and the Kernel32 library.
Note: To use a kernel-mode APC function, you need to write a device driver, which runs in kernel mode.
Each thread gets its own APC queue. When the thread enters an alertable state, it dequeues the pending APCs and executes each callback with the parameter provided in the third argument. There are several ways to put a thread into an alertable state (check out the 3rd paragraph of the documentation); in this post I will use the easiest one, the SleepEx function.
When the thread reaches an alertable state, the OS issues a software interrupt to direct the thread's execution to the APC function. The wait operation then returns WAIT_IO_COMPLETION and the thread leaves the alertable state.
The SleepEx function comes from the synchapi.h header file and the Kernel32 library. It suspends the thread (pauses execution) until one of the following conditions is met; the first two apply only when bAlertable is TRUE:
- Callback for any I/O operation is called,
- An APC is queued to the thread, or
- The time-out interval (dwMilliseconds) elapses
I have written a short demo for you. It contains a thread, an APC callback, and the code that queues the callback to the thread's APC queue. Feel free to play with it.
Now that you know enough about asynchronous procedure calls, let's write the code to inject shellcode into the threads of a remote process. The goal is to find a process with as many threads as possible and then queue the APC function to all of them, because we don't know which one will enter the alertable state soon.
Note: To make this attack quieter, target a thread that you know enters the alertable state frequently.
As usual, start by opening the process with PROCESS_VM_WRITE and PROCESS_VM_OPERATION access so that you can allocate a buffer in the address space of the remote process and copy the shellcode into it from the current process.
I have defined a function named GetProcessThreads() in the process_utils.h header file to get handles with THREAD_SET_CONTEXT access to all the threads created by the process. This access right is a requirement of the QueueUserAPC function.
Take each thread handle and queue a user-mode APC to it. Pass the address of the shellcode, cast to PAPCFUNC, as the callback, and set the ULONG_PTR argument to NULL because we are not sending any parameter.
Note: If you were using the DLL injection method instead, the first parameter to the function would be the address of the LoadLibraryA function from the Kernel32 library, and the third parameter would be the address of a string containing the full path to the DLL.
Finally, since we are good humans, it is better to clean up the resources that were used during the program's lifetime.
All done now 🤩! You can try the code in your environment by replacing the shellcode with one generated from the following commands. I have used a reverse TCP Meterpreter shellcode from the Metasploit Framework.
The GitHub Repository, which is shared in the link below, contains the entire code.
Once you compile the code and run it with the PID of the target process, after a while you will get a Meterpreter connection back on your attacker machine, confirming that the APC callback was executed.
Note: The video was recorded a while ago, but nothing has changed since then. It will work the same.
There is another variant of APC injection, known as Early Bird APC injection. As the name implies, the APC is queued early: the process is started with its main thread suspended, so the queued APC runs as soon as the thread begins. I found this interesting on the ired.team blog and couldn't resist adding it here as well, with more details of course. If you look at the remarks section of the QueueUserAPC function documentation, it says
"If an application queues an APC before the thread begins running, the thread begins by calling the APC function. After the thread calls an APC function, it calls the APC functions for all APCs in its APC queue."
We will have to change the existing code a little bit to make this work. For example, remove the code that opens the process and replace it with the following code, which starts a new process with the CREATE_SUSPENDED flag.
When you run the program, it will show output like the one below. The main thread will have a suspend count of 1. If any APC is queued to the thread, the thread will dequeue it and execute the callback function as soon as it starts running.
Allocate memory in the virtual address space of the remote process, write the shellcode into it, and then call the QueueUserAPC function so that the shellcode is invoked as soon as the thread starts.
Last but not least, call the ResumeThread() function to resume the thread so that the APC in the queue fires first. It takes the same thread handle that you passed to QueueUserAPC, that is, the handle of the main thread that was created in the suspended state.
The following path of the repository contains the codebase for this method.
Here is a brief demonstration of the technique, in which I attempt to inject shellcode into the C:\Windows\System32\calc.exe process.
Since APCs are frequently used in legitimate applications as well, this technique is generally stealthier than the earlier ones, and relying only on the functions listed in the import table will result in false positive alarms. However, it can still be detected by combining more than one technique, such as looking for the OpenProcess and OpenThread functions in the IAT, examining the Windows events produced by Sysmon for process creation (relevant to Early Bird APC injection), and monitoring Windows API calls or using function hooking.
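As a rough illustration of the import-table idea, the sketch below counts how many of the API names mentioned in this post appear in a list of imported function names. It assumes the names were already extracted from the IAT by a PE parser of your choice; the watch list and the name `count_watched_imports` are my own choices, and a hit count alone is a weak signal for the false-positive reasons given above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* API names from this post that are worth a second look when they
   appear together (illustrative list, not a complete signature). */
static const char *const k_watched[] = {
    "OpenProcess", "OpenThread", "QueueUserAPC",
};

/* Returns how many watched API names occur in `imports` (n entries). */
static size_t count_watched_imports(const char *const *imports, size_t n)
{
    size_t hits = 0;
    size_t w, i;

    assert(imports != NULL || n == 0);
    for (w = 0; w < sizeof k_watched / sizeof k_watched[0]; ++w) {
        for (i = 0; i < n; ++i) {
            if (strcmp(imports[i], k_watched[w]) == 0) {
                ++hits; /* count each watched name at most once */
                break;
            }
        }
    }
    return hits;
}
```

A binary that imports all three names scores 3, while one that only imports QueueUserAPC scores 1. Treat the score as a reason to look closer and combine it with the other signals, such as Sysmon events.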
💡
If you are aware of more techniques or want to provide more details on this one, feel free to ping me at @tbhaxor.
- https://rinseandrepeatanalysis.blogspot.com/2019/04/early-bird-injection-apc-abuse.html?m=1
- https://www.youtube.com/watch?v=AdrWBVYgzPw
- https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/types-of-apcs
- https://docs.microsoft.com/en-us/windows/win32/sync/asynchronous-procedure-calls
- https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-queueuserapc
- https://sevrosecurity.com/2020/04/13/process-injection-part-2-queueuserapc/
- https://attack.mitre.org/techniques/T1055/004/