Quick post about a common problem removing null bytes in the loader generated by Donut. Replacing opcodes that contain null bytes with equivalent snippets is enough to solve the problem for a shellcode of no more than a few hundred bytes. It’s also possible to automate using encoders found in msfvenom and pwntools. However, the problem most users experience is when the loader generated by Donut is a few hundred kilobytes or even a few megabytes! This post demonstrates how to use escape sequences to facilitate faster encoding of null bytes. Maybe “escape codes” is a better description? You can find a PoC encoder here, which can be used to add an x86/AMD64 decoder to a shellcode generated by Donut.
Readers will be aware of the eXclusive-OR (XOR) cipher and its extensive use as a component or building block in many cryptographic primitives. It’s also a popular choice for obfuscating shellcode and specifically removing null bytes. In the past, the following code in C is what I’d probably use to find a suitable key. It will work with keys of any length, but is slow as hell for anything more than 24-Bits.
int find_xor_key( const void *inbuf, u32 inlen, void *outbuf, int outlen) { int i, j, keylen=1; u8 *in = (u8*)inbuf, *key=(u8*)outbuf; // initialize key for(i=0; i<outlen; i++) { key[i] = (i < keylen) ? 0 : -1; } // while keylen is less than max key requested while(keylen < outlen) { // xor data with current key for(i=0; i<inlen; i++) { // if the result of xor is zero. end loop if((in[i] ^ key[i % keylen]) == 0) break; } // if we processed all data successfully if(i == inlen) { // return current key and its length return keylen; } // otherwise, update the key for(i=0; ; i++) { if(++key[i]) break; } // update the key length if(i == keylen) keylen++; } // return nothing found return 0; }
The following function can be used to test it and works relatively fast for something that’s compact, like 1KB, but sucks for anything > 3072 bytes, which I admit is unusual for shellcode.
void test_key(void) { int i, keylen; u8 key[8], data[1024]; srand(time(0)); // fill buffer with pseudo-random bytes for(i=0; i<sizeof(data); i++) { data[i] = rand(); } // try find a suitable XOR key for the data keylen = find_xor_key(data, sizeof(data), key, sizeof(key)); printf("Suitable key %sfound.\n\n", keylen ? "" : "could not be "); if(keylen) { printf("Key length : %i\nKey : ", keylen); while(keylen--) { printf("%02x ", key[keylen]); } putchar('\n'); } }
find_xor_key() could be re-written to use multiple threads and this would speed up the search. You might even be able to use a GPU or cluster of computers, but the overall problem isn’t finding a key. We’re not trying to crack ciphertext. All we want to do is encode and later decode null bytes, and for the Donut loader, this approach is very inefficient.
Escape sequences have been used in computing since the 1970s and most of you will already be familiar with them. I’m not sure if I’m using the correct terminology for what I describe next, but hopefully you’ll understand why I did. Textual encoding algorithms like Base64, Ascii85 and BasE91 were considered first of course. And Qkumba wrote a very cool base64 decoder that uses just ASCII characters that I was very tempted to use. In the end, using an escape code to indicate a null byte is simpler to implement.
Although I use an XOR cipher in step 5, it could be replaced with something else.
static void nullz_encode(FILE *in, FILE *out) { char c, t; for(;;) { // read byte c = getc(in); // end of file? exit if(feof(in)) break; // adding one is just an example t = c + 1; // is the result 0(avoid) or 1(escape)? if(t == 0 || t == 1) { // write escape sequence putc(0x01, out); // The XOR is an optional step. // Avoid using 0x00 or 0xFF with XOR! putc(c ^ NULLZ_KEY, out); } else { // save byte plus 1 putc(c + 1, out); } } }
static void nullz_decode(FILE *in, FILE *out) { char c, t; for(;;) { // read byte c = getc(in); // end of file? exit if(feof(in)) break; // if this is an escape sequence if(c == 0x01) { // read next byte and XOR it c = getc(in); // The XOR is an optional step. putc(c ^ NULLZ_KEY, out); } else { // else subtract byte putc(c - 1, out); } } }
This assembly is compatible with both 32-Bit and 64-bit modes. It expects to run from RWX memory, so YMMV with this. If you want to execute from RX memory only, then this will require allocation of memory on the stack.
bits 32 %define NULLZ_KEY 0x4D nullz_decode: _nullz_decode: jmp init_code load_code: pop esi lodsd ; load original length of data xor eax, 0x12345678 ; change to 32-bit key xchg eax, ecx push esi ; save pointer to code on stack pop edi ; push esi decode_main: lodsb ; read a byte dec al ; c - 1 jnz save_byte lodsb ; read next byte xor al, NULLZ_KEY ; c ^= NULLZ_KEY save_byte: stosb ; save in buffer loop decode_main ret ; execute shellcode init_code: call load_code ; XOR encoded shellcode goes here..
An option to update the XOR key is left up to you.
// compatible with x86 and x86-64 char NULLZ_DECODER[] = { /* 0000 */ "\xeb\x17" /* jmp 0x19 */ /* 0002 */ "\x5e" /* pop esi */ /* 0003 */ "\xad" /* lodsd */ #define NULLZ_LEN 5 /* 0004 */ "\x35\x78\x56\x34\x12" /* xor eax, 0x12345678 */ /* 0009 */ "\x91" /* xchg eax, ecx */ /* 000A */ "\x56" /* push esi */ /* 000B */ "\x5f" /* pop edi */ /* 000C */ "\x56" /* push esi */ /* 000D */ "\xac" /* lodsb */ /* 000E */ "\xfe\xc8" /* dec al */ /* 0010 */ "\x75\x03" /* jne 0x15 */ /* 0012 */ "\xac" /* lodsb */ /* 0013 */ "\x34\x4d" /* xor al, 0x4d */ /* 0015 */ "\xaa" /* stosb */ /* 0016 */ "\xe2\xf5" /* loop 0xd */ /* 0018 */ "\xc3" /* ret */ /* 0019 */ "\xe8\xe4\xff\xff\xff" /* call 2 */ };
Before settling with escape sequences, I examined a number of other ways that null bytes might be encoded and decoded at runtime by a shellcode.
Initially, I thought of byte substitution, which is a non-linear operation used by legacy block ciphers. Scrapped that idea.
Experimented with match referencing, which is very common for lossless compression algorithms. Wrote a few bits of code to process files and then calculate the change in size. For every null byte found in a file, save the position and length before passing the null bytes to a function F for modification. An involution, like an XOR is fine to use as F. Then encode the offset and length using elias gamma2 codes. The change in file size was approx. 4% and I thought this might be the best way. It requires more code and is more complicated, but certainly an option.
Thought about bit tags. Essentially using 1-Bit to indicate whether a byte is encoded or not. Change in file size would be ~12% since every byte would require 1-Bit. This eventually led to escape sequences, which I think is the best approach.