This post examines data compression algorithms suitable for position-independent code and assumes you're already familiar with the concept and purpose of data compression. If you're curious to know more about the science, or information theory, read Data Compression Explained by Matt Mahoney. For historical perspective, read History of Lossless Data Compression Algorithms. Charles Bloom has a great blog on the subject that goes way over my head. For questions and discussions, Encode's Forum is popular among experts and should be able to help with any queries you have.
For shellcode, algorithms based on the following conditions are considered:
Meeting these requirements isn't that easy. Search for "lightweight compression algorithms" and you'll soon find recommendations for algorithms that aren't compact at all. That isn't an issue on machines with 1TB hard drives, of course, but it is a problem for resource-constrained environments like microcontrollers and wireless sensors. The best algorithms are usually optimized for speed, and they contain arrays and constants that make them easy to identify with signature-based tools.
Algorithms that are compact might have suboptimal compression ratios, or their compressor component is closed source or restricted by licensing. There is light at the end of the tunnel, however, thanks primarily to the efforts of those designing executable compressors. First, we look at those algorithms and then at which Windows APIs can be used as an alternative. There are also open source libraries designed for interoperability that support the Windows compression formats on other platforms like Linux.
The first tool known to compress executables and save disk space was Realia SpaceMaker, published sometime in 1982 by Robert Dewar. The first virus known to use compression in its infection routine was Cruncher, published in June 1993. Its author used routines from DIET, a disk reduction utility for DOS. Later on, many different viruses used compression as part of their infection routine to reduce the size of infected files, presumably to help evade detection for longer. Although this is unrelated to shellcode, I decided to look at e-zines from twenty years ago, when there was a lot of interest in lightweight compression algorithms.
The following viruses used compression back in the late 90s and early 00s. It's not an exhaustive list, as I only searched the more popular e-zines like 29A and Xine by iKX.
The following compression engines were examined. A 1MB EXE file was used as the raw data and not all of them were tested.
BCE, which appeared in 29a#4, was disappointing with only an 8% compression ratio. BNCE, which appeared in DCA#1, was no better at 9%, although its decompressor is only 54 bytes. The decompressor for LSCE is 25 bytes, but the compressor simply encodes repeated sequences of zero and nothing else. JQCoding has a ~20% compression ratio, while LZCE provides the best at 36%. With the exception of the last two, I was unable to find anything in the e-zines with a good compression ratio. They were super tiny, but also super... eh, inefficient. Worth a mention is KITTY, by snowcat.
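The zero-run approach LSCE takes is easy to sketch, which helps explain why its decompressor can be so small. The byte format below (a 0x00 marker followed by a run length of 1 to 255) is a hypothetical choice for illustration, not necessarily LSCE's actual encoding, and `zrle_encode`/`zrle_decode` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zero-run encoding (not necessarily LSCE's format):
   - any non-zero byte is copied as-is;
   - a run of 1..255 zero bytes becomes the pair {0x00, run_length}.
   Both functions return the number of bytes written, or 0 if the output
   buffer is too small or the input is malformed. */
size_t zrle_encode(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap) {
    size_t i = 0, o = 0;
    while (i < inlen) {
        if (in[i] != 0) {
            if (o >= outcap) return 0;
            out[o++] = in[i++];
        } else {
            size_t run = 0;
            /* count consecutive zeros, capped at what one length byte holds */
            while (i < inlen && in[i] == 0 && run < 255) { run++; i++; }
            if (o + 2 > outcap) return 0;
            out[o++] = 0;
            out[o++] = (uint8_t)run;
        }
    }
    return o;
}

size_t zrle_decode(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap) {
    size_t i = 0, o = 0;
    while (i < inlen) {
        if (in[i] != 0) {
            if (o >= outcap) return 0;
            out[o++] = in[i++];
        } else {
            if (i + 1 >= inlen) return 0;      /* truncated {0x00, run} pair */
            size_t run = in[i + 1];
            i += 2;
            if (run == 0 || o + run > outcap) return 0;
            while (run > 0) { out[o++] = 0; run--; }
        }
    }
    return o;
}
```

The decoder is a single loop with one branch. The trade-off is that data with few zero runs barely shrinks, and an isolated zero byte doubles in size.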
While I could be wrong, the earliest example of compression being used to unpack shellcode can be found in a generator written by Z0MBiE/29A in 2004 (shown in figure 1). NRV compression algorithms, similar to those used in UPX, were repurposed to decompress the shellcode (see freenrv2 for more details).
UPX is a very popular tool for executable compression based on UCL. Included with the UCL source is a PE packer example called UCLpack (thanks Peter), which is ideal for shellcode, too. aPLib also provides a good compression ratio, and its decompressor doesn't contain many unique constants that would assist signature-based detection. The problem is that aPLib's compressor isn't open source and requires linking with static or dynamic libraries compiled by the author. Thankfully, an open-source implementation by Emmanuel Marty is available, and it's also ideal for shellcode.
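UCL/NRV and aPLib both belong to the LZ77 family, where the decompressor only has to copy literals and replay earlier output, which is why it can be so small. The following minimal LZSS-style decoder illustrates the idea. Its token format (a control byte with one flag per token, 12-bit distances and 4-bit lengths) is invented for this sketch and is not compatible with UCL/NRV or aPLib data; `lzss_decode` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LZSS-style format (NOT the UCL/NRV or aPLib bitstream):
   - a control byte precedes each group of up to 8 tokens, read LSB first;
   - flag 0 -> one literal byte follows;
   - flag 1 -> a 2-byte match {b0, b1} follows, where
       distance = 1 + (b0 | ((b1 >> 4) << 8))   (1..4096)
       length   = 3 + (b1 & 0x0F)               (3..18)
   Returns the number of bytes written, or 0 on malformed input/overflow. */
size_t lzss_decode(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap) {
    size_t i = 0, o = 0;
    while (i < inlen) {
        unsigned ctrl = in[i++];
        for (int bit = 0; bit < 8 && i < inlen; bit++, ctrl >>= 1) {
            if ((ctrl & 1) == 0) {
                /* literal: copy one byte straight through */
                if (o >= outcap) return 0;
                out[o++] = in[i++];
            } else {
                /* match: replay bytes already written to the output */
                if (i + 2 > inlen) return 0;
                size_t dist = 1 + (in[i] | ((size_t)(in[i + 1] >> 4) << 8));
                size_t len  = 3 + (size_t)(in[i + 1] & 0x0F);
                i += 2;
                if (dist > o || o + len > outcap) return 0;
                /* byte-by-byte so overlapping matches (dist < len) repeat data */
                while (len > 0) { out[o] = out[o - dist]; o++; len--; }
            }
        }
    }
    return o;
}
```

Copying byte by byte matters: when the distance is shorter than the length, the match overlaps the bytes being written, which is how a short pattern expands into a long run. For example, the six input bytes `{0x08, 'a', 'b', 'c', 0x02, 0x06}` decode to `abcabcabcabc`.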
Other libraries worth mentioning that I didn't think were entirely suitable are Tiny Inflate and uzlib. The rest of this post focuses on compression provided by various Windows APIs.
Used by the Sofacy group to decompress a payload, RtlDecompressBuffer is also popular in PE packers and for in-memory execution. rtlcompress.c demonstrates using the API.
Obtain the size of the workspace required for compression via the RtlGetCompressionWorkSpaceSize API. Allocate memory for the workspace and for the compressed data, then pass the workspace, the output buffer and the raw data to RtlCompressBuffer. The following example in C demonstrates this.
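```c
#include <windows.h>
#include <winternl.h>   // NTSTATUS
#include <stdio.h>
#include <stdlib.h>

typedef NTSTATUS (WINAPI *RtlGetCompressionWorkSpaceSize_t)(
    USHORT CompressionFormatAndEngine,
    PULONG CompressBufferWorkSpaceSize,
    PULONG CompressFragmentWorkSpaceSize);

typedef NTSTATUS (WINAPI *RtlCompressBuffer_t)(
    USHORT CompressionFormatAndEngine,
    PUCHAR UncompressedBuffer,
    ULONG  UncompressedBufferSize,
    PUCHAR CompressedBuffer,
    ULONG  CompressedBufferSize,
    ULONG  UncompressedChunkSize,
    PULONG FinalCompressedSize,
    PVOID  WorkSpace);

DWORD CompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    ULONG    wspace, fspace, outlen = 0;   // ULONG matches FinalCompressedSize
    DWORD    len;
    NTSTATUS nts;
    PVOID    ws, outbuf;
    HMODULE  m;
    RtlGetCompressionWorkSpaceSize_t RtlGetCompressionWorkSpaceSize;
    RtlCompressBuffer_t              RtlCompressBuffer;

    m = GetModuleHandle("ntdll");
    RtlGetCompressionWorkSpaceSize = (RtlGetCompressionWorkSpaceSize_t)
        GetProcAddress(m, "RtlGetCompressionWorkSpaceSize");
    RtlCompressBuffer = (RtlCompressBuffer_t)
        GetProcAddress(m, "RtlCompressBuffer");

    if(RtlGetCompressionWorkSpaceSize == NULL || RtlCompressBuffer == NULL) {
      printf("Unable to resolve RTL API\n");
      return 0;
    }

    // 1. obtain the size of workspace
    nts = RtlGetCompressionWorkSpaceSize(
      (USHORT)(engine | COMPRESSION_ENGINE_MAXIMUM),
      &wspace, &fspace);

    if(nts == 0) {
      // 2. allocate memory for workspace
      ws = malloc(wspace);
      if(ws != NULL) {
        // 3. allocate memory for output (same size as input, so
        //    data that doesn't shrink will fail to compress)
        outbuf = malloc(inlen);
        if(outbuf != NULL) {
          // 4. compress data
          nts = RtlCompressBuffer(
            (USHORT)(engine | COMPRESSION_ENGINE_MAXIMUM),
            inbuf, inlen, outbuf, inlen, 0, &outlen, ws);

          if(nts == 0) {
            // 5. write the original length
            WriteFile(outfile, &inlen, sizeof(DWORD), &len, 0);
            // 6. write compressed data to file
            WriteFile(outfile, outbuf, outlen, &len, 0);
          } else {
            outlen = 0;   // report failure to the caller
          }
          // 7. free output buffer
          free(outbuf);
        }
        // 8. free workspace
        free(ws);
      }
    }
    return outlen;
}
```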
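The first argument of RtlGetCompressionWorkSpaceSize and RtlCompressBuffer is a single USHORT that combines a compression format with an engine, which is why CompressBuffer passes `engine | COMPRESSION_ENGINE_MAXIMUM`. The sketch below reproduces the SDK constant values so it builds anywhere. The helper functions, and masking the low byte to recover the format, are assumptions for illustration. Because only Xpress Huffman data requires RtlDecompressBufferEx (or Ex2) and a workspace, the hypothetical `needs_decompress_ex` shows how a loader might choose the API from the format bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the Windows SDK headers; guarded so the sketch
   also compiles where <windows.h> isn't available. */
#ifndef COMPRESSION_FORMAT_LZNT1
#define COMPRESSION_FORMAT_LZNT1       0x0002
#define COMPRESSION_FORMAT_XPRESS      0x0003
#define COMPRESSION_FORMAT_XPRESS_HUFF 0x0004
#define COMPRESSION_ENGINE_STANDARD    0x0000
#define COMPRESSION_ENGINE_MAXIMUM     0x0100
#endif

/* The CompressionFormatAndEngine argument is simply format OR engine. */
static uint16_t make_format_and_engine(uint16_t format, uint16_t engine) {
    return (uint16_t)(format | engine);
}

/* Hypothetical helper: true if the data needs RtlDecompressBufferEx/Ex2
   rather than RtlDecompressBuffer. Assumes the format lives in the low byte. */
static bool needs_decompress_ex(uint16_t format_and_engine) {
    return (format_and_engine & 0x00FF) == COMPRESSION_FORMAT_XPRESS_HUFF;
}
```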
LZNT1 and Xpress data can be unpacked using RtlDecompressBuffer; however, Xpress Huffman data can only be unpacked using RtlDecompressBufferEx or the multi-threaded RtlDecompressBufferEx2. The last two require a WorkSpace buffer.
```c
typedef NTSTATUS (WINAPI *RtlDecompressBufferEx_t)(
    USHORT CompressionFormatAndEngine,
    PUCHAR UncompressedBuffer,
    ULONG  UncompressedBufferSize,
    PUCHAR CompressedBuffer,
    ULONG  CompressedBufferSize,
    PULONG FinalUncompressedSize,
    PVOID  WorkSpace);

DWORD DecompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    ULONG    wspace, fspace, outlen = 0;   // ULONG matches FinalUncompressedSize
    DWORD    len;
    NTSTATUS nts;
    PVOID    ws, outbuf;
    HMODULE  m;
    RtlGetCompressionWorkSpaceSize_t RtlGetCompressionWorkSpaceSize;
    RtlDecompressBufferEx_t          RtlDecompressBufferEx;

    // input must hold the 4-byte original length plus compressed data
    if(inlen <= sizeof(DWORD)) return 0;

    m = GetModuleHandle("ntdll");
    RtlGetCompressionWorkSpaceSize = (RtlGetCompressionWorkSpaceSize_t)
        GetProcAddress(m, "RtlGetCompressionWorkSpaceSize");
    RtlDecompressBufferEx = (RtlDecompressBufferEx_t)
        GetProcAddress(m, "RtlDecompressBufferEx");

    if(RtlGetCompressionWorkSpaceSize == NULL || RtlDecompressBufferEx == NULL) {
      printf("Unable to resolve RTL API\n");
      return 0;
    }

    // 1. obtain the size of workspace
    nts = RtlGetCompressionWorkSpaceSize(
      (USHORT)(engine | COMPRESSION_ENGINE_MAXIMUM),
      &wspace, &fspace);

    if(nts == 0) {
      // 2. allocate memory for workspace
      ws = malloc(wspace);
      if(ws != NULL) {
        // 3. allocate memory for output using the stored original length
        outlen = *(DWORD*)inbuf;
        outbuf = malloc(outlen);
        if(outbuf != NULL) {
          // 4. decompress data (skipping the 4-byte length header)
          nts = RtlDecompressBufferEx(
            (USHORT)(engine | COMPRESSION_ENGINE_MAXIMUM),
            outbuf, outlen,
            (PUCHAR)inbuf + sizeof(DWORD), inlen - sizeof(DWORD),
            &outlen, ws);

          if(nts == 0) {
            // 5. write decompressed data to file
            WriteFile(outfile, outbuf, outlen, &len, 0);
          } else {
            printf("RtlDecompressBufferEx failed with %08lx\n", (unsigned long)nts);
            outlen = 0;
          }
          // 6. free output buffer
          free(outbuf);
        } else {
          printf("malloc() failed\n");
          outlen = 0;
        }
        // 7. free workspace
        free(ws);
      }
    }
    return outlen;
}
```
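CompressBuffer writes the original length as a DWORD in front of the compressed stream, and DecompressBuffer reads it back to size the output buffer. Here's a portable sketch of that layout with the checks a loader should make before trusting the header. It assumes little-endian byte order, as a DWORD has on x86/x64; `put_le32`, `get_le32`, `parse_container` and the `max_len` cap are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout:
     [4 bytes] original (uncompressed) length, little-endian
     [n bytes] compressed stream                               */

static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills the out-parameters if the header is sane, -1 otherwise.
   max_len is a caller-chosen upper bound so a corrupt header can't trigger
   a huge allocation. */
static int parse_container(const uint8_t *buf, size_t buflen, uint32_t max_len,
                           uint32_t *orig_len, const uint8_t **payload,
                           size_t *payload_len) {
    if (buflen <= 4) return -1;              /* need header plus some data */
    uint32_t n = get_le32(buf);
    if (n == 0 || n > max_len) return -1;
    *orig_len    = n;
    *payload     = buf + 4;
    *payload_len = buflen - 4;
    return 0;
}
```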
Despite being well documented and, with the MSZIP and LZMS engines, offering better compression ratios than RtlCompressBuffer, it's unusual to see these APIs used at all. Four engines are supported: MSZIP, Xpress, Xpress Huffman and LZMS. For a demonstration of these APIs, see xpress.c.
```c
#include <windows.h>
#include <compressapi.h>
#include <stdlib.h>
#pragma comment(lib, "cabinet.lib")

// xstrerror() is a helper that reports the last error (not shown)
DWORD CompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    COMPRESSOR_HANDLE ch = NULL;
    BOOL   r;
    SIZE_T outlen, len;
    LPVOID outbuf;
    DWORD  wr;

    // Create a compressor
    r = CreateCompressor(engine, NULL, &ch);

    if(r) {
      // Query compressed buffer size.
      Compress(ch, inbuf, inlen, NULL, 0, &len);

      if(GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // allocate memory for compressed data
        outbuf = malloc(len);
        if(outbuf != NULL) {
          // Compress data and write data to outbuf.
          r = Compress(ch, inbuf, inlen, outbuf, len, &outlen);
          // if compressed ok, write to file
          if(r) {
            WriteFile(outfile, outbuf, (DWORD)outlen, &wr, NULL);
          } else xstrerror("Compress()");
          free(outbuf);
        } else {
          xstrerror("malloc()");
          r = FALSE;
        }
      } else {
        xstrerror("Compress()");
        r = FALSE;
      }
      CloseCompressor(ch);
    } else xstrerror("CreateCompressor()");
    return r;
}
```
```c
DWORD DecompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    DECOMPRESSOR_HANDLE dh = NULL;
    BOOL   r;
    SIZE_T outlen, len;
    LPVOID outbuf;
    DWORD  wr;

    // Create a decompressor
    r = CreateDecompressor(engine, NULL, &dh);

    if(r) {
      // Query decompressed buffer size.
      Decompress(dh, inbuf, inlen, NULL, 0, &len);

      if(GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // allocate memory for decompressed data
        outbuf = malloc(len);
        if(outbuf != NULL) {
          // Decompress data and write data to outbuf.
          r = Decompress(dh, inbuf, inlen, outbuf, len, &outlen);
          // if decompressed ok, write to file
          if(r) {
            WriteFile(outfile, outbuf, (DWORD)outlen, &wr, NULL);
          } else xstrerror("Decompress()");
          free(outbuf);
        } else {
          xstrerror("malloc()");
          r = FALSE;
        }
      } else {
        xstrerror("Decompress()");
        r = FALSE;
      }
      CloseDecompressor(dh);
    } else xstrerror("CreateDecompressor()");
    return r;
}
```
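The `engine` argument that CompressBuffer and DecompressBuffer pass to CreateCompressor and CreateDecompressor is one of the algorithm IDs from compressapi.h, one per supported engine. The constants below mirror the SDK values and are guarded so the sketch also builds off Windows; `engine_from_name` and its name strings are a hypothetical convenience, for example to select an engine from the command line.

```c
#include <stddef.h>
#include <string.h>

/* Algorithm IDs as defined in compressapi.h. */
#ifndef COMPRESS_ALGORITHM_MSZIP
#define COMPRESS_ALGORITHM_INVALID     0
#define COMPRESS_ALGORITHM_MSZIP       2
#define COMPRESS_ALGORITHM_XPRESS      3
#define COMPRESS_ALGORITHM_XPRESS_HUFF 4
#define COMPRESS_ALGORITHM_LZMS        5
#endif

/* Hypothetical helper: map a short name to the ID expected by
   CreateCompressor/CreateDecompressor. Returns COMPRESS_ALGORITHM_INVALID
   for unknown names. */
static unsigned long engine_from_name(const char *name) {
    static const struct { const char *name; unsigned long id; } table[] = {
        { "mszip",       COMPRESS_ALGORITHM_MSZIP       },
        { "xpress",      COMPRESS_ALGORITHM_XPRESS      },
        { "xpress_huff", COMPRESS_ALGORITHM_XPRESS_HUFF },
        { "lzms",        COMPRESS_ALGORITHM_LZMS        },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].id;
    return COMPRESS_ALGORITHM_INVALID;
}
```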
If you're a developer who wants to sell a Windows application to customers on the Microsoft Store, you must submit a package that uses the Open Packaging Conventions (OPC) format. Visual Studio automates building packages (.msix or .appx) and bundles (.msixbundle or .appxbundle). There's also a well-documented interface (IAppxFactory) for building them manually. Although it isn't intended for general-purpose compression, there's no reason why you can't use it that way. An SDK sample that extracts the contents of packages uses SHCreateStreamOnFileEx to read the package from disk; however, you can also use SHCreateMemStream and decompress a package entirely in memory.
The Windows Imaging (WIM) APIs encode and decode .wim files on disk. WIMCreateFile internally calls CreateFile to return a handle to an archive, which is then used with WIMCaptureImage to compress and add files to the archive. From what I can tell, there's no way to work with .wim files in memory using these APIs.
For Linux, the Windows Imaging (WIM) library supports Xpress, LZX and LZMS algorithms. libmspack and this repo provide good information on the various compression algorithms supported by Windows.
Believe it or not, the best compression ratio on Windows is provided by the Direct3D API. The compression ratio was 60% for a 1MB EXE file, higher than anything else I tested on Windows, and the API is very easy to use. The algorithm used internally isn't documented; it can't be one of the DXT/Block Compression (BC) texture formats, since those are lossy and the data has to be restored exactly. The following example in C uses D3DCompressShaders and D3DDecompressShaders. While untested, I believe the OpenGL API could be used in a similar way.
```c
#include <windows.h>
#include <stdint.h>
#include <d3dcompiler.h>
#pragma comment(lib, "D3DCompiler.lib")

// Uses the C COM interface (lpVtbl), so compile this file as C, not C++.
uint32_t d3d_compress(const void *inbuf, uint32_t inlen) {
    D3D_SHADER_DATA dsa;
    HRESULT         hr;
    ID3DBlob        *blob;
    SIZE_T          outlen = 0;
    LPVOID          outbuf;
    HANDLE          file;
    DWORD           len;

    file = CreateFile("compressed.bin", GENERIC_WRITE, 0, 0,
                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file == INVALID_HANDLE_VALUE) return 0;

    dsa.pBytecode      = inbuf;
    dsa.BytecodeLength = inlen;

    // compress data
    hr = D3DCompressShaders(1, &dsa, D3D_COMPRESS_SHADER_KEEP_ALL_PARTS, &blob);
    if(hr == S_OK) {
      // write to file
      outlen = blob->lpVtbl->GetBufferSize(blob);
      outbuf = blob->lpVtbl->GetBufferPointer(blob);
      WriteFile(file, outbuf, (DWORD)outlen, &len, 0);
      blob->lpVtbl->Release(blob);
    }
    CloseHandle(file);
    return (uint32_t)outlen;
}
```
```c
uint32_t d3d_decompress(const void *inbuf, uint32_t inlen) {
    HRESULT  hr;
    ID3DBlob *blob;
    SIZE_T   outlen = 0;
    LPVOID   outbuf;
    HANDLE   file;
    DWORD    len;

    // create file to save decompressed data to
    file = CreateFile("decompressed.bin", GENERIC_WRITE, 0, 0,
                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file == INVALID_HANDLE_VALUE) return 0;

    // decompress the first (and only) part
    hr = D3DDecompressShaders(inbuf, inlen, 1, 0, NULL, 0, &blob, NULL);
    if(hr == S_OK) {
      // write to file
      outlen = blob->lpVtbl->GetBufferSize(blob);
      outbuf = blob->lpVtbl->GetBufferPointer(blob);
      WriteFile(file, outbuf, (DWORD)outlen, &len, 0);
      blob->lpVtbl->Release(blob);
    }
    CloseHandle(file);
    return (uint32_t)outlen;
}
```
The main problem with dynamically resolving these APIs is knowing which version is installed. The DLL on my Windows 10 system is "D3DCompiler_47.dll"; it will likely be different on legacy systems.
Since Windows 10 build 17063, the tape archiving tool 'bsdtar' has been available. It uses a stripped-down version of the open source Multi-format archive and compression library (libarchive) to create and extract compressed files both in memory and on disk. The version found on Windows supports the bzip2, compress and gzip formats. Although bsdtar lists xz and lzma (along with lzip), they appear to be unsupported, at least on my system.
Windows 10 Fall Creators Update and Windows Server 1709 include support for an OpenSSH client and server. The crypto library used by this port appears to have been compiled from the LibreSSL project and, if available, can be found at C:\Windows\System32\libcrypto.dll. As some of you know, Transport Layer Security (TLS) prior to version 1.3 supports compression before encryption. LibreSSL supports the zlib and RLE methods, so it's entirely possible to use COMP_compress_block and COMP_expand_block to compress and decompress raw data in memory.
The Windows.Storage.Compression namespace, located in Windows.Storage.Compress.dll, internally uses the Windows Compression API. CreateCompressor is invoked with the COMPRESS_RAW flag set. It also invokes SetCompressorInformation with COMPRESS_INFORMATION_CLASS_BLOCK_SIZE if the caller specifies a block size in the Compressor constructor.
Many DLLs on Windows use the DEFLATE algorithm to support various audio, video and image encoders/decoders and file archives. Normally, these deflate routines are internal and can't be resolved dynamically via GetProcAddress.
However, at least Windows 7 through Windows 10 include a DLL called PresentationNative_v0300.dll in the C:\Windows\System32 directory. (There may also be a PresentationNative_v0400.dll, but I haven't investigated it thoroughly enough.) Four exported symbols grabbed my attention: ums_deflate_init, ums_deflate, ums_inflate_init and ums_inflate. For a PoC demonstrating how to use them, see winflate.c.
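The following code uses zlib.h only for the z_stream structure and Z_* constants; the compression itself is done by ums_deflate_init and ums_deflate, and the result is written to a file.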
```c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include "zlib.h"   // only for z_stream and the Z_* constants

// ums_deflate_init_t and ums_deflate_t are function-pointer typedefs (not shown)
DWORD CompressBuffer(LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    LPVOID             outbuf;
    DWORD              wr, outlen = 0;
    HMODULE            m;
    z_stream           ds;
    ums_deflate_t      ums_deflate;
    ums_deflate_init_t ums_deflate_init;
    int                err;

    m = LoadLibrary("PresentationNative_v0300.dll");
    if(m == NULL) {
      printf(" [ unable to load PresentationNative_v0300.dll\n");
      return 0;
    }
    ums_deflate_init = (ums_deflate_init_t)GetProcAddress(m, "ums_deflate_init");
    ums_deflate      = (ums_deflate_t)GetProcAddress(m, "ums_deflate");

    if(ums_deflate_init == NULL || ums_deflate == NULL) {
      printf(" [ unable to resolve deflate API.\n");
      FreeLibrary(m);
      return 0;
    }
    // allocate memory for compressed data
    outbuf = malloc(inlen);
    if(outbuf != NULL) {
      // Compress data and write data to outbuf.
      ds.zalloc    = Z_NULL;
      ds.zfree     = Z_NULL;
      ds.opaque    = Z_NULL;
      ds.avail_in  = (uInt)inlen;      // size of input
      ds.next_in   = (Bytef *)inbuf;   // input buffer
      ds.avail_out = (uInt)inlen;      // size of output buffer
      ds.next_out  = (Bytef *)outbuf;  // output buffer

      // arguments match the shape of zlib's deflateInit_(strm, level, version, stream_size)
      if(ums_deflate_init(&ds, Z_BEST_COMPRESSION, "1", sizeof(ds)) == Z_OK) {
        if((err = ums_deflate(&ds, Z_FINISH)) == Z_STREAM_END) {
          // avail_out is the space left over, so the bytes produced are:
          outlen = inlen - ds.avail_out;
          // write the original length first
          WriteFile(outfile, &inlen, sizeof(DWORD), &wr, NULL);
          // then the compressed data
          WriteFile(outfile, outbuf, outlen, &wr, NULL);
          FlushFileBuffers(outfile);
        } else {
          printf(" [ ums_deflate() : %x\n", err);
        }
      } else {
        printf(" [ ums_deflate_init()\n");
      }
      free(outbuf);
    }
    FreeLibrary(m);
    return outlen;
}
```
Inflating/decompressing the data is based on an example using zlib.
```c
// ums_inflate_init_t and ums_inflate_t are function-pointer typedefs (not shown)
DWORD DecompressBuffer(LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    LPVOID             outbuf;
    DWORD              wr, outlen, written = 0;
    HMODULE            m;
    z_stream           ds;
    ums_inflate_t      ums_inflate;
    ums_inflate_init_t ums_inflate_init;

    // input must hold the 4-byte original length plus compressed data
    if(inlen <= sizeof(DWORD)) return 0;

    m = LoadLibrary("PresentationNative_v0300.dll");
    if(m == NULL) {
      printf(" [ unable to load PresentationNative_v0300.dll\n");
      return 0;
    }
    ums_inflate_init = (ums_inflate_init_t)GetProcAddress(m, "ums_inflate_init");
    ums_inflate      = (ums_inflate_t)GetProcAddress(m, "ums_inflate");

    if(ums_inflate_init == NULL || ums_inflate == NULL) {
      printf(" [ unable to resolve inflate API.\n");
      FreeLibrary(m);
      return 0;
    }
    // allocate memory for decompressed data using the stored original length
    outlen = *(DWORD*)inbuf;
    outbuf = malloc(outlen);
    if(outbuf != NULL) {
      // decompress data and write data to outbuf.
      ds.zalloc    = Z_NULL;
      ds.zfree     = Z_NULL;
      ds.opaque    = Z_NULL;
      ds.avail_in  = (uInt)(inlen - sizeof(DWORD));    // skip the length header
      ds.next_in   = (Bytef*)inbuf + sizeof(DWORD);    // compressed stream
      ds.avail_out = (uInt)outlen;                     // size of output buffer
      ds.next_out  = (Bytef*)outbuf;                   // output buffer

      printf(" [ initializing inflate...\n");
      if(ums_inflate_init(&ds, "1", sizeof(ds)) == Z_OK) {
        printf(" [ inflating...\n");
        if(ums_inflate(&ds, Z_FINISH) == Z_STREAM_END) {
          // bytes produced = buffer size minus the space left over
          written = outlen - ds.avail_out;
          WriteFile(outfile, outbuf, written, &wr, NULL);
          FlushFileBuffers(outfile);
        } else {
          printf(" [ ums_inflate()\n");
        }
      } else {
        printf(" [ ums_inflate_init()\n");
      }
      free(outbuf);
    } else {
      printf(" [ malloc()\n");
    }
    FreeLibrary(m);
    return written;
}
```
That sums up the algorithms I think are suitable for shellcode. For the moment, UCL and apultra seem to provide the best solution. Using the Windows API is a good option, but it's susceptible to monitoring and may not be portable. One area I didn't cover due to time constraints is the Media Foundation API. It may be possible to use its audio, video and image encoders to compress raw data and the decoders to decompress it. Worth researching?
Library / API | Algorithm / Engine | Compression ratio (space saved) |
---|---|---|
RtlCompressBuffer | LZNT1 | 39% |
RtlCompressBuffer | Xpress | 47% |
RtlCompressBuffer | Xpress Huffman | 53% |
Compress | MSZIP | 55% |
Compress | Xpress | 40% |
Compress | Xpress Huffman | 48% |
Compress | LZMS | 58% |
D3DCompressShaders | Undocumented | 60% |
aPLib | N/A | 45% |
UCL | N/A | 42% |
Undocumented API | DEFLATE | 46% |
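Assuming the percentages in the table are space savings (higher is better, as the rankings in the text imply), they correspond to the following, where $U$ is the original size and $C$ the compressed size:

$$
\text{saving} = \left(1 - \frac{C}{U}\right) \times 100\%
$$

For example, a 60% saving on a 1,048,576-byte input leaves about $0.4 \times 1{,}048{,}576 \approx 419{,}430$ bytes of compressed output.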