This post provides a detailed analysis of CVE-2019-8014, which was recently fixed in Adobe Acrobat Reader / Pro DC. Interestingly, it’s a patch bypass of CVE-2013-2729, which was fixed six years ago. This post also discusses how to exploit the vulnerability.
Author: Ke Liu of Tencent Security Xuanwu Lab
0x01. Introduction
Adobe released security updates for Adobe Acrobat and Reader in APSB19-41 in August. As usual, lots of vulnerabilities were fixed in the updates. While I was reviewing the corresponding advisories on ZDI, one of them caught my attention: ZDI-19-725 / CVE-2019-8014. The following are the title and description of this case:
Adobe Acrobat Pro DC AcroForm Bitmap File Parsing Heap-based Buffer Overflow Remote Code Execution Vulnerability
The specific flaw exists within the parsing of run length encoding in BMP images. The issue results from the lack of proper validation of the length of user-supplied data prior to copying it to a fixed-length, heap-based buffer. An attacker can leverage this vulnerability to execute code in the context of the current process.
What surprised me most was that the flaw exists within the parsing of run length encoding in BMP images, because I remembered that a similar case, CVE-2013-2729, was fixed in Adobe Reader six years ago. If you are also wondering what the relationship between CVE-2013-2729 and CVE-2019-8014 is, let me reveal the truth.
By the way, the credit for CVE-2019-8014 goes to ktkitty (https://ktkitty.github.io).
0x02. Debugging Environment
Before diving deep into the details of the vulnerability, let’s set up the debugging environment first. According to APSB19-41, Adobe Acrobat and Reader 2019.012.20035 and earlier versions on Windows were affected, and the fixed version was 2019.012.20036. We’ll carry out our analysis on these two versions.
Steps to install Adobe Acrobat Reader DC 2019.012.20035:
- Download and install 2019.012.20034 (Download Link)
- Upgrade to 2019.012.20035 (Download Link)
Steps to install Adobe Acrobat Reader DC 2019.012.20036:
- Download and install 2019.012.20036 (Download Link)
Please remember to disconnect from the Internet or disable the Adobe Acrobat Update Service; otherwise, Adobe Acrobat Reader DC will be updated automatically.
0x03. Bitmap Structures
Again, before diving deep into the details of the vulnerability, let’s learn some essential concepts of bitmap images. You can skip this section if you’re already familiar with it.
3.1 Structures
Generally speaking, a bitmap image is composed of four parts:
- Bitmap File Header
- Bitmap Info Header
- RGBQUAD Array
- Bitmap Data
3.1.1 Bitmap File Header
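The BITMAPFILEHEADER structure contains information about the type, size, and layout of the bitmap file. The following is the definition of this structure:

```c
typedef struct tagBITMAPFILEHEADER {
  WORD  bfType;
  DWORD bfSize;
  WORD  bfReserved1;
  WORD  bfReserved2;
  DWORD bfOffBits;
} BITMAPFILEHEADER, *LPBITMAPFILEHEADER, *PBITMAPFILEHEADER;
```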
3.1.2 Bitmap Info Header
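The BITMAPINFOHEADER structure contains information about the dimensions and color format of the bitmap. The following is the definition of this structure:

```c
typedef struct tagBITMAPINFOHEADER {
  DWORD biSize;
  LONG  biWidth;
  LONG  biHeight;
  WORD  biPlanes;
  WORD  biBitCount;
  DWORD biCompression;
  DWORD biSizeImage;
  LONG  biXPelsPerMeter;
  LONG  biYPelsPerMeter;
  DWORD biClrUsed;
  DWORD biClrImportant;
} BITMAPINFOHEADER, *LPBITMAPINFOHEADER, *PBITMAPINFOHEADER;
```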
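The value of biCompression represents the compression method of the bitmap. The following are some of its possible values:

```c
#define BI_RGB        0  // uncompressed format
#define BI_RLE8       1  // RLE, 8 bits per pixel
#define BI_RLE4       2  // RLE, 4 bits per pixel
#define BI_BITFIELDS  3  // uncompressed, with color masks
#define BI_JPEG       4  // JPEG image
#define BI_PNG        5  // PNG image
```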
3.1.3 RGBQUAD Array
The RGBQUAD structure describes a color consisting of relative intensities of red, green, and blue. The following is the definition of this structure:

```c
typedef struct tagRGBQUAD {
  BYTE rgbBlue;
  BYTE rgbGreen;
  BYTE rgbRed;
  BYTE rgbReserved;
} RGBQUAD;
```
The elements of the RGBQUAD array make up the color table. The number of entries in the array depends on the values of the biBitCount and biClrUsed members of the BITMAPINFOHEADER structure.
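As a quick illustration, the two headers above can be read with Python’s struct module. This is a minimal sketch for inspecting test files, not how AcroForm parses images; the names parse_bmp_headers, color_table_entries and BmpHeaders are hypothetical. It assumes a little-endian file whose info header is at least a 40-byte BITMAPINFOHEADER, and color_table_entries applies the documented rule that biClrUsed == 0 means the maximum number of colors for biBitCount (bit counts above 8 simply get no color table in this sketch).

```python
import struct
from typing import NamedTuple


class BmpHeaders(NamedTuple):
    bfType: bytes
    bfSize: int
    bfOffBits: int
    biWidth: int
    biHeight: int
    biBitCount: int
    biCompression: int
    biClrUsed: int


def parse_bmp_headers(data: bytes) -> BmpHeaders:
    # BITMAPFILEHEADER: WORD, DWORD, WORD, WORD, DWORD -> 14 bytes, packed
    bf_type, bf_size, _res1, _res2, bf_off_bits = struct.unpack_from("<2sIHHI", data, 0)
    # BITMAPINFOHEADER follows immediately at offset 14 (40 bytes)
    (bi_size, width, height, _planes, bit_count, compression,
     _size_image, _x_ppm, _y_ppm, clr_used, _clr_important) = struct.unpack_from(
        "<IiiHHIIiiII", data, 14)
    if bf_type != b"BM" or bi_size < 40:
        raise ValueError("not a BMP with a BITMAPINFOHEADER")
    return BmpHeaders(bf_type, bf_size, bf_off_bits, width, height,
                      bit_count, compression, clr_used)


def color_table_entries(h: BmpHeaders) -> int:
    # A non-zero biClrUsed gives the entry count directly
    if h.biClrUsed:
        return h.biClrUsed
    # Otherwise images with <= 8 bits per pixel use 2 ** biBitCount entries
    return 1 << h.biBitCount if h.biBitCount <= 8 else 0


# Usage: headers for a 0xF0 x 1, 8-bit, BI_RLE8 image (bfSize left as 0)
header = (struct.pack("<2sIHHI", b"BM", 0, 0, 0, 14 + 40 + 256 * 4)
          + struct.pack("<IiiHHIIiiII", 40, 0xF0, 1, 1, 8, 1, 0, 0, 0, 0, 0))
info = parse_bmp_headers(header)
print(info.biCompression, color_table_entries(info))  # 1 256
```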
3.1.4 Bitmap Data
The pixel data of the bitmap. The layout of this section depends on the compression method of the bitmap.
Note that pixels are usually stored “bottom-up”, starting in the lower left corner, going from left to right, and then row by row from the bottom to the top of the image [Wikipedia].
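The bottom-up order can be made concrete with a small helper that computes where a given display row starts in uncompressed pixel data. This is an illustrative sketch (row_stride and row_offset are hypothetical names); it relies on the documented 4-byte (DWORD) padding of uncompressed scan lines and on the convention that a negative biHeight marks a top-down bitmap.

```python
def row_stride(width: int, bit_count: int) -> int:
    # Uncompressed scan lines are padded to a multiple of 4 bytes (a DWORD)
    return ((width * bit_count + 31) // 32) * 4


def row_offset(y: int, width: int, height: int, bit_count: int) -> int:
    """Byte offset, within the pixel data, of display row y (0 = top)."""
    stride = row_stride(width, bit_count)
    if height > 0:
        # Bottom-up: the first stored scan line is the bottom row of the image
        return (height - 1 - y) * stride
    # A negative biHeight denotes a top-down bitmap
    return y * stride


# A 3-pixel-wide, 24-bit, 2-row image: 9 bytes of pixels padded to 12 per row
print(row_stride(3, 24), row_offset(0, 3, 2, 24), row_offset(1, 3, 2, 24))  # 12 12 0
```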
3.2 Run Length Encoding
Two types of run length encoding methods can be used in bitmap files: RLE4 and RLE8.
3.2.1 RLE8
The RLE8 compression algorithm is used to compress an 8-bit bitmap. This format specifies encoded and absolute modes, and either mode can occur anywhere in a given bitmap.
Encoded mode involves two bytes:
If the first byte of a pair is greater than zero, it specifies the number of consecutive pixels to be drawn using the color index that is contained in the second byte.
If the first byte of a pair is zero and the second byte is 0x02 or less, the second byte is an escape value that can denote the end of a line, the end of the bitmap, or a relative pixel position, as follows.
- 0x00 - End of line
- 0x01 - End of bitmap
- 0x02 - Delta
When a delta is specified, the 2 bytes following the escape value contain unsigned values indicating the horizontal and vertical offsets of the next pixel relative to the current position.
In absolute mode, the first byte is zero, and the second byte is a value in the range 0x03 through 0xFF. The second byte represents the number of bytes that follow, each of which contains the color index of a single pixel. In absolute mode, each run is aligned on a word boundary.
The following example shows the hexadecimal contents of an 8-bit compressed bitmap:
```text
[03 04] [05 06] [00 03 45 56 67] [02 78] [00 02 05 01]
```
The bitmap expands as follows (two-digit values represent a color index for a single pixel):
```text
04 04 04
06 06 06 06 06
45 56 67
78 78
Move current position 5 right and 1 up
```
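The RLE8 rules above can be summarized in a minimal Python decoder. This is a sketch for experimenting with test data, not Adobe’s code; decode_rle8 and its rows-of-lists output are hypothetical choices. It follows the word-alignment rule, so an absolute run with an odd count is followed by one pad byte. Every pixel write is bounds-checked on its own with Python’s unbounded integers, so no 32-bit wraparound is possible here.

```python
def decode_rle8(data: bytes, width: int, height: int) -> list:
    """Decode RLE8 pixel data into rows of color indexes.

    rows[0] is the first scan line decoded (the bottom row of a bottom-up
    bitmap). A write outside width x height raises ValueError.
    """
    rows = [[0] * width for _ in range(height)]
    x = y = 0
    i = 0

    def put(color: int) -> None:
        nonlocal x
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"out-of-bounds write at x={x}, y={y}")
        rows[y][x] = color
        x += 1

    while i + 1 < len(data):
        first, second = data[i], data[i + 1]
        i += 2
        if first > 0:                      # encoded mode: run of one color
            for _ in range(first):
                put(second)
        elif second == 0:                  # escape: end of line
            x, y = 0, y + 1
        elif second == 1:                  # escape: end of bitmap
            break
        elif second == 2:                  # escape: delta, then dx and dy
            if i + 2 > len(data):
                raise ValueError("truncated delta escape")
            x += data[i]
            y += data[i + 1]
            i += 2
        else:                              # absolute mode: `second` literals
            for color in data[i:i + second]:
                put(color)
            i += second + (second & 1)     # skip the pad byte of an odd run
    return rows


# Run, run, end of line, padded absolute run, delta (dx=1), run, end of bitmap
sample = bytes([0x03, 0x04, 0x02, 0x06, 0x00, 0x00,
                0x00, 0x03, 0x45, 0x56, 0x67, 0x00,
                0x00, 0x02, 0x01, 0x00, 0x01, 0x78, 0x00, 0x01])
print(decode_rle8(sample, 5, 2))  # [[4, 4, 4, 6, 6], [69, 86, 103, 0, 120]]
```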
3.2.2 RLE4
The RLE4 compression algorithm is used to compress a 4-bit bitmap. This format specifies encoded and absolute modes, and either mode can occur anywhere in a given bitmap.
Encoded mode involves two bytes. If the first byte of a pair is greater than zero, it specifies the number of consecutive pixels to be drawn using the two color indexes that are contained in the high-order and low-order bits of the second byte.
The first pixel is drawn using the color specified by the high-order 4 bits, the second is drawn using the color in the low-order 4 bits, the third is drawn using the color in the high-order 4 bits, and so on, until all the pixels specified by the first byte have been drawn.
If the first byte of a pair is zero and the second byte is 0x02 or less, the second byte is an escape value that can denote the end of a line, the end of the bitmap, or a relative pixel position, as follows.
- 0x00 - End of line
- 0x01 - End of bitmap
- 0x02 - Delta
When a delta is specified, the 2 bytes following the escape value contain unsigned values indicating the horizontal and vertical offsets of the next pixel relative to the current position.
In absolute mode, the first byte is zero, and the second byte is a value in the range 0x03 through 0xFF. The second byte contains the number of 4-bit color indexes that follow. Subsequent bytes contain color indexes in their high- and low-order 4 bits, one color index for each pixel. In absolute mode, each run is aligned on a word boundary.
The following example shows the hexadecimal contents of a 4-bit compressed bitmap:
```text
[03 04] [05 06] [00 06 45 56 67 00] [04 78] [00 02 05 01]
```
The bitmap expands as follows:
```text
0 4 0
0 6 0 6 0
4 5 5 6 6 7
7 8 7 8
Move current position 5 right and 1 up
```
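The RLE4 variant differs only in how pixels are taken from nibbles. The sketch below mirrors the hypothetical RLE8 decoder style (decode_rle4 is likewise an illustrative name, not Adobe’s code): an encoded run alternates the high and low nibble of its second byte, and an absolute run of n nibbles occupies (n + 1) // 2 bytes, padded to a word boundary. It decodes the 4-bit example above.

```python
def decode_rle4(data: bytes, width: int, height: int) -> list:
    """Decode RLE4 pixel data into rows of 4-bit color indexes (rows[0] first).

    Each pixel write is bounds-checked individually; out-of-range writes
    raise ValueError.
    """
    rows = [[0] * width for _ in range(height)]
    x = y = 0
    i = 0

    def put(color: int) -> None:
        nonlocal x
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"out-of-bounds write at x={x}, y={y}")
        rows[y][x] = color
        x += 1

    while i + 1 < len(data):
        first, second = data[i], data[i + 1]
        i += 2
        if first > 0:                      # encoded mode: alternate two nibbles
            high, low = second >> 4, second & 0x0F
            for k in range(first):
                put(high if k % 2 == 0 else low)
        elif second == 0:                  # escape: end of line
            x, y = 0, y + 1
        elif second == 1:                  # escape: end of bitmap
            break
        elif second == 2:                  # escape: delta, then dx and dy
            if i + 2 > len(data):
                raise ValueError("truncated delta escape")
            x += data[i]
            y += data[i + 1]
            i += 2
        else:                              # absolute mode: `second` nibbles
            nbytes = (second + 1) // 2
            if i + nbytes > len(data):
                raise ValueError("truncated absolute run")
            for k in range(second):
                byte = data[i + k // 2]
                put(byte >> 4 if k % 2 == 0 else byte & 0x0F)
            i += nbytes + (nbytes & 1)     # runs are padded to a word boundary
    return rows


# The 4-bit example above; the final delta only moves the position
example = bytes([0x03, 0x04, 0x05, 0x06, 0x00, 0x06, 0x45, 0x56, 0x67, 0x00,
                 0x04, 0x78, 0x00, 0x02, 0x05, 0x01])
print(decode_rle4(example, 18, 2)[0])
```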
0x04. Vulnerability Details
4.1 Code Identification
According to the advisory on ZDI’s website, the flaw exists within the AcroForm module. AcroForm is the forms plug-in of Adobe Acrobat Reader DC and is responsible for parsing XFA forms. The following is the path of the plug-in binary:

```text
%PROGRAMFILES(X86)%\Adobe\Acrobat Reader DC\Reader\plug_ins\AcroForm.api
```
Generally speaking, when doing patch analysis we may use BinDiff to identify the functions that changed between the old and new versions of a binary. But it’s hard to find the target function if too many functions changed, and that’s the case for AcroForm.api. Here we’ll use some simple tricks to identify the related functions.
The following analysis was carried out on Adobe Acrobat Reader DC 2019.012.20035. The same method can be applied to version 2019.012.20036.
- Search for the string PNG in IDA and we’ll find one at .rdata:20F9A374
- Find cross references to 20F9A374 and we’ll go to function sub_20CF3A3F
- Obviously, function sub_20CF3A3F is responsible for identifying the type of the image
- Find cross references to sub_20CF3A3F and we’ll go to function sub_20CF4BE8
- Function sub_20CF4BE8 calls the corresponding image parsing function according to the image type
- Function sub_20CF3E5F, which is called by function sub_20CF4870, is responsible for parsing bitmap images
The BinDiff result shows that some basic blocks in function sub_20CF3E5F were changed. Let’s take the basic block that begins at 20CF440F as an example to show the difference.

```c
// 20CF440F in AcroForm 2019.012.20035
```
It’s clear that the code was changed to prevent integer overflows.
4.2 Vulnerability Analysis
Thanks to feliam’s write-up for CVE-2013-2729, we can quickly understand what’s going on in function sub_20CF3E5F.
4.2.1 RLE8 Decoding
The following pseudocode, extracted from function sub_20CF3E5F, is responsible for parsing RLE8 compressed data.

```c
if ( bmih.biCompression == 1 ) // RLE8 algorithm
```
Based on the patch analysis above, it’s clear that an integer overflow can be triggered in the following if statement.

```c
// 20CF440F, this basic block was patched in the updated binary file
```
The flaw lies in the arithmetic computation of (unsigned __int8)cmd + xpos. The attacker controls the values of both operands, so an Out-Of-Bounds write can be triggered when decompressing RLE8 compressed data.
- The value of (unsigned __int8)cmd can be controlled directly in the bitmap file:

```c
fn_read_bytes(v1[2], &cmd, 2u); // read 2 bytes
```

- The value of xpos can be controlled by arranging lots of delta commands in encoded mode:

```c
else if ( BYTE1(cmd) == 2 ) // delta
```

- Out-Of-Bounds write can be triggered when decompressing RLE8 compressed data:

```c
index = 0;
```
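Because the pseudocode excerpts above are truncated, the following is only a hypothetical model of the bug class, not the actual AcroForm check. It assumes the unpatched test has the shape count + xpos <= width evaluated on 32-bit unsigned operands, which Python can emulate by masking with 0xFFFFFFFF. The names naive_check and safe_check are illustrative; the second shows one common overflow-free way to phrase the same test.

```python
MASK32 = 0xFFFFFFFF


def naive_check(count: int, xpos: int, width: int) -> bool:
    """Hypothetical model of the unpatched test `count + xpos <= width`,
    evaluated the way C evaluates it on 32-bit unsigned operands (mod 2**32)."""
    return ((count + xpos) & MASK32) <= width


def safe_check(count: int, xpos: int, width: int) -> bool:
    """Overflow-free formulation: validate xpos first, then compare the run
    length against the room left in the scan line, so nothing can wrap."""
    return xpos <= width and count <= width - xpos


width = 0xF0
for count, xpos in [(0x10, 0x20), (0xF0, 0x10), (0x10, 0xFFFFFFF8)]:
    # The last pair wraps: 0x10 + 0xFFFFFFF8 = 0x1_00000008 -> 8 after masking
    print(hex(count), hex(xpos), naive_check(count, xpos, width),
          safe_check(count, xpos, width))
```

In the last case the naive test passes even though xpos lies far outside the scan line, which is exactly the situation the delta commands set up.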
4.2.2 RLE4 Decoding
The following pseudocode, also extracted from function sub_20CF3E5F, is responsible for parsing RLE4 compressed data. The decoding process is almost the same, but a little more complicated than RLE8 because the data unit is a 4-bit nibble rather than a byte.

```c
if ( bmih.biCompression == 2 ) // RLE4 algorithm
```
Integer overflows can be triggered in two spots. One is within the handling of compressed (encoded mode) data:

```c
high_4bits = BYTE1(cmd) >> 4; // high-order 4 bits
```
The other is within the handling of uncompressed (absolute mode) data:

```c
// 20CF44EA, this basic block was patched in the updated binary file
```
0x05. Exploit
5.1 Overflow Candidate
Three integer overflows were found within the function. Here we’ll choose the one within the handling of RLE8 data, since it’s more exploit-friendly than the others.
In terms of RLE4 decoding, the value of xpos is divided by 2 when data is put into the scan line. The maximum offset into the scan line is therefore 0xFFFFFFFF / 2 = 0x7FFFFFFF, which means we can only write forward, and the address we end up writing to is probably out of our control.
For RLE8 decoding, the offset into the scan line is xpos itself, so we can write backward and control the distance. In the following if statement, the maximum value of (unsigned __int8)cmd is 0xFF. To bypass the check, the minimum value of xpos is therefore 0xFFFFFF01, which is -255 as a signed int. In other words, we can write backward by up to 0xFF bytes.
```c
// 20CF440F, this basic block was patched in the updated binary file
```
However, the range we write can only be filled with a single value, because an encoded-mode run repeats one color index. This causes some problems when writing the exploit, as explained later.

```c
index = 0;
```
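The arithmetic from this section can be checked in a few lines of Python, which emulates 32-bit unsigned wraparound with a mask. This assumes, as described above, that the check adds the run length and xpos as 32-bit unsigned values and that the run is then written starting at scanline[xpos]; to_int32 is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed (two's complement) int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


count = 0xFF                   # largest run length an encoded pair can hold
min_xpos = (1 << 32) - count   # smallest xpos for which count + xpos wraps
wrapped = (count + min_xpos) & MASK32
# Assumed write pattern: scanline[xpos], scanline[xpos + 1], ... count bytes
offsets = [to_int32(min_xpos + k) for k in range(count)]

print(hex(min_xpos), to_int32(min_xpos), wrapped, offsets[0], offsets[-1])
# 0xffffff01 -255 0 -255 -1
```

One less (0xFFFFFF00) would give a sum of 0xFFFFFFFF, which fails the check, so 0xFFFFFF01 is indeed the minimum, and the run covers offsets -255 through -1 relative to the scan line.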
5.2 SpiderMonkey Concepts
Adobe Reader uses SpiderMonkey as its JavaScript engine. Before writing the exploit, let’s learn some essential knowledge of the SpiderMonkey engine.
5.2.1 ArrayBuffer
When byteLength is greater than 0x68, the backing store of an ArrayBuffer object is allocated from the system heap (through ucrtbase!calloc); otherwise it is allocated from SpiderMonkey’s tenured heap. Also, when allocated from the system heap, the underlying heap buffer is 0x10 bytes larger to store the ObjectElements header.

```cpp
class ObjectElements {
```
The member names in ObjectElements are meaningless for ArrayBuffer. Here the second member holds the byteLength value and the third holds a pointer to the associated DataView object. The other members are unused and can hold any value.

```javascript
var ab = new ArrayBuffer(0x70);
```
When the above JavaScript code is executed in Adobe Reader, the backing store of the ArrayBuffer object looks like this:

```text
; -, byteLength, viewobj, -,
```
If we can change the byteLength of an ArrayBuffer, we can achieve Out-Of-Bounds access. But be careful with the pointer to the associated DataView object: it can only be 0 or a valid DataView pointer, and the process may crash immediately if we change it to any other value.
5.2.2 Array
When length is greater than 14, the element storage of an Array object can be allocated from the system heap (through ucrtbase!calloc); otherwise it may be allocated from SpiderMonkey’s nursery heap. Also, when allocated from the system heap, the underlying heap buffer is 0x10 bytes larger to store the ObjectElements header.

```cpp
class ObjectElements {
```

```javascript
var array = new Array(15);
```
When the above JavaScript code is executed in Adobe Reader, the underlying storage buffer of the Array object looks like this:

```text
0:010> dd 34cb0f88-10 L90/4
```
The contents of both array[0] and array[14] are 41424344 ffffff81, where the high four bytes, 0xFFFFFF81, indicate that the element type is INT32. The elements within [1, 13] are all filled with 00000000 ffffff84, which means that they’re undefined.
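The dump can be read by splitting each 8-byte slot into a low 32-bit payload and a high 32-bit tag. The helper below is a hypothetical illustration for reading such dumps; it only knows the two tag values named above (0xFFFFFF81 for INT32, and 0xFFFFFF84, which this post identifies as undefined) and reports any other tag as unknown.

```python
import struct

# Tag values named in the dump above; other tags are out of scope here
TAG_NAMES = {0xFFFFFF81: "INT32", 0xFFFFFF84: "undefined"}


def decode_slot(raw: bytes) -> tuple:
    """Split one 8-byte element slot into (payload, type name).

    `dd` lists dwords in address order, and on little-endian x86 the payload
    is the lower dword, so "41424344 ffffff81" is payload 0x41424344 with
    tag 0xFFFFFF81.
    """
    payload, tag = struct.unpack("<II", raw)
    name = TAG_NAMES.get(tag, f"unknown tag {tag:#x}")
    if name == "INT32":
        # INT32 payloads are signed
        payload = struct.unpack("<i", struct.pack("<I", payload))[0]
    return payload, name


for lo, hi in ((0x41424344, 0xFFFFFF81), (0x00000000, 0xFFFFFF84)):
    print(decode_slot(struct.pack("<II", lo, hi)))
```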
If we change the values of capacity and length, we only gain an Out-Of-Bounds write (not a read), and the space between the originally initialized elements and the out-of-bounds elements will be filled with 00000000 ffffff84. That’s not very useful.
It’s not a good idea to change initializedLength to a large value either. The GC scans the array elements, and with a huge initializedLength it will probably run into an inaccessible memory page and crash the process.
5.2.3 JSObject
In SpiderMonkey, almost all JavaScript objects inherit from JSObject, which in turn inherits from ObjectImpl.

```cpp
class ObjectImpl : public gc::Cell {
```
For a DataView object, the elements member points to emptyElementsHeader, a static object inside the JavaScript engine module, so it can be used to leak the base address of that module.

```cpp
static ObjectElements emptyElementsHeader(0, 0);
```
5.3 Bitmap Construct
The following Python code can be used to generate an RLE compressed bitmap image.

```python
#!/usr/bin/env python
```
Here we’ll generate an RLE8 bitmap with the following parameters:
- width is 0xF0
- height is 1
- bit count is 8

The size of the heap buffer will therefore be 0xF0 bytes, and we will be able to write 0xF4 bytes backward with value 0x10.
5.4 PDF Construct
This section explains how to embed the generated BMP image into a PDF file. The following is the PDF template that will be used later.

```text
%PDF-1.7
```
The generated BMP file will be larger than 60MB. It will be base64-encoded and embedded within 6 0 obj of the PDF file. To reduce the file size, this object will be compressed using zlib/deflate.
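The 60MB figure is consistent with a back-of-the-envelope estimate. Assuming (this is not stated in the post) that xpos is advanced only with delta escapes of at most 0xFF pixels each (dy = 0, since the image has a single row) until it reaches the 0xFFFFFF01 value from section 5.1, the delta escapes alone take about 64 MiB, and base64 grows that by a third before deflate shrinks the highly repetitive result. The exploit’s exact target xpos and byte layout may differ slightly; this only checks the order of magnitude.

```python
TARGET_XPOS = 0xFFFFFF01   # smallest wrapping xpos for a 0xFF-pixel run (5.1)
MAX_DX = 0xFF              # dx of a delta escape is a single unsigned byte
DELTA_BYTES = 4            # each delta escape is encoded as 00 02 dx dy

deltas = -(-TARGET_XPOS // MAX_DX)           # ceiling division
delta_stream = deltas * DELTA_BYTES          # raw RLE8 bytes spent on deltas
base64_size = 4 * ((delta_stream + 2) // 3)  # size after base64 encoding

print(deltas, delta_stream, round(delta_stream / 2**20, 2), base64_size)
# 16843009 67372036 64.25 89829384
```

The last delta can carry a smaller dx so that xpos lands exactly on the target.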
To exploit the vulnerability, we need a chance to run JavaScript code both before and after the vulnerability is triggered. This can be done by putting JavaScript code in the initialize event and the docReady event.
The following Python code can be used to generate the PDF file.

```python
#!/usr/bin/env python
```
5.5 Exploit Tricks
5.5.1 Memory Layout (1)
In this case, ArrayBuffer is more suitable for exploiting the vulnerability.
First, we create lots of ArrayBuffer objects with byteLength set to 0xE0, then free one ArrayBuffer object of every pair to create holes. With the extra 0x10-byte header, each backing store is 0xE0 + 0x10 = 0xF0 bytes, the same size as the bitmap’s heap buffer.

```text
┌─────────────┬─────────────┬─────────────┬─────────────┐
```
Then we trigger the vulnerability, and the heap buffer of the bitmap will be placed in one of the holes.
```text
┌─────────────┬─────────────┬─────────────┬─────────────┐
```
Since we can write 0xF4 bytes backward with value 0x10, the backing store of the ArrayBuffer in front of the bitmap buffer will be filled with 0x10.

```text
0:014> dd 304c8398
```
Now the byteLength of that ArrayBuffer object has been changed to 0x10101010, and we have Out-Of-Bounds access. So far so good? Not really: the process will crash immediately because we also overwrote the DataView pointer.
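A short sketch of why both fields change at once: per section 5.2.1, the 16-byte header holds (unused, byteLength, DataView pointer, unused) as little-endian dwords, so a run of 0x10 bytes covering it gives byteLength and the DataView pointer the same value. read_arraybuffer_header is a hypothetical helper, and the “before” header is an assumed example for a 0xE0-byte buffer with no DataView attached.

```python
import struct


def read_arraybuffer_header(header: bytes) -> tuple:
    """Return (byteLength, DataView pointer) from a 16-byte header laid out
    as described in section 5.2.1: (unused, byteLength, viewobj, unused)."""
    _unused0, byte_length, view_ptr, _unused1 = struct.unpack("<4I", header)
    return byte_length, view_ptr


before = struct.pack("<4I", 0, 0xE0, 0, 0)   # assumed: 0xE0 bytes, no DataView
after = bytes([0x10]) * 16                   # header covered by the 0x10 run

for label, header in (("before", before), ("after", after)):
    byte_length, view_ptr = read_arraybuffer_header(header)
    print(label, hex(byte_length), hex(view_ptr))
```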
5.5.2 Memory Layout (0)
We can avoid the crash if we make 0x10101010 act like a valid pointer. Obviously, the memory layout must be arranged before triggering the vulnerability. To make it more stable, this should be done even before we create and free the ArrayBuffer objects.
We need the ability to put controlled values at a chosen memory address, such as 0x10101010. To achieve this goal, we can create lots of ArrayBuffer objects with byteLength set to 0xFFE8. This size is carefully selected to make sure that the ArrayBuffer objects will be allocated at predictable addresses.

```javascript
// 0xFFE8 -> byteLength
```
I’m not going to discuss how to avoid the crash in detail; the specific conditions are easy to figure out. The following code can be used to avoid the crash.

```javascript
function fillHeap() {
```
We’re not done yet. The process still crashes when we try to create a new DataView object for it. We can avoid that crash using the same tricks. The following is the improved code.

```javascript
function fillHeap() {
```
5.5.3 Global Read / Write
Once we’ve overwritten the byteLength of any ArrayBuffer object with 0x10101010, we can use this ArrayBuffer object to overwrite the next one’s byteLength with 0xFFFFFFFF. It’s easy to find the next ArrayBuffer object if we put a flag value in all the ArrayBuffer objects.

```text
(1)byteLength (3)Global Access
```
Now we have the ability to read and write any memory address within the user space.
5.5.4 Absolute Address Access
Once we have global access, we can search backward to calculate the base address of the ArrayBuffer object’s backing store, and thus read and write at any given absolute memory address.
We can search for either of two flags, ffeeffee or f0e0d0c0, to calculate the base address. To make it more accurate, the bytes around the flag value also need to be verified.

```text
0:014> dd 30080000
```
5.5.5 Remaining Steps
Once we can read and write at any given absolute memory address, achieving code execution is straightforward. The remaining steps, which will not be discussed in this post, are:
- EIP hijack
- ASLR bypass
- DEP bypass
- CFG bypass
0x06. CVE-2013-2729
Three integer overflows were found within the handling of RLE compressed data: one in RLE8 decompression and two in RLE4 decompression.
Why not four? Because the fourth one had already been patched six years ago. You can read feliam’s write-up for CVE-2013-2729 if you haven’t read it yet.
Indeed, the patch for CVE-2013-2729 can be found within the handling of RLE8 compressed data.

```c
dst_xpos = BYTE1(cmd) + xpos;
```
It’s astonishing that Adobe only patched the case that was reported and ignored the other three.
0x07. Lessons Learned
For product developers, please try to understand the root cause of the vulnerability and eliminate similar ones as much as you can.
For security researchers, patch analysis is a good way to figure out what the developers were thinking, and you may even find a way to bypass the patch (this happens sometimes).