CVE-2024-4367 – Arbitrary JavaScript execution in PDF.js
2024-5-21 01:22:14 Author: movaxbx.ru(查看原文) 阅读量:40 收藏

research by Thomas Rinsma

CVE-2024-4367 – Arbitrary JavaScript execution in PDF.js

TL;DR 

This post details CVE-2024-4367, a vulnerability in PDF.js found by Codean Labs. PDF.js is a JavaScript-based PDF viewer maintained by Mozilla. This bug allows an attacker to execute arbitrary JavaScript code as soon as a malicious PDF file is opened. This affects all Firefox users (<126) because PDF.js is used by Firefox to show PDF files, but also seriously impacts many web- and Electron-based applications that (indirectly) use PDF.js for preview functionality.

If you are a developer of a JavaScript/Typescript-based application that handles PDF files in any way, we recommend checking that you are not (indirectly) using a version a vulnerable version of PDF.js. See the end of this post for mitigation details.

Introduction 

There are two common use-cases for PDF.js. First, it is Firefox’s built-in PDF viewer. If you use Firefox and you’ve ever downloaded or browsed to a PDF file you’ll have seen it in action. Second, it is bundled into a Node module called 

, with ~2.7 million weekly downloads according to NPM. In this form, websites can use it to provide embedded PDF preview functionality. This is used by everything from Git-hosting platforms to note-taking applications. The one you’re thinking of now is likely using PDF.js.

The PDF format is famously complex. With support for various media types, complicated font rendering and even rudimentary scripting, PDF readers are a common target for vulnerability researchers. With such a large amount of parsing logic, there are bound to be some mistakes, and PDF.js is no exception to this. What makes it unique however is that it is written in JavaScript as opposed to C or C++. This means that there is no opportunity for memory corruption problems, but as we will see it comes with its own set of risks.

Glyph rendering 

You might be surprised to hear that this bug is not related to the PDF format’s (JavaScript!) scripting functionality. Instead, it is an oversight in a specific part of the font rendering code.

Fonts in PDFs can come in several different formats, some of them more obscure than others (at least for us). For modern formats like TrueType, PDF.js defers mostly to the browser’s own font renderer. In other cases, it has to manually turn glyph (i.e., character) descriptions into curves on the page. To optimize this for performance, a path generator function is pre-compiled for every glyph. If supported, this is done by making a JavaScript 

 object with a body (

) containing the instructions that make up the path:

// If we can, compile cmds into JS for MAXIMUM SPEED...
if (this.isEvalSupported && FeatureTest.isEvalSupported) {
  const jsBuf = [];
  for (const current of cmds) {
    const args = current.args !== undefined ? current.args.join(",") : "";
    jsBuf.push("c.", current.cmd, "(", args, ");\n");
  }
  // eslint-disable-next-line no-new-func
  console.log(jsBuf.join(""));
  return (this.compiledGlyphs[character] = new Function(
    "c",
    "size",
    jsBuf.join("")
  ));
}

From an attacker perspective this is really interesting: if we can somehow control these 

 going into the 

 body and insert our own code, it would be executed as soon as such a glyph is rendered.

Well, let’s look at how this list of commands is generated. Following the logic back to the 

 class we find the method 

. This method initializes the 

 array with a few general commands (

 and 

), and defers to a 

 method to fill in the actual 

compileGlyph(code, glyphId) {
    if (!code || code.length === 0 || code[0] === 14) {
      return NOOP;
    }

    let fontMatrix = this.fontMatrix;
    ...

    const cmds = [
      { cmd: "save" },
      { cmd: "transform", args: fontMatrix.slice() },
      { cmd: "scale", args: ["size", "-size"] },
    ];
    this.compileGlyphImpl(code, cmds, glyphId);

    cmds.push({ cmd: "restore" });

    return cmds;
  }

If we instrument the PDF.js code to log generated 

 objects, we see that the generated code indeed contains those commands:

c.save();
c.transform(0.001,0,0,0.001,0,0);
c.scale(size,-size);
c.moveTo(0,0);
c.restore();

At this point we could audit the font parsing code and the various commands and arguments that can be produced by glyphs, like 

 and 

, but all of this seems pretty innocent with no ability to control anything other than numbers. What turns out to be much more interesting however is the 

 command we saw above:

{ cmd: "transform", args: fontMatrix.slice() },

This 

 array is copied (with 

) and inserted into the body of the 

 object, joined by commas. The code clearly assumes that it is a numeric array, but is that always the case? Any string inside this array would be inserted literally, without any quotes surrounding it. Hence, that would break the JavaScript syntax at best, and give arbitrary code execution at worst. But can we even control the contents of 

 to that degree?

Enter the FontMatrix 

The value of 

 defaults to 

[0.001, 0, 0, 0.001, 0, 0]

, but is often set to a custom matrix by a font itself, i.e., in its own embedded metadata. How this is done exactly differs per font format. Here’s the Type1parser for example:

extractFontHeader(properties) {
    let token;
    while ((token = this.getToken()) !== null) {
      if (token !== "/") {
        continue;
      }
      token = this.getToken();
      switch (token) {
        case "FontMatrix":
          const matrix = this.readNumberArray();
          properties.fontMatrix = matrix;
          break;
        ...
      }
      ...
    }
    ...
  }

This is not very interesting for us. Even though Type1 fonts technically contain arbitrary Postscript code in their header, no sane PDF reader supports this fully and most just try to read predefined key-value pairs with expected types. In this case, PDF.js just reads a number array when it encounters a 

 key. It appears that the 

 parser — used for several other font formats — is similar in this regard. All in all, it looks like we are indeed limited to numbers.

However, it turns out that there is more than one potential origin of this matrix. Apparently, it is also possible to specify a custom 

 value outside of a font, namely in a metadata object in the PDF! Looking carefully at the 

PartialEvaluator.translateFont(...)

 method, we see that it loads various attributes from PDF dictionaries associated with the font, one of them being the 

:

const properties = {
      type,
      name: fontName.name,
      subtype,
      file: fontFile,
      ...
      fontMatrix: dict.getArray("FontMatrix") || FONT_IDENTITY_MATRIX,
      ...
      bbox: descriptor.getArray("FontBBox") || dict.getArray("FontBBox"),
      ascent: descriptor.get("Ascent"),
      descent: descriptor.get("Descent"),
      xHeight: descriptor.get("XHeight") || 0,
      capHeight: descriptor.get("CapHeight") || 0,
      flags: descriptor.get("Flags"),
      italicAngle: descriptor.get("ItalicAngle") || 0,
      ...
    };

In the PDF format, font definitions consists of several objects. The 

, its 

 and the actual 

. For example, here represented by objects 1, 2 and 3:

1 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /FontDescriptor 2 0 R
  /BaseFont /FooBarFont
>>
endobj

2 0 obj
<<
  /Type /FontDescriptor
  /FontName /FooBarFont
  /FontFile 3 0 R
  /ItalicAngle 0
  /Flags 4
>>
endobj

3 0 obj
<<
  /Length 100
>>
... (actual binary font data) ...
endobj

The 

 referenced by the code above refers to the 

 object. Hence, we should be able to define a custom 

 array like this:

1 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /FontDescriptor 2 0 R
  /BaseFont /FooBarFont
  /FontMatrix [1 2 3 4 5 6]   % <-----
>>
endobj

When attempting to do this it initially looks like this doesn’t work, as the 

 operations in generated 

 bodies still use the default matrix. However, this happens because the font file itself is overwriting the value. Luckily, when using a Type1 font without an internal 

 definition, the PDF-specified value is authoritative as the 

 value is not overwritten.

Now that we can control this array from a PDF object we have all the flexibility we want, as PDF supports more than just number-type primitives. Let’s try inserting a string-type value instead of a number (in PDF, strings are delimited by parentheses):

/FontMatrix [1 2 3 4 5 (foobar)]

And indeed, it is plainly inserted into the 

 body!

c.save();
c.transform(1,2,3,4,5,foobar);
c.scale(size,-size);
c.moveTo(0,0);
c.restore();

Exploitation and impact 

Inserting arbitrary JavaScript code is now only a matter of juggling the syntax properly. Here’s a classical example triggering an alert, by first closing the 

 function, and making use of the trailing parenthesis:

/FontMatrix [1 2 3 4 5 (0\); alert\('foobar')]

The result is exactly as expected:

Exploitation of CVE-2024-4367

You can find a proof-of-concept PDF file here. It is made to be easy to adapt using a regular text editor. To demonstrate the context in which the JavaScript is running, the alert will show you the value of 

. Interestingly enough, this is not the 

 path you see in the URL bar (if you’ve downloaded the file). Instead, PDF.js runs under the origin 

. This prevents access to local files, but it is slightly more privileged in other aspects. For example, it is possible to invoke a file download (through a dialog), even to “download” arbitrary 

 URLs. Additionally, the real path of the opened PDF file is stored in 

window.PDFViewerApplication.url

, allowing an attacker to spy on people opening a PDF file, learning not just when they open the file and what they’re doing with it, but also where the file is located on their machine.

In applications that embed PDF.js, the impact is potentially even worse. If no mitigations are in place (see below), this essentially gives an attacker an XSS primitive on the domain which includes the PDF viewer. Depending on the application this can lead to data leaks, malicious actions being performed in the name of a victim, or even a full account take-over. On Electron apps that do not properly sandbox JavaScript code, this vulnerability even leads to native code execution (!). We found this to be the case for at least one popular Electron app.

Mitigation 

At Codean Labs we realize it is difficult to keep track of dependencies like this and their associated risks. It is our pleasure to take this burden from you. We perform application security assessments in an efficient, thorough and human manner, allowing you to focus on development. Click here to learn more.

The best mitigation against this vulnerability is to update PDF.js to version 4.2.67 or higher. Most wrapper libraries like 

 

have also released patched versions. Because some higher level PDF-related libraries statically embed PDF.js, we recommend recursively checking your 

 folder for files called 

to be sure. Headless use-cases of PDF.js (e.g., on the server-side to obtain statistics and data from PDFs) seem not to be affected, but we didn’t thoroughly test this. It is also advised to update.

Additionally, a simple workaround is to set the PDF.js setting 

 to 

. This will disable the vulnerable code-path. If you have a strict content-security policy (disabling the use of 

 and the 

constructor), the vulnerability is also not reachable.

Timeline 

  • 2024-04-26 – vulnerability disclosed to Mozilla
  • 2024-04-29 – PDF.js v4.2.67 released to NPM, fixing the issue
  • 2024-05-14 – Firefox 126, Firefox ESR 115.11 and Thunderbird 115.11 released including the fixed version of PDF.js
  • 2024-05-20 – publication of this blogpost

文章来源: https://movaxbx.ru/2024/05/20/cve-2024-4367-arbitrary-javascript-execution-in-pdf-js/
如有侵权请联系:admin#unsafe.sh