The Power of Taint Analysis: Uncovering Critical Code Vulnerability in OpenAPI Generator

2024-10-22 23:00:00 Author: securityboulevard.com

The OpenAPI Generator is a popular tool with more than 20k stars on GitHub that allows users to automatically generate source code based on an OpenAPI spec. This code generation is also available via a web API, which can be self-hosted but is also publicly available at https://api.openapi-generator.tech/.

In our continuous effort to help secure open-source projects and improve our Clean Code solution, we regularly scan open-source projects via SonarCloud and evaluate the findings. Anyone can do the same: SonarCloud is a free code analysis product for open-source projects, regardless of their size or language.

When scanning the code base of the OpenAPI Generator, SonarCloud reported a complex taint flow vulnerability that propagates user-controlled data through 28 steps to a dangerous sink.

In this blog post, we will explain the technical details behind this taint flow vulnerability, which became CVE-2024-35219, a critical arbitrary file read and deletion vulnerability in the OpenAPI Generator.

Impact

OpenAPI Generator versions 7.5.0 and below are prone to an Arbitrary File Read/Delete vulnerability. Attackers can exploit this vulnerability to read and delete files and folders from an arbitrary, writable directory.

The vulnerability is tracked as CVE-2024-35219 and has been fixed with pull request #18652, which is included in version 7.6.0.

Technical Details

In this section, we will explain how the technique that SonarCloud uses to identify taint flow vulnerabilities works and then examine the specific vulnerability in the OpenAPI Generator.

Taint Analysis

Taint analysis is one of the techniques that the engine powering SonarQube and SonarCloud uses to identify security vulnerabilities in the analyzed source code. So, what is taint analysis?

An application’s logic is all about data, which is passed from one part of the code to another. For example, when you call a method, you pass some data to it as a parameter. This method may call another method and pass the parameter on again. This flow of data can be visualized as a graph.

The entry points to this data flow are called Sources. An example is the request body of an API handler method. At this point, an attacker can feed data to the application and thus control the data that is passed onwards.

The counterpart to a Source is a dangerous Sink at the end of a flow. A Sink is a function or method that is known to be security-relevant when attacker-controlled data reaches it.

From a security point of view, the big question is whether an attacker can reach a security-sensitive sink. In other words: Is there a path from a Source to a Sink?

In the above example, the answer is yes. Data originating from an attacker-controllable Source eventually reaches a dangerous Sink. The steps in between the flow from the Source to the Sink are called Passthrough as these simply pass on the data.
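To make the three roles concrete, here is a minimal sketch of such a shallow flow in Java. The class and method names are hypothetical and only serve as an illustration; they are not taken from any real project.

```java
import java.io.File;

// Toy illustration of a shallow taint flow; all names are hypothetical.
final class ShallowTaintFlow {

    // SOURCE: in a real web application, this value would come from the request.
    static String readUserInput(String requestParameter) {
        return requestParameter;
    }

    // PASSTHROUGH: the data is handed on without any validation.
    static String buildPath(String baseDir, String userInput) {
        return baseDir + File.separator + userInput;
    }

    // SINK: a file system operation on a path derived from user input.
    static String[] listDirectory(String path) {
        return new File(path).list();
    }

    public static void main(String[] args) {
        String tainted = readUserInput(args.length > 0 ? args[0] : "..");
        String path = buildPath(System.getProperty("java.io.tmpdir"), tainted);
        String[] entries = listDirectory(path);
        System.out.println(path + " -> " + (entries == null ? 0 : entries.length) + " entries");
    }
}
```

With only three methods involved, the problem is easy to spot during code review, which is exactly why flows like this rarely survive until production.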

In a real application, a shallow taint flow like the one above is not very realistic: it would have been easily spotted manually and would never have made it to production. However, a major advantage of taint analysis is that it can follow all code paths and find even very complex taint flows to a Sink hidden deep in the application’s source code.

In this example, the tainted data from a Source traverses many method and function calls before reaching a Sink. This critical flow is much harder to identify manually.
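The core question, whether a path from a Source to a Sink exists, can be sketched as a plain graph search. The following example is a deliberately simplified illustration and not how the SonarCloud engine is implemented: it assumes the data flow has already been extracted into a hand-written map of hypothetical node names, and it ignores aspects such as sanitizers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified reachability check over a hand-written data flow graph (hypothetical node names).
final class TaintPathFinder {

    // Returns the shortest path from source to sink, or an empty list if the sink is unreachable.
    static List<String> findPath(Map<String, List<String>> flows, String source, String sink) {
        Map<String, String> predecessor = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        predecessor.put(source, null);
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(sink)) {
                // Walk back through the predecessors to reconstruct the flow.
                List<String> path = new ArrayList<>();
                for (String n = sink; n != null; n = predecessor.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : flows.getOrDefault(node, List.of())) {
                if (!predecessor.containsKey(next)) {
                    predecessor.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> flows = Map.of(
                "requestBody", List.of("parseInput", "logRequest"),
                "parseInput", List.of("buildPath"),
                "buildPath", List.of("listFiles"));
        // Prints [requestBody, parseInput, buildPath, listFiles]
        System.out.println(findPath(flows, "requestBody", "listFiles"));
    }
}
```

The returned list corresponds to the sequence of steps that a tool can present to the developer, from the Source through each Passthrough to the Sink.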

OpenAPI Generator Vulnerability

With this background knowledge, let’s have a look at the taint flow vulnerability SonarCloud reported for the OpenAPI Generator:

The issue can also be viewed directly on SonarCloud.

On the left side of the SonarCloud UI, we can see all the steps of the vulnerable flow. The first step is highlighted as SOURCE. This is the entry point where attackers might be able to feed data into the application. In this case, the entry point is the @RequestBody sent to the API endpoint /gen/clients/{language}, as we can see in the source code on the right side. SonarCloud highlights the source code so that the flow can be tracked easily. Following the flow, we can see that the request body is supposed to contain a GeneratorInput object, which is highlighted as the next step.

An example request to the /gen/clients/{language} endpoint with a GeneratorInput object looks like this:

POST /api/gen/clients/csharp HTTP/1.1
Host: api.openapi-generator.tech
...

{
  "authorizationValue": {
    "keyName": "string",
    "type": "string",
    "value": "string"
  },
  "openAPIUrl": "https://raw.githubusercontent.com/OpenAPITools/openapi-generator/master/modules/openapi-generator/src/test/resources/2_0/petstore.yaml",
  "options": {},
  "spec": {}
}

The provided JSON body is mapped to a GeneratorInput object that looks like this:

package org.openapitools.codegen.online.model;

// ...

public class GeneratorInput {
    private JsonNode spec;
    private Map<String, String> options;
    private String openAPIUrl;
    private AuthorizationValue authorizationValue;
    // ...
}

By following the flow on SonarCloud, we can see that this GeneratorInput object is eventually passed to a call to Generator::generate as the opts parameter. If the options member is set (opts.getOptions), the destPath is populated with the outputFolder option:

package org.openapitools.codegen.online.service;

// ...

public class Generator {
    // ...
    private static String generate(String language, GeneratorInput opts, Type type) {
        // ...
        if (opts.getOptions() != null) {
            destPath = opts.getOptions().get("outputFolder");
        }
        // ...

This destPath is then appended to the path of the temporary folder to form the final outputFolder directory, which is used to store all generated source code files. This directory is passed to a call to the zip.compressFiles method, which stores all generated source code files in a zip archive:

        // ...
        String outputFolder = getTmpFolder().getAbsolutePath() + File.separator + destPath;
        // ...
        try {
            List<File> files = new DefaultGenerator().opts(clientOptInput).generate();
            if (files.size() > 0) {
                List<File> filesToAdd = new ArrayList<>();
                LOGGER.debug("adding to {}", outputFolder);
                filesToAdd.add(new File(outputFolder));
                ZipUtil zip = new ZipUtil();
                zip.compressFiles(filesToAdd, outputFilename);
                // ...

Further following the flow on SonarCloud, we can see that the zip.compressFiles method iterates over all files and folders in the provided directory and stores them in a zip archive via addFolderToZip and addFileToZip:

    public void compressFiles(List<File> listFiles, String destZipFile)
            throws IOException {

        try (FileOutputStream fileOutputStream = new FileOutputStream(destZipFile);
             ZipOutputStream zos = new ZipOutputStream(fileOutputStream)) {

            for (File file : listFiles) {
                if (file.isDirectory()) {
                    addFolderToZip(file, file.getName(), zos);
                } else {
                    addFileToZip(file, zos);
                }
            }

            zos.flush();
        }
    }

In the case of the user-controlled outputFolder, the flow continues with a call to addFolderToZip as we can see in the SonarCloud UI. Here, this complex taint flow ends with the invocation of the listFiles method on a user-controlled File object in step 28. This is the final Sink:

As indicated by the message beneath the Sink, processing this File object – in this case, with a call to listFiles – is dangerous, because the path of the File object was constructed based on user-controlled data.
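The body of addFolderToZip is not shown above. To illustrate why listFiles is a sensitive call here, the following is a minimal sketch of how such a recursive helper could look. It is our own simplified stand-in under that assumption, not the actual ZipUtil code from the OpenAPI Generator, and the zipFolder method is a hypothetical convenience wrapper.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical stand-in for a recursive zip helper; not the real ZipUtil implementation.
final class FolderZipSketch {

    // Recursively adds every file below 'folder' to the archive.
    // File.listFiles() enumerates whatever directory the File object points to.
    static void addFolderToZip(File folder, String parentName, ZipOutputStream zos) throws IOException {
        File[] children = folder.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            String entryName = parentName + "/" + child.getName();
            if (child.isDirectory()) {
                addFolderToZip(child, entryName, zos);
            } else {
                zos.putNextEntry(new ZipEntry(entryName));
                Files.copy(child.toPath(), zos);
                zos.closeEntry();
            }
        }
    }

    static void zipFolder(File folder, File destZip) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(destZip))) {
            addFolderToZip(folder, folder.getName(), zos);
        }
    }

    public static void main(String[] args) throws IOException {
        File folder = Files.createTempDirectory("generated").toFile();
        Files.writeString(folder.toPath().resolve("Client.cs"), "// generated");
        File zip = File.createTempFile("bundle", ".zip");
        zipFolder(folder, zip);
        System.out.println("wrote " + zip + " (" + zip.length() + " bytes)");
    }
}
```

A helper like this copies everything it finds, so whoever controls the File object also controls which directory ends up in the archive.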

Security Impact

Since attackers can control this path and there is no verification that the provided directory resides within the intended temporary folder, attackers can use a path traversal sequence (../) to target an arbitrary, writable folder. The zip.compressFiles method recursively adds all files and folders from this directory to the zip archive, which can then be downloaded. For example, the following request can be used to set the directory to /home/user/.ssh:

POST /api/gen/clients/csharp HTTP/1.1
Host: api.openapi-generator.tech
...

{
  "authorizationValue": {
    "keyName": "string",
    "type": "string",
    "value": "string"
  },
  "openAPIUrl": "https://raw.githubusercontent.com/OpenAPITools/openapi-generator/master/modules/openapi-generator/src/test/resources/2_0/petstore.yaml",
  "options": {"outputFolder":"../../../../home/user/.ssh"},
  "spec": {}
}
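To see why this payload escapes the temporary folder, we can replay the string concatenation from Generator::generate and normalize the result. This is a small sketch: the temporary folder /tmp/codegen/gen/a1b2 is an assumed placeholder, since the real location depends on the server, and resolveOutputFolder is a hypothetical helper.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Replays the path concatenation with an assumed temporary folder.
final class TraversalDemo {

    // Mirrors the concatenation from Generator::generate and normalizes the result.
    static Path resolveOutputFolder(String tmpFolder, String destPath) {
        String outputFolder = tmpFolder + File.separator + destPath;
        return Paths.get(outputFolder).normalize();
    }

    public static void main(String[] args) {
        // Assumption: the temporary folder is four levels below the file system root.
        String tmpFolder = "/tmp/codegen/gen/a1b2";
        System.out.println(resolveOutputFolder(tmpFolder, "csharp-client"));
        System.out.println(resolveOutputFolder(tmpFolder, "../../../../home/user/.ssh"));
    }
}
```

With the default-style value, the result stays below the temporary folder. With the attacker's value, each ../ removes one level of the temporary folder path, so on a Unix-like system the second call resolves to /home/user/.ssh.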

The generated source code files will be stored in /home/user/.ssh. All files and folders in this directory will be added to the zip archive, which can be downloaded via the /gen/download/{fileId} endpoint. This way, all files and folders from an arbitrary folder can be exfiltrated, including a potentially existing SSH key (id_rsa) in this case:

user@host:~$ zipinfo csharp-client-generated.zip
Archive:  csharp-client-generated.zip
Zip file size: 42785 bytes, number of entries: 34
...
-rw----     2.0 fat     1113 bl defN 24-Apr-26 06:26 .ssh/id_rsa
...

Downloading the generated zip archive also has a destructive side effect: the parent folder of the targeted directory, including all of its files and folders, is deleted once the zip archive has been generated. In this case, this includes all files and folders in /home/user:

    public ResponseEntity downloadFile(String fileId) {
        Generated g = fileMap.get(fileId);
        // ...

        File file = new File(g.getFilename());
        // ...
        try {
            FileUtils.deleteDirectory(file.getParentFile());
            // ...

Thus, attackers can use this vulnerability not only to read arbitrary files and folders, but also to delete them.
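The effect of this cleanup can be reproduced safely in a sandbox. The sketch below makes two assumptions that are not shown in the excerpt: the archive is placed next to the attacker-chosen output folder (the file name .ssh-bundle.zip is hypothetical), and a small helper based on java.nio stands in for FileUtils.deleteDirectory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sandbox simulation of the cleanup step; file names and layout are assumptions.
final class ParentDeletionDemo {

    // Stand-in for FileUtils.deleteDirectory: removes a directory tree, children first.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Builds a sandbox, simulates the cleanup, and reports whether the victim's home folder is gone.
    static boolean simulateCleanup() throws IOException {
        Path sandbox = Files.createTempDirectory("sandbox");
        Path home = Files.createDirectories(sandbox.resolve("home").resolve("user"));
        Path ssh = Files.createDirectories(home.resolve(".ssh"));
        Files.writeString(ssh.resolve("id_rsa"), "dummy key");
        Files.writeString(home.resolve("notes.txt"), "unrelated file");

        // Assumption: the archive sits next to the attacker-chosen output folder.
        Path archive = home.resolve(".ssh-bundle.zip");
        Files.writeString(archive, "dummy archive");

        // Same pattern as downloadFile: delete the parent folder of the archive.
        deleteRecursively(archive.getParent());
        boolean homeDeleted = Files.notExists(home);

        deleteRecursively(sandbox);
        return homeDeleted;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("home folder deleted: " + simulateCleanup());
    }
}
```

Under these assumptions, the unrelated notes.txt is removed together with the .ssh folder, because the deletion targets the parent of the archive rather than the generated output itself.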

Patch

While identifying a vulnerability as deeply hidden as this can be difficult, the actual patching process is typically straightforward. The issue was fixed by removing the code that concatenates the attacker-controllable option into the destination folder:

-        String destPath = null;
-
-        if (opts.getOptions() != null) {
-            destPath = opts.getOptions().get("outputFolder");
-        }
-        if (destPath == null) {
-            destPath = language + "-" + type.getTypeName();
-        }
+        // do not use opts.getOptions().get("outputFolder") as the input can contain ../../
+        // to access other folders in the server
+        String destPath = language + "-" + type.getTypeName();
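The actual fix removes the user-controlled option entirely, which is the most robust choice. For cases where a user-provided sub-path is genuinely required, a containment check is a common alternative. The following is a minimal sketch of that idea and not part of the OpenAPI Generator patch; OutputFolderGuard and resolveInside are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical containment check; not part of the OpenAPI Generator fix.
final class OutputFolderGuard {

    // Resolves 'requested' below 'baseDir' and rejects anything that escapes it.
    static Path resolveInside(Path baseDir, String requested) throws IOException {
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(requested).normalize();
        if (!candidate.startsWith(base)) {
            throw new IOException("Path escapes base directory: " + requested);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path base = Paths.get("/tmp/codegen/a1b2"); // assumed temporary folder
        for (String requested : new String[] {"csharp-client", "../../../../home/user/.ssh"}) {
            try {
                System.out.println("accepted: " + resolveInside(base, requested));
            } catch (IOException e) {
                System.out.println("rejected: " + requested);
            }
        }
    }
}
```

Note that this check only compares normalized path strings. It does not resolve symbolic links, so a hardened version would additionally need to compare real paths of existing directories.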

Timeline

Date	Action
2024-04-29	We report all issues to the OpenAPI Generator maintainers.
2024-05-11	We reach out to the maintainers again to ask for the status.
2024-05-13	The maintainers share a fix with us for review.
2024-05-21	The fix is released as part of version 7.6.0.
2024-05-21	CVE-2024-35219 is assigned.
2024-05-27	The related security advisory is made public.

Summary

In this blog post, we have seen how taint analysis can uncover deeply hidden vulnerabilities in source code. By tracking data from its origin (Source) to its ultimate use (Sink), this method can unveil complex taint flows that could lead to severe security vulnerabilities.

We examined a real-world example by covering a critical vulnerability in the OpenAPI Generator, which is based on a complex taint flow that SonarCloud detected. This discovery highlights the importance of leveraging SAST-based tools like SonarQube and SonarCloud to safeguard your application against these deeply hidden vulnerabilities.

Finally, we would like to thank the OpenAPI Generator maintainers for providing a comprehensive patch and transparently informing all users.