A Hands-On Intro to Semgrep's Autofix
2021-10-26 12:00:47 Author: parsiya.net(查看原文) 阅读量:88 收藏

Semgrep's experimental autofix feature can automagically modify vulnerable code. A few things can be fixed like this but it's worth exploring. This post is an introduction to creating fixes for your Semgrep rules.

I have included links to the playground for practicing. If you prefer running the rules via the command-line please see the rules and code at https://github.com/parsiya/Parsia-Code/tree/master/semgrep-autofix.

Note: This is an experimental feature. The following is correct at the time of writing (October 2021). If you are from the future, things will be different.

You don't have to be an appsec guru but this blog assumes you are somewhat familiar with:

  1. Semgrep and have done https://semgrep.dev/learn.
  2. Java and Go code.
  3. some application security topics like HttpOnly and XSS.

Testing Rules

An essential part of any experimentation like this is running rules against code snippets. With Semgrep we have two options:

  1. Semgrep playground
  2. Semgrep CLI

Semgrep Playground

The playground is at https://semgrep.dev/editor and I will mostly use it to show my work. If you create an account, you can save your rules and save them or, you can use it without one.

Semgrep CLI

If you don't like your rules to hit the cloud (and I usually want to keep my rules and code secret), you can use the Semgrep CLI. Getting the CLI is as easy as python3 -m pip install semgrep (I use it inside WSL). See more at https://semgrep.dev/docs/getting-started/.

After writing a rule verify it to catch any formatting/logical errors (-c can also, point to a directory of rules):

semgrep -c rule1.yaml --validate

Run a rule on a file or directory (add --debug for troubleshooting):

semgrep -c rule1.yaml example.java

semgrep -c my-rules-directory src-directory

Use the --autofix switch to automagically modify files. --dryrun shows the changes w/o modification. However, executing rules with fix sections without --autofix does the same:

semgrep -c rule1.yaml example.java --autofix --dryrun

No need to use dryrun No need to use dryrun

There are two ways to include it in rules:

  • fix
  • regex-fix

Fix

Straightforward feature. Whatever has been caught by the rule will be replaced by what you specify.

Python - sys.exit

The example in the documentation is at https://semgrep.dev/s/R6g and replaces exit with sys.exit. If I click the first Apply fix button, exit(3) is replaced with sys.exit(3).

First fix applied First fix applied

As you can see metavariables in fix are replaced by their values. This is very useful.

Java - CBC Padding Oracle

fix shines when we are catching and replacing a string. Look at the java.lang.security.audit.cbc-padding-oracle rule (I have modified the rule to make it easier to read).

# java-cbc-padding-oracle/cbc-padding-oracle.yaml
rules:
  - id: cbc-padding-oracle
    severity: WARNING
    message: Match found
    languages:
      - java
    pattern: $CIPHER.getInstance("=~/.*\/CBC\/PKCS5Padding/")
    fix: $CIPHER.getInstance("AES/GCM/NoPadding")

It looks for anything that is similar to object.getInstance("string") where the string contains CBC/PKCS5Padding.

You can see Semgrep's string matching in =~/.*\/CBC\/PKCS5Padding/. It returns a match if the regex matches the string parameter of the getInstance method. See it in action at https://semgrep.dev/s/parsiya:java-cbc-padding-oracle.

After running the rule you can see the fix in the right side and click on Apply fix to modify the code (if you want to repeat, both the rule and the test code support ctrl+z).

String matching is deprecated. Let's rewrite the rule with metavariable-regex.

# java-cbc-padding-oracle/cbc-padding-oracle-metavariable-regex.yaml
rules:
  - id: cbc-padding-oracle-metavariable-regex
    message: Match found
    languages:
      - java
    severity: WARNING
    patterns:
      - pattern: $CIPHER.getInstance($INS)
      - metavariable-regex:
          metavariable: $INS
          regex: .*\/CBC\/PKCS5Padding
    fix: $CIPHER.getInstance("AES/GCM/NoPadding")

The string parameter is now a metavariable and we directly run the regex against it. Try changing the regex to see what else you can match in https://semgrep.dev/s/parsiya:java-cbc-padding-oracle-metavariable-regex.

It looks like the metavariable-regex version has more "computation." My very "scientific" experiment of 50 runs shows they are not that different.

$ multitime -q -n 50 ./cbc-padding-oracle.sh
===> multitime results
1: -q ./cbc-padding-oracle.sh
            Mean        Std.Dev.    Min         Median      Max
real        0.781       0.006       0.773       0.780       0.806
user        0.501       0.041       0.406       0.500       0.609
sys         0.256       0.044       0.172       0.258       0.359

$ multitime -q -n 50 ./cbc-padding-oracle-metavariable-regex.sh
===> multitime results
1: -q ./cbc-padding-oracle-metavariable-regex.sh
            Mean        Std.Dev.    Min         Median      Max
real        0.788       0.007       0.778       0.786       0.813
user        0.516       0.047       0.406       0.516       0.609
sys         0.247       0.048       0.156       0.250       0.359

Java - HttpOnly Cookies

We want our cookies to have the HttpOnly and Secure attributes. I am going to explain the fix for HttpOnly and let you write the ones for Secure (almost identical). Summarized rule from the semgrep-rules repo:

# java-httponly/httponly-practice.yaml
rules:
- id: cookie-missing-httponly
  message: Match found
  severity: WARNING
  languages: [java]
  patterns:
  - pattern-not-inside: $COOKIE.setValue(""); ...
  - pattern-either:
    - pattern: $COOKIE.setHttpOnly(false);
    - patterns:
      - pattern-not-inside: $COOKIE.setHttpOnly(...); ...
      - pattern: $RESPONSE.addCookie($COOKIE);

It matches if the code is calling:

  1. $COOKIE.setHttpOnly(false); manually.
  2. $RESPONSE.addCookie($COOKIE); without $COOKIE.setHttpOnly(...).

The playground link is https://semgrep.dev/s/parsiya:java-httponly-practice. If you are running locally:

semgrep -c httponly-practice.yaml httponly.java

The fix is different for each pattern. It should replace:

  1. $COOKIE.setHttpOnly(false); with $COOKIE.setHttpOnly(true);.
  2. $RESPONSE.addCookie($COOKIE); with $COOKIE.setHttpOnly(true); $RESPONSE.addCookie($COOKIE);.

We cannot create a fix that matches both cases. We can break the rule and create separate fixes.

HttpOnly Pattern 1

The first pattern only checks $COOKIE.setHttpOnly(false); and we just need to replace false with true. Playground link: https://semgrep.dev/s/parsiya:java-httponly-practice-1.

# java-httponly/httponly-practice-1.yaml
rules:
- id: cookie-missing-httponly-1
  message: Match found
  severity: WARNING
  languages: [java]
  patterns:
  - pattern-not-inside: $COOKIE.setValue(""); ...
  - pattern: $COOKIE.setHttpOnly(false);
  fix: $COOKIE.setHttpOnly(true);

semgrep -c httponly-practice-1.yaml httponly.java semgrep -c httponly-practice-1.yaml httponly.java

HttpOnly Pattern 2

The second pattern matches if we see $RESPONSE.addCookie($COOKIE); but no setHttpOnly. The fix is $COOKIE.setHttpOnly(true); as a new line before the match. Playground link is https://semgrep.dev/s/parsiya:java-httponly-practice-2 or use the CLI.

semgrep -c httponly-pracitce-2.yaml httponly.java
# java-httponly/httponly-practice-2.yaml
rules:
  - id: cookie-missing-httponly-2
    message: Match found
    severity: WARNING
    languages:
      - java
    patterns:
      - pattern-not-inside: $COOKIE.setValue(""); ...
      - pattern-not-inside: $COOKIE.setHttpOnly(...); ...
      - pattern: $RESPONSE.addCookie($COOKIE);
    fix: |
      $COOKIE.setHttpOnly(true);
      $RESPONSE.addCookie($COOKIE);      

The fix works but it's not aligned properly.

httponly-pracitce-2 fix httponly-pracitce-2 fix

This is not an issue in Java but we can fix this with fix-regex.

fix-regex

fix is great for simple replacements (e.g., badFunc to goodFunc). But, fix-regex has the power of regular expressions. See the docs at https://semgrep.dev/docs/experiments/overview/#autofix-with-regular-expression-replacement.

It has three fields:

  • regex: Runs a regex on the text captured by the rule.
  • replacement: The replacement to the text captured by the rule.
  • count: (optional) How many instances of regex are replaced with replacement.

Note: Currently (2021-10-25), fix-regex does not support metavariables unlike fix. If have metavariables in the replacement section, they will be treated as text. You can track this bug at https://github.com/returntocorp/semgrep/issues/3269.

Java - HttpOnly Cookies Revisited

Previously, we "fixed" HttpOnly but the alignment was not correct. I am going to solve the same problem with fix-regex. The second pattern matched the 4th line in the code below:

  @RequestMapping(value = "/cookie1", method = "GET")
  public void setCookie(@RequestParam String value, HttpServletResponse response) {
      Cookie cookie = new Cookie("cookie", value);
      response.addCookie(cookie); // <--- This line was matched.
  }

Let's experiment with how the capture works (if you want to tag along https://semgrep.dev/s/parsiya:java-httponly-fix-regex-practice).

To see everything that was matched let's run this rule.

# java-httponly/httponly-fix-regex-practice.yaml
rules:
  - id: cookie-missing-httponly-fix-regex-practice
    message: Match found
    severity: WARNING
    languages:
      - java
    patterns:
      - pattern-not-inside: $COOKIE.setValue(""); ...
      - pattern-not-inside: $COOKIE.setHttpOnly(...); ...
      - pattern: $RESPONSE.addCookie($COOKIE);
    fix-regex:
      regex: (.*)
      replacement: //\1

We are creating a capture group in regex and then using it in the replacement section with \1.

The result after running the above The result after running the above

So what happened here? To regex is greedy (the docs mention this). Two different matches were captured. Both were prepended with // (we can fix this with (.*+) but let's continue).

  1. The line and its whitespace.
    1. [8xSpace]response.addCookie(cookie);
  2. The "nothing" after the line above.
    1. At least that's what I think it is.

We can fix this with the count field. We only want to replace the first match so we set it to 1.

Running the rule with count: 1 Running the rule with count: 1

This is much better. But, I have not fixed the alignment. We can capture the whitespace with another capture group.

    fix-regex:
      regex: (\s*)(.*)
      replacement: \1// \2
      count: 1

The first capture group is the whitespace and the second is the code.

Alignment fixed Alignment fixed

Now we need to add the new line with the correct whitespace (it's in \1) above the captured line. \2 is the original addCookie line. If fix-regex supported metavariable replacement like fix the rule would look like:

# java-httponly/httponly-fix-regex-practice-2.yaml
    fix-regex:
      regex: (\s*)(.*)
      replacement: |
        \1$COOKIE.setHttpOnly(true);
        \1\2        
      count: 1

This is not implemented, yet. See https://semgrep.dev/s/parsiya:java-httponly-fix-regex-practice-2.

Metavariable in replacement Metavariable in replacement

We must capture cookie from response.addCookie(cookie); (the match) and add .setHttpOnly(true).

The new regex is (\s*)(.*addCookie\((.*)\).*):

  1. (\s*): The first capture group is still whitespace.
  2. (.*addCookie\((.*)\).*): The second is response.addCookie(cookie); without the whitespace. We will print it as-is in the 2nd line of the fix.
  3. .*addCookie\((.*)\): This is inside the second group and is trying to capture whatever comes after addCookie( and before ).

We have everything we need to create the fix:

# java-httponly/httponly-fix-regex-practice-final.yaml
rules:
  - id: cookie-missing-httponly-fix-regex-practice-final
    message: Match found
    severity: WARNING
    languages:
      - java
    patterns:
      - pattern-not-inside: $COOKIE.setValue(""); ...
      - pattern-not-inside: $COOKIE.setHttpOnly(...); ...
      - pattern: $RESPONSE.addCookie($COOKIE);
    fix-regex:
      regex: (\s*)(.*addCookie\((.*)\).*)
      replacement: |
        \1\3.setHttpOnly(true);
        \1\2        
      count: 1

Correct fix with fix-regex Correct fix with fix-regex

Playground link https://semgrep.dev/s/parsiya:java-httponly-fix-regex-practice-final.

Go - text/template

In Go you can use text/template and html/template. The latter does some output encoding and is safer for web use. It's possible to use text/template correctly or in a non-web use case but, it's not usually the case.

At first glance you would think we can use fix and replace text/template with html/template. To test this theory, add a fix section to the Semgrep's import-text-template rule at https://semgrep.dev/s/parsiya:go-import-text-template-fix.

The first reaction is to add a fix section like this:

# go-import-text-template/import-text-template-fix.yaml
rules:
- id: import-text-template-fix
  message: Match found.
  severity: WARNING
  pattern: |
        import "text/template"
  languages:
    - go
  fix: import "html/template"

When we ask Semgrep to match import "text/template" it matches everything from the import keyword until the end of "text/template". This usually includes other imports and our fix will replace other imports.

The matched string in the rule above The matched string in the rule above

fix-regex "fixes" this (har har). The regex only looks for text/template in the matched text and replaces it with html/template without overwriting anything else. count is optional here because there's only one match. It's a good habit to include it anyways.

# go-import-text-template/import-text-template-fix-regex.yaml
rules:
- id: import-text-template-fix-regex
  message: Match found.
  severity: WARNING
  pattern: |
        import "text/template"
  languages:
    - go
  fix-regex:
    regex: text/template
    replacement: html/template
    count: 1

See the magic at https://semgrep.dev/s/parsiya:import-text-template-fix-regex.

text/template replaced with html/template text/template replaced with html/template

Go - HttpOnly Cookies

Similar to the Java version, we can check if a cookie in Go has the HttpOnly attribute. The original rule from the repo is at https://semgrep.dev/s/parsiya:go-httponly-original.

# go-httponly/go-httponly-original.yaml
rules:
  - id: cookie-missing-httponly
    patterns:
      - pattern-not-inside: |
          http.Cookie{
            ...,
            HttpOnly: true,
            ...,
          }          
      - pattern: |
          http.Cookie{
            ...,
          }          
    message: Match found
    fix-regex:
      regex: (HttpOnly\s*:\s+)false
      replacement: \1true
    severity: WARNING
    languages:
      - go

The rule checks if you have http.Cookie without HttpOnly: true. However, the fix section only kicks in if the code has HttpOnly: false. If Go, the default value for both Secure and HttpOnly is false so the absence of this attribute is still a vulnerability

We can take a brute force approach and always add HttpOnly: true. This will create a compiler error if the property already exists. So, I am gonna create a new rule to only catch if there is no mention of HttpOnly.

# go-httponly/go-httponly-1.yaml
rules:
  - id: cookie-missing-httponly-1
    severity: WARNING
    languages:
      - go
    patterns:
      - pattern-not-inside: |
          http.Cookie{
            ...,
            HttpOnly: ...,
            ...,
          }          
      - pattern: |
          http.Cookie{
            ...,
          }          
    message: Match found
    fix-regex:
      regex: (?s)(\s+)(.*)
      replacement: |
        \1\2
        \1    HttpOnly: true,        
      count: 1

Playground link https://semgrep.dev/s/parsiya:go-cookie-missing-httponly-1.

Fix with the new rule Fix with the new rule

As an exercise try and figure out how the fix works. It assumes code is formatted by gofmt (see the space before HttpOnly). Rob Pike will come and flip your work desk if you don't run it anyways.

This has gone long enough, these blogs take up a lot of time. Especially, with playground links and local files for each exercise. I am just gonna do another one and finish.

We want to add a comment before the vuln location. Things like remediation messages or annotations are good candidates. I am gonna use the previous example (Go) but, most programming languages that do not rely on whitespace and use // should be similar.

# go-comment/go-comment
rules:
  - id: cookie-missing-httponly-comment
    severity: WARNING
    languages:
      - go
    patterns:
      - pattern-not-inside: |
          http.Cookie{
            ...,
            HttpOnly: ...,
            ...,
          }          
      - pattern: |
          http.Cookie{
            ...,
          }          
    message: Match found
    fix-regex:
      regex: (?s)(\s+)(.*)
      replacement: |
        \1// Match found by cookie-missing-httponly-comment.
        \1// HttpOnly must be set to true here.
        \1\2        
      count: 1

Playground link https://semgrep.dev/s/parsiya:go-comment.

Comment added to the code Comment added to the code

We learned how to use Semgrep's fix and fix-regex with several "real world" examples. Go through the [semgrep-rules] repo on GitHub and see if you can add fixes for new rules or optimize/correct existing ones.


文章来源: https://parsiya.net/blog/2021-10-25-a-hands-on-intro-to-semgreps-autofix/
如有侵权请联系:admin#unsafe.sh