Semgrep has a new and more readable rule syntax. I am converting my own reference to a tutorial.
Disclaimer: Semgrep (binary, playground, cloud, etc.) supports the new syntax, but it's not released. If you're from the future and things have changed, let me know somehow. E.g., make an issue in the blog's source at parsiya/parsiya.net or create a pull request.
Use these tables:
Old | New |
---|---|
patterns (top-level) | match and all |
patterns (other) | all |
pattern | [can be removed] |
pattern-not | - not |
pattern-either | any |
pattern-inside | inside |
pattern-not-inside | inside under not |
These items go inside a `where` clause:
Old | New |
---|---|
metavariable-pattern | metavariable and pattern |
metavariable-regex | metavariable and regex |
metavariable-comparison | metavariable and comparison |
metavariable-analysis | metavariable and analyzer |
focus-metavariable | focus |
Taint mode changes
Old | New |
---|---|
mode:taint | removed |
match (taint mode) | taint |
pattern-sources | sources |
pattern-sinks | sinks |
pattern-propagators | propagators |
pattern-sanitizers | sanitizers |
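To make the taint table concrete, here is a minimal sketch of the same rule in both syntaxes. The rule ids, the source `get_user_input(...)` and the sink `run_query(...)` are hypothetical names made up for illustration. The layout simply follows the mapping in the table and is an assumption, not something checked against a released Semgrep version.

```yaml
rules:
  # Old syntax: `mode: taint` plus the `pattern-*` keys.
  - id: hypothetical-taint-old
    mode: taint
    pattern-sources:
      - pattern: get_user_input(...)
    pattern-sinks:
      - pattern: run_query(...)
    message: _removed_
    languages: [python]
    severity: WARNING
  # New syntax (assumed from the table): no `mode`, a `taint` key instead of
  # `match`, and the `pattern-` prefix dropped from the child keys.
  - id: hypothetical-taint-new
    taint:
      sources:
        - get_user_input(...)
      sinks:
        - run_query(...)
    message: _removed_
    languages: [python]
    severity: WARNING
```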
I've only been able to find two references so far:
Modified version of the first example in the Advanced Rule Tutorials, practice playground link.
rules:
  - id: blog-2023-10-use-decimalfield-for-money-old
    patterns:
      # I know this `patterns` can be replaced by one `pattern`
      # but it's modified for the tutorial.
      - patterns:
          - pattern: $F = django.db.models.FloatField(...)
          - pattern: $F = django.db.models.FloatField(...)
      - pattern-inside: |
          class $M(...):
            ...
      - metavariable-regex:
          metavariable: '$F'
          regex: '.*(price|fee|salary).*'
    message: _removed_
    languages: [python]
    severity: ERROR
The top-level `pattern` or `patterns` becomes `match`. It's almost always followed by `all` or `any`.
rules:
  - id: use-decimalfield-for-money-new-syntax
    # top-level patterns replaced by match and all.
    match:
      # the rest of the patterns
      # # I know this `patterns` can be replaced by one `pattern`
      # # but it's modified for the tutorial.
      # - patterns:
      #     - pattern: $F = django.db.models.FloatField(...)
      #     - pattern: $F = django.db.models.FloatField(...)
      # - pattern-inside: |
      #     class $M(...):
      #       ...
      # - metavariable-regex:
      #     metavariable: '$F'
      #     regex: '.*(price|fee|salary).*'
    message: _removed_
    languages: [python]
    severity: ERROR
Other `patterns` keys nested under the top-level one are replaced by `all`. Our example has a redundant `patterns` with two identical children to show how it will be modified.
Note that if we had a `pattern-either` here we would use `any`.
rules:
  - id: use-decimalfield-for-money-new-syntax
    # top-level patterns replaced by match and all.
    match:
      all:
        # rest of the patterns
        - pattern: $F = django.db.models.FloatField(...)
        - pattern: $F = django.db.models.FloatField(...)
        # - pattern-inside: |
        #     class $M(...):
        #       ...
        # - metavariable-regex:
        #     metavariable: '$F'
        #     regex: '.*(price|fee|salary).*'
    message: _removed_
    languages: [python]
    severity: ERROR
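If the nested group had been a `pattern-either` instead, the same step would produce `any`. A minimal sketch, assuming a made-up rule id and a second alternative (`DecimalField`) that is not part of the original rule:

```yaml
rules:
  - id: hypothetical-any-example
    match:
      all:
        # a nested `pattern-either` becomes `any` (acts as OR)
        - any:
            - pattern: $F = django.db.models.FloatField(...)
            # hypothetical second alternative, only here for illustration
            - pattern: $F = django.db.models.DecimalField(...)
    message: _removed_
    languages: [python]
    severity: ERROR
```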
The `pattern` keyword can be omitted. Replace `pattern: [something]` with just `- [something]`.
- pattern: [something]    --->    - [something]
- pattern: |              --->    - |
    [something]                       [something]
    [more lines]                      [more lines]
More changes:
rules:
  - id: use-decimalfield-for-money-new-syntax
    # top-level patterns replaced by match and all.
    match:
      all:
        # the rest of the patterns
        # I know this `patterns` can be replaced by one `pattern`
        # but it's modified for the tutorial.
        - $F = django.db.models.FloatField(...)
        - |
          $F = django.db.models.FloatField(...)
        # - pattern-inside: |
        #     class $M(...):
        #       ...
        # - metavariable-regex:
        #     metavariable: '$F'
        #     regex: '.*(price|fee|salary).*'
    message: _removed_
    languages: [python]
    severity: ERROR
There's one catch: if your pattern contains `:` it might mess with the YAML format. Either use a bar to send it to the next line or enclose it in `"`. There is an explanation at 1:26 in the reference video.
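As a sketch of the colon problem, consider a pattern with a Python type annotation. This is a made-up pattern, not part of the rule above. Written as a bare list item, YAML reads the text before the `: ` as a map key instead of treating the whole line as a string:

```yaml
match:
  all:
    # Broken: YAML parses this item as a map {"$X": "int = $Y"},
    # not as a pattern string.
    # - $X: int = $Y

    # Option 1: use a bar to send the pattern to the next line.
    - |
      $X: int = $Y
    # Option 2: enclose the pattern in double quotes.
    - "$X: int = $Y"
```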
We don't have `pattern-not` in our current example, but its conversion is similar to `pattern`.
- pattern-not: [something]    --->    - not: [something]
- pattern-not: |              --->    - not: |
    [something]                           [something]
    [more lines]                          [more lines]
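For illustration only, this is how a `not` could sit next to the other patterns in our rule. The excluded pattern (a `FloatField` with `null=True`) is a hypothetical addition and not part of the original rule:

```yaml
match:
  all:
    - $F = django.db.models.FloatField(...)
    # pattern-not (hypothetical exclusion, added only for this sketch)
    - not: $F = django.db.models.FloatField(..., null=True, ...)
```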
Easy, peasy.
rules:
  - id: use-decimalfield-for-money-new-syntax
    # top-level patterns replaced by match and all.
    match:
      all:
        # the rest of the patterns
        # I know this `patterns` can be replaced by one `pattern`
        # but it's modified for the tutorial.
        - $F = django.db.models.FloatField(...)
        - |
          $F = django.db.models.FloatField(...)
        # pattern-inside
        - inside: |
            class $M(...):
              ...
        # - metavariable-regex:
        #     metavariable: '$F'
        #     regex: '.*(price|fee|salary).*'
    message: _removed_
    languages: [python]
    severity: ERROR
`where` acts as a container for some elements that add conditions to metavariables. We will use `metavariable-regex` as an example:
Add a `where` clause at the same level as `all`.
`metavariable-regex` is also replaced with `metavariable` and `regex`.
rules:
  - id: use-decimalfield-for-money-new-syntax
    # top-level patterns replaced by match and all.
    match:
      all:
        # I know this `patterns` can be replaced by one `pattern`
        # but it's modified for the tutorial.
        - $F = django.db.models.FloatField(...)
        - |
          $F = django.db.models.FloatField(...)
        # pattern-inside
        - inside: |
            class $M(...):
              ...
      where:
        # metavariable-regex
        - metavariable: $F
          regex: '.*(price|fee|salary).*'
    message: _removed_
    languages: [python]
    severity: ERROR
See the final rule in the playground.
Other elements that appear under `where` have also been modified: `metavariable-pattern`, `metavariable-analysis`, `metavariable-comparison`, and `focus-metavariable`.
We can use them like this:
rules:
  - id: sample-rule
    match:
      all:
        # removed
      where:
        # metavariable-regex
        - metavariable: $F
          regex: '.*(price|fee|salary).*'
        # metavariable-analysis
        - metavariable: $F
          analyzer: redos
        # focus-metavariable becomes `focus`
        - focus: $F
    message: _removed_
    languages: [python]
    severity: ERROR
`metavariable-pattern` is tricky because it can contain multiple patterns, but it's similar to the patterns we've seen before.
where:
  # metavariable-pattern
  - metavariable: $F
    pattern: "some pattern"
  # if it had multiple patterns
  - metavariable: $F
    all:
      - "pattern1"
      - "pattern2"
This one is a C++ Hotspot rule that tracks when arrays are passed to functions. The complete rule is on GitHub and has a handy triage guide.
I will be using a partial version of the rule, playground link.
rules:
  - id: arrays-passed-to-functions-partial
    patterns:
      # a lot of ways to create an array
      - pattern-either:
          - pattern-inside: |
              $TYPE $BUF[$SIZE] = $EXPR;
              ...
          - pattern-inside: |
              $TYPE $BUF[$SIZE];
              ...
      # we don't want to flag these usages again
      - pattern-not-inside: free($BUF);
      - pattern-not-inside: delete($BUF);
      # exclude uppercase variables, these are usually constants
      - metavariable-regex:
          metavariable: $BUF
          regex: (?![A-Z0-9_]+\b)
      # flag if it's passed to a function
      - pattern: $FUNC(..., $BUF, ...);
    message: _removed_
    languages:
      - cpp
    severity: WARNING
The only new item here is `pattern-not-inside`.
rules:
  - id: arrays-passed-to-functions-partial
    match:
      # removed everything else
      - pattern-not-inside: free($BUF);
      - pattern-not-inside: delete($BUF);
First, we create a `not` and then add an `inside` under it. Also note how the `inside` is indented unlike `- not: [pattern]` (from `pattern-not`).
rules:
  - id: arrays-passed-to-functions-partial
    match:
      # removed everything else
      - not:
          inside: free($BUF);
      - not:
          inside: delete($BUF);
I thought I could merge the two `not`s. You cannot. It's a map and if you add two `inside` keys, you will get an error that keys must be unique.
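The failed merge presumably looked something like the commented-out item below (a reconstructed sketch, not the exact text that was tried). Because `not` holds a map, the second `inside` repeats a key in that map:

```yaml
match:
  all:
    # Invalid: `inside` appears twice in the same map,
    # so the rule is rejected with a "keys must be unique" style error.
    # - not:
    #     inside: free($BUF);
    #     inside: delete($BUF);

    # Valid: one `not` per excluded pattern.
    - not:
        inside: free($BUF);
    - not:
        inside: delete($BUF);
```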
`any` will act as OR.
- pattern-either:
    - pattern-inside: |
        $TYPE $BUF[$SIZE] = $EXPR;
        ...
    - pattern-inside: |
        $TYPE $BUF[$SIZE];
        ...
becomes:
- any:
    - inside: |
        $TYPE $BUF[$SIZE] = $EXPR;
        ...
    - inside: |
        $TYPE $BUF[$SIZE];
        ...
The rest is routine: `patterns` -> `match` and `all`, `pattern-either` -> `any`, `pattern-not-inside` -> `not` and `inside`, `metavariable-regex` -> `metavariable` and `regex` (under `where`), and `pattern` (the word) is just removed.
rules:
  - id: arrays-passed-to-functions-partial
    match:
      all:
        # a lot of ways to create an array
        - any:
            - inside: |
                $TYPE $BUF[$SIZE] = $EXPR;
                ...
            - inside: |
                $TYPE $BUF[$SIZE];
                ...
        # we don't want to flag these usages again
        - not:
            inside: free($BUF);
        - not:
            inside: delete($BUF);
        # flag if it's passed to a function
        - $FUNC(..., $BUF, ...);
      where:
        # exclude uppercase variables, these are usually constants
        - metavariable: $BUF
          regex: (?![A-Z0-9_]+\b)
    message: _removed_
    languages:
      - cpp
    severity: WARNING
The original rule and the one we created do not have the same matches. The original rule has three matches, playground link.
The modified rule only returns one match, playground link.
The reason is that constant propagation is on by default in the new syntax (at least for now). Credit: Cooper Pierce, Semgrep.
We can get the same matches as the original rule by adding an `options` key that turns constant propagation off, playground link.
rules:
  - id: blah-blah
    options:
      constant_propagation: false
    match:
      all:
        # the rest of the rule
In the last example we will look at a complex `metavariable-pattern` rule from Semgrep examples, playground link for practice.
rules:
  - id: blog-2023-10-open-redirect-old
    languages:
      - python
    message: Match found
    severity: WARNING
    patterns:
      - pattern-inside: |
          def $FUNC(...):
            ...
            return django.http.HttpResponseRedirect(..., $DATA, ...)
      - metavariable-pattern:
          metavariable: $DATA
          patterns:
            # patterns
Converting the outside patterns is easy.
rules:
  - id: blog-2023-10-open-redirect-new
    languages:
      - python
    message: Match found
    match:
      all:
        - inside: |
            def $FUNC(...):
              ...
              return django.http.HttpResponseRedirect(..., $DATA, ...)
      where:
        - metavariable: $DATA
          patterns:
            # patterns
Now we do the same process for the inner patterns and replace `patterns` with `all` (we don't need a `match`). Things can get complicated quickly. We have three nested `where` clauses. One for the top `metavariable-pattern`, another for the 2nd one, and the last one is for `metavariable-regex`.
The result is in this playground link.
rules:
  - id: blog-2023-10-open-redirect-new
    languages:
      - python
    message: Match found
    severity: WARNING
    match:
      all:
        - inside: |
            def $FUNC(...):
              ...
              return django.http.HttpResponseRedirect(..., $DATA, ...)
      where:
        - metavariable: $DATA
          all:
            - any:
                - $REQUEST
                - $STR.format(..., $REQUEST, ...)
                - $STR % $REQUEST
                - $STR + $REQUEST
                - f"...{$REQUEST}..."
          where:
            - metavariable: $REQUEST
              all:
                - any:
                    - request.$W
                    - request.$W.get(...)
                    - request.$W(...)
                    - request.$W[...]
              where:
                - metavariable: $W
                  regex: (?!get_full_path)
We learned to convert rules from the old Semgrep syntax to the new one. IMO, the new syntax is more readable. There are some inconsistencies, like the constant propagation behavior (and probably more), but they're not a big issue.