(Collaborative post by hebi and Gynvael Coldwind)
Crow is an asynchronous C++ HTTP/WebSocket framework for creating "flask-like" web services. In early August we discovered a pretty interesting use-after-free vulnerability. Since Crow takes advantage of the Asio library for asynchronous input/output operations, analysis of this vulnerability took a few long evenings since the cause was split between multiple interweaved tasks and callbacks. Eventually we traced the root cause to an interesting mismatch between two layers of code, one of which - the HTTP parser - was supporting HTTP pipelining (or rather was agnostic towards it, which resulted in pipelining being inadvertently supported), while the other - HTTP server logic - was not designed to take HTTP pipelining into account. This resulted in some interesting "race conditions" with one task "thinking" an HTTP connection was over (and deleting objects) while another still using them while processing a separate HTTP request.
One thing to note is that we never proved exploitability (as in: actual RCE, since it's pretty easy to just trigger this vulnerability to cause a DoS) due to ENOTIME, though we believe it should be possible, if highly complex.
The vulnerability in question was reported mid-August and fixed within 6 days.
Human readable details are in the next section.
*** Summary:
Affected: Crow version 1.0+4 and older
https://github.com/CrowCpp/Crow - maintained version (fork)
https://github.com/ipkn/crow - original version
Discovered by:
hebi
Gynvael Coldwind
Method of discovery:
Fuzzing, Manual Analysis (reading the code)
"Crow is a C++ framework for creating HTTP or Websocket web services. It uses
routing similar to Python's Flask which makes it easy to use. It is also
extremely fast, beating multiple existing C++ frameworks as well as non C++
frameworks." (source: project's README.md)
Crow version 1.0+4 and older are vulnerable to a use-after-free condition due to
incorrect Connection object lifetime management (though partial HTTP pipelining
support is the deeper underlying problem). On successful exploitation this can
potentially lead to either Remote Code Execution or Information Disclosure (a
failed exploitation will lead to a Denial of Service as the server will crash).
See sections below for details.
IMPORTANT: This vulnerability is reported under the 90-day policy
(version 2021), i.e. this report will be shared publicly with the defensive
community on 16th November 2022 if a patch/fix is not available by that time, or
30 days after the fix becomes available. For details please see:
https://googleprojectzero.blogspot.com/2021/04/policy-and-disclosure-2021-edition.html
*** Crow execution model and life of a request
Note: If you understand Crow's execution model you can safely skip this section.
Note: The description below is a bit simplified and contains only the
information relevant to the vulnerability.
Crow is using an asynchronous queued task execution model (implemented using the
Asio C++ library - https://think-async.com/Asio/), which is not unlike typical
JavaScript model.
Technically there are actually N+1 task queues (where N denotes the number of
threads chosen by the application developer) - one per each worker thread and
an extra "main" queue for some timers, signals, etc.
When a new client is connected, the connection (represented by a Connection
object) is bound to a certain thread and its task queue. This means that all
processing done by that connection is going to happen within the constrains of
the same thread.
In case the server handles more simultaneous connections than there are threads,
each thread will handle tasks related to multiple connections. On the flip side,
if there are very few connections, then basically a connection has a task queue
(and thread) for itself.
After the connection is established (and a Connection object made) a task
tasked with receiving some data is added to the task queue. Once it completes,
the completion callback is called (that's the lambda function inside the
Connection::do_read() method in http_connection.h).
When data is received this callback passes the data to the parser (call to
parser_.feed() method; the implementation is split between parser.h and
http_parser_merged.h).
The parser (still operating within the same task) can then invoke a set of
callbacks on various parsing events like method received, URL received,
header name / value received, and so on. While all of these callbacks
are technically processed within parser.h, some of them invoke methods of the
Connection object.
Sooner (error) or later (no error) one of these called methods will end up with
a prepared ready-to-send response (e.g. a 404 response on error or 200 response
with some app-generated data on success). In such case the response will either
be immediately (synchronously) sent out or a new task will be added to the task
queue to send the response out later and call a completion callback (see the
lambda function inside the Connection::do_write()'s method).
Once the parser returns, it might either require more data (in which case a new
task to receive more data will be added to the task queue) or decide it's time
to close the connection and finish the task.
Next, if any data was asynchronously being scheduled to be sent out, a
completion callback (do_write()'s lambda) might be called. The completion
callback clears the response buffers and might close the connection as well if
needed.
The last task related to the given connection will delete the Connection object.
To summarize, the list of tasks executed while processing a connection might
look like this:
async_accept →
async_accept's completion cb (creates Connection object) →
async_receive (receive HTTP request) →
async_receive's completion cb (invokes parser/schedules sending) →
async_write (send HTTP response) →
aync_write's completion cb (deletes Connection object if needed).
There might be more async_receive related tasks if data is trickling in slowly
(multiple receives per HTTP request), and multiple async_write tasks if there
are multiple responses to be sent out (Connection: keep-alive).
As mentioned, the last task is responsible for deleting the Connection object.
To control and decide when that should happen a set of two flags is used:
Connection::is_reading
Connection::is_writing
If any receiving task is scheduled, is_reading flag is set.
If any sending task is scheduled, is_writing flag is set.
Once the tasks are complete and the connection is being closed, respective flags
are cleared.
In various places a method called Connection::check_destroy() is called, which
checks the status of both these flags. If neither is set (i.e. both are
cleared), then the Connection object is deemed to be no longed needed and is
deleted.
*** High-level root cause
In essence, the vulnerability in this report boils down to incomplete HTTP
pipelining support.
When using HTTP pipelining the client sends multiple HTTP requests as part of
the same connection one after another without waiting to receive HTTP responses
for any of the previous requests (in opposition to the "normal" way of doing it,
where only one request is sent, then the client waits for the complete response,
and only then it proceeds with the next request). In theory this allows HTTP
servers to process requests in parallel and then just send replies in order.
Historically though HTTP pipelining proved to be tricky to implement correctly
on both the client and server side. As such none of popular modern browsers have
it enabled by default.
In case of Crow the HTTP parser does support HTTP pipelining, or rather it
continues to invoke callbacks for all HTTP packets that are present in the
received TCP packet (perhaps it's more fair to say it's HTTP pipelining
agnostic).
That being said, the asynchronous Connection layer is unaware of pipelining,
which leads to various logic errors born from more than one HTTP request being
handled at the same time, while being limited to a state tailored to a single
request and response.
The simplest (and benign) way to demonstrate this is to create two endpoints
that return 512KB of "1" and "2", as shown below:
CROW_ROUTE(app, "/one")
([] {
return std::string(1024 * 512, '1');
});
CROW_ROUTE(app, "/two")
([] {
return std::string(1024 * 512, '2');
});
And then request data from both endpoints in the same TCP packet:
GET /one HTTP/1.1
Host: example.com
GET /two HTTP/1.1
Host: example.com
In the response we would expect (in this order):
1. HTTP response header for endpoint /one.
2. 512KB of "1"s.
3. HTTP response header for endpoint /two.
4. 512KB of "2"s.
Here's what we actually receive:
$ cat packet.http - | nc -v 127.0.0.1 18080 | hexdump -C
Connection to 127.0.0.1 18080 port [tcp/*] succeeded!
00000000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
00000010 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a |.Content-Length:|
00000020 20 35 32 34 32 38 38 0d 0a 53 65 72 76 65 72 3a | 524288..Server:|
00000030 20 43 72 6f 77 2f 6d 61 73 74 65 72 0d 0a 44 61 | Crow/master..Da|
00000040 74 65 3a 20 54 75 65 2c 20 31 36 20 41 75 67 20 |te: Tue, 16 Aug |
00000050 32 30 32 32 20 31 39 3a 33 36 3a 34 38 20 47 4d |2022 19:36:48 GM|
00000060 54 0d 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b |T..Connection: K|
00000070 65 65 70 2d 41 6c 69 76 65 0d 0a 0d 0a 31 31 31 |eep-Alive....111|
00000080 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111|
*
00010000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
00010010 0a 63 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b 65 65 |.connection: Kee|
00010020 70 2d 41 6c 69 76 65 0d 0a 43 6f 6e 74 65 6e 74 |p-Alive..Content|
00010030 2d 4c 65 6e 67 74 68 3a 20 30 0d 0a 53 65 72 76 |-Length: 0..Serv|
00010040 65 72 3a 20 43 72 6f 77 2f 6d 61 73 74 65 72 0d |er: Crow/master.|
00010050 0a 44 61 74 65 3a 20 54 75 65 2c 20 31 36 20 41 |.Date: Tue, 16 A|
00010060 75 67 20 32 30 32 32 20 31 39 3a 33 36 3a 34 38 |ug 2022 19:36:48|
00010070 20 47 4d 54 0d 0a 43 6f 6e 6e 65 63 74 69 6f 6e | GMT..Connection|
00010080 3a 20 4b 65 65 70 2d 41 6c 69 76 65 0d 0a 31 31 |: Keep-Alive..11|
00010090 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111|
*
00020080 31 31 31 31 31 31 31 31 31 31 31 31 31 31 0d 0a |11111111111111..|
00020090 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111|
*
00080100 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111|
0008010d
Note: The * sign above denotes a repeated row.
To present the response in a form of the list, we get:
1. HTTP response header for endpoint /one.
2. 64KB (minus response header size) of "1"s.
3. HTTP response header for endpoint /two with Content-Length set to 0.
4. The remaining amount of "1"s.
Pretty obviously responses to both packets were mixed, both in sending (HTTP
response header of /two injected in the middle of /one's response) and in
processing (Content-Length: 0 in /two's response).
Note: As mentioned before, thankfully modern browsers disabled HTTP pipelining.
Otherwise the above behavior under certain circumstances would allow for
a UXSS (or straight-up JS injection).
To summarize, the vulnerability discussed below exploits the fact that the
Connection layer isn't aware it processes multiple requests before ever
finalizing previous ones.
*** Vulnerability (use-after-free)
As mentioned before, if both the is_reading and is_writing flags are cleared,
then the next call to Connection::check_destroy() will delete the Connection
object. In case there would still be another task related to the Connection
scheduled to be executed, this would lead to a use-after-free vulnerability (and
potentially also a double-free in case Connection::check_destroy() is invoked
again).
One of the discovered ways to achieve this is to send two HTTP requests in one
TCP packet:
1. The first requests any dynamic endpoint whose response is smaller than
1MB (required to fall into a specific code path in
Connection::do_write_general()).
2. The second is both an invalid HTTP request (e.g. missing Host header) and
is referring to a non-existing endpoint (to get a 404 error response).
For example:
GET /endpoint HTTP/1.1
Host: 127.0.0.1
GET /idontexist HTTP/1.1
The execution for the above example will flow as follows:
async_accept →
async_accept's completion cb →
async_receive (receive HTTP request) →
- The is_reading flag is set.
async_receive's completion cb (lambda in do_read())
- The method parser_::feed() is invoked with the received data.
- ...
- Parser calls on_url callback, which in turn calls
Connection::handle_url(), which establishes a route for /endpoint.
- ...
- Parser calls on_message_complete callback, which in turn invokes the
endpoint's handler.
- After the endpoint's handler finishes, the data is scheduled to be
sent (i.e. a new "write" task is added to the task queue). This also
causes the is_writing flag to be set.
- ... parser continues with the second packet ...
- Parser calls on_url callback, which in turn calls
Connection::handle_url(), which establishes that there is no route
specified for /idontexist.
- This in turn causes a 404 packet to be scheduled to be sent to the
client (i.e. a second "write" task is added to the task queue). This
also causes the is_writing to be set (yes, it was already set).
- The Parser notices there are no headers, which is invalid for HTTP/1.1
as at least Host: is required, and exists with an error.
- Given Parser's error, the is_reading is cleared, the socket is closed,
and this tasks finishes (note: Connection object is still alive, since
is_writing is still set). →
async_write (try to send first HTTP response) →
aync_write's completion cb (lambda in do_write())
- The is_writing flag is cleared.
- Since the socket was closed (and therefore sending failed),
Connection::check_destroy() is called.
- Since both is_reading and is_writing flags are cleared, the
Connection object is deleted.
- The task finishes. →
async_write (try to send 404 HTTP response) →
async_write's completion cb (lambda in do_write())
- Attempts to access the deleted Connection object, which by this
point is already freed. Application crashes.
In case of the example packet Crow will likely crash at that point, though it
can potentially already crash when trying to execute the second scheduled
async_write (since the buffers with the response data are already deleted).
While we haven't attempted to exploit this vulnerability beyond triggering it,
we believe successful use-after-free exploitation cannot be ruled out at this
time. Exploitation would likely involve either using two Connections in the
same thread (i.e. using the same task queue) with carefully chosen order of
data transmission. Likely winning a race condition might be requires as well.
Successful exploitation of this use-after-free vulnerability would lead to
Remote Code Execution, i.e. executing arbitrary code chosen by the attacker in
the context of the Crow app's process.
*** Proposed fix:
Given that HTTP pipelining support is not needed (browsers have it blocked,
there is HTTP/2 which supports multiplexing), the best solution is to block
HTTP pipelining altogether.
This would likely require adding a mechanism to make sure that any evidence of
multiple HTTP requests within a single parser_.feed() call (as indicated by
the on_... callbacks being invoked) will result in the connection being dropped
after the first reply, and no extra HTTP request will be processed.
*** Other worrying patterns
As mentioned before, the Connection::check_destroy() function potentially
deletes the Connection object. Given this, if the object is deleted, no other
operations can be performed on it (i.e. any other operation is technically a
use-after-free).
That said, there are 3 additional places where check_destroy() is called, but
operations are still made on the object regardless.
It must be said that we were not able to find a way to actually trigger these
code paths, but these do look like accidents bound to happen some day, so
perhaps some refactoring in the area might be in order.
Place 1: End of do_write_static() function
if (close_connection_)
{
adaptor_.shutdown_readwrite();
adaptor_.close();
CROW_LOG_DEBUG << this << " from write (static)";
check_destroy();
}
res.end(); // Connection is potentially already deleted here.
res.clear();
buffers_.clear();
}
Place 2: End of do_write_general() for large (over 1MB) responses
if (close_connection_)
{
adaptor_.shutdown_readwrite();
adaptor_.close();
CROW_LOG_DEBUG << this << " from write (res_stream)";
check_destroy();
}
res.end(); // Connection is potentially already deleted here.
res.clear();
buffers_.clear();
}
}
Place 3: do_write_sync()'s completion lambda (executed before function returns)
Note: In this case the issue lies in functions calling do_write_sync() not being
aware that the Connection object might be potentially deleted in this
method.
else
{
CROW_LOG_ERROR << ec << " - happened while sending buffers";
CROW_LOG_DEBUG << this << " from write (sync)(2)";
check_destroy();
return true;
}
});
Perhaps making check_destroy() return a boolean value indicating whether the
object was or wasn't deleted might be a solution. That way the calling functions
would know whether to immediately exit, or continue execution.