This post is part of a four-part blog series that starts with Rustproofing Linux (Part 1/4 Leaking Addresses).
Shared memory is often used to share data without the performance hit of copying. Whenever a shared resource is consumed by one component while being modified by another, there is potential for Time-Of-Check-to-Time-Of-Use (TOCTOU) or double fetch vulnerabilities. In these examples we focus on the case where the double fetch occurs in the kernel and the software changing the data is in userspace, which makes this an avenue for user-to-kernel privilege escalation. Note, however, that the same type of vulnerability can exist when accessing memory shared between a device driver and a peripheral, between two userspace processes, between a hypervisor and a kernel, and so on.
As a side note, double fetch vulnerabilities can also be introduced by the compiler, for example when it re-reads a value from shared memory instead of reusing a copy it has already loaded.
Our vulnerable example is a bit contrived for the sake of brevity, but it should illustrate a common buggy pattern of shared memory usage:
static int vuln_open(struct inode *ino, struct file *filp)
{
    struct file_state *state;

    state = kzalloc(sizeof(*state), GFP_KERNEL);
    if (!state)
        return -ENOMEM;

    state->page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
    if (!state->page) {
        kfree(state);
        return -ENOMEM;
    }

    filp->private_data = state;
    return 0;
}
A memory page is allocated
static int vuln_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct file_state *state = filp->private_data;
    int ret = 0;

    ret = vm_map_pages_zero(vma, &state->page, 1);
    return ret;
}
The page is mapped into userspace
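static long vuln_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct file_state *state = filp->private_data;
    volatile u32 *sh_buf = page_to_virt(state->page);
    u8 tmp_buf[32];

    switch (cmd) {
    case VULN_PROCESS_BUF:
        /* First fetch of sh_buf[0]: the bounds check. */
        if (sh_buf[0] <= sizeof(tmp_buf)) {
            /* Second fetch of sh_buf[0]: the copy length. */
            memcpy(tmp_buf, (void *)&sh_buf[1], sh_buf[0]);
            /* ... tmp_buf is processed here ... */
        }
        break;
    default:
        return -ENOTTY;
    }
    return 0;
}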
Data is read from shared memory
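The vulnerability is that sh_buf[0] is read twice: once for the bounds check and once as the memcpy() length. If the memory contents change between the two reads, the length that was checked is not the length that is used, which can lead to a buffer overflow of tmp_buf.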
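To make the gap between the checked length and the used length concrete without a kernel, here is a minimal userspace Rust model. ScriptedWord is a hypothetical stand-in for sh_buf[0] whose fetches return pre-scripted values, mimicking a userspace writer that changes the word between the two reads; the function names and the 32-byte limit mirror the C code but are otherwise illustrative.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the shared word `sh_buf[0]`: each fetch returns the
/// next scripted value, simulating a userspace writer changing it between reads.
struct ScriptedWord {
    values: Vec<u32>,
    next: Cell<usize>,
}

impl ScriptedWord {
    fn new(values: &[u32]) -> Self {
        ScriptedWord { values: values.to_vec(), next: Cell::new(0) }
    }

    /// One fetch from "shared memory"; once the script runs out, the last value repeats.
    fn fetch(&self) -> u32 {
        let i = self.next.get();
        self.next.set(i + 1);
        self.values[i.min(self.values.len() - 1)]
    }
}

const TMP_BUF_LEN: u32 = 32;

/// Mirrors the vulnerable C logic: one fetch for the check, another for the copy.
/// Returns the length that would be passed to memcpy().
fn double_fetch_len(word: &ScriptedWord) -> Option<u32> {
    if word.fetch() <= TMP_BUF_LEN {
        Some(word.fetch()) // this second value is never checked
    } else {
        None
    }
}

/// The fix: fetch once into a local, then check and use that same value.
fn single_fetch_len(word: &ScriptedWord) -> Option<u32> {
    let len = word.fetch();
    if len <= TMP_BUF_LEN { Some(len) } else { None }
}

fn main() {
    // The "writer" flips the value from 32 to 128 between the two reads.
    println!("double fetch: {:?}", double_fetch_len(&ScriptedWord::new(&[32, 128])));
    println!("single fetch: {:?}", single_fetch_len(&ScriptedWord::new(&[32, 128])));
}
```

With the script [32, 128], the double-fetch version passes the check with 32 and then uses 128, which in the C driver becomes a 128-byte memcpy() into a 32-byte buffer.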
A PoC was created to change the value of sh_buf[0] between the two fetches, by repeatedly changing the memory contents in one process while calling vuln_ioctl() in the other:
volatile u32 *buf = mmap(NULL, LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (buf == MAP_FAILED) {
    perror("mmap");
    return -1;
}

/* Pin the parent to CPU 0 and the child to CPU 1 so both run concurrently. */
int child = fork() == 0;
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(child, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) < 0) {
    perror("sched_setaffinity error");
    return -1;
}

if (child) {
    /* Keep flipping the length between a valid and an oversized value. */
    while (1) {
        buf[0] = 32;
        buf[0] = 128;
    }
} else {
    /* Keep triggering the double fetch in the kernel. */
    while (1) {
        ioctl(fd, VULN_PROCESS_BUF, 0);
    }
}
One process changing memory contents, the other calling VULN_PROCESS_BUF
When this PoC is executed, KASAN (the Kernel Address Sanitizer) reports the vulnerability as a 128-byte out-of-bounds write.
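The PoC's race can be imitated in userspace Rust with two threads and an AtomicU32 (atomics keep this sketch free of data races, unlike the plain shared page in the driver). The race() function and its parameters are illustrative: a writer thread flips the word between 32 and 128 as the PoC's child process does, while the main thread runs the check-then-use logic with either two loads or one.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

const TMP_BUF_LEN: u32 = 32;

/// Runs `iterations` check-then-use rounds while another thread keeps flipping the
/// shared length between 32 and 128. Returns the largest length that passed the
/// `<= TMP_BUF_LEN` check and would then have been used for the copy.
fn race(iterations: u32, double_fetch: bool) -> u32 {
    let shared = AtomicU32::new(32);
    let stop = AtomicBool::new(false);

    thread::scope(|s| {
        // Writer thread: plays the role of the PoC's child process.
        s.spawn(|| {
            while !stop.load(Ordering::Relaxed) {
                shared.store(32, Ordering::Relaxed);
                shared.store(128, Ordering::Relaxed);
            }
        });

        let mut max_used: u32 = 0;
        for _ in 0..iterations {
            if double_fetch {
                // Two loads: the value checked and the value used can differ.
                if shared.load(Ordering::Relaxed) <= TMP_BUF_LEN {
                    max_used = max_used.max(shared.load(Ordering::Relaxed));
                }
            } else {
                // One load: the checked value is the used value.
                let len = shared.load(Ordering::Relaxed);
                if len <= TMP_BUF_LEN {
                    max_used = max_used.max(len);
                }
            }
        }
        // Let the writer exit so the scope can join it.
        stop.store(true, Ordering::Relaxed);
        max_used
    })
}

fn main() {
    println!("double fetch: largest length used = {}", race(1_000_000, true));
    println!("single fetch: largest length used = {}", race(1_000_000, false));
}
```

Whether the double-fetch run actually reports 128 depends on scheduling, just as with the real PoC; the single-fetch run can never report more than 32.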
The code we ported to Rust looks similar, but is guided by the mm::virt::Area and pages::Pages abstractions. This starts with the mmap implementation:
fn mmap(state: &Self, _file: &File, vma: &mut mm::virt::Area) -> Result {
    vma.insert_page(vma.start(), &state.mutable.lock().page)?;
    Ok(())
}
mmap() callback implementation in Rust
The mmap method we implement for file::Operations takes a vma: &mut mm::virt::Area argument. While this struct has only one member, a pointer to C’s struct vm_area_struct, that member is private, so we have to use the only available method for creating a mapping, insert_page().
insert_page() requires a pages::Pages<0> argument. Similarly, we don’t get access to the underlying struct page and are limited to the provided methods for accessing the memory contents:
fn ioctl(state: &Self, _file: &File, cmd: &mut IoctlCommand) -> Result<i32> {
    let (cmd, _arg) = cmd.raw();
    match cmd {
        VULN_PROCESS_BUF => {
            let mut tmp_buf = Box::try_new([0u8; 32])?; // on heap
            let page = &state.mutable.lock().page;
            let mut size = 0u32;
            // First fetch of the size word, used for the bounds check.
            unsafe { page.read(&mut size as *mut u32 as _, 0, 4)? };
            // Compare against the array length (32), not the size of the Box pointer.
            if size as usize <= tmp_buf.len() {
                // Second, identical fetch of the size word: the intentional TOCTOU.
                unsafe { page.read(&mut size as *mut u32 as _, 0, 4)? };
                unsafe { page.read(tmp_buf.as_mut_ptr(), 4, size as usize)? };
                if tmp_buf[0] == b'A' {
                    return Ok(0);
                }
            }
            // ...
ioctl() callback using Pages<0>::read() to read memory
Let’s compare the two page.read() calls that fetch size above to the C version, where the first word of the shared buffer is accessed simply as sh_buf[0]. Since these two lines are identical, and serve no purpose other than to intentionally introduce a TOCTOU vulnerability, we believe it would be very unusual for a developer to write this. Thus, it seems unlikely for such TOCTOU vulnerabilities to be naively ported from C to Rust.
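With a copy-out interface, the straightforward way to write the handler reads the size word once into a local and then uses only that local. The userspace sketch below models this; ModelPage, its read() method and process_buf() are hypothetical and only loosely mirror Pages<0>::read(), which in the kernel crate takes a raw destination pointer and is called inside unsafe blocks.

```rust
/// Hypothetical stand-in for a page that, like `pages::Pages`, can only be
/// accessed by copying bytes out of it.
struct ModelPage {
    bytes: [u8; 4096],
}

impl ModelPage {
    /// Copies `dst.len()` bytes starting at `offset` into `dst`.
    fn read(&self, dst: &mut [u8], offset: usize) -> Result<(), &'static str> {
        let end = offset.checked_add(dst.len()).ok_or("offset overflow")?;
        let src = self.bytes.get(offset..end).ok_or("read past end of page")?;
        dst.copy_from_slice(src);
        Ok(())
    }
}

/// Reads the size word once, checks it, and copies exactly that many bytes.
fn process_buf(page: &ModelPage) -> Result<usize, &'static str> {
    let mut tmp_buf = [0u8; 32];

    let mut size_bytes = [0u8; 4];
    page.read(&mut size_bytes, 0)?; // the only fetch of the size word
    let size = u32::from_ne_bytes(size_bytes) as usize;

    if size > tmp_buf.len() {
        return Err("size larger than tmp_buf");
    }
    // Slicing with the checked local: even a wrong `size` would panic here
    // instead of writing past the end of tmp_buf.
    page.read(&mut tmp_buf[..size], 4)?;
    Ok(size)
}

fn main() {
    let mut page = ModelPage { bytes: [0; 4096] };
    page.bytes[0..4].copy_from_slice(&5u32.to_ne_bytes());
    page.bytes[4..9].copy_from_slice(b"AAAAA");
    println!("{:?}", process_buf(&page));
}
```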
In the above port, the abstractions prevented us from dereferencing a memory pointer as we did in C. Since Rust is a low-level language, we should be able to bypass the Pages struct abstraction and directly use the pointer to C’s struct page that it contains.
In our experiment, we created our own copy of Pages, called ExposedPages, and used core::mem::transmute to reinterpret Pages as our new type.
VULN_PROCESS_BUF => {
    let mut tmp_buf = Box::try_new([0u8; 32])?; // on heap
    let page = &state.mutable.lock().page;
    // page.pages is private, page.kmap() is private, tricks required
    let page: &ExposedPages = unsafe { core::mem::transmute(page) };
    let sh_buf: *mut u32 = unsafe { bindings::kmap(page.pages) } as _;
    // XXX assembly shows this will be only one access to *sh_buf
    if unsafe { *sh_buf } as usize <= tmp_buf.len() {
        unsafe {
            core::ptr::copy(sh_buf.offset(1) as *mut u8, tmp_buf.as_mut_ptr(), *sh_buf as _)
        };
        if tmp_buf[0] == b'A' {
            return Ok(0);
        }
    }
    // ...
Dereferencing a raw pointer to access shared memory
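The transmute trick relies on the two types having the same layout. The sketch below reproduces it in userspace with a hypothetical Pages type whose pointer field is private to its module. Both types are marked #[repr(C)], an assumption added here: with Rust’s default representation the compiler gives no layout guarantee, even for structs with identical fields.

```rust
mod kernel_like {
    /// Hypothetical stand-in for `pages::Pages`: the raw pointer is private,
    /// so code outside this module cannot read it.
    #[repr(C)]
    pub struct Pages {
        #[allow(dead_code)] // only ever read through the mirror type below
        pages: *mut u8,
    }

    impl Pages {
        pub fn new(pages: *mut u8) -> Self {
            Pages { pages }
        }
    }
}

/// Our own copy of the struct with the field made public (the ExposedPages trick).
#[repr(C)]
struct ExposedPages {
    pages: *mut u8,
}

fn expose(p: &kernel_like::Pages) -> &ExposedPages {
    // SAFETY: both types are #[repr(C)] with one identical field, so they share
    // size, alignment and layout, and a reference can be reinterpreted.
    unsafe { core::mem::transmute::<&kernel_like::Pages, &ExposedPages>(p) }
}

fn main() {
    let mut backing = [0u8; 16];
    let pages = kernel_like::Pages::new(backing.as_mut_ptr());
    // The private pointer is now visible through the mirror type.
    println!("{:p}", expose(&pages).pages);
}
```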
This PoC is closer to the C version (sh_buf[0] in C could also be written as *sh_buf, so that part could be identical), but since we cannot simply mark the pointer as volatile, the compiler optimises out the second *sh_buf read and emits only one load. For those interested, a full example is provided.
While Rust has no volatile keyword, it does offer volatile pointer accesses through core::ptr::read_volatile() and core::ptr::write_volatile().
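A minimal sketch of the two functions on an ordinary local buffer (write_then_read_twice is an illustrative name). Each call performs one access that the compiler may not elide or reorder relative to other volatile accesses, so two read_volatile() calls mean two loads. The standard library documentation also stresses that volatile is not a synchronization mechanism: for concurrent access from multiple threads, volatile accesses behave like non-atomic ones.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Stores `len` in the first word with a volatile write, then loads it twice
/// with two separate volatile reads.
fn write_then_read_twice(buf: &mut [u32; 2], len: u32) -> (u32, u32) {
    let p: *mut u32 = buf.as_mut_ptr();
    // SAFETY: `p` points to the first element of a live, properly aligned array.
    unsafe {
        write_volatile(p, len);
        let first = read_volatile(p);
        let second = read_volatile(p); // a second load, not folded into `first`
        (first, second)
    }
}

fn main() {
    let mut buf = [0u32; 2];
    println!("{:?}", write_then_read_twice(&mut buf, 32)); // prints (32, 32)
}
```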
Our next variation uses read_volatile instead of a plain pointer dereference:
if unsafe { read_volatile(sh_buf) } as usize <= tmp_buf.len() {
    unsafe {
        // read_volatile() cannot be merged with the check above, so sh_buf[0] is fetched again
        copy(sh_buf.offset(1) as *mut u8, tmp_buf.as_mut_ptr(), read_volatile(sh_buf) as _)
    };
    // ...
Using core::ptr::read_volatile
This does trigger the TOCTOU vulnerability, and it is plausible that a developer would call read_volatile(sh_buf) twice instead of declaring a temporary variable.
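The fix is the same as in C: fetch the length once into a local variable and use only that local afterwards. A minimal sketch, assuming a shared buffer laid out as in the driver (a u32 length followed by the payload); process_buf and TMP_BUF_LEN are illustrative names, and the pointer comes from an ordinary array here rather than from kmap().

```rust
use core::ptr::{copy_nonoverlapping, read_volatile};

const TMP_BUF_LEN: usize = 32;

/// Copies the payload that follows the length word into `tmp_buf`.
///
/// # Safety
/// `sh_buf` must be valid for reads of one u32 followed by TMP_BUF_LEN bytes.
unsafe fn process_buf(sh_buf: *const u32, tmp_buf: &mut [u8; TMP_BUF_LEN]) -> Option<usize> {
    // Single fetch: the length is read once and kept in a local.
    let size = unsafe { read_volatile(sh_buf) } as usize;
    if size > tmp_buf.len() {
        return None;
    }
    // The copy uses the checked local, never a fresh read of sh_buf[0].
    unsafe { copy_nonoverlapping(sh_buf.add(1) as *const u8, tmp_buf.as_mut_ptr(), size) };
    Some(size)
}

fn main() {
    // Length word followed by 8 payload words (32 bytes).
    let mut shared = [0u32; 9];
    shared[0] = 4;
    shared[1] = u32::from_ne_bytes(*b"ABCD");
    let mut tmp_buf = [0u8; TMP_BUF_LEN];
    let copied = unsafe { process_buf(shared.as_ptr(), &mut tmp_buf) };
    println!("{:?} {:?}", copied, &tmp_buf[..4]);
}
```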
We have also explored accessing the raw contents of mm::virt::Area instead of pages::Pages, but the source code then becomes even more like C and uses more C bindings.
All the ways we tried to access shared memory in a vulnerable way felt forced or contrived, and not like idiomatic Rust. Rust abstractions require us to read memory in a way that makes a double fetch more obvious. While the abstractions can be bypassed, even a cursory code inspection should pick up the unsafe block with transmute and, later, the read_volatile calls, ensuring that the code would be harshly reviewed and possibly even removed.
To conclude this four-part blog series (one, two, three, four), we note that Rust brings some very nice features to the table. Writing Linux device drivers in Rust will almost certainly improve the kernel’s overall security posture.
However, the security improvements of the Rust language are neither free nor completely automatic. Porting C code to Rust is a non-trivial matter with its own set of unique pitfalls. We believe that Rust is a tool which still requires considerable expertise from its user to avoid shooting themselves in the foot. As we’ve shown, naïve ports from C to Rust may still exhibit vulnerabilities.
While it is easy to spot the unsafe keyword when auditing Rust code, thoroughly inspecting and documenting each unsafe block requires a deeper understanding of Rust and of the driver code. Even with all unsafe blocks removed (or proven to be memory safe), there is still potential for other vulnerabilities, although those will probably be less severe since, by design, they should not be related to memory safety.
In particular, we wish to highlight the MutexGuard usage caveat that we discussed in post #2: while the automatic unlock when a guard goes out of scope is very convenient, one should be aware of patterns like the .lock() method chaining we demonstrated, where a race condition arose because the mutex was unlocked between two guarded variable accesses.
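The .lock() chaining pitfall can be reproduced with std::sync::Mutex in userspace Rust. In the sketch below, State, CAP and both function names are illustrative, and std’s Mutex stands in for the kernel’s. The chained version takes the lock separately for the check and for the use, which is a double fetch in disguise; try_lock() succeeding in between shows that the guard created in the if condition was already released. The guarded version keeps a single guard alive for both accesses.

```rust
use std::sync::Mutex;

struct State {
    len: u32,
}

const CAP: u32 = 32;

/// Pattern to avoid: each `.lock()` creates a temporary guard. The guard from the
/// `if` condition is dropped when the condition finishes, so the mutex is free
/// again before the second `.lock()`, and another thread could change `len`.
/// Returns the used length and whether the mutex was observed unlocked in between.
fn chained_lock_len(state: &Mutex<State>) -> Option<(u32, bool)> {
    if state.lock().unwrap().len <= CAP {
        // try_lock() succeeds only because the condition's guard is already gone.
        let unlocked_in_between = state.try_lock().is_ok();
        let len = state.lock().unwrap().len; // a second, independent critical section
        Some((len, unlocked_in_between))
    } else {
        None
    }
}

/// Safer pattern: hold one guard across both the check and the use.
fn guarded_len(state: &Mutex<State>) -> Option<u32> {
    let guard = state.lock().unwrap();
    if guard.len <= CAP { Some(guard.len) } else { None }
}

fn main() {
    let state = Mutex::new(State { len: 16 });
    println!("{:?} {:?}", chained_lock_len(&state), guarded_len(&state));
}
```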
From our experimentation, integer overflows and shared memory accesses seem less likely to cause vulnerabilities in Rust, since the programmer needs to go out of their way to introduce such a bug.
Finally, leaking kernel addresses seems to be as easy as ever. While some already question the benefits of KASLR, these bypasses probably won’t go away either.
We hope the future is less buggy and software more secure. As Rust gets used more in the Linux kernel, we predict that the security research community will start to discover new manifestations of traditional driver vulnerabilities. Collectively, we probably need more time to discover these new vulnerability patterns, and better tools are likely needed to automatically detect and eliminate them.
Thanks to Miguel Ojeda, Alex Gaynor, Gary Guo and other Rust for Linux maintainers for valuable insights.
Special thanks to Jeremy Boone for all his help and suggestions.