Garmin Forerunner 235

One upside to living in a cyberpunk-adjacent fever dream is the multitude of (relatively) inexpensive supercomputers you can strap to your body. I recently bought a watch equipped with an array of sensors (and supporting microcontrollers) to record hikes, runs, and rides. The device, a Garmin Forerunner 235, is far from the most advanced piece of technology you can buy to perform these tasks but, so far, has performed well. My partner also has a Garmin watch and rushed to show me all of the customization options available via Garmin’s ConnectIQ Store and App. That's how this all started.

Here at Atredis Partners, I spend a good chunk of time under the delusion I'm a modern Sherlock Holmes. From the outside, I'm just a middle aged person lacking sleep and, evidently, the wherewithal to shave regularly. But, in my head, I'm hot on the trail of some computational mystery. Each engagement is a frantic sprint from layman to myopic expert. To give our customers the best assessment of their technology, we have to optimize where our time is spent. We need to understand the system design in order to evaluate tradeoffs between attack surface, impact, and complexity. The sooner we understand a technology, the more hours we have to allocate and arrange, Jenga-like, into a plan of attack. The more complete our understanding, the more accurate and complete our determination of impact and severity. When you spend your life understanding the most important (for some definition) parts of hundreds of devices in three week bursts of effort, every device looks like a new mystery to be solved.

Some people would spend their time away from these mental sprints actually hiking, running, or biking with their cool new watch (and I do that, sometimes). I, instead, needed to understand how this wrist-based computational cluster worked. To be precise, this project was driven by my curiosity, not by nascent privacy concern. This project wasn't an effort to point out all the security bugs or persuade you that balaclava-clad shadows are tracking your every movement. I make no privacy judgement either way -- you'll have to judge your own risk tolerance. Finally, I've enjoyed my Garmin watch and the company was easy to work with while reporting issues. This isn't an indictment of their products.

TL;DR: I'm a nerd. I bought an exercise watch and promptly stopped exercising to tear it apart.

Information Gathering

ConnectIQ

All of this started with a casual mention that Garmin provides a third party app store, ConnectIQ (abbreviated as CIQ), for Garmin devices. CIQ consists of an app store (https://apps.garmin.com/en-US/), a smart phone app to install CIQ Apps on your Garmin device, and a free software development kit (SDK) for developing CIQ Apps (https://developer.garmin.com/connect-iq/overview/). With my deerstalker on and a pipe firmly between my teeth, the link to the ConnectIQ SDK was the first note I took in my gumshoe notebook. As far as attack surface goes (even in our broader lens of overall system understanding), being able to run code on the device is hard to beat when it comes to tools for understanding a system.

Firmware

The firmware was the next clue, and this one was a gamble. Devices often have encrypted or "encrypted" (encoded with the intent to obfuscate) firmware. In this case, a quick web search turned up a community repository of installable firmware updates for Garmin devices (repository is currently down). I jotted down the Forerunner 235 firmware in my metaphorical steno pad.

Hardware

The firmware runs on some set of programmable devices within the watch. Without knowing which microcontrollers are included in the watch design, reversing is more difficult. Knowing the architecture and memory map of the system-on-chips (SoCs) used will provide more clues towards understanding how the firmware is loaded and executed. Having a bill-of-materials or some approximation of such is not a strict necessity, but provides a good reference going forward while taking apart the firmware. Another web search turned up a teardown for a similar device and the FCC images provided additional clues. These were also recorded in the notepad, providing another category of data to draw from.

The Screaming Hoards

Lastly and reluctantly, it’s time to check if anyone has stolen our fun. Has someone already written up their efforts at understanding a Garmin device? Some searching produced a very nice write up of a TomTom watch and a handful of file format reverse engineering. This is a considerably good outcome -- our fun hadn’t been cut off but we do have a head start on some of the artifacts we'll need to analyze. I took note of links to the firmware update format (RGN) and a GitHub repository related to the CIQ application format (PRG).

Our Investigative Notes So Far

Device Hardware

Datasheets (based on the 735XT teardown -- not sure about 235)

Development Kits

Device Firmware

Host Tools

Similar Stuff

Moving on to Monkey C

The Game is Afoot

With our initial flurry of web searches done, it was time to start somewhere. As I mentioned above, the ability to run your own code on the watch seems like a great place to start. Heading to the Garmin ConnectIQ site and reading more about the developer tools revealed that CIQ Apps are developed in a custom language called Monkey C. A custom language is surprising enough to require some follow-up research before diving into the actual SDK provided. The question at hand was: why did Garmin decide on a custom language?

Before unraveling that question, it’s important to take a glance at the language. As you can see below, the language appears to be made of JavaScript and Java.

using Toybox.WatchUi;
using Toybox.Graphics;
using Toybox.System;
using Toybox.Lang;

class AtrediFaceView extends WatchUi.WatchFace {
...
    function onExitSleep() {
        System.println("onExitSleep");
        foo();
    }

    function foo() {
        var x = 0xf00d;
        System.println("0xf00d + 1 = " + (x + 1).toString());
    }
}

Using the SDK provided by Garmin, it is possible to compile and run this code:

(venv) ➜  AtrediFace make
monkeyc -o ./bin/AtrediFace.prg \
        -y ../connectiq-sdk-mac-3.1.7-2020-01-23-a3869d977/developer_key \
        -f ./monkey.jungle \
        -d fr235
(venv) ➜  AtrediFace touch /Volumes/GARMIN/GARMIN/APPS/LOGS/AtrediFace.TXT
(venv) ➜  AtrediFace cp bin/AtrediFace.prg /Volumes/GARMIN/GARMIN/APPS/

Notice that it was possible to sideload an App by copying the PRG file onto the watch. When the watch is plugged into the computer, it exposes a file system as a USB Mass Storage device.

Atredis bird on watch face

Once the watch is unplugged, we'll see our beautiful Atredis bird soaring onto the watch face. After the watch face program has executed, debug output can be found on the FAT file system (after, once again, plugging the watch back into the computer).

(venv) ➜  AtrediFace cat /Volumes/GARMIN/GARMIN/APPS/LOGS/AtrediFace.TXT
onExitSleep
0xf00d + 1 = 61454

Now that we're able to code some simple Monkey C, compile it to a PRG file, and run the code on the watch, we can get back to trying to answer the burning question of: But why?

Further reading on the Garmin developer website, forum, and a few web searches provides more background. A Garmin-authored presentation provides the justification for a new language and corresponding virtual machine. The Garmin applications, like Java applications, execute bytecode on a virtual machine. Like Android (and the Infocom Z-machine and Java Card systems before it), the CIQ applications are intended to run on a wide variety of devices. Further, the Garmin devices are limited in resources (computation, memory, and battery) and any runtime/OS environment should be able to restrict each client application's usage of these resources. Finally, the Garmin OS and application execution environment need to be able to enforce access control and isolation -- this includes memory isolation when lacking strong virtual memory subsystem within the OS. A badly behaving CIQ application should not be able to bring down the entire watch (i.e., Garmin wanted to be better than Windows 95).

The reasoning for running these applications in a virtual machine is clear. Garmin decided to develop a full ecosystem of language, compiler, runtime, and virtual machine to support this. That means we get to reverse-engineer all of it! 🎉 The language is documented in the SDK documentation. The compiler is provided in the SDK and can be reversed from that. The language runtime is implemented in firmware with the interface specified in the SDK. The virtual machine is not publicly documented but can be understood based on a combination of the compiler and the firmware.

This last bit, the details around the virtual machine, is most interesting to me. Using this mapping between concept and implementation, we'll attempt to answer the following questions by reverse engineering the compiler and firmware:

  1. What does the virtual machine executable image look like?

  2. Can the virtual applications mix native code with bytecode?

  3. What is the architecture of the virtual machine?

  4. How does the virtual machine interface with native code for the SDK?

Compiler

The downloadable SDK is mostly Java class files. It decompiles extremely well. The monkeybrains package includes a number of interesting tools but we focus on the compiler and assembler that work together to produce a PRG file. Pulling these apart provides a decent view of the PRG file structure. The high-level structure encapsulates a number of sections enveloped as type-length-value (TLV) structures. These sections include debugging metadata, bytecode, data, resources (e.g., strings for translations, bitmaps), and linking information for the runtime. There is an existing open source project, ciqdb, to parse much of this file format (although it does not handle the sections with the bytecode or the embedded resources yet).

Within the asm package, the Opcode class contains constants with mnemonic names for 55 different opcodes. Now, we have (mostly) familiar looking mnemonics that we can map to opcodes. Further reversing of the decompiled asm package leads us to an understanding of the bytecode stream from the PRG files. Below is a short hand disassembly of the foo function shown in Monkey C above:

The PRG bytecode for the foo function is:

00000110: 35 01 01 01 25 00 00 F0  0D 13 01 27 00 80 00 05  5...%......'....
00000120: 30 27 00 80 00 67 0D 2A  18 00 00 02 CF 12 01 25  0'...g.*.......%
00000130: 00 00 00 01 03 27 00 80  00 AF 0D 2A 0F 01 03 0F  .....'.....*....
00000140: 02 02 16 35 01 01 00 12  00 27 00 80 02 9C 0D 27  ...5.....'.....'

The disassembly looks something like:

00000110: 35 01            ARGC 1
00000112: 01 01            INCSP 1
00000114: 25 00 00 F0 0D   IPUSH 0xF00D
00000119: 13 01            LPUTV 1
0000011B: 27 00 80 00 05   SPUSH 0x800005 ; "Toybox_System"
00000120: 30               GETM
00000121: 27 00 80 00 67   SPUSH 0x800067 ; "println"
00000126: 0D               GETV
00000127: 2A               FRPUSH
00000128: 18 00 00 02 CF   NEWS 0x2CF ; "0xf00d + 1 = " 
0000012d: 12 01            LGETV 1
0000012f: 25 00 00 00 01   IPUSH 0x01
00000134: 03               ADD
00000135: 27 00 80 00 AF   SPUSH 0x8000AF ; "toString"
0000013a: 0D               GETV
0000013b: 2A               FRPUSH
0000013c: 0F 01            INVOKE 1
0000013e: 03               ADD
0000013f: 0F 02            INVOKE 2
00000141: 02               POPV
00000142: 16               RETURN

So far, we've answered our initial question about the executable image format and we can start guessing at the virtual machine organization. Unfortunately, we don't have quite enough information in the compiler/assembler to answer much more about the system definitively. For that, we should move along and start working on the firmware. Specifically, we need to find the portion of the firmware responsible for loading, parsing, and executing these PRG images.

Firmware

A quick google for Garmin Forerunner firmware provides the official Garmin website. While the release notes are good, there does not appear to be a direct download of the firmware images from the website. Luckily, someone else already yanked the firmware from wherever the Garmin Connect app pulls from. (Or, at least, they used to. The archive of firmware was found at http://gawisp.com/perry/forerunner/ but it seems the site is currently down.)

With the firmware in hand, we need to determine how Garmin performs an update. Is the image a flat flash image? Does it contain metadata or a header? Does the format support a partial update? Again, we're lucky because someone has also already figured out the type-length-value (TLV) envelope structure of the GCD update files. There is a document providing information on the structure. All it takes is a little time with our best friend hexdump -C to see that the Forerunner 235 update contains two "large" images that can be pulled out. Interestingly, one is the size of the SRAM on the SoC we identified earlier via the teardown (of the Maxim MAX32630) and the other is the size of the internal flash. If I had to bet, I'd believe the first is a bootstrap that is written into SRAM so the firmware that is eXecute-In-Place can be replaced. We can write a quick Python script to extract the "main" firmware that we believe is written to the internal flash.

    @classmethod
    def parse(cls, f):
        header = f.read(8)
        if header != b'GARMINd\x00':
            raise Exception('Unknown firmware format')

        tlvs = []
        while True:
            data = f.read(4)
            if len(data) != 4: break

            tag, length = struct.unpack('<HH', data)
            value = f.read(length)
            tlvs.append((tag, length, value))
            print('  0x{:04x}: 0x{:04x}'.format(tag, length))

        return cls(tlvs)

With the main firmware extracted, our ARM RE fingers should really be starting to itch. From the datasheet of the MAX32630, we know the internal flash is probably mapped starting at 0x0. Since the extracted flat image is mapped directly, there is no need for a dedicated IDA loader plugin -- the IDA load UI is flexible enough. Once the image is loaded and the appropriate architecture for the Cortex-M is selected, adding segments for SRAM and the peripheral ranges provide a solid starting point for the reverse engineering effort.

After running some Thumb function finding heuristic scripts against the initial database, what next? Our first goal is to find the code responsible for parsing and running the virtual machine programs. The parsing logic is probably the better of the two to start with -- the identification of the parsing logic will also help identify the runtime representation of the program. In most cases, the parsing logic will output an internal runtime representation. Understanding this runtime structure provides context for all reverse engineering of the execution or processing surrounding the loaded program. In this case, taking the time to create and refine the internal runtime context structure is worth the effort.

A quick look at the strings identified by IDA doesn't immediately provide any hints around the PRG processing. When strings are lacking, the next best handhold is using unique constants. In this case, the PRG tags are unique 32-bit integers perfect for IDA's "Search -> Immediate value...". Searching for 0xd000d000, the main PRG header tag, reveals a single function passing this value into a sub function. Perfect!

unsigned int __fastcall read_prg_header(int a1, _DWORD *a2, int a3)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  v4 = a1;
  v5 = prg_extract_section_data(a1, 0xD000D000, &a3a, &out_offset, 1u);
  v6 = (void *)mem_alloc(a3a.length, 3, &handle);
  handle = v6;
  if ( !v6 )
    goto LABEL_2;
  if ( v5 == (void *)1 )
  {
    v9 = (int *)mem_pointer_borrow(v6);
    v10 = v9;
    v11 = file_read_(v4, v9, a3a.length);
    if ( v11 == a3a.length )
    {
      v17 = *v10;
      if ( a2 )
      {
        v12 = a3a.length;
        v13 = v17;
        *a2 = a3a.tag;
...

Using this as a starting point and walking up and down the call stack surrounding this function reveals, as we hoped, the code for parsing a PRG file. We will spare the reader three weeks of reverse engineering play-by-play as the virtual machine, deemed the "TVM" by Garmin, is analyzed and the runtime objects and utilities are reversed. In addition to the TVM, the OS structures and APIs need to be reversed along the way. The watch runs a Garmin developed OS but context clues and a few useful strings help determine the general OS object APIs. The OS provides abstractions for objects such as semaphores, tasks, events, and queues. A layer above this provides a file system abstraction and memory allocation routines. The TVM layers a richer abstraction on the memory allocation logic for tracking TVM program quotas and for maintaining reference counts on allocated buffers.

The TVM is a stack-based virtual machine. Each runtime value is stored along with the accompanying type. Opcodes manipulate values stored on the stack and can reference local variables reserved on the stack by index. Values are created at runtime by loading data from the PRG data section or via immediate values embedded in the bytecode stream. Once loaded onto the stack, the value can be manipulated and passed around the system. All runtime allocations are tracked per TVM instance. This tracking is an effort to prevent a buggy or malicious program from taking down the entire system via resource exhaustion. Runtime objects are also reference counted, as noted above, and are deterministically garbage collected when the last reference is released.

During analysis of the TVM context block, the PRG loading, and the runtime initialization, we're able to make some progress toward understanding how the virtual machine interacts with the native runtime (one of our overall goals). Below is an excerpt from a function we named tvm_run_function. This function is used to enter a TVM function based on a TVM virtual address, for example when handling a CALL opcode or to run initialization function after loading the PRG. We can see that, based on the high bits of the address, the TVM either executes a native function based on a function pointer table (tvm_native_methods) or executes bytecode by entering the opcode dispatch loop (tvm_execute_opcodes).

  if ( (function_addr.value & 0xFF000000) == 0x40000000 )
  {
    v17 = LOWORD(function_addr.value);
    if ( LOWORD(function_addr.value) > 0xC5u )
    {
      v8 = 15;
      goto LABEL_4;
    }
    ctx->pc_ptr = (char *)tvm_native_methods[LOWORD(function_addr.value)];
    v18 = tvm_native_methods[v17]((int)ctx, a4);
    if ( v18 == 21 )
      return v8;
    if ( tvm_native_methods[v17] != sub_10F18C )
    {
      if ( v18 )
        goto LABEL_15;
      v18 = tvm_value_incref(ctx, (struct tvm_value *)ctx->stack_ptr);
    }
    if ( !v18 )
      v18 = tvm_op_return(ctx);
  }
  else
  {
    v18 = tvm_tvmaddr_to_ptr(ctx, function_addr.value, &ctx->pc_ptr);
    if ( !v18 )
      v18 = tvm_execute_opcodes(ctx);
  }

After weeks of reverse engineering and marking up an IDA database, we've answered questions 2, 3, and 4 pretty well. Additionally, along the way, we've identified a handful of code that appears to violate contracts made amongst the virtual machine runtime. Maybe the real treasures were the bugs we found along the way?

TVM Opcode Bugs

While reversing the TVM system, we noted a number of the opcode handlers performed operations that appeared to break the virtual machine abstraction. Below, we'll follow up on each of those. More information about each vulnerability, including the disclosure timeline, can be found at ATREDIS-2020-0004, ATREDIS-2020-0005, ATREDIS-2020-0006, and ATREDIS-2020-0007.

NEWA

One instruction, NEWA, is used to create an runtime array of TVM values of a fixed size. The array is initialized with the null value. NEWA expects a number-like value on the top of the stack indicating the size of the array. Decompilation of the NEWA opcode implementation shows just the one check on the length value (ensuring it is not negative) before passing it to tvm_value_array_allocate for the array size calculation.

int __fastcall tvm_op_newa(struct tvm *ctx)
{
  struct stack_value *sp;
  int rv;
  unsigned int length;
  struct tvm_value value;

  sp = ctx->stack_ptr;
  length = 0;
  value = *sp;
  rv = tvm_value_to_int(ctx, &value, &length);
  if ( rv ) {
    if ( length < 0 )
    {
      rv = 10;
      tvm_value_decref(ctx, &value);
      return rv;
    }

    rv = tvm_value_array_allocate(ctx, length, ctx->stack_ptr);
    if ( rv )
    {
      rv = tvm_value_decref(ctx, &value);
      if ( !rv )
        return tvm_value_incref(ctx, ctx->stack_ptr);
    }
  }
  tvm_value_decref(ctx, &value);
  return rv;
}

The tvm_value_array_allocate function will perform the unchecked array size calculation as shown below.

int __fastcall tvm_value_array_allocate(struct tvm *ctx, int length, struct tvm_value *array_value)
{
  unsigned int allocation_size; // r6
  int rv; // r0 MAPDST
  struct tvm_value_array_data *array_data; // r9
  void *array_data_handle; // [sp+4h] [bp-24h] MAPDST

  array_data_handle = 0;
  allocation_size = 5 * length + 15;
  rv = tvm_alloc_for_app(ctx, allocation_size, &array_data_handle);
  if ( !array_data_handle )
    return 7;
  array_data = (struct tvm_value_array_data *)mem_pointer_borrow(array_data_handle);
  memset((int *)array_data, 0, allocation_size);
  array_data->m_0x01 = 1;
  array_data->type = ARRAY;
  array_data->length = length;
  mem_pointer_release(array_data_handle);
  array_value->type = ARRAY;
  array_value->value = (unsigned int)array_data_handle;
  return rv;
}

The allocation size calculation can overflow the 32-bit integer and can be triggered by creating an array of size 0x33333333. This value is still positive for a 32-bit integer (passing the check in the tvm_op_newa function). When the allocation_size is calculated, the result will overflow the 32-bit unsigned int:

>>> length = 0x33333333
>>> allocation_size = 5 * length + 15
>>> hex(allocation_size)
'0x10000000e'
>>> hex(allocation_size & 0xffffffff)
'0xe'

The original length value (0x33333333) is stored in the resulting tvm_value_array_data and this is the value used to check bounds during the array read and write operations (performed by the AGETV and APUTV instructions).

This can be directly triggered through Monkey C and does not require direct bytecode manipulation to create a proof-of-concept. There are a number of additional constraints to turn this into a reliable read/write anything anywhere primitive but it provides are strong exploit building block.

LGETV and LPUTV

The instructions LGETV and LPUTV are used to read and write to a local variable. The virtual machine maintains a frame pointer used to point at the start of the frame on the stack. The entry of a method will reserve some space on the stack to store local variables. The LGETV and LPUTV instructions expect a single byte operand specifying the local variable index for that instruction. The implementation does not check that this index is within the previously allocated local variable space as seen below.


int __fastcall tvm_op_lgetv(struct tvm *ctx)
{
  char *pc_at_entry; // r3
  struct stack_value *sp_at_entry; // r1
  int local_var_idx; // t1
  struct stack_value *local_var_ptr; // r2
  struct stack_value *v6; // r5

  pc_at_entry = ctx->pc_ptr;
  sp_at_entry = ctx->stack_ptr;
  local_var_idx = (unsigned __int8)*pc_at_entry;
  ctx->pc_ptr = pc_at_entry + 1;
  local_var_ptr = &ctx->frame_ptr[local_var_idx + 1];
  ctx->stack_ptr = sp_at_entry + 1;
  sp_at_entry[1] = *local_var_ptr;
  v6 = (struct stack_value *)&ctx->m_0x007b;
  tvm_value_incref(ctx, (struct tvm_value *)ctx->stack_ptr);
  tvm_value_decref(ctx, v6);
  ctx->m_0x007b = (struct tvm_value)*ctx->stack_ptr;
  tvm_value_incref(ctx, (struct tvm_value *)v6);
  return 0;
}

The unchecked offset from the frame_ptr of the execution context provides a path to both memory access past the end of the TVM context allocation (the stack is allocated at the end of this structure) and a primitive to construct a use-after-free taking advantage of the way values outside of the valid stack are treated.

NEWS

The NEWS instruction creates a runtime string object from a string definition structure in the data section of the PRG. Upon execution, this instruction pushes a new tvm_value of type STRING onto the top of the stack. The value of the string is loaded from an address provided as a 32-bit operand. The data at the provided address is expected to contain a string definition of the form:

uint8_t one; // 0x01
uint16_t length;
uint8_t utf8_string[length];

The string data buffer is allocated to hold length bytes and then a function similar to strcpy is used to populate it. The strcpy-like function will only stop when a NUL byte is encountered possibly overflowing the buffer beyond the size of the initial allocation.

int __fastcall tvm_op_news(struct tvm *ctx)
{
  int tvm_addr_for_string; // r0
  struct stack_value *v3; // r2
  int result; // r0

  tvm_addr_for_string = tvm_fetch_int((int *)&ctx->pc_ptr);
  v3 = ctx->stack_ptr;
  ctx->stack_ptr = v3 + 1;
  v3[1].type = NULL;
  ctx->stack_ptr->value = 0;
  result = tvm_value_load_string(ctx, tvm_addr_for_string, (int)ctx->stack_ptr);
  if ( !result )
    result = tvm_value_incref(ctx, (struct tvm_value *)ctx->stack_ptr);
  return result;
}

int __fastcall tvm_value_load_string(struct tvm *ctx, int string_def_addr, int string_value_out)
{
  int rv; // r0
  unsigned __int8 *string_def; // [sp+4h] [bp-14h]

  rv = tvm_tvmaddr_to_ptr(ctx, string_def_addr, &string_def);
  if ( !rv )
    rv = tvm_string_def_to_value(ctx, string_def, (unsigned __int8 *)string_value_out, 1);
  return rv;
}

int __fastcall tvm_string_def_to_value(_BYTE *a1, unsigned __int8 *a2, unsigned __int8 *a3, int a4)
{
  _BYTE *v4; // r6
  unsigned __int8 *v5; // r4
  struct tvm_value *v6; // r5
  int result; // r0
  _BYTE *v8; // r4
  int v9; // r6
  __int16 v10; // r0
  int v11; // r3
  int v12; // [sp+4h] [bp-14h]

  v4 = a1;
  v5 = a2;
  v6 = (struct tvm_value *)a3;
  if ( a4 )
  {
    if ( *a2 != 1 )
      return 5;
    v5 = a2 + 1;
  }
  result = tvm_value_string_alloc_by_size((struct tvm *)a1, v5[1] | (*v5 << 8), (int)a3);
  if ( !result )
  {
    result = tvm_value_string_to_ptr(v4, v6, &v12);
    if ( !result )
    {
      v8 = v5 + 2;
      v9 = v12;
      v10 = strlen_utf8(v8);
      v11 = v12;
      *(_WORD *)(v9 + 6) = v10;
      strcpy_(v11 + 8, (unsigned int)v8);
      if ( v6->type == STRING )
        return sub_10DE28(v6);
      return 5;
    }
  }
  return result;
}

The tvm_string_def_to_value function allocates the string using the size found in memory and then proceeds to strcpy the provided data into the freshly allocated buffer.

DUP

The DUP instruction allows the running program to duplicate a value from any slot on the stack and push the copy on the top of the stack.

int __fastcall tvm_op_dup(struct tvm *ctx)
{
  char *pc; // r1
  struct stack_value *sp; // r2
  int stack_offset; // t1
  struct tvm *ctx:v4; // r3
  int v5; // r0
  struct stack_value v7; // [sp+0h] [bp-10h]

  pc = ctx->pc_ptr;
  sp = ctx->stack_ptr;
  stack_offset = (unsigned __int8)*pc;
  ctx->pc_ptr = pc + 1;
  ctx:v4 = ctx;
  v7 = sp[-stack_offset];
  v5 = *(_DWORD *)&v7.type;
  ctx:v4->stack_ptr = sp + 1;
  *(_DWORD *)&sp[1].type = v5;
  HIBYTE(sp[1].value) = HIBYTE(v7.value);
  tvm_value_incref(ctx:v4, (struct tvm_value *)&v7);
  return 0;
}

The implementation reads the next byte from the instruction stream, uses this byte as the negative offset to read from the top of the stack, and then copies that value to the next stack entry. Finally, the function increases the reference count in the tvm_value. The lack of a bounds check allows referencing memory outside of the stack for the tvm_value copy resulting in multiple primitives including use-after-free.

What next?

Well, those are a handful of bugs found via static code analysis. They were also found by accident without a dedicated plan of attack or comprehensive audit of the TVM attack surface. While the TVM appears clean in design and implementation, these bugs suggest the CIQ applications were likely not considered attack surface in the past. Finding more of the lower hanging bugs should be straightforward using a dynamic fuzzing approach. Unfortunately, doing so on an off-the-shelf device is slow and lacks reliability. An interesting next step would be running the firmware, either stock or modified, on a devkit or within a QEMU emulated environment.

We've spent some time working towards a functioning QEMU patch that emulates the MAX32630 and some of the relevant peripherals. We're not yet to the point where the watch comes all the way up but have learned more and more about the firmware in the process. A more direct approach would set up a runtime state that allowed just the PRG loader and TVM interpreter to run. This seems possible but the Garmin RTOS provides a number of services that would need to be stubbed out.

Another interesting task would be to finish a full code execution exploit for these bugs and to pivot towards exploitation of one of the attached microcontrollers (the Bluetooth controller, for instance).

Edit (11-18-2020): Clarified firmware current during analysis and added a link to the updated (patched) firmware.

This blog post was written by Dion Blazakis, technical peer review by Zach Lanier, and edited for the web by Lacey Kasten at Atredis Partners.