All about sanitizer interceptors

Many sanitizers want to know about every function in the program. User functions are instrumented and therefore known to the sanitizer runtime. Some library functions (e.g. mmap, munmap, memory allocation/deallocation functions, longjmp, vfork) need special treatment. Sanitizers leverage symbol interposition to redirect calls to such functions to their own implementations: interceptors. Other library functions can be treated like normal user code: either instrumenting the function or providing an interceptor is fine. In some cases instrumentation is infeasible (e.g. When can glibc be built with Clang?) or inefficient, and interceptors may be the practical choice.

This article describes how interceptors work and what each sanitizer requires of its interceptors.

How interceptors work

In Clang, sanitizer runtime files are named $resource_dir/lib/$triple/libclang_rt.*.{a,so} on ELF platforms. In the older LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=off configuration (default before LLVM 15.0.0), the files are named $resource_dir/lib/libclang_rt.*-$arch.{a,so}. The .a files are called the static runtime, while the .so files are called the shared runtime or dynamic runtime. As of 2023-01, Apple (Mach-O), Android, and Fuchsia use the shared runtime while other platforms (Linux, *BSD, etc.) use the static runtime.

In GCC, AddressSanitizer runtime files are named libasan.{a,so}. Shared runtime is the default. Specify -static-libasan to use static runtime.

Static runtime

Most static runtime files are only used when linking executables. When linking an executable, the Clang driver passes --whole-archive $resource_dir/lib/libclang_rt.$name.a --no-whole-archive to the linker. In the following example, if libclang_rt.$name.a defines malloc and free, the executable will get the definitions.

printf '#include <stdlib.h>\nint main() { void *p = malloc(42); free(p); }' > a.c
clang -fsanitize=address a.c -o a

malloc, free, and actually all libc interceptors are exported to .dynsym because they are defined/referenced by a link-time shared object (glibc libc.so.6), even if -Wl,--export-dynamic is not specified. See Explain GNU style linker options#--export-dynamic for detail. -Wl,--gc-sections cannot discard these interceptors.

Also, note that the definitions are unversioned.

% nm -D a | grep -w 'malloc\|free'
00000000000ead10 W free
00000000000eb010 W malloc

When linking a shared object, the static runtime is not used. On Linux glibc, the shared object has a versioned reference.

printf '#include <stdlib.h>\nvoid *foo() { return malloc(42); }' > b.c
clang -fsanitize=address -fpic -shared b.c -o b.so

If we make b.so a link-time dependency of the executable a or dlopen b.so at run-time, the malloc@GLIBC_2.2.5 reference from b.so will be bound to the definition in a. The dynamic loader computes a breadth-first symbol search list (executable, needed0, needed1, needed2, needed0_of_needed0, needed1_of_needed0, ...). For each symbol reference, the dynamic loader iterates over the list and finds the first component which provides a definition. The executable provides a definition and the search stops at the executable. See the first few paragraphs of ELF interposition and -Bsymbolic.
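
The lookup rule can be modeled with a short C sketch. This is a toy model, not the dynamic loader's code: struct component, lookup_symbol, and the symbol tables are hypothetical names invented for illustration, and the list is assumed to be in breadth-first order already. Symbol versions are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a loaded component: its name and the symbols it defines. */
struct component {
  const char *name;
  const char *const *defined; /* NULL-terminated list of defined symbols */
};

/* Walk the search list in order and return the first component that
   defines `symbol`, or NULL if no component does. */
static const struct component *
lookup_symbol(const struct component *list, size_t n, const char *symbol)
{
  for (size_t i = 0; i < n; i++)
    for (const char *const *s = list[i].defined; *s != NULL; s++)
      if (strcmp(*s, symbol) == 0)
        return &list[i];
  return NULL;
}

/* The executable a carries the interceptors from the static runtime. */
static const char *const exe_syms[] = {"main", "malloc", "free", NULL};
static const char *const b_syms[] = {"foo", NULL};
static const char *const libc_syms[] = {"malloc", "free", "printf", NULL};

/* Search order: executable first, then its dependencies. */
static const struct component search_list[] = {
  {"a", exe_syms},
  {"b.so", b_syms},
  {"libc.so.6", libc_syms},
};
```

In this model, looking up malloc stops at a, while printf is only found in libc.so.6.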

Actually we also use a rule that a malloc@GLIBC_2.2.5 reference can be bound to a malloc definition of VER_NDX_GLOBAL. See All about symbol versioning#rtld-behavior.

Since the executable defines malloc and free, its calls to the two symbols do not use PLT. This is an advantage over shared runtime. Calls from the shared object still need PLT. Preloading malloc functions does not work since the executable's definitions take priority.

Shared runtime

On targets that default to static runtime, use -shared-libsan to select this configuration. Both executables and shared objects will link against libclang_rt.$name.so. Here is an example on Linux glibc.

% clang -fsanitize=address -shared-libsan a.c -o a
% readelf -Wd a | grep 'clang_rt\|libc'
 0x0000000000000001 (NEEDED)             Shared library: [libclang_rt.asan.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
% readelf -d b.so | grep 'clang_rt\|libc'
 0x0000000000000001 (NEEDED)             Shared library: [libclang_rt.asan.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
% readelf -W --dyn-syms $(clang --print-file-name=libclang_rt.asan.so) | grep -w malloc
  1295: 000000000011a2d0   295 FUNC    WEAK   DEFAULT   11 malloc

In the symbol search list, libclang_rt.asan.so appears before libc.so.6, so malloc references from a and b.so will be bound to the definition malloc in libclang_rt.asan.so. The rule that a versioned reference can be bound to a definition of VER_NDX_GLOBAL kicks in again.

Preloading a shared object is dangerous and the asan runtime warns about it.

% clang -fsanitize=address -shared-libsan -Wl,-rpath=$(dirname $(clang --print-file-name=libclang_rt.asan.so)) a.c -o a
% LD_PRELOAD=/lib/x86_64-linux-gnu/libjemalloc.so.2 ./a
==1650190==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.

dlsym RTLD_NEXT

An interceptor needs to call the intercepted library function. This is implemented by looking up its address with dlsym(RTLD_NEXT, name). In the previous examples, whether the definition is in the executable or libclang_rt.asan.so, the component is prior to libc.so.6 in the symbol search list.
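
A minimal sketch of the lookup pattern, assuming a dynamically linked program on glibc or musl (older glibc versions also need -ldl at link time). The wrapper is deliberately named demo_strlen_interceptor, a hypothetical name, so that it does not actually interpose anything; a real interceptor would be defined under the intercepted name itself. The sanitizer runtimes wrap this pattern in their own macros, which are not reproduced here.

```c
#define _GNU_SOURCE /* must precede all includes so <dlfcn.h> exposes RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

#ifndef RTLD_NEXT
/* Fallback matching the glibc and musl definition; an assumption elsewhere. */
#define RTLD_NEXT ((void *)-1L)
#endif

typedef size_t (*strlen_fn)(const char *);

static strlen_fn real_strlen;      /* next definition of strlen in search order */
static unsigned long strlen_calls; /* bookkeeping performed by the wrapper */

/* Hypothetical wrapper showing the shape of an interceptor. */
size_t demo_strlen_interceptor(const char *s)
{
  /* Resolve lazily: find the first definition of strlen in the components
     that come after the one containing this code. */
  if (real_strlen == NULL)
    real_strlen = (strlen_fn)dlsym(RTLD_NEXT, "strlen");

  /* A sanitizer would do its own work here, e.g. check the shadow memory. */
  strlen_calls++;

  /* Forward to the real function; return 0 if the lookup failed. */
  return real_strlen != NULL ? real_strlen(s) : 0;
}
```

Calling demo_strlen_interceptor("hello") forwards to the libc strlen and bumps the counter.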

Before glibc 2.36, dlsym(RTLD_NEXT, name) returned the address of the oldest version definition of name in libc.so.6, so interceptors ended up calling the function with the old semantics. This is usually benign but not ideal (see https://github.com/google/sanitizers/issues/1371 for a regexec issue). I fixed this in glibc 2.36 (BZ #14932) so that dlsym(RTLD_NEXT, name) now returns the default version definition.

This fix led to an interesting issue. glibc made __pthread_mutex_lock a non-default version definition in 2.34, so dlsym(RTLD_NEXT, "__pthread_mutex_lock") would return NULL. I fixed it by disabling the interceptor if glibc>=2.34 at build time.

Interceptor requirements

AddressSanitizer

AddressSanitizer detects addressability bugs. For a mapped memory region, AddressSanitizer uses shadow memory to track whether user bytes are unaddressable (poisoned): an access to a poisoned byte is reported as a bug (heap-buffer-overflow, heap-use-after-free, stack-buffer-overflow, stack-use-after-{return,scope}, etc). Every 8 aligned user bytes (a granule) are mapped to one shadow memory byte. (This can be patched to support other granules, e.g. now-deleted Myriad RTEMS used 32 as the granule.)

A shadow memory byte is 0 (unpoisoned, all 8 bytes are addressable) or a non-zero integer (poisoned, not all 8 bytes are addressable). The non-zero integer may be smaller than 8 (the first X bytes are addressable) or a predefined special value (to indicate a bug category).
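
The encoding can be sketched in C as follows. The helper names are hypothetical, the shadow offset 0x7fff8000 is an assumption matching the usual Linux x86-64 default, and the check only covers an access of 1 to 8 bytes that stays within one granule.

```c
#include <stddef.h>
#include <stdint.h>

enum { kShadowScale = 3 }; /* 2^3 = 8 user bytes per shadow byte */

/* Assumption: the default shadow offset on Linux x86-64. */
static const uintptr_t kShadowOffset = 0x7fff8000;

/* Address of the shadow byte that describes the granule containing addr. */
static uintptr_t mem_to_shadow(uintptr_t addr)
{
  return (addr >> kShadowScale) + kShadowOffset;
}

/* Return 1 if an access of `size` bytes at `addr` touches a poisoned byte,
   given the shadow value of its granule. The shadow value is treated as
   signed, so special values such as 0xfa are negative and always fail. */
static int access_is_poisoned(int8_t shadow, uintptr_t addr, size_t size)
{
  if (shadow == 0)
    return 0; /* all 8 bytes are addressable */
  /* Offset of the last accessed byte within the granule. */
  int8_t last = (int8_t)((addr & 7) + size - 1);
  /* With shadow == k (1..7), only offsets 0..k-1 are addressable. */
  return last >= shadow;
}
```

For example, with a shadow value of 4, a 4-byte access at offset 0 of the granule is fine while a 4-byte access at offset 2 is an error.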

At the start of an interceptor, AsanInitFromRtl is called if the runtime hasn't been initialized yet.

mmap and munmap do not need special treatment.

When a chunk of memory is reserved from a mapped region for heap allocation, the associated shadow memory is poisoned with 0xfa (kAsanHeapLeftRedzoneMagic). For a malloc-family function, its interceptor records the allocation information (thread ID, requested size, stack trace, allocation type (malloc, new, new[]), etc) and unpoisons the shadow (sets to zeros) which may be 0xfa if unallocated previously.

For a free-family function, its interceptor detects double free and alloc-dealloc type mismatch bugs, records the deallocation information (thread ID, stack trace), and poisons the shadow with 0xfd (kAsanHeapFreeMagic). Instrumented/intercepted accesses to the deallocated memory will cause an error.

For a library function which performs memory reads or writes, its interceptor emulates an instrumented memory read/write: check the shadow memory and report an error in case of a poisoned byte.

Some library functions allocate memory internally. An implementation typically carefully uses an interposable symbol malloc instead of a private alias, so the allocation/deallocation will be known by AddressSanitizer and be unpoisoned/poisoned properly.

A non-special function which is neither instrumented nor intercepted just leads to fewer detected errors.

Stack use after scope

An instrumented function poisons stack variables to catch stack use after scope bugs. Instrumentation unpoisons stack variables before an epilogue. If a process creates a subprocess with shared memory, and the subprocess exits due to a noreturn function, the epilogue is skipped and the shadow that was never unpoisoned may cause a false positive in the first process. This is fixed by calling __asan_handle_no_return before calling noreturn functions to conservatively unpoison the whole stack.

vfork has similar issues and needs an interceptor.

longjmp-family functions have similar issues. The stack memory may be reused causing a false positive. These functions are intercepted in the runtime to call __asan_handle_no_return.

HWAddressSanitizer

HWAddressSanitizer detects addressability bugs (the same class of errors as the main feature of AddressSanitizer) using a different algorithm (software memory tagging). Every 16 aligned user bytes (a granule) are associated with a non-zero tag. The tag is implemented as one byte and stored in the shadow memory.

A memory allocation chooses a random non-zero tag and sets it in the high bits of the returned pointer. The shadow memory of the allocated chunk is filled with the tag. To support accessing the pointer with non-zero high bits, hardware features (ARM Top Byte Ignore, Intel Linear Address Masking, RISC-V Pointer Masking) or page aliases are needed.

The interceptor behavior is similar to AddressSanitizer.

mmap and munmap do not need special treatment.

For an instrumented memory read or write operation, the pointer tag is checked against the tag stored in the shadow memory. Report an error in case of a mismatch.
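
A toy model of the tag check, under stated assumptions: the tag lives in the top byte of a 64-bit pointer (as with ARM Top Byte Ignore), the shadow is a small array covering simulated addresses 0 to 255, and all function names are hypothetical. Details of the real runtime, such as how partially used granules are handled, are left out.

```c
#include <stdint.h>

enum { kTagShift = 56, kGranuleShift = 4 }; /* top-byte tag, 16-byte granule */
#define ADDR_MASK ((UINT64_C(1) << kTagShift) - 1)

/* Toy shadow: one tag byte per 16-byte granule of a 256-byte address range. */
static uint8_t hwasan_toy_shadow[16];

static uint64_t set_tag(uint64_t ptr, uint8_t tag)
{
  return (ptr & ADDR_MASK) | ((uint64_t)tag << kTagShift);
}

static uint8_t get_tag(uint64_t ptr) { return (uint8_t)(ptr >> kTagShift); }

static uint64_t untag(uint64_t ptr) { return ptr & ADDR_MASK; }

/* Allocation side: fill the shadow of [ptr, ptr + size) with the pointer tag.
   The caller must keep the range inside the toy address space. */
static void tag_region(uint64_t tagged_ptr, uint64_t size)
{
  uint64_t first = untag(tagged_ptr) >> kGranuleShift;
  uint64_t last = (untag(tagged_ptr) + size - 1) >> kGranuleShift;
  for (uint64_t g = first; g <= last; g++)
    hwasan_toy_shadow[g] = get_tag(tagged_ptr);
}

/* Access side: the pointer tag must equal the tag stored in the shadow. */
static int tag_matches(uint64_t tagged_ptr)
{
  return hwasan_toy_shadow[untag(tagged_ptr) >> kGranuleShift] ==
         get_tag(tagged_ptr);
}
```

After tagging a 32-byte chunk at address 0x20 with tag 0x2a, accesses through that pointer match for both granules, while an access one granule past the end or through a pointer with a stale tag is a mismatch.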

To detect use-after-free, a memory deallocation needs to clear the associated shadow memory.

For an interceptor, do something similar to AddressSanitizer: check the pointer tag against the shadow memory and report an error in case of a mismatch.

HWAddressSanitizer is deployed on Android. Its C library bionic is instrumented so that very few interceptors are needed.

To use HWAddressSanitizer with glibc in the future, either interceptors need to be provided or glibc can be instrumented.

longjmp-family functions have issues similar to AddressSanitizer. The stack memory may be reused causing a false positive. These functions are intercepted to call __hwasan_handle_longjmp to clear the shadow memory. vfork needs an interceptor similar to AddressSanitizer.

ThreadSanitizer

In the old runtime (tsan v2), 8 aligned user bytes are mapped to a shadow cell of 32 bytes, which contains 4 shadow values. The representation uses 13 bits to record a thread ID (up to 8192 threads are supported), and 42 bits to record a vector clock timestamp.

In the new runtime (tsan v3), 8 aligned user bytes are mapped to a shadow cell of 16 bytes, which contains 4 shadow values. A shadow value records the bitmask of accessed bytes (8 bits), a thread slot ID (8 bits), a vector clock timestamp (14 bits), is_read (1 bit), and is_atomic (1 bit). The timestamp can be this small because it increments more slowly (only on atomic releases, mutex unlocks, thread creation/destruction).

At the start of an interceptor, call cur_thread_init and retrieve the thread state and the return address of the current function. For a memory read or write operation, the access is recorded as a thread event (EventAccess), forms a new shadow value, and is checked against existing shadow values (at most 4) in the shadow cell. If the current shadow value and a previous shadow value intersect in the bitmask of accessed bytes, have different thread slot IDs, have at least one write, and have at least one non-atomic access, report a data race. Otherwise replace one shadow value with the new one.
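
The four conditions can be written down as a small C predicate. This is a simplified sketch with hypothetical names: the struct uses plain fields rather than the packed 32-bit layout, and the happens-before check against the vector clock, which the real runtime also performs, is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unpacked view of a tsan v3 shadow value (hypothetical layout). */
struct shadow_value {
  uint8_t access_mask; /* one bit per byte of the 8-byte cell */
  uint8_t sid;         /* thread slot ID */
  uint16_t epoch;      /* vector clock timestamp, 14 bits used */
  bool is_read;
  bool is_atomic;
};

static struct shadow_value
make_shadow(uint8_t mask, uint8_t sid, uint16_t epoch, bool is_read,
            bool is_atomic)
{
  struct shadow_value v = {mask, sid, epoch, is_read, is_atomic};
  return v;
}

/* Bitmask for an access of `size` bytes at byte `offset` of the cell.
   Requires size >= 1 and offset + size <= 8. */
static uint8_t make_access_mask(unsigned offset, unsigned size)
{
  return (uint8_t)(((1u << size) - 1) << offset);
}

/* Apply the conditions from the text to the current and an old access. */
static bool may_race(struct shadow_value cur, struct shadow_value old)
{
  if ((cur.access_mask & old.access_mask) == 0)
    return false; /* the accessed bytes do not intersect */
  if (cur.sid == old.sid)
    return false; /* same thread slot */
  if (cur.is_read && old.is_read)
    return false; /* no write involved */
  if (cur.is_atomic && old.is_atomic)
    return false; /* both accesses are atomic */
  return true;
}
```

A non-atomic write to bytes 0..3 from slot 1 and a non-atomic read of bytes 0..1 from slot 2 satisfy all four conditions; changing any single condition makes the predicate false.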

pthread mutex functions such as pthread_mutex_{init,destroy,lock,trylock,timedlock,unlock} (as well as pthread_{rwlock,spin,cond,barrier}_* and pthread_once) are intercepted to record mutex lifetime and synchronization points.

Most libc functions do not have synchronization semantics.

A non-special function which is neither instrumented nor intercepted just leads to fewer detected errors.

MemorySanitizer

MemorySanitizer uses shadow memory to track whether a memory region has uninitialized values. One user byte is mapped to one shadow memory byte. A shadow memory bit is one if the associated user memory bit is uninitialized.

At the start of an interceptor, __msan_init is called if the runtime hasn't been initialized yet. Then __errno_location() is unpoisoned. To support -fsanitize=memory,fuzzer, interceptors introduce a small overhead by checking whether interceptors are disabled due to libFuzzer. This is so that the libFuzzer runtime does not need to be instrumented by MemorySanitizer.

For a memory read operation, if the shadow memory is poisoned, report a use-of-uninitialized-value error. For a memory write operation, unpoison the shadow memory, i.e. mark the memory region as initialized.
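
A toy model of these rules, assuming a 32-byte simulated memory with a one-to-one shadow array. All names are hypothetical. The memcpy-like helper copies the shadow along with the data, which is how poison can travel from one buffer to another.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { kMsanToySize = 32 };
static uint8_t msan_toy_mem[kMsanToySize];
static uint8_t msan_toy_shadow[kMsanToySize]; /* bit set = uninitialized */

/* Mark [off, off + n) as uninitialized, e.g. for fresh heap memory. */
static void msan_toy_poison(size_t off, size_t n)
{
  memset(&msan_toy_shadow[off], 0xff, n);
}

/* An instrumented or intercepted one-byte write: store and unpoison. */
static void msan_toy_store(size_t off, uint8_t value)
{
  msan_toy_mem[off] = value;
  msan_toy_shadow[off] = 0;
}

/* A read check: any set shadow bit in the range means an uninitialized read. */
static int msan_toy_is_uninit(size_t off, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (msan_toy_shadow[off + i] != 0)
      return 1;
  return 0;
}

/* A memcpy-like interceptor: copy the data and the shadow together.
   The source and destination ranges must not overlap. */
static void msan_toy_memcpy(size_t dst, size_t src, size_t n)
{
  memcpy(&msan_toy_mem[dst], &msan_toy_mem[src], n);
  memcpy(&msan_toy_shadow[dst], &msan_toy_shadow[src], n);
}
```

A write performed without msan_toy_store would leave the shadow untouched, so a later read check of that byte would still report it as uninitialized.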

If an uninstrumented function which performs memory writes does not have an interceptor, the lack of unpoisoning may lead to false positives when the memory is subsequently read. This property is different from AddressSanitizer/ThreadSanitizer where a missing interceptor usually just leads to fewer detected errors.

DataFlowSanitizer

DataFlowSanitizer is a dynamic data flow analysis (taint analysis) tool. It allows tagging a user byte with up to 8 labels. Compiler instrumentation propagates labels when a user byte affects the computation of another one. This process is similar to uninitialized value propagation in MemorySanitizer. A user byte is mapped to a shadow memory byte which supports 8 labels.

At the start of an interceptor, dfsan_init is called if the runtime hasn't been initialized yet. Then the label of __errno_location() is cleared.

For a malloc-family or free-family function, its interceptor clears labels (assuming the bytes are unaffected by other values) by default.

For a library function which performs memory writes, its interceptor propagates the label of the source value.
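
Label handling can be sketched as follows, assuming each shadow byte is a bitmask with one bit per label. The type and function names are hypothetical and do not correspond to the DataFlowSanitizer interface.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t toy_label; /* one bit per label, up to 8 labels */

enum { kDfsanToySize = 16 };
static toy_label dfsan_toy_shadow[kDfsanToySize]; /* one label byte per user byte */

/* A value computed from two inputs carries the labels of both. */
static toy_label toy_union(toy_label a, toy_label b)
{
  return (toy_label)(a | b);
}

/* Set the label of [off, off + n); a label of 0 clears it, which is
   what a malloc-family or free-family interceptor does by default. */
static void dfsan_toy_set_label(size_t off, size_t n, toy_label label)
{
  for (size_t i = 0; i < n; i++)
    dfsan_toy_shadow[off + i] = label;
}

/* A memcpy-like interceptor: destination bytes take the source labels. */
static void dfsan_toy_copy_labels(size_t dst, size_t src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dfsan_toy_shadow[dst + i] = dfsan_toy_shadow[src + i];
}

/* The label of a multi-byte read is the union of the byte labels. */
static toy_label dfsan_toy_read_label(size_t off, size_t n)
{
  toy_label label = 0;
  for (size_t i = 0; i < n; i++)
    label = toy_union(label, dfsan_toy_shadow[off + i]);
  return label;
}
```

If the copy step is skipped, as happens when a writing function is neither instrumented nor intercepted, the destination keeps whatever labels it had before.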

A non-special function which performs memory writes and is neither instrumented nor intercepted misses label propagation. This may cause false negatives.

Standalone LeakSanitizer

For most major 64-bit platforms (except Apple), AddressSanitizer integrates and enables LeakSanitizer by default. LeakSanitizer can be used standalone as well to just detect memory leak bugs.

Standalone LeakSanitizer intercepts very few functions: malloc-family and free-family functions (like a preloaded memory allocator), and a few functions like pthread_create.

At the start of an interceptor, __lsan_init is called if the runtime hasn't been initialized yet.

For a malloc-family function, its interceptor records the allocation information (requested size, stack trace).

For pthread_create, its interceptor ensures correct thread ID, ignores allocations from the real pthread_create, and registers the new thread.

At exit time, LeakSanitizer performs a GC style stop-the-world and scans all reachable memory chunks. For an unreachable chunk, report an error using the recorded information.
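
The reachability scan resembles the mark phase of a garbage collector. The sketch below is a toy model with hypothetical names: each chunk holds at most one reference to another chunk, and the roots (globals, stacks, registers in the real tool) are given as an array of chunk indexes.

```c
#include <stdbool.h>
#include <stddef.h>

enum { kMaxChunks = 8, kNoRef = -1 };

/* Toy heap chunk: may hold a single pointer to another chunk. */
struct toy_chunk {
  bool allocated;
  bool reachable;
  int ref; /* index of the referenced chunk, or kNoRef */
};
static struct toy_chunk lsan_toy_chunks[kMaxChunks];

/* Record an allocation; i must be in [0, kMaxChunks). */
static void lsan_toy_alloc(int i, int ref)
{
  lsan_toy_chunks[i] = (struct toy_chunk){true, false, ref};
}

/* Follow references from chunk i, marking every chunk that is visited.
   Stops at an already marked chunk, so cycles terminate. */
static void lsan_toy_mark(int i)
{
  while (i != kNoRef && lsan_toy_chunks[i].allocated &&
         !lsan_toy_chunks[i].reachable) {
    lsan_toy_chunks[i].reachable = true;
    i = lsan_toy_chunks[i].ref;
  }
}

/* Mark from the roots, then count allocated chunks that were not reached. */
static size_t lsan_toy_count_leaks(const int *roots, size_t n_roots)
{
  size_t leaks = 0;
  for (int i = 0; i < kMaxChunks; i++)
    lsan_toy_chunks[i].reachable = false;
  for (size_t r = 0; r < n_roots; r++)
    lsan_toy_mark(roots[r]);
  for (int i = 0; i < kMaxChunks; i++)
    if (lsan_toy_chunks[i].allocated && !lsan_toy_chunks[i].reachable)
      leaks++;
  return leaks;
}
```

With chunk 0 as the only root pointing to chunk 1, a third chunk that nothing points to is counted as a leak.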

MemProf

TODO MemProf isn't a sanitizer, but it uses sanitizer interceptors.

Portability and maintainability

Sanitizers support many operating systems and many architectures.

Implementing interceptors (though the code can be shared) may be the most challenging part of porting a sanitizer to a new platform.

A libc implementation more or less supports some extensions. These functions need to be intercepted.
Type definitions from a newer standard may not be supported by an implementation. A version dispatch is needed or a shim may be provided.

The sanitizer runtime is therefore littered with #if conditional inclusions. Runtime tests with combinatorial explosions sometimes make debugging tricky.

Therefore, support for a new operating system should be undertaken very carefully. I believe sanitizer maintainers tend to focus on the ability to instrument libc and provide sanitizer callbacks if a new OS is ever considered.

musl

I ported asan, cfi, lsan, msan, tsan, ubsan to Linux musl in 2021-01 and learned a lot in the process. It seemed that code readability improved in some places (where a glibc-ism was refined with a better condition, e.g. a change from "Linux but not Android" or "all OSes but not X/Y/Z" (where X/Y/Z basically enumerates all non-glibc platforms) to "Linux glibc"). Maintaining the port required little effort. I applied very few fixes in 2021 and 2022.

Non-intercepted library function calls an intercepted library function

MemorySanitizer/DataFlowSanitizer and ptsname

ptsname used not to be intercepted and this caused false positives with MemorySanitizer.

In glibc, ptsname calls __ptsname_r with a static variable (unknown to MemorySanitizer, having a zero value shadow). Since __ptsname_r is not instrumented, the shadow memory of numbuf is whatever the last shadow value for the memory region that numbuf happens to occupy. If memcpy is interposable, its interceptor copies the incorrect (poisoned) shadow to buf, which may cause a false positive if buf is accessed by the caller.

int
__ptsname_r (int fd, char *buf, size_t buflen)
{
  ...
      char numbuf[21];
  ...
      numbuf[sizeof (numbuf) - 1] = '\0';
      p = _itoa_word (ptyno, &numbuf[sizeof (numbuf) - 1], 10, 0);
  ...
      memcpy (__stpcpy (buf, devpts), p, &numbuf[sizeof (numbuf)] - p);
  ...
}

The fix is to intercept ptsname and ptsname_r.

In glibc before 2018 (BZ #18822), many functions had PLT calls. There were many lurking issues like the above.