UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime. Both components have multiple independent implementations.
Clang implemented the first few checks in 2009-12, initially named -fcatch-undefined-behavior. In 2012, -fsanitize=undefined was added and -fcatch-undefined-behavior was removed. GCC 4.9 implemented -fsanitize=undefined in 2013-08.
The runtime used by Clang lives in llvm-project/compiler-rt/lib/ubsan. GCC from time to time syncs its downstream fork of the sanitizers part of compiler-rt (libsanitizer). The end of the article lists some alternative runtime implementations.
Available checks
UBSan can detect many kinds of undefined behavior. One can specify -fsanitize=xxx,yyy, where xxx and yyy are the names of individual checks, or, more commonly, specify -fsanitize=undefined to enable the default group of checks.
Some checks flag behavior that is not undefined per the language standards but is often a code smell that leads to surprising results. Examples are implicit-unsigned-integer-truncation, implicit-integer-sign-change, and unsigned-integer-overflow. These are not part of -fsanitize=undefined and must be requested explicitly.
We can use -### to get the list of default UBSan checks.
% clang -fsanitize=undefined -xc /dev/null '-###'
GCC implements slightly fewer checks than Clang.
Modes
UBSan provides 3 modes to allow tradeoffs between code size and diagnostic verboseness.
- Default mode
- Minimal runtime (-fsanitize-minimal-runtime)
- Trap mode (-fsanitize-trap=undefined): no runtime
Let's use the following program to compare the instrumentation and runtime behaviors.
#include <stdio.h>
Default mode
With the umbrella option -fsanitize=undefined or the specific -fsanitize=signed-integer-overflow, Clang emits the following LLVM IR for foo:
define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
The add and icmp instructions check whether the argument is in the range [-0x40000000, 0x40000000) (no overflow). If so, the cont branch is taken and the non-overflow result is returned. Otherwise, the emitted code calls the callback __ubsan_handle_mul_overflow (implemented by the runtime) with arguments describing the source location.
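To make the range check concrete, here is a minimal C sketch of the same test. It assumes that foo doubles its argument: the listing does not show the body, but the range and the __ubsan_handle_mul_overflow callback are consistent with a multiplication by 2. The helper name doubling_fits_in_int is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a * 2 fits in a 32-bit int, i.e. when
 * a is in [-0x40000000, 0x40000000). Adding 0x40000000 with unsigned
 * wraparound maps that range onto [0, 0x80000000), so one unsigned comparison
 * is enough. This mirrors the add + icmp pair described above. */
bool doubling_fits_in_int(int32_t a) {
  uint32_t biased = (uint32_t)a + UINT32_C(0x40000000);
  return biased < UINT32_C(0x80000000);
}
```

For the input 1073741824 (0x40000000) the helper returns false, which corresponds to the branch that calls the handler.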
By default, all UBSan checks except return and unreachable are recoverable (i.e. non-fatal). After __ubsan_handle_mul_overflow (or another callback) prints an error, the program keeps executing. The source location is marked and further errors about this location are suppressed.
This deduplication feature makes logs less noisy. In one program invocation, we may observe errors from potentially multiple source locations.
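The recover-and-deduplicate behavior can be pictured with a small C sketch. This is only an illustration: struct check_site and report_once are hypothetical names, and the real runtime's data layout and message format differ.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-check-site record; not the real runtime's layout. */
struct check_site {
  const char *file;
  unsigned line;
  bool reported; /* set after the first diagnostic for this site */
};

/* Prints a diagnostic the first time a site fails and stays silent afterwards.
 * Returns true if a diagnostic was printed. The caller continues executing
 * either way, which is what "recoverable" means. */
bool report_once(struct check_site *site, const char *message) {
  if (site->reported)
    return false;
  site->reported = true;
  fprintf(stderr, "%s:%u: runtime error: %s\n", site->file, site->line, message);
  return true;
}
```

Because the flag lives in the per-site record, a second failing site still gets its own first report.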
We can make the program exit upon a UBSan error (with exit code 1, customizable with UBSAN_OPTIONS=exitcode=2). Just specify -fno-sanitize-recover (alias for -fno-sanitize-recover=all), -fno-sanitize-recover=undefined, or -fno-sanitize-recover=signed-integer-overflow. A non-returning callback (__ubsan_handle_mul_overflow_abort) will be emitted in place of __ubsan_handle_mul_overflow. Note that the emitted LLVM IR terminates the basic block with an unreachable instruction.
handler.mul_overflow:
Linking
The UBSan callbacks are provided by the runtime. For a link action, we need to inform the compiler driver that the runtime should be linked. This can be done by specifying -fsanitize=undefined. (Actually, any specific UBSan check that needs the runtime can be used, e.g. -fsanitize=signed-integer-overflow.)
clang -c -O2 -fsanitize=undefined a.c
Some checks (function, vptr) call C++ specific callbacks implemented in libclang_rt.ubsan_standalone_cxx.a. We need to use clang++ -fsanitize=undefined for the link action (or use clang -fsanitize=undefined -fsanitize-link-c++-runtime).
% clang -fsanitize=undefined a.o '-###' |& grep --color ubsan
When linking an executable, with the default -static-libsan mode on many targets, Clang Driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan_standalone.a --no-whole-archive to the linker. GCC and some platforms prefer a shared (dynamic) runtime. See "All about sanitizer interceptors".
Some sanitizers (address, memory, thread, etc.) ship a copy of the UBSan runtime files.
Minimal runtime
The default mode provides verbose diagnostics which help programmers identify the undefined behavior. On the other hand, the detailed log helps attackers and the code size may be a concern in some configurations.
UBSan provides a minimal runtime mode to log very little information. Specify -fsanitize-minimal-runtime for both compile actions and link actions to enable the mode.
% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
In the emitted LLVM IR, a different set of callbacks, __ubsan_handle_*_minimal, is used. They take no arguments and therefore make the instrumented code much smaller.
handler.mul_overflow:
The UBSan minimal runtime uses a separate set of runtime files (libclang_rt.ubsan_minimal.*) to decrease the runtime size.
% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c '-###' |& grep --color=auto ubsan_minimal
Trap mode
When -fsanitize=signed-integer-overflow is in effect, we can specify -fsanitize-trap=signed-integer-overflow so that Clang will emit a trap instruction instead of a callback. This greatly reduces the size bloat from instrumentation compared to the default mode.
Usually, we specify the umbrella options -fsanitize-trap=undefined or -fsanitize-trap (alias for -fsanitize-trap=all).
clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c
(-fsanitize-undefined-trap-on-error is a deprecated alias for -fsanitize-trap=undefined.)
Instead of calling a callback, the emitted LLVM IR will call the LLVM intrinsic llvm.ubsantrap, which lowers to a trap instruction aborting the program. As a side benefit of avoiding UBSan callbacks, we don't need a runtime. Therefore, -fsanitize=undefined can be omitted for link actions. Note: -fsanitize-trap=xxx overrides -fsanitize-recover=xxx.
define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
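As a rough C-level picture of what trap mode does, the sketch below checks for overflow and traps instead of calling a runtime callback. It assumes a function that doubles its argument; double_or_trap is a hypothetical name, and __builtin_mul_overflow and __builtin_trap are GCC/Clang builtins. Unlike llvm.ubsantrap, __builtin_trap carries no check-type argument.

```c
#include <stdint.h>

/* Hypothetical C-level equivalent of trap-mode instrumentation for a doubling
 * function: on overflow, execute a trap instruction. There is no diagnostic
 * and no runtime library involved. */
int32_t double_or_trap(int32_t a) {
  int32_t result;
  if (__builtin_mul_overflow(a, 2, &result))
    __builtin_trap(); /* does not return */
  return result;
}
```

Since nothing outside the function is referenced, such code links without any sanitizer runtime.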
llvm.ubsantrap has an integer argument that identifies the check type. On some architectures, this argument is encoded into the trap instruction, so different error types produce different encodings.
On x86-64, llvm.ubsantrap lowers to ud1l ubsan_type(%eax),%eax (UD1 with an address-size override prefix; ud1l is the AT&T syntax), which takes 4 bytes when ubsan_type is 0 and 5 bytes when an 8-bit displacement is needed. AArch64 provides BRK with a 16-bit immediate.
However, many other architectures do not provide sufficient encoding space for their trap instructions. For example, PowerPC provides trap (alias for tw 31,0,0). In tw TO,RA,RB, when we specify RA=RB=0, only 4 bits of the 5-bit TO can be used, which is insufficient.
For AArch64 and x86-64, we can register a signal handler to disassemble the instruction at si->si_addr. It can distinguish UBSan check errors from other faults. We can even decode the UBSan type numbers from #define LIST_SANITIZER_CHECKS and give a better diagnostic.
#include <signal.h>
% clang++ -fsanitize=undefined -fsanitize-trap=undefined a.cc -o a
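The decoding step of such a handler could look like the following C sketch for x86-64. The byte patterns are an assumption about how ud1l with an address-size prefix is encoded, not something shown in the listing above, and decode_ubsan_trap_x86_64 is a hypothetical helper. A SIGILL handler would pass si->si_addr as pc.

```c
#include <stddef.h>

/* Sketch: extract the UBSan check number from an x86-64 trap instruction.
 * Assumed encodings:
 *   67 0f b9 00         ud1l (%eax),%eax      -> check 0
 *   67 0f b9 40 <imm8>  ud1l imm8(%eax),%eax  -> check imm8
 * `len` is the number of readable bytes at `pc`.
 * Returns -1 if the bytes do not look like a UBSan trap. */
int decode_ubsan_trap_x86_64(const unsigned char *pc, size_t len) {
  if (len < 4 || pc[0] != 0x67 || pc[1] != 0x0f || pc[2] != 0xb9)
    return -1;
  if (pc[3] == 0x00)
    return 0;
  if (pc[3] == 0x40 && len >= 5)
    return pc[4];
  return -1;
}
```

A return value of -1 means the fault came from something else (for example a plain ud2), so the handler can fall back to its default reporting.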
TODO: arm64: Support Clang UBSAN trap codes for better reporting
Together with other sanitizers
UBSan can be used together with many other sanitizers.
clang -fsanitize=address,undefined a.c
Typically one may want to test multiple sanitizers for a project. The ability to use UBSan with another sanitizer decreases the number of configurations. In practice people prefer -fsanitize=address,undefined and -fsanitize=hwaddress,undefined over other combinations. Of course, adding UBSan on top of another instrumentation means more overhead and size bloat.
Using UBSan with libFuzzer is interesting. We can build with -fsanitize=fuzzer,address,undefined -fno-sanitize-recover to make libFuzzer abort when an AddressSanitizer or UndefinedBehaviorSanitizer error is detected. Note: the original libFuzzer developers have stopped active work on libFuzzer and switched to Centipede.
Runtime options
As mentioned, most UBSan checks are recoverable by default. Specify halt_on_error=1 to get a behavior similar to compile-time -fno-sanitize-recover=undefined.
% UBSAN_OPTIONS=halt_on_error=1 ./a <<< 1073741824
We can define __ubsan_default_options to set the default options.
extern "C" const char *__ubsan_default_options() {
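For a C program, a minimal sketch of such a definition might look as follows. The particular option string is only an example chosen from options mentioned in this article (halt_on_error, print_stacktrace); it is not the body of the truncated listing above.

```c
/* In C no extern "C" wrapper is needed. If the program defines this function,
 * the UBSan runtime calls it at startup and parses the returned string with
 * the same syntax as UBSAN_OPTIONS. The option values are an example only. */
const char *__ubsan_default_options(void) {
  return "halt_on_error=1:print_stacktrace=1";
}
```

Options given in the UBSAN_OPTIONS environment variable are still honored, so these act as defaults rather than hard settings.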
Issue suppression
The GNU function attribute __attribute__((no_sanitize("undefined"))) can disable instrumentation for a function. (GCC additionally supports a deprecated attribute __attribute__((no_sanitize_undefined)).)
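A small C sketch of the attribute in use, assuming a doubling function like the one discussed earlier; foo_unchecked and foo_checked are hypothetical names.

```c
/* The attribute removes UBSan instrumentation from this one function.
 * It does not make overflow defined: a * 2 is still undefined behavior when
 * it overflows; the check is simply not emitted here. */
__attribute__((no_sanitize("undefined")))
int foo_unchecked(int a) {
  return a * 2;
}

/* Instrumented as usual when compiled with -fsanitize=undefined. */
int foo_checked(int a) {
  return a * 2;
}
```

The attribute is best reserved for functions where the flagged behavior has been reviewed, since it silences every UBSan check in the function body.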
If $resource_dir/share/ubsan_ignorelist.txt is present, it will be used as the default system ignorelist (which can be overridden with -fsanitize-system-ignorelist=). See "Sanitizer special case list" for its format. This file instructs Clang CodeGen to disable instrumentation for specified functions or files. One can specify -fsanitize-ignorelist= multiple times to use more ignorelists.
In a large code base, enabling a check entails fixing all existing issues or working around them with the no_sanitize function attribute. An ignorelist offers a mechanism to incrementally enable a check. This is useful for toolchain maintainers who need to fight with new code.
For example, once all source files except [a-m]* are free of UBSan errors (or suppressed with function attributes), we can use the following patterns in ubsan_ignorelist.txt.
[alignment]
./ is for included files.
Suppression can be done at runtime as well. UBSan supports a runtime option suppressions: UBSAN_OPTIONS=suppressions=a.supp.
Stack traces
By default the diagnostic does not include a stack trace. Specify print_stacktrace=1 to get one. If the program is compiled with -g1 or -g, and llvm-symbolizer is in a PATH directory, we can get a symbolized stack trace.
% UBSAN_OPTIONS=print_stacktrace=1 ./a <<< 1073741824
Like other sanitizers, the UBSan runtime supports stack unwinding with DWARF Call Frame Information and frame pointers. Many targets enable .eh_frame (a variant of DWARF Call Frame Information) by default (-fasynchronous-unwind-tables). If it is disabled, ensure -fno-omit-frame-pointer is in effect (many targets default to -fomit-frame-pointer with -O1 and above).
Miscellaneous
In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer instrumentation is performed as part of Clang CodeGen instead of an LLVM pass in the optimization pipeline.
With the default mode and the minimal runtime, UBSan instrumentation inserts callbacks. This makes the instrumented function non-leaf.
Selected applications
compiler-rt/lib/ubsan is written in C++. It uses features that are unavailable in some environments (e.g. some operating system kernels). Adopting UBSan in kernels typically requires reimplementing the runtime (usually in C).
In the Linux kernel, the commit "UBSAN: run-time undefined behavior sanity checker" introduced CONFIG_UBSAN and a runtime implementation in 2016.
NetBSD implemented µUBSan in 2018.
Android's doc: https://source.android.com/docs/security/test/sanitizers#undefinedbehaviorsanitizer