The Linux kernel is written in C, but it also leverages extensions
provided by GCC. In 2022, it moved from GCC/Clang
-std=gnu89
to -std=gnu11
. This article
explores my notes on how these GNU extensions are utilized within the
kernel.
Statement expressions
Statement expressions are commonly used in macros.
Local labels
Some macros use this extension to restart a for loop in a macro's replacement list.
1 |
|
include/linux/instruction_pointer.h
utilizes the feature
to obtain the current instruction pointer:
1 | #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) |
Labels as values and computed goto statements
Bytecode interpreters often leverage this extension. BPF, for
instance, utilizes this extension. We can also see this extension in
drm_exec_retry_on_contention
for the Direct Rendering
Manager.
When it comes to Position-Dependent Code (PDC), a switch statement
with a default label marked as __builtin_unreachable
can be
just as efficient as using labels as values. However, for
Position-Independent Code (PIC), absolute addresses offer a slight
performance edge, although at the cost of requiring more dynamic
relocations.
1 | static void *tbl[] = {&&do_ret, &&do_inc, &&do_dec}; |
typeof
and
__auto_type
typeof
is used in numerous code.
__auto_type
has a few occurrences.
C23 standardizes typeof
and auto
.
Conditionals with omitted operands
x ?: y
is known as the "Elvis operator". This is used in
numerous code.
Empty structures
The C standard (at least C11 and C23) specifies that:
If the member declaration list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined.
The empty structure extension enables a dummy structure when a configuration option disables the functionality.
1 | #ifdef CONFIG_GENERIC_ENTRY |
Case ranges
This extension (case low ... high:
), frequently used in
the kernel, allows us to specify a range of consecutive values in a
single case label within a switch statement.
Object size checking
glibc 2.3.4 introduced _FORTIFY_SOURCE
in 2004 to catch
security errors due to misuse of some C library functions (primarily
buffer overflow). The implementation leverages inline functions and
__builtin_object_size
.
Linux kernel introduced CONFIG_FORTIFY_SOURCE
in 2017-07.
Like the userspace, the implementation relies on optimizations like
inlining and constant folding.
For example, include/linux/string.h
combines this
feature with BUILD_BUG_ON
(__attribute__((error(...)))
) to provide compile-time
errors:
1 | #define strtomem(dest, src) do { \ |
The Clang implementation utilizes
__attribute__((pass_object_size(type)))
and
__attribute__((overloadable(...)))
.
Clang introduced
__builtin_dynamic_object_size
in 2019 to possibly
evaluate the size dynamically. The feature was ported to GCC in 2021 and
is used by the kernel.
Pragmas
#pragma GCC diagnostic
is occasionally used to disable a local diagnostic.
include/linux/hidden.h
uses #pragma GCC visibility("hidden")
to force direct access (instead of GOT indirection) for external symbols
for -fpie/-fpic
.
bpf uses #pragma GCC poison
to forbid some undesired identifiers.
Structure-layout pragmas are commonly used.
Inline assembly
The Linux kernel uses inline assembly for a few reasons
- Hardware interaction: Certain operations require direct interaction with the underlying hardware capabilities, which might not be expressible in standard C.
- Performance optimization: In critical code paths, inline assembly can be used to squeeze out the last bit of performance by utilizing processor-specific instructions or bypassing certain C language constructs that might introduce overhead.
- Special facility: Some features, such as static keys, alternatives runtime patching, cannot be implemented with standard C.
Built-in functions
Special machine code instructions
GCC provides built-in functions to generate machine code instructions designed for certain tasks like popcount/clz/ctz. The compiler has more information about the function's purpose and leverages internal optimizations tailored for these tasks.
__builtin_choose_expr
This is analogous to ? :
but the condition is a constant
expression and the return type is not altered by promotion rules. In
GCC, __builtin_choose_expr
is not available in C++.
1 |
|
__builtin_constant_p
__builtin_constant_p
identifies whether an expression
can be evaluated to a constant. This capability unlocks several code
patterns.
Conditional static assertions:
1 | BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1); |
__builtin_constant_p
decides whether
MAYBE_BUILD_BUG_ON
expands to a compile-time static assert
mechanism (__attribute__((error(...)))
) or a runtime
mechanism.
1 | #define MAYBE_BUILD_BUG_ON(cond) \ |
Alternative code paths:
Sometimes, the constant expression case might enable a more efficient code path.
1 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond)) |
Constant Folding Optimizations:
By knowing expressions are constant, the compiler can perform
constant folding optimizations. This means pre-calculating the
expression's value during compilation and replacing it with the actual
constant in the code. This leads to potentially smaller and faster code
as the calculation doesn't need to be done at runtime. However, I notice
that __builtin_constant_p
is often abused. This is one of
the source why the kernel cannot be built with -O0
.
__builtin_expect
Used by likely
and unlikely
macros to give
the compiler hints for optimization.
1 |
|
__builtin_frame_address
Used by stack trace utilities.
__builtin_offsetof
Used to implement offsetof
.
__builtin_*_overflow
Used by include/linux/overflow.h
for overflow
checking.
__builtin_prefetch
Used as the default implementation of prefetch
and
prefetchw
.
__builtin_unreachable
Used by include/linux/compiler.h:unreachable
to inform
unreachability for optimization or diagnostics suppressing purposes.
__builtin_types_compatible_p
This is often used to emulate C11 static_assert
1 | #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) |
or implement C++ function overloading, e.g.
1 |
|
include/linux/jump_label.h
combines
__builtin_types_compatible_p
and dead code elimination to
emulate C++17 if constexpr
.
____wrong_branch_error
is not defined. If the unexpected
code path is taken, there will be a linker error.
1 | #define static_branch_likely(x) \ |
BTF built-in functions
Clang thread safety analysis
See perf mutex: Add thread safety annotations.
Function attributes
tools/include/linux/compiler.h
defines macros for many
commonly used attributes. Many of them are commonly used by other
projects and probably not worth describing.
tools/testing/selftests
has a few
__attribute__((constructor))
for initialization.
__attribute__((error(...)))
is used with inlining and
dead code elimination to provide static assertion.
1 |
|
__attribute__((naked))
(__naked
) is used by
arm and bpf code.
__attribute__((section(...)))
is used to place functions
in the specified section, e.g. .init.text
.
__attribute__((weak))
is used to allow a generic
function to be replaced by an arch-specific implementation. The weak
reference feature is used by some
__start_<sectionname>
/__stop_<sectionname>
encapsulation symbols.
#define __noendbr __attribute__((nocf_check))
is used
to disable Intel CET for some special functions.
Variable attributes
__attribute__((cleanup(...)))
runs a function when the
variable goes out of scope. include/linux/cleanup.h
defines
DEFINE_FREE
and __free
based on this
feature.
1 |
|
Type attributes
__attribute__((aligned(...)))
and
__attribute__((packed))
are often to control structure
layouts.
Language dialects
-fno-delete-null-pointer-checks
Assume that programs can safely dereference null pointers, and code or data element may reside at address zero.
-fno-strict-aliasing
The C11 standard (section 6.5) defines strict aliasing rules.
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object, — a qualified version of a type compatible with the effective type of the object, — a type that is the signed or unsigned type corresponding to the effective type of the object, — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or — a character type.
Compilers leverage these aliasing rules for optimizations, which can
be disabled with -fno-strict-aliasing
.
Linus Torvalds has expressed reservations about strict aliasing. Here is a discussion from 2018.
-fno-strict-overflow
In GCC, this option is identical to
-fwrapv -fwrapv-pointer
: make signed integer overflow
defined using wraparound semantics. While avoiding undefined behaviors,
unexpected arithmetic overflow bugs might be lurking.
[RFC] Mitigating unexpected arithmetic overflow is a 2024-05 thread about detecting and mitigating unsigned integer overflow.