GCC supports function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. Each attribute specifies the ISA extensions that an implementation requires. The generated program detects the CPU model and features at run time and picks the most restrictive implementation that the CPU satisfies, on the assumption that the most restrictive implementation performs best.
__attribute__((target(...)))
__attribute__((target(...))) has been available for a long time, even before attributes for function multi-versioning were introduced. Here are some links to relevant documentation.
- https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#:~:text=target%20(
- Attributes in Clang#target
- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html
- https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Usually we use different function names for different implementations and define a dispatch function. This approach is like a manual ifunc.
extern int flags;
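A minimal sketch of such a manual dispatch follows. It is an illustration rather than the original listing: the `FLAG_SSE4_2` bit and the `detect_flags` helper are hypothetical names, and the sketch assumes GCC or Clang. On non-x86 targets the SSE4.2 variant is compiled as a plain function and never selected.

```c
/* Hypothetical feature bit stored in `flags`; not part of any real API. */
enum { FLAG_SSE4_2 = 1 };
int flags;

#if defined(__x86_64__) || defined(__i386__)
#define TARGET_SSE4_2 __attribute__((target("sse4.2")))
#else
#define TARGET_SSE4_2 /* no-op on other architectures */
#endif

/* Two implementations under different names; the bodies are duplicated. */
static int foo_default(int a) { return a & (a - 1); }
TARGET_SSE4_2 static int foo_sse4_2(int a) { return a & (a - 1); }

/* Hypothetical helper: records which ISA extensions the CPU offers. */
void detect_flags(void) {
#if defined(__x86_64__) || defined(__i386__)
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse4.2"))
    flags |= FLAG_SSE4_2;
#endif
}

/* The dispatch function: a hand-written stand-in for an ifunc resolver. */
int foo(int a) {
  return (flags & FLAG_SSE4_2) ? foo_sse4_2(a) : foo_default(a);
}
```

A caller would run `detect_flags()` once at startup and then call `foo` as usual. Unlike a real ifunc, the check is repeated on every call.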
The function bodies are duplicated. To avoid this, we can define a [[gnu::always_inline]] function shared by the different implementations.
__attribute__((always_inline)) static inline int foo_impl(int a) { return a & (a - 1); }
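As a sketch of how the shared body might be wired up (the wrapper names `foo_default` and `foo_sse4_2` and the `TARGET_SSE4_2` macro are assumptions for illustration), each wrapper carries its own target attribute and forwards to the always-inline helper:

```c
#if defined(__x86_64__) || defined(__i386__)
#define TARGET_SSE4_2 __attribute__((target("sse4.2")))
#else
#define TARGET_SSE4_2 /* no-op on other architectures */
#endif

/* Shared body. Because it is forced inline, each wrapper gets its own copy,
   compiled with that wrapper's target options. */
__attribute__((always_inline)) static inline int foo_impl(int a) {
  return a & (a - 1);
}

/* Thin wrappers: only the target attribute differs. */
int foo_default(int a) { return foo_impl(a); }
TARGET_SSE4_2 int foo_sse4_2(int a) { return foo_impl(a); }
```

In a real program the wrappers would only be reached through a dispatch function that has checked the CPU features first.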
Let's check the behavior with external linkage. In C++ mode, GCC and Clang emit two symbols, _Z3foov and _Z3foov.sse4.2, for the following program:
__attribute__((target("default"))) int foo(void) { return 0; }
In C mode, GCC reports error: redefinition of ‘foo’. Clang emits two symbols, foo and foo.sse4.2.
With more than one declaration, the compiler merges the attributes.
int foo(void);
__attribute__((target_clones(...)))
This is the first attribute that GCC introduced to make function multi-versioning convenient. Since GCC 6, we can define just one function, with the attribute listing all supported targets.
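A minimal sketch of the attribute in use follows. The function name `sum_squares` and the `MULTIVERSIONED` macro are assumptions for illustration. The sketch lists feature names rather than `arch=` values so that it also builds with compilers that accept only the basic forms, and it enables the attribute only on x86-64 with glibc, since the attribute relies on ifunc support.

```c
#include <stdlib.h> /* any libc header; makes __GLIBC__ visible for the guard below */

/* Assumption: enable the attribute only where ifunc is known to be available. */
#if defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSIONED \
  __attribute__((target_clones("default", "sse4.2", "avx2")))
#else
#define MULTIVERSIONED /* plain function elsewhere */
#endif

/* One definition; the compiler emits one clone per listed target
   and a dispatcher that picks among them at load time. */
MULTIVERSIONED
int sum_squares(const int *p, int n) {
  int s = 0;
  for (int i = 0; i < n; i++)
    s += p[i] * p[i];
  return s;
}
```

Callers simply call `sum_squares`; no dispatch code is written by hand.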
See the GCC doc (Common Function Attributes) and Attributes in Clang#target_clones. Clang only supports some basic forms, not arch=.
For a function foo cloned for default, arch=x86-64-v2, and arch=x86-64-v3, GCC emits three implementations: foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. foo is a dispatch function that selects one of the implementations. It is implemented as a GNU indirect function (ifunc). The ifunc resolver is called once by rtld during relocation resolution. The resolver references a function and a variable defined in the runtime (libgcc).
.section .text.foo.resolver,"axG",@progbits,foo.resolver,comdat
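To make the mechanism concrete, here is a hand-written ifunc that does roughly what a compiler-generated resolver does. It is a sketch, not the compiler's output: the names `foo_default`, `foo_avx2`, and `foo_resolver` are assumptions, and the ifunc path is enabled only on x86-64 with glibc.

```c
#include <stdlib.h> /* any libc header; makes __GLIBC__ visible for the guard below */

static int foo_default(void) { return 0; }

#if defined(__x86_64__) && defined(__GLIBC__)
__attribute__((target("avx2"))) static int foo_avx2(void) { return 2; }

/* Resolver: run by the dynamic loader while resolving relocations;
   it returns the address of the chosen implementation. */
static int (*foo_resolver(void))(void) {
  __builtin_cpu_init(); /* must be called explicitly inside a resolver */
  return __builtin_cpu_supports("avx2") ? foo_avx2 : foo_default;
}

/* foo is an indirect function bound to whatever foo_resolver returns. */
int foo(void) __attribute__((ifunc("foo_resolver")));
#else
/* Fallback where ifunc is not assumed to exist: an ordinary function. */
int foo(void) { return foo_default(); }
#endif
```

The return values 0 and 2 only make the selected implementation observable; they have no other meaning.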
The attribute can apply to a non-definition declaration. In that case foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3 are undefined symbols, while the dispatch symbol (GCC: foo, Clang: foo.ifunc) and foo.resolver remain as definitions.
In llvm-project, compiler-rt provides an alternative implementation.
Drawbacks
Compilers largely don't know the semantics of ifunc and are very conservative. Ifunc defeats most interprocedural optimizations. For example, a target_clones function foo is not inlined into a caller foo_plus_1. Fortunately, functions called by a target_clones function are still inlinable.
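The situation can be sketched as follows. The helper `twice` and the `MULTIVERSIONED` macro are assumptions for illustration, and the attribute is enabled only on x86-64 with glibc. Comparing the generated assembly at -O2 should show `twice` folded into each clone of `foo`, while `foo_plus_1` still makes a call.

```c
#include <stdlib.h> /* any libc header; makes __GLIBC__ visible for the guard below */

#if defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSIONED __attribute__((target_clones("default", "avx2")))
#else
#define MULTIVERSIONED /* plain function elsewhere */
#endif

/* Ordinary helper: can be inlined into every clone of foo. */
static inline int twice(int a) { return 2 * a; }

MULTIVERSIONED
int foo(int a) { return twice(a) + 3; }

/* Reaches foo through the ifunc symbol, so the compiler
   does not inline the call here. */
int foo_plus_1(int a) { return foo(a) + 1; }
```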
An ifunc call needs a PLT entry, regardless of whether the symbol is preemptible. In contrast, a call to a non-preemptible ordinary function does not need a PLT entry.
x86
The runtime executes cpuid, extracts information about the x86 family model and available CPU features, and stores it in __cpu_model and __cpu_features2. The resolver decodes the information and selects the best implementation.
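The selection order can be sketched with the x86 builtins that GCC and Clang expose on top of this runtime data. The function `best_level` is a hypothetical name, and the two features checked are only examples; a generated resolver tests whatever targets the attribute lists.

```c
/* Returns the name of the most restrictive level this CPU satisfies,
   checking from most to least restrictive. */
const char *best_level(void) {
#if defined(__x86_64__) || defined(__i386__)
  __builtin_cpu_init(); /* runs the CPU detection code in the runtime */
  if (__builtin_cpu_supports("avx2"))
    return "avx2";
  if (__builtin_cpu_supports("sse4.2"))
    return "sse4.2";
#endif
  return "default"; /* always satisfiable */
}
```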
AArch64
The support is missing or incomplete as of GCC 12 and Clang 16.0. When implemented, `+`-separated features can be specified.
__attribute__((target_clones("sha2+memtag2", "fcma+sve2-pmull128")))
(compiler-rt/lib/builtins/cpu_model.c defines some symbols like __aarch64_have_lse_atomics. GCC commit)
__attribute__((cpu_dispatch(...))) and __attribute__((cpu_specific(...)))
These attributes are supported by the Intel C++ Compiler and were later ported to Clang. GCC doesn't support them. They feel like legacy features and offer a subset of what target_clones provides.
The declaration and definition can be in different translation units, as with target_clones, but different attributes are used.
echo '__attribute__((cpu_dispatch(ivybridge, atom, sandybridge))) void foo(void); int main(void) { foo(); }' > a.c
__attribute__((target_version(...)))
Arm C Language Extensions introduced a new GNU attribute, target_version.
- https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
- Attributes in Clang#target_version
int __attribute__((target_version("default"))) tv(void) { return 0; }
The semantics are not very clear in the latest Clang. GCC does not support the attribute as of 2023-02.