UNDER CONSTRUCTION
GCC supports function attributes for function multi-versioning: a way for a function to have multiple implementations, each using a different set of ISA extensions. Each attribute argument names the ISA extensions that one implementation requires. At run time, the generated program queries the CPU model and features and picks the most restrictive implementation that the CPU satisfies, on the assumption that the most restrictive implementation performs best.
__attribute__((target(...)))
__attribute__((target(...)))
has been available for a
long time, even before attributes for function multi-versioning were
introduced. Here are some links to relevant documentation.
- https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#:~:text=target%20(
- Attributes in Clang#target
- https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html
- https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Usually we use different function names for different implementations and define a dispatch function. This approach is like a manual ifunc.
extern int flags;
The function bodies are duplicated. We can define a
[[gnu::always_inline]]
function shared by the different
implementations.
__attribute__((always_inline)) static inline int foo_impl(int a) { return a & (a - 1); }
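Putting the two ideas together, here is a minimal sketch of the manual-dispatch pattern with a shared body. The names foo_default and foo_bmi and the choice of the bmi feature are assumptions for illustration, not taken from the original listing. The x86 guard only keeps the sketch buildable on other targets.

```c
/* Shared body. Because it is always_inline, each caller compiles it
   with the caller's own target options. */
__attribute__((always_inline)) static inline int foo_impl(int a) {
  return a & (a - 1); /* clear the lowest set bit */
}

/* Baseline version: built with the default ISA level of the file. */
static int foo_default(int a) { return foo_impl(a); }

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical BMI version: the compiler is allowed to use BMI1 here. */
__attribute__((target("bmi"))) static int foo_bmi(int a) { return foo_impl(a); }
#endif

/* Hand-written dispatcher: checks the CPU on every call. */
int foo(int a) {
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("bmi"))
    return foo_bmi(a);
#endif
  return foo_default(a);
}
```

Unlike an ifunc, this dispatcher re-checks the feature on each call, but it needs no loader support and works the same way in C and C++.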
Let's check the behavior with external linkage. In C++ mode, GCC and Clang emit two symbols, _Z3foov and _Z3foov.sse4.2, for the following program:
__attribute__((target("default"))) int foo(void) { return 0; }
In C mode, GCC reports error: redefinition of ‘foo’. Clang emits two symbols, foo and foo.sse4.2.
TODO forward declaring
__attribute__((target_clones(...)))
GCC 6 introduced __attribute__((target_clones(...))). We can just define one function with the attribute specifying all supported
targets.
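A minimal sketch of such a definition, using the clone list that corresponds to the symbol names foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. The function body, the FOO_CLONES macro, and the compiler guard (assumed here to be GCC 13 or later on x86-64 with glibc) are assumptions for illustration. Elsewhere the guard falls back to an ordinary single definition.

```c
#include <stdlib.h> /* any libc header; glibc headers define __GLIBC__ */

/* FOO_CLONES is a hypothetical helper macro: it expands to the attribute
   only where arch= clones and ifunc are assumed to be available. */
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__GNUC__) && \
    !defined(__clang__) && __GNUC__ >= 13
#define FOO_CLONES \
  __attribute__((target_clones("default", "arch=x86-64-v2", "arch=x86-64-v3")))
#else
#define FOO_CLONES /* single ordinary definition */
#endif

/* One source-level definition; the compiler emits one clone per target. */
FOO_CLONES int foo(void) { return 0; }

/* A caller in the same translation unit. */
int foo_plus_1(void) { return foo() + 1; }
```

Running nm or objdump on the object file should show whether the clones were emitted by the toolchain at hand.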
See the GCC doc (Common Function Attributes) and Attributes in Clang#target_clones. Clang only supports some basic forms, not arch=.
For the above function, GCC emits three implementations: foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3. foo itself is a dispatch function that selects one of the implementations. It is implemented as a GNU indirect function (ifunc). The ifunc resolver is called once by rtld while it resolves relocations. The resolver references a function and a variable defined in the runtime (libgcc).
.section .text.foo.resolver,"axG",@progbits,foo.resolver,comdat
As an ifunc, foo defeats interprocedural optimizations. We can see that foo_plus_1 does not inline foo.
The attribute can also be applied to a non-definition declaration. In that case foo.default, foo.arch_x86_64_v2, and foo.arch_x86_64_v3 are undefined symbols, while the dispatcher (GCC: foo, Clang: foo.ifunc) and foo.resolver are still defined in the translation unit.
In llvm-project, compiler-rt provides an alternative implementation.
x86
The runtime executes cpuid, extracts information about the x86 family model and available CPU features, and stores them into __cpu_model and __cpu_features2. The resolver
decodes the information and selects the best implementation.
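The same mechanism can be approximated by hand. The following is a sketch, not the resolver GCC actually generates: it combines the ifunc attribute with the documented __builtin_cpu_init and __builtin_cpu_supports builtins, which consult the runtime's CPU data. The names foo_default, foo_avx2, and resolve_foo, the avx2 choice, and the return values are hypothetical. The guard falls back to a plain function where ELF ifunc is assumed to be unavailable.

```c
#include <stdlib.h> /* any libc header; glibc headers define __GLIBC__ */

static int foo_default(void) { return 1; }

#if defined(__x86_64__) && defined(__ELF__) && defined(__GLIBC__)
/* Hypothetical AVX2 version; the return value only makes the choice visible. */
__attribute__((target("avx2"))) static int foo_avx2(void) { return 2; }

/* The resolver runs once, when the loader processes the relocation for foo.
   It returns the address of the implementation to bind. */
static int (*resolve_foo(void))(void) {
  __builtin_cpu_init(); /* fill in the CPU model/feature data first */
  return __builtin_cpu_supports("avx2") ? foo_avx2 : foo_default;
}

/* foo is an indirect function whose target is chosen by resolve_foo. */
int foo(void) __attribute__((ifunc("resolve_foo")));
#else
/* Fallback for targets without ifunc: an ordinary function. */
int foo(void) { return foo_default(); }
#endif
```

Calling foo() returns 2 on a CPU with AVX2 under the ifunc path and 1 otherwise.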
AArch64
The support is missing/incomplete as of GCC 12 and Clang 16.0.
__attribute__((target_clones("sha2+memtag2", "fcma+sve2-pmull128")))
(compiler-rt/lib/builtins/cpu_model.c defines some symbols like __aarch64_have_lse_atomics. GCC commit)
__attribute__((cpu_dispatch(...))) and __attribute__((cpu_specific(...)))
These attributes are supported by the Intel C++ Compiler and were later ported to Clang. GCC doesn't support them.
The declaration and definition can be in different translation units.
echo '__attribute__((cpu_dispatch(ivybridge, atom, sandybridge))) void foo(void); int main(void) { foo(); }' > a.c
__attribute__((target_version(...)))
Arm C Language Extensions introduced a new GNU attribute, target_version. Clang 17
- https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
- Attributes in Clang#target_version
int __attribute__((target_version("default"))) tv(void) { return 0; }