This article discusses ELF interposition and the linker option -Bsymbolic and its friends. (I wrote -fno-semantic-interposition first but realized a reorganization would improve readability, so I moved some parts into this new article and added more material.)
I added -fno-semantic-interposition and contributed some optimizations in Clang 11. I felt motivated enough to write a post after seeing a great post by Daniel Colascione ("Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s.") and a recent rant from Linus Torvalds on shared objects' performance issues.
Say, we have two default visibility functions f and g. g calls f. We compile g into a shared object. There are 3 cases for f.
First case: f is defined in the same translation unit as g. I will discuss this in depth in my next article -fno-semantic-interposition. One notable point: GCC -fpic suppresses interprocedural optimizations including inlining for such non-inline external linkage functions.
Second case: f is defined in a different object file which will be linked into the same shared object.
Third case: f is defined in a different shared object or the executable. The symbol search on f cannot be prevented.
You can see that in all three cases no annotation is required on f and g. This reflects an important design philosophy of ELF: dynamic linking should be similar to static linking.
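As a rough illustration of the three cases, here is a minimal C sketch. The file names a.c and b.c, the int return types and the build commands in the comments are assumptions made for illustration, not the original listings. Only the first case is compiled as written; the other two differ only in where the definition of f lives.

```c
/* a.c, case 1: f is defined in the same translation unit as g. */
int f(void) { return 1; }

/* g calls f; neither function carries any annotation. */
int g(void) { return f() + 1; }

/*
 * Case 2 (assumed layout): move the definition of f into b.c, keep only
 * the declaration "int f(void);" here, and link both objects together:
 *     cc -fpic -shared a.c b.c -o libab.so
 *
 * Case 3 (assumed layout): keep only the declaration here and let another
 * shared object or the executable provide f:
 *     cc -fpic -shared a.c -o liba.so
 *
 * The source of g is identical in all three cases.
 */
```

The point of the sketch is that the source of g does not change; only the location of the definition of f does.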
When linked as a shared object (-shared), the linker notices that f is preemptible and will resolve the branch target to a PLT entry with a dynamic relocation R_*_JUMP_SLOT. The cost comes from two places:
- The dynamic relocation requires a symbol search by the dynamic loader.
- Every call site goes through a PLT indirection.
The third case is about an external call, where a symbol search cannot be prevented. But why do the first two cases (where f is defined in the same shared object) need a PLT? You may have read somewhere that any of the following linker options can avoid the PLT: -Bsymbolic, -Bsymbolic-functions, --dynamic-list. Read on.
Dynamic linking model in ELF
Since 2000-07-17, the ELF specification has said the following about the STV_DEFAULT visibility (this is the default visibility; you get it unless you use something like -fvisibility= or __attribute__((visibility(...)))):
Global and weak symbols are also preemptable, that is, they may by (typo: be) preempted by definitions of the same name in another component.
In Chapter 5 Dynamic Linking, the specification says:
When the dynamic linker creates the memory segments for an object file, the dependencies (recorded in DT_NEEDED entries of the dynamic structure) tell what shared objects are needed to supply the program's services. By repeatedly connecting referenced shared objects and their dependencies, the dynamic linker builds a complete process image. When resolving symbolic references, the dynamic linker examines the symbol tables with a breadth-first search. That is, it first looks at the symbol table of the executable program itself, then at the symbol tables of the DT_NEEDED entries (in order), and then at the second level DT_NEEDED entries, and so on. Shared object files must be readable by the process; other permissions are not required.
The wording has remained unchanged since then, i.e. the evolution of dynamic linking has not contributed back to the specification.
This paragraph is probably difficult to follow. Let me rephrase it and add some details of dynamic loader behavior. The dynamic loader does one critical job: resolving dynamic relocations and binding symbol references from one component to another. (A component is an executable or shared object, sometimes called a module.) There is a flat namespace for symbol search. The dynamic loader computes a breadth-first search list (executable, needed0, needed1, needed2, needed0_of_needed0, needed1_of_needed0, ...). For each symbol reference, the dynamic loader iterates over the list and finds the first component which provides a definition. (For dlsym with an explicit handle, the symbol search uses the dependency order, a breadth-first search rooted at the handle.)
The implication is that STB_GLOBAL and STB_WEAK definitions are equivalent in terms of symbol search. A STB_WEAK definition can preempt a STB_GLOBAL definition.
While not mentioned in the ELF specification, many dynamic loader implementations allow the environment variable LD_PRELOAD to inject shared objects. The effect is as if the LD_PRELOAD list were inserted at the beginning of the executable's DT_NEEDED list. The search list may look like executable, preload0, preload1, needed0, needed1, needed2, needed0_of_preload0, ..., needed0_of_needed0, needed1_of_needed0, ... (If the program calls dlopen with RTLD_GLOBAL, the newly loaded component and its dependencies (if not loaded) will be appended to the list.) Here is the algorithm:
fn load(c) { ... }
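The search order described above can be modeled with a small C sketch. This is a toy model under stated assumptions, not how any real dynamic loader is implemented: struct component, build_search_list and lookup are hypothetical names, symbols are plain strings, and dlopen is not modeled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4
#define MAX_SYMS 4
#define MAX_LIST 16

/* Toy model of a component (executable or shared object). */
struct component {
    const char *name;
    const char *defs[MAX_SYMS];               /* defined default visibility symbols, NULL-terminated */
    const struct component *needed[MAX_DEPS]; /* DT_NEEDED entries in order, NULL-terminated */
};

/* Build the breadth-first search list rooted at the executable.
 * Returns the number of entries written to list. */
size_t build_search_list(const struct component *exe,
                         const struct component **list, size_t cap) {
    size_t n = 0;
    if (cap == 0)
        return 0;
    list[n++] = exe;
    /* list doubles as the BFS queue: i is the next component to expand. */
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < MAX_DEPS && list[i]->needed[d]; d++) {
            const struct component *dep = list[i]->needed[d];
            int seen = 0;
            for (size_t k = 0; k < n; k++)
                if (list[k] == dep) { seen = 1; break; }
            /* A component is loaded once, at its first position in the list. */
            if (!seen && n < cap)
                list[n++] = dep;
        }
    }
    return n;
}

/* Return the first component in the list that defines sym, or NULL. */
const struct component *lookup(const struct component **list, size_t n,
                               const char *sym) {
    for (size_t i = 0; i < n; i++)
        for (size_t s = 0; s < MAX_SYMS && list[i]->defs[s]; s++)
            if (strcmp(list[i]->defs[s], sym) == 0)
                return list[i];
    return NULL;
}
```

In this model, LD_PRELOAD can be approximated by placing the preloaded components at the front of the executable's needed array before building the list.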
Note that the executable is always the first element of the search list, so a defined symbol of any binding in the executable cannot be preempted (interposed). In a shared object, a default visibility STB_GLOBAL or STB_WEAK symbol can be preempted (interposed) because an earlier component may define a symbol of the same name. It may be a bit surprising that a defined default visibility STB_GLOBAL symbol can be interposed.
Alternative symbol search models
Solaris names the above the default search model and introduced an alternative model: direct bindings. With -z defs, one can ensure the dependencies are provided as part of the link and all symbol references are satisfied. The linker can record the bound component for each symbol reference.
Here is an example from Solaris's Linkers and Libraries Guide:
$ elfdump -y W.so.2
With the information about the component name, the dynamic loader can speed up its symbol search by just looking at one component. In particular, frequently the bound component is the component itself.
In Mac OS X, the two-level namespace introduced in 10.1 (default unless you use ld -flat_namespace) is a similar model.
Prelink can be conceived as a direct binding model without great ergonomics.
The standard ELF specification defines DF_SYMBOLIC which can be conceived as a special case of direct bindings. When a shared object is marked as DF_SYMBOLIC (set by ld -Bsymbolic), the symbol search checks the shared object itself before starting the linear search from the executable. It is quite common for a shared object to call STV_DEFAULT definitions in itself. DF_SYMBOLIC can improve the performance greatly.
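The effect of DF_SYMBOLIC on the lookup order can be sketched with a toy C model. The names struct component, defines and resolve are hypothetical, and the df_symbolic field merely stands in for the real flag; actual dynamic loaders are considerably more involved.

```c
#include <stddef.h>
#include <string.h>

/* Toy component: a name, a NULL-terminated list of defined symbols,
 * and a field modeling the DF_SYMBOLIC flag. */
struct component {
    const char *name;
    const char *defs[4];
    int df_symbolic;
};

/* Does component c define sym? */
static int defines(const struct component *c, const char *sym) {
    for (size_t i = 0; i < 4 && c->defs[i]; i++)
        if (strcmp(c->defs[i], sym) == 0)
            return 1;
    return 0;
}

/* Resolve a reference to sym made from component ref.
 * list is the global search list with the executable first. */
const struct component *resolve(const struct component *ref,
                                const struct component **list, size_t n,
                                const char *sym) {
    /* DF_SYMBOLIC: the referencing object is checked before the list. */
    if (ref->df_symbolic && defines(ref, sym))
        return ref;
    /* Default model: the first definition in the search list wins. */
    for (size_t i = 0; i < n; i++)
        if (defines(list[i], sym))
            return list[i];
    return NULL;
}
```

With the flag clear, a reference from a shared object to its own f binds to an earlier definition in the executable; with the flag set, it binds to the shared object's own definition.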
-Bsymbolic
The linker option -Bsymbolic can be used together with -shared. ld -shared -Bsymbolic is very similar to -pie.
-Bsymbolic follows ELF DF_SYMBOLIC semantics: all defined symbols are non-preemptible. This can optimize relocation processing:
- function calls: a branch instruction (e.g. call foo@PLT) will not create a PLT entry. The associated R_*_JUMP_SLOT dynamic relocation will be suppressed.
- variable access and function addresses: the GOT entry will not cause a R_*_GLOB_DAT dynamic relocation. On x86-64, with R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX, the GOT indirection code sequence can be rewritten. However, the code sequence is still longer than that without GOT. On PowerPC64, there is a similar TOC optimization. On other architectures, there is no difference in code sequences.
-fno-semantic-interposition can address pessimization when the definition is in the same translation unit as the use site. Working at the shared object level, -Bsymbolic can address cross-translation-unit pessimization which cannot be optimized with -fno-semantic-interposition.
However, in practice, deployment of -Bsymbolic may run into pointer equality problems. Many objects in C++ are not clearly part of a single object file, but are required by the ODR to have a single definition. For example, C++ [dcl.inline]: "An inline function or variable with external or module linkage can be defined in multiple translation units ([basic.def.odr]), but is one entity with one address. A type or static variable defined in the body of such a function is therefore a single entity."
We will discuss variables and functions separately.
Pointer equality for variables
An inline variable with external linkage and a local static variable defined in an inline function with external linkage are required to be unique. The address of such a variable seen by a -Bsymbolic linked shared object may be different from the address seen from outside the shared object. Fortunately it is uncommon to export such a vague linkage variable to both the executable and a shared object.
(ELF specific) In addition, a regular non-inline variable with external linkage can cause incompatibility problems due to copy relocations. With -fno-pic, GCC and Clang emit direct access relocations referencing a global variable. If the global variable turns out to be defined in a shared object, there will be a copy relocation in the executable. The object the shared object sees and the object the executable sees will be different.
In Clang, the direct access relocation can be avoided with -fno-pic -fno-direct-access-external-data. GCC feature request: PR98112.
See Copy relocations, canonical PLT entries and protected visibility for details.
(In C++, typeid() on an incomplete class can define a typeinfo name object. A -Bsymbolic linked shared object may see a different copy, but the address can hardly cause a problem.)
Pointer equality for functions
The address of an inline function seen by a -Bsymbolic linked shared object may be different from the address seen from outside the shared object. Fortunately such cases are rare. Windows link.exe has Identical COMDAT Folding. ELF/Mach-O programs may use -fvisibility-inlines-hidden. Code that assumes pointer equality already breaks under Identical COMDAT Folding and -fvisibility-inlines-hidden anyway.
In Mach-O, such symbols are placed into __LINKEDIT,__weak_binding so that dyld can coalesce the definitions across dylibs.
(ELF specific) In addition, a regular non-inline function with external linkage can cause incompatibility problems due to canonical PLT entries. With -fno-pic, GCC and Clang emit direct access relocations when taking the address of an external function. If the function turns out to be defined in a shared object, there will be a canonical PLT entry in the executable. The function address the shared object sees and the one the executable sees will be different.
I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 in the hope that the -fno-pic behavior of direct access can be dropped.
-Bsymbolic-functions
The function incompatibility problems are uncommon. It is often benign when the function address seen by a shared object is different from outside the shared object. However, the variable case is usually severe: the executable and a shared object may act on different copies of a variable supposed to be the same entity.
In practice, we can usually use the linker option -Bsymbolic-functions. The option applies to STT_FUNC symbols in ld.lld and non-STT_OBJECT symbols in GNU ld and gold, avoiding variable incompatibility problems. Though rarely needed, it may make sense to add a linker option (say, -Bsymbolic-global-functions) which applies to STT_FUNC STB_GLOBAL symbols, to bypass vague linkage STB_WEAK symbols.
Relation with -fvisibility=protected
A non-default visibility symbol cannot be preempted, even if the binding is STB_WEAK. -fvisibility=protected can make all definitions protected and thus non-preemptible, leaving -fno-semantic-interposition and -Bsymbolic with no further performance benefit to add. Note: if you want a definition to be preemptible, you will need a default visibility attribute, even if it is weak (e.g. __attribute__((weak,visibility("default")))).
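As a minimal sketch of that note, the following C fragment keeps one definition preemptible even when the rest of the library is built with a non-default -fvisibility= setting. The names mylib_hook and mylib_run are hypothetical, and the attribute syntax assumes GCC or Clang.

```c
/* Hypothetical hook, for illustration only. The explicit default
 * visibility overrides a command-line -fvisibility=protected or
 * -fvisibility=hidden for this one definition, so another component
 * may still preempt it. */
__attribute__((weak, visibility("default")))
int mylib_hook(void) { return 0; }

/* Hypothetical caller inside the same library. Because mylib_hook is
 * weak, the compiler does not assume this definition is the one used
 * at run time. */
int mylib_run(void) { return mylib_hook() + 1; }
```

Without a preempting definition elsewhere, the call simply reaches the weak definition above.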
However, -fvisibility=protected shares a problem with -Bsymbolic: it is too coarse-grained. It can cause the same set of problems discussed above in Pointer equality for variables.
In GCC/binutils's x86 port, there is another STT_OBJECT issue resulting in poor Clang interoperability.
% cat a.s
See Copy relocations, canonical PLT entries and protected visibility for details. There is no problem when you only use Clang and LLD.
-fvisibility=hidden can make all definitions hidden and thus non-preemptible, leaving -fno-semantic-interposition with no further performance benefit to add.
-fvisibility=hidden requires annotation of exported symbols (__attribute__((visibility("default")))). The explicit annotation sometimes makes it inconvenient to split and join libraries.
However, projects with Windows portability in mind will define macros to dispatch to either the visibility attribute or __declspec(dllexport).
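A dispatch macro of that kind might look like the following sketch. MYLIB_EXPORT and mylib_add are hypothetical names; real projects usually also distinguish building the library from using it (dllexport versus dllimport), which is omitted here.

```c
/* Hypothetical export macro: pick the platform's export annotation. */
#if defined(_WIN32)
#  define MYLIB_EXPORT __declspec(dllexport)
#else
#  define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

/* Exported even when the library is built with -fvisibility=hidden. */
MYLIB_EXPORT int mylib_add(int a, int b) { return a + b; }

/* Not annotated: hidden under -fvisibility=hidden, so calls to it from
 * inside the library cannot be preempted. */
int mylib_internal_double(int x) { return mylib_add(x, x); }
```

Only the annotated function ends up in the dynamic symbol table when the library is compiled with -fvisibility=hidden.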
Interaction with LD_PRELOAD
There are several types of LD_PRELOAD usage.
First, use LD_PRELOAD=same_soname.so to replace a DT_NEEDED entry with the same SONAME. Both -fno-semantic-interposition and -Bsymbolic are compatible with such usage.
Second, use LD_PRELOAD=malloc.so to intercept some functions not defined in the application or any of its shared object dependencies. Both -fno-semantic-interposition and -Bsymbolic are compatible.
void *f() { return malloc(0xb612); }
Third, use LD_PRELOAD=different_soname.so to replace a function defined in a shared object dependency whose SONAME is different. (This usage is unlikely to be compatible with C++'s one definition rule.) Such usage is incompatible with -Bsymbolic and -fno-semantic-interposition.
The Last Alliance of ELF and Men
I wish that distributions would default to -fno-semantic-interposition and (in the long term) a variant of -Wl,-Bsymbolic-functions, bringing back performance that has been lost for decades. We can start with a configure-time option, like GCC's --enable-default-pie.
Such interposition doesn't work on macOS (by default) and Windows, so there is a good chance that most pieces of portable software are already in a good state. However, I can imagine that there is still a decent amount of work in annotating software which cannot be built with -fno-semantic-interposition or -Wl,-Bsymbolic-functions. Distributions need to put in resources (likely fewer than for the -fno-pic to -fPIE transition (GCC's --enable-default-pie)).
There is a trade-off: the downside is that using LD_PRELOAD to replace a fragment of a shared object will be more difficult. In some rare cases the user may need LD_PRELOAD, sometimes as a workaround for broken software. I feel that distributions should not provide such flexibility by default at such a great cost. Users can build the software themselves.
We need a linker option to cancel a default -Bsymbolic-functions. I have added -Bno-symbolic to GNU ld and gold (binutils 2.37; PR27834) and ld.lld 13.
We need a -Bsymbolic-functions variant which only applies to STB_GLOBAL symbols (i.e. STB_WEAK symbols are excluded). The address of an inline function is required to be unique in C++.
(From Peter Smith) The linker can introduce a debugging option for executables to catch accidental interposition, say, --warn-interposition: "Warning symbol S of type STT_FUNC is defined in executable A and shared objects B and C, using definition in A."
We need an option to disable interposition for functions but enable interposition for variables, because we want to be compatible with copy relocations, which will require years to fix. GCC feature request.
GCC -fno-pic should be fixed to use the GOT to take the address of an external default visibility function. PR100593.