UNDER CONSTRUCTION (COFF, Mach-O)
After the linker reads an input file (object file, shared object, archive, LLVM bitcode file), the most critical task is to process its symbol table.
There is a global symbol table. Every input symbol table may interact with the global one, and affect archive processing and future steps (LTO, relocation processing, as-needed shared objects, etc).
ELF
Symbol tables
An object file can optionally have symbol tables.
A relocatable object file almost always has a symbol table, which is represented by a section .symtab of type SHT_SYMTAB. The symbol table is sometimes called a "static symbol table".
An executable or shared object almost always has a dynamic symbol table, which is represented by a section .dynsym of type SHT_DYNSYM. The dynamic symbol table specifies defined and undefined symbols, which can be seen as its export and import lists. They are needed by runtime relocation processing and symbol binding. A position-dependent statically linked executable usually has no dynamic symbol table, because (1) it usually does not need dynamic relocations and (2) there is only one component and every needed symbol is defined internally, so there is no need for symbol binding.
An executable or shared object may optionally have a symbol table of type SHT_SYMTAB. ld produces the symbol table (.symtab) by default. strip can remove it along with .strtab. The static symbol table is a superset of the dynamic symbol table and has many entries (local symbols and other non-exported symbols) not needed at runtime. It has value for symbolization without debug information but otherwise is not useful. Therefore an executable or shared object is usually post-processed by strip --strip-all, which can remove .symtab along with .strtab and debug sections.
An archive is like a tarball. It almost always contains multiple relocatable object files. Almost all archives have a symbol index, which is a collection of (defined_symbol, member_name) pairs. An archive requires special processing. See Dependency related linker options#Archive processing for details.
Symbols
A symbol table holds an array of entries. Each symbol table entry indicates a symbol. Let's look at the representation of a 64-bit ELF object file:
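```c
typedef struct {
  Elf64_Word st_name;     /* index into the symbol string table */
  unsigned char st_info;  /* type and binding */
  unsigned char st_other; /* visibility */
  Elf64_Half st_shndx;    /* section header table index */
  Elf64_Addr st_value;    /* value (address, offset, ...) */
  Elf64_Xword st_size;    /* size, or 0 if none/unknown */
} Elf64_Sym;
```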
Here is the description from the ELF specification:
- st_name: This member holds an index into the object file's symbol string table, which holds the character representations of the symbol names. If the value is non-zero, it represents a string table index that gives the symbol name. Otherwise, the symbol table entry has no name.
- st_value: This member gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address, and so on; details appear below.
- st_size: Many symbols have associated sizes. For example, a data object's size is the number of bytes contained in the object. This member holds 0 if the symbol has no size or an unknown size.
- st_info: This member specifies the symbol's type and binding attributes. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects.
- st_other: This member currently specifies a symbol's visibility. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects. Other bits contain 0 and have no defined meaning.
- st_shndx: Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index. As the sh_link and sh_info interpretation table and the related text describe, some section indexes indicate special meanings. If this member contains SHN_XINDEX, then the actual section header index is too large to fit in this field. The actual value is contained in the associated section of type SHT_SYMTAB_SHNDX.
Explanation:
st_name indicates the name.
st_shndx and st_value indicate whether the symbol is defined or undefined, and the associated section and the offset if defined. If st_shndx==SHN_UNDEF, we say the symbol is undefined. For an undefined symbol foo, we often say the object file references foo. If st_shndx!=SHN_UNDEF, we say the symbol is defined.
Some st_shndx values are special. If st_shndx==SHN_ABS, this is an absolute symbol. If st_shndx==SHN_COMMON, this is a common symbol (FORTRAN COMMON blocks or C tentative definitions). The binding must be STB_GLOBAL. A common symbol can also be represented as having a type of STT_COMMON, but that is uncommon.
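The st_shndx rules above can be sketched as a small classifier. This is a minimal illustration, assuming a system that provides the <elf.h> header (as glibc does on Linux); the enum and the function name classify_shndx are hypothetical names introduced here, not part of any linker.

```c
#include <elf.h>

/* Hypothetical result type for this sketch. */
enum SymClass {
  SYM_UNDEFINED,          /* st_shndx == SHN_UNDEF */
  SYM_ABSOLUTE,           /* st_shndx == SHN_ABS */
  SYM_COMMON,             /* st_shndx == SHN_COMMON */
  SYM_DEFINED_IN_SECTION  /* any other index, including SHN_XINDEX */
};

/* Classify a symbol table entry by its section index alone. */
enum SymClass classify_shndx(const Elf64_Sym *sym) {
  switch (sym->st_shndx) {
  case SHN_UNDEF:
    return SYM_UNDEFINED;
  case SHN_ABS:
    return SYM_ABSOLUTE;
  case SHN_COMMON:
    return SYM_COMMON;
  default:
    /* For SHN_XINDEX the real index lives in SHT_SYMTAB_SHNDX,
       but the symbol is still defined relative to a section. */
    return SYM_DEFINED_IN_SECTION;
  }
}
```

A caller would typically run this over every entry of .symtab and treat everything except SYM_UNDEFINED as a definition.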
st_info encodes the type and the binding. Among types, STT_FILE, STT_SECTION and STT_TLS are special. Most symbols are of type STT_NOTYPE, STT_OBJECT, or STT_FUNC. Other types are uncommon. The binding is a very important attribute. All of STB_LOCAL, STB_GLOBAL, and STB_WEAK are important. A symbol of binding STB_LOCAL is often called a local symbol. A local symbol must be defined. It is not visible outside the object file, therefore it does not contribute to the global symbol table. STB_WEAK represents a weak symbol. See Weak symbol for details. STB_GLOBAL represents a regular symbol visible outside the object file. Both weak and global symbols contribute to the global symbol table.
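As a rough illustration of how st_info is unpacked, the sketch below uses the ELF64_ST_BIND, ELF64_ST_TYPE and ELF64_ST_INFO macros from <elf.h> (assumed to be available, as on Linux with glibc). The helper names sym_binding, sym_type and contributes_to_global_table are hypothetical.

```c
#include <elf.h>

/* The binding lives in the high 4 bits of st_info. */
unsigned char sym_binding(const Elf64_Sym *sym) {
  return ELF64_ST_BIND(sym->st_info);
}

/* The type lives in the low 4 bits of st_info. */
unsigned char sym_type(const Elf64_Sym *sym) {
  return ELF64_ST_TYPE(sym->st_info);
}

/* Only STB_GLOBAL and STB_WEAK symbols take part in the global
   symbol table; STB_LOCAL symbols stay private to their file. */
int contributes_to_global_table(const Elf64_Sym *sym) {
  unsigned char binding = sym_binding(sym);
  return binding == STB_GLOBAL || binding == STB_WEAK;
}
```

The inverse direction is ELF64_ST_INFO(binding, type), which packs the two fields back into one byte.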
st_other encodes the visibility. The other bits are used by ppc64 ELFv2, AArch64, MIPS, etc. The visibility attribute represents different symbol resolution strategies for a non-local symbol. The linker only uses the information for a relocatable object file, not for a shared object.
A STV_HIDDEN or STV_INTERNAL symbol will be made STB_LOCAL in the linker output. This provides a mechanism to ensure a relocatable object file symbol will not be visible to other components. A STV_PROTECTED symbol provides a way to defeat performance loss due to symbol interposition for a relocatable object file which will be linked into a shared object. STV_DEFAULT is the default.
If multiple relocatable object files have a non-local symbol, the most constraining visibility will be the visibility in the output. The attributes, ordered from least to most constraining, are: STV_DEFAULT, STV_PROTECTED, STV_HIDDEN, and STV_INTERNAL. For a non-definition declaration in C/C++, we can make it STV_PROTECTED or STV_HIDDEN to ensure the symbol must be defined in the component. Actually, if every undefined symbol were STV_PROTECTED by default, the model would be similar to PE-COFF's non-export by default model.
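The "most constraining visibility wins" rule can be sketched as follows. The numeric encodings are STV_DEFAULT=0, STV_INTERNAL=1, STV_HIDDEN=2 and STV_PROTECTED=3, so among the non-default values a smaller number is more constraining. This assumes <elf.h> is available; sym_visibility and merge_visibility are hypothetical helper names for illustration.

```c
#include <elf.h>

/* Visibility is stored in the low 2 bits of st_other. */
unsigned char sym_visibility(unsigned char st_other) {
  return ELF64_ST_VISIBILITY(st_other);
}

/* Combine the visibilities seen in two relocatable object files.
   STV_DEFAULT is the least constraining, so any other value beats it.
   Among INTERNAL(1), HIDDEN(2), PROTECTED(3), the smaller value is
   the more constraining one. */
unsigned char merge_visibility(unsigned char a, unsigned char b) {
  if (a == STV_DEFAULT)
    return b;
  if (b == STV_DEFAULT)
    return a;
  return a < b ? a : b;
}
```

Folding merge_visibility over every relocatable object file that mentions a symbol yields the visibility used for the output.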
Symbol resolution
The following pseudocode gives a summary of input file processing in the linker.
for file in input { ... }
The linker maintains a global symbol table for STB_GLOBAL and STB_WEAK symbols. The table can be seen as a collection mapping names to states. Each state encodes the symbol kind. In the following list, I place the LLD internal struct name before the description.
- Undefined: An undefined symbol only referenced by shared objects
- Undefined: An undefined symbol referenced by at least one relocatable object file
- LazyArchive/LazyObjFile: An entry in an archive index or a definition in a relocatable object file inside a pair of --start-lib --end-lib
- Shared: A definition in a shared object
- Defined: A definition in a relocatable object file or LLVM bitcode file
The kinds are listed in increasing priority. If the symbol in the current input file has a higher priority than the global symbol table entry, the global symbol table entry will be overwritten.
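The priority rule can be modeled with an ordered enum. This is only a sketch of the ordering described above: the enum constants and should_overwrite are hypothetical names, and the sketch deliberately ignores binding (STB_GLOBAL versus STB_WEAK) and archive member extraction, which the examples below cover.

```c
/* Symbol kinds in increasing priority, mirroring the list above. */
enum SymbolKind {
  KIND_UNDEFINED_SHARED_ONLY, /* undefined, only referenced by shared objects */
  KIND_UNDEFINED,             /* undefined, referenced by a relocatable object */
  KIND_LAZY,                  /* archive index entry or --start-lib object */
  KIND_SHARED,                /* definition in a shared object */
  KIND_DEFINED                /* definition in a relocatable object or bitcode */
};

/* Returns 1 when the incoming symbol should replace the entry that
   is already in the global symbol table. */
int should_overwrite(enum SymbolKind existing, enum SymbolKind incoming) {
  return incoming > existing;
}
```

For example, a Shared entry is replaced by an incoming Defined, while an incoming Shared never replaces a Defined.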
Note the first Undefined kind. Such an undefined symbol, referenced only by shared objects, will not contribute a symbol table entry to the output. It is needed to implement --no-allow-shlib-undefined. Such an undefined symbol also lets an executable link know that the symbol needs to be exported if it ends up defined.
We will use some examples to explain the symbol resolution rules.
Duplicate definitions between relocatable object files
Both a.o and b.o define STB_GLOBAL foo. The linker command line is ld a.o b.o.
- For a.o, insert foo to the global symbol table.
- For b.o, notice that foo exists. Both are STB_GLOBAL => duplicate definition error.
STB_GLOBAL overrides STB_WEAK between relocatable object files
a.o defines STB_GLOBAL foo. c.o defines STB_WEAK foo. The linker command line is ld a.o c.o.
- For a.o, insert foo as a Defined to the global symbol table.
- For c.o, notice that foo exists. The STB_GLOBAL definition takes precedence.
For ld c.o a.o, the existing STB_WEAK definition will be overridden by the incoming STB_GLOBAL definition.
Note: the STB_GLOBAL overriding STB_WEAK rule is between two relocatable object files.
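The two cases so far (duplicate definitions, and STB_GLOBAL overriding STB_WEAK) can be summarized in one function. This is a hedged sketch: the enum and function names are hypothetical, both definitions are assumed to come from relocatable object files, and the weak-versus-weak branch (first definition kept) is an assumption that the text above does not spell out. Common symbols are not modeled.

```c
enum DefBinding { DEF_GLOBAL, DEF_WEAK };

enum DefOutcome {
  KEEP_EXISTING,   /* the entry in the global symbol table stays */
  TAKE_INCOMING,   /* the incoming definition overrides the entry */
  DUPLICATE_ERROR  /* two STB_GLOBAL definitions */
};

/* Resolve two definitions of the same name, both from relocatable
   object files. */
enum DefOutcome resolve_definitions(enum DefBinding existing,
                                    enum DefBinding incoming) {
  if (existing == DEF_GLOBAL && incoming == DEF_GLOBAL)
    return DUPLICATE_ERROR;
  if (existing == DEF_WEAK && incoming == DEF_GLOBAL)
    return TAKE_INCOMING;
  /* global vs. weak keeps the global definition; weak vs. weak is
     assumed to keep the first one seen. */
  return KEEP_EXISTING;
}
```

With this model, ld a.o c.o and ld c.o a.o both end with the STB_GLOBAL definition from a.o, reached through KEEP_EXISTING and TAKE_INCOMING respectively.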
STB_WEAK overrides common between relocatable object files
c.o defines STB_WEAK foo. d.o defines STB_GLOBAL SHN_COMMON foo.
Relocatable object file overriding shared object
a.so defines STB_GLOBAL foo. c.o defines STB_WEAK foo. The linker command line is ld a.so c.o.
- For a.so, insert foo as a Shared to the global symbol table.
- For c.o, notice that foo exists. The relocatable object file definition wins.
For ld c.o a.so, the definition in a.so will be ignored.
Note: the binding in a shared object is ignored for symbol resolution. The STB_GLOBAL overriding STB_WEAK rule does not apply, because a shared object is involved.
First shared object wins
a.so defines STB_GLOBAL foo. c.so defines STB_WEAK foo. The linker command line is ld c.so a.so.
- For c.so, insert foo as a Shared to the global symbol table.
- For a.so, notice that foo exists. The first shared object wins.
Note: the binding in a shared object is ignored for symbol resolution. The STB_GLOBAL overriding STB_WEAK rule does not apply, because two shared objects are involved.
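Once a shared object is involved, the decision depends only on where each definition comes from. The sketch below illustrates that; struct Definition and incoming_wins_with_shared are hypothetical names, and the function assumes at least one of the two definitions comes from a shared object.

```c
enum Origin { ORIGIN_RELOCATABLE, ORIGIN_SHARED };

struct Definition {
  enum Origin origin;
  int is_weak; /* recorded, but deliberately unused below */
};

/* Returns 1 if the incoming definition replaces the existing entry.
   Precondition: at least one of the two comes from a shared object.
   The binding (is_weak) is ignored on purpose. */
int incoming_wins_with_shared(struct Definition existing,
                              struct Definition incoming) {
  /* A relocatable object file definition overrides a shared one.
     Otherwise the existing entry stays: either it is already a
     relocatable object file definition, or the first shared object
     wins. */
  return existing.origin == ORIGIN_SHARED &&
         incoming.origin == ORIGIN_RELOCATABLE;
}
```

Flipping is_weak on either argument never changes the result, which is the point of the two notes above.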
An undefined symbol in a shared object does not change the binding
w.o references STB_WEAK foo. x.so references STB_GLOBAL foo. The linker command line is ld w.o x.so.
- For w.o, insert foo as an Undefined to the global symbol table.
- For x.so, notice that foo exists. The binding is unchanged.
The output binding is STB_WEAK. For an executable link, -z defs is the default. The linker will report an error.
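The binding rule for undefined references can be sketched like this. The names are hypothetical. The branch for shared objects follows the text above; the branch where a STB_GLOBAL reference from a relocatable object file upgrades a weak entry is an assumption added for completeness and is not stated in this section.

```c
enum RefBinding { REF_GLOBAL, REF_WEAK };

struct Reference {
  enum RefBinding binding;
  int from_shared_object; /* 1 if the reference comes from a .so */
};

/* Compute the binding of an Undefined entry after seeing one more
   reference to the same name. */
enum RefBinding merge_reference_binding(enum RefBinding current,
                                        struct Reference incoming) {
  /* References from shared objects never change the binding. */
  if (incoming.from_shared_object)
    return current;
  /* Assumption: a non-weak reference from a relocatable object file
     makes the symbol non-weak. */
  if (incoming.binding == REF_GLOBAL)
    return REF_GLOBAL;
  return current;
}
```

For ld w.o x.so, the entry starts as REF_WEAK from w.o and stays REF_WEAK after x.so is processed.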
Shared object overriding archive
Both a.so and b.o define foo. b.a contains b.o. The linker command line is ld a.so b.a.
- For a.so, insert foo as a Shared to the global symbol table.
- For b.a, try inserting every symbol from the index to the global symbol table. foo is already a shared definition, so a.so wins.
ld a.so --start-lib b.o --end-lib is similar.
0.o references bar. a.so defines foo. b.o defines foo and bar. b.a contains b.o. The linker command line is ld 0.o a.so b.a.
- For 0.o, insert bar as an Undefined to the global symbol table.
- For a.so, insert foo as a Shared to the global symbol table.
- For b.a, try inserting every symbol from the index to the global symbol table. foo is already a shared definition. bar can resolve an Undefined, so the member providing bar (b.o) is extracted.
- b.a(b.o) is extracted and added like a relocatable object file. Its foo definition overrides the Shared entry in the global symbol table.
With the same files, the linker command line is ld 0.o b.a a.so.
- For 0.o, insert bar as an Undefined to the global symbol table.
- For b.a, b.a(b.o) is extracted.
- b.a(b.o) is added like a relocatable object file. Its bar definition overrides the Undefined entry in the global symbol table, and its foo is inserted as a Defined.
- For a.so, its shared definition loses to the Defined entry in the global symbol table.
m.o defines memcpy. libc.a(memcpy.o) defines memcpy. The linker command line is ld ... m.o -lc.
- For m.o, insert memcpy as a Defined to the global symbol table.
- For libc.a, try inserting every symbol from the index to the global symbol table. Some members are extracted. However, because no symbol defined by memcpy.o is Undefined in the global symbol table, memcpy.o is not extracted.
As a result, m.o succeeds in shadowing libc.a(memcpy.o). In practice, m.o may be an object file providing more mem* optimized routines. As long as m.o is before libc.a and m.o defines all libc.a(memcpy.o) symbols which may be referenced, this interposing scheme will be reliable.
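The archive examples can be replayed with a toy symbol table. This is a simplified sketch, not LLD's implementation: all names (struct Table, add_symbol, process_member, and so on) are hypothetical, a member lists at most three defined names followed by NULL, the table holds at most 16 names, and duplicate definition checks are omitted.

```c
#include <string.h>

/* Kinds in increasing priority; K_ABSENT means "not in the table". */
enum Kind { K_ABSENT, K_UNDEFINED, K_LAZY, K_SHARED, K_DEFINED };

#define MAX_SYMS 16
struct Entry { const char *name; enum Kind kind; };
struct Table { struct Entry entries[MAX_SYMS]; int count; };

/* Find the entry for name, creating an absent one if needed.
   The sketch assumes fewer than MAX_SYMS distinct names. */
struct Entry *lookup(struct Table *t, const char *name) {
  for (int i = 0; i < t->count; i++)
    if (strcmp(t->entries[i].name, name) == 0)
      return &t->entries[i];
  t->entries[t->count].name = name;
  t->entries[t->count].kind = K_ABSENT;
  return &t->entries[t->count++];
}

/* Insert a symbol; a higher-priority kind overwrites the entry. */
void add_symbol(struct Table *t, const char *name, enum Kind kind) {
  struct Entry *e = lookup(t, name);
  if (kind > e->kind)
    e->kind = kind;
}

/* An archive member and the names it defines (NULL-terminated). */
struct Member { const char *defs[4]; };

/* Process one member via the archive index. The member is extracted
   only if one of its names is currently Undefined. Returns 1 if it
   was extracted and added like a relocatable object file. */
int process_member(struct Table *t, const struct Member *m) {
  int wanted = 0;
  for (int i = 0; m->defs[i] != NULL; i++) {
    struct Entry *e = lookup(t, m->defs[i]);
    if (e->kind == K_UNDEFINED)
      wanted = 1;
    else if (e->kind < K_LAZY)
      e->kind = K_LAZY; /* remember it for later references */
  }
  if (!wanted)
    return 0;
  for (int i = 0; m->defs[i] != NULL; i++)
    add_symbol(t, m->defs[i], K_DEFINED);
  return 1;
}
```

Replaying ld 0.o a.so b.a means calling add_symbol for bar (K_UNDEFINED) and foo (K_SHARED), then process_member for b.o: the member is extracted and foo ends up K_DEFINED. For ld ... m.o -lc, memcpy is already K_DEFINED, so the member that defines only memcpy is not extracted.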