UNDER CONSTRUCTION (COFF, Mach-O)
After the linker reads an input file (object file, shared object, archive, LLVM bitcode file), the most critical task is to process its symbol table.
There is a global symbol table. Every input symbol table may interact with the global one and affect archive processing and later steps (LTO, relocation processing, as-needed shared objects, etc.).
ELF
Symbol tables
An object file can optionally have symbol tables.
A relocatable object file almost always has a symbol table, which is represented by a section .symtab of type SHT_SYMTAB. The symbol table is sometimes called a "static symbol table".
An executable or shared object almost always has a dynamic symbol table, which is represented by a section .dynsym of type SHT_DYNSYM. The dynamic symbol table specifies defined and undefined symbols, which can be seen as its export and import lists. They are needed by runtime relocation processing and symbol binding. A position-dependent statically linked executable usually has no dynamic symbol table, because (1) it usually does not need dynamic relocations and (2) there is only one component and every needed symbol is defined internally, so there is no need for symbol binding.
An executable or shared object may optionally have a symbol table of type SHT_SYMTAB. ld produces the symbol table (.symtab) by default. strip can remove it along with .strtab. The static symbol table is a superset of the dynamic symbol table and has many entries (local symbols and other non-exported symbols) that are not needed at run time. It is valuable for symbolization without debug information but is otherwise not useful. Therefore an executable or shared object is usually post-processed by strip --strip-all, which removes .symtab along with .strtab and debug sections.
An archive is like a tarball. It almost always contains multiple relocatable object files. Almost all archives have a symbol index, which is a collection of (defined_symbol, member_name) pairs. An archive requires special processing. See Dependency related linker options#Archive processing for details.
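The symbol index described above can be pictured as a small lookup table. The following is only a sketch under stated assumptions: `IndexEntry` and `find_member` are hypothetical names, the index is modeled as an in-memory array of string pairs, and the lookup is a plain linear scan. It does not parse any real archive format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one archive symbol index entry:
   a defined symbol and the archive member that defines it. */
typedef struct {
  const char *defined_symbol;
  const char *member_name;
} IndexEntry;

/* Return the name of the member defining `symbol`, or NULL if the
   index has no entry for it. Linear scan, for illustration only. */
static inline const char *find_member(const IndexEntry *index, size_t n,
                                      const char *symbol) {
  assert(index != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (strcmp(index[i].defined_symbol, symbol) == 0)
      return index[i].member_name;
  return NULL;
}
```

With an index holding ("foo", "a.o") and ("bar", "b.o"), looking up "bar" yields "b.o", while looking up a symbol no member defines yields NULL.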
Symbols
A symbol table holds an array of entries. Each symbol table entry describes a symbol. Let's look at the representation of a 64-bit ELF object file:
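    typedef struct {
      Elf64_Word    st_name;   /* index into the symbol string table */
      unsigned char st_info;   /* type and binding */
      unsigned char st_other;  /* visibility */
      Elf64_Half    st_shndx;  /* section header table index */
      Elf64_Addr    st_value;  /* value of the symbol */
      Elf64_Xword   st_size;   /* size, or 0 if none or unknown */
    } Elf64_Sym;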
Here is the description from the ELF specification:
st_name: This member holds an index into the object file's symbol string table, which holds the character representations of the symbol names. If the value is non-zero, it represents a string table index that gives the symbol name. Otherwise, the symbol table entry has no name.
st_value: This member gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address, and so on; details appear below.
st_size: Many symbols have associated sizes. For example, a data object's size is the number of bytes contained in the object. This member holds 0 if the symbol has no size or an unknown size.
st_info: This member specifies the symbol's type and binding attributes. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects.
st_other: This member currently specifies a symbol's visibility. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects. Other bits contain 0 and have no defined meaning.
st_shndx: Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index. As the sh_link and sh_info interpretation table and the related text describe, some section indexes indicate special meanings. If this member contains SHN_XINDEX, then the actual section header index is too large to fit in this field. The actual value is contained in the associated section of type SHT_SYMTAB_SHNDX.
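The quoted descriptions mention code for manipulating st_info and st_other. A minimal sketch of that bit layout follows: the binding sits in the upper four bits of st_info, the type in the lower four, and the visibility in the low two bits of st_other. The helper names (`st_bind`, `st_type`, `st_make_info`, `st_visibility`) are hypothetical, and the constants are redeclared locally so the sketch does not depend on `<elf.h>`; drop the enums if you include that header.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined by the ELF specification, redeclared locally. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2,
       STT_SECTION = 3, STT_FILE = 4, STT_TLS = 6 };
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Binding: upper four bits of st_info. */
static inline unsigned st_bind(uint8_t info) { return info >> 4; }

/* Type: lower four bits of st_info. */
static inline unsigned st_type(uint8_t info) { return info & 0xf; }

/* Pack a binding and a type back into an st_info byte. */
static inline uint8_t st_make_info(unsigned bind, unsigned type) {
  assert(bind <= 0xf && type <= 0xf);
  return (uint8_t)((bind << 4) | type);
}

/* Visibility: low two bits of st_other; other bits are ignored here. */
static inline unsigned st_visibility(uint8_t other) { return other & 0x3; }
```

For example, an st_info byte of 0x12 decodes to binding STB_GLOBAL and type STT_FUNC. The same layout applies to 32-bit objects, since st_info and st_other are single bytes in both classes.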
Some explanation:
st_name indicates the name.
st_shndx and st_value indicate whether the symbol is defined or undefined, and the associated section and the offset if defined.
st_info encodes the type and the binding. For the type, STT_FILE, STT_SECTION, and STT_TLS are special. Most symbols are of type STT_NOTYPE, STT_OBJECT, or STT_FUNC. Other types are uncommon. The binding is a very important attribute. All of STB_LOCAL, STB_GLOBAL, and STB_WEAK are important. A symbol of binding STB_LOCAL is often called a local symbol. A local symbol must be defined. It is not visible outside the object file, therefore it does not contribute to the global symbol table. STB_WEAK represents a weak symbol. See Weak symbol for details. STB_GLOBAL represents a regular symbol visible outside the object file. Both weak and global symbols contribute to the global symbol table.
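The two checks described so far (defined versus undefined, and whether a symbol takes part in the global symbol table) can be sketched as small predicates. This is an illustration only: `Sym64` is a hypothetical stand-in with the same field layout as Elf64_Sym, the function names are made up for this example, and special section indexes other than SHN_UNDEF are not treated specially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the Elf64_Sym field layout. */
typedef struct {
  uint32_t st_name;
  uint8_t  st_info;
  uint8_t  st_other;
  uint16_t st_shndx;
  uint64_t st_value;
  uint64_t st_size;
} Sym64;

enum { SHN_UNDEF = 0 };
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };

/* A symbol is undefined when its section index is SHN_UNDEF. */
static inline bool sym_is_defined(const Sym64 *s) {
  assert(s != NULL);
  return s->st_shndx != SHN_UNDEF;
}

/* Only STB_GLOBAL and STB_WEAK symbols contribute to the global
   symbol table; STB_LOCAL symbols stay private to their file. */
static inline bool sym_enters_global_table(const Sym64 *s) {
  assert(s != NULL);
  unsigned bind = s->st_info >> 4;
  return bind == STB_GLOBAL || bind == STB_WEAK;
}
```

Note that an undefined STB_GLOBAL symbol still enters the global symbol table: it is a reference that some other input file may later satisfy.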
st_other encodes the visibility. The other bits are used by ppc64 ELFv2, AArch64, MIPS, etc. The visibility attribute represents different symbol resolution strategies for a non-local symbol. The linker only uses the information for a relocatable object file, not for a shared object.
An STV_HIDDEN or STV_INTERNAL symbol will be made STB_LOCAL in the linker output. This provides a mechanism to ensure a relocatable object file symbol will not be visible to other components. An STV_PROTECTED symbol provides a way to avoid the performance loss due to symbol interposition for a relocatable object file which will be linked into a shared object. STV_DEFAULT is the default.
If multiple relocatable object files have a non-local symbol of the same name, the most constraining visibility will be the visibility in the output. The attributes, ordered from least to most constraining, are: STV_DEFAULT, STV_PROTECTED, STV_HIDDEN, and STV_INTERNAL. For a non-definition declaration in C/C++, we can make it STV_PROTECTED or STV_HIDDEN to ensure the symbol must be defined in the component. Actually, if every undefined symbol were STV_PROTECTED by default, the model would be similar to PE-COFF's non-export by default model.
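The "most constraining visibility wins" rule can be sketched as a merge function. One detail makes it compact: the numeric encodings are STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3, so among the non-default values a smaller number is more constraining. The function name `merge_visibility` is hypothetical, and this is one possible way to write the rule, not a description of any particular linker's code.

```c
#include <assert.h>

/* Visibility values as defined by the ELF specification. */
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Combine the visibilities of two occurrences of the same non-local
   symbol. STV_DEFAULT is the least constraining, so it yields to the
   other value; otherwise the numerically smaller value is the more
   constraining one (INTERNAL < HIDDEN < PROTECTED). */
static inline unsigned merge_visibility(unsigned a, unsigned b) {
  assert(a <= STV_PROTECTED && b <= STV_PROTECTED);
  if (a == STV_DEFAULT)
    return b;
  if (b == STV_DEFAULT)
    return a;
  return a < b ? a : b;
}
```

For example, merging STV_PROTECTED with STV_HIDDEN gives STV_HIDDEN, and merging STV_DEFAULT with anything gives the other operand.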