Linker symbol processing

UNDER CONSTRUCTION (COFF, Mach-O)

After the linker reads an input file (object file, shared object, archive, LLVM bitcode file), the most critical task is to process its symbol table.

The linker maintains a global symbol table. Each input symbol table may interact with the global one and affect archive processing and later steps (LTO, relocation processing, as-needed shared objects, etc.).

ELF

Symbol tables

An object file can optionally have symbol tables.

A relocatable object file almost always has a symbol table, which is represented by a section .symtab of type SHT_SYMTAB. The symbol table is sometimes called a "static symbol table".

An executable or shared object almost always has a dynamic symbol table, which is represented by a section .dynsym of type SHT_DYNSYM. The dynamic symbol table specifies defined and undefined symbols, which can be seen as its export and import lists. These symbols are needed by runtime relocation processing and symbol binding. A position-dependent statically linked executable usually has no dynamic symbol table, because (1) it usually does not need dynamic relocations and (2) there is only one component and every needed symbol is defined internally, so no symbol binding is needed.

An executable or shared object may optionally have a symbol table of type SHT_SYMTAB. ld produces the symbol table (.symtab) by default. strip can remove it along with .strtab. The static symbol table is a superset of the dynamic symbol table and has many entries (local symbols and other non-exported symbols) that are not needed at run time. It is valuable for symbolization without debug information but is otherwise not useful. Therefore an executable or shared object is usually post-processed by strip --strip-all, which removes .symtab along with .strtab and debug sections.

An archive is like a tarball. It almost always contains multiple relocatable object files. Almost all archives have a symbol index, which is a collection of (defined_symbol, member_name) pairs. An archive requires special processing. See Dependency related linker options#Archive processing for details.
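To make the (defined_symbol, member_name) idea concrete, here is a minimal sketch of a lookup in such an index. The struct and function names are hypothetical and the index is a plain in-memory array. A real archive stores its index in a dedicated member with its own binary layout, which this sketch does not model.

```c
#include <stddef.h>
#include <string.h>

/* One (defined_symbol, member_name) pair of a hypothetical in-memory index. */
typedef struct {
  const char *defined_symbol;
  const char *member_name;
} archive_index_entry;

/* Return the first member that defines `name`, or NULL if no member does. */
static const char *find_defining_member(const archive_index_entry *entries,
                                        size_t count, const char *name) {
  for (size_t i = 0; i < count; i++)
    if (strcmp(entries[i].defined_symbol, name) == 0)
      return entries[i].member_name;
  return NULL;
}
```

Under these assumptions, a linker that sees an undefined reference to some symbol can consult the index to find which member, if any, defines it, without parsing every member's symbol table first.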

Symbols

A symbol table holds an array of entries. Each symbol table entry describes a symbol. Let's look at the representation in a 64-bit ELF object file:

typedef struct {
  uint32_t      st_name;
  unsigned char st_info;
  unsigned char st_other;
  uint16_t      st_shndx;
  uint64_t      st_value;
  uint64_t      st_size;
} Elf64_Sym;

Here is the description from the ELF specification:

st_name: This member holds an index into the object file's symbol string table, which holds the character representations of the symbol names. If the value is non-zero, it represents a string table index that gives the symbol name. Otherwise, the symbol table entry has no name.
st_value: This member gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address, and so on; details appear below.
st_size: Many symbols have associated sizes. For example, a data object's size is the number of bytes contained in the object. This member holds 0 if the symbol has no size or an unknown size.
st_info: This member specifies the symbol's type and binding attributes. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects.
st_other: This member currently specifies a symbol's visibility. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects. Other bits contain 0 and have no defined meaning.
st_shndx: Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index. As the sh_link and sh_info interpretation table and the related text describe, some section indexes indicate special meanings. If this member contains SHN_XINDEX, then the actual section header index is too large to fit in this field. The actual value is contained in the associated section of type SHT_SYMTAB_SHNDX.
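The quoted descriptions of st_info and st_other mention manipulation code that is not reproduced here. In the generic ABI this is a small set of macros. The sketch below restates the 64-bit variants together with the standard binding, type, and visibility values. The constants are defined locally so that the snippet does not depend on a system <elf.h>; do not combine it with that header, which defines the same names.

```c
#include <stdint.h>

/* Accessors for st_info: binding in the high 4 bits, type in the low 4 bits. */
#define ELF64_ST_BIND(i)       ((i) >> 4)
#define ELF64_ST_TYPE(i)       ((i) & 0xf)
#define ELF64_ST_INFO(b, t)    (((b) << 4) + ((t) & 0xf))

/* Accessor for st_other: visibility in the low 2 bits. */
#define ELF64_ST_VISIBILITY(o) ((o) & 0x3)

/* Symbol bindings. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };

/* Symbol types. */
enum {
  STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3,
  STT_FILE = 4, STT_COMMON = 5, STT_TLS = 6
};

/* Symbol visibilities. */
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };
```

For example, a weak function symbol has st_info equal to ELF64_ST_INFO(STB_WEAK, STT_FUNC), which is 0x22.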

Some explanation:

st_name indicates the name.

st_shndx and st_value indicate whether the symbol is defined or undefined, and the associated section and the offset if defined.
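As a rough illustration of how st_shndx distinguishes these cases, the sketch below classifies a symbol by its section index. The reserved index values are the standard ones; the sym_kind enum and classify_symbol are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Reserved section indexes from the ELF gABI. */
enum {
  SHN_UNDEF  = 0,      /* undefined symbol */
  SHN_ABS    = 0xfff1, /* absolute value, not relative to a section */
  SHN_COMMON = 0xfff2, /* common symbol, not yet allocated */
  SHN_XINDEX = 0xffff  /* real index is stored in SHT_SYMTAB_SHNDX */
};

/* Hypothetical classification, for illustration only. */
typedef enum {
  SYM_UNDEFINED,
  SYM_ABSOLUTE,
  SYM_COMMON,
  SYM_DEFINED_IN_SECTION
} sym_kind;

static sym_kind classify_symbol(uint16_t st_shndx) {
  switch (st_shndx) {
  case SHN_UNDEF:
    return SYM_UNDEFINED;
  case SHN_ABS:
    return SYM_ABSOLUTE;
  case SHN_COMMON:
    return SYM_COMMON;
  default:
    /* An ordinary section index, or SHN_XINDEX, whose actual index
       must be looked up in the SHT_SYMTAB_SHNDX section. */
    return SYM_DEFINED_IN_SECTION;
  }
}
```

For a symbol defined in a section of a relocatable object file, st_value is the offset from the beginning of that section. In an executable or shared object, st_value holds a virtual address instead.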

st_info encodes the type and the binding. For the type, STT_FILE, STT_SECTION and STT_TLS are special. Most symbols are of type STT_NOTYPE, STT_OBJECT, or STT_FUNC. Other types are uncommon. The binding is a very important attribute. All of STB_LOCAL, STB_GLOBAL, and STB_WEAK are important. A symbol of binding STB_LOCAL is often called a local symbol. A local symbol must be defined. It is not visible outside the object file, therefore it does not contribute to the global symbol table. STB_WEAK represents a weak symbol. See Weak symbol for details. STB_GLOBAL represents a regular symbol visible outside the object file. Both weak and global symbols contribute to the global symbol table.

st_other encodes the visibility. The other bits are used by ppc64 ELFv2, AArch64, MIPS, etc. The visibility attribute represents different symbol resolution strategies for a non-local symbol. The linker only uses the information for a relocatable object file, not for a shared object.

An STV_HIDDEN or STV_INTERNAL symbol will be made STB_LOCAL in the linker output. This provides a mechanism to ensure that a relocatable object file symbol will not be visible to other components. An STV_PROTECTED symbol provides a way to avoid the performance loss due to symbol interposition for a relocatable object file which will be linked into a shared object. STV_DEFAULT is the default.
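The localization rule can be sketched as a small function. output_binding is a hypothetical helper, not taken from any linker, and it assumes the output is an executable or shared object (a relocatable link with -r is not modeled).

```c
#include <stdint.h>

/* Standard binding and visibility values, defined locally for this sketch. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Hypothetical helper: the binding that a symbol from a relocatable object
   file gets in an executable or shared object output. */
static unsigned char output_binding(unsigned char st_info,
                                    unsigned char st_other) {
  unsigned char binding = st_info >> 4;      /* high 4 bits of st_info */
  unsigned char visibility = st_other & 0x3; /* low 2 bits of st_other */
  if (visibility == STV_HIDDEN || visibility == STV_INTERNAL)
    return STB_LOCAL; /* hidden from other components */
  return binding;     /* STV_DEFAULT and STV_PROTECTED keep their binding */
}
```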

If multiple relocatable object files have a non-local symbol of the same name, the most constraining visibility will be the visibility in the output. The attributes, ordered from least to most constraining, are: STV_DEFAULT, STV_PROTECTED, STV_HIDDEN, and STV_INTERNAL. For a non-definition declaration in C/C++, we can make it STV_PROTECTED or STV_HIDDEN to require that the symbol be defined in the component. Actually, if every undefined symbol were STV_PROTECTED by default, the model would be similar to PE-COFF's non-export-by-default model.
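A minimal sketch of the merging rule follows. It relies on the numeric encoding of the visibility values: among the non-default ones, a smaller number is more constraining (STV_INTERNAL is 1, STV_HIDDEN is 2, STV_PROTECTED is 3). merge_visibility is a hypothetical name used for illustration.

```c
#include <stdint.h>

/* Standard visibility values, defined locally for this sketch. */
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Return the more constraining of two visibilities. */
static unsigned char merge_visibility(unsigned char a, unsigned char b) {
  if (a == STV_DEFAULT)
    return b; /* anything is at least as constraining as STV_DEFAULT */
  if (b == STV_DEFAULT)
    return a;
  return a < b ? a : b; /* both non-default: smaller value wins */
}
```

A linker following this rule would fold merge_visibility over every relocatable object file's entry for the symbol, so a single STV_HIDDEN reference is enough to make the output symbol hidden.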