llvm-project 14 will be released soon. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/14.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.
- --export-dynamic-symbol-list has been added. (D107317) When I added --export-dynamic-symbol to GNU ld, H.J. Lu asked me to add this option. I wondered whether it was necessary, then realized that it may help deprecate --dynamic-list in the long term. --dynamic-list is confusing: it has different semantics for executables and shared objects, and the symbolic binding it implies for shared objects is not obvious.
- --why-extract has been added to query why archive members/lazy object files are extracted. (D109572) This information had long been missing from ld.lld's -Map output. I picked a separate option because I realized that this need is often orthogonal to the input section to output section map.
- If -Map is specified, --cref output will be printed to the specified file. (D114663) A linker's stdout output is often interleaved with different information, so being able to redirect one piece of information to a file is useful. I think it would be nice if GNU ld had --cref=<file> instead of reusing -Map.
- -z bti-report and -z cet-report are now supported. (D113901)
- --lto-pgo-warn-mismatch has been added. (D104431)
- Archives without an index (symbol table) are now supported and work with --warn-backrefs. One may build such an archive with llvm-ar rcS [--thin] to save space. (D117284) In 15.0.0, the archive symbol table will be entirely ignored. See Archives and --start-lib for more context.
- Local symbol names are no longer deduplicated at the default optimization level -O1. This results in a larger .strtab (usually less than 1%) but a faster link. Use -O2 to restore the deduplication. In 15.0.0, the -O2 deduplication is dropped to help parallelize the .symtab write.
- In relocatable output, relocations to discarded symbols now use tombstone values. (D116946)
- --compress-debug-sections=zlib is now run in parallel. {clang,gcc} -gz link actions are significantly faster. (D117853) The "linkers" section of Compressed debug sections has more context.
- "relocation out of range" diagnostics and a few uncommon diagnostics now report an object file location in addition to a source file location. (D112518)
- The write of .rela.dyn and SHF_MERGE|SHF_STRINGS sections (e.g. .debug_str) is now run in parallel.
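To make the executable case of --export-dynamic-symbol-list concrete, here is a toy Python model. It reads a file in the dynamic list format (for example { foo; bar*; };) and selects the defined symbols that match a name or glob pattern, which are the symbols an executable puts into its dynamic symbol table. This is a sketch under several assumptions. The parser handles only plain names and globs, with no extern "C++" blocks or comments. fnmatch approximates the linker's glob syntax. Both function names are hypothetical. The shared object case, where matched symbols are kept preemptible, is not modeled.

```python
from __future__ import annotations

import fnmatch
import re


def parse_dynamic_list(text: str) -> list[str]:
    """Parse a minimal dynamic list such as "{ foo; bar*; };".

    Hypothetical helper: only plain names and glob patterns are handled.
    """
    body = re.search(r"\{(.*)\}", text, re.S)
    if body is None:
        raise ValueError("expected a { ... } block")
    return [p.strip() for p in body.group(1).split(";") if p.strip()]


def exported_symbols(patterns: list[str], defined: list[str]) -> set[str]:
    """Executable case: defined symbols matching any pattern go to .dynsym."""
    return {sym for sym in defined
            if any(fnmatch.fnmatchcase(sym, pat) for pat in patterns)}
```

For example, the list { foo; bar*; } applied to the defined symbols foo, bar1, baz, and main selects foo and bar1.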
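The question --why-extract answers can be illustrated with a toy model of archive member extraction. Each time an undefined symbol is satisfied by a lazy definition, the member is extracted and a (reference, extracted, symbol) triple is recorded. This is the kind of information --why-extract reports. The InputFile class and the resolve function are hypothetical. The model loads all regular objects first and ignores command-line order, --start-lib groups, and weak symbols, so it is a sketch rather than lld's algorithm.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputFile:
    name: str
    defined: set[str] = field(default_factory=set)
    undefined: set[str] = field(default_factory=set)


def resolve(objects: list[InputFile],
            members: list[InputFile]) -> list[tuple[str, str, str]]:
    """Return (reference, extracted, symbol) triples in extraction order."""
    lazy: dict[str, InputFile] = {}
    for member in members:
        for sym in member.defined:
            lazy.setdefault(sym, member)  # first lazy definition wins

    defined: set[str] = set()
    why: list[tuple[str, str, str]] = []
    pending: deque[tuple[str, str]] = deque()

    def load(f: InputFile) -> None:
        defined.update(f.defined)
        pending.extend((f.name, sym) for sym in sorted(f.undefined))

    for obj in objects:
        load(obj)
    while pending:
        ref, sym = pending.popleft()
        if sym in defined:
            continue
        member = lazy.get(sym)
        if member is None:
            continue  # no lazy definition: left undefined in this model
        why.append((ref, member.name, sym))
        load(member)  # extracting may create new undefined references
    return why
```

With main.o referencing foo, a.a(foo.o) defining foo and referencing bar, and a.a(bar.o) defining bar, the model reports two extractions. An unreferenced member such as a.a(unused.o) is never extracted.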
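The trade-off behind local symbol name deduplication (-O1 vs -O2) can be shown with a toy ELF-style string table builder. With deduplication, repeated names share one entry, which requires a lookup in a hash table. Without it, every name is simply appended. The build_strtab function is hypothetical and does exact-match deduplication only.

```python
from __future__ import annotations


def build_strtab(names: list[str], dedup: bool) -> tuple[bytes, list[int]]:
    """Return (string table bytes, offset of each name)."""
    blob = bytearray(b"\0")  # offset 0 holds the empty string in ELF string tables
    offsets: list[int] = []
    seen: dict[str, int] = {}
    for name in names:
        if dedup and name in seen:
            offsets.append(seen[name])  # reuse the earlier copy
            continue
        offset = len(blob)
        blob += name.encode() + b"\0"
        seen.setdefault(name, offset)
        offsets.append(offset)
    return bytes(blob), offsets
```

Without deduplication, each name's offset is a prefix sum of the lengths of the preceding names. Different input files could therefore compute their offsets and write their parts of the table independently, while the hash-table lookup forces a sequential pass. That is my reading of why dropping the -O2 deduplication helps a parallel .symtab write.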
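Parallel zlib compression, as used for --compress-debug-sections=zlib, can be sketched with a general technique also used by pigz:

1. Split the input into shards and compress each shard independently as raw deflate.
2. End every shard except the last with Z_FULL_FLUSH, so its output stays byte-aligned and non-final.
3. Concatenate the results between a zlib header and an Adler-32 trailer.

I believe D117853 follows the same idea, but the function names, shard size, level, and thread count here are assumptions. The sketch produces only the zlib stream, not the ELF compression header that precedes it in a compressed section. Python's zlib module has no adler32_combine, so the checksum is computed in one sequential pass here.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def _deflate_shard(shard: bytes, last: bool, level: int) -> bytes:
    # wbits=-15: raw deflate, so no per-shard zlib header or checksum.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = comp.compress(shard)
    # Z_FULL_FLUSH ends the shard on a byte boundary with a non-final block;
    # only the last shard is finished with a final block (Z_FINISH).
    out += comp.flush(zlib.Z_FINISH if last else zlib.Z_FULL_FLUSH)
    return out


def parallel_zlib_compress(data: bytes, shard_size: int = 1 << 20,
                           level: int = 1, workers: int = 8) -> bytes:
    shards = [data[i:i + shard_size]
              for i in range(0, len(data), shard_size)] or [b""]
    lasts = [i == len(shards) - 1 for i in range(len(shards))]
    # CPython's zlib releases the GIL while deflating, so threads overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bodies = list(pool.map(_deflate_shard, shards, lasts,
                               [level] * len(shards)))
    header = b"\x78\x01"  # CMF/FLG: deflate, 32 KiB window, valid check bits
    trailer = struct.pack(">I", zlib.adler32(data))  # big-endian Adler-32
    return header + b"".join(bodies) + trailer
```

Because each shard starts with a fresh compressor, back-references never cross shard boundaries. The output is slightly larger than a single-threaded stream, and the result is still an ordinary zlib stream that any decompressor accepts.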
Linker script changes:
- Orphan section placement now picks a more suitable segment. Previously the algorithm might pick a read-only segment for a writable orphan section and make the segment writable. (D111717)
- An empty output section moved by an INSERT command now gets appropriate flags. (D118529)
- Negation in a memory region attribute is now correctly handled. (D113771)
Architecture specific changes:
- The AArch64 port now supports adrp+ldr and adrp+add optimizations. --no-relax can suppress the optimizations. (D112063) (D117614)
- The x86-32 port now supports TLSDESC (-mtls-dialect=gnu2). (D112582)
- The x86-64 port now handles non-RAX/non-adjacent R_X86_64_GOTPC32_TLSDESC and R_X86_64_TLSDESC_CALL (-mtls-dialect=gnu2). (D114416)
- The x86-32 and x86-64 ports now support mixed TLSDESC and TLS GD, i.e. mixing objects compiled with and without -mtls-dialect=gnu2 that reference the same TLS variable is now supported. (D114416)
- For x86-64, --no-relax now suppresses R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX GOT optimization. (D113615)
- R_X86_64_PLTOFF64 is now supported. (D112386)
- R_AARCH64_NONE, R_PPC_NONE, and R_PPC64_NONE in input REL relocation sections are now supported.
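The AArch64 adrp+add optimization can be illustrated with a sketch. When the target is within ADR's ±1 MiB range of the second instruction, the pair adrp xN, sym; add xN, xN, :lo12:sym can be rewritten as nop; adr xN, sym, which replaces two dependent instructions with a single adr. The sketch only computes the replacement instruction words. It assumes the add uses the same register as the adrp and ignores the other conditions a real linker checks. For the adrp+ldr GOT form, for example, the symbol must also be non-preemptible. The helper names are hypothetical.

```python
NOP = 0xD503201F  # AArch64 NOP encoding


def encode_adr(rd: int, pc: int, target: int) -> int:
    """Encode `adr x<rd>, target` for an instruction placed at address pc."""
    imm = target - pc
    if not -(1 << 20) <= imm < (1 << 20):
        raise ValueError("target out of ADR range (+/-1 MiB)")
    imm &= (1 << 21) - 1  # 21-bit two's complement
    immlo, immhi = imm & 0x3, imm >> 2
    return 0x10000000 | (immlo << 29) | (immhi << 5) | rd


def relax_adrp_add(adrp_pc: int, target: int, rd: int, no_relax: bool = False):
    """Return (word at adrp_pc, word at adrp_pc + 4), or None to keep adrp+add."""
    if no_relax:
        return None
    try:
        # The adr takes the add's slot, so its PC is adrp_pc + 4.
        return NOP, encode_adr(rd, adrp_pc + 4, target)
    except ValueError:
        return None
```

Returning None models both --no-relax and an out-of-range target. In either case, the original adrp+add pair is kept.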
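For x86-64, the GOTPCRELX optimization that --no-relax now suppresses can be sketched for its most common form. mov sym@GOTPCREL(%rip), %reg loads the address of sym from a GOT entry. When sym is non-preemptible, a linker may change the opcode 0x8b (mov) to 0x8d (lea) and point the RIP-relative displacement at sym itself, which removes a memory load. The function is hypothetical and handles only this mov form. It assumes the caller has already checked that sym is non-preemptible, and it leaves filling in the GOT-relative displacement (the unrelaxed case) to the caller.

```python
import struct


def relax_gotpcrelx_mov(insn: bytes, insn_addr: int, target: int,
                        no_relax: bool = False) -> bytes:
    """Rewrite `mov disp32(%rip), %reg` (optionally REX-prefixed) to `lea`.

    `insn` must end with opcode, ModRM and disp32. It is returned unchanged
    when the relaxation does not apply or is disabled by no_relax.
    """
    if no_relax or len(insn) < 6:
        return insn
    opcode, modrm = insn[-6], insn[-5]
    if opcode != 0x8B or modrm & 0xC7 != 0x05:  # mod=00, r/m=101: RIP-relative
        return insn
    # RIP-relative displacements are relative to the end of the instruction.
    disp = target - (insn_addr + len(insn))
    if not -(1 << 31) <= disp < (1 << 31):
        return insn
    return insn[:-6] + bytes([0x8D, modrm]) + struct.pack("<i", disp)
```

For example, 48 8b 05 00 00 00 00 at 0x1000 with sym at 0x2000 becomes 48 8d 05 f9 0f 00 00, i.e. lea 0xff9(%rip), %rax.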
Breaking changes
- e_entry no longer falls back to the address of .text if the entry symbol does not exist. Instead, a value of 0 is written. (D110014)
- --lto-pseudo-probe-for-profiling has been removed. In LTO, the compiler enables this feature automatically. (D110209)
- Use of --[no-]define-common, -d, -dc, and -dp now results in a warning. These options will be removed or ignored in 15.0.0. (llvm-project#53660: https://github.com/llvm/llvm-project/issues/53660)
Speed
I use a -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,$HOME/Dev/mimalloc/out/release/libmimalloc.a,--pop-state -DLLVM_ENABLE_PROJECTS='clang;lld' -DLLVM_TARGETS_TO_BUILD=X86 -fno-pic -no-pie build. The host compiler is a close-to-main clang. (Compared with glibc malloc, linking against libmimalloc.a makes ld.lld 1.12x as fast.)
I have made dozens of changes scattered across the lld/ELF codebase to improve performance, e.g.
- Some changes mentioned in the release notes
- [ELF] Remove unneeded SyntheticSection memset(, 0, )
- [ELF] Optimize replaceCommonSymbols
- [ELF] Optimize --wrap to only check non-local symbols
Linking a -DCMAKE_BUILD_TYPE=Release build of clang:
% hyperfine --warmup 2 --min-runs 16 "numactl -C 20-27 "/tmp/llvm-{13,14}/out/release/bin/ld.lld" @response.txt --threads=8"
(--threads=2 => 1.17x)
Linking a -DCMAKE_BUILD_TYPE=Debug build of clang:
% hyperfine --warmup 2 --min-runs 16 "numactl -C 20-27 "/tmp/llvm-{13,14}/out/release/bin/ld.lld" @response.txt --threads=8"
(--threads=2 => 1.11x)
Linking a default build of chrome:
% hyperfine --warmup 2 --min-runs 16 "numactl -C 20-27 "/tmp/llvm-{13,14}/out/release/bin/ld.lld" @response.txt --threads=8"
Memory usage
I have made some changes that decrease sizeof(SymbolUnion) and sizeof(InputSection). With several malloc implementations, memory usage decreases by 1~2% for some programs.
ThinLTO links see a larger reduction. lld uses file-backed mmap to read input files. For ThinLTO indexing, the page buffers are nearly unused after symbol resolution. I have changed lld to call madvise(MADV_DONTNEED) on them, so that the memory they occupied can be reused for the memory allocated by the LTO library (mostly ThinLTO import and export lists) instead of adding to it: https://reviews.llvm.org/D116367. This change led to a 16% memory reduction when linking a large executable.
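The madvise(MADV_DONTNEED) idea can be illustrated with Python's mmap module (Python 3.8+ on Linux). After the mapped file has been consumed, MADV_DONTNEED releases the process's resident pages for the mapping. Because the mapping is file-backed, a later access simply refaults the data from the page cache, so correctness is unaffected. The function name and the short read standing in for symbol resolution are assumptions of this sketch, not lld code.

```python
from __future__ import annotations

import mmap
import os


def consume_then_release(path: str, nbytes: int = 16) -> tuple[bytes, bytes]:
    """Read a prefix of `path` through mmap, release the pages, read it again."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size  # must be non-zero to mmap
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            first = bytes(m[:nbytes])  # stand-in for "symbol resolution"
            if hasattr(mmap, "MADV_DONTNEED"):  # not available on every OS
                m.madvise(mmap.MADV_DONTNEED)
            # The mapping is file-backed, so this read refaults from the page cache.
            second = bytes(m[:nbytes])
    return first, second
```

Both reads return the same bytes. The benefit lies in the process's resident memory between the madvise call and any later access, which other allocations (in lld's case, the LTO library's) can then reuse.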
I have made another change to the --start-lib code path that caches the symbol interning result, which led to a 0.6% reduction: https://reviews.llvm.org/D116390.