GNU ld's output section layout is determined by a linker script, which can be either internal (default) or external (specified with -T or -dT). Within the linker script, SECTIONS commands define how input sections are mapped into output sections.
Input sections not explicitly placed by SECTIONS commands are termed "orphan sections".
Orphan sections are sections present in the input files which are not explicitly placed into the output file by the linker script. The linker will still copy these sections into the output file by either finding, or creating a suitable output section in which to place the orphaned input section.
GNU ld's default behavior is to create output sections to hold these orphan sections and insert these output sections into appropriate places.
Orphan section placement is crucial because GNU ld's built-in linker scripts, while understanding common sections like .text/.rodata/.data, are unaware of custom sections. These custom sections should still be included in the final output file.
- Grouping: Orphan input sections are grouped into orphan output sections that share the same name.
- Placement: These grouped orphan output sections are then inserted into the output sections defined in the linker script. They are placed near similar sections to minimize the number of PT_LOAD segments needed.
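The grouping step can be illustrated with a minimal sketch. This is not GNU ld's code: `InputSection`, `OrphanGroup`, and `groupOrphans` are hypothetical names introduced here, and the only behavior modeled is "orphan input sections with the same name end up in one orphan output section".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an orphan input section: the object file it comes
// from and its section name.
struct InputSection {
  std::string file;
  std::string name;
};

// Hypothetical model of an orphan output section: a name plus the input
// sections collected under it.
struct OrphanGroup {
  std::string name;
  std::vector<std::string> members;
};

// Group orphan input sections by name, keeping first-appearance order.
std::vector<OrphanGroup> groupOrphans(const std::vector<InputSection> &inputs) {
  std::vector<OrphanGroup> groups;
  for (const InputSection &in : inputs) {
    OrphanGroup *found = nullptr;
    for (OrphanGroup &g : groups) {
      if (g.name == in.name) {
        found = &g;
        break;
      }
    }
    if (!found) {
      groups.push_back({in.name, {}});
      found = &groups.back();
    }
    found->members.push_back(in.file + ":" + in.name);
  }
  return groups;
}
```

Given `mytext` from a.o and b.o and `mydata` from a.o, this yields two groups, `mytext` (two members) and `mydata` (one member).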
GNU ld's algorithm
GNU ld's orphan section placement algorithm is implemented primarily in ld/ldlang.c:lang_place_orphans and ld/ldelf.c:ldelf_place_orphan. lang_place_orphans is a linker pass that runs between INSERT processing and SHF_MERGE section merging.
The algorithm utilizes a structure (orphan_save) to associate desired BFD flags (e.g., SEC_ALLOC, SEC_LOAD) with specific section names (e.g., .text, .rodata) and a reference to the last associated output section.
For each output section that holds orphan sections:
- GNU ld identifies the matching orphan_save element based on the section's flags.
- If the orphan_save element has an associated output section, the orphan section is placed after it. The associated output section is initialized to the output section with the element's name (e.g., .text, .rodata), if present.
- Otherwise, heuristics are applied to place the orphan section after a similar existing section. For example:
  - .rodata-like sections follow .text-like sections.
  - .tdata-like sections follow .data-like sections.
  - .sdata-like sections follow .data-like sections.
  - .data-like sections can follow .rodata-like sections.
- The associated output section is then replaced with the new output section, so the next orphan output section with similar flags will be placed after the current one.
For example, custom code section mytext (with SHF_ALLOC | SHF_EXECINSTR) would typically be placed after .text, and custom data section mydata (with SHF_ALLOC | SHF_WRITE) after .data.
static struct orphan_save hold[] =
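The anchor bookkeeping can be approximated with a small model. This is a sketch under loud assumptions, not GNU ld's implementation: the flag constants are made-up stand-ins for BFD's SEC_* flags, `Anchor` is a simplified stand-in for orphan_save, flags are matched by exact equality, and the fallback heuristics are reduced to "append at the end".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flag bits; the values do not match bfd.h.
constexpr uint32_t kAlloc = 1u << 0;
constexpr uint32_t kLoad = 1u << 1;
constexpr uint32_t kReadOnly = 1u << 2;
constexpr uint32_t kCode = 1u << 3;
constexpr uint32_t kData = 1u << 4;

// Simplified stand-in for an orphan_save element: a well-known name, the
// flags it represents, and the name of the output section that the next
// matching orphan should follow (empty when there is none yet).
struct Anchor {
  std::string name;
  uint32_t flags;
  std::string last;
};

struct Section {
  std::string name;
  uint32_t flags;
};

std::vector<std::string> placeOrphans(std::vector<std::string> layout,
                                      std::vector<Anchor> anchors,
                                      const std::vector<Section> &orphans) {
  // Initialize each anchor to the script's section of the same name, if any.
  for (Anchor &a : anchors)
    if (std::find(layout.begin(), layout.end(), a.name) != layout.end())
      a.last = a.name;

  for (const Section &o : orphans) {
    Anchor *match = nullptr;
    for (Anchor &a : anchors) {
      if (a.flags == o.flags) { // assumption: exact match; real ld is subtler
        match = &a;
        break;
      }
    }
    auto pos = layout.end(); // assumption: fallback heuristics omitted
    if (match && !match->last.empty())
      pos = std::find(layout.begin(), layout.end(), match->last) + 1;
    layout.insert(pos, o.name);
    // The new section becomes the anchor for the next similar orphan.
    if (match)
      match->last = o.name;
  }
  return layout;
}

// Usage: a script with .text/.rodata/.data/.bss and three orphans.
std::vector<std::string> demoLayout() {
  const uint32_t text = kAlloc | kLoad | kReadOnly | kCode;
  const uint32_t rodata = kAlloc | kLoad | kReadOnly | kData;
  const uint32_t data = kAlloc | kLoad | kData;
  return placeOrphans(
      {".text", ".rodata", ".data", ".bss"},
      {{".text", text, ""}, {".rodata", rodata, ""}, {".data", data, ""}},
      {{"mytext", text}, {"mydata", data}, {"mytext2", text}});
}
```

In this model `demoLayout()` returns .text, mytext, mytext2, .rodata, .data, mydata, .bss: each orphan follows the previous section with the same flags, because the anchor is updated after every placement.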
For each orphan section, GNU ld maps it to the output section and finds the orphan_save element with matching flags. If the associated output section is not null, the orphan section will be placed after that output section. Otherwise, the orphan section will be placed after a similar section using a few heuristics, e.g.
- .rodata-like sections can be placed after .text-like sections.
- .tdata-like sections can be placed after .data-like sections.
- .sdata-like sections can be placed after .data-like sections.
- .data-like sections can be placed after .rodata-like sections.
Noteworthy details:
- .interp and .rodata have the same BFD flags, but they are anchors for different sections. SHT_NOTE sections go after .interp, while other read-only sections go after .rodata.
lld's algorithm
The LLVM linker lld implements a large subset of GNU ld's linker script language. However, due to the lack of an official specification and the complexity of GNU ld, there can be subtle differences in behavior.
While lld strives to provide a similar linker script behavior, it occasionally makes informed decisions to deviate where deemed beneficial. We balance compatibility with practicality and interpretability.
Users should be aware of these potential discrepancies when transitioning from GNU ld to lld, especially when dealing with intricate linker script features.
Rank-based sorting
lld assigns a rank to each output section, calculated using various flags like RF_NOT_ALLOC, RF_EXEC, RF_RODATA, etc. Orphan output sections are then sorted by these ranks.
enum RankFlags {
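As an illustration of rank-based sorting, here is a minimal sketch. The bit values below are hypothetical and chosen only to give a plausible ordering. RF_WRITE and RF_BSS are names I believe lld also uses, but treat them, the `Sec` struct, and the `getRank` logic as assumptions rather than lld's actual definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit assignments; lld's real values differ and have changed
// between releases.
enum RankFlags : uint32_t {
  RF_NOT_ALLOC = 1u << 5,
  RF_WRITE = 1u << 4,
  RF_EXEC = 1u << 3,
  RF_RODATA = 1u << 2,
  RF_BSS = 1u << 0,
};

// Hypothetical, heavily reduced view of an output section.
struct Sec {
  std::string name;
  bool alloc, write, exec, nobits;
};

// Compute a rank from section properties (simplified assumption).
uint32_t getRank(const Sec &s) {
  if (!s.alloc)
    return RF_NOT_ALLOC; // non-allocated sections sort last
  uint32_t rank = 0;
  if (s.exec)
    rank |= RF_EXEC;
  else if (s.write)
    rank |= RF_WRITE;
  else
    rank |= RF_RODATA;
  if (s.nobits)
    rank |= RF_BSS; // .bss-like sections follow .data-like ones
  return rank;
}

// Sort sections by rank; stable_sort keeps input order for equal ranks.
std::vector<std::string> sortByRank(std::vector<Sec> secs) {
  std::stable_sort(secs.begin(), secs.end(), [](const Sec &a, const Sec &b) {
    return getRank(a) < getRank(b);
  });
  std::vector<std::string> names;
  for (const Sec &s : secs)
    names.push_back(s.name);
  return names;
}
```

With these made-up values, read-only data sorts before code, code before writable data, writable data before .bss-like sections, and non-allocated sections last.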
Finding the most similar section
For each orphan section, lld identifies the output section with the most similar rank. The similarity is determined by counting the number of leading zeros in the XOR of the two ranks.
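The similarity measure is easy to sketch in isolation. The helper names below (`countLeadingZeros`, `rankProximity`, `findMostSimilar`) are hypothetical, ranks are assumed to be 32-bit, and the choice of the earliest candidate on ties is an assumption. The idea is that the more high-order bits two ranks share, the more leading zeros their XOR has.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count leading zero bits of a 32-bit value; returns 32 for zero.
int countLeadingZeros(uint32_t v) {
  int n = 0;
  for (uint32_t bit = 1u << 31; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Higher result means the two ranks agree on more high-order bits.
int rankProximity(uint32_t a, uint32_t b) {
  return countLeadingZeros(a ^ b);
}

// Return the index of the candidate rank most similar to orphanRank, or -1
// if there are no candidates. Ties go to the earliest candidate (assumption).
int findMostSimilar(uint32_t orphanRank, const std::vector<uint32_t> &ranks) {
  int best = -1;
  int bestProximity = -1;
  for (std::size_t i = 0; i < ranks.size(); ++i) {
    int p = rankProximity(orphanRank, ranks[i]);
    if (p > bestProximity) {
      bestProximity = p;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

For example, with an orphan rank of 12 and candidate ranks 4, 8, and 16, the XOR values are 8, 4, and 28, so the candidate with rank 8 has the most leading zeros and is selected.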
Placement decision
The orphan section is placed either before or after the most similar section, based on a complex rule involving:
- The relative ranks of the orphan and similar section.
- The presence of PHDRS or MEMORY commands in the linker script.
- Scanning backward or forward through the script for suitable insertion points.
In essence:
- If the orphan section's rank is lower than the similar section's rank, and no PHDRS/MEMORY commands exist, it's placed before the similar section.
- Otherwise, it's placed after the similar section, potentially skipping symbol assignments or output sections without input sections in the process.
auto isOutputSecWithInputSections = [](SectionCommand *cmd) {
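The before/after rule summarized above can be written as a tiny decision function. This is a deliberately reduced sketch, not lld's code: `Where` and `choosePlacement` are hypothetical names, and the scanning for the exact insertion point is left out.

```cpp
#include <cassert>
#include <cstdint>

enum class Where { Before, After };

// Decide on which side of the most similar section the orphan goes.
// hasPhdrsOrMemory is true when the script contains PHDRS or MEMORY commands.
Where choosePlacement(uint32_t orphanRank, uint32_t similarRank,
                      bool hasPhdrsOrMemory) {
  if (orphanRank < similarRank && !hasPhdrsOrMemory)
    return Where::Before;
  return Where::After;
}
```

A lower-ranked orphan goes before its most similar section only when the script has no PHDRS or MEMORY commands; in every other case, including equal ranks, this model places it after.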
Special case: last section
If the orphan section happens to be the last one, it's placed at the very end of the output, mimicking GNU ld's behavior for cases where the linker script fully specifies the beginning but not the end of the file.
Special case: skipping symbol assignments
It is common to surround an output section description with encapsulation symbols. lld has a special case to not place orphans between foo and a following symbol assignment.
Backward scan example:
begin_previous = .;
Forward scan example:
similar0 : { *(similar0) }
However, an assignment to the location counter can serve as a barrier to stop the forward scan.
begin_previous = .;
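The forward scan with its barrier can be modeled as follows. This is a simplified sketch of the behavior described above, not lld's implementation: the `Kind` enum and `forwardScan` are hypothetical, and script commands are reduced to four categories.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of linker script commands.
enum class Kind {
  OutputWithInput, // output section that has input sections
  OutputEmpty,     // output section without input sections
  SymbolAssign,    // e.g. end_foo = .;
  DotAssign,       // assignment to the location counter, e.g. . = ALIGN(...);
};

// Starting just after the most similar section at index `similar`, skip
// symbol assignments and empty output sections. Stop at the next output
// section with input sections or at a location counter assignment.
// Returns the index at which the orphan would be inserted.
std::size_t forwardScan(const std::vector<Kind> &cmds, std::size_t similar) {
  std::size_t i = similar + 1;
  while (i < cmds.size() &&
         (cmds[i] == Kind::SymbolAssign || cmds[i] == Kind::OutputEmpty))
    ++i;
  return i;
}
```

In this model, a trailing symbol assignment stays attached to the section it follows, while a location counter assignment stops the scan so the orphan is inserted before it.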
Analysis
By employing this rank-based approach, lld provides an elegant implementation that does not hard code specific section names (e.g., .text/.rodata/.data). In GNU ld, if you rename the special sections .text/.rodata/.data in the linker script, the output could become subtly different.
Portability
To maximize portability of linker scripts across different linkers, it's essential to establish clear boundaries for PT_LOAD segments. This can be achieved by:
- Explicit alignment: Utilizing MAXPAGESIZE alignment to distinctly separate sections within the linker script.
- Anchoring sections: Ensuring that the first section in each PT_LOAD segment includes at least one input section, preventing ambiguous placement decisions by the linker.
By adhering to these guidelines, you can reduce reliance on linker-specific orphan section placement algorithms, promoting consistency across GNU ld and lld.
Disabling orphan sections
For projects that require absolute control over section placement, GNU ld version 2.26 and later provides --orphan-handling=[place|warn|error|discard]. This allows you to choose how orphan sections are handled:
- place (default): The linker places orphan sections according to its internal algorithm.
- warn: The linker places orphan sections but also issues warnings for each instance.
- error: The linker treats orphan sections as errors, preventing the linking process from completing.
- discard: The linker discards orphan sections entirely.