Understanding orphan sections

GNU ld's output section layout is determined by a linker script, which can be either internal (default) or external (specified with -T or -dT). Within the linker script, SECTIONS commands define how input sections are mapped into output sections.

Input sections that are not explicitly placed into the output file by SECTIONS commands are termed "orphan sections". The linker still copies these sections into the output file, either by finding an existing output section or by creating a suitable one in which to place the orphaned input section.

GNU ld's default behavior is to create output sections to hold these orphan sections and insert these output sections into appropriate places.

Orphan section placement is crucial because GNU ld's built-in linker scripts, while understanding common sections like .text/.rodata/.data, are unaware of custom sections. These custom sections should still be included in the final output file.

Grouping: Orphan input sections are grouped into orphan output sections that share the same name.
Placement: These grouped orphan output sections are then inserted into the output sections defined in the linker script. They are placed near similar sections to minimize the number of PT_LOAD segments needed.
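
The grouping step can be illustrated with a minimal sketch. This is not GNU ld's code: the InputSection struct and the groupOrphansByName function are hypothetical names chosen for illustration, and the sketch only shows that orphans sharing a name end up in one output section.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of an input section: the object file it
// came from and its section name.
struct InputSection {
  std::string file;
  std::string name;
};

// Group orphan input sections by name. Each map entry stands for one orphan
// output section that holds every input section of that name, in input order.
// Note: std::map iterates names alphabetically, which is NOT how a linker
// orders output sections; placement is a separate step.
std::map<std::string, std::vector<InputSection>>
groupOrphansByName(const std::vector<InputSection> &orphans) {
  std::map<std::string, std::vector<InputSection>> groups;
  for (const InputSection &sec : orphans)
    groups[sec.name].push_back(sec);
  return groups;
}
```

Given mytext from a.o, mydata from b.o, and mytext from b.o, the sketch yields two groups, with the mytext group holding both mytext input sections.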

GNU ld's algorithm

GNU ld's orphan section placement algorithm is primarily implemented in ld/ldlang.c:lang_place_orphans and ld/ldelf.c:ldelf_place_orphan. lang_place_orphans is a linker pass that runs between INSERT processing and SHF_MERGE section merging.

The algorithm utilizes a structure (orphan_save) to associate desired BFD flags (e.g., SEC_ALLOC, SEC_LOAD) with specific section names (e.g., .text, .rodata) and a reference to the last associated output section.

For each output section that holds orphan sections:

GNU ld identifies the matching orphan_save element based on the section's flags.
If the orphan_save element has an associated output section, the orphan section is placed after it. The associated output section is initialized to the output section with the element's name (e.g., .text, .rodata), if present.
Otherwise, heuristics are applied to place the orphan section after a similar existing section. For example:
- .rodata-like sections follow .text-like sections.
- .tdata-like sections follow .data-like sections.
- .sdata-like sections follow .data-like sections.
- .data-like sections can follow .rodata-like sections.
The associated output section is replaced with the new output section. The next orphan output section of similar flags will be placed after the current output section.

For example, custom code section mytext (with SHF_ALLOC | SHF_EXECINSTR) would typically be placed after .text, and custom data section mydata (with SHF_ALLOC | SHF_WRITE) after .data.

static struct orphan_save hold[] =
  {
    { ".text", SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_READONLY | SEC_CODE, 0, 0, 0, 0 },
    { ".rodata", SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_READONLY | SEC_DATA, 0, 0, 0, 0 },
    { ".tdata", SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_DATA | SEC_THREAD_LOCAL, 0, 0, 0, 0 },
    { ".data", SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_DATA, 0, 0, 0, 0 },
    { ".bss", SEC_ALLOC, 0, 0, 0, 0 },
    { 0, SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_READONLY | SEC_DATA, 0, 0, 0, 0 },
    { ".interp", SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_READONLY | SEC_DATA, 0, 0, 0, 0 },
    { ".sdata", SEC_HAS_CONTENTS | SEC_ALLOC | SEC_LOAD | SEC_DATA | SEC_SMALL_DATA, 0, 0, 0, 0 },
    { ".comment", SEC_HAS_CONTENTS, 0, 0, 0, 0 },
  };

Put differently, GNU ld maps each orphan section to its output section and finds the orphan_save element with matching flags. If that element's associated output section is not null, the orphan section is placed after it. Otherwise, the orphan section is placed after a similar section using a few heuristics, e.g.

.rodata-like sections can be placed after .text-like sections.
.tdata-like sections can be placed after .data-like sections.
.sdata-like sections can be placed after .data-like sections.
.data-like sections can be placed after .rodata-like sections.

Noteworthy details:

.interp and .rodata have the same BFD flags, but they are anchors for different sections. SHT_NOTE sections go after .interp, while other read-only sections go after .rodata.
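
The anchor mechanism can be sketched with a toy model. This is a simplified assumption, not GNU ld's implementation: the Layout struct and placeAfterAnchor function are hypothetical, flag matching is reduced to passing the anchor name directly, and the fallback heuristics are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model: the ordered output sections plus, for each anchor name, the
// index of the section that the next orphan of that kind should follow.
struct Layout {
  std::vector<std::string> sections;
  std::map<std::string, std::size_t> lastForAnchor;
};

// Place `orphan` after the section currently associated with `anchor`
// (initially the anchor section itself), then make the orphan the new
// association so later orphans of the same kind follow it.
// Returns false if the anchor section is absent; this sketch does not model
// the heuristics that would apply in that case.
bool placeAfterAnchor(Layout &layout, const std::string &anchor,
                      const std::string &orphan) {
  std::size_t pos = 0;
  auto it = layout.lastForAnchor.find(anchor);
  if (it != layout.lastForAnchor.end()) {
    pos = it->second;
  } else {
    // First use: initialize the association from the named section.
    while (pos < layout.sections.size() && layout.sections[pos] != anchor)
      ++pos;
    if (pos == layout.sections.size())
      return false;
  }
  layout.sections.insert(layout.sections.begin() + pos + 1, orphan);
  // The insertion shifts every later section by one position.
  for (auto &entry : layout.lastForAnchor)
    if (entry.second > pos)
      ++entry.second;
  layout.lastForAnchor[anchor] = pos + 1;
  return true;
}
```

Starting from .text, .rodata, .data, .bss and placing mytext and mytext2 against the .text anchor, then mydata against the .data anchor, the toy layout becomes .text, mytext, mytext2, .rodata, .data, mydata, .bss.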

lld's algorithm

The LLVM linker lld implements a large subset of the GNU ld linker script. However, due to the lack of an official specification and the complexity of GNU ld, there can be subtle differences in behavior.

While lld strives to provide a similar linker script behavior, it occasionally makes informed decisions to deviate where deemed beneficial. We balance compatibility with practicality and interpretability.

Users should be aware of these potential discrepancies when transitioning from GNU ld to lld, especially when dealing with intricate linker script features.

Rank-based sorting

lld assigns a rank to each output section, calculated using various flags like RF_NOT_ALLOC, RF_EXEC, RF_RODATA, etc. Orphan output sections are then sorted by these ranks.

enum RankFlags {
  RF_NOT_ADDR_SET = 1 << 27,
  RF_NOT_ALLOC = 1 << 26,
  RF_PARTITION = 1 << 18, 
  RF_LARGE_ALT = 1 << 15,
  RF_WRITE = 1 << 14,
  RF_EXEC_WRITE = 1 << 13,
  RF_EXEC = 1 << 12,
  RF_RODATA = 1 << 11,
  RF_LARGE = 1 << 10,
  RF_NOT_RELRO = 1 << 9,
  RF_NOT_TLS = 1 << 8,
  RF_BSS = 1 << 7,
};
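
To illustrate how such flags could combine into a rank, here is a minimal sketch. It is a simplified assumption and does not reproduce lld's actual rank computation: the SectionTraits struct and the toyRank function are hypothetical, and only a handful of the flags above are modeled.

```cpp
#include <cassert>
#include <cstdint>

// A subset of the rank flags listed above, with the same bit positions.
enum ToyRankFlags : uint32_t {
  RF_NOT_ALLOC = 1u << 26,
  RF_WRITE = 1u << 14,
  RF_EXEC_WRITE = 1u << 13,
  RF_EXEC = 1u << 12,
  RF_RODATA = 1u << 11,
  RF_BSS = 1u << 7,
};

// Hypothetical summary of the section properties this sketch looks at.
struct SectionTraits {
  bool alloc;  // occupies memory at run time (SHF_ALLOC)
  bool write;  // writable (SHF_WRITE)
  bool exec;   // executable (SHF_EXECINSTR)
  bool nobits; // occupies no file space (SHT_NOBITS)
};

// Compute a toy rank: a lower rank means the section sorts earlier.
uint32_t toyRank(const SectionTraits &s) {
  if (!s.alloc)
    return RF_NOT_ALLOC; // non-allocated sections sort last
  uint32_t rank = 0;
  if (s.exec)
    rank |= s.write ? RF_EXEC_WRITE : RF_EXEC;
  else if (s.write)
    rank |= RF_WRITE;
  else
    rank |= RF_RODATA;
  if (s.nobits)
    rank |= RF_BSS; // .bss-like sections follow other writable data
  return rank;
}
```

In this toy model, read-only data ranks below code, code below writable data, writable data below .bss-like sections, and non-allocated sections rank highest.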

Finding the most similar section

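For each orphan section, lld identifies the output section with the most similar rank. Similarity is measured by counting the leading zeros in the XOR of the two ranks: the more leading zeros, the more high-order rank flags the two sections share. Output sections without input sections are never candidates (their proximity is -1).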

static int getRankProximity(OutputSection *a, SectionCommand *b) {
  auto *osd = dyn_cast<OutputDesc>(b);
  return (osd && osd->osec.hasInputSections)
             ? llvm::countl_zero(a->sortRank ^ osd->osec.sortRank)
             : -1;
}
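
A small worked example may make the proximity measure concrete. The sketch below is an illustration rather than lld code: countLeadingZeros32 and rankProximity are hypothetical helpers that operate on plain 32-bit ranks, with a hand-written loop standing in for llvm::countl_zero.

```cpp
#include <cassert>
#include <cstdint>

// Count the leading zero bits of a 32-bit value; returns 32 for zero.
int countLeadingZeros32(uint32_t v) {
  int n = 0;
  for (uint32_t mask = 1u << 31; mask != 0 && (v & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Proximity of two ranks: the number of leading bits on which they agree.
// A larger value means the ranks are more similar.
int rankProximity(uint32_t a, uint32_t b) {
  return countLeadingZeros32(a ^ b);
}
```

With the flag values above, RF_EXEC ^ RF_RODATA is 0x1800, which has 19 leading zeros, while RF_EXEC ^ RF_WRITE is 0x5000, which has 17. A section ranked RF_EXEC is therefore considered closer to one ranked RF_RODATA than to one ranked RF_WRITE. Identical ranks give the maximum proximity of 32.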

Placement decision

The orphan section is placed either before or after the most similar section, based on a complex rule involving:

The relative ranks of the orphan and similar section.
The presence of PHDRS or MEMORY commands in the linker script.
Scanning backward or forward through the script for suitable insertion points.

In essence:

If the orphan section's rank is lower than the similar section's rank, and no PHDRS/MEMORY commands exist, it's placed before the similar section.
Otherwise, it's placed after the similar section, potentially skipping symbol assignments or output sections without input sections in the process.

auto isOutputSecWithInputSections = [](SectionCommand *cmd) {
  auto *osd = dyn_cast<OutputDesc>(cmd);
  return osd && osd->osec.hasInputSections;
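};

// Here `i` points at the most similar output section found in the previous
// step, and `proximity` is its rank proximity to the orphan `sec`.
// With PHDRS or MEMORY commands, the orphan is always placed after it.
bool mustAfter = script->hasPhdrsCommands() || !script->memoryRegions.empty();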
if (cast<OutputDesc>(*i)->osec.sortRank <= sec->sortRank || mustAfter) {
  for (auto j = ++i; j != e; ++j) {
    if (!isOutputSecWithInputSections(*j))
      continue;
    if (getRankProximity(sec, *j) != proximity)
      break;
    i = j + 1;
  }
} else {
  for (; i != b; --i)
    if (isOutputSecWithInputSections(i[-1]))
      break;
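}

// If no output section with input sections remains at or after the insertion
// point, place the orphan at the very end.
if (std::find_if(i, e, isOutputSecWithInputSections) == e)
  return e;

// Skip symbol assignments that follow, so the orphan is not placed between
// an output section and its trailing "end" symbol.
while (i != e && shouldSkip(*i))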
  ++i;
return i;
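
The before/after decision can be replayed on a toy model. The sketch below is an assumption-laden simplification, not lld's function: findInsertIndex and proximity are hypothetical names, every entry of `ranks` stands for an output section that has input sections, and symbol assignments are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of leading bits on which two 32-bit ranks agree (32 if equal).
int proximity(uint32_t a, uint32_t b) {
  uint32_t x = a ^ b;
  int n = 0;
  for (uint32_t mask = 1u << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Return the index in `ranks` at which an orphan with rank `orphan` would be
// inserted. `mustAfter` models the presence of PHDRS or MEMORY commands.
std::size_t findInsertIndex(const std::vector<uint32_t> &ranks,
                            uint32_t orphan, bool mustAfter) {
  if (ranks.empty())
    return 0;
  // Step 1: find the most similar section; the first one wins ties.
  std::size_t best = 0;
  for (std::size_t i = 1; i < ranks.size(); ++i)
    if (proximity(orphan, ranks[i]) > proximity(orphan, ranks[best]))
      best = i;
  int bestProximity = proximity(orphan, ranks[best]);
  // Step 2: place after the similar section (and after any following
  // sections with the same proximity), or directly before it.
  if (ranks[best] <= orphan || mustAfter) {
    std::size_t pos = best + 1;
    while (pos < ranks.size() && proximity(orphan, ranks[pos]) == bestProximity)
      ++pos;
    return pos;
  }
  return best;
}
```

With script sections ranked {RF_EXEC, RF_WRITE}, an orphan ranked RF_RODATA is most similar to the first section but has a lower rank, so it is inserted at index 0. If mustAfter is true, it is inserted at index 1 instead.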

Special case: last section

If no output section with input sections follows the chosen insertion point, the orphan section is placed at the very end of the output, past any remaining commands. This mimics GNU ld's behavior for cases where the linker script fully specifies the beginning but not the end of the file.

Special case: skipping symbol assignments

It is common to surround an output section description with encapsulation symbols. lld has a special case to avoid placing orphans between an output section and a symbol assignment that follows it.

Backward scan example:

begin_previous = .;
previous : { *(previous) } /* Found output section with a backward scan */
end_previous = .;          /* We should place the orphan after instead of before this symbol assignment */

similar : { *(similar) }   /* The most similar section found by the first step */

Forward scan example:

similar0 : { *(similar0) }
begin_similar1 = .;
similar1 : { *(similar1) } /* The most similar section found by the first step */
end_similar1 = .;          /* We should place the orphan after instead of before this symbol assignment */

However, an assignment to the location counter can serve as a barrier to stop the forward scan.

begin_previous = .;
previous : { *(previous) } /* Found output section with a backward scan */
end_previous = .;          /* We should place the orphan after instead of before this symbol assignment */
symbol = .;                /* We conservatively assume any symbol as a probable "end" symbol. */
. = ALIGN(CONSTANT(MAXPAGESIZE)); /* Barrier */

similar : { *(similar) }   /* The most similar section found by the first step */
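
The skipping rule and the barrier can be sketched as follows. This is a minimal model under stated assumptions: the Command struct and skipTrailingAssignments function are hypothetical, and shouldSkip here is only a guess at the shape of the helper used in the lld snippet above, namely "skip symbol assignments unless they assign to the location counter".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified linker script command: either a symbol assignment
// (name is the assigned symbol, "." for the location counter) or an output
// section description (name is the section name).
struct Command {
  bool isAssignment;
  std::string name;
};

// Ordinary symbol assignments are skipped; an assignment to "." is a barrier.
bool shouldSkip(const Command &cmd) {
  return cmd.isAssignment && cmd.name != ".";
}

// Starting at index i, advance past symbol assignments that are assumed to
// belong to the preceding output section. Returns the insertion index.
std::size_t skipTrailingAssignments(const std::vector<Command> &cmds,
                                    std::size_t i) {
  while (i < cmds.size() && shouldSkip(cmds[i]))
    ++i;
  return i;
}
```

Modeling the last example as begin_previous, previous, end_previous, symbol, an assignment to ".", and similar, a scan that starts right after previous (index 2) skips end_previous and symbol and stops at index 4, so the orphan lands just before the location counter assignment.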

Analysis

By employing this rank-based approach, lld provides an elegant implementation that does not hard code specific section names (e.g., .text/.rodata/.data). In GNU ld, if you rename the special sections .text/.rodata/.data in the linker script, the output could become subtly different.

Portability

To maximize portability of linker scripts across different linkers, it's essential to establish clear boundaries for PT_LOAD segments. This can be achieved by:

Explicit alignment: Utilizing MAXPAGESIZE alignment to distinctly separate sections within the linker script.
Anchoring sections: Ensuring that the first section in each PT_LOAD segment includes at least one input section, preventing ambiguous placement decisions by the linker.

By adhering to these guidelines, you can reduce reliance on linker-specific orphan section placement algorithms, promoting consistency across GNU ld and lld.

Disabling orphan sections

For projects that require absolute control over section placement, GNU ld version 2.26 and later provides --orphan-handling=[place|warn|error|discard]. This allows you to choose how orphan sections are handled:

place (default): The linker places orphan sections according to its internal algorithm.
warn: The linker places orphan sections but also issues warnings for each instance.
error: The linker treats orphan sections as errors, preventing the linking process from completing.
discard: The linker discards orphan sections entirely.