A new relocation format for ELF

UNDER CONSTRUCTION still working on experiments

ELF's design emphasizes natural size and alignment guidelines for its control structures. This principle, outlined in "ELF: An Object File to Mitigate Mischievous Misoneism" (Proceedings of the Summer 1990 USENIX Conference), makes random access easy for structures like program headers, section headers, and symbols.

All data structures that the object file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of four, etc. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary within the file. Other classes would have appropriately scaled definitions. To illustrate, the 64-bit class would define Elf64_Addr as an 8-byte object, aligned on an 8-byte boundary. Following the strictest alignment for each object allows the format to work on any machine in a class. That is, all ELF structures on all 32-bit machines have congruent templates. For portability, ELF uses neither bit-fields nor floating-point values, because their representations vary, even among processors with the same byte order. Of course the programs in an ELF file may use these types, but the format itself does not.

While beneficial for many control structures, the natural size guideline does have drawbacks. Relocations, which are typically processed sequentially, don't gain the same random-access advantages. The large 24-byte Elf64_Rela structure highlights this. The Relocations section of "Exploring object file formats" compares relocations from different object file formats.

Furthermore, Elf32_Rel and Elf32_Rela sacrifice flexibility to maintain a smaller size, limiting relocation types to a maximum of 255. This constraint has become noticeable for AArch32 and RISC-V. The 24-bit symbol index field is also less elegant, but real-world use cases haven't run into a problem due to this bound.
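To make the size and field-width trade-off concrete, here is a minimal sketch of the two RELA layouts and the ELF32 r_info packing. The struct and function names (Rela32, Rela64, rel32_info, ...) are hypothetical stand-ins so that the snippet does not depend on <elf.h>; the packing itself mirrors the standard ELF32_R_SYM/ELF32_R_TYPE/ELF32_R_INFO macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local stand-ins for Elf32_Rela and Elf64_Rela. */
typedef struct {
  uint32_t r_offset;
  uint32_t r_info;   /* symbol index in the high 24 bits, type in the low 8 */
  int32_t r_addend;
} Rela32;            /* 12 bytes */

typedef struct {
  uint64_t r_offset;
  uint64_t r_info;   /* symbol index in the high 32 bits, type in the low 32 */
  int64_t r_addend;
} Rela64;            /* 24 bytes */

/* Pack a symbol index and a relocation type into an ELF32 r_info value. */
static inline uint32_t rel32_info(uint32_t sym, uint32_t type) {
  assert(sym < (1u << 24));          /* only 24 bits are available */
  return (sym << 8) | (type & 0xff); /* a type above 255 is silently truncated */
}

static inline uint32_t rel32_sym(uint32_t info) { return info >> 8; }
static inline uint32_t rel32_type(uint32_t info) { return info & 0xff; }
```

With this packing, a relocation type such as 257 cannot be represented: rel32_type(rel32_info(5, 257)) yields 1.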

In contrast, the WebAssembly object file format uses LEB128 encoding for relocations and other control structures. This approach offers a significant size advantage over ELF.
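For readers unfamiliar with LEB128, the following is a minimal sketch of the unsigned and signed encoders. The function names are my own for illustration and are not taken from any particular toolchain. Each output byte carries 7 bits of payload, and the high bit says whether more bytes follow, so small values take a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as ULEB128 into out (up to 10 bytes); returns the length. */
static size_t encode_uleb128(uint64_t value, uint8_t *out) {
  assert(out != NULL);
  size_t n = 0;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80; /* continuation bit: more bytes follow */
    out[n++] = byte;
  } while (value != 0);
  return n;
}

/* Encode value as SLEB128 into out (up to 10 bytes); returns the length. */
static size_t encode_sleb128(int64_t value, uint8_t *out) {
  assert(out != NULL);
  size_t n = 0;
  int more = 1;
  while (more) {
    uint8_t byte = value & 0x7f;
    /* Portable arithmetic shift right by 7 (floor division by 128). */
    value = value < 0 ? ~(~value >> 7) : value >> 7;
    /* Stop once the remaining bits are pure sign extension of bit 6. */
    if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
      more = 0;
    else
      byte |= 0x80;
    out[n++] = byte;
  }
  return n;
}
```

For example, 127 encodes to the single byte 0x7f, 128 needs two bytes (0x80 0x01), and -2 as SLEB128 is the single byte 0x7e.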

I will explore some real-world scenarios where relocation size plays a significant role and propose an alternative format inspired by WebAssembly.

Use cases

Dynamic relocations

A substantial part of position-independent executables (PIEs) and dynamic shared objects (DSOs) is occupied by dynamic relocations. While RELR (a compact relative relocation format) offers size-saving benefits for relative relocations, other dynamic relocations can benefit from a compact relocation format.

ld.lld --pack-dyn-relocs=android was an earlier design that applies to all dynamic relocations at the cost of complexity.

Additionally, Apple linkers and dyld use LEB128 encoding for bind opcodes.

`.llvm_addrsig`

On many Linux targets, Clang emits a special section called .llvm_addrsig (type SHT_LLVM_ADDRSIG, LLVM address-significance table) by default to allow ld.lld --icf=safe. The .llvm_addrsig section stores symbol indexes in ULEB128 format, independent of relocations. Consequently, tools like ld -r and objcopy risk invalidating the section when they modify the symbol table.

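Ideally, representing the symbol indexes with relocations would let these tools keep the section valid, since they already update relocations when the symbol table changes. However, the size concern of REL/RELA in ELF hinders this approach. In contrast, lld's Mach-O port chose a relocation-based representation for __DATA,__llvm_addrsig.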

`.llvm.call-graph-profile`

LLVM leverages a special section called .llvm.call-graph-profile (type SHT_LLVM_CALL_GRAPH_PROFILE) for both instrumentation- and sample-based profile-guided optimization (PGO). lld utilizes this information ((from_symbol, to_symbol, weight) tuples) to optimize section ordering within an input section description, enhancing cache utilization and minimizing TLB thrashing.

Similar to .llvm_addrsig, the .llvm.call-graph-profile section initially faced the symbol index invalidation problem, which was solved by switching to relocations. I opted for REL over RELA to reduce code size.

`.debug_names`

DWARF v5 accelerated name-based access with the introduction of the .debug_names section. However, in a clang -g -gpubnames generated relocatable file, the .rela.debug_names section can consume a significant portion (approximately 10%) of the file size. This size increase has sparked discussions within the LLVM community about potentially altering the file format for linking purposes.

The availability of a more compact relocation format would likely alleviate the need for such format changes.

Compressed relocations

While the standard SHF_COMPRESSED feature is commonly used for debug sections, its application can easily extend to relocation sections. I have developed a Clang/lld prototype that demonstrates this by compressing SHT_RELA sections.

The compressed SHT_RELA section occupies sizeof(Elf64_Chdr) + size(compressed) bytes. The implementation retains uncompressed content if compression would result in a larger size.
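The size rule above can be sketched as a small helper. This is only an illustration, not the prototype's code: Chdr64 is a hypothetical local copy of the Elf64_Chdr layout, stored_section_size is a made-up name, and treating the "equal size" case as "keep uncompressed" is my assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copy of the Elf64_Chdr layout (24 bytes). */
typedef struct {
  uint32_t ch_type;
  uint32_t ch_reserved;
  uint64_t ch_size;
  uint64_t ch_addralign;
} Chdr64;

/* Returns the number of bytes the section occupies in the file and sets
 * *use_compressed to 1 only when header + compressed payload is smaller
 * than the raw content. */
static size_t stored_section_size(size_t raw_size, size_t compressed_size,
                                  int *use_compressed) {
  assert(use_compressed != NULL);
  size_t with_header = sizeof(Chdr64) + compressed_size;
  *use_compressed = with_header < raw_size;
  return *use_compressed ? with_header : raw_size;
}
```

A section holding a single 24-byte Elf64_Rela entry can never win, because the header alone is already 24 bytes. A 2400-byte section whose payload compresses to 600 bytes would be stored in 624 bytes.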

In scenarios with numerous smaller relocation sections (such as when using -ffunction-sections -fdata-sections), the 24-byte Elf64_Chdr header can introduce significant overhead. This observation raises the question of whether encoding Elf64_Chdr fields using ULEB128 could further optimize file sizes. With larger monolithic sections (.text, .data, .eh_frame), compression ratio would be higher as well.


configure-llvm s2-custom0 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld'
configure-llvm s2-custom1 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-Xclang=--compress-relocations=zstd
ninja -C /tmp/out/s2-custom0 lld
ninja -C /tmp/out/s2-custom1 lld

ruby -e 'p Dir.glob("/tmp/out/s2-custom0/**/*.o").sum{|f| File.size(f)}'  
ruby -e 'p Dir.glob("/tmp/out/s2-custom1/**/*.o").sum{|f| File.size(f)}'

Despite the overhead of -ffunction-sections -fdata-sections, the compression technique yields a significant reduction of 14.5%!

RELLEB

The 1990 ELF paper ELF: An Object File to Mitigate Mischievous Misoneism says "ELF allows extension and redefinition for other control structures." Let's explore a new relocation format similar to WebAssembly's.

A SHT_RELLEB section begins with the number of relocations encoded in ULEB128. A sequence of relocation entries follows the header.

typedef struct {
  Elf32_Addr r_offset; 
  Elf32_Word r_type;   
  Elf32_Word r_symidx; 
  Elf32_Sxword r_addend; 
} Elf32_Relleb;

typedef struct {
  Elf64_Addr r_offset; 
  Elf64_Word r_type;   
  Elf64_Word r_symidx; 
  Elf64_Sxword r_addend; 
} Elf64_Relleb;

Here's the core concept:

ULEB128 and delta encoding for r_offset:

Since section offsets can be large and relocations are typically ordered, storing the difference between consecutive relocation offsets offers potential for compression. In most cases, a single byte should suffice for encoding this offset delta. There are exceptions. For example, the general dynamic TLS model of s390/s390x uses a local "out-of-order" pair: R_390_PLT32DBL(offset=o) R_390_TLS_GDCALL(offset=o-2). This is not common and can be fixed by a redesign to TLSDESC. I think it makes sense to optimize for the most common cases and use ULEB128 instead of SLEB128.

SLEB128 and delta encoding for r_type:

While many psABIs define all relocation types smaller than 128 (encodable in a single byte using ULEB128), AArch64 utilizes larger values. Its static relocation types begin at 257, necessitating two bytes with ULEB128/SLEB128 encoding. Delta encoding offers an advantage in this scenario, allowing all but the first relocation's type to be encoded in a single byte. For x86-64, switching to ULEB128 for type encoding would result in minimal size reductions (~0.0003%).

An alternative design is to define a base type code in the header and make each relocation entry's type relative to that base, which would introduce slight complexity.
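To illustrate the effect of delta encoding on r_type, here is a small sketch that only counts bytes. The helper names are hypothetical, and the first delta is assumed to be taken relative to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode v as ULEB128. */
static size_t uleb128_size(uint64_t v) {
  size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    n++;
  }
  return n;
}

/* Number of bytes needed to encode v as SLEB128.
 * One byte covers [-64, 63]; each extra byte adds 7 bits. */
static size_t sleb128_size(int64_t v) {
  size_t n = 1;
  while (v < -64 || v > 63) {
    v = v < 0 ? ~(~v >> 7) : v >> 7; /* portable arithmetic shift */
    n++;
  }
  return n;
}

/* Total bytes when every type is stored directly as ULEB128. */
static size_t plain_uleb128_bytes(const uint32_t *types, size_t n) {
  assert(n == 0 || types != NULL);
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += uleb128_size(types[i]);
  return total;
}

/* Total bytes when each type is stored as an SLEB128 delta from the
 * previous one (assumption: the first delta is relative to 0). */
static size_t delta_sleb128_bytes(const uint32_t *types, size_t n) {
  assert(n == 0 || types != NULL);
  size_t total = 0;
  int64_t prev = 0;
  for (size_t i = 0; i < n; i++) {
    total += sleb128_size((int64_t)types[i] - prev);
    prev = types[i];
  }
  return total;
}
```

For the AArch64 sequence R_AARCH64_ADR_PREL_PG_HI21 (275), R_AARCH64_ADD_ABS_LO12_NC (277), R_AARCH64_CALL26 (283), R_AARCH64_CALL26 (283), direct ULEB128 encoding takes 8 bytes while delta encoding takes 5: two bytes for the first type and one byte for each following delta.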

ULEB128 for r_symidx:

Using SLEB128 and delta encoding instead of ULEB128 for the symbol index field would increase the total size by 0.4%. While potentially beneficial for dynamic relocations with consecutive indexes, this gain might be negligible in practice. So I have decided to stick with ULEB128.
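Putting the pieces together, a decoder for such a section could look roughly like the sketch below. This is one reading of the description, not a finalized specification: I assume a ULEB128 count, then per entry a ULEB128 offset delta, an SLEB128 type delta, a ULEB128 symbol index, and an SLEB128 addend (the addend encoding is my assumption). Reloc, read_uleb128, read_sleb128 and decode_relleb are hypothetical names, and the input is trusted, so there are no bounds checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one decoded relocation. */
typedef struct {
  uint64_t r_offset;
  uint32_t r_type;
  uint32_t r_symidx;
  int64_t r_addend;
} Reloc;

static uint64_t read_uleb128(const uint8_t **p) {
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    if (shift < 64)
      result |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

static int64_t read_sleb128(const uint8_t **p) {
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte;
  do {
    byte = *(*p)++;
    if (shift < 64)
      result |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40))
    result |= ~(uint64_t)0 << shift; /* sign-extend from bit 6 of last byte */
  return (int64_t)result;
}

/* Decodes the section at data, storing at most max entries in out.
 * Returns the relocation count recorded in the header. */
static size_t decode_relleb(const uint8_t *data, Reloc *out, size_t max) {
  assert(data != NULL && (out != NULL || max == 0));
  const uint8_t *p = data;
  size_t count = (size_t)read_uleb128(&p);
  uint64_t offset = 0; /* running values for the delta-encoded fields */
  uint32_t type = 0;
  for (size_t i = 0; i < count; i++) {
    offset += read_uleb128(&p);
    type = (uint32_t)((int64_t)type + read_sleb128(&p));
    uint32_t symidx = (uint32_t)read_uleb128(&p);
    int64_t addend = read_sleb128(&p);
    if (i < max)
      out[i] = (Reloc){offset, type, symidx, addend};
  }
  return count;
}
```

Under these assumptions, the ten bytes 02 10 93 02 03 00 04 02 03 7c decode to two relocations: (offset 0x10, type 275, symbol 3, addend 0) and (offset 0x14, type 277, symbol 3, addend -4). Two Elf64_Rela entries would take 48 bytes.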

configure-llvm s2-custom2 -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_{C,CXX}_FLAGS=-mllvm=-small-relocs
# with a hack to use lld

ruby -e 'p Dir.glob("/tmp/out/s2-custom2/**/*.o").sum{|f| File.size(f)}'  # 115480056