This article describes an interesting overflow bug in the ELF hash function.
The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an output executable or shared
object needing a dynamic symbol table (.dynsym
), a linker
generates a .hash
section with type SHT_HASH
to hold a symbol
hash table. A DT_HASH
tag is produced to hold the
address of .hash
.
DT_HASH
is used by a dynamic loader to perform symbol
lookup (for dynamic relocations and dlsym
family
functions). A detailed description of the format can be found in ELF: symbol lookup via
DT_HASH
.
Other use cases
In a Solaris Version Definition Section, vd_hash
holds a
value generated using the ELF hash function. The GNU symbol versioning
scheme inherits this field from Solaris.
Overflow bug
The generic ABI gives the following code fragment in "Figure 5-13: Hashing Function".
1 | unsigned long |
The function is supposed to return a value no larger than 0x0fffffff.
Unfortunately, there is a bug. When unsigned long
consists
of more than 32 bits, the return value may be larger than
UINT32_MAX
. For instance,
elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12")
returns 0x100000002, which is clearly unintended, as the function should
behave the same way regardless of whether long
represents a
32-bit integer or a 64-bit integer.
It is possible to use 7-bit ASCII characters to trigger the issue.
For instance,
elf_hash((const unsigned char *)"iiiiii\ni")) == 100000009
.
Most ELF operating systems have switched from DT_HASH
to
DT_GNU_HASH
for many years.
Project survey
glibc fixed the overflow issue while optimizing the function in April 1995. The two XOR operations were optimized to one in Dec 2011.
binutils-gdb fixed the overflow issue in May 2003.
musl has had the elegant and efficient implementation since 2011
(initial check-in). It is worth noting that uint_fast32_t
is used, so that an architecture can optimize the implementation if the
architecture has slow 32-bit integer arithmetic operations.
1 | static uint32_t sysv_hash(const char *s0) |
Nathan Sidwell raised the
issue for llvm-project and pointed out a bug about using
char
instead of unsigned char
on
2023-04-09.
I asked What if the result of elf_hash is larger than UINT32_MAX? on 2023-04-11.