PI_STATIC_AND_HIDDEN in glibc rtld
2022-04-24 15:00:00, Author: maskray.me

In 2002, PI_STATIC_AND_HIDDEN was introduced into glibc rtld (runtime loader). This macro indicates whether accesses to static variables and hidden variables need dynamic relocations.

The static and hidden conditions confine the discussion to symbols that are non-preemptible at compile time (non-local STV_DEFAULT symbols may be preemptible). Only such variables are needed by rtld.

PI in the macro name is an abbreviation for "position independent", but the name is misleading. A code sequence that loads an address from the GOT is typically position-independent as well. Yet the GOT entry needs a dynamic relocation, which is exactly what the macro is meant to rule out. The actual condition is that the code sequence needs no dynamic relocation. Most modern architectures provide PC-relative instructions for this: aarch64, riscv, and x86-64. x86-32 achieves it with a detour: compute the PC, add _GLOBAL_OFFSET_TABLE_-PC, then add S-_GLOBAL_OFFSET_TABLE_.

```c
static int var;
int foo() { return ++var; }
```

```asm
# aarch64
adrp x1, .LANCHOR0
ldr w0, [x1, #:lo12:.LANCHOR0]
add w0, w0, 1
str w0, [x1, #:lo12:.LANCHOR0]
# riscv64
lla a5,.LANCHOR0
lw a0,0(a5)
addiw a0,a0,1
sw a0,0(a5)
# x86-32
call __x86.get_pc_thunk.dx
addl $_GLOBAL_OFFSET_TABLE_, %edx
movl var@GOTOFF(%edx), %eax
addl $1, %eax
movl %eax, var@GOTOFF(%edx)
# x86-64
movl var(%rip), %eax
addl $1, %eax
movl %eax, var(%rip)
```

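Older architectures tend to load the address from a GOT entry instead, and that entry must be adjusted at run time (typically by a relative relocation). See All about Global Offset Table.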

The first task of rtld is to relocate itself and bind all symbols to itself. Afterward, non-preemptible functions and data can be freely accessed.

On architectures where a GOT entry is used to access a non-preemptible variable, rtld needs to be careful not to reference such variables before relative relocations are applied. In rtld.c, _dl_start has the following code:

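```c
  if (bootstrap_map.l_addr)
    {
      /* Apply ld.so's own dynamic relocations, including the relative ones.  */
      ELF_DYNAMIC_RELOCATE (&bootstrap_map, NULL, 0, 0, 0);
    }

  __rtld_malloc_init_stubs ();

  /* Store into _rtld_local_ro.  On GOT-based architectures its address is
     loaded from a GOT entry that ELF_DYNAMIC_RELOCATE must relocate first.  */
  GLRO (dl_find_object) = &_dl_find_object;
```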

In rtld, GLRO (dl_find_object) designates the member _dl_find_object of _rtld_local_ro, a hidden global variable. The compiler may reorder the computation of its address before ELF_DYNAMIC_RELOCATE. On an architecture that loads this address from a GOT entry, the reordering makes the subsequent store to _rtld_local_ro._dl_find_object crash. The loaded address is incorrect: it is zero or the link-time address instead of the run-time address.

powerpc32

I recently cleaned up the bootstrap code a bit with "elf: Move elf_dynamic_do_Rel RTLD_BOOTSTRAP branches outside". Afterwards, GCC for powerpc32 appears to reliably reorder the load of _rtld_local_ro's address before relocation, causing ld.so to crash right away.

```sh
mkdir -p out/ppc; cd out/ppc
../../configure --prefix=/tmp/glibc/ppc --host=powerpc-linux-gnu \
  CC=powerpc-linux-gnu-gcc CXX=powerpc-linux-gnu-g++ &&
  make -j 50 && make -j 50 install &&
  'cp' -f /usr/powerpc-linux-gnu/lib/libgcc_s.so.1 /tmp/glibc/ppc/lib
```
```text
% elf/ld.so
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
[1] 373503 segmentation fault elf/ld.so
```

I was pretty sure there was a relocation bug, but it was not immediately clear which piece of code was at fault.

Void Linux ppc provides powerpc32 glibc and musl images. I downloaded one and booted it in qemu with qemu-system-ppc -machine mac99 -m 2047M -cdrom void-live-ppc-20210825.iso -net nic -net user,smb=$HOME/Dev -boot d. Thankfully, the live CD has a disk of about 1GiB, which is enough to install cifs-utils and gdb. gdb aborts immediately with a 5.13.12 kernel. Daniel Kolesa told me that 4.x may work, so I tried 4.4.261. It would be nice if somebody could fix the incompatibility between gdb and the kernel.

```sh
xbps-install -S
xbps-install cifs-utils gdb cgdb
mkdir ~/Dev
mount -t cifs -o vers=3.0 //10.0.2.4/qemu ~/Dev
cd ~/Dev/glibc/out/ppc
cgdb -ex 'directory ../../elf' -ex r elf/ld.so
```

gdb reports that the instruction stw r9,1168(r25) triggers SIGSEGV.

```text
% powerpc-linux-gnu-objdump --disassemble=_dl_start -S elf/ld.so
...
  if (bootstrap_map.l_addr)
   1f468: 40 96 01 64 bne cr5,1f5cc <_dl_start+0x3ac>
   1f46c: 83 3e ff c4 lwz r25,-60(r30) # load the address of _rtld_local_ro from GOT => 0
   1f470: 3a 61 00 10 addi r19,r1,16
  bootstrap_map.l_relocated = 1;
   1f474: a1 21 01 b0 lhz r9,432(r1)
   1f478: 61 29 20 00 ori r9,r9,8192
   1f47c: b1 21 01 b0 sth r9,432(r1)
  __rtld_malloc_init_stubs ();
   1f480: 4b ff d5 f1 bl 1ca70 <__rtld_malloc_init_stubs>
  GLRO (dl_find_object) = &_dl_find_object;
   1f484: 81 3e ff b8 lwz r9,-72(r30)
  ElfW(Addr) entry = _dl_start_final (arg, &info);
   1f488: 7e 64 9b 78 mr r4,r19
   1f48c: 7f 83 e3 78 mr r3,r28
  GLRO (dl_find_object) = &_dl_find_object;
   1f490: 91 39 04 90 stw r9,1168(r25) # access 0+1168 => SIGSEGV
  ElfW(Addr) entry = _dl_start_final (arg, &info);
   1f494: 4b ff fb 9d bl 1f030 <_dl_start_final>
```

Then I confirmed that the GOT entry corresponds to _rtld_local_ro.

```text
% readelf -Wr elf/ld.so | grep 4ffb8
0004ffb8 00000016 R_PPC_RELATIVE 4efc8
% readelf -Ws elf/ld.so | grep 4efc8
7: 0004efc8 1192 OBJECT GLOBAL DEFAULT 14 _rtld_global_ro@@GLIBC_PRIVATE
583: 0004efc8 1192 OBJECT LOCAL DEFAULT 14 _rtld_local_ro
726: 0004efc8 1192 OBJECT GLOBAL DEFAULT 14 _rtld_global_ro
```

The patch "elf: Move post-relocation code of _dl_start into _dl_start_final" should fix the bug.

Note: in the absence of a powerpc32 system, qemu-ppc-static -d in_asm elf/ld.so may provide some clue about the faulty basic block.

```text
----------------
IN: _dl_start
0x4001f484: 813effb8 lwz r9, -0x48(r30)
0x4001f488: 7e649b78 mr r4, r19
0x4001f48c: 7f83e378 mr r3, r28
0x4001f490: 91390490 stw r9, 0x490(r25)
0x4001f494: 4bfffb9d bl 0x4001f030

qemu: uncaught target signal 11 (Segmentation fault) - core dumped
[1] 383218 segmentation fault qemu-ppc-static -d in_asm elf/ld.so
```

m68k

Last week I fixed a similar bug for m68k: "m68k: Removal of ELF_DURING_STARTUP optimization broke ld.so".

ld.so has 671 R_68K_RELATIVE relocations and one R_68K_GLOB_DAT for a symbol versioned @@GLIBC_2.4. The following function applies a single relocation. It is shared by self-relocation and by relocation of other modules. The self-relocation code defines RTLD_BOOTSTRAP and needs just R_68K_RELATIVE, R_68K_GLOB_DAT, and R_68K_JMP_SLOT.

```c
static inline void __attribute__ ((unused, always_inline))
elf_machine_rela (struct link_map *map, struct r_scope_elem *scope[],
                  const Elf32_Rela *reloc, const Elf32_Sym *sym,
                  const struct r_found_version *version,
                  void *const reloc_addr_arg, int skip_ifunc)
{
  Elf32_Addr *const reloc_addr = reloc_addr_arg;
  const unsigned int r_type = ELF32_R_TYPE (reloc->r_info);

  if (__builtin_expect (r_type == R_68K_RELATIVE, 0))
    *reloc_addr = map->l_addr + reloc->r_addend;
  else
    {
      ...
      switch (r_type)
        {
        case R_68K_COPY:
          ...
        case R_68K_GLOB_DAT:
        case R_68K_JMP_SLOT:
          *reloc_addr = value;
          break;
```

However, somehow many case labels remained available for self-relocation. GCC compiles the switch statement into a jump table, which requires loading an address from the GOT. After some clean-up of the generic relocation code, GCC decided to perform loop-invariant code motion and hoisted the load of the jump table address out of the relocation loop. The hoisted load executes before the relative relocations are applied, so the jump table address is incorrect.

The foolproof approach is to add an optimization barrier (e.g. calling a non-inlinable function after relative relocations are resolved), but that is non-trivial given the code structure. So Andreas Schwab suggested a simpler approach that avoids the jump table: handle just the essential relocations.

The faulty code was well concealed, and I could not have found it without a debugger. It took me a while to set up an m68k image using the q800 machine. The memory is limited to 1000MiB and the emulation is very slow. Linux 5.19 is expected to gain support for a virtual Motorola 68000 machine; with qemu-system-m68k -M virt, things will get better.

```sh
# Extract the installer kernel and initrd from the netinst ISO.
7z x debian-11.0.0-m68k-NETINST-1.iso install/kernels/vmlinux-5.16.0-5-m68k install/cdrom/initrd.gz
mv install/kernels/vmlinux-5.16.0-5-m68k install/cdrom/initrd.gz .

# Create a disk image and run the installer.
qemu-img create -f qcow2 debian-m68k.qcow2 8G
qemu-system-m68k -M q800 -m 1000m -serial none -serial mon:stdio -net nic,model=dp83932 -net user -kernel vmlinux-5.16.0-5-m68k -initrd initrd.gz -append 'console=ttyS0 vga=off' -drive file=debian-m68k.qcow2,format=qcow2 -drive file=debian-11.0.0-m68k-NETINST-1.iso,format=raw,media=cdrom -nographic -boot d

# Attach the disk image through NBD (e.g. to copy the installed kernel and initrd out of it), then detach it.
sudo qemu-nbd -c /dev/nbd0 m68k-deb10.qcow2
sudo qemu-nbd -d /dev/nbd0 m68k-deb10.qcow2

# Boot the installed system, sharing $HOME/Dev with the guest over SMB.
qemu-system-m68k -M q800 -m 1000M -kernel vmlinux-5.16.0-6-m68k -initrd initrd.img-5.16.0-6-m68k -append 'root=/dev/sda2 console=tty' -hda debian-m68k.qcow2 -net nic,model=dp83932 -net user,smb=$HOME/Dev
```

Source: https://maskray.me/blog/2022-04-24-pi-static-and-hidden-in-glibc-rtld