Note: The article will likely get frequent updates in the next few days.
This article describes some approaches to distribute debug information. Commands below will use two simple C files for demonstration.
cat > a.c <<eof
This is the simplest model. Debug information resides in the executable or shared object.
gcc -c -g a.c b.c
The linker collects input debug sections, resolves relocations, performs minimal merging (SHF_STRINGS merging for .debug_str and .debug_line_str), and combines them into output debug sections.
0 0 a7 1 .debug_abbrev
Separate debug files
Debug information is large and not needed by many people. As a general size optimization, many distributions don't provide debug information in main software packages. For debugging needs, distributions may provide debug information in separate packages, leveraging a debugger feature that debug information may reside in a separate file. See https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html.
objcopy from binutils-gdb can create a separate debug file (since 2003).
objcopy --only-keep-debug a a.debug
eu-strip from elfutils can generate two output files in one invocation.
eu-strip -f a.debug a -o a.stripped
The elfutils way is convenient for simple use cases and is adopted by rpm, but it is often ambiguous whether an option applies to one output file or to both. The --only-keep-debug way is orthogonal and integrates well with other features (e.g. --compress-debug-sections, --remove-section). I favor --only-keep-debug and implemented it in llvm-objcopy.
When debugging a.stripped in gdb, use add-symbol-file -o xxx a.debug to load the separate debug file.
Solaris 11 Update 1 introduced a similar feature called Ancillary Object. See Ancillary Objects: Separate Debug ELF Files For Solaris. They made a nice choice: ld -z ancillary creates two output files, saving the extra objcopy command needed in the GNU linking model.
.gnu_debuglink
objcopy --add-gnu-debuglink=a.debug a.stripped adds a non-SHF_ALLOC section .gnu_debuglink to a.stripped. The section contains a filename (no directory information) and a four-byte CRC checksum.
% objdump -g a.stripped
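To make the checksum concrete, here is a minimal sketch that computes a CRC-32 of a file from the shell. It assumes the .gnu_debuglink checksum is the standard CRC-32 (the same one gzip stores in its trailer), which is my reading of the reference function in the gdb manual. crc32_hex is a hypothetical helper name, not a binutils command.

```shell
# Hypothetical helper: print the CRC-32 of a file as 8 hex digits.
# A gzip stream ends with an 8-byte trailer: the CRC-32 of the
# uncompressed data (little-endian) followed by the uncompressed size.
crc32_hex() {
  gzip -c "$1" | tail -c 8 | head -c 4 | od -An -tx1 |
    awk '{ print $4 $3 $2 $1 }'   # reverse the four bytes
}

# Self-check against the well-known CRC-32 check value for "123456789".
tmp=$(mktemp)
printf '123456789' > "$tmp"
crc_check=$(crc32_hex "$tmp")
rm -f "$tmp"
echo "$crc_check"   # prints cbf43926
```

Under that assumption, crc32_hex a.debug should agree with the four checksum bytes at the end of .gnu_debuglink once the target's byte order is taken into account.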
gdb has supported .gnu_debuglink since 2003. It looks for the debug file in the same directory as the executable and in a .debug/ subdirectory of that directory.
% gdb -ex q a.stripped
Directories specified by debug-file-directory are searched as well. This option needs to be set before loading the inferior.
% pwd
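The search order for a .gnu_debuglink name can be sketched as a list of candidate paths. This follows the order documented in the gdb manual (executable directory, its .debug/ subdirectory, then a global debug directory with the executable's directory appended). debuglink_candidates is a hypothetical helper, and the /usr/bin/ls paths are only an illustration.

```shell
# Hypothetical helper: list the paths tried for a .gnu_debuglink name.
#   $1: absolute directory of the executable
#   $2: file name stored in .gnu_debuglink
#   $3: one global debug directory (gdb's debug-file-directory)
debuglink_candidates() {
  printf '%s\n' "$1/$2" "$1/.debug/$2" "$3$1/$2"
}

debuglink_candidates /usr/bin ls.debug /usr/lib/debug
# prints:
#   /usr/bin/ls.debug
#   /usr/bin/.debug/ls.debug
#   /usr/lib/debug/usr/bin/ls.debug
```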
A debug file can also be found by build ID. A build ID resides in an ELF note section. Many Linux distributions configure GCC with --enable-linker-build-id to generate a build ID by default. See --build-id for the option.
% readelf -Wn a.stripped
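As a rough illustration of the build ID lookup, gdb splits the hex build ID after its first two digits and looks under .build-id/ in each debug directory. The helper name buildid_debug_path and the sample build ID below are made up for this sketch.

```shell
# Hypothetical helper: map a build ID to its debug file path.
#   $1: global debug directory, $2: build ID as a hex string
buildid_debug_path() {
  rest=${2#??}          # everything after the first two hex digits
  first=${2%"$rest"}    # the first two hex digits
  printf '%s/.build-id/%s/%s.debug\n' "$1" "$first" "$rest"
}

buildid_debug_path /usr/lib/debug abcdef1234
# prints /usr/lib/debug/.build-id/ab/cdef1234.debug
```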
lldb uses target.debug-file-search-paths to locate a separate debug file. (TODO)
On Debian, if we install hello-dbgsym, a debug file will be available in /usr/lib/debug.
% gdb -batch -ex 'show debug-file-directory'
Debian packaging uses dh_strip to strip binaries. dh_strip runs objcopy --only-keep-debug --compress-debug-sections to create debug files with compressed debug sections.
MiniDebugInfo
See https://sourceware.org/gdb/onlinedocs/gdb/MiniDebugInfo.html (implemented in 2012). When a binary contains .gnu_debugdata, gdb decompresses it with xz and loads it.
objcopy --only-keep-debug a a.debug
% gdb -ex q a.stripped
Fedora uses the feature to improve symbolization of stack traces in the absence of debug information. A Fedora MiniDebugInfo file mostly just provides note sections and a .symtab with symbols not in .dynsym. Non-SHF_ALLOC SHT_PROGBITS/SHT_NOTE/SHT_NOBITS sections (e.g. .comment) are removed. Note sections are duplicated in the original binary and the MiniDebugInfo file due to a small missing optimization in eu-strip -f.
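The symbol selection can be sketched with sorted symbol lists: keep what is in .symtab but not in .dynsym. The file names and symbol names below are invented for illustration; on a real binary the two lists would come from nm and nm -D, as in the gdb manual's MiniDebugInfo recipe.

```shell
work=$(mktemp -d)
# Stand-ins for `nm -D --defined-only` (dynamic symbols) and
# `nm --defined-only` (all symbols); the symbol names are invented.
printf '%s\n' main puts_wrapper | sort > "$work/dynsyms"
printf '%s\n' helper main puts_wrapper static_init | sort > "$work/allsyms"

# comm -13 prints lines unique to the second file: symbols that only
# .symtab knows about, i.e. the ones worth keeping in the mini file.
comm -13 "$work/dynsyms" "$work/allsyms" > "$work/keep_symbols"
keep=$(cat "$work/keep_symbols")
rm -rf "$work"
printf '%s\n' "$keep"
# prints:
#   helper
#   static_init
```

A list like this can then be passed to objcopy --keep-symbols when building the file that goes into .gnu_debugdata.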
See Support MiniDebugInfo in rpm for the original implementation. The new implementation is in scripts/find-debuginfo.in in the debugedit repository. Support for mini-debuginfo in LLDB introduces the lldb implementation.
Here is a simplified demonstration of what scripts/find-debuginfo.in does:
eu-strip --remove-comment -f a.mini a -o a.stripped
DWARF supplementary object files
A DWARF supplementary object file contains debug sections which can be referenced by multiple executable and shared object files. dwz provides -m file to extract duplicate debug information into a supplementary object file and rewrite the input files to reference it. The supplementary object file may contain debugging information entries, strings, and macro descriptions.
print '#include <stdio.h>\nint main() { puts("hello"); }' | gcc -g -gdwarf-5 -xc - -o 1
The file 3 is a supplementary object file with a few .debug_* sections. Its .debug_sup uses is_supplementary=1.
The file 1 gets a new section .debug_sup with is_supplementary=0. It may use the forms DW_FORM_ref_sup4 (DW_FORM_GNU_ref_alt) and DW_FORM_strp_sup (DW_FORM_GNU_strp_alt) and the macro opcodes DW_MACRO_define_sup, DW_MACRO_undef_sup, and DW_MACRO_import_sup to reference the supplementary object file. In this simple example only DW_FORM_ref_sup4 and DW_FORM_strp_sup are used.
Before standardization in DWARF v5, the special section name .gnu_debugaltlink was used.
Split DWARF object files
This was originally proposed as GCC debug fission. Later, in DWARF version 5, this feature was standardized as split DWARF object files, commonly abbreviated as "split DWARF". The idea is to move the bulk of .debug_* sections into a separate file (.dwo) and leave just a small amount in the relocatable object file (.o). .dwo files are not handled by the linker: this reduces the input section combining work and relocation work for the linker, leading to shorter link time and lower memory usage. The smaller input has advantages for a distributed build farm.
% clang -c -g -gsplit-dwarf a.c b.c
(Note: don't perform compiling and linking in one action: the DWO files' location may be surprising.)
-ggnu-pubnames, implied by -gsplit-dwarf, generates .debug_gnu_pubnames and .debug_gnu_pubtypes sections, which are used by a linker (gold, ld.lld, mold) to build .gdb_index.
When gdb loads a symbol file, it constructs an internal symbol table. .gdb_index can improve startup time. With split DWARF, the .dwo files referenced by the executable or shared object will be fully parsed on demand.
Clang supports -gsplit-dwarf=single to embed .dwo sections in the relocatable object file. The compilation will not produce a .dwo file. This mode is convenient for single-machine linking.
lldb uses target.debug-file-search-paths for .dwo searching but does not use a directory structure. (TODO)
Distributing a number of .dwo files can be inconvenient. A DWARF package file (typically given the extension .dwp) can be used in place of .dwo files. (Note: currently DWP has incomplete DWARF64 support.) dwp from binutils-gdb/gold can build a DWARF package file. llvm-dwp is another implementation in llvm-project, but it has some memory usage scaling problems.
dwp a.dwo b.dwo -o a.dwp
It is a TODO to integrate DWARF compression (e.g. dwz) into dwp.
gold has been under-maintained in recent years, and dwp does not support DWARF v5 yet.
debuginfod
While gdb can hint that debug information is missing (e.g. Missing separate debuginfos, use: dnf debuginfo-install xxx), many consider the manual step of installing the relevant debug information package inconvenient. In 2019, elfutils introduced a new program, debuginfod. Some distributions host debug information on public servers. See https://www.redhat.com/en/blog/how-debuginfod-project-evolved-2021.
Here is an example demonstrating what debuginfod does:
objcopy --only-keep-debug a a.debug
debuginfod listens on port 8002 by default. -F causes it to scan archives in the specified directory. -Z .tar.zst=zstdcat tells it to use zstdcat to handle a .tar.zst file (e.g. an Arch Linux package).
For every archive member, debuginfod classifies the file as a regular executable/shared object (with at least one SHF_ALLOC SHT_PROGBITS section) or a debug file (with a .debug_* section). By default debuginfod parses a .debug_line
debuginfod-find is a client.
buildid=$(readelf -n a | awk '/Build ID:/ {print $3}')
We can query the server manually:
curl -s localhost:8002/buildid/$buildid/debuginfo -o output && cmp a.debug output
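The request URL generalizes in a simple way. Here is a minimal sketch that builds the URL for every server listed in DEBUGINFOD_URLS, the space-separated environment variable read by debuginfod clients. debuginfod_urls is a hypothetical helper, debuginfod.example.org is a placeholder host, and the unquoted expansion relies on sh/bash word splitting (it would need adjusting for zsh).

```shell
# Hypothetical helper: print the URL to try on each configured server.
#   $1: build ID (hex), $2: artifact kind (debuginfo or executable)
debuginfod_urls() {
  # DEBUGINFOD_URLS is space-separated; fall back to a local server.
  for base in ${DEBUGINFOD_URLS:-http://localhost:8002}; do
    printf '%s/buildid/%s/%s\n' "${base%/}" "$1" "$2"   # drop one trailing slash
  done
}

DEBUGINFOD_URLS='http://localhost:8002 https://debuginfod.example.org/'
debuginfod_urls abcdef1234 debuginfo
# prints:
#   http://localhost:8002/buildid/abcdef1234/debuginfo
#   https://debuginfod.example.org/buildid/abcdef1234/debuginfo
```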
With set debuginfod enabled on, gdb can query a debuginfod server if a symbol file is not found. As of today, a few programs support debuginfod, e.g. valgrind.
Main change for Arch Linux: debuginfod: Implement role
In llvm-project, llvm-debuginfod is an alternative implementation. It requires LLVM_ENABLE_HTTPLIB=on.
Microsoft SymbolStore
This is similar to debuginfod. See https://github.com/dotnet/symstore/blob/main/docs/specs/SSQP_Key_Conventions.md.
Apple's "lazy" DWARF scheme
See http://wiki.dwarfstd.org/index.php?title=Apple%27s_%22Lazy%22_DWARF_Scheme.
Apple platforms use a model different from Linux distributions. Apple ld64 does not combine input __debug_* sections into output sections. Instead, unless -S is specified, debug map entries are emitted into the symbol table. Among the entries, N_SO and N_OSO record source files and relocatable object files, respectively. N_FUN gives the value/size of a function. N_GSYM/N_STSYM describe a global/static variable symbol.
When debugging a program with lldb, lldb parses DWARF from dSYM bundles and (for N_OSO entries) relocatable object files.
clang -c -g a.c b.c
dsymutil can create a dSYM bundle from relocatable object files. It takes an executable name or, if -y is specified, a YAML debug map. In this example, dsymutil finds the two relocatable object files a.o and b.o, then combines and optimizes their DWARF information into a dSYM file. In addition, dsymutil builds an accelerator table (.apple_* or .debug_names).
Windows PDB
TODO