Integrated assembler improvements in LLVM 19
2024-06-30 15:00:00 Author: maskray.me

Within the LLVM project, MC is a library responsible for handling assembly, disassembly, and object file formats. [Intro to the LLVM MC Project], which was written back in 2010, remains a good source to understand the high-level structures.

In the latest release cycle, substantial effort has been dedicated to refining MC's internal representation for improved performance and readability. These changes have decreased compile time significantly. This blog post will delve into the details, providing insights into the specific changes.

MCAssembler and MCAsmLayout

MCAssembler manages assembler state (including sections and symbols) and implements layout and object file writing after parsing. MCAsmLayout, tightly coupled with MCAssembler, was in charge of symbol and fragment offsets during MCAssembler::Finish. Many MCAssembler and MCExpr member functions had a const MCAsmLayout & parameter, contributing to slight overhead.

I have started to merge MCAsmLayout into MCAssembler and simplify fragment management.

Fragments

Fragments represent sequences of non-relaxable instructions, relaxable instructions, alignment directives, and other elements. MCDataFragment and MCRelaxableFragment, whose sizes are crucial for memory consumption, have undergone several optimizations.

The fragment management system has also been streamlined by transitioning from a doubly-linked list (llvm::iplist) to a singly-linked list, eliminating unnecessary overhead. A few prerequisite commits removed backward iterator requirements.

Furthermore, I introduced the "current fragment" concept (MCStreamer::CurFrag), allowing for faster appending of new fragments.
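The combination of a singly-linked fragment list and a "current fragment" pointer can be illustrated with a small standalone model. This is a minimal sketch under stated assumptions, not LLVM's code: Fragment, Section, Streamer, create, switchSection, addFragment, and sectionSize are simplified hypothetical stand-ins, and only the CurFrag name is taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for MC classes; not LLVM's real API.
struct Section;

struct Fragment {
  Fragment *Next = nullptr;   // singly-linked: there is no Prev pointer
  Section *Parent = nullptr;  // section that owns this fragment
  std::size_t Size = 0;       // size of the fragment contents in bytes
};

struct Section {
  Fragment *Head = nullptr;
  Fragment *Tail = nullptr;
};

struct Streamer {
  std::vector<std::unique_ptr<Fragment>> Storage; // owns every fragment
  Fragment *CurFrag = nullptr; // always the tail of the current section

  // Allocate a fragment and link it at the tail of S in O(1).
  Fragment *create(Section &S, std::size_t Size) {
    Storage.push_back(std::make_unique<Fragment>());
    Fragment *F = Storage.back().get();
    F->Parent = &S;
    F->Size = Size;
    if (S.Tail)
      S.Tail->Next = F;
    else
      S.Head = F;
    S.Tail = F;
    return F;
  }

  // Make S current; an empty section gets an initial empty fragment.
  void switchSection(Section &S) { CurFrag = S.Tail ? S.Tail : create(S, 0); }

  // Append after the current fragment without looking up the section first.
  Fragment *addFragment(std::size_t Size) {
    assert(CurFrag && "call switchSection first");
    CurFrag = create(*CurFrag->Parent, Size);
    return CurFrag;
  }
};

// Forward-only traversal is all that layout needs in this model.
std::size_t sectionSize(const Section &S) {
  std::size_t Total = 0;
  for (const Fragment *F = S.Head; F; F = F->Next)
    Total += F->Size;
  return Total;
}
```

In this model, appending never walks the list and never consults a section stack; the owning section is reachable from the current fragment itself.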

Symbols

@aengelke made two noticeable performance improvements.

In MCObjectStreamer, newly defined labels were put into a "pending label" list and initially assigned to an MCDummyFragment associated with the current section. The symbols were reassigned to a new fragment when the next instruction or directive was parsed. This pending label system, while necessary for aligned bundling, introduced complexity and potential for subtle bugs.

To streamline this, I revamped the implementation by directly adjusting offsets of existing fragments, eliminating over 100 lines of code and reducing the potential for errors.

Details: In 2014, [MC] Attach labels to existing fragments instead of using a separate fragment introduced flushPendingLabels to support the aligned bundling assembler extension for Native Client. [MC] Match labels to existing fragments even when switching sections., built on top of flushPendingLabels, added further complication. Worse, a lot of directive handling code had to call flushPendingLabels, and a missing flushPendingLabels call could lead to subtle bugs related to incorrect symbol values.

For the following code, aligned bundling requires that .Ltmp0 is defined at the addl instruction.

$ clang var.c -S -o - -fPIC -m32
...
.bundle_lock align_to_end
calll .L0$pb
.bundle_unlock
.L0$pb:
popl %eax
.Ltmp0:
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax

(MCAsmStreamer doesn't call flushPendingLabels in its handlers. This is the reason that we cannot change MCAsmStreamer::getAssemblerPtr to use an MCAssembler and change AsmParser::parseExpression.)

Sections

Section handling was also refined. MCStreamer maintains a section stack for directives such as .pushsection, .popsection, and .previous. Many functions relied on the section stack for loading the current section, which introduced overhead due to the additional indirection and nullable return values.

By leveraging the "current fragment" concept, the need for the section stack was eliminated in most cases, simplifying the codebase and improving efficiency.

I have eliminated nullable getCurrentSectionOnly uses and changed getCurrentSectionOnly to leverage the "current fragment" concept. This change also revealed an interesting quirk in NVPTX assembly related to DWARF sections.
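The idea of answering "what is the current section?" from the current fragment rather than from the section stack can be sketched as follows. This is an illustrative model only: Section, Fragment, StackEntry, and Streamer are hypothetical simplified types, each section has a single fragment, and the member functions merely mirror the directives named above; only the getCurrentSectionOnly name comes from the text.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified model; these are not LLVM's real classes.
struct Section;
struct Fragment {
  Section *Parent = nullptr;
};

struct Section {
  std::string Name;
  Fragment Tail; // one fragment per section is enough for this sketch
  explicit Section(std::string N) : Name(std::move(N)) { Tail.Parent = this; }
  Section(const Section &) = delete; // Tail.Parent points back at *this
};

// (current, previous) pair, which is what .previous needs.
struct StackEntry {
  Section *Cur = nullptr;
  Section *Prev = nullptr;
};

class Streamer {
  std::vector<StackEntry> Stack = std::vector<StackEntry>(1);
  Fragment *CurFrag = nullptr;

  void setCurrent(Section *S) { CurFrag = S ? &S->Tail : nullptr; }

public:
  void switchSection(Section &S) {
    StackEntry &Top = Stack.back();
    if (Top.Cur != &S) {
      Top.Prev = Top.Cur;
      Top.Cur = &S;
    }
    setCurrent(&S);
  }

  // .pushsection: save the current state, then switch.
  void pushSection(Section &S) {
    Stack.push_back(Stack.back());
    switchSection(S);
  }

  // .popsection: restore the saved state; fails if nothing was pushed.
  bool popSection() {
    if (Stack.size() <= 1)
      return false;
    Stack.pop_back();
    setCurrent(Stack.back().Cur);
    return true;
  }

  // .previous: swap the current and previous sections.
  bool previous() {
    StackEntry &Top = Stack.back();
    if (!Top.Prev)
      return false;
    std::swap(Top.Cur, Top.Prev);
    setCurrent(Top.Cur);
    return true;
  }

  // Hot path: no stack access, and never null once a section is selected.
  Section &getCurrentSectionOnly() const {
    assert(CurFrag && "no section selected yet");
    return *CurFrag->Parent;
  }
};
```

The stack is still needed for the directives themselves, but in this sketch the frequently called query returns a reference through one pointer dereference.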

Section symbols

Many section creation functions (MCContext::get*Section) had a const char *BeginSymName parameter to support the section symbol concept. This led to issues when we wanted to treat the section name as a symbol. In 2017, the parameter was removed for ELF, streamlining section symbol handling.

I changed the way MC handles section symbols for COFF and removed the unused parameters for WebAssembly. The work planned for XCOFF is outlined in https://github.com/llvm/llvm-project/issues/96810.

Expression evaluation

Expression evaluation in MCAssembler::layout previously employed a complex lazy evaluation algorithm, which aimed to minimize the number of fragment relaxations. It proved difficult to understand and resulted in complex recursion detection.

To address this, I removed lazy evaluation in favor of eager fragment relaxation. This simplification improved the reliability of the layout process, eliminating the need for intricate workarounds like the MCFragment::IsBeingLaidOut flag introduced earlier.
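Eager relaxation can be pictured as a fixed-point loop: assign every offset front to back, grow any jump whose target no longer fits, and repeat until nothing changes. The following is a minimal sketch of that loop, not MCAssembler::layout itself. Frag, dataFrag, jumpFrag, assignOffsets, relaxOnce, and layout are hypothetical names, and the 2-byte and 5-byte jump sizes are assumed x86-style encodings chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a fragment is either fixed-size data or a jump that
// is 2 bytes (short, 8-bit displacement) or 5 bytes (near, 32-bit).
struct Frag {
  bool IsJump = false;
  std::uint64_t Size = 0;   // current size in bytes
  std::size_t Target = 0;   // jumps only: index of the target fragment
  std::uint64_t Offset = 0; // filled in by assignOffsets
};

Frag dataFrag(std::uint64_t Size) {
  Frag F;
  F.Size = Size;
  return F;
}

Frag jumpFrag(std::size_t Target) {
  Frag F;
  F.IsJump = true;
  F.Size = 2; // start with the short form
  F.Target = Target;
  return F;
}

// Eagerly compute every offset; nothing is evaluated lazily.
void assignOffsets(std::vector<Frag> &Frags) {
  std::uint64_t Off = 0;
  for (Frag &F : Frags) {
    F.Offset = Off;
    Off += F.Size;
  }
}

// Grow each short jump whose displacement does not fit in 8 bits.
// Returns true if any fragment changed size.
bool relaxOnce(std::vector<Frag> &Frags) {
  bool Changed = false;
  for (Frag &F : Frags) {
    if (!F.IsJump || F.Size != 2)
      continue;
    assert(F.Target < Frags.size() && "jump target out of range");
    std::int64_t Disp = static_cast<std::int64_t>(Frags[F.Target].Offset) -
                        static_cast<std::int64_t>(F.Offset + F.Size);
    if (Disp < -128 || Disp > 127) {
      F.Size = 5;
      Changed = true;
    }
  }
  return Changed;
}

// Iterate to a fixed point. Sizes only grow, so the loop terminates.
// Returns the number of offset-assignment passes.
unsigned layout(std::vector<Frag> &Frags) {
  unsigned Passes = 0;
  do {
    assignOffsets(Frags);
    ++Passes;
  } while (relaxOnce(Frags));
  return Passes;
}
```

Because the last pass changes nothing, the offsets computed just before it are the final ones, and no per-fragment "being laid out" state is needed to detect recursion.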

Note: the benefit of lazy evaluation largely diminished when https://reviews.llvm.org/D76114 invalidated all sections to fix the correctness issue for the following assembly:

.section .text1,"ax"
.skip after-before,0x0
.L0:

.section .text2
before:
jmp .L0
after:

In addition, I removed an overload of isSymbolRefDifferenceFullyResolvedImpl, enabling constant folding for variable differences in Mach-O.

Target-specific features misplaced in the generic implementation

I have made efforts to relocate target-specific functionality to the respective target implementations.

Summary

LLVM 19 introduces significant enhancements to the integrated assembler, resulting in notable performance gains, reduced memory usage, and a more streamlined codebase. These optimizations pave the way for future improvements.

I compiled the preprocessed SQLite Amalgamation (from llvm-test-suite) using a Release build of clang:

build    2024-05-14  2024-06-30
-O0      0.5304      0.4942
-O0 -g   0.8818      0.8026
-O2      6.249       6.087
-O2 -g   7.931       7.659

clang -c -w sqlite3.i
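To put the table in relative terms, the reduction for each row is (before - after) / before. A small helper makes the arithmetic explicit; reductionPercent and roundToTenth are hypothetical names introduced only for this illustration, and the inputs are the figures from the table.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction in percent: going from 10 to 9 is a 10% reduction.
double reductionPercent(double Before, double After) {
  assert(Before > 0 && "baseline must be positive");
  return (Before - After) / Before * 100.0;
}

// Round to one decimal place for display.
double roundToTenth(double X) { return std::round(X * 10.0) / 10.0; }
```

Applied to the table, this gives roughly 6.8% for -O0, 9.0% for -O0 -g, 2.6% for -O2, and 3.4% for -O2 -g, so the gain is most visible at -O0, where AsmPrinter accounts for a larger share of the total.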

The AsmPrinter pass, which is coupled with the assembler, dominates the -O0 compile time. I have modified the -ftime-report mechanism to decrease the per-instruction overhead. The decrease in compile time matches the decrease in the time spent in AsmPrinter. Coupled with a recent observation that BOLT, which heavily utilizes MC, is ~8% faster, it's clear that the MC modifications have yielded substantial improvements.

Roadmap

Symbol redefinition

The commit "llvm-mc: Diagnose misuse (mix) of defined symbols and labels." added a redefinition error. The check has been refined many times since. I hope to fix this in the future.

Addressing Mach-O weakness

The Mach-O assembler lacks the robustness of its ELF counterpart. Notably, certain aspects of the Mach-O implementation, such as the conditions for constant folding in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl (different for x86-64 and AArch64), warrant revisiting.

Additionally, the Mach-O assembler has a hack to maintain compatibility with the Apple cctools assembler when the relocation addend is non-zero.

.data
a = b + 4
.long a # ARM64_RELOC_UNSIGNED(a) instead of b; This might work around the linker bug(?) when the referenced symbol is b and the addend is 4.

c = d
.long c # ARM64_RELOC_UNSIGNED(d)

y:
x = y + 4
.long x # ARM64_RELOC_UNSIGNED(x) instead of y

This leads to another workaround in MCFragment.cpp:getSymbolOffsetImpl ([MC] Recursively calculate symbol offset), which is to support the following assembly:

l_a:
l_b = l_a + 1
l_c = l_b
.long l_c
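The chain l_c = l_b and l_b = l_a + 1 above shows why the offset has to be computed recursively: a variable's offset is its base symbol's offset plus an addend, and the base may itself be a variable. A minimal sketch of such a resolver follows. It is not the code in MCFragment.cpp; Sym, SymTab, makeLabel, makeVar, resolveOffsetImpl, and resolveOffset are hypothetical names, and the cycle check is an addition made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical model: a symbol is either a label with a known offset or a
// variable defined as "Base + Addend" (for example, l_b = l_a + 1).
struct Sym {
  bool IsLabel = false;
  std::uint64_t Offset = 0; // labels only
  std::string Base;         // variables only
  std::int64_t Addend = 0;  // variables only
};

typedef std::map<std::string, Sym> SymTab;

Sym makeLabel(std::uint64_t Offset) {
  Sym S;
  S.IsLabel = true;
  S.Offset = Offset;
  return S;
}

Sym makeVar(const std::string &Base, std::int64_t Addend) {
  Sym S;
  S.Base = Base;
  S.Addend = Addend;
  return S;
}

// Resolve Name to an offset by following variable definitions.
// Returns false for undefined symbols and for cyclic definitions.
bool resolveOffsetImpl(const SymTab &Tab, const std::string &Name,
                       std::set<std::string> &Visiting, std::uint64_t &Val) {
  SymTab::const_iterator It = Tab.find(Name);
  if (It == Tab.end())
    return false; // undefined symbol
  const Sym &S = It->second;
  if (S.IsLabel) {
    Val = S.Offset;
    return true;
  }
  if (!Visiting.insert(Name).second)
    return false; // already on the resolution path: a cycle
  std::uint64_t BaseVal = 0;
  bool Ok = resolveOffsetImpl(Tab, S.Base, Visiting, BaseVal);
  Visiting.erase(Name);
  if (!Ok)
    return false;
  Val = BaseVal + static_cast<std::uint64_t>(S.Addend);
  return true;
}

bool resolveOffset(const SymTab &Tab, const std::string &Name,
                   std::uint64_t &Val) {
  std::set<std::string> Visiting;
  return resolveOffsetImpl(Tab, Name, Visiting, Val);
}
```

With l_a as a label at offset 0, resolving l_c in this model walks l_c to l_b to l_a and yields 1.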

Source: https://maskray.me/blog/2024-06-30-integrated-assembler-improvements-in-llvm-19