Within the LLVM project, MC is a library responsible for handling assembly, disassembly, and object file formats. The article "Intro to the LLVM MC Project", written back in 2010, remains a good resource for understanding the high-level structure.
In the latest release cycle, substantial effort has been dedicated to refining MC's internal representation for improved performance and readability. These changes have noticeably decreased compile time. This blog post delves into the specific changes.
MCAssembler and MCAsmLayout
MCAssembler manages assembler states (including sections and symbols) and implements layout and object file writing after parsing. MCAsmLayout, tightly coupled with MCAssembler, was in charge of symbol and fragment offsets during MCAssembler::Finish. Many MCAssembler and MCExpr member functions have a const MCAsmLayout & parameter, contributing to slight overhead.
I have started to merge MCAsmLayout into MCAssembler and to simplify fragment management.
Fragments
Fragments represent sequences of non-relaxable instructions, relaxable instructions, alignment directives, and other elements.
MCDataFragment and MCRelaxableFragment, whose sizes are crucial for memory consumption, have undergone several optimizations:
- MCInst: decrease inline element count to 6
- [MC] Reduce size of MCDataFragment by 8 bytes by @aengelke
- [MC] Move MCFragment::Atom to MCSectionMachO::Atoms
The fragment management system has also been streamlined by transitioning from a doubly-linked list (llvm::iplist) to a singly-linked list, eliminating the backward pointer and the overhead of maintaining it. A few prerequisite commits removed the requirements for backward iteration.
Furthermore, I introduced the "current fragment" concept (MCStreamer::CurFrag), allowing for faster appending of new fragments.
Symbols
@aengelke made two noticeable performance improvements:
In MCObjectStreamer, newly defined labels were put into a "pending label" list and initially assigned to an MCDummyFragment associated with the current section. The symbols would be reassigned to a new fragment when the next instruction or directive was parsed. This pending label system, while necessary for aligned bundling, introduced complexity and potential for subtle bugs.
To streamline this, I revamped the implementation by directly adjusting offsets of existing fragments, eliminating over 100 lines of code and reducing the potential for errors.
Details: In 2014, "[MC] Attach labels to existing fragments instead of using a separate fragment" introduced flushPendingLabels for the aligned bundling assembler extension for Native Client. "[MC] Match labels to existing fragments even when switching sections.", built on top of flushPendingLabels, added further complication. Worse, a lot of directive handling code had to call flushPendingLabels, and a missing flushPendingLabels call could lead to subtle bugs related to incorrect symbol values.
For the following code, aligned bundling requires that .Ltmp is defined at addl.
```
$ clang var.c -S -o - -fPIC -m32
```
(MCAsmStreamer doesn't call flushPendingLabels in its handlers. This is why we cannot change MCAsmStreamer::getAssemblerPtr to use an MCAssembler and change AsmParser::parseExpression.)
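The idea of attaching a label to the existing current fragment at definition time can be sketched as follows. This is a hedged C++ model, not the MCObjectStreamer implementation: it assumes a single section, ignores aligned bundling entirely, and uses hypothetical names (Frag, Streamer, emitLabel, symbolValue). A label records the current fragment and the number of bytes already in it, so its value is the fragment's laid-out offset plus that in-fragment offset, and there is no pending list to flush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical fragment: only its size and its offset after layout matter here.
struct Frag {
  std::size_t Size = 0;     // bytes emitted into this fragment so far
  std::uint64_t Offset = 0; // offset within the section, assigned by layout()
};

// Where a label lives: a fragment plus an offset inside that fragment.
struct LabelLoc {
  Frag *F;
  std::size_t OffsetInFrag;
};

class Streamer {
public:
  Streamer() { newFragment(); }

  void newFragment() {
    Frags.push_back(std::make_unique<Frag>());
    Cur = Frags.back().get();
  }
  void emitBytes(std::size_t N) { Cur->Size += N; }

  // No "pending label" state: bind the label to the current fragment now.
  void emitLabel(const std::string &Name) {
    Labels[Name] = LabelLoc{Cur, Cur->Size};
  }

  // Assign section offsets to fragments in order.
  void layout() {
    std::uint64_t Off = 0;
    for (auto &F : Frags) {
      F->Offset = Off;
      Off += F->Size;
    }
  }

  // Symbol value = fragment offset + offset inside the fragment.
  std::uint64_t symbolValue(const std::string &Name) const {
    const LabelLoc &L = Labels.at(Name);
    return L.F->Offset + L.OffsetInFrag;
  }

private:
  std::vector<std::unique_ptr<Frag>> Frags;
  Frag *Cur = nullptr;
  std::map<std::string, LabelLoc> Labels;
};
```

A label defined at the end of one fragment and a label defined at the start of the next both resolve to the same value, which is exactly the case the pending-label machinery used to handle by deferring the decision.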
Sections
Section handling was also refined. MCStreamer maintains a section stack for features like the .push_section/.pop_section/.previous directives. Many functions relied on the section stack for loading the current section, which introduced overhead due to the additional indirection and nullable return values.
By leveraging the "current fragment" concept, I eliminated the need for the section stack in most cases, simplifying the codebase and improving efficiency.
I have eliminated nullable getCurrentSectionOnly uses and changed getCurrentSectionOnly to leverage the "current fragment" concept. This change also revealed an interesting quirk in NVPTX assembly related to DWARF sections.
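The directive semantics that the section stack supports can be sketched as a stack of (current, previous) pairs. This is a simplified C++ model under stated assumptions: sections are plain strings, subsections are ignored, and the class and method names are hypothetical (loosely inspired by MCStreamer, not copied from it). With the change described above, hot paths that only need the current section can read it from the current fragment instead of going through this stack.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each stack entry is (current section, previous section).
class SectionState {
public:
  explicit SectionState(std::string Initial) {
    Stack.push_back({std::move(Initial), ""});
  }

  // .section NAME: the old current section becomes the "previous" one.
  void switchSection(const std::string &Name) {
    auto &Top = Stack.back();
    Top.second = Top.first;
    Top.first = Name;
  }

  // .push_section NAME: save the whole (current, previous) state, then switch.
  void pushSection(const std::string &Name) {
    auto Saved = Stack.back(); // copy before push_back may reallocate
    Stack.push_back(Saved);
    switchSection(Name);
  }

  // .pop_section: restore the saved state; fails if nothing was pushed.
  bool popSection() {
    if (Stack.size() <= 1)
      return false;
    Stack.pop_back();
    return true;
  }

  // .previous: swap the current and previous sections.
  bool previous() {
    auto &Top = Stack.back();
    if (Top.second.empty())
      return false;
    std::swap(Top.first, Top.second);
    return true;
  }

  const std::string &current() const { return Stack.back().first; }

private:
  std::vector<std::pair<std::string, std::string>> Stack;
};
```

For example, after switching from .text to .data, .previous returns to .text; a .push_section/.pop_section pair around .rodata leaves the (current, previous) pair exactly as it was.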
Section symbols
Many section creation functions (MCContext::get*Section) had a const char *BeginSymName parameter to support the section symbol concept. This led to issues when we wanted to treat the section name as a symbol. In 2017, the parameter was removed for ELF, streamlining section symbol handling.
I changed the way MC handles section symbols for COFF and removed the unused parameters for WebAssembly. The work planned for XCOFF is outlined in https://github.com/llvm/llvm-project/issues/96810.
Expression evaluation
Expression evaluation in MCAssembler::layout previously employed a complex lazy evaluation algorithm, which aimed to minimize the number of fragment relaxations. It proved difficult to understand and required complex recursion detection.
To address this, I removed lazy evaluation in favor of eager fragment relaxation. This simplification improved the reliability of the layout process, eliminating the need for intricate workarounds like the MCFragment::IsBeingLaidOut flag introduced earlier.
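Eager relaxation amounts to a fixed-point loop: lay out all fragments with their current sizes, relax every fragment whose encoding no longer fits, and repeat until nothing changes. The C++ sketch below assumes a toy model with only data fragments and x86-style branches that grow from 2 bytes (8-bit displacement) to 5 bytes (32-bit displacement). The names are hypothetical, and real MC layout also deals with alignment, .org, and other fragment kinds through target hooks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fragment: either fixed-size data or a relaxable branch.
struct Frag {
  bool IsBranch = false;
  std::uint64_t Size = 0;   // data size, or current encoding size of a branch
  std::size_t Target = 0;   // for branches: index of the target fragment
  std::uint64_t Offset = 0; // assigned by each layout pass
};

// Lay out, eagerly relax out-of-range branches, and repeat until stable.
// Sizes only grow (2 -> 5), so the loop terminates.
// Returns the number of layout iterations performed.
unsigned relaxToFixedPoint(std::vector<Frag> &Frags) {
  unsigned Iterations = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++Iterations;
    // Pass 1: compute offsets from the current sizes.
    std::uint64_t Off = 0;
    for (Frag &F : Frags) {
      F.Offset = Off;
      Off += F.Size;
    }
    // Pass 2: relax every short branch whose displacement no longer fits.
    for (Frag &F : Frags) {
      if (!F.IsBranch || F.Size == 5)
        continue;
      std::int64_t Disp = static_cast<std::int64_t>(Frags[F.Target].Offset) -
                          static_cast<std::int64_t>(F.Offset + F.Size);
      if (Disp < -128 || Disp > 127) {
        F.Size = 5; // short form -> long form
        Changed = true;
      }
    }
  }
  return Iterations;
}
```

Relaxing one branch can push another branch's target out of range; in this model that is simply picked up on the next iteration, with no lazy evaluation or recursion detection.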
Note: the benefit of lazy evaluation largely diminished when https://reviews.llvm.org/D76114 invalidated all sections to fix the correctness issue for the following assembly:
```
.section .text1,"ax"
```
In addition, I removed an overload of isSymbolRefDifferenceFullyResolvedImpl, enabling constant folding for variable differences in Mach-O.
Target-specific features misplaced in the generic implementation
I have made efforts to relocate target-specific functionalities to their respective target implementations:
- [MC,X86] emitInstruction: remove virtual function calls due to Intel JCC Erratum
- [MC,X86] De-virtualize emitPrefix
- [MC] Move Mach-O specific getAtom and isSectionAtomizableBySymbols to Mach-O files
- [MC] Move ELFWriter::createMemtagRelocs to AArch64TargetELFStreamer::finish
Summary
LLVM 19 introduces significant enhancements to the integrated assembler, resulting in notable performance gains, reduced memory usage, and a more streamlined codebase. These optimizations pave the way for future improvements.
I compiled the preprocessed SQLite Amalgamation (from llvm-test-suite) with clang -c -w sqlite3.i plus the options in the first column, using Release builds of clang from 2024-05-14 and 2024-06-30:

| options | 2024-05-14 | 2024-06-30 |
|---|---|---|
| -O0 | 0.5304 | 0.4942 |
| -O0 -g | 0.8818 | 0.8026 |
| -O2 | 6.249 | 6.087 |
| -O2 -g | 7.931 | 7.659 |
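To put the SQLite numbers in relative terms, the percentage decrease for each row is (before - after) / before × 100. A small C++ helper that computes it from the table values:

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One row of the SQLite compile-time table.
struct Row {
  std::string Options;
  double Before; // 2024-05-14 build
  double After;  // 2024-06-30 build
};

// Relative decrease from Before to After, in percent.
double percentDecrease(const Row &R) {
  return (R.Before - R.After) / R.Before * 100.0;
}

const std::vector<Row> Rows = {
    {"-O0", 0.5304, 0.4942},
    {"-O0 -g", 0.8818, 0.8026},
    {"-O2", 6.249, 6.087},
    {"-O2 -g", 7.931, 7.659},
};
```

This gives about 6.8% (-O0), 9.0% (-O0 -g), 2.6% (-O2), and 3.4% (-O2 -g): the relative gain is largest at -O0, where AsmPrinter and the assembler make up a larger share of compile time.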
The AsmPrinter pass, which is coupled with the assembler, dominates the -O0 compile time. I have modified the -ftime-report mechanism to decrease the per-instruction overhead. The decrease in compile time matches the decrease in time spent in AsmPrinter. Together with a recent observation that BOLT, which heavily utilizes MC, is ~8% faster, this indicates that the MC modifications have yielded substantial improvements.
Roadmap
Symbol redefinition
The commit "llvm-mc: Diagnose misuse (mix) of defined symbols and labels." added the redefinition error. This check has been refined many times since, and I hope to fix this in the future.
Addressing Mach-O weakness
The Mach-O assembler lacks the robustness of its ELF counterpart. Notably, certain aspects of the Mach-O implementation, such as the conditions for constant folding in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl (different for x86-64 and AArch64), warrant revisiting.
Additionally, the Mach-O implementation has a hack to maintain compatibility with the Apple cctools assembler when the relocation addend is non-zero.
```
.data
```
This leads to another workaround in MCFragment.cpp:getSymbolOffsetImpl ([MC] Recursively calculate symbol offset), which exists to support the following assembly:
```
l_a:
```
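What "recursively calculating" a symbol offset means can be sketched in a hypothetical, simplified form: a variable symbol defined as another symbol plus a constant (say, a hypothetical l_b = l_a + 1) gets its offset by following the chain down to a label. The Sym type and the symbolOffset functions are illustrative assumptions, not LLVM's getSymbolOffsetImpl, and the cycle check is an addition of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical symbol: either a label with a laid-out offset, or a variable
// defined as Base + Addend (e.g. l_b = l_a + 1).
struct Sym {
  bool IsVariable = false;
  std::uint64_t LabelOffset = 0; // for labels
  std::string Base;              // for variables: the referenced symbol
  std::int64_t Addend = 0;       // for variables: constant added to Base
};

// Follow variable definitions down to a label. Returns nullopt for an
// undefined symbol or a cyclic definition.
std::optional<std::int64_t>
symbolOffset(const std::map<std::string, Sym> &Syms, const std::string &Name,
             std::set<std::string> &Visiting) {
  auto It = Syms.find(Name);
  if (It == Syms.end() || !Visiting.insert(Name).second)
    return std::nullopt;
  const Sym &S = It->second;
  std::optional<std::int64_t> Result;
  if (!S.IsVariable)
    Result = static_cast<std::int64_t>(S.LabelOffset);
  else if (auto BaseOff = symbolOffset(Syms, S.Base, Visiting))
    Result = *BaseOff + S.Addend;
  Visiting.erase(Name);
  return Result;
}

// Convenience entry point with a fresh cycle-detection set.
std::optional<std::int64_t>
symbolOffset(const std::map<std::string, Sym> &Syms, const std::string &Name) {
  std::set<std::string> Visiting;
  return symbolOffset(Syms, Name, Visiting);
}
```

For instance, with a label at offset 16 and two chained variables adding 1 and 2, the outermost variable resolves to 19.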