Sanitize your C++ containers: ASan annotations step-by-step

By Dominik Klemba and Dominik Czarnota

AddressSanitizer (ASan) is a compiler plugin that helps detect memory errors like buffer overflows or use-after-frees. In this post, we explain how to equip your C++ code with ASan annotations to find more bugs. We also show our work on ASan in GCC and LLVM. In LLVM, Trail of Bits added annotations to the libc++ std::string and std::deque containers, enabled custom allocators for container annotations, and fixed bugs in libc++!

Container overflows

As mentioned in our “Understanding AddressSanitizer” blog post, ASan cannot automatically detect invalid memory accesses into allocated memory. Instead, it provides an API for users to mark memory regions as accessible or inaccessible. The C++ standard libraries leverage those APIs to annotate STL containers, which helps ASan find container overflow bugs.

Figure 1 shows this in action; we compile with ASan and no optimizations (the -O0 -fsanitize=address -D_GLIBCXX_SANITIZE_VECTOR flags). Both clang++ and g++ support this functionality. If libc++ is used (-stdlib=libc++), the _GLIBCXX_SANITIZE_VECTOR macro can be omitted, since libc++ (LLVM’s C++ standard library) enables container annotations by default.

Figure 2 shows the result of running this code, where we can see that the invalid memory access was detected as a container-overflow error (since the shadow memory was poisoned with the “fc” byte).

#include <vector>
int main() {
    std::vector<char> v;

    // Set capacity to 8, the size remains 0
    v.reserve(8);

    // Access vector past its size, but before its capacity (8)
    return *(v.data());
}

Figure 1: Example of container overflow detection (Note: we do not show MSVC on CompilerExplorer since it does not have ASan installed yet)

==1==ERROR: AddressSanitizer: container-overflow on address 0x502000000010 at pc
0x000000401315 bp 0x7ffdd7e0c670 sp 0x7ffdd7e0c668
READ of size 1 at 0x502000000010 thread T0
    #0 0x401314 in main /app/example.cpp:10
    #1 0x7a47d5229d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
    #2 0x7a47d5229e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
    #3 0x401174 in _start (/app/output.s+0x401174)
…

Shadow bytes around the buggy address:
=>0x502000000000: fa fa[fc]fa fa fa fa fa fa fa fa fa fa fa fa fa

Shadow byte legend (one shadow byte represents 8 application bytes):
  Container overflow:      fc

Figure 2: ASan detecting the bug from figure 1. The output is truncated to show only relevant information.

However, the C++ standard libraries have varying levels of support for detecting container overflows. The table below summarizes current support for this detection.

Library	Annotated containers	Comment
libstdc++ (GCC)	`std::vector` (GCC 8)	Requires `-D_GLIBCXX_SANITIZE_VECTOR` macro during compilation. For `std::string` and `std::deque`, see the “GCC / libstdc++ annotations” section below.
libc++ (LLVM)	`std::vector` (LLVM 3.5), `std::deque` (LLVM17), long `std::string` (LLVM18), short `std::string` (not yet released)	Container annotations are enabled by default. Can be disabled with environment variable `ASAN_OPTIONS=detect_container_overflow=0` (does not require recompilation)
msvc++	`std::vector` and `std::string` (Visual Studio 2022 17.2 and 17.6)	Container annotations are enabled by default. Can be disabled with `-D_DISABLE_VECTOR_ANNOTATION -D_DISABLE_STRING_ANNOTATION`.

AddressSanitizer API

The recommended way to annotate memory is using the ASAN_POISON_MEMORY_REGION(addr, size) and ASAN_UNPOISON_MEMORY_REGION(addr, size) macros, which set the appropriate values in shadow memory. (If ASan is not enabled during compilation, then those macros only evaluate their arguments without calling annotation functions).
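As a minimal sketch of how these macros might be used, the function below poisons the unused tail of a heap block and unpoisons it before freeing. The poison_round_trip name and the no-op fallback definitions are our own illustrative additions; the fallback only exists so the snippet also builds where the ASan interface header is not installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Use the real macros when the ASan interface header is available;
// otherwise fall back to no-ops (this fallback is an assumption of the sketch).
#if defined(__has_include)
#if __has_include(<sanitizer/asan_interface.h>)
#include <sanitizer/asan_interface.h>
#endif
#endif
#ifndef ASAN_POISON_MEMORY_REGION
#define ASAN_POISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

// Hypothetical demo: only the first `used` bytes of a 64-byte block stay
// addressable while the tail is poisoned.
inline int poison_round_trip() {
    const std::size_t total = 64, used = 16;
    unsigned char *block = static_cast<unsigned char *>(std::malloc(total));
    assert(block != nullptr);

    // Forbid access to the unused tail [block + 16, block + 64).
    ASAN_POISON_MEMORY_REGION(block + used, total - used);

    int sum = 0;
    for (std::size_t i = 0; i < used; ++i) {  // this prefix is still addressable
        block[i] = 1;
        sum += block[i];
    }
    // block[used] = 1;  // with ASan enabled, this write would be reported

    // Make the whole block addressable again before handing it back to free().
    ASAN_UNPOISON_MEMORY_REGION(block + used, total - used);
    std::free(block);
    return sum;
}
```

Calling poison_round_trip() behaves the same with and without ASan; the difference only shows up if the commented-out write into the poisoned tail is enabled.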

As shown in figure 3, we can find more details on using the ASAN_POISON_MEMORY_REGION macro by reading the docstring of the underlying __asan_poison_memory_region function.

/// Marks a memory region ([addr, addr+size)) as unaddressable.
///
/// This memory must be previously allocated by your program. Instrumented
/// code is forbidden from accessing addresses in this region until it is
/// unpoisoned. This function is not guaranteed to poison the entire region -
/// it could poison only a subregion of [addr, addr+size) due to ASan
/// alignment restrictions.
///
/// \note This function is not thread-safe because no two threads can poison or
/// unpoison memory in the same memory region simultaneously.
///
/// \param addr Start of memory region.
/// \param size Size of memory region.
void __asan_poison_memory_region(void const volatile *addr, size_t size);

Figure 3: A comment describing __asan_poison_memory_region

Apart from those macros, the asan_interface.h file provides functions that allow for customizing the value set in shadow memory and helping with annotating certain containers, such as the __sanitizer_annotate_contiguous_container and __sanitizer_annotate_double_ended_contiguous_container functions. The documentation for the former function is shown in figure 4.

/// \note  Use this function with caution and do not use for anything other
/// than vector-like classes.
///
/// \param beg Beginning of memory region.
/// \param end End of memory region.
/// \param old_mid Old middle of memory region.
/// \param new_mid New middle of memory region.
void __sanitizer_annotate_contiguous_container(const void *beg,
                                               const void *end,
                                               const void *old_mid,
                                               const void *new_mid);

Figure 4: A comment describing __sanitizer_annotate_contiguous_container

This function is used, for example, during the std::vector::pop_back operation to mark the memory of the removed element as inaccessible as shown in figure 5. Under the hood, it poisons the shadow memory with the “fc” value to report memory accesses to the corresponding memory addresses with the “container-overflow” error.

Figure 5: Illustration of memory poisoning after pop_back is called on a five-element std::vector

Notice that in pop_back, the function has to be called after destructing the element as that memory becomes inaccessible.

A step-by-step example

Here, we illustrate a proper way of adding ASan annotations to a container based on an example stack class with a limited interface. The stack data is stored in a contiguous buffer and implements the functionality shown in figure 6. The full code for the stack can be found here.

class stack {
public:
    using T = int;
    stack();
    stack(const stack&) = delete;
    ~stack();
    bool empty() { return size == 0; }
    void push(T const &v);
    void pop();
    T& top() {
        if(empty())
            throw std::runtime_error("Stack is empty");
        return buffer[size - 1];
    }

private:
    T* buffer;
    size_t size = 0;
    size_t capacity = 32;
    // Returns next capacity, used only when buffer grows
    size_t next_capacity() { return 2 * capacity; }
    void grow_buffer();
};

Figure 6: Declaration of a simple stack class

Container annotation wrappers

The first step when adding ASan annotations is determining if ASan APIs are available. If they’re not, using ASan’s functions will lead to an undefined reference linker error when compiling without ASan. For that, we can use the __has_feature preprocessor macro to create a wrapper function for annotating our container, which will do nothing if compiled without ASan. Since our stack data is kept in a contiguous buffer, we will annotate it with the __sanitizer_annotate_contiguous_container function.

#if __has_feature(address_sanitizer)
    void annotate_contiguous_container(void *container_beg, void *container_end,
                                       void *old_mid, void *new_mid) {
        if(container_beg != nullptr)
            __sanitizer_annotate_contiguous_container(container_beg, container_end,
                                                      old_mid, new_mid);
    }
#else
    void annotate_contiguous_container(void *, void *, void *, void *) { }
#endif

Figure 7: Annotation wrapper function to be used in our implementation
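Older GCC releases do not provide __has_feature and instead signal ASan through the __SANITIZE_ADDRESS__ macro. A more portable variant of the wrapper could look like the sketch below; the SKETCH_HAS_ASAN macro and the annotate_round_trip helper are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Compilers without __has_feature (for example, older GCC) get a stub.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#include <sanitizer/common_interface_defs.h>
#define SKETCH_HAS_ASAN 1
#else
#define SKETCH_HAS_ASAN 0
#endif

// Forwards to the ASan API when it is available and does nothing otherwise.
inline void annotate_contiguous_container(const void *beg, const void *end,
                                          const void *old_mid,
                                          const void *new_mid) {
#if SKETCH_HAS_ASAN
    if (beg != nullptr)
        __sanitizer_annotate_contiguous_container(beg, end, old_mid, new_mid);
#else
    (void)beg; (void)end; (void)old_mid; (void)new_mid;
#endif
}

// Hypothetical usage: treat an 8-int buffer as a container holding one element.
inline int annotate_round_trip() {
    int *buf = static_cast<int *>(std::calloc(8, sizeof(int)));
    assert(buf != nullptr);
    annotate_contiguous_container(buf, buf + 8, buf + 8, buf + 1);  // poison tail
    buf[0] = 7;                                                     // still valid
    int value = buf[0];
    annotate_contiguous_container(buf, buf + 8, buf + 1, buf + 8);  // unpoison all
    std::free(buf);
    return value;
}
```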

Next, we add the annotate_new and annotate_delete functions—the former to poison a buffer of our container after it is allocated and the latter to unpoison it before deallocating it.

// Annotates a new buffer.
 void annotate_new() {
     // buffer points to the new memory buffer
     // capacity and size already hold the values for the new buffer
     annotate_contiguous_container(buffer, buffer + capacity,
                                   buffer + capacity, buffer + size);
 }

 // Annotates (unpoisons) buffer before deallocation
 void annotate_delete() {
     // should be called before deallocation
     annotate_contiguous_container(buffer, buffer + capacity,
                                   buffer + size, buffer + capacity);
 }

Figure 8: Functions for updating container annotations after a new buffer allocation and just before buffer deallocation

Next, we need to create helper functions to update the annotations when we add or remove an item from the container, as shown in figure 9.

Note that the specifics of these functions depend on how the container stores its data. In containers with one moving end, as with vectors or our stack, those functions will simply handle poisoning or unpoisoning memory before adding and after removing an object. In other cases, such helper functions may require an argument, such as the old size or the number of objects that will be added.

// Unpoison memory for a new element, *before* adding it
void annotate_increase() {
    annotate_contiguous_container(buffer, buffer + capacity,
                                  buffer + size, buffer + size + 1);
}

// Poison memory *after* removing an element
void annotate_shrink() {
    annotate_contiguous_container(buffer, buffer + capacity,
                                  buffer + size + 1, buffer + size);
}

Figure 9: Helper functions to update container annotations

Annotating the container

Finally, we use the helper functions in the container constructors, destructors, and methods that update its underlying size or capacity. Note that the order of operations is very important here. If our code accesses memory before unpoisoning it, ASan will detect a violation and crash. It’s also important to remember to unpoison memory before deallocation since different memory allocators may need to access the underlying memory (as they may store some metadata before or inside the allocated buffer).

Shrinking the size is usually simpler than increasing it because the buffer can be moved to a new memory area while growing its size. In our stack class, we poison the memory of one removed object in the pop function, as shown in figure 10. The annotate_shrink function has to be called at the very end, after the container is fully modified.

stack() {
    annotate_new();
}

~stack() {
    annotate_delete();
    free(buffer);
}

void pop() {
    if(empty()) {
        throw std::runtime_error("Stack is empty");
    }
    size -= 1;
    annotate_shrink();
}

void push(T const &v) {
    if(size == capacity)
        grow_buffer();
    annotate_increase();
    buffer[size] = v;
    size += 1;
}

Figure 10: Implementation of the default constructor, the destructor, and the pop and push methods; helper functions are used to update ASan annotations

To manage buffer reallocation during push, we use the grow_buffer function shown in figure 11. This function preserves the stack’s size and content and ensures that the new buffer and capacity are correctly annotated, so by the time it returns, the object is in a fully consistent state. This approach simplifies the push operation, as we no longer need to consider multiple buffers: it’s enough to unpoison memory for the new element, regardless of whether the capacity changed, as shown in figure 10.

Keeping the object’s state accurate is important to remember; for example, we discovered an issue in an ABI function of the std::basic_string class in libc++ that caused the string to have an incorrect size. This was overlooked because the function was never used in a relevant context until we began integrating annotations. The issue, however, will stay in libc++ ABIv1 forever despite a replacement we created. While unlikely, changes to the string implementation that rely on correct results from that function could lead to serious issues.

// A function increasing capacity, but not modifying the stack's content
void grow_buffer() {
    // Get the size of the new (bigger) buffer
    size_t new_capacity = next_capacity();
    // Allocate a new buffer
    T *new_buffer = (T *)calloc(new_capacity, sizeof(T));

    // Move all elements from the previous buffer into the new one
    for(size_t i = 0; i < size; ++i) {
        new_buffer[i] = std::move(buffer[i]);
    }

    annotate_delete();        // Unpoison old buffer (prepares for deallocation)
    free(buffer);             // Free the buffer.
    buffer = new_buffer;      // Assign new buffer.
    capacity = new_capacity;  // Update capacity.
    annotate_new();           // Annotate (poison) new buffer. AT THE VERY END
}

Figure 11: Implementation of a helper function changing the buffer to a bigger one
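The pieces above can be assembled into one self-contained class. The following is a compact sketch rather than the full code linked earlier: the int_stack name, the initial capacity of 4, the allocate helper, the single annotate helper, and the int_stack_demo function are assumptions made for illustration, and the buffer is explicitly allocated in the constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#include <sanitizer/common_interface_defs.h>
#define SKETCH_STACK_ASAN 1
#else
#define SKETCH_STACK_ASAN 0
#endif

class int_stack {
public:
    int_stack() {
        buffer_ = allocate(capacity_);
        annotate_new();  // poison the whole (empty) buffer
    }
    int_stack(const int_stack &) = delete;
    int_stack &operator=(const int_stack &) = delete;
    ~int_stack() {
        annotate_delete();  // unpoison before the allocator touches the memory
        std::free(buffer_);
    }

    bool empty() const { return size_ == 0; }
    std::size_t size() const { return size_; }

    void push(int v) {
        if (size_ == capacity_) grow_buffer();
        annotate_increase();  // unpoison the slot *before* writing to it
        buffer_[size_] = v;
        size_ += 1;
    }
    void pop() {
        if (empty()) throw std::runtime_error("Stack is empty");
        size_ -= 1;
        annotate_shrink();  // poison the slot *after* removing the element
    }
    int &top() {
        if (empty()) throw std::runtime_error("Stack is empty");
        return buffer_[size_ - 1];
    }

private:
    int *buffer_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 4;

    static int *allocate(std::size_t n) {
        void *p = std::calloc(n, sizeof(int));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<int *>(p);
    }
    // Moves the accessible/inaccessible boundary from old_mid to new_mid.
    void annotate(const int *old_mid, const int *new_mid) {
#if SKETCH_STACK_ASAN
        __sanitizer_annotate_contiguous_container(buffer_, buffer_ + capacity_,
                                                  old_mid, new_mid);
#else
        (void)old_mid; (void)new_mid;
#endif
    }
    void annotate_new() { annotate(buffer_ + capacity_, buffer_ + size_); }
    void annotate_delete() { annotate(buffer_ + size_, buffer_ + capacity_); }
    void annotate_increase() { annotate(buffer_ + size_, buffer_ + size_ + 1); }
    void annotate_shrink() { annotate(buffer_ + size_ + 1, buffer_ + size_); }

    void grow_buffer() {
        std::size_t new_capacity = 2 * capacity_;
        int *new_buffer = allocate(new_capacity);
        for (std::size_t i = 0; i < size_; ++i) new_buffer[i] = buffer_[i];
        annotate_delete();  // unpoison the old buffer before freeing it
        std::free(buffer_);
        buffer_ = new_buffer;
        capacity_ = new_capacity;
        annotate_new();  // annotate the new buffer at the very end
    }
};

// Hypothetical usage: push ten values (forcing two reallocations), pop one.
inline int int_stack_demo() {
    int_stack s;
    for (int i = 0; i < 10; ++i) s.push(i);
    s.pop();
    return s.top() * 100 + static_cast<int>(s.size());
}
```

Folding the four annotation helpers into one annotate function is a design choice of this sketch; it keeps the begin and end arguments in a single place, so only the boundary positions differ between call sites.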

Testing our annotations in practice

And with that, we’re done! With the entire stack container implemented, (almost) every invalid access to memory that is allocated triggers an error. We can test it with a main function, as shown in figure 12 (full source code here); when run on Clang++15, this function gives the output shown in figure 13.

int main() {
    stack s;
    stack::T* ptr;
    s.push(0);
    s.push(1);
    s.push(2);
    s.push(3);
    // 4 elements
    ptr = &s.top();                  // Save address of the top element in ptr
    s.pop();                         // Remove the top element (ptr does not change)
    std::cout << *ptr << std::endl;  // ERROR: access to already removed element
}

Figure 12: An implementation of a program accessing a removed element

clang++ -fsanitize=address listing-x-src.cpp -o program
./program
=================================================================
==38540==ERROR: AddressSanitizer: container-overflow on address 0x60c00000004c
 at pc 0x559a9925a93c bp 0x7fffc06038d0 sp 0x7fffc06038c8
READ of size 4 at 0x60c00000004c thread T0
    #0 0x559a9925a93b in main (/home/username/CLionProjects/
simple-annotations/a.out+0xde93b) (BuildId: b4b3601668152bb18905aec484b9234f2fabd710)
[...]

0x60c00000004c is located 12 bytes inside of 128-byte region 
[0x60c000000040,0x60c0000000c0)
allocated by thread T0 here:
[...]
    #1 0x559a9925b2bd in stack::grow_buffer()
[...]

Shadow bytes around the buggy address:
[...]
  0x0c187fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c187fff8000: fa fa fa fa fa fa fa fa 00[04]fc fc fc fc fc fc
  0x0c187fff8010: fc fc fc fc fc fc fc fc fa fa fa fa fa fa fa fa
  0x0c187fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[...]

Figure 13: Error detected by ASan for our container overflow annotations for the program from figure 12 compiled with Clang++ 15. Because there are 3 elements (each of 4 bytes) left in the container, fa 00[04]fc is in shadow memory, as 00 describes the first two objects (8 bytes) and 04 the last one.
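The shadow values in figure 13 can be reproduced with a little arithmetic. The function below is a simplified model, not ASan's implementation: it assumes the buffer starts on an 8-byte boundary and its capacity is a multiple of 8, and the model_shadow name is hypothetical. Each shadow byte covers 8 application bytes: 00 means all 8 are addressable, a value k between 1 and 7 means only the first k are, and fc marks container overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of the shadow bytes describing a contiguous container
// whose first `used_bytes` bytes are accessible.
inline std::vector<unsigned char> model_shadow(std::size_t used_bytes,
                                               std::size_t capacity_bytes) {
    const std::size_t granule = 8;                 // application bytes per shadow byte
    const unsigned char container_overflow = 0xfc;
    std::vector<unsigned char> shadow;
    for (std::size_t off = 0; off < capacity_bytes; off += granule) {
        if (used_bytes >= off + granule) {
            shadow.push_back(0x00);                // granule fully addressable
        } else if (used_bytes > off) {
            // Only the first (used_bytes - off) bytes of this granule are addressable.
            shadow.push_back(static_cast<unsigned char>(used_bytes - off));
        } else {
            shadow.push_back(container_overflow);  // granule fully poisoned
        }
    }
    return shadow;
}
```

For the program from figure 12, three 4-byte elements remain in a 128-byte buffer, so model_shadow(12, 128) yields 00, 04, and then fourteen fc bytes, matching the 00[04]fc pattern in the report.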

How we improved container annotations

As mentioned earlier, we made many improvements to C++ container annotations in libc++. We have detailed some lessons learned during that endeavor which should help future developers implement annotations in their custom containers and custom allocators.

Vector annotations

With our improvements, the std::vector container in libc++ is annotated by ASan when a custom memory allocator is used, whereas previously, it supported only the default memory allocator. This was because the __sanitizer_annotate_contiguous_container function, used internally by vector annotations, had restrictions that could result in false positives with custom memory allocators. We removed those restrictions in LLVM16 and enabled vector annotations for custom allocators in LLVM17.

These restrictions concerned the alignment of the buffer begin address and exclusivity of the last granule to be annotated. The former error is shown here. Since ASan can only poison suffixes, using an allocator that returns unaligned addresses may cause a failure to detect instances of invalid access into the non-poisoned prefix bytes. The latter exclusivity restriction concerns cases when a sanitized buffer ends on an unaligned address where another object starts; in such cases, ASan does not poison another object’s memory.

Note that while the function is called __sanitizer_annotate_contiguous_container, it operates on a single buffer. As such, the naming may be slightly confusing at first. If a container has many memory buffers, but every buffer has to be empty, or its content starts from the very beginning, the function may still be used with all buffers treated as separate containers.

Control over container annotations

In rare cases, annotating memory allocated by a custom allocator may have unexpected outcomes, such as unwanted ASan errors. Such errors may occur, for example, when an arena allocator neither unpoisons the memory of freed objects by calling their destructors nor does so manually.

There are two ways to deal with such problems. Ideally, the allocator should be changed to unpoison the whole memory before it is allocated again. Alternatively, if that is not feasible, the ASan container annotations can be turned off for the problematic allocator or its specialization by using the __asan_annotate_container_with_allocator customization point, which we added in LLVM17.

For example, to do this for a user_allocator, one has to specialize the customization point so that it inherits from std::false_type, as shown in figure 14.

#ifdef _LIBCPP_HAS_ASAN_CONTAINER_ANNOTATIONS_FOR_ALL_ALLOCATORS
template <class T>
struct std::__asan_annotate_container_with_allocator<user_allocator<T>> : std::false_type {};
#endif

Figure 14: An example of turning off container annotations for a user_allocator

In most cases, you won’t need the information from this section since container annotations are usually transparent to allocators; ASan unpoisoning happens in destructors. We added this customization point in response to a need for a mechanism to turn off annotations with arena allocators.
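To show the customization point in context, here is a minimal sketch with a complete allocator. The user_allocator below is a hypothetical allocator that simply forwards to malloc and free, and sum_with_user_allocator is an illustrative helper; with standard libraries other than libc++ 17 or newer, the guarded specialization is skipped and the code still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical minimal allocator, used only for illustration.
template <class T>
struct user_allocator {
    using value_type = T;
    user_allocator() = default;
    template <class U>
    user_allocator(const user_allocator<U> &) noexcept {}

    T *allocate(std::size_t n) {
        void *p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) noexcept { std::free(p); }
};
template <class T, class U>
bool operator==(const user_allocator<T> &, const user_allocator<U> &) { return true; }
template <class T, class U>
bool operator!=(const user_allocator<T> &, const user_allocator<U> &) { return false; }

// Turn off container annotations for every specialization of user_allocator.
#ifdef _LIBCPP_HAS_ASAN_CONTAINER_ANNOTATIONS_FOR_ALL_ALLOCATORS
template <class T>
struct std::__asan_annotate_container_with_allocator<user_allocator<T>> : std::false_type {};
#endif

// The vector works as usual; only the ASan annotations are affected.
inline int sum_with_user_allocator() {
    std::vector<int, user_allocator<int>> v;
    for (int i = 1; i <= 4; ++i) v.push_back(i);
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}
```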

Deque annotations

While adding support for all allocators was not our initial goal, an opportunity presented itself along the way. From the very beginning, however, we wanted to annotate more containers – so we also extended the compiler-rt ASan API in LLVM16. We implemented the __sanitizer_annotate_double_ended_contiguous_container function, which is tailored for deque-like containers with buffers that do not require the content to start at the very beginning of those buffers, but instead store their elements in an interior contiguous buffer.

/// Argument requirements:
/// During unpoisoning memory of empty container (before first element is
/// added):
/// - old_container_beg_p == old_container_end_p
/// During poisoning after last element was removed:
/// - new_container_beg_p == new_container_end_p
/// \param storage_beg Beginning of memory region.
/// \param storage_end End of memory region.
/// \param old_container_beg Old beginning of used region.
/// \param old_container_end End of used region.
/// \param new_container_beg New beginning of used region.
/// \param new_container_end New end of used region.
void __sanitizer_annotate_double_ended_contiguous_container(
    const void *storage_beg, const void *storage_end,
    const void *old_container_beg, const void *old_container_end,
    const void *new_container_beg, const void *new_container_end);

Figure 15: A comment describing __sanitizer_annotate_double_ended_contiguous_container

That function was also not used until LLVM17, where we upstreamed std::deque annotations. In contrast to std::vector annotations, the code added to std::deque is quite complicated because the ASan container annotation interface functions operate on one contiguous buffer, but std::deque has many of them.

Thanks to our changes, with libc++17 and above, everyone can easily detect container overflows in deque objects. False negatives are possible but unlikely, as only up to 7 unused bytes before content may not be poisoned.
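To illustrate the argument order of the double-ended function on something much simpler than std::deque, here is a sketch of a single-block container that can grow at both ends. It is not how libc++ annotates std::deque. The mini_block class, the mini_block_demo helper, and the SKETCH_USE_DOUBLE_ENDED_API opt-in macro are hypothetical; the macro should be defined only when building with ASan and a compiler-rt that provides the function (LLVM16 or newer), and without it the annotation calls compile to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if defined(SKETCH_USE_DOUBLE_ENDED_API) && \
    (__has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__))
#include <sanitizer/common_interface_defs.h>
#define SKETCH_DEQUE_ASAN 1
#else
#define SKETCH_DEQUE_ASAN 0
#endif

// Hypothetical fixed-capacity block whose content starts in the middle.
class mini_block {
public:
    mini_block()
        : storage_(static_cast<int *>(std::calloc(kCapacity, sizeof(int)))) {
        if (storage_ == nullptr) throw std::bad_alloc();
        // Fresh memory is fully addressable: the old range is the whole
        // storage, the new range is empty, so everything gets poisoned.
        annotate(0, kCapacity, beg_, end_);
    }
    mini_block(const mini_block &) = delete;
    mini_block &operator=(const mini_block &) = delete;
    ~mini_block() {
        annotate(beg_, end_, 0, kCapacity);  // unpoison everything before free
        std::free(storage_);
    }

    std::size_t size() const { return end_ - beg_; }
    int front() const { assert(size() > 0); return storage_[beg_]; }
    int back() const { assert(size() > 0); return storage_[end_ - 1]; }

    void push_front(int v) {
        assert(beg_ > 0);
        annotate(beg_, end_, beg_ - 1, end_);  // unpoison before writing
        storage_[--beg_] = v;
    }
    void push_back(int v) {
        assert(end_ < kCapacity);
        annotate(beg_, end_, beg_, end_ + 1);  // unpoison before writing
        storage_[end_++] = v;
    }
    void pop_front() {
        assert(size() > 0);
        ++beg_;
        annotate(beg_ - 1, end_, beg_, end_);  // poison after removing
    }
    void pop_back() {
        assert(size() > 0);
        --end_;
        annotate(beg_, end_ + 1, beg_, end_);  // poison after removing
    }

private:
    static constexpr std::size_t kCapacity = 16;
    int *storage_;
    std::size_t beg_ = kCapacity / 2;  // index of the first element
    std::size_t end_ = kCapacity / 2;  // index one past the last element

    void annotate(std::size_t old_beg, std::size_t old_end,
                  std::size_t new_beg, std::size_t new_end) {
#if SKETCH_DEQUE_ASAN
        __sanitizer_annotate_double_ended_contiguous_container(
            storage_, storage_ + kCapacity, storage_ + old_beg,
            storage_ + old_end, storage_ + new_beg, storage_ + new_end);
#else
        (void)old_beg; (void)old_end; (void)new_beg; (void)new_end;
#endif
    }
};

// Hypothetical usage exercising both ends of the block.
inline int mini_block_demo() {
    mini_block b;
    b.push_back(1);
    b.push_back(2);
    b.push_front(0);
    b.pop_front();  // remaining content: 1, 2
    b.pop_back();   // remaining content: 1
    return b.front() * 100 + b.back() * 10 + static_cast<int>(b.size());
}
```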

Thanks to vitalybuka’s evaluation prior to the release of LLVM17, we learned that our deque annotations detect approximately 10% more bugs compared to libc++ buffer hardening (at the moment):

From my experience of enabling https://libcxx.llvm.org/UsingLibcxx.html#enabling-the-safe-libc-mode on the same code-base, my very rough estimate is that your patch fetched at least 10% of additional bugs to the “safe libc++ mode”.

String annotations

ASan’s failure to detect a std::string bug was our impulse to action, but implementing this detection turned out to be the most challenging part and has not yet been finalized.

We designed the update to the __sanitizer_annotate_contiguous_container function to facilitate string annotations since the string is conceptually very similar to vector: it has one contiguous buffer, and content always starts from its very beginning. Yet there is one crucial difference between those collections: string enables Short String Optimization (SSO), a technique used by the libc++ std::basic_string class to store short strings directly in the object itself, avoiding memory allocation on the heap. Effectively, strings are really unions of “short string” and “long string,” and when a string does not fit into the “short” variant, the “long” variant kicks in, allocating memory on the heap.

The long string case is essentially the same as the vector case, and we added long string annotations to LLVM18. We added the short string annotations into the git main branch, which will hopefully be released in LLVM19. If you want to test it, use the libc++ from commit fed94502e5.

Additionally, the std::basic_string annotations, unlike std::vector and std::deque annotations, require a libc++ built with ASan because the string member functions are part of the libc++ ABI. In other words, it will not work by default with most libc++ versions shipped at the time of this post’s publication (LLVM18 is the most current version) because they are often built without ASan. To use it, you must build LLVM with AddressSanitizer (LLVM_USE_SANITIZER=Address) and link against it. Remember that ASan is unstable, so you should use everything from just one version of LLVM (compiler-rt, libc++, libc++abi, clang), or you will likely encounter incompatibility bugs with cryptic errors.

To ensure that string annotations are used if and only if libc++ was compiled with ASan, in the PR adding annotations, we adjusted the compilation process so that the _LIBCPP_INSTRUMENTED_WITH_ASAN macro is appended to __config_site whenever libc++ is built with ASan. If this macro is not defined, string annotations are not enabled.

Note that this does not prevent linking errors if one object file (or library) uses string annotations and another does not. Again, running a binary built this way would result in difficult-to-understand incompatibility errors.

We hope to see short string annotations in the next LLVM release. We already upstreamed the PR, but its release may be delayed if errors are detected.

One of the most intriguing reasons behind the previous revert of short string annotations was a bug caused by compiler optimization. Specifically, the compiler was found to be preloading values from both branches of exclusive conditions (such as if/else or ternary operator). However, logically, only one of these branches would execute; for this to happen, the compiler would have to recognize that both values are on the stack and assume that it is more efficient to preload both values than to load only one of them later on. This preemptive loading resulted in errors that proved challenging to comprehend due to their non-intuitive nature.

The complexity of this issue highlights the subtle interactions between compiler optimizations and instrumentations, making the detection and resolution of such issues an intricate task that requires a solid understanding of instrumentation and compiler behaviors. We want to give a shoutout to vitalybuka for digging into the root of this problem.

Testing annotations

If you want to test your container annotations, use the __sanitizer_verify_contiguous_container function; additionally, the wrappers for vector, deque, or basic_string containers may serve as inspiration.
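A minimal sketch of such a check is shown below. It annotates a raw buffer as a container holding three elements and asks ASan whether the shadow memory matches that state. The annotations_match and verify_demo names are hypothetical, and without ASan the check trivially reports success.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#include <sanitizer/common_interface_defs.h>
#define SKETCH_VERIFY_ASAN 1
#else
#define SKETCH_VERIFY_ASAN 0
#endif

// Returns true when [beg, beg + size) is accessible and
// [beg + size, beg + capacity) is poisoned, according to ASan.
inline bool annotations_match(const int *beg, std::size_t size,
                              std::size_t capacity) {
#if SKETCH_VERIFY_ASAN
    return __sanitizer_verify_contiguous_container(beg, beg + size,
                                                   beg + capacity) != 0;
#else
    (void)beg; (void)size; (void)capacity;
    return true;  // nothing to verify without ASan
#endif
}

// Hypothetical usage on a raw buffer treated as a container of 3 elements.
inline bool verify_demo() {
    const std::size_t capacity = 8, size = 3;
    int *buf = static_cast<int *>(std::calloc(capacity, sizeof(int)));
    assert(buf != nullptr);
#if SKETCH_VERIFY_ASAN
    // Poison everything past the first `size` elements.
    __sanitizer_annotate_contiguous_container(buf, buf + capacity,
                                              buf + capacity, buf + size);
#endif
    bool ok = annotations_match(buf, size, capacity);
#if SKETCH_VERIFY_ASAN
    // Unpoison the whole buffer before deallocation.
    __sanitizer_annotate_contiguous_container(buf, buf + capacity,
                                              buf + size, buf + capacity);
#endif
    std::free(buf);
    return ok;
}
```

In a container's own test suite, a call like annotations_match(buffer, size, capacity) could be asserted after every operation that changes the size or capacity.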

The libc++ library itself has many tests for container implementations, which we extended with additional assertions for the added annotations (see an example here).

Thanks

We’d like to express our gratitude to the entire LLVM community for their support during the development of our ASan annotation improvements; they helped with activities from reviewing code patches and brainstorming implementation ideas to identifying issues and sharing knowledge. We especially want to thank vitalybuka, ldionne, and philnik777 for their ongoing support!

Sanitize your allocators, too!

This post focused on container annotations, but annotating custom allocators is just as simple (if not simpler) and equally powerful. Allocator sanitization involves poisoning the whole buffer at the very end of deallocation and unpoisoning at the very beginning of allocation. You can use the previously mentioned ASAN_*_MEMORY_REGION macros or other AddressSanitizer functions to do this.
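As a rough sketch of that idea, the fixed-size pool below keeps all of its storage poisoned except for blocks that are currently handed out. The block_pool class and the pool_reuses_freed_block helper are hypothetical, and the no-op macro fallback is only there so the snippet builds where the ASan interface header is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__has_include)
#if __has_include(<sanitizer/asan_interface.h>)
#include <sanitizer/asan_interface.h>
#endif
#endif
#ifndef ASAN_POISON_MEMORY_REGION
#define ASAN_POISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

// Hypothetical pool of four 64-byte blocks stored inside the object itself.
class block_pool {
public:
    static constexpr std::size_t kBlockSize = 64;
    static constexpr std::size_t kBlocks = 4;

    // Nothing is allocated yet, so the whole storage starts out poisoned.
    block_pool() { ASAN_POISON_MEMORY_REGION(storage_, sizeof(storage_)); }
    block_pool(const block_pool &) = delete;
    block_pool &operator=(const block_pool &) = delete;
    // The memory is about to be reused by someone else; make it addressable.
    ~block_pool() { ASAN_UNPOISON_MEMORY_REGION(storage_, sizeof(storage_)); }

    void *allocate() {
        for (std::size_t i = 0; i < kBlocks; ++i) {
            if (!used_[i]) {
                void *p = storage_ + i * kBlockSize;
                // Unpoison first, before anything touches the block.
                ASAN_UNPOISON_MEMORY_REGION(p, kBlockSize);
                used_[i] = true;
                return p;
            }
        }
        return nullptr;  // pool exhausted
    }

    void deallocate(void *p) {
        std::size_t i = static_cast<std::size_t>(
                            static_cast<unsigned char *>(p) - storage_) /
                        kBlockSize;
        assert(i < kBlocks && used_[i]);
        used_[i] = false;
        // Poison last, once the pool no longer needs to touch the block.
        ASAN_POISON_MEMORY_REGION(p, kBlockSize);
    }

private:
    alignas(16) unsigned char storage_[kBlockSize * kBlocks];
    bool used_[kBlocks] = {};
};

// Hypothetical usage: a freed block is handed out again on the next request.
inline bool pool_reuses_freed_block() {
    block_pool pool;
    void *a = pool.allocate();
    void *b = pool.allocate();
    std::memset(a, 0xAB, block_pool::kBlockSize);  // valid while allocated
    pool.deallocate(a);
    // Reading or writing through `a` here would be reported by ASan.
    void *c = pool.allocate();
    return a != b && c == a;
}
```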

GCC / libstdc++ annotations

Our research and improvements did not start with LLVM and libc++. We initially started down this path by hacking on the container annotation detections in std::string and std::deque collections for libstdc++ in GCC 11.1. The code we developed for this is not production-ready yet: it does not use the latest compiler-rt API functions and container annotation tests should be incorporated into the standard container tests. We released this code in the trailofbits/gcc-asan-container-overflows repository (and its container-overflow branch), hoping it could be reused for future work. We would be happy to work on it for the latest libstdc++ version, given the resourcing for it.

Are your containers annotated?

Sanitizing your containers and allocators is a step towards building robust and secure software. By leveraging the power of ASan to detect memory errors in containers, you can minimize the risk of buffer overflows, use-after-free, and other vulnerabilities.

This valuable technique is straightforward, as the ASan API takes care of almost everything. However, it requires a good understanding of the codebase and correct reasoning about whether memory is accessible. Sometimes, it also requires a good understanding of compiler optimizations. Thankfully, maintaining annotations is easy, and the benefits are much more significant than the time spent implementing them.

If you need help with ASan annotations, fuzzing, or anything related to LLVM, contact us! We are happy to help tailor sanitizers or other LLVM tools to your specific needs. If you’d like to read more about our work on compilers, check out our posts on VAST (GitHub repository) and Macroni (GitHub repository).