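Fuzzing 101 with LibAFL: Study Notes (Part 1)

These are my notes on Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf¹ and Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I². libafl is highly flexible, so I expect the learning curve to be steep; on this first pass I will not try to understand everything in depth.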

Reproduction

First, download xpdf:

cd fuzzing-101-solutions/exercise-1
wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
tar xvf xpdf-3.02.tar.gz
rm xpdf-3.02.tar.gz
mv xpdf-3.02 xpdf

build.rs essentially does the following:

# these are example commands that will be executed automatically by build.rs
# and were taken almost verbatim from Fuzzing101's README
cd fuzzing-101-solutions/exercise-1/xpdf
make clean
rm -rf install 
export LLVM_CONFIG=llvm-config-15
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --prefix=./install
make
make install

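I will look at how this is implemented later; for now I simply copy it.

After copying the code I found that the default libafl version is 0.10.1. It would not compile, so I switched to 0.13.2, only to discover that a lot had changed. For example, libafl::bolts became libafl_bolts, and the executors changed too: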

We deleted TimeoutExecutor and TimeoutForkserverExecutor and make it mandatory for InProcessExecutor and ForkserverExecutor to have the timeout. Now InProcessExecutor and ForkserverExecutor have the default timeout of 5 seconds.

After fixing a pile of problems with the official code as a reference, it compiles and runs:

cd exercise-1
cargo build --release
../target/release/exercise-one-solution

To fuzz a different program, we only need to change the executor's arguments. Here they are:

let mut executor = ForkserverExecutor::builder()
  .program("./xpdf/install/bin/pdftotext")
  .parse_afl_cmdline(["@@"])
  .coverage_map_size(MAP_SIZE)
  .build(tuple_list!(time_observer, edges_observer))
  .unwrap();
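As a rough illustration of what those arguments mean: in an AFL-style command line, @@ marks the position where the path of the current input file is substituted. The sketch below models that substitution using only the standard library. The helper expand_args, the -raw option, and the file name .cur_input are assumptions chosen for this note; this is not LibAFL's implementation.

```rust
/// Replace every argument that is exactly "@@" with the input file path.
/// Hypothetical helper for illustration only, not a LibAFL API.
fn expand_args(args: &[&str], input_path: &str) -> Vec<String> {
    args.iter()
        .map(|arg| {
            if *arg == "@@" {
                input_path.to_string()
            } else {
                arg.to_string()
            }
        })
        .collect()
}

fn main() {
    // The argument list used in these notes: only the input file.
    let argv = expand_args(&["@@"], ".cur_input");
    println!("pdftotext {}", argv.join(" "));

    // A made-up variant with an extra option and an output file.
    let argv = expand_args(&["-raw", "@@", "/dev/null"], ".cur_input");
    println!("pdftotext {}", argv.join(" "));
}
```

Fuzzing another target then comes down to changing the string passed to program and the argument list passed to parse_afl_cmdline.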

Workflow

Let's walk through the workflow:

Corpus
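- corpus_dirs: vec!: the seed directories;
- input_corpus: InMemoryCorpus: the corpus kept in memory;
- timeouts_corpus: OnDiskCorpus: the on-disk corpus of inputs that meet the objective;
Observer
- time_observer: TimeObserver: records execution time;
- edges_observer: HitcountsMapObserver, StdMapObserver: records edge coverage information;
Feedback
- feedback: MaxMapFeedback, TimeFeedback: the feedback that selects interesting inputs;
  - combines edges_observer and time_observer;
- objective: CrashFeedback, TimeoutFeedback: the feedback that selects inputs meeting the objective (timeout or crash);
Monitor: keeps track of all fuzzing clients
- monitor: SimpleMonitor: SimpleMonitor is used here to print reports to the terminal;
Event Manager
- mgr: SimpleEventManager: one of the three core components; here the simplest variant, SimpleEventManager, is built from monitor;
State
- state: StdState: one of the three core components; holds the information needed during fuzzing;
  - combines input_corpus, timeouts_corpus, feedback, and objective;
Scheduler
- scheduler: IndexesLenTimeMinimizerScheduler: the scheduling policy; the author uses IndexesLenTimeMinimizerScheduler to favor the fastest and smallest seeds;
Fuzzer
- fuzzer: StdFuzzer: one of the three core components; drives the generation of new inputs and processes the state and feedback after each execution;
  - combines scheduler, feedback, and objective;
Executor
- executor: ForkserverExecutor: the executor;
  - specifies the program to run and its arguments;
  - combines time_observer and edges_observer;
Load the corpus
Mutator
- mutator: StdScheduledMutator: the mutator
Stage
- stage: StdMutationalStage: an operation on a single input; here it uses mutator to mutate the input;
Run the fuzzer
- combines stages, executor, state, and mgr.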

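This is still hard to take in, so I drew a construction diagram that reads top to bottom and left to right:

Modifications

After running for a while I only got timeouts and no crashes. This is because the author configured only TimeoutFeedback, while on a fast machine the target crashes before the timeout is reached, so I recommend adding CrashFeedback as well. First, let's understand the original Feedback and how it is used: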

 
// A Feedback, in most cases, processes the information reported by one or more observers to
// decide if the execution is interesting. This one is composed of two Feedbacks using a logical
// OR.
//
// Due to the fact that TimeFeedback can never classify a testcase as interesting on its own,
// we need to use it alongside some other Feedback that has the ability to perform said
// classification. These two feedbacks are combined to create a boolean formula, i.e. if the
// input triggered a new code path, OR, false.
let mut feedback = feedback_or!(
    // New maximization map feedback (attempts to maximize the map contents) linked to the
    // edges observer. This one will track indexes, but will not track novelties,
    // i.e. new_tracking(... true, false).
    MaxMapFeedback::new(&edges_observer),
    // Time feedback, this one never returns true for is_interesting. However, it does keep
    // track of testcase execution time by way of its TimeObserver
    TimeFeedback::new(&time_observer)
);

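As we can see, this is a combination of two Feedbacks. According to the comments, TimeFeedback cannot decide on its own whether a testcase is interesting, so the feedback_or macro is used: the input is considered interesting if MaxMapFeedback determines that it triggered a new path.

At first I misread TimeFeedback as TimeoutFeedback, but they are not the same thing. TimeFeedback never returns true, but it does track the execution time of the input.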
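To make the "never true, but still useful" idea concrete, here is a minimal std-only model of an eager OR over two feedbacks. The trait and all type names (Feedback, Execution, NewCoverage, TimeTracker, EagerOr) are simplified stand-ins invented for this note, not LibAFL's real types. The assumption, as far as I understand feedback_or!, is that both sides are always evaluated, which is what lets the time-tracking side record its data on every run.

```rust
// Simplified stand-ins; these are NOT LibAFL's real types.
struct Execution {
    new_edges: usize, // number of previously unseen edges hit by this run
    millis: u64,      // how long the run took
}

trait Feedback {
    fn is_interesting(&mut self, exec: &Execution) -> bool;
}

/// Plays the role of MaxMapFeedback: interesting when coverage is new.
struct NewCoverage;
impl Feedback for NewCoverage {
    fn is_interesting(&mut self, exec: &Execution) -> bool {
        exec.new_edges > 0
    }
}

/// Plays the role of TimeFeedback: records the time, never says "interesting".
struct TimeTracker {
    last_millis: Option<u64>,
}
impl Feedback for TimeTracker {
    fn is_interesting(&mut self, exec: &Execution) -> bool {
        self.last_millis = Some(exec.millis);
        false
    }
}

/// Eager OR: both sides always run, then the results are OR-ed.
struct EagerOr<A, B>(A, B);
impl<A: Feedback, B: Feedback> Feedback for EagerOr<A, B> {
    fn is_interesting(&mut self, exec: &Execution) -> bool {
        let a = self.0.is_interesting(exec);
        let b = self.1.is_interesting(exec);
        a | b
    }
}

fn main() {
    let mut feedback = EagerOr(NewCoverage, TimeTracker { last_millis: None });
    let novel = Execution { new_edges: 3, millis: 7 };
    let boring = Execution { new_edges: 0, millis: 12 };
    println!("novel interesting:  {}", feedback.is_interesting(&novel));
    println!("boring interesting: {}", feedback.is_interesting(&boring));
    println!("last recorded time: {:?}", feedback.1.last_millis);
}
```

In this model the overall verdict depends only on the coverage side, yet the time is recorded even when the coverage side has already answered true.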

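Being interesting is not enough; we also want to save the inputs that meet our objective. Here, for example, the author saves the seeds that trigger a timeout: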

// A feedback is used to choose if an input should be added to the corpus or not. In the case
// below, we're saying that in order for a testcase's input to be added to the corpus, it must:
//   1: be a timeout
//        AND
//   2: have created new coverage of the binary under test
//
// The goal is to do similar deduplication to what AFL does
//
// The feedback_and_fast macro combines the two feedbacks with a fast AND operation, which
// means only enough feedback functions will be called to know whether or not the objective
// has been met, i.e. short-circuiting logic.
let mut objective =
    feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));

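Here the author uses feedback_and_fast to impose two constraints: the input must time out, and it must discover a new path. The purpose is to deduplicate in a way similar to AFL.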
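The two constraints can be modelled in a few lines of std-only Rust. This is only a sketch of the boolean logic: Exit and Objective are made-up names, and a set of edge ids stands in for the real coverage map, which is a simplification of what MaxMapFeedback tracks.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum Exit {
    Ok,
    Timeout,
}

/// Toy objective: "timed out" AND "hit at least one edge never seen before".
struct Objective {
    seen_edges: HashSet<u32>,
    coverage_checks: usize, // how often the second condition was evaluated
}

impl Objective {
    fn new() -> Self {
        Objective { seen_edges: HashSet::new(), coverage_checks: 0 }
    }

    fn has_new_edges(&mut self, edges: &[u32]) -> bool {
        self.coverage_checks += 1;
        let mut found_new = false;
        for &edge in edges {
            // insert returns true only if the edge was not in the set yet
            found_new |= self.seen_edges.insert(edge);
        }
        found_new
    }

    fn is_solution(&mut self, exit: Exit, edges: &[u32]) -> bool {
        // Short-circuit AND: coverage is not even looked at without a timeout.
        exit == Exit::Timeout && self.has_new_edges(edges)
    }
}

fn main() {
    let mut objective = Objective::new();
    println!("{}", objective.is_solution(Exit::Ok, &[1, 2]));      // no timeout
    println!("{}", objective.is_solution(Exit::Timeout, &[1, 2])); // first timeout on this path
    println!("{}", objective.is_solution(Exit::Timeout, &[1, 2])); // same path again
    println!("coverage checks: {}", objective.coverage_checks);
}
```

A second timeout on an already seen path is rejected, which is the deduplication effect described above.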

Finally, both feedback and objective are passed to the state:

//
// Component: State
//
 
// Creates a new State, taking ownership of all of the individual components during fuzzing.
//
// On the initial pass, setup_restarting_mgr_std returns (None, LlmpRestartingEventManager).
// On each successive execution (i.e. on a fuzzer restart), it returns the state from the prior
// run that was saved off in shared memory. The code below handles the initial None value
// by providing a default StdState. After the first restart, we'll simply unwrap the
// Some(StdState) returned from the call to setup_restarting_mgr_std
let mut state = StdState::new(
    // random number generator with a time-based seed
    StdRand::with_seed(current_nanos()),
    input_corpus,
    timeouts_corpus,
    // States of the feedbacks that store the data related to the feedbacks that should be
    // persisted in the State.
    &mut feedback,
    &mut objective,
)
.unwrap();

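The first argument is the random number generator, the second is the corpus, and the third is the corpus where inputs that meet the objective are stored. As for the last two arguments, feedback decides which interesting seeds get recorded, and objective decides which seeds are saved as meeting our requirements.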
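The division of labour between the two corpora can be sketched as follows. This is a std-only model with invented names (ToyState, Destination); it assumes, based on my reading of StdFuzzer, that the objective is consulted first and the ordinary feedback only afterwards.

```rust
#[derive(Debug, PartialEq)]
enum Destination {
    Solutions, // the corpus passed as the third argument (timeouts_corpus here)
    Corpus,    // the corpus passed as the second argument (input_corpus here)
    Discarded,
}

/// Toy state holding the two corpora as plain vectors.
struct ToyState {
    corpus: Vec<Vec<u8>>,
    solutions: Vec<Vec<u8>>,
}

impl ToyState {
    /// Store the input according to the two verdicts and report where it went.
    fn process(&mut self, input: &[u8], meets_objective: bool, is_interesting: bool) -> Destination {
        if meets_objective {
            self.solutions.push(input.to_vec());
            Destination::Solutions
        } else if is_interesting {
            self.corpus.push(input.to_vec());
            Destination::Corpus
        } else {
            Destination::Discarded
        }
    }
}

fn main() {
    let mut state = ToyState { corpus: Vec::new(), solutions: Vec::new() };
    println!("{:?}", state.process(b"%PDF-a", true, true));
    println!("{:?}", state.process(b"%PDF-b", false, true));
    println!("{:?}", state.process(b"%PDF-c", false, false));
    println!("corpus: {}, solutions: {}", state.corpus.len(), state.solutions.len());
}
```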

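Now that we understand the author's intent, the next step is obvious: besides timeouts, we certainly also care about inputs that cause crashes, and CrashFeedback is clearly what we need. So how do we use it?

Following the official examples again: they use the feedback_or_fast! macro to select seeds that trigger either a crash or a timeout: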

// A feedback to choose if an input is a solution or not
let mut objective = feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());

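We do the same: import the relevant items and make the change. Running the fuzzer after the change, we can see that inputs triggering a crash are saved successfully.

When I came back to this post a few days later, I found I had already forgotten the build commands, so here they are again: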

cargo build --release
../target/release/exercise-one-solution

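Open questions

This leaves the following questions:

- Judging from the arguments to state, timeouts and crashes seem to be saved in the same directory. In libafl's design, this directory holds the inputs that meet our objective. Is it possible to specify separate directories for crashes and timeouts?
- After saving matching inputs with feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());, how does libafl deduplicate them? Above, feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer)); deduplicates by checking whether a new path was found. Here, would an input that times out without finding a new path also be saved? If so, could we wrap the feedback_or_fast in a feedback_and_fast with MaxMapFeedback to help with deduplication?

I will answer these questions gradually as I keep learning.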
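For the second question, the boolean logic of the two variants can at least be compared in a toy model. The sketch below is std-only and says nothing about how libafl itself behaves; Exit, count_saved, and the single edge id per run are assumptions made up for the comparison.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum Exit {
    Ok,
    Crash,
    Timeout,
}

/// Count how many runs would be saved as solutions.
/// Each run is (exit kind, id of the one edge it hit), a deliberate simplification.
fn count_saved(runs: &[(Exit, u32)], dedup_by_coverage: bool) -> usize {
    let mut seen = HashSet::new();
    let mut saved = 0;
    for &(exit, edge) in runs {
        let crash_or_timeout = exit == Exit::Crash || exit == Exit::Timeout;
        let keep = if dedup_by_coverage {
            // (crash OR timeout) AND new coverage, with short-circuiting
            crash_or_timeout && seen.insert(edge)
        } else {
            // plain crash OR timeout
            crash_or_timeout
        };
        if keep {
            saved += 1;
        }
    }
    saved
}

fn main() {
    let runs = [
        (Exit::Timeout, 7),
        (Exit::Timeout, 7),
        (Exit::Crash, 9),
        (Exit::Ok, 3),
        (Exit::Crash, 9),
    ];
    println!("plain OR:       {}", count_saved(&runs, false));
    println!("OR AND new map: {}", count_saved(&runs, true));
}
```

One caveat of this model: a crash and a timeout that hit the same edge would deduplicate against each other, which may or may not be what we want.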

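Speeding up

We can speed up fuzzing by using persistent mode instead of the forkserver. In Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I, the author modifies the xpdf source code and writes harness.cc; I will not describe that in detail here.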

Because the fuzzer is now compiled as a library, main.rs from above is renamed to lib.rs. Let's see what changes compared with the workflow above.

Observer

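To compile the harness, the author uses libafl_cc instead of the afl-clang-[fast|lto] used above. With the traditional afl-clang-[fast|lto], libafl can obtain coverage information through the __AFL_SHM_ID environment variable, whereas with libafl_cc we need libafl_targets to expose EDGES_MAP: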

let edges_observer =
    HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();

Monitor

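To keep the target's printed output from getting mixed up with the fuzzer's output, the author replaces SimpleMonitor with MultiMonitor. MultiMonitor can display per-client statistics as well as the accumulated totals.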

let monitor = MultiMonitor::new(|s| {
    println!("{}", s);
});

Event Manager

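With MultiMonitor, two fuzzer instances have to be started. According to the author, the first instance started is also a client, but since it opens a port and listens for messages from the other clients, it looks like a server to me however I look at it: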

let (state, mut mgr) = match setup_restarting_mgr_std(monitor, 1337, EventConfig::AlwaysUnique)
{
    Ok(res) => res,
    Err(err) => match err {
        Error::ShuttingDown => {
            return Ok(());
        }
        _ => {
            panic!("Failed to setup the restarting manager: {}", err);
        }
    },
};

Harness

This component does not exist in the forkserver setup; it is built specifically for InProcessExecutor. The libfuzzer_test_one_input it calls is the LLVMFuzzerTestOneInput we wrote in the harness:

let mut harness = |input: &BytesInput| {
    let target = input.target_bytes();
    let buffer = target.as_slice();
    libfuzzer_test_one_input(buffer);
    ExitKind::Ok
};

Executor

Since we are now using persistent mode, the executor changes accordingly. InProcessExecutor needs more components than ForkserverExecutor:

let mut executor = InProcessExecutor::new(
    &mut harness,
    tuple_list!(edges_observer, time_observer),
    &mut fuzzer,
    &mut state,
    &mut mgr,
)
.unwrap();

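I will study exactly how this works later.

Run the fuzzer

This part mainly adds a mechanism similar to __AFL_LOOP: it fixes the number of iterations to run before a restart and takes care of the manual restart: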

fuzzer
    .fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 10000)
    .unwrap();
 
// Since we're using this fuzz_loop_for in a restarting scenario to only run for n iterations
// before exiting, we need to ensure we call on_restart() and pass it the state. This way, the
// state will be available in the next, respawned, iteration.
mgr.on_restart(&mut state).unwrap();
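The run-for-n-iterations-then-respawn pattern can be modelled without libafl. In the sketch below, State, fuzz_loop_for, and run_one_process are toy stand-ins that merely borrow the names from the snippet above; the rule that grows the corpus every 1000 executions is invented so that there is some state worth carrying over.

```rust
#[derive(Clone, Debug, PartialEq)]
struct State {
    executions: u64,
    corpus_size: usize,
}

/// Toy loop: run a fixed number of iterations, then return to the caller.
fn fuzz_loop_for(state: &mut State, iters: u64) {
    for _ in 0..iters {
        state.executions += 1;
        // Invented rule: pretend every 1000th execution finds a new input.
        if state.executions % 1000 == 0 {
            state.corpus_size += 1;
        }
    }
}

/// One process lifetime: start from the saved state if there is one,
/// otherwise from a fresh default, and hand the state back at the end
/// (the hand-back plays the role of on_restart).
fn run_one_process(saved: Option<State>, iters: u64) -> State {
    let mut state = saved.unwrap_or(State { executions: 0, corpus_size: 1 });
    fuzz_loop_for(&mut state, iters);
    state
}

fn main() {
    // First launch: nothing saved yet.
    let after_first = run_one_process(None, 10_000);
    // Respawn: continue from what the previous process handed back.
    let after_second = run_one_process(Some(after_first.clone()), 10_000);
    println!("{:?}", after_first);
    println!("{:?}", after_second);
}
```

The point of the hand-back is that the respawned process continues counting from where the previous one stopped instead of starting from scratch.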

The author uses cargo make to drive the whole process, with each step defined in Makefile.toml:

exercise-1.5/Makefile.toml

[tasks.clean]
dependencies = ["cargo-clean", "afl-clean", "clean-xpdf"]
 
[tasks.afl-clean]
script = '''
rm -rf .cur_input* timeouts fuzzer fuzzer.o libexerciseonepointfive.a
'''
 
[tasks.clean-xpdf]
cwd = "xpdf"
script = """
make --silent clean
rm -rf built-with-* ../build/*
"""
 
[tasks.cargo-clean]
command = "cargo"
args = ["clean"]
 
[tasks.rebuild]
dependencies = ["afl-clean", "clean-xpdf", "build-compilers", "build-xpdf", "build-fuzzer"]
 
[tasks.build-compilers]
script = """
cargo build --release
cp -f ../target/release/libexerciseonepointfive.a .
"""
 
[tasks.build-xpdf]
cwd = "build"
script = """
cmake ../xpdf -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$(pwd)/../../target/release/compiler -DCMAKE_CXX_COMPILER=$(pwd)/../../target/release/compiler_pp
make
"""
 
[tasks.build-fuzzer]
script = """
../target/release/compiler_pp -I xpdf/goo -I xpdf/fofi -I xpdf/splash -I xpdf/xpdf -I xpdf -o fuzzer harness.cc build/*/*.a -lm -ldl -lpthread -lstdc++ -lgcc -lutil -lrt
"""

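Then run cargo make rebuild to execute all the steps and produce the fuzzer binary.

Finally, run the compiled fuzzer in two separate windows; the instance started first acts as the server.

Summary

The author builds a complete fuzzing workflow with libafl, covering both forkserver-based fuzzing and persistent-mode fuzzing. These notes briefly summarize how libafl's components are stacked together and how they are used, which is enough to show how flexible libafl is.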