Fuzzing 101 with LibAFL: Study Notes (Part 1)
2024-9-7 23:15:50 Author: 5ec.top

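These are notes taken while following Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf [1] and Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I [2]. libafl gives you a great deal of freedom, so I expect the learning curve to be fairly steep. This time I will settle for getting things to work rather than understanding every detail.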

Reproduction

First, download xpdf:

cd fuzzing-101-solutions/exercise-1
wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
tar xvf xpdf-3.02.tar.gz
rm xpdf-3.02.tar.gz
mv xpdf-3.02 xpdf

build.rs essentially does the following:

# these are example commands that will be executed automatically by build.rs
# and were taken almost verbatim from Fuzzing101's README
cd fuzzing-101-solutions/exercise-1/xpdf
make clean
rm -rf install 
export LLVM_CONFIG=llvm-config-15
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --prefix=./install
make
make install

I will look at how this is implemented later; for now I simply copy it.

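After copying the code I found that the default libafl version was 0.10.1. It would not compile, so I switched to 0.13.2, only to find that a lot had changed. For example, libafl::bolts became libafl_bolts, and the executors changed as well: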

We deleted TimeoutExecutor and TimeoutForkserverExecutor and make it mandatory for InProcessExecutor and ForkserverExecutor to have the timeout. Now InProcessExecutor and ForkserverExecutor have the default timeout of 5 seconds.


After fixing a pile of issues with the official code as a reference, it compiled and ran:

cd exercise-1
cargo build --release
../target/release/exercise-one-solution

To fuzz a different program, it is enough to change the executor's arguments. Here they are:

let mut executor = ForkserverExecutor::builder()
  .program("./xpdf/install/bin/pdftotext")
  .parse_afl_cmdline(["@@"])
  .coverage_map_size(MAP_SIZE)
  .build(tuple_list!(time_observer, edges_observer))
  .unwrap();

Workflow

Let's walk through the workflow:

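  1. Corpus
    • corpus_dirs: the seed directories;
    • input_corpus: the corpus kept in memory;
    • timeouts_corpus: the corpus of inputs that meet the objective;
  2. Observer
    • time_observer: records execution time;
    • edges_observer: records edge coverage information;
  3. Feedback
    • feedback: the feedback mechanism that selects interesting inputs;
      • combines edges_observer and time_observer
    • objective: the feedback mechanism that selects inputs meeting the objective (timeout or crash);
  4. Monitor: tracks all fuzzing clients
    • monitor: SimpleMonitor is used here to send reports to the terminal;
  5. Event Manager
    • mgr: one of the three core components; here the simplest one, SimpleEventManager, is built from monitor
  6. State
    • state: one of the three core components; holds the information needed during fuzzing;
      • combines input_corpus, timeouts_corpus, feedback, and objective
  7. Scheduler
    • scheduler: the scheduling policy; the author uses IndexesLenTimeMinimizerScheduler to favor the fastest and smallest seeds;
  8. Fuzzer
    • fuzzer: one of the three core components; produces inputs and processes the state and feedback after each execution;
      • combines scheduler, feedback, and objective
  9. Executor
    • executor: the executor;
      • specifies the program to run and its arguments;
      • combines time_observer and edges_observer
  10. Load the corpus
  11. Mutator
    • mutator: the mutator
  12. Stage
    • stage: an operation on a single input; here it uses mutator to mutate the input;
  13. Run the fuzzer
    • combines stages, executor, state, and mgr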
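
To make the list above concrete for myself, here is a toy loop using only the standard library. It is a sketch, not LibAFL code: execute, is_interesting, and toy_fuzz are hypothetical names, the target is a made-up function that "crashes" on inputs starting with PDF, and the mutator enumerates single-byte replacements deterministically instead of mutating at random.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ExitKind {
    Ok,
    Crash,
}

const MAP_SIZE: usize = 4;

// Executor + observer stand-in: run the (made-up) target and report
// how it exited together with a tiny edge coverage map.
fn execute(input: &[u8]) -> (ExitKind, [u8; MAP_SIZE]) {
    let mut map = [0u8; MAP_SIZE];
    map[0] = 1;
    if input.first() == Some(&b'P') {
        map[1] = 1;
        if input.get(1) == Some(&b'D') {
            map[2] = 1;
            if input.get(2) == Some(&b'F') {
                map[3] = 1;
                return (ExitKind::Crash, map);
            }
        }
    }
    (ExitKind::Ok, map)
}

// Feedback stand-in: an input is interesting if any map entry exceeds
// the maximum seen so far (the idea behind a max-map feedback).
fn is_interesting(history: &mut [u8; MAP_SIZE], map: &[u8; MAP_SIZE]) -> bool {
    let mut novel = false;
    for (h, m) in history.iter_mut().zip(map.iter()) {
        if *m > *h {
            *h = *m;
            novel = true;
        }
    }
    novel
}

// Fuzzer stand-in: returns (corpus, solutions).
fn toy_fuzz(seed: &[u8], rounds: usize) -> (Vec<Vec<u8>>, Vec<Vec<u8>>) {
    let mut corpus = vec![seed.to_vec()];
    let mut solutions = Vec::new();
    let mut history = [0u8; MAP_SIZE];

    // "Load the corpus": run the seed once so its coverage is known.
    let (_, seed_map) = execute(seed);
    is_interesting(&mut history, &seed_map);

    for _ in 0..rounds {
        // Scheduler stand-in: always pick the newest corpus entry.
        let parent = corpus.last().unwrap().clone();
        // Stage + mutator stand-in: try every single-byte replacement.
        for pos in 0..parent.len() {
            for val in 0..=255u8 {
                let mut child = parent.clone();
                child[pos] = val;
                let (exit, map) = execute(&child);
                if exit == ExitKind::Crash {
                    // Objective met: store as a solution, not in the corpus.
                    solutions.push(child);
                } else if is_interesting(&mut history, &map) {
                    corpus.push(child);
                }
            }
        }
    }
    (corpus, solutions)
}
```

Calling toy_fuzz(b"AAA", 3) walks from AAA to PAA to PDA through coverage feedback alone and then records PDF as the only solution. That is the same division of labor as in the list: the feedback grows the corpus, the objective fills the solutions.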

Modifications

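After running for a while there were only timeouts and no crashes. The reason is that the author configured only TimeoutFeedback, while on a fast machine the target crashes before the timeout is ever reached, so I recommend adding CrashFeedback as well. First, let's understand the original Feedback and how it is used: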

 
// A Feedback, in most cases, processes the information reported by one or more observers to
// decide if the execution is interesting. This one is composed of two Feedbacks using a logical
// OR.
//
// Due to the fact that TimeFeedback can never classify a testcase as interesting on its own,
// we need to use it alongside some other Feedback that has the ability to perform said
// classification. These two feedbacks are combined to create a boolean formula, i.e. if the
// input triggered a new code path, OR, false.
let mut feedback = feedback_or!(
    // New maximization map feedback (attempts to maximize the map contents) linked to the
    // edges observer. This one will track indexes, but will not track novelties,
    // i.e. new_tracking(... true, false).
    MaxMapFeedback::new(&edges_observer),
    // Time feedback, this one never returns true for is_interesting, However, it does keep
    // track of testcase execution time by way of its TimeObserver
    TimeFeedback::new(&time_observer)
);

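As we can see, this is a combination of two Feedbacks. According to the comments, TimeFeedback cannot decide on its own whether a testcase is interesting, so the feedback_or macro is used: an input is considered interesting if MaxMapFeedback determines that it triggered a new path.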

At first I misread TimeFeedback as TimeoutFeedback, but they are not the same thing. TimeFeedback never returns true; what it does is track the execution time of the input.
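
A small sketch of why the eager feedback_or! matters here, as I understand it: with an eager OR both sides always run, so the time bookkeeping happens even when the coverage side already said "interesting", whereas a short-circuit OR would skip it. The functions or_eager, or_fast, and run are hypothetical stand-ins written with plain closures, not LibAFL types.

```rust
use std::cell::Cell;

// Eager OR: both sides are always evaluated.
fn or_eager(a: impl Fn() -> bool, b: impl Fn() -> bool) -> bool {
    let left = a();
    let right = b();
    left || right
}

// Short-circuit OR: the right side is skipped once the left side is true.
fn or_fast(a: impl Fn() -> bool, b: impl Fn() -> bool) -> bool {
    a() || b()
}

// Returns (verdict, whether the time stand-in got to record anything).
fn run(eager: bool, new_coverage: bool) -> (bool, bool) {
    let time_recorded = Cell::new(false);
    // Stand-in for the coverage feedback.
    let max_map = || new_coverage;
    // Stand-in for TimeFeedback: never interesting, but has a side effect.
    let time = || {
        time_recorded.set(true);
        false
    };
    let verdict = if eager {
        or_eager(max_map, time)
    } else {
        or_fast(max_map, time)
    };
    (verdict, time_recorded.get())
}
```

With new coverage, run(true, true) yields a true verdict and a recorded time, while run(false, true) yields the same verdict but no recorded time. That missing side effect is the reason an eager OR is the natural choice when one side exists only for bookkeeping.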

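Being interesting is not enough; we also want to save the inputs that meet our objective. Here the author saves the seeds that trigger a timeout: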

// A feedback is used to choose if an input should be added to the corpus or not. In the case
// below, we're saying that in order for a testcase's input to be added to the corpus, it must:
//   1: be a timeout
//        AND
//   2: have created new coverage of the binary under test
//
// The goal is to do similar deduplication to what AFL does
//
// The feedback_and_fast macro combines the two feedbacks with a fast AND operation, which
// means only enough feedback functions will be called to know whether or not the objective
// has been met, i.e. short-circuiting logic.
let mut objective =
    feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));

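Here the author uses feedback_and_fast to set up two constraints: the input must time out, and it must discover a new path. The purpose is to deduplicate in a way similar to AFL.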
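
To check my reading of this objective, here is a std-only sketch of "timeout AND new coverage" with short-circuiting. TimeoutDedup and its fields are hypothetical names, and edges are modeled as a plain list of indices rather than a shared-memory map.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum ExitKind {
    Ok,
    Timeout,
}

// Toy objective: timeout AND new coverage, with short-circuit evaluation.
struct TimeoutDedup {
    seen_edges: HashSet<usize>, // history kept by the coverage half
    coverage_checks: usize,     // how often the coverage half was consulted
}

impl TimeoutDedup {
    fn new() -> Self {
        Self { seen_edges: HashSet::new(), coverage_checks: 0 }
    }

    fn is_solution(&mut self, exit: ExitKind, edges: &[usize]) -> bool {
        // First half: not a timeout, so the second half never runs.
        if exit != ExitKind::Timeout {
            return false;
        }
        // Second half: only timeouts update the coverage history.
        self.coverage_checks += 1;
        let mut novel = false;
        for &edge in edges {
            novel |= self.seen_edges.insert(edge);
        }
        novel
    }
}
```

A second timeout that covers exactly the same edges is rejected, which is the deduplication effect. Because of the short circuit, runs that exit normally never touch the objective's coverage history, so the first timeout is always new from the objective's point of view.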

Finally, both feedback and objective are used in state:

//
// Component: State
//
 
// Creates a new State, taking ownership of all of the individual components during fuzzing.
//
// On the initial pass, setup_restarting_mgr_std returns (None, LlmpRestartingEventManager).
// On each successive execution (i.e. on a fuzzer restart), it returns the state from the prior
// run that was saved off in shared memory. The code below handles the initial None value
// by providing a default StdState. After the first restart, we'll simply unwrap the
// Some(StdState) returned from the call to setup_restarting_mgr_std
let mut state = StdState::new(
    // random number generator with a time-based seed
    StdRand::with_seed(current_nanos()),
    input_corpus,
    timeouts_corpus,
    // States of the feedbacks that store the data related to the feedbacks that should be
    // persisted in the State.
    &mut feedback,
    &mut objective,
)
.unwrap();

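The first argument is the random number generator, the second is the corpus, and the third is where inputs that meet the objective are stored. Of the last two arguments, feedback records the interesting seeds and objective saves the seeds that meet our requirements.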


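Now that the author's intent is clear, the next step is obvious. Besides timeouts we certainly also care about inputs that cause crashes, and CrashFeedback is clearly what we need. So how do we use it?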

Looking at the official examples again, they use the feedback_or_fast! macro to select seeds that trigger either a crash or a timeout:

// A feedback to choose if an input is a solution or not
let mut objective = feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());

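We follow the same pattern: import the relevant items and make the change. Running the fuzzer after the modification, we can see that inputs triggering a crash are saved successfully.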

Coming back to this post a few days later, I found I had already forgotten the build commands, so here they are again:

cargo build --release
../target/release/exercise-one-solution

Open questions

This leaves the following questions:

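  1. Judging by the arguments of state, timeouts and crashes seem to be saved in the same directory. In libafl's design this directory holds the inputs that meet our objective, so is it possible to specify separate directories for crashes and timeouts?
  2. When inputs are saved through feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());, how does libafl deduplicate them? The earlier feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer)); deduplicates by checking for new paths, but here an input that times out without finding a new path might be saved as well. If every such input is saved, could we wrap the feedback_or_fast in a feedback_and_fast with MaxMapFeedback to deduplicate for us?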

I will answer these questions gradually as I keep studying.
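
As a starting point for thinking about both questions, here is a std-only sketch of the behavior I would like to end up with: "(crash OR timeout) AND new coverage", with the result routed to a directory per exit kind. It says nothing about how LibAFL actually does this. SolutionRouter, route, and the directory names are hypothetical.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

#[derive(Clone, Copy)]
enum ExitKind {
    Ok,
    Crash,
    Timeout,
}

// Toy objective that also decides where a solution would be stored.
struct SolutionRouter {
    seen_edges: HashSet<usize>, // one coverage history shared by both kinds
}

impl SolutionRouter {
    fn new() -> Self {
        Self { seen_edges: HashSet::new() }
    }

    // Returns the (hypothetical) target directory, or None if the input
    // is not a solution or only repeats coverage already seen.
    fn route(&mut self, exit: ExitKind, edges: &[usize]) -> Option<PathBuf> {
        // (crash OR timeout): anything else is rejected immediately.
        let dir = match exit {
            ExitKind::Crash => "crashes",
            ExitKind::Timeout => "timeouts",
            ExitKind::Ok => return None,
        };
        // AND new coverage: deduplicate on the edges reached.
        let mut novel = false;
        for &edge in edges {
            novel |= self.seen_edges.insert(edge);
        }
        novel.then(|| PathBuf::from("solutions").join(dir))
    }
}
```

One design choice to note: crashes and timeouts share a single coverage history here, so a timeout that reaches only edges already seen in a crash is dropped. Keeping one history per exit kind would be the alternative.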


Speeding up

We can speed up fuzzing by using persistent mode instead of the forkserver. In Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I, the author modifies the xpdf source and writes harness.cc; I will not go into that here.

Since the fuzzer is now built as a library, the main.rs from above is renamed to lib.rs. Let's look at what changed compared with the workflow above.

Observer

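When compiling the harness, the author uses libafl_cc instead of the afl-clang-[fast|lto] used above. With the traditional afl-clang-[fast|lto], libafl obtains coverage information through the __AFL_SHM_ID environment variable, whereas with libafl_cc the EDGES_MAP has to be exposed through libafl_targets: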

let edges_observer =
    HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();
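
As far as I understand, the Hitcounts part of HitcountsMapObserver refers to AFL-style bucketing of raw edge hit counts, so that a loop running 5 times instead of 6 does not look like new coverage. The sketch below reproduces the classic AFL bucket table with the standard library only. Whether LibAFL uses exactly this table is my assumption, and bucket and classify are hypothetical names.

```rust
// Map a raw hit count to its AFL-style bucket.
fn bucket(hits: u8) -> u8 {
    match hits {
        0 => 0,
        1 => 1,
        2 => 2,
        3 => 4,
        4..=7 => 8,
        8..=15 => 16,
        16..=31 => 32,
        32..=127 => 64,
        128..=255 => 128,
    }
}

// Post-process a whole coverage map in place after one execution.
fn classify(map: &mut [u8]) {
    for entry in map.iter_mut() {
        *entry = bucket(*entry);
    }
}
```

After classification, a max-map style feedback only reacts when an edge moves into a higher bucket, not on every change of the raw counter.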

Monitor

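To keep the target's printed output from mixing with the fuzzer's output, the author replaces SimpleMonitor with MultiMonitor. MultiMonitor can display per-client statistics as well as their cumulative totals.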

let monitor = MultiMonitor::new(|s| {
    println!("{}", s);
});

Event Manager

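With MultiMonitor, two fuzzer instances have to be started. According to the author the first one started is also a client, but since it opens a port and listens for messages from other clients, it looks like a server to me no matter how I look at it: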

let (state, mut mgr) = match setup_restarting_mgr_std(monitor, 1337, EventConfig::AlwaysUnique)
{
    Ok(res) => res,
    Err(err) => match err {
        Error::ShuttingDown => {
            return Ok(());
        }
        _ => {
            panic!("Failed to setup the restarting manager: {}", err);
        }
    },
};

Harness

This component does not exist in the forkserver setup; it is built specifically for InProcessExecutor. The libfuzzer_test_one_input called inside it is the LLVMFuzzerTestOneInput we wrote in the harness:

let mut harness = |input: &BytesInput| {
    let target = input.target_bytes();
    let buffer = target.as_slice();
    libfuzzer_test_one_input(buffer);
    ExitKind::Ok
};
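
To see what the closure boils down to, here is a std-only sketch of a safe Rust wrapper around a C-ABI entry point with the LLVMFuzzerTestOneInput signature. The stub stands in for the real C harness and only counts bytes. llvm_fuzzer_test_one_input_stub, test_one_input, harness, and BYTES_SEEN are hypothetical names, and the real libfuzzer_test_one_input from libafl_targets may differ in detail.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicUsize, Ordering};

// Observable side effect of the stub, so the sketch can be checked.
static BYTES_SEEN: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the C harness: same shape as LLVMFuzzerTestOneInput.
extern "C" fn llvm_fuzzer_test_one_input_stub(data: *const u8, size: usize) -> c_int {
    // SAFETY: the caller passes a pointer/length pair taken from a valid slice.
    let bytes = unsafe { std::slice::from_raw_parts(data, size) };
    BYTES_SEEN.fetch_add(bytes.len(), Ordering::SeqCst);
    0
}

// Safe wrapper: turns a Rust slice into the pointer/length pair.
fn test_one_input(buf: &[u8]) -> c_int {
    llvm_fuzzer_test_one_input_stub(buf.as_ptr(), buf.len())
}

#[derive(Debug, PartialEq)]
enum ExitKind {
    Ok,
}

// Harness stand-in: run the target once and report a normal exit.
fn harness(input: &[u8]) -> ExitKind {
    test_one_input(input);
    ExitKind::Ok
}
```

The harness always reports ExitKind::Ok because, as I understand it, a real crash or timeout never returns from the call; the in-process executor notices it by other means.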

Executor

Since we are now in persistent mode, the executor changes accordingly. Compared with ForkserverExecutor, InProcessExecutor needs more components:

let mut executor = InProcessExecutor::new(
    &mut harness,
    tuple_list!(edges_observer, time_observer),
    &mut fuzzer,
    &mut state,
    &mut mgr,
)
.unwrap();

How exactly this works internally is something I will study later.

Running the fuzzer

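This part mainly adds a mechanism similar to __AFL_LOOP: it sets how many iterations run before the process restarts and hands the state over for that restart: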

fuzzer
    .fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 10000)
    .unwrap();
 
// Since we're using fuzz_loop_for in a restarting scenario to only run for n iterations
// before exiting, we need to ensure we call on_restart() and pass it the state. This way, the
// state will be available in the next, respawned, iteration.
mgr.on_restart(&mut state).unwrap();
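
The restart cycle described in the comments (None on the first pass, the saved state on every later pass) can be mimicked with the standard library. This is only a sketch of the control flow. State, run_client, and run_with_restarts are hypothetical, and passing the state through an Option stands in for LibAFL's shared-memory hand-over.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
struct State {
    executions: u64,
}

// Stand-in for fuzzer.fuzz_loop_for: run a fixed number of iterations.
fn fuzz_loop_for(state: &mut State, iters: u64) {
    for _ in 0..iters {
        state.executions += 1;
    }
}

// One process lifetime: restore or create the state, fuzz, hand it back.
fn run_client(restored: Option<State>, iters: u64) -> State {
    // First pass: nothing to restore, so start from a default state.
    let mut state = restored.unwrap_or_default();
    fuzz_loop_for(&mut state, iters);
    // Returning the state plays the role of mgr.on_restart(&mut state).
    state
}

// Stand-in for the restarting manager: respawn the client several times.
fn run_with_restarts(respawns: usize, iters: u64) -> State {
    let mut saved: Option<State> = None;
    for _ in 0..respawns {
        saved = Some(run_client(saved.take(), iters));
    }
    saved.unwrap_or_default()
}
```

Three respawns of 10000 iterations each end with 30000 executions in the state, because every respawn continues from what the previous one handed over.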

The author drives the whole process with cargo make, describing each step in Makefile.toml:

exercise-1.5/Makefile.toml
[tasks.clean]
dependencies = ["cargo-clean", "afl-clean", "clean-xpdf"]
 
[tasks.afl-clean]
script = '''
rm -rf .cur_input* timeouts fuzzer fuzzer.o libexerciseonepointfive.a
'''
 
[tasks.clean-xpdf]
cwd = "xpdf"
script = """
make --silent clean
rm -rf built-with-* ../build/*
"""
 
[tasks.cargo-clean]
command = "cargo"
args = ["clean"]
 
[tasks.rebuild]
dependencies = ["afl-clean", "clean-xpdf", "build-compilers", "build-xpdf", "build-fuzzer"]
 
[tasks.build-compilers]
script = """
cargo build --release
cp -f ../target/release/libexerciseonepointfive.a .
"""
 
[tasks.build-xpdf]
cwd = "build"
script = """
cmake ../xpdf -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$(pwd)/../../target/release/compiler -DCMAKE_CXX_COMPILER=$(pwd)/../../target/release/compiler_pp
make
"""
 
[tasks.build-fuzzer]
script = """
../target/release/compiler_pp -I xpdf/goo -I xpdf/fofi -I xpdf/splash -I xpdf/xpdf -I xpdf -o fuzzer harness.cc build/*/*.a -lm -ldl -lpthread -lstdc++ -lgcc -lutil -lrt
"""

Then run cargo make rebuild to execute all the steps and produce the fuzzer binary.

Finally, run the compiled fuzzer in two separate terminal windows; the one started first acts as the server.

Summary

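Using libafl, the author builds a complete fuzzing pipeline that supports both forkserver-based and persistent-mode fuzzing. These notes briefly summarize how the libafl components are stacked together and how they are used, which is enough to show how much freedom libafl offers.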

  1. Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf

  2. Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I



Source: https://5ec.top/00-notes/00-fuzz/libafl/fuzzing101/study-note-1