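Notes based on Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf and Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I. LibAFL gives you a great deal of freedom, which I think makes its learning curve fairly steep, so this time I won't dig into every detail.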
Reproduction
First, download xpdf:
```shell
cd fuzzing-101-solutions/exercise-1
wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
tar xvf xpdf-3.02.tar.gz
rm xpdf-3.02.tar.gz
mv xpdf-3.02 xpdf
```
In essence, build.rs does the following:
```shell
# these are example commands that will be executed automatically by build.rs
# and were taken almost verbatim from Fuzzing101's README
cd fuzzing-101-solutions/exercise-1/xpdf
make clean
rm -rf install
export LLVM_CONFIG=llvm-config-15
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --prefix=./install
make
make install
```
I'll look at how it is actually implemented later. For now I'm just copying it.
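The author's build.rs is not reproduced here. To give a rough idea of how a build script could drive these same commands, here is a std-only sketch built on `std::process::Command`. The helper `xpdf_build_commands` is hypothetical. The sketch only prints the plan and does not run it. It also derives an absolute `--prefix` from `CARGO_MANIFEST_DIR` instead of using `./install`.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical helper (not the author's build.rs): builds, without running,
/// the command sequence from the shell snippet above. Every step runs in `xpdf_dir`.
pub fn xpdf_build_commands(xpdf_dir: &Path) -> Vec<Command> {
    // Absolute prefix so that `make install` ends up in xpdf/install.
    let prefix = format!("--prefix={}", xpdf_dir.join("install").display());

    let mut clean = Command::new("make");
    clean.arg("clean");

    let mut rm_install = Command::new("rm");
    rm_install.args(["-rf", "install"]);

    let mut configure = Command::new("./configure");
    configure
        .arg(prefix)
        .env("CC", "afl-clang-fast")
        .env("CXX", "afl-clang-fast++");

    let make = Command::new("make");

    let mut make_install = Command::new("make");
    make_install.arg("install");

    let mut steps = vec![clean, rm_install, configure, make, make_install];
    for step in steps.iter_mut() {
        // Equivalent of `cd xpdf` plus `export LLVM_CONFIG=llvm-config-15`.
        step.current_dir(xpdf_dir).env("LLVM_CONFIG", "llvm-config-15");
    }
    steps
}

fn main() {
    // Inside a real build.rs, Cargo sets CARGO_MANIFEST_DIR to the crate root.
    let root = std::env::var("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("."));
    for step in xpdf_build_commands(&root.join("xpdf")) {
        // Dry run: print each step. A real build.rs would execute it, e.g.
        // assert!(step.status().expect("failed to spawn").success());
        println!("{:?}", step);
    }
}
```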
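After copying the code, I found that the default LibAFL version was 0.10.1, which wouldn't compile, so I switched to 0.13.2. A lot had changed in between. For example, `libafl::bolts` became the separate `libafl_bolts` crate, and the executors changed as well: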
> We deleted `TimeoutExecutor` and `TimeoutForkserverExecutor` and make it mandatory for `InProcessExecutor` and `ForkserverExecutor` to have the timeout. Now `InProcessExecutor` and `ForkserverExecutor` have the default timeout of 5 seconds.
After fixing a pile of issues by following the official example code, it builds and runs:
```shell
cd exercise-1
cargo build --release
./target/release/exercise-one-solution
```
To fuzz a different program, you only need to change the executor's parameters. Here they are:
```rust
let mut executor = ForkserverExecutor::builder()
    // the target binary
    .program("./xpdf/install/bin/pdftotext")
    // "@@" is replaced by the path of the file holding the current input
    .parse_afl_cmdline(["@@"])
    .coverage_map_size(MAP_SIZE)
    .build(tuple_list!(time_observer, edges_observer))
    .unwrap();
```
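`parse_afl_cmdline` follows AFL's convention, in which `@@` stands for the file containing the current test case. Below is a std-only model of that substitution. It is not LibAFL's implementation. The function `expand_afl_cmdline` and the `.cur_input` file name are illustrative only.

```rust
/// Hypothetical model of AFL's "@@" convention (not LibAFL code): every "@@"
/// argument is replaced by the path of the file that holds the current input.
/// Returns the expanded argv and whether the target reads its input from a file.
pub fn expand_afl_cmdline(args: &[&str], input_path: &str) -> (Vec<String>, bool) {
    let mut reads_file = false;
    let argv: Vec<String> = args
        .iter()
        .map(|arg| {
            if *arg == "@@" {
                reads_file = true;
                input_path.to_string()
            } else {
                arg.to_string()
            }
        })
        .collect();
    // Without "@@", an AFL-style fuzzer would feed the input via stdin instead.
    (argv, reads_file)
}

fn main() {
    // ".cur_input" is only an example file name here.
    let (argv, reads_file) = expand_afl_cmdline(&["@@"], ".cur_input");
    println!("pdftotext {:?} (input via file: {})", argv, reads_file);
}
```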
Workflow
Let's walk through the workflow:
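Corpus
- `corpus_dirs`: the seed directories;
- `input_corpus`: the corpus kept in memory;
- `timeouts_corpus`: the corpus of inputs that meet the objective;

Observer
- `time_observer`: records execution time;
- `edges_observer`: records edge coverage for each execution;

Feedback
- `feedback`: the feedback that selects interesting inputs. It combines a `MaxMapFeedback` on `edges_observer` with a `TimeFeedback` on `time_observer`;
- `objective`: the feedback that selects inputs meeting our goal (a timeout or a crash);

Monitor: keeps track of all fuzzing clients
- `monitor`: here a `SimpleMonitor`, which sends reports to the terminal;

Event Manager
- `mgr`: one of the three core components. Here `monitor` is used to build the simplest `SimpleEventManager`;

State
- `state`: one of the three core components. It holds the information the fuzzer needs while running;
- it combines `input_corpus`, `timeouts_corpus`, `feedback` and `objective`;

Scheduler
- `scheduler`: the scheduling policy. The author uses `IndexesLenTimeMinimizerScheduler`, which favors the fastest and smallest seeds;

Fuzzer
- `fuzzer`: one of the three core components. It drives seed generation and processes the state and feedback after each execution;
- it combines `scheduler`, `feedback` and `objective`;

Executor
- `executor`: the executor;
- it specifies the program to run and its arguments;
- it combines `time_observer` and `edges_observer`;

Load the corpus

Mutator

Stage
- `stage`: an operation applied to a single input. Here it uses the mutator to mutate the input;

Run the Fuzzer
- combines `stages`, `executor`, `state` and `mgr`.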
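The components above can be made concrete with a deliberately tiny, std-only model of the same loop. It shows which part feeds which:

- the seeds are loaded and evaluated;
- a scheduler picks a corpus entry;
- a mutator changes it;
- the executor runs the target;
- the objective decides whether the result is a solution;
- the feedback decides whether it joins the corpus.

All names here (`ToyFuzzer`, `toy_target`, the `PDF!` magic) are hypothetical, and this is not how LibAFL is implemented.

```rust
use std::collections::HashSet;

// Toy model (hypothetical names), not LibAFL code.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitKind {
    Ok,
    Crash,
}

/// Hypothetical target: "crashes" on inputs starting with b"PDF!" and reports one
/// edge per matched magic byte, so partial progress shows up as new coverage.
pub fn toy_target(input: &[u8]) -> (Vec<usize>, ExitKind) {
    let mut edges = vec![0];
    for (i, &byte) in b"PDF!".iter().enumerate() {
        if input.get(i) != Some(&byte) {
            return (edges, ExitKind::Ok);
        }
        edges.push(i + 1);
    }
    (edges, ExitKind::Crash)
}

pub struct ToyFuzzer {
    pub corpus: Vec<Vec<u8>>,    // role of input_corpus
    pub solutions: Vec<Vec<u8>>, // role of the objective corpus (timeouts_corpus)
    seen_edges: HashSet<usize>,  // feedback history: edges covered so far
    rng: u64,                    // xorshift64 state, must be non-zero
}

impl ToyFuzzer {
    /// "Load the corpus": every seed is executed and evaluated once.
    pub fn new(seeds: Vec<Vec<u8>>, rng_seed: u64) -> Self {
        let mut fuzzer = ToyFuzzer {
            corpus: Vec::new(),
            solutions: Vec::new(),
            seen_edges: HashSet::new(),
            rng: rng_seed.max(1),
        };
        for seed in seeds {
            fuzzer.evaluate(seed);
        }
        fuzzer
    }

    fn next_rand(&mut self) -> u64 {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        self.rng
    }

    /// Executor + objective + feedback for a single input.
    fn evaluate(&mut self, input: Vec<u8>) {
        let (edges, exit) = toy_target(&input);
        if exit == ExitKind::Crash {
            self.solutions.push(input); // objective met: keep as a solution
            return;
        }
        let mut novel = false;
        for edge in edges {
            novel |= self.seen_edges.insert(edge);
        }
        if novel {
            self.corpus.push(input); // feedback says "interesting": keep in corpus
        }
    }

    /// Fuzz loop: scheduler -> mutational stage -> executor/feedback/objective.
    pub fn fuzz_for(&mut self, iterations: usize) {
        for _ in 0..iterations {
            if self.corpus.is_empty() {
                return;
            }
            // Scheduler: uniform random pick (LibAFL's schedulers are smarter).
            let idx = (self.next_rand() % self.corpus.len() as u64) as usize;
            let mut input = self.corpus[idx].clone();
            // Mutator: overwrite one random byte.
            if !input.is_empty() {
                let pos = (self.next_rand() % input.len() as u64) as usize;
                input[pos] = self.next_rand() as u8;
            }
            self.evaluate(input);
        }
    }
}

fn main() {
    let mut fuzzer = ToyFuzzer::new(vec![b"AAAA".to_vec()], 0x9E37_79B9_7F4A_7C15);
    fuzzer.fuzz_for(1_000_000);
    println!("corpus: {}, solutions: {}", fuzzer.corpus.len(), fuzzer.solutions.len());
}
```

The coverage feedback is what makes the toy work. Each correctly guessed magic byte adds an edge, so the partially matching input is kept and mutated further instead of being guessed from scratch.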
Modifications
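After running for a while, I saw only timeouts and no crashes. The reason is that the author configured only `TimeoutFeedback` as the objective, while on a fast machine the target crashes before it ever reaches the timeout. So I suggest adding `CrashFeedback` as well. First, let's understand the original Feedback and how it is used: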
```rust
// A Feedback, in most cases, processes the information reported by one or more observers to
// decide if the execution is interesting. This one is composed of two Feedbacks using a logical
// OR.
//
// Due to the fact that TimeFeedback can never classify a testcase as interesting on its own,
// we need to use it alongside some other Feedback that has the ability to perform said
// classification. These two feedbacks are combined to create a boolean formula, i.e. if the
// input triggered a new code path, OR, false.
let mut feedback = feedback_or!(
    // New maximization map feedback (attempts to maximize the map contents) linked to the
    // edges observer. This one tracks indexes but not novelties (in recent LibAFL versions,
    // index tracking is enabled on the observer via track_indices()).
    MaxMapFeedback::new(&edges_observer),
    // Time feedback, this one never returns true for is_interesting. However, it does keep
    // track of testcase execution time by way of its TimeObserver
    TimeFeedback::new(&time_observer)
);
```
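Two Feedbacks are combined here. As the comments explain, `TimeFeedback` cannot decide on its own whether a test case is interesting. The `feedback_or!` macro therefore combines it with `MaxMapFeedback`, and an input counts as interesting if `MaxMapFeedback` finds that it triggered a new path.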
At first I misread `TimeFeedback` as `TimeoutFeedback`, but they are different things. `TimeFeedback` never returns true, but it does track each input's execution time.
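The choice of `feedback_or!` over `feedback_or_fast!` matters here. The plain variant evaluates both feedbacks, while the `_fast` variants short-circuit, so the time is always recorded. Below is a std-only toy model of that combination. `MaxMapModel`, `TimeModel` and `eager_or` are hypothetical names. Real LibAFL handles history updates and metadata in more involved ways.

```rust
use std::time::Duration;

// Toy model (hypothetical names), not LibAFL code.
/// Max-map feedback: interesting if any map entry exceeds the largest value
/// previously recorded at that index. This toy updates its history immediately.
pub struct MaxMapModel {
    history: Vec<u8>,
}

impl MaxMapModel {
    pub fn new(size: usize) -> Self {
        MaxMapModel { history: vec![0; size] }
    }

    pub fn is_interesting(&mut self, map: &[u8]) -> bool {
        let mut interesting = false;
        for (seen, &value) in self.history.iter_mut().zip(map) {
            if value > *seen {
                *seen = value;
                interesting = true;
            }
        }
        interesting
    }
}

/// Time feedback: never interesting, but remembers the last execution time.
#[derive(Default)]
pub struct TimeModel {
    pub last_exec_time: Option<Duration>,
}

impl TimeModel {
    pub fn is_interesting(&mut self, exec_time: Duration) -> bool {
        self.last_exec_time = Some(exec_time);
        false
    }
}

/// Non-short-circuiting OR (the feedback_or! flavour): both sides always run,
/// so the time is recorded even when the map side already said "interesting".
pub fn eager_or(map_fb: &mut MaxMapModel, time_fb: &mut TimeModel, map: &[u8], t: Duration) -> bool {
    let new_coverage = map_fb.is_interesting(map);
    let time_says_yes = time_fb.is_interesting(t); // always false
    new_coverage | time_says_yes
}

fn main() {
    let mut map_fb = MaxMapModel::new(4);
    let mut time_fb = TimeModel::default();
    let first = eager_or(&mut map_fb, &mut time_fb, &[1, 0, 0, 0], Duration::from_millis(3));
    let again = eager_or(&mut map_fb, &mut time_fb, &[1, 0, 0, 0], Duration::from_millis(5));
    println!("{} {} {:?}", first, again, time_fb.last_exec_time);
}
```

In this toy a higher hit count at an already-seen index (1 → 2) also counts as interesting. That is the "maximization" part of the name.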
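Being interesting is not enough, though. We also want to keep the inputs that meet our goal, and here the author keeps the seeds that trigger a timeout: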
```rust
// A feedback is used to choose if an input should be added to the corpus or not. In the case
// below, we're saying that in order for a testcase's input to be added to the corpus, it must:
//   1: be a timeout
//       AND
//   2: have created new coverage of the binary under test
//
// The goal is to do similar deduplication to what AFL does
//
// The feedback_and_fast macro combines the two feedbacks with a fast AND operation, which
// means only enough feedback functions will be called to know whether or not the objective
// has been met, i.e. short-circuiting logic.
let mut objective =
    feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));
```
Here the author uses `feedback_and_fast!` to impose two constraints: the input must time out, and it must discover a new path. This gives AFL-like deduplication.
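The short-circuiting also affects which history the coverage check sees. Below is a std-only toy model of "timeout AND new coverage" that counts how often the right-hand side actually runs. `TimeoutWithNewCoverage` is a hypothetical name and does not mirror LibAFL's internals.

```rust
use std::collections::HashSet;

// Toy model (hypothetical names), not LibAFL code.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitKind {
    Ok,
    Timeout,
}

/// Models feedback_and_fast!(TimeoutFeedback, MaxMapFeedback): coverage is only
/// consulted for executions that timed out.
#[derive(Default)]
pub struct TimeoutWithNewCoverage {
    seen_edges: HashSet<usize>,
    pub coverage_checks: usize, // how often the right-hand side actually ran
}

impl TimeoutWithNewCoverage {
    pub fn is_solution(&mut self, exit: ExitKind, edges: &[usize]) -> bool {
        // `&&` short-circuits: non-timeouts never reach the coverage check.
        exit == ExitKind::Timeout && self.has_new_coverage(edges)
    }

    fn has_new_coverage(&mut self, edges: &[usize]) -> bool {
        self.coverage_checks += 1;
        let mut novel = false;
        for &edge in edges {
            novel |= self.seen_edges.insert(edge);
        }
        novel
    }
}

fn main() {
    let mut objective = TimeoutWithNewCoverage::default();
    println!("{}", objective.is_solution(ExitKind::Ok, &[1, 2]));
    println!("{}", objective.is_solution(ExitKind::Timeout, &[1, 2]));
    println!("{}", objective.is_solution(ExitKind::Timeout, &[1, 2]));
}
```

Because the timeout check comes first, this toy's coverage history only ever contains edges seen in timeouts. Deduplication therefore happens among timeouts only: a second timeout along an already-seen path is dropped.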
Finally, both `feedback` and `objective` are passed to `state`:
```rust
//
// Component: State
//
// Creates a new State, taking ownership of all of the individual components during fuzzing.
//
// This part uses a SimpleEventManager, so there is no state from a previous run to restore
// and a fresh StdState is created directly.
let mut state = StdState::new(
    // random number generator with a time-based seed
    StdRand::with_seed(current_nanos()),
    input_corpus,
    timeouts_corpus,
    // States of the feedbacks that store the data related to the feedbacks that should be
    // persisted in the State.
    &mut feedback,
    &mut objective,
)
.unwrap();
```
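The first argument is the random number generator, the second is the corpus, and the third is the corpus where inputs that meet the objective are saved. The last two are the feedbacks: `feedback` decides which seeds are interesting and kept in the corpus, and `objective` decides which seeds are saved as solutions.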
With the author's intent understood, the next step is obvious. Besides timeouts, we also want inputs that cause crashes, and `CrashFeedback` is exactly what we need. So how do we use it?
Following the official example again, it uses the `feedback_or_fast!` macro to select seeds that trigger either a crash or a timeout:
```rust
// A feedback to choose if an input is a solution or not
let mut objective = feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());
```
We follow the same pattern: import the relevant items and make the change. After rerunning the fuzzer, inputs that trigger crashes are now saved.
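Below is a std-only toy model of what this objective does with a stream of executions. `crash_or_timeout` and `collect_solutions` are hypothetical helpers, not LibAFL APIs. The key point is that the objective looks only at how the execution ended. Nothing in it deduplicates, so two different inputs that hit the same bug are both kept.

```rust
// Toy model (hypothetical names), not LibAFL code.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitKind {
    Ok,
    Crash,
    Timeout,
}

/// Models feedback_or_fast!(CrashFeedback, TimeoutFeedback): the timeout check
/// only runs when the crash check returned false.
pub fn crash_or_timeout(exit: ExitKind) -> bool {
    exit == ExitKind::Crash || exit == ExitKind::Timeout
}

/// Keeps every input whose execution satisfied the objective, without any
/// deduplication.
pub fn collect_solutions(runs: &[(Vec<u8>, ExitKind)]) -> Vec<Vec<u8>> {
    runs.iter()
        .filter(|(_, exit)| crash_or_timeout(*exit))
        .map(|(input, _)| input.clone())
        .collect()
}

fn main() {
    let runs = vec![
        (b"ok".to_vec(), ExitKind::Ok),
        (b"crash-1".to_vec(), ExitKind::Crash),
        (b"slow".to_vec(), ExitKind::Timeout),
        (b"crash-2".to_vec(), ExitKind::Crash),
    ];
    println!("{} solutions", collect_solutions(&runs).len());
}
```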
Coming back to this post a few days later, I had already forgotten the build commands, so I'm recording them here:
```shell
cargo build --release
./target/release/exercise-one-solution
```
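Open questions
That leaves the following questions:
1. Judging from the arguments to `state`, timeouts and crashes seem to be saved in the same directory. Under LibAFL's design philosophy this directory holds the inputs that meet our objective. Can crashes and timeouts be saved to separate directories?
2. After inputs are saved via `feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());`, how does LibAFL deduplicate them? The earlier `feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));` deduplicates based on whether a new path was found. Here, will an input that times out without finding a new path still be saved? If everything gets saved, could we wrap the `feedback_or_fast!` in a `feedback_and_fast!` together with a `MaxMapFeedback` to deduplicate?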
I'll work out the answers to these questions as I keep learning.
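The idea in the second question can at least be sketched as a toy. An input counts as a solution only if it crashes or times out and covers an edge that no earlier solution covered. This is close in spirit to how AFL counts unique crashes. In LibAFL this would presumably be written as `feedback_and_fast!(feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new()), MaxMapFeedback::new(&edges_observer))`, which I have not tested. The std-only model below uses the hypothetical name `DedupObjective`.

```rust
use std::collections::HashSet;

// Toy model (hypothetical names), not LibAFL code.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitKind {
    Ok,
    Crash,
    Timeout,
}

/// (crash OR timeout) AND "covers an edge no earlier solution covered".
#[derive(Default)]
pub struct DedupObjective {
    solution_edges: HashSet<usize>,
}

impl DedupObjective {
    pub fn is_solution(&mut self, exit: ExitKind, edges: &[usize]) -> bool {
        let is_bug = matches!(exit, ExitKind::Crash | ExitKind::Timeout);
        // Short-circuit: the coverage history is only updated for crashes/timeouts.
        is_bug && self.record_new_edges(edges)
    }

    fn record_new_edges(&mut self, edges: &[usize]) -> bool {
        let mut novel = false;
        for &edge in edges {
            novel |= self.solution_edges.insert(edge);
        }
        novel
    }
}

fn main() {
    let mut objective = DedupObjective::default();
    println!("{}", objective.is_solution(ExitKind::Crash, &[1, 2, 3])); // kept
    println!("{}", objective.is_solution(ExitKind::Crash, &[1, 2, 3])); // same path: dropped
}
```

One design consequence of this toy is that crashes and timeouts share one coverage history. A timeout along a path that a crash already covered would be dropped, and vice versa.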
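Speeding up
We can speed up fuzzing by using persistent mode instead of a forkserver. In Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I, the author modifies the xpdf source and writes a harness (harness.cc), which I won't describe here.
Since the fuzzer is now built as a library, rename the earlier main.rs to lib.rs. Next, let's see what changed compared with the workflow above.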
Observer
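To compile the harness, the author uses `libafl_cc` instead of the `afl-clang-[fast|lto]` used earlier. With the traditional `afl-clang-[fast|lto]`, the coverage map lives in shared memory whose ID is passed to the target in the `__AFL_SHM_ID` environment variable, and that is where LibAFL reads coverage from. With `libafl_cc`, the observer instead uses the `EDGES_MAP` exposed by `libafl_targets`: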
```rust
let edges_observer =
    HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();
```
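`HitcountsMapObserver` wraps the raw edges map and post-processes it with AFL-style hit-count buckets. Because of the bucketing, a loop running 5 times instead of 6 does not count as new coverage, while 3 versus 4 does. Below is a std-only sketch of the bucketing using AFL's classic table. LibAFL's own implementation uses lookup tables and may differ in detail, and `classify_count`/`classify_map` are hypothetical names.

```rust
// Sketch of AFL's hit-count classification (hypothetical helper names).
/// Collapses a raw edge hit counter into one of AFL's buckets.
pub fn classify_count(count: u8) -> u8 {
    match count {
        0 => 0,
        1 => 1,
        2 => 2,
        3 => 4,
        4..=7 => 8,
        8..=15 => 16,
        16..=31 => 32,
        32..=127 => 64,
        128..=255 => 128,
    }
}

/// Applies the classification in place to a whole coverage map.
pub fn classify_map(map: &mut [u8]) {
    for entry in map.iter_mut() {
        *entry = classify_count(*entry);
    }
}

fn main() {
    let mut map = [0u8, 1, 3, 10, 40];
    classify_map(&mut map);
    println!("{:?}", map);
}
```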
Monitor
To keep the target's printed output from getting mixed up with the fuzzer's, the author replaces `SimpleMonitor` with `MultiMonitor`, which displays and aggregates statistics for each client.
```rust
let monitor = MultiMonitor::new(|s| {
    println!("{}", s);
});
```
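The closure passed to `MultiMonitor::new` decides where the report lines go. The monitor itself keeps the per-client numbers. Below is a std-only toy with the same shape. `ToyMultiMonitor` and its output format are invented for illustration and are not `MultiMonitor`'s real format.

```rust
use std::collections::BTreeMap;

// Toy model (hypothetical names and output format), not LibAFL code.
/// Keeps per-client statistics, aggregates them, and hands each formatted line
/// to a user-supplied closure, like `|s| println!("{}", s)` above.
pub struct ToyMultiMonitor<F: FnMut(&str)> {
    print_fn: F,
    execs_per_client: BTreeMap<u32, u64>,
}

impl<F: FnMut(&str)> ToyMultiMonitor<F> {
    pub fn new(print_fn: F) -> Self {
        ToyMultiMonitor { print_fn, execs_per_client: BTreeMap::new() }
    }

    /// Records the latest execution count reported by `client`, then prints a
    /// per-client line together with the global total.
    pub fn report(&mut self, client: u32, execs: u64) {
        self.execs_per_client.insert(client, execs);
        let total: u64 = self.execs_per_client.values().sum();
        let line = format!("[client #{client}] execs: {execs} | global execs: {total}");
        (self.print_fn)(&line);
    }
}

fn main() {
    let mut monitor = ToyMultiMonitor::new(|s: &str| println!("{}", s));
    monitor.report(1, 100);
    monitor.report(2, 50);
    monitor.report(1, 300);
}
```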
Event Manager
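With `MultiMonitor`, two fuzzer instances need to be started. According to the author, the first fuzzer started is also a client. But it opens a port and listens for messages from the other clients, so to me it looks much more like a server: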
```rust
let (state, mut mgr) = match setup_restarting_mgr_std(monitor, 1337, EventConfig::AlwaysUnique) {
    Ok(res) => res,
    Err(err) => match err {
        Error::ShuttingDown => {
            return Ok(());
        }
        _ => {
            panic!("Failed to setup the restarting manager: {}", err);
        }
    },
};
```
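As far as I understand LibAFL's LLMP layer, roles are decided by the port. The first process that manages to bind it becomes the broker, and later processes connect to it as clients. That supports reading the first instance as the server. Below is a std-only sketch of that idea. `choose_role` and `Role` are hypothetical and do not reflect LLMP's actual code.

```rust
use std::net::TcpListener;

// Toy model (hypothetical names), not LibAFL/LLMP code.
/// Which role a freshly started fuzzer process ends up with.
pub enum Role {
    /// We bound the port, so we listen for other instances.
    Broker(TcpListener),
    /// Someone else already owns the port; we would connect to it.
    Client(u16),
}

/// Try to bind the broker port on localhost. If that fails (typically because
/// another instance is already listening), act as a client instead.
pub fn choose_role(port: u16) -> Role {
    match TcpListener::bind(("127.0.0.1", port)) {
        Ok(listener) => Role::Broker(listener),
        Err(_) => Role::Client(port),
    }
}

fn main() {
    // 1337 mirrors the port passed to setup_restarting_mgr_std above.
    match choose_role(1337) {
        Role::Broker(_) => println!("port 1337 was free: acting as broker"),
        Role::Client(port) => println!("port {} is taken: acting as client", port),
    }
}
```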
Harness
This component does not exist in the forkserver setup; it is built specifically for `InProcessExecutor`. The `libfuzzer_test_one_input` it calls invokes the `LLVMFuzzerTestOneInput` we write in the harness:
```rust
let mut harness = |input: &BytesInput| {
    let target = input.target_bytes();
    let buffer = target.as_slice();
    libfuzzer_test_one_input(buffer);
    ExitKind::Ok
};
```
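In-process fuzzing calls this closure over and over in the same process instead of forking a fresh target for every input. Any state the target keeps therefore survives between inputs, which is why persistent-mode harnesses need to reset it. Below is a std-only toy of that behaviour. `FakeTarget` stands in for the C code behind `LLVMFuzzerTestOneInput`, and `run_in_process` is a hypothetical helper.

```rust
// Toy model (hypothetical names), not LibAFL code.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitKind {
    Ok,
}

/// Stand-in for the target behind LLVMFuzzerTestOneInput. Its counters show
/// that state persists across calls within one process.
#[derive(Default)]
pub struct FakeTarget {
    pub calls: u64,
    pub bytes_seen: usize,
}

impl FakeTarget {
    pub fn test_one_input(&mut self, data: &[u8]) -> i32 {
        self.calls += 1;
        self.bytes_seen += data.len();
        0 // libFuzzer-style targets return 0
    }
}

/// Calls the harness for each input in the current process, without forking.
pub fn run_in_process<H: FnMut(&[u8]) -> ExitKind>(harness: &mut H, inputs: &[&[u8]]) -> Vec<ExitKind> {
    inputs.iter().map(|input| harness(*input)).collect()
}

fn main() {
    let mut target = FakeTarget::default();
    // Same shape as the LibAFL harness: take the bytes, call the target, report Ok.
    let mut harness = |buffer: &[u8]| {
        target.test_one_input(buffer);
        ExitKind::Ok
    };
    let inputs: [&[u8]; 2] = [b"%PDF-1.4", b"not a pdf"];
    let results = run_in_process(&mut harness, &inputs);
    println!("{:?}, calls = {}", results, target.calls);
}
```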
Executor
Since we are now using persistent mode, the executor changes too. Compared with `ForkserverExecutor`, `InProcessExecutor` needs more components:
```rust
let mut executor = InProcessExecutor::new(
    &mut harness,
    tuple_list!(edges_observer, time_observer),
    &mut fuzzer,
    &mut state,
    &mut mgr,
)
.unwrap();
```
I'll study exactly how this works later.
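Running the Fuzzer
The main addition here is a mechanism similar to AFL's `__AFL_LOOP`. It sets how many iterations run before the process restarts and performs the manual restart: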
```rust
fuzzer
    .fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 10000)
    .unwrap();

// Since we're using this fuzz_loop_for in a restarting scenario to only run for n iterations
// before exiting, we need to ensure we call on_restart() and pass it the state. This way, the
// state will be available in the next, respawned, iteration.
mgr.on_restart(&mut state).unwrap();
```
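The comment describes a hand-off. `fuzz_loop_for` returns after a fixed number of iterations, and `on_restart` stores the state so that the respawned process can pick it up. Below is a std-only toy of that hand-off. A `String` stands in for the shared memory the restarting manager uses. `ToyState` and `run_one_process` are hypothetical, and a real `StdState` carries far more than a counter.

```rust
// Toy model (hypothetical names), not LibAFL code.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ToyState {
    pub executions: u64,
    pub generation: u32,
}

impl ToyState {
    fn serialize(&self) -> String {
        format!("{} {}", self.executions, self.generation)
    }

    fn deserialize(saved: &str) -> Option<Self> {
        let mut parts = saved.split_whitespace();
        Some(ToyState {
            executions: parts.next()?.parse().ok()?,
            generation: parts.next()?.parse().ok()?,
        })
    }
}

/// One process lifetime: restore the saved state (if any), run `iterations`
/// executions (stand-in for fuzz_loop_for), then hand the state back
/// (stand-in for mgr.on_restart(&mut state)).
pub fn run_one_process(saved: Option<&str>, iterations: u64) -> String {
    let mut state = saved.and_then(ToyState::deserialize).unwrap_or_default();
    state.generation += 1;
    for _ in 0..iterations {
        state.executions += 1;
    }
    state.serialize()
}

fn main() {
    // The "parent" respawns the fuzzer three times, passing the saved state along.
    let mut saved: Option<String> = None;
    for _ in 0..3 {
        saved = Some(run_one_process(saved.as_deref(), 10_000));
    }
    println!("{}", saved.unwrap());
}
```

Without the `on_restart` step, which corresponds to passing `None` here, every respawned process would start again from an empty state.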
The author uses cargo make to drive the whole process, with each step defined in `Makefile.toml`:
exercise-1.5/Makefile.toml:
```toml
[tasks.clean]
dependencies = ["cargo-clean", "afl-clean", "clean-xpdf"]

[tasks.afl-clean]
script = '''
rm -rf .cur_input* timeouts fuzzer fuzzer.o libexerciseonepointfive.a
'''

[tasks.clean-xpdf]
cwd = "xpdf"
script = """
make --silent clean
rm -rf built-with-* ../build/*
"""

[tasks.cargo-clean]
command = "cargo"
args = ["clean"]

[tasks.rebuild]
dependencies = ["afl-clean", "clean-xpdf", "build-compilers", "build-xpdf", "build-fuzzer"]

[tasks.build-compilers]
script = """
cargo build --release
cp -f ../target/release/libexerciseonepointfive.a .
"""

[tasks.build-xpdf]
cwd = "build"
script = """
cmake ../xpdf -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$(pwd)/../../target/release/compiler -DCMAKE_CXX_COMPILER=$(pwd)/../../target/release/compiler_pp
make
"""

[tasks.build-fuzzer]
script = """
../target/release/compiler_pp -I xpdf/goo -I xpdf/fofi -I xpdf/splash -I xpdf/xpdf -I xpdf -o fuzzer harness.cc build/*/*.a -lm -ldl -lpthread -lstdc++ -lgcc -lutil -lrt
"""
```
Then run `cargo make rebuild` to execute all of the steps and produce the `fuzzer` binary.
Finally, run the compiled fuzzer in two separate windows. The instance started first acts as the server.
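Summary
Using LibAFL, the author builds a complete fuzzing workflow that supports both forkserver-based and persistent-mode fuzzing. These notes briefly summarize how the LibAFL components are stacked together and how they are used, which already shows how much freedom LibAFL offers.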