FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage

I. Paper Reading

Abstract

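FairFuzz rests on two key ideas: (1) it automatically identifies branches that are covered by only a few inputs (rare branches), which ordinary mutation seldom manages to reach; (2) a mutation mask algorithm that biases mutation toward producing inputs that still cover those rare branches.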

1. Introduction

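AFL often fails to cover key program functionality. For example, AFL did not cover the colorspace conversion code in djpeg, the attribute list processing code in xmllint, or a large number of packet structures in tcpdump.

FairFuzz automatically identifies rare branches and biases mutation toward inputs that cover them. Some parts of an input that already covers a rare branch (call them the critical parts) already satisfy the conditions needed to reach that branch, so those parts should not be mutated. FairFuzz identifies the critical parts of an input by running many small mutation experiments.

This mutation algorithm is orthogonal to techniques for passing magic-byte checks and to power scheduling (energy allocation) strategies.

Contributions of the paper:

It proposes a novel, lightweight mutation mask strategy
It implements FairFuzz
It evaluates FairFuzz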

2. Overview

Note that the paper uses the term "branch" for what AFL represents as an edge.

2.1 Limitations of AFL

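The authors ran AFL for 24 hours, 20 times, on the xmllint code snippet shown in the paper's figure. In only one run did AFL produce an input that got past line 1 of the snippet, and even that input did not get past line 6, so the large amount of code after line 31 was never explored. An example: once an input containing "<!ATTLIST BD" has been produced, the "<!ATTLIST" part is the critical part and should not be mutated any further, yet AFL tends to keep mutating it.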

2.2 Overview of FairFuzz

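Identify rare branches, distinguishing them by the number of inputs that cover each branch; once identified, they can be targeted. A further point is that the code guarded by rare branches tends to be far less explored than the code guarded by non-rare branches.

Change the mutation strategy: use the deterministic mutation stage to approximately determine the critical parts of the input that should not be mutated, so that mutated test cases still cover the rare branch.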

3. FairFuzz Algorithm

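Two changes: (1) the strategy for selecting seeds from the queue for mutation is modified; (2) the way mutations are applied to those inputs is modified.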

3.1 Mutation Masking

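Given an input x and a fuzzing target T, satisfies(x, T) is true if and only if x satisfies T.

Definition 1: A mutation is a tuple (c, m), where m is the number of bytes mutated and c is one of the following three mutation types:

O: overwrite m bytes starting at position k with some values
I: insert m bytes at position k
D: delete m bytes starting at position k

For O and I, the byte values to write or insert must also be specified.

mutate(x, u, i) denotes applying the mutation u = (c, m) to input x at position i.

Definition 2: The mutation mask for input x and target T is a function $mask_{x,T}: N \rightarrow P(\{O, I, D\})$ that takes a mutation position i in x and returns a subset of $\{O, I, D\}$. A mutation type $c \in mask_{x,T}(i)$ if and only if $satisfies(mutate(x, (c, 1), i), T)$ is true. In other words, if applying mutation type c at position i of x yields an input that still satisfies T, then c is in $mask_{x,T}(i)$.

The mutation mask thus records, for each position i of x, which mutation types still produce an input that reaches the target code.

In the paper's algorithm listing, line 12 filters out mutations in the deterministic stage that would no longer reach the target code. In the havoc (random) stage, the mutation position is no longer chosen purely at random but through $randomOkToMutate(mask_{x,T}, u)$. This amounts to evaluating $okToMutate(mask_{x,T}, u, i)$ for the positions i in $[0, |x| - m - 1]$, where $|x|$ is the length of x: all positions that pass form a set, and one of them is picked uniformly at random as the mutation position. If that set is empty, the mutation is skipped (line 25) and a new $u = (c, m)$ is chosen in the next loop iteration.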

3.2 Targeting Rare Branches

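FairFuzz only mutates seeds that cover a rare branch.

Definition 3: $hits(x, b)$ means that input x covers branch b. The hit count of a branch is the number of inputs that cover it.

Definition 4: Let L be the set of all inputs generated so far. The hit count of branch b is $numHits[b] = |\{x \in L : hits(x, b)\}|$. Concretely, numHits is a map from branches to hit counts that is updated after every tested input. FairFuzz first runs one round of mutation on the seeds without a mutation mask in order to build numHits.

According to the paper's experiments, declaring the n least-covered branches rare, or declaring every branch covered by fewer than p% of inputs rare, does not work well. With n = 2, for example, the two least-covered branches would both count as rare even if their hit counts were 20 and 5000, which is clearly unreasonable. A percentage threshold, in turn, would have to be tuned for each target program. The authors therefore define a rare branch as one whose hit count is at most a dynamically chosen threshold.

Definition 5: Let $B_v = \{b \in B : numHits[b] > 0\}$, where B is the set of all branches in the target program, so $B_v$ is the set of branches covered so far. A branch b is rare if $numHits[b] \leq rarity\_cutoff$, where $rarity\_cutoff = 2^i$ and i satisfies $2^{i-1} < \min_{b' \in B_v} numHits[b'] \leq 2^i$.

For example, if the smallest hit count is 17, then $rarity\_cutoff = 2^5 = 32$.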
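
To make Definition 5 concrete, here is a minimal sketch of the cutoff computation. The function names (`rarity_cutoff`, `cutoff_for`, `is_rare`) and the plain array of hit counts are assumptions made for illustration; they are not taken from the paper or from the FairFuzz source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two 2^i such that 2^(i-1) < min_hits <= 2^i. */
static uint64_t rarity_cutoff(uint64_t min_hits) {
    uint64_t cutoff = 1;
    assert(min_hits > 0);
    while (cutoff < min_hits) cutoff <<= 1;
    return cutoff;
}

/* Cutoff for a table of hit counts, ignoring branches that were never hit
 * (those are outside B_v). Returns 0 when no branch has been hit yet. */
static uint64_t cutoff_for(const uint64_t *num_hits, size_t n) {
    uint64_t min_hits = 0;
    assert(num_hits != NULL);
    for (size_t b = 0; b < n; b++) {
        if (num_hits[b] == 0) continue;
        if (min_hits == 0 || num_hits[b] < min_hits) min_hits = num_hits[b];
    }
    return min_hits ? rarity_cutoff(min_hits) : 0;
}

/* A covered branch is rare when its hit count does not exceed the cutoff. */
static int is_rare(const uint64_t *num_hits, size_t b, uint64_t cutoff) {
    return num_hits[b] > 0 && num_hits[b] <= cutoff;
}
```

With hit counts {0, 20, 5000, 17}, `cutoff_for` returns 32, so the branches hit 17 and 20 times are rare while the one hit 5000 times is not. Unlike a fixed "n rarest" rule, the cutoff moves with the current minimum.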

Definition 6: $branches(x) = \{b \in B : hits(x, b)\}$ is the set of branches covered by input x. The least-hit branch covered by x is $b^* = \arg\min_{b \in branches(x)} numHits[b]$.

FairFuzz mutates an input only if its least-hit branch is a rare branch. (This is easy to see: if even the least-hit branch of an input is not rare, the input covers no rare branch at all.)
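
The selection rule from Definition 6 can be sketched as follows. The toy map size `NUM_BRANCHES`, the `covered` array, and both function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BRANCHES 8  /* toy map size, far smaller than a real coverage map */

/* Index of the least-hit branch covered by an input, or -1 if it covers none.
 * covered[b] != 0 stands for hits(x, b). */
static int rarest_branch(const uint8_t *covered, const uint64_t *num_hits) {
    int best = -1;
    assert(covered != NULL && num_hits != NULL);
    for (int b = 0; b < NUM_BRANCHES; b++) {
        if (!covered[b]) continue;
        if (best < 0 || num_hits[b] < num_hits[best]) best = b;
    }
    return best;
}

/* An input is picked for mutation only if its least-hit branch is rare. */
static int should_fuzz(const uint8_t *covered, const uint64_t *num_hits,
                       uint64_t cutoff) {
    int b = rarest_branch(covered, num_hits);
    return b >= 0 && num_hits[b] <= cutoff;
}
```

Checking only the least-hit branch is enough: if that branch is above the cutoff, every other branch the input covers is above it too.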

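Note that FairFuzz does not select inputs with this strategy alone; it also runs some cycles with AFL's default strategy.

The mask computation algorithm (a figure in the paper) works as follows. For each position i of input x, FairFuzz produces three mutants: $x^O$ (byte flip), $x^I$ (insert a random byte), and $x^D$ (delete a byte). Each mutant $x^c$ is run on the target program to check whether $hits(x^c, b)$ holds, and position i is marked with O, I, or D according to which mutants still cover b. Compared with AFL's deterministic stage, this adds only two mutation types (random byte insertion and byte deletion; byte flipping is already part of AFL), so the overhead is negligible.

Note that the O and I entries are approximations: FairFuzz does not check that every possible overwritten or inserted value still causes branch b to be covered, because doing so would be too expensive and would produce many redundant inputs (Section 4.2 goes into detail).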
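
The per-position experiments can be sketched on the "<!ATTLIST" example from Section 2.1. Everything here is a toy under stated assumptions: `hits_target` stands in for "running the target covers the rare branch", the flag values and `MAX_LEN` are made up, and a fixed byte is inserted instead of a random one so the result is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MASK_O = 1, MASK_D = 2, MASK_I = 4 };  /* hypothetical flag values */
#define MAX_LEN 64

/* Toy target: the "rare branch" is hit when the input starts with the keyword. */
static int hits_target(const uint8_t *buf, size_t len) {
    static const char kw[] = "<!ATTLIST";
    size_t n = sizeof(kw) - 1;
    return len >= n && memcmp(buf, kw, n) == 0;
}

/* Fills mask[0..len]; the extra slot at index len is the insert-at-end position. */
static void compute_mask(const uint8_t *x, size_t len, uint8_t *mask) {
    uint8_t tmp[MAX_LEN + 1];
    assert(len <= MAX_LEN);
    memset(mask, 0, len + 1);

    for (size_t i = 0; i <= len; i++) {
        if (i < len) {
            /* O: overwrite byte i with its bitwise complement */
            memcpy(tmp, x, len);
            tmp[i] ^= 0xFF;
            if (hits_target(tmp, len)) mask[i] |= MASK_O;

            /* D: delete byte i */
            memcpy(tmp, x, i);
            memcpy(tmp + i, x + i + 1, len - i - 1);
            if (hits_target(tmp, len - 1)) mask[i] |= MASK_D;
        }
        /* I: insert one byte before position i */
        memcpy(tmp, x, i);
        tmp[i] = '#';
        memcpy(tmp + i + 1, x + i, len - i);
        if (hits_target(tmp, len + 1)) mask[i] |= MASK_I;
    }
}
```

For the 12-byte input "<!ATTLIST BD", positions 0 to 8 (the keyword) end up with an empty mask, positions 9 to 11 allow all three mutation types, and the past-the-end position 12 allows insertion only. This matches the intuition that the keyword is the critical part. Because only one overwrite value and one inserted value are tried per position, the O and I entries are approximations, as noted above.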

3.3 Trimming Inputs for Testing Targets

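AFL keeps test cases small in two ways:

It favors shorter test cases
It trims test cases before mutation, minimizing them while preserving the same execution path

FairFuzz relaxes the trimming constraint: a trimmed input only has to cover the same target branch rather than follow exactly the same path. AFL decides whether to keep a trimming step by comparing checksums before and after it. If the checksums match, the contents of trace_bits are identical (the same edges are hit with the same hit counts; note that this is the bucketed trace_bits), so the trim is kept.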

4. Implementation and Evaluation

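4.1 Coverage Compared to Prior Techniques

The evaluation uses branch coverage as its metric. Why not the number of paths? Because the path count depends on the order in which inputs are discovered. Consider an example: input A covers branch i, input B covers branch j, and input C covers both i and j. If the fuzzer produces A and B first, C is not considered new coverage and is not recorded. Likewise, if C is produced first, neither A nor B is recorded.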
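
The order dependence is easy to reproduce with a toy model. The sketch below assumes a simplified "save an input only if it covers a branch not seen before" rule and represents each input as a bitset of covered branches; it ignores hit-count buckets, and `count_saved` is a hypothetical helper, not an AFL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each input is a bitset of covered branches (bit k set = branch k covered).
 * Returns how many inputs would be saved to the queue when they arrive in the
 * given order; *covered_out receives the union of covered branches. */
static size_t count_saved(const uint32_t *inputs, size_t n, uint32_t *covered_out) {
    uint32_t seen = 0;
    size_t saved = 0;
    assert(inputs != NULL);
    for (size_t k = 0; k < n; k++) {
        if (inputs[k] & ~seen) {   /* covers at least one unseen branch */
            saved++;
            seen |= inputs[k];
        }
    }
    if (covered_out) *covered_out = seen;
    return saved;
}
```

With A = 0x1, B = 0x2 and C = 0x3, the order A, B, C saves two inputs while the order C, A, B saves one, yet both orders end with the same set of covered branches. The saved-input count varies with discovery order; branch coverage does not.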

4.2 Can Masking Effectively Target Branches?

5. Discussion

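The biggest limitation of FairFuzz is that it cannot target branches that have never been covered. Converting multi-byte comparisons into single-byte comparisons (which is what laf-intel does) before fuzzing with FairFuzz might give better results, but the authors did not verify this experimentally.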

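II. Source Code Analysis

Command-line options

-b: disable the mutation mask
-q num: what to do after a cycle that discovers no new branches, num = (1, 2, 3)
- 1: fall back to regular AFL queue processing and mutation until a new branch is found, then re-enable the mutation mask
- 2: fall back to regular AFL queue processing and mutation with deterministic mutation disabled, until a new branch is found, then re-enable the mutation mask
- 3: fall back to regular AFL queue processing and mutation for one cycle
-r: additional trimming mode (AFL's original trim_case stays active). The trimmed input only has to hit the same branch, which relaxes the trimming constraint and therefore trims away more. Recommended when seed files are large and deterministic mutation is enabled.
-s: shadow mode. Each seed is first run through the plain AFL mutation strategy and then through the mutation mask mode, which makes the two easy to compare.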

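Selected variables

branch_mask: the mutation mask, recomputed on every call to fuzz_one

rb_skip_deterministic: boolean, whether to skip the deterministic stage

skip_simple_bitflip: boolean, whether to skip bitflip 1/1

position_map: used only in the havoc stage; the list of valid mutation positions, recomputed before each mutation

vanilla_afl = 1000: initialized to 1000 at its definition; the number of executions (calls to common_fuzz_stuff) to run as plain AFL before the mutation mask is switched on

hit_bits: table with the hit count of every branch (the number of test cases that covered it); all zero initially, restored from a file when resuming

cycle_wo_new: boolean, true if the current cycle has made no new discovery. It is set to true at the start of each cycle and cleared as soon as something is found

prev_cycle_wo_new: boolean, true if the previous cycle made no new discovery. Relevant for -q 3

rb_fuzzing: ID of the rare branch being targeted, in [1, 65536]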

1. fuzz_one

/******************* Initialization *******************/
if (!vanilla_afl){   // the plain-AFL warm-up executions are used up, the mask may be enabled
  if (prev_cycle_wo_new && bootstrap){ // previous cycle found nothing and -q num was given
    vanilla_afl = 1; // keep running as plain AFL
    rb_fuzzing = 0;  // also signals that the mutation mask is off
    if (bootstrap == 2){ // -q 2
      skip_deterministic_bootstrap = 1;   // all seeds in this cycle skip the deterministic stage
    }
  }
}
if (skip_deterministic){
  rb_skip_deterministic = 1;
  skip_simple_bitflip = 1;
}
// In plain AFL mode, seeds are skipped with AFL's usual probabilities.
// In mutation mask mode, selection works as follows.
/* select inputs which hit rare branches */
if (!vanilla_afl) {     // no longer in plain AFL mode
  skip_deterministic_bootstrap = 0;  // the mask needs the deterministic stage to compute branch_mask
  u32 * min_branch_hits = is_rb_hit_mini(queue_cur->trace_mini);    // rare branches hit by this seed
  // queue_cur->trace_mini is needed here, which is why add_to_queue() was modified slightly
  if (min_branch_hits == NULL){  // no rare branch hit: skip this seed
    // not a rare hit. don't fuzz.
    return 1;
  } else {
    int ii;
    for (ii = 0; min_branch_hits[ii] != 0; ii++){
      rb_fuzzing = min_branch_hits[ii];
      if (rb_fuzzing){
        int byte_offset = (rb_fuzzing - 1) >> 3;
        int bit_offset = (rb_fuzzing - 1) & 7;
        // skip deterministic if we have fuzzed this min branch
        // fuzzed_branches records which rare branches this seed has already been
        // fuzzed for, one bit per branch. If this seed was already fuzzed for the
        // current rare branch, move on: fuzzing it again is of little use.
        if (queue_cur->fuzzed_branches[byte_offset] & (1 << (bit_offset))){
          // let's try the next one
          continue;
        } else {
          // If this seed was already fuzzed for any other rare branch,
          // skip bitflip 1/1 (presumably because it would repeat work).
          for (int k = 0; k < MAP_SIZE >> 3; k++){
            if (queue_cur->fuzzed_branches[k] != 0){
              DEBUG1("We fuzzed this guy already\n");
              skip_simple_bitflip = 1;
              break;
            }
          }
          // indicate we have fuzzed this branch id
          queue_cur->fuzzed_branches[byte_offset] |= (1 << (bit_offset));
          // chose minimum
          break;
        }
      } else break;
    }
    // if we got to the end of min_branch_hits...
    // it's either because we fuzzed all the things in min_branch_hits
    // or because there was nothing. If there was nothing,
    // min_branch_hits[0] should be 0
    if (!rb_fuzzing || (min_branch_hits[ii] == 0)){
      rb_fuzzing = min_branch_hits[0];
      if (!rb_fuzzing) {    // min_branch_hits[0] == 0: return right away
        return 1;
      }
      DEBUG1("We fuzzed this guy already for real\n"); // every rare branch was already fuzzed
      skip_simple_bitflip = 1;      // skip bitflip 1/1
      rb_skip_deterministic = 1;    // skip the deterministic stage
    }
    ck_free(min_branch_hits);
    if (!skip_simple_bitflip){
      cycle_wo_new = 0;
    }
    DEBUG1("Trying to fuzz input %s: \n", queue_cur->fname);
    DEBUG1("which hit branch %i (hit by %u inputs) \n", rb_fuzzing -1, hit_bits[rb_fuzzing -1]);
  }
}
...
u32 orig_bitmap_size = queue_cur->bitmap_size;  // save the original values
u64 orig_exec_us = queue_cur->exec_us;
if (rb_fuzzing && trim_for_branch) {  // mask mode is active and -r was given
  // length left after trimming; the trimmed input only has to hit rb_fuzzing
  u32 trim_len = trim_case_rb(argv, in_buf, len, out_buf);
  if (trim_len > 0){
    len = trim_len;
    /* this is kind of an unfair time measurement because the
       one in calibrate includes a lot of other loop stuff */
    u64 start_time = get_cur_time_us();
    write_to_testcase(in_buf, len);
    run_target(argv, exec_tmout);
    /* we are setting these to get a more accurate performance score */
    // Recompute execution time and bitmap size: the trimmed input no longer
    // follows the same path, it only hits the same rare branch, so the
    // bitmap size may well have changed.
    queue_cur->exec_us = get_cur_time_us() - start_time;
    queue_cur->bitmap_size = count_bytes(trace_bits);
  }
}
memcpy(out_buf, in_buf, len);
orig_perf = perf_score = calculate_score(queue_cur);
/* @RB@ */
orig_total_execs = total_execs;
if (rb_fuzzing && trim_for_branch){
  /* restoring these because the changes to the test case were not permanent */
  queue_cur->bitmap_size = orig_bitmap_size;
  queue_cur->exec_us =  orig_exec_us;
}
/* @RB@ */
re_run: // re-run when running in shadow mode
if (rb_fuzzing){
  if (run_with_shadow && !shadow_mode){
    shadow_mode = 1;
    virgin_virgin_bits = ck_alloc(MAP_SIZE); // save the global bitmap
    memcpy(virgin_virgin_bits, virgin_bits, MAP_SIZE);
    shadow_prefix = "PLAIN AFL: ";
  } else if (run_with_shadow && shadow_mode) {    // reset the state
    // reset all stats. nothing is added to queue.
    shadow_mode = 0;
    queued_discovered = orig_queued_discovered;
    queued_with_cov = orig_queued_with_cov;
    perf_score = orig_perf; //NOTE: this line is not stricly necessary.
    total_execs = orig_total_execs;
    memcpy(virgin_bits, virgin_virgin_bits, MAP_SIZE); // restore the global bitmap
    ck_free(virgin_virgin_bits);
    shadow_prefix = "RB: ";
  }
}
if (vanilla_afl || shadow_mode || (use_branch_mask == 0)){
  // alloc_branch_mask sets every byte of branch_mask to 7 (00000111) and the
  // last byte to 4 (00000100). As seen further down, byte flip is marked with
  // 1 (00000001), byte deletion with 2 (00000010), and byte insertion with
  // 4 (00000100). A value of 7 means all three mutation types are valid at that
  // position: in plain AFL mode every havoc mutation must be allowed. The last
  // byte is 4 because the mask has len + 1 entries and the final position can
  // only be inserted at, not flipped or deleted.
  branch_mask = alloc_branch_mask(len + 1);
  orig_branch_mask = alloc_branch_mask(len + 1);
} else {    // mutation mask mode
  branch_mask = ck_alloc(len + 1);
  orig_branch_mask = ck_alloc(len + 1);
}
position_map = ck_alloc(sizeof(u32) * (len+1));  // valid mutation positions for the havoc stage
if ((!rb_fuzzing && skip_deterministic) || skip_deterministic_bootstrap ||
    (vanilla_afl && queue_cur->was_fuzzed ) || (vanilla_afl && queue_cur->passed_det))
  goto havoc_stage;
/* Skip deterministic fuzzing if exec path checksum puts this out of scope
   for this master instance. */
if (master_max && (queue_cur->exec_cksum % master_max) != master_id - 1) {
  if (!rb_fuzzing || shadow_mode) goto havoc_stage;
  // skip all but branch mask creation if we're RB fuzzing
  else {
    rb_skip_deterministic=1;
    skip_simple_bitflip=1;
  }
}
/* Skip simple bitflip if we've done it already */
if (skip_simple_bitflip) {  // skip the bitflip 1/1 stage
  new_hit_cnt = queued_paths + unique_crashes;
  goto skip_simple_bitflip;
}
...
/******************* Get branch_mask *******************/
// O (overwrite): AFL's default strategy already has a byte-flip stage,
// so only a few lines had to be added here.
stage_name  = "bitflip 8/8";
stage_short = "flip8";
stage_max   = len;
orig_hit_cnt = new_hit_cnt;
for (stage_cur = 0; stage_cur < stage_max; stage_cur++) {
  stage_cur_byte = stage_cur;
  out_buf[stage_cur] ^= 0xFF;
  if (common_fuzz_stuff(argv, out_buf, len)) goto abandon_entry;
  if (rb_fuzzing && !shadow_mode && use_branch_mask > 0)
    if (hits_branch(rb_fuzzing - 1)){    // the rare branch is still hit
      branch_mask[stage_cur] = 1;        // mark as 00000001
    }
  /* We also use this stage to pull off a simple trick: we identify
     bytes that seem to have no effect on the current execution path
     even when fully flipped - and we skip them during more expensive
     deterministic stages, such as arithmetics or known ints. */
  if (!eff_map[EFF_APOS(stage_cur)]) {
    u32 cksum;
    /* If in dumb mode or if the file is very short, just flag everything
       without wasting time on checksums. */
    if (!dumb_mode && len >= EFF_MIN_LEN)
      cksum = hash32(trace_bits, MAP_SIZE, HASH_CONST);
    else
      cksum = ~queue_cur->exec_cksum;
    if (cksum != queue_cur->exec_cksum) {
      eff_map[EFF_APOS(stage_cur)] = 1;
      eff_cnt++;
    }
  }
  out_buf[stage_cur] ^= 0xFF;
}
/* @RB@ also figure out add/delete map in this stage */
if (rb_fuzzing && !shadow_mode && use_branch_mask > 0){
  // buffer to clobber with new things
  u8* tmp_buf = ck_alloc(len+1);
  // check if we can delete this byte
  // D (delete)
  stage_short = "rbrem8";
  for (stage_cur = 0; stage_cur < len; stage_cur++) { // delete one byte at a time
    /* delete current byte */
    stage_cur_byte = stage_cur;
    /* head */
    memcpy(tmp_buf, out_buf, stage_cur);
    /* tail */
    memcpy(tmp_buf + stage_cur, out_buf + 1 + stage_cur, len - stage_cur - 1 );
    if (common_fuzz_stuff(argv, tmp_buf, len - 1)) goto abandon_entry;
    /* if even with this byte deleted we hit the branch, can delete here */
    if (hits_branch(rb_fuzzing - 1)){  // the rare branch is still hit
      branch_mask[stage_cur] += 2;     // mark as 00000010
    }
  }
  // check if we can add at this byte
  // I (insert a random value)
  stage_short = "rbadd8";
  for (stage_cur = 0; stage_cur <= len; stage_cur++) {    // insert a random value from [0, 256) at each position
    /* add random byte */
    stage_cur_byte = stage_cur;
    /* head */
    memcpy(tmp_buf, out_buf, stage_cur);
    tmp_buf[stage_cur] = UR(256);
    /* tail */
    memcpy(tmp_buf + stage_cur + 1, out_buf + stage_cur, len - stage_cur);
    if (common_fuzz_stuff(argv, tmp_buf, len + 1)) goto abandon_entry;
    /* if adding before still hit branch, can add */
    if (hits_branch(rb_fuzzing - 1)){  // the rare branch is still hit
      branch_mask[stage_cur] += 4;     // mark as 00000100
    }
  }
  ck_free(tmp_buf);
  // save the original branch mask for after the havoc stage
  memcpy (orig_branch_mask, branch_mask, len + 1);
}
// branch_mask is computed first, inside the deterministic stage; later mutations rely on it
...
/****************** Deterministic ******************/
// Every later deterministic mutation (all except bitflip 1/1 and bitflip 8/8)
// gained an if (rb_fuzzing) check.

// bitflip 2/1
if (rb_fuzzing){ //&& use_mask()){   // mutation mask mode
  // only run modified case if it won't produce garbage
  if (!(branch_mask[stage_cur_byte] & 1)) {    // invalid position: skip
    stage_max--;    // one execution fewer
    continue;
  }
  // if we're spilling into next byte, check that that byte can
  // be modified
  // If the two flipped bits belong to two different bytes, the second byte
  // must be valid as well; both bytes have to be valid to mutate.
  if ((stage_cur_byte != ((stage_cur + 1)>> 3)) && (!(branch_mask[stage_cur_byte + 1] & 1))){
    stage_max--;
    continue;
  }
}

// bitflip 4/1, same idea
if (rb_fuzzing){//&& use_mask()){
  // only run modified case if it won't produce garbage
  if (!(branch_mask[stage_cur_byte] & 1)) {
    stage_max--;
    continue;
  }
  // if we're spilling into next byte, check that that byte can
  // be modified
  if ((stage_cur_byte != ((stage_cur + 3)>> 3)) && (!(branch_mask[stage_cur_byte + 1] & 1))){
    stage_max--;
    continue;
  }
}

// bitflip 16/8
if (rb_fuzzing ){
  // skip if either byte will modify the branch
  if (!(branch_mask[i] & 1) || !(branch_mask[i+1] & 1) ){ // mutate only if both bytes are valid
    stage_max--;
    continue;
  }
}

// bitflip 32/8
if (rb_fuzzing){
  // skip if either byte will modify the branch
  if (!(branch_mask[i] & 1) || !(branch_mask[i+1]& 1) ||
      !(branch_mask[i+2]& 1) || !(branch_mask[i+3]& 1) ){ // mutate only if all four bytes are valid
    stage_max--;
    continue;
  }
}

// arith 8/8
if (rb_fuzzing){
  if (!(branch_mask[i]& 1) ){
    stage_max -= 2 * ARITH_MAX;
    continue;
  }
}

// arith 16/8
if (rb_fuzzing){
  if (!(branch_mask[i] & 1) || !(branch_mask[i+1] & 1)){
    stage_max -= 4 * ARITH_MAX;
    continue;
  }
}

// arith 32/8
if (rb_fuzzing ){
  // skip if either byte will modify the branch
  if (!(branch_mask[i] & 1) || !(branch_mask[i+1]& 1) ||
      !(branch_mask[i+2]& 1) || !(branch_mask[i+3]& 1)){
    stage_max -= 4 * ARITH_MAX;
    continue;
  }
}

// interest 8/8, interest 16/8, interest 32/8: same as above

// user extras (over)
if (rb_fuzzing ){//&& use_mask()){
  // if any fall outside the mask, skip
  int bailing = 0;
  // skip if any byte to be overwritten is invalid
  for (int ii = 0; ii < extras[j].len; ii++){
    if (!(branch_mask[i + ii] & 1)){
      bailing = 1;
      break;
    }
  }
  if (bailing){
    stage_max--;
    continue;
  }
}

// user extras (insert)
if (!(branch_mask[i] & 4) ){  // note the & 4: as mentioned above, the insertion flag is 4 (00000100)
  stage_max--;
  continue;
}

// auto extras (over): same as user extras (over)

/***************** RANDOM HAVOC *****************/
// In this stage every mutation first calls get_random_modifiable_posn (overwrite
// or delete) or get_random_insert_posn (insert) to pick a random position among
// the valid ones.
// Cases 11, 12, 13 and 16 are special: they change the length of the seed, so
// branch_mask and position_map have to change length as well.
case 11 ... 12: {   // deletion does not affect position_map, so nothing to do for it
  /* Delete bytes. We're making this a bit more likely
     than insertion (the next option) in hopes of keeping
     files reasonably small. */
  u32 del_from, del_len;
  if (temp_len < 2) break;
  /* Don't delete too much. */
  del_len = choose_block_len(temp_len - 1);
  del_from = get_random_modifiable_posn(del_len*8, 2, temp_len, branch_mask, position_map);
  if (del_from == 0xffffffff) break;
  memmove(out_buf + del_from, out_buf + del_from + del_len,
          temp_len - del_from - del_len);
  // remove that data from the branch mask (this shortens branch_mask)
  // the +1 copies over the last part of branch_mask
  // (n bytes have n + 1 gaps between and around them)
  memmove(branch_mask + del_from, branch_mask + del_from + del_len,
          temp_len - del_from - del_len + 1);
  temp_len -= del_len;
  break;
}
case 13:
  if (temp_len + HAVOC_BLK_XL < MAX_FILE) {
    /* Clone bytes (75%) or insert a block of constant bytes (25%). */
    u8 actually_clone = UR(4);
    u32 clone_from, clone_to, clone_len;
    u8* new_buf;
    u8* new_branch_mask;
    if (actually_clone) {
      clone_len  = choose_block_len(temp_len);
      clone_from = UR(temp_len - clone_len + 1);
    } else {
      clone_len = choose_block_len(HAVOC_BLK_LARGE);
      clone_from = 0;
    }
    clone_to   = get_random_insert_posn(temp_len, branch_mask, position_map);
    if (clone_to == 0xffffffff) break; // this shouldn't happen, probably...
    new_buf = ck_alloc_nozero(temp_len + clone_len);
    // grow branch_mask; the newly inserted part is all 7, i.e. fully valid
    new_branch_mask = alloc_branch_mask(temp_len + clone_len + 1);
    /* Head */
    memcpy(new_buf, out_buf, clone_to);
    memcpy(new_branch_mask, branch_mask, clone_to);
    /* Inserted part */
    if (actually_clone)
      memcpy(new_buf + clone_to, out_buf + clone_from, clone_len);
    else
      memset(new_buf + clone_to,
             UR(2) ? UR(256) : out_buf[UR(temp_len)], clone_len);
    /* Tail */
    memcpy(new_buf + clone_to + clone_len, out_buf + clone_to,
           temp_len - clone_to);
    memcpy(new_branch_mask + clone_to + clone_len, branch_mask + clone_to,
           temp_len - clone_to + 1);
    ck_free(out_buf);
    ck_free(branch_mask);
    out_buf = new_buf;
    branch_mask = new_branch_mask;
    temp_len += clone_len;
    // resize position_map to match
    position_map = ck_realloc(position_map, sizeof (u32) * (temp_len + 1));
    if (!position_map)
      PFATAL("Failure resizing position_map.\n");
  }
// case 16 works like case 13

/************* SPLICING *************/
// Splicing also changes the seed length, so it is handled like case 13
new_branch_mask = alloc_branch_mask(len + 1); // allocate the new size, valid by default
// Only the branch_mask of the first half of the current seed is kept;
// the second half is simply treated as fully valid
memcpy(new_branch_mask, branch_mask, MIN(split_at, temp_len + 1));
ck_free(branch_mask);
branch_mask = new_branch_mask;
ck_free(orig_branch_mask);
orig_branch_mask = ck_alloc(len +1);
//ck_realloc(orig_branch_mask, len + 1);
memcpy (orig_branch_mask, branch_mask, len + 1);
position_map = ck_realloc(position_map, sizeof (u32) * (len + 1));
if (!position_map)
  PFATAL("Failure resizing position_map.\n");
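
The bit encoding of branch_mask described in the comments above (1 = overwrite, 2 = delete, 4 = insert, with one extra slot past the end of the input) can be sketched in isolation. This is not the FairFuzz implementation of alloc_branch_mask: `alloc_permissive_mask`, `can_overwrite`, and the enum names are hypothetical, and plain `malloc` is used in place of AFL's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { MASK_OVERWRITE = 1, MASK_DELETE = 2, MASK_INSERT = 4 };

/* Permissive mask for an input of input_len bytes: every byte position allows
 * all three mutation types, and the extra past-the-end slot allows insertion
 * only. Returns NULL if allocation fails; the caller frees the result. */
static uint8_t *alloc_permissive_mask(size_t input_len) {
    uint8_t *mask = malloc(input_len + 1);
    if (!mask) return NULL;
    memset(mask, MASK_OVERWRITE | MASK_DELETE | MASK_INSERT, input_len);
    mask[input_len] = MASK_INSERT;
    return mask;
}

/* A multi-byte overwrite at pos is allowed only if all n bytes allow it. */
static int can_overwrite(const uint8_t *mask, size_t pos, size_t n) {
    assert(mask != NULL);
    for (size_t k = 0; k < n; k++)
        if (!(mask[pos + k] & MASK_OVERWRITE)) return 0;
    return 1;
}
```

The multi-byte check mirrors the deterministic-stage guards above, where a 16-bit or 32-bit mutation is skipped as soon as one of the affected bytes lacks the overwrite flag.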

2. Changes in add_to_queue

// branch_mask is derived from trace_mini, so trace_mini is computed and fuzzed_branches is allocated before the seed is added to the queue
struct queue_entry* q = ck_alloc(sizeof(struct queue_entry));
// @RB@ added these for every queue entry
q->trace_mini = ck_alloc(MAP_SIZE >> 3);
minimize_bits(q->trace_mini, trace_bits);
q->fuzzed_branches = ck_alloc(MAP_SIZE >>3);
// @End

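3. Changes in perform_dry_run

// The initial seeds from the input directory have not been executed before add_to_queue is called, so the trace_mini computed there is empty.
// perform_dry_run therefore recomputes trace_mini after calibrate_case has run the seed.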
res = calibrate_case(argv, q, use_mem, 0, 1);
// @RB@ added these for every queue entry
// free what was added in add_to_queue
ck_free(q->trace_mini);
ck_free(q->fuzzed_branches);
q->trace_mini = ck_alloc(MAP_SIZE >> 3);
minimize_bits(q->trace_mini, trace_bits);
q->fuzzed_branches = ck_alloc(MAP_SIZE >>3);
// @End

4. increment_hit_bits: updating hit_bits

// This function is called from save_if_interesting, so hit_bits is updated after every executed test case. The hit information of the initial seeds, however, does not seem to be counted.
static void increment_hit_bits(){
  for (int i = 0; i < MAP_SIZE; i++){
    if ((trace_bits[i] > 0) && (hit_bits[i] < ULONG_MAX))
      hit_bits[i]++;
  }
}

5. is_rb_hit_mini: the list of rare branches covered by the current seed

static u32 * is_rb_hit_mini(u8* trace_bits_mini){
  int * rarest_branches = get_lowest_hit_branch_ids();   // list of rare branches, MAX_RARE_BRANCHES = 256
  u32 * branch_ids = ck_alloc(sizeof(u32) * MAX_RARE_BRANCHES);  // rare branches hit by this seed
  u32 * branch_cts = ck_alloc(sizeof(u32) * MAX_RARE_BRANCHES);  // their hit counts, used for sorting
  int min_hit_index = 0;
  for (int i = 0; i < MAP_SIZE ; i ++){
    if (unlikely (trace_bits_mini[i >> 3]  & (1 <<(i & 7)) )){ // branch i is hit
      int cur_index = i;
      int is_rare = contains_id(cur_index, rarest_branches);   // is it a rare branch?
      if (is_rare) {
        // at loop initialization, set min_branch_hit properly
        if (!min_hit_index) {
          branch_cts[min_hit_index] = hit_bits[cur_index];
          branch_ids[min_hit_index] = cur_index + 1;
        }
        // in general just check if we're a smaller branch
        // than the previously found min
        int j;
        for (j = 0 ; j < min_hit_index; j++){    // keep the list sorted, rarest first
          if (hit_bits[cur_index] <= branch_cts[j]){
            memmove(branch_cts + j + 1, branch_cts + j, min_hit_index -j);
            memmove(branch_ids + j + 1, branch_ids + j, min_hit_index -j);
            branch_cts[j] = hit_bits[cur_index];
            branch_ids[j] = cur_index + 1; // the default value is 0, so store index + 1 to keep entries non-zero
          }
        }
        // append at end
        if (j == min_hit_index){  // hit more often than everything in the list: goes last
          branch_cts[j] = hit_bits[cur_index];
          // + 1 so we can distinguish 0 from other cases
          branch_ids[j] = cur_index + 1;
        }
        // this is only incremented when is_rare holds, which should
        // only happen a max of MAX_RARE_BRANCHES -1 times -- the last
        // time we will never reenter so this is always < MAX_RARE_BRANCHES
        // at the top of the if statement
        min_hit_index++;
      }
    }
  }
  ck_free(branch_cts);  // only needed for sorting, freed afterwards
  ck_free(rarest_branches);
  if (min_hit_index == 0){  // no rare branch was hit
    ck_free(branch_ids);
    branch_ids = NULL;
  } else {
    // 0 terminate the array
    branch_ids[min_hit_index] = 0; // end marker
  }
  return branch_ids;
}

6. get_lowest_hit_branch_ids: computing the dynamic threshold and the list of rare branches

static int* get_lowest_hit_branch_ids(){
  // Definition 5 of the paper: computes the dynamic threshold and collects the rare branches along the way
  int * rare_branch_ids = ck_alloc(sizeof(int) * MAX_RARE_BRANCHES); // MAX_RARE_BRANCHES = 256
  int lowest_hob = INT_MAX;
  int ret_list_size = 0;
  for (int i = 0; (i < MAP_SIZE) && (ret_list_size < MAX_RARE_BRANCHES - 1); i++){
    // ignore unseen branches. sparse array -> unlikely
    if (unlikely(hit_bits[i] > 0)){  // hit_bits holds the hit count of every branch
      if (contains_id(i, blacklist)) continue;   // skip blacklisted branches
      unsigned int long cur_hits = hit_bits[i];
      int highest_order_bit = 0;
      while(cur_hits >>=1) // index of the highest set bit, i.e. floor(log2(cur_hits))
        highest_order_bit++;
      lowest_hob = highest_order_bit < lowest_hob ? highest_order_bit : lowest_hob;
      if (highest_order_bit < rare_branch_exp){       // rare_branch_exp is the exponent that delimits rare branches
        // if we are an order of magnitude smaller, prioritize the
        // rarer branches
        // If highest_order_bit is a whole order of magnitude below rare_branch_exp
        // (hence the - 1), lower the threshold.
        if (highest_order_bit < rare_branch_exp - 1){
          rare_branch_exp = highest_order_bit + 1;
          // everything else that came before had way more hits
          // than this one, so remove from list
          ret_list_size = 0;    // clear the list collected so far
        }
        rare_branch_ids[ret_list_size] = i;  // record the rare branch
        ret_list_size++;
      }
    }
  }
  if (ret_list_size == 0){
    DEBUG1("Was returning list of size 0\n");
    if (lowest_hob != INT_MAX) {    // list is empty but some branch was seen: raise the threshold and retry
      rare_branch_exp = lowest_hob + 1;
      DEBUG1("Upped max exp to %i\n", rare_branch_exp);
      ck_free(rare_branch_ids);
      return get_lowest_hit_branch_ids();
    }
  }
  rare_branch_ids[ret_list_size] = -1;    // end marker
  return rare_branch_ids;                 // the list of rare branches
}
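
The exponent arithmetic in the loop above can be checked on its own. The sketch below isolates it; `highest_order_bit` (as a function) and `is_rare_exp` are hypothetical helper names, and the comparison is the same strict `<` used in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(hits)) for hits > 0, computed with the same shift loop as above. */
static int highest_order_bit(uint64_t hits) {
    int hob = 0;
    assert(hits > 0);
    while (hits >>= 1) hob++;
    return hob;
}

/* A covered branch counts as rare when its exponent is below rare_exp,
 * which is the same as hits < 2^rare_exp. */
static int is_rare_exp(uint64_t hits, int rare_exp) {
    return hits > 0 && highest_order_bit(hits) < rare_exp;
}
```

For a minimum hit count of 17 the exponent is 4, so the threshold exponent settles at 5 and every branch with fewer than 32 hits is rare. As far as this excerpt shows, a branch with exactly 32 hits is excluded, whereas Definition 5 with its "at most $2^i$" would include it. The two differ only at that boundary.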

7. trim_case_rb: the special trimming mode

// Unlike AFL's default trimming, the trimmed seed is not written back to its file. The result is temporary: the seed trimmed by trim_case_rb is only used for this round of fuzzing.
// The function also calls common_fuzz_stuff, which saves any input that yields new coverage during trimming to the queue and to disk.
// AFL's default trimming calls run_target directly, so it never saves inputs with new coverage found while trimming; it simply abandons that trimming step.
// My first thought was: won't this fill FairFuzz's queue with redundant seeds? On reflection, a trimmed seed most likely has lower coverage than the original,
// and the chance of it producing new coverage should be low, so very few of them would actually be saved. I have not verified this experimentally, though; it is pure speculation.
// Still, saving to the queue through common_fuzz_stuff feels a little odd to me. Perhaps I am misunderstanding something.
static u32 trim_case_rb(char** argv, u8* in_buf, u32 in_len, u8* out_buf) {
  DEBUG1 ("entering RB trim, len is %i\n", in_len);
  if (rb_fuzzing == 0){       // not in mutation mask mode: nothing to trim
    // @RB@ this should not happen.
    return in_len;
  }
  static u8 tmp[64];
  u8  fault = 0;
  u32 trim_exec = 0;
  u32 remove_len;
  u32 len_p2;
  /* Although the trimmer will be less useful when variable behavior is
     detected, it will still work to some extent, so we don't check for
     this. */
  if (in_len < 5) return 0;
  stage_name = tmp;
  stage_short= "rbtrim";
  // CAROTODO: what is this, update later
  //bytes_trim_in += in_len;
  /* Select initial chunk len, starting with large steps. */
  len_p2 = next_p2(in_len);     // smallest power of two that is >= in_len
  // CAROTODO: could make TRIM_START_STEPS smaller
  remove_len = MAX(len_p2 / TRIM_START_STEPS, TRIM_MIN_BYTES);    // start with a large step
  /* Continue until the number of steps gets too high or the stepover
     gets too small. */
  while (remove_len >= MAX(len_p2 / TRIM_END_STEPS, TRIM_MIN_BYTES)) {  // loop over step sizes
    // why doesn't this start at 0?
    // u32 remove_pos = remove_len;
    u32 remove_pos = 0;
    sprintf(tmp, "rb trim %s/%s", DI(remove_len), DI(remove_len));
    stage_cur = 0;
    stage_max = in_len / remove_len;
    while (remove_pos < in_len) { // loop over trim positions
      u32 trim_avail = MIN(remove_len, in_len - remove_pos);    // do not run past the end
      //write_with_gap(in_buf, q->len, remove_pos, trim_avail);
      // HEAD
      memcpy(out_buf, in_buf, remove_pos);
      // TAIL
      memcpy(out_buf + remove_pos, in_buf + remove_pos + trim_avail, in_len - remove_pos - trim_avail);
      // not actually fault...
      /* using common fuzz stuff prevents us from having to mess with
         permanent changes to the queue */
      // If the goal is to avoid permanent changes to the queue, why use common_fuzz_stuff at all?
      fault = common_fuzz_stuff(argv, out_buf, in_len - trim_avail);
      // Not sure if we want this given that fault is no longer a fault
      if (stop_soon || fault == FAULT_ERROR) goto abort_rb_trimming;
      // if successfully hit branch of interest...
      if (hits_branch(rb_fuzzing - 1)) { // the rare branch is still hit
        // (0) calclength of tail
        u32 move_tail = in_len - remove_pos - trim_avail;
        // (1) reduce length by how much was trimmed
        in_len -= trim_avail;
        // (2) update the closest power of 2 len
        len_p2  = next_p2(in_len);
        memmove(in_buf + remove_pos, in_buf + remove_pos + trim_avail, move_tail);  // update in_buf
      } else remove_pos += remove_len;   // branch missed: move on to the next position
      if (!(trim_exec++ % stats_update_freq)) show_stats();
      stage_cur++;
      /* Note that we don't keep track of crashes or hangs here; maybe TODO? */
    }
    remove_len >>= 1;   // halve the step size
  }
abort_rb_trimming:
  //@RM@ TODO: update later
  // bytes_trim_out += in_len;
  DEBUG1 ("output of rb trimming has len %i\n", in_len);
  return in_len;   // length remaining after trimming
}

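8. get_random_modifiable_posn: finding a valid overwrite or delete position

The algorithm below is a little obscure, so here is an example (it is easier to follow together with the comments). Let 1 denote a valid position and 0 an invalid one:

i	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	…
valid	0	0	1	0	1	1	1	0	0	0	1	1	0	1	1	…

A maximal run of valid positions is called a valid block; a run of invalid positions is an invalid block.

In the table above the valid blocks are [2], [4, 6], [10, 11] and [13, 14]; their start positions (prev_start_of_1_block) are 2, 4, 10 and 13.

The invalid blocks are [0, 1], [3], [7, 9] and [12]. While i traverses these parts, in_0_block is 1.

Suppose num_to_modify = 16, i.e. 2 bytes. Every position where 2 consecutive bytes are valid has its left boundary added to position_map, so position_map is [4, 5, 10, 13], and one entry of this list is returned at random. [2] is a valid block too, but it is too short to qualify.

static u32 get_random_modifiable_posn(u32 num_to_modify, u8 mod_type, u32 map_len, u8* branch_mask, u32 * position_map){
  // mod_type = 1: overwrite, mod_type = 2: delete; map_len: seed length;
  // position_map: receives the valid mutation positions
  u32 ret = 0xffffffff;
  u32 position_map_len = 0;       // number of entries in position_map
  int prev_start_of_1_block = -1; // start position of the current valid block
  int in_0_block = 1;
  for (int i = 0; i < map_len; i ++){  // visit every position
    if (branch_mask[i] & mod_type){    // valid position
      // if the last thing we saw was a zero, set
      // to start of 1 block
      if (in_0_block) {             // the previous position was invalid
        prev_start_of_1_block = i;  // so a valid block starts here
        in_0_block = 0;             // leaving the invalid block
      }
    } else {
      // for the first 0 we see (unless the eff_map starts with zeroes)
      // we know the last index was the last 1 in the line
      if ((!in_0_block) &&(prev_start_of_1_block != -1)){   // the previous valid block has just ended
        int num_bytes = MAX(num_to_modify/8, 1);  // number of consecutive valid bytes required
        for (int j = prev_start_of_1_block; j < i-num_bytes + 1; j++){
          // I hate this ++ within operator stuff
          position_map[position_map_len++] = j; // add the left boundary to the list
        }
      }
      in_0_block = 1;   // entering an invalid block
    }
  }
  // if we ended not in a 0 block, add it in too
  if (!in_0_block) { // special handling when the last block is a valid block
    u32 num_bytes = MAX(num_to_modify/8, 1);
    for (u32 j = prev_start_of_1_block; j < map_len-num_bytes + 1; j++){
      // I hate this ++ within operator stuff
      position_map[position_map_len++] = j;
    }
  }
  if (position_map_len){
    u32 random_pos = UR(position_map_len);
    if (num_to_modify >= 8)
      ret =  position_map[random_pos];   // return a random valid position
    else
      // I think num_to_modify can only ever be 1 if it's less than 8. otherwise need trickier stuff.
      // This looks off to me. UR(8) gives every bit of a byte a chance to be flipped,
      // but the byte that ends up flipped does not match the selected position.
      // position_map holds byte offsets while the result is used as a bit index, so
      // it probably should be ret = (position_map[random_pos] << 3) + UR(8);
      ret = position_map[random_pos] + UR(8);
  }
  return ret;
}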
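
To check the worked example independently of the block-tracking logic, the same list of start positions can be computed with a running count of consecutive valid bytes. This is a deliberately different, simpler formulation rather than the FairFuzz code; `valid_starts` is a hypothetical helper and leaves out the random pick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes to out every start position p such that mask[p .. p+n-1] all have
 * the mod_type bit set, in ascending order, and returns how many were found.
 * out needs room for map_len entries. */
static size_t valid_starts(const uint8_t *mask, size_t map_len,
                           uint8_t mod_type, size_t n, size_t *out) {
    size_t count = 0, run = 0;
    assert(n >= 1);
    for (size_t i = 0; i < map_len; i++) {
        run = (mask[i] & mod_type) ? run + 1 : 0;  /* length of the valid run ending at i */
        if (run >= n) out[count++] = i + 1 - n;    /* a window of n valid bytes ends at i */
    }
    return count;
}
```

On the 15-entry example mask with n = 2, this yields [4, 5, 10, 13], the same list as derived by hand above. With n = 1 it yields all eight valid positions.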

9. get_random_insert_posn: finding a valid insert position

static u32 get_random_insert_posn(u32 map_len, u8* branch_mask, u32 * position_map){
  u32 position_map_len = 0;
  u32 ret = map_len;
  for (u32 i = 0; i <= map_len; i++){
    if (branch_mask[i] & 4)       // valid insert position: add it to the list
      position_map[position_map_len++] = i;
  }
  if (position_map_len){   // return a random valid position
    ret = position_map[UR(position_map_len)];
  }
  return ret;
}

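III. Overall Flow

Run 1000 executions in plain AFL mode to obtain the initial hit_bits, i.e. the hit counts of all branches
Switch to mutation mask mode; for each seed:
- Compute the list of rare branches, rare_branches. The list is derived from the current hit_bits, so it is dynamic: different seeds may see different rare_branches.
- From rare_branches, compute min_branch_hits, the list of rare branches hit by the current seed. The list is sorted with the rarest branch first. It may also be empty, meaning the seed hits no rare branch, in which case the seed is skipped.
- In min_branch_hits, pick a branch the current seed has not yet been fuzzed for as the target branch rb_fuzzing. If the seed has been fuzzed for all of them, skip the deterministic stage. If it has been fuzzed for any branch in min_branch_hits other than the target, skip bitflip 1/1.
- If the special trimming mode is enabled, run it after AFL's original trimming; its constraint is that the trimmed seed still hits the target branch.
- Compute the mutation mask branch_mask for the three mutation types (O, I, D). It is simply a table of valid positions and mutation types: if a mutation of a given type at a given position still hits the target branch rb_fuzzing, it is valid and recorded in branch_mask.
- Enter the deterministic stage. Each mutation looks up its type and position in branch_mask and is skipped if it is not valid.
- Enter the havoc stage. For each mutation, collect all valid positions for its type and length into position_map and pick one at random as the mutation position. If there is no valid position, the mutation is skipped.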
