Driller Source Code Reading Notes (Part 1)

Driller source: https://github.com/shellphish/driller

The example given in the repository is:

    import driller

    d = driller.Driller("./CADET_00001",  # path to the target binary
                        "racecar",  # initial testcase
                        "\xff" * 65535,  # AFL bitmap with no discovered transitions
                       )

    new_inputs = d.drill()

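No target program is provided, so you can write a small test binary yourself. Note that under Python 3 both the testcase and the AFL bitmap must be bytes.

The typical use case: once fuzzing hits a bottleneck, the fuzzer's seeds are used as input constraints for symbolic execution. The unexplored branches along each seed's path are then solved, and the resulting inputs are handed back to the fuzzer as new seeds so that it can fuzz the new paths further.

The drill method of the Driller class is as follows: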
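As a small illustration of the bytes requirement, the arguments could be prepared like this under Python 3. The helper check_driller_args and the variable names are my own and not part of Driller; the real driller.Driller(...) call is left as a comment because it needs the package and a target binary.

```python
# Hypothetical helper (not part of Driller): checks the argument types described above.
def check_driller_args(testcase, bitmap):
    """Return True when both arguments are bytes, as Python 3 requires here."""
    return isinstance(testcase, bytes) and isinstance(bitmap, bytes)


testcase = b"racecar"          # initial testcase as bytes, not str
fuzz_bitmap = b"\xff" * 65535  # AFL bitmap with no discovered transitions

assert check_driller_args(testcase, fuzz_bitmap)
assert not check_driller_args("racecar", "\xff" * 65535)  # str versions fail the check

# With driller installed and a binary at hand, the call would then look like:
# d = driller.Driller("./CADET_00001", testcase, fuzz_bitmap)
# new_inputs = d.drill()
```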

    def drill(self):
        """
        Perform the drilling, finding more code coverage based off our existing input base.
        """

        # Don't re-trace the same input.
        if self.redis and self.redis.sismember(self.identifier + '-traced', self.input):
            return -1

        # Write out debug info if desired.
        if l.level == logging.DEBUG and config.DEBUG_DIR:
            self._write_debug_info()
        elif l.level == logging.DEBUG and not config.DEBUG_DIR:
            l.warning("Debug directory is not set. Will not log fuzzing bitmap.")

        # Update traced.
        if self.redis:
            self.redis.sadd(self.identifier + '-traced', self.input)

        list(self._drill_input())

        if self.redis:
            return len(self._generated)
        else:
            return self._generated

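Many of the parameters are unset here, so most of this can be skipped for now; roughly, redis is used to avoid re-tracing the same input. Note the "list(self._drill_input())" in the middle: since self._drill_input is a generator function, wrapping it in list() forces it to iterate until its loop finishes.

The key part is self._drill_input, analyzed piece by piece below: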
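The effect of list() on a generator can be seen in a toy class. TinyDriller below is purely illustrative and only mimics the pattern of a generator that fills self._generated as a side effect; it is not Driller's code.

```python
class TinyDriller:
    """Toy stand-in that mimics only the generator/side-effect pattern."""

    def __init__(self, candidates):
        self._candidates = candidates
        self._generated = set()

    def _drill_input(self):
        # Generator body: nothing here runs until someone iterates over it.
        for item in self._candidates:
            self._generated.add(item)
            yield item

    def drill(self):
        # list() exhausts the generator so every side effect actually happens.
        list(self._drill_input())
        return self._generated


d = TinyDriller([b"a", b"b"])
gen = d._drill_input()            # creating the generator does no work yet
assert d._generated == set()
assert d.drill() == {b"a", b"b"}  # draining it fills the set
```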

        r = tracer.qemu_runner.QEMURunner(self.binary, self.input, argv=self.argv)
        p = angr.Project(self.binary)
        for addr, proc in self._hooks.items():
            p.hook(addr, proc)
            l.debug("Hooking %#x -> %s...", addr, proc.display_name)

        if p.loader.main_object.os == 'cgc':
            p.simos.syscall_library.update(angr.SIM_LIBRARIES['cgcabi_tracer'])
            s = p.factory.entry_state(stdin=angr.SimFileStream, flag_page=r.magic, mode='tracing')
        else:
            s = p.factory.full_init_state(stdin=angr.SimFileStream, mode='tracing')

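The tracer used to build r is the component angr uses for tracing; see https://github.com/angr/tracer

self._hooks holds the addresses to hook and the procedures to run in their place.

p.loader.main_object.os is the operating system the binary targets. angr appears to have a dedicated label for CGC binaries; for ordinary programs a fully initialized state is created, with angr.SimFileStream as the input stream.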

        s.preconstrainer.preconstrain_file(self.input, s.posix.stdin, True)

        simgr = p.factory.simulation_manager(s, save_unsat=True, hierarchy=False, save_unconstrained=r.crash_mode)

        t = angr.exploration_techniques.Tracer(trace=r.trace, crash_addr=r.crash_addr, copy_states=True)
        self._core = angr.exploration_techniques.DrillerCore(trace=r.trace, fuzz_bitmap=self.fuzz_bitmap)

        simgr.use_technique(t)
        simgr.use_technique(angr.exploration_techniques.Oppologist())
        simgr.use_technique(self._core)

        self._set_concretizations(simgr.one_active)

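Next, preconstrainer.preconstrain_file adds constraints to the symbolic execution in advance. Constraints added by the preconstrainer can be removed later. The preconstrain_file method constrains a file: here it constrains s.posix.stdin (the input during symbolic execution) to self.input (the testcase passed to Driller). Subsequent execution therefore first runs with the testcase as input, which pins down the testcase's own execution path. The signature and docstring of preconstrain_file are shown below; when set_length is True, the length of content is used as the file length.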

    preconstrain_file(content, simfile, set_length=False) method of angr.state_plugins.preconstrainer.SimStatePreconstrainer instance
        Preconstrain the contents of a file.

        :param content:     The content to preconstrain the file to. Can be a bytestring or a list thereof.
        :param simfile:     The actual simfile to preconstrain
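The idea behind preconstraining (follow the seed's path first, then give up the seed's concrete values to take the other side of a branch) can be sketched in plain Python. Everything below is a toy under my own assumptions: branches are modeled as hypothetical predicates over the input bytes, and a brute-force search over a single byte stands in for the constraint solver. It does not use angr at all.

```python
from typing import Callable, List, Optional

Branch = Callable[[bytes], bool]  # hypothetical: one predicate per branch


def follow(branches: List[Branch], data: bytes) -> List[bool]:
    """Run the input concretely and record the outcome of every branch."""
    return [branch(data) for branch in branches]


def flip_branch(branches: List[Branch], seed: bytes, index: int) -> Optional[bytes]:
    """Search for an input that matches the seed's path up to `index`
    and takes the other side of branch `index`.

    The brute-force loop over one byte stands in for a constraint solver.
    """
    seed_path = follow(branches, seed)
    for pos in range(len(seed)):
        for value in range(256):
            candidate = seed[:pos] + bytes([value]) + seed[pos + 1:]
            path = follow(branches, candidate)
            same_prefix = path[:index] == seed_path[:index]
            if same_prefix and path[index] != seed_path[index]:
                return candidate
    return None


# Toy "program": two checks on the first two input bytes.
branches = [lambda d: d[0] == ord("r"), lambda d: d[1] == ord("X")]
seed = b"racecar"

assert follow(branches, seed) == [True, False]  # the seed's own path
new_input = flip_branch(branches, seed, 1)       # diverge at the second check
assert new_input == b"rXcecar"
```

In the toy, the seed plays the role of the preconstraint: it fixes the path, and relaxing one byte is what allows the second check to be satisfied.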

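Next, simulation_manager creates a SimulationManager. save_unsat means that unsatisfiable states are kept in the "unsat" stash. The hierarchy parameter takes a StateHierarchy object that tracks the relationships between states; as far as I can tell from the angr source, a default StateHierarchy is only created when the argument is left as None, so passing False here turns that tracking off.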

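Then angr.exploration_techniques.Tracer creates a Tracer object, an exploration technique that follows an angr path driven by a concrete input. The trace parameter is the basic-block trace, crash_addr holds the address at which the input crashed the program (if it did), and enabling copy_states makes the missed states visible.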

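DrillerCore is an exploration technique that symbolically follows an input while looking for new state transitions. It has to be used together with the Tracer technique, and its results are placed in the 'diverted' stash.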
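What "new state transition" means relative to the AFL bitmap can be illustrated roughly as follows. This is a simplified sketch, not DrillerCore's code: the index function is modeled on the hash AFL's QEMU mode uses as I recall it, the map size of 65536 is an assumption, and the function names are my own. The only idea it relies on is that a slot still holding 0xff has not been reached by the fuzzer.

```python
MAP_SIZE = 65536  # assumed power-of-two bitmap size


def afl_index(prev_addr, cur_addr, map_size=MAP_SIZE):
    """Map a (prev, cur) basic-block transition to a bitmap slot (assumed hash)."""
    prev_loc = ((prev_addr >> 4) ^ (prev_addr << 8)) & (map_size - 1)
    cur_loc = ((cur_addr >> 4) ^ (cur_addr << 8)) & (map_size - 1)
    return cur_loc ^ (prev_loc >> 1)


def is_new_transition(bitmap, prev_addr, cur_addr):
    """A slot still holding 0xff has never been hit by the fuzzer."""
    return bitmap[afl_index(prev_addr, cur_addr, len(bitmap))] == 0xFF


bitmap = bytearray(b"\xff" * MAP_SIZE)              # nothing discovered yet
assert is_new_transition(bitmap, 0x8048000, 0x8048010)

bitmap[afl_index(0x8048000, 0x8048010)] = 0xFE      # pretend the fuzzer hit it
assert not is_new_transition(bitmap, 0x8048000, 0x8048010)
```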

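use_technique attaches an exploration technique to the SimulationManager. Here Tracer and DrillerCore are attached, along with Oppologist, a technique that forces uncooperative code through qemu.

After that, one_active fetches one active state from the SimulationManager (every key in simgr.stashes can be accessed directly as an attribute of simgr, and prefixing the name with "one_" returns a single state from that stash). The state is passed to the custom _set_concretizations method, whose code is: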
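The attribute-style stash access can be imitated with __getattr__. ToyManager below is an illustrative imitation written for this note, not angr's SimulationManager; in particular, returning the first element for the "one_" prefix is how I understand the behavior.

```python
class ToyManager:
    """Imitates stash lookup by attribute name; not angr's real class."""

    def __init__(self, **stashes):
        self.stashes = {name: list(states) for name, states in stashes.items()}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        stashes = self.__dict__.get("stashes", {})
        if name in stashes:
            return stashes[name]
        if name.startswith("one_") and name[4:] in stashes:
            return stashes[name[4:]][0]  # first state of that stash
        raise AttributeError(name)


m = ToyManager(active=["s0", "s1"], diverted=[])
assert m.active == ["s0", "s1"]   # stash name used as an attribute
assert m.one_active == "s0"       # "one_" prefix picks a single state
assert "diverted" in m.stashes and not m.diverted
```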

    @staticmethod
    def _set_concretizations(state):
        if state.project.loader.main_object.os == 'cgc':
            flag_vars = set()
            for b in state.cgc.flag_bytes:
                flag_vars.update(b.variables)

            state.unicorn.always_concretize.update(flag_vars)

        # Let's put conservative thresholds for now.
        state.unicorn.concretization_threshold_memory = 50000
        state.unicorn.concretization_threshold_registers = 50000

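For CGC binaries the variables of the flag page are additionally marked as always concretized. For every program, CGC or not, concretization_threshold_memory and concretization_threshold_registers are set; as I understand them, these are the number of times symbolic memory and symbolic registers, respectively, may cause unicorn to refuse execution before concretization kicks in.

Next, the final loop of _drill_input: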

        while simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1:
            simgr.step()

            # Check here to see if a crash has been found.
            if self.redis and self.redis.sismember(self.identifier + '-finished', True):
                return

            if 'diverted' not in simgr.stashes:
                continue

            while simgr.diverted:
                state = simgr.diverted.pop(0)
                l.debug("Found a diverted state, exploring to some extent.")
                w = self._writeout(state.history.bbl_addrs[-1], state)
                if w is not None:
                    yield w
                for i in self._symbolic_explorer_stub(state):
                    yield i

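The loop condition is simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1: as long as the SimulationManager still has an active state and that state has not finished the whole trace, the loop keeps going, and each simgr.step() advances the active states by one step (normally one basic block rather than a single instruction). If simgr.stashes has no 'diverted' key, i.e. no new state transition has shown up, the loop moves on to the next iteration. Once a new transition appears, the diverted state is popped, state.history.bbl_addrs[-1] gives the address of the previous basic block, and _writeout tries to generate an input. The content of _writeout is: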

    def _writeout(self, prev_addr, state):
        generated = state.posix.stdin.load(0, state.posix.stdin.pos)
        generated = state.solver.eval(generated, cast_to=bytes)

        key = (len(generated), prev_addr, state.addr)

        # Checks here to see if the generation is worth writing to disk.
        # If we generate too many inputs which are not really different we'll seriously slow down AFL.
        if self._in_catalogue(*key):
            self._core.encounters.remove((prev_addr, state.addr))
            return None

        else:
            self._add_to_catalogue(*key)

        l.debug("[%s] dumping input for %#x -> %#x.", self.identifier, prev_addr, state.addr)

        self._generated.add((key, generated))

        if self.redis:
            # Publish it out in real-time so that inputs get there immediately.
            channel = self.identifier + '-generated'

            self.redis.publish(channel, pickle.dumps({'meta': key, 'data': generated, "tag": self.tag}))

        else:
            l.debug("Generated: %s", binascii.hexlify(generated))

        return (key, generated)

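In short, it loads the stdin content consumed so far, solves the constraints to get concrete bytes, and uses _in_catalogue to decide whether the input is worth keeping (in fact _in_catalogue only performs a real check when redis is configured; otherwise it simply returns False). The key (input length, previous address, current address) and the input are then added to self._generated, which yields a new input to serve as a seed for fuzzing.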
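The deduplication by key can be sketched without redis. InMemoryCatalogue and the free function writeout below are hypothetical stand-ins written for this note: they keep the same key shape, (length, prev_addr, addr), but store it in a local set instead of redis and leave out the logging and publishing.

```python
class InMemoryCatalogue:
    """Hypothetical local replacement for the redis-backed catalogue."""

    def __init__(self):
        self._seen = set()

    def in_catalogue(self, length, prev_addr, next_addr):
        return (length, prev_addr, next_addr) in self._seen

    def add(self, length, prev_addr, next_addr):
        self._seen.add((length, prev_addr, next_addr))


def writeout(catalogue, generated, prev_addr, addr):
    """Return (key, generated) for a new key, or None for a duplicate."""
    key = (len(generated), prev_addr, addr)
    if catalogue.in_catalogue(*key):
        return None
    catalogue.add(*key)
    return (key, generated)


cat = InMemoryCatalogue()
first = writeout(cat, b"rXcecar", 0x1000, 0x1040)
assert first == ((7, 0x1000, 0x1040), b"rXcecar")
assert writeout(cat, b"rYcecar", 0x1000, 0x1040) is None   # same key, skipped
assert writeout(cat, b"rX", 0x1000, 0x1040) is not None     # different length
```

Note that two different inputs with the same length and the same transition share a key, so only the first one is kept.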

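After _writeout returns, whether or not it produced an input, _symbolic_explorer_stub is run on the diverted state. Roughly, it creates a fresh SimulationManager and steps it until the accumulated count, steps multiplied by the number of active plus deadended states, reaches 1024 or no active state remains. It then checks every state in the deadended and active stashes for satisfiability, and for each satisfiable one obtains an input through _writeout and adds it to self._generated.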

    def _symbolic_explorer_stub(self, state):
        # Create a new simulation manager and step it forward up to 1024
        # accumulated active states or steps.
        steps = 0
        accumulated = 1

        p = state.project
        state = state.copy()
        try:
            state.options.remove(angr.options.LAZY_SOLVES)
        except KeyError:
            pass
        simgr = p.factory.simulation_manager(state, hierarchy=False)

        l.debug("[%s] started symbolic exploration at %s.", self.identifier, time.ctime())

        while len(simgr.active) and accumulated < 1024:
            simgr.step()
            steps += 1

            # Dump all inputs.
            accumulated = steps * (len(simgr.active) + len(simgr.deadended))

        l.debug("[%s] stopped symbolic exploration at %s.", self.identifier, time.ctime())

        # DO NOT think this is the same as using only the deadended stashes. this merges deadended and active
        simgr.stash(from_stash='deadended', to_stash='active')
        for dumpable in simgr.active:
            try:
                if dumpable.satisfiable():
                    w = self._writeout(dumpable.history.bbl_addrs[-1], dumpable)
                    if w is not None:
                        yield w

            # If the state we're trying to dump wasn't actually satisfiable.
            except IndexError:
                pass
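How quickly the 1024 budget is used up depends on how fast the exploration branches. The function below replays just the loop condition on a recorded list of stash sizes; steps_until_budget and its input format are my own illustration, with no angr involved.

```python
def steps_until_budget(stash_sizes, budget=1024):
    """Replay the stub's loop condition on recorded stash sizes.

    stash_sizes[i] is the (active, deadended) pair observed after step i + 1.
    Returns the number of steps taken before the loop stops.
    """
    steps = 0
    accumulated = 1
    active = 1  # exploration starts from a single state
    for active_after, deadended_after in stash_sizes:
        if not (active and accumulated < budget):
            break
        steps += 1
        active = active_after
        accumulated = steps * (active_after + deadended_after)
    return steps


# Constant four live states: accumulated = 4 * steps, so the loop stops at 256.
assert steps_until_budget([(4, 0)] * 1000) == 256

# States doubling every step: the budget is exhausted after only 8 steps.
assert steps_until_budget([(2 ** i, 0) for i in range(1, 20)]) == 8

# No active states left: the loop ends even though the budget remains.
assert steps_until_budget([(1, 0), (0, 1), (5, 5)]) == 2
```

So the 1024 figure bounds a product of steps and states rather than a plain step count: a path with heavy branching is cut off after a handful of steps, while a mostly linear one can run for hundreds.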
