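AddressSanitizer (ASan) is a fast memory error detector for C/C++. It consists of a compiler instrumentation module (currently an LLVM pass) and a run-time library that replaces the malloc function. The tool can detect the following kinds of bugs:

  • Out-of-bounds accesses to heap, stack, and global memory (heap buffer overflow, stack buffer overflow, and global buffer overflow)
  • Use-after-free (UAF: dereferencing a dangling pointer, also called a wild pointer)
  • Use-after-return (access to stack memory that is no longer valid; enabled with the runtime flag ASAN_OPTIONS=detect_stack_use_after_return=1)
  • Use-after-scope (access outside the variable's scope; enabled with the clang flag -fsanitize-address-use-after-scope)
  • Double free
  • Initialization order bugs
  • Memory leaks

The tool is very fast. Memory debugging tools usually slow the application down considerably: the well-known valgrind is said to reduce performance to less than a tenth of normal speed, whereas AddressSanitizer typically slows the program down by only about 2x.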

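Using AddressSanitizer

AddressSanitizer has been part of LLVM since version 3.1 and part of GCC since version 4.8. If necessary, it can also be built from source; see AddressSanitizerHowToBuild.

Check your LLVM and GCC versions to confirm that AddressSanitizer support is built in: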

$ clang --version
clang version 3.9.1-4ubuntu3~16.04.2 (tags/RELEASE_391/rc2)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

My local toolchain is rather old, but it still supports AddressSanitizer.

Here are the AddressSanitizer run-time libraries mentioned above:

$ locate asan
/usr/lib/gcc/x86_64-linux-gnu/5/libasan.a
/usr/lib/gcc/x86_64-linux-gnu/5/libasan.so
/usr/lib/gcc/x86_64-linux-gnu/5/libasan_preinit.o
/usr/lib/gcc/x86_64-linux-gnu/5/32/libasan.a
/usr/lib/gcc/x86_64-linux-gnu/5/32/libasan.so
/usr/lib/gcc/x86_64-linux-gnu/5/32/libasan_preinit.o
/usr/lib/gcc/x86_64-linux-gnu/5/include/sanitizer/asan_interface.h
/usr/lib/gcc/x86_64-linux-gnu/5/x32/libasan.a
/usr/lib/gcc/x86_64-linux-gnu/5/x32/libasan.so
/usr/lib/gcc/x86_64-linux-gnu/5/x32/libasan_preinit.o
/usr/lib/gcc/x86_64-linux-gnu/7/libasan.a
/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so
/usr/lib/gcc/x86_64-linux-gnu/7/libasan_preinit.o
/usr/lib/gcc/x86_64-linux-gnu/7/include/sanitizer/asan_interface.h
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/libasan.a
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/libasan.so
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/libasan_preinit.o
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/include/sanitizer/asan_interface.h
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/sf/libasan.a
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/sf/libasan.so
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/sf/libasan_preinit.o
/usr/lib/llvm-3.9/lib/clang/3.9.1/asan_blacklist.txt
/usr/lib/llvm-3.9/lib/clang/3.9.1/include/sanitizer/asan_interface.h
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-i386.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-i386.so
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-i686.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-i686.so
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-preinit-i386.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-preinit-i686.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-preinit-x86_64.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-x86_64.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-x86_64.a.syms
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan-x86_64.so
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan_cxx-i386.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan_cxx-i686.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan_cxx-x86_64.a
/usr/lib/llvm-3.9/lib/clang/3.9.1/lib/linux/libclang_rt.asan_cxx-x86_64.a.syms
/usr/lib/llvm-5.0/lib/clang/5.0.2/asan_blacklist.txt
/usr/lib/llvm-5.0/lib/clang/5.0.2/include/sanitizer/asan_interface.h
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-i386.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-i386.so
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-i686.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-i686.so
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-preinit-i386.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-preinit-i686.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-preinit-x86_64.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-x86_64.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-x86_64.a.syms
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan-x86_64.so
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan_cxx-i386.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan_cxx-i686.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan_cxx-x86_64.a
/usr/lib/llvm-5.0/lib/clang/5.0.2/lib/linux/libclang_rt.asan_cxx-x86_64.a.syms
/usr/lib/x86_64-linux-gnu/libasan.so.2
/usr/lib/x86_64-linux-gnu/libasan.so.2.0.0
/usr/lib/x86_64-linux-gnu/libasan.so.4
/usr/lib/x86_64-linux-gnu/libasan.so.4.0.0
/usr/lib32/libasan.so.2
/usr/lib32/libasan.so.2.0.0
/usr/libx32/libasan.so.2
/usr/libx32/libasan.so.2.0.0

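To use AddressSanitizer, compile and link the program with the -fsanitize=address switch, using either GCC or Clang. To get reasonable performance, add -O1 or higher. To get nicer stack traces in error messages, add -fno-omit-frame-pointer. To get perfect stack traces, you may also need to disable inlining (just use -O1) and tail call elimination (-fno-optimize-sibling-calls).

Here is a piece of code with a memory access error:

// main.cpp
int main(int argc, char **argv) {
  int *array = new int[100];
  delete [] array;
  return array[argc];  // BOOM
}

Compile it with GCC and run it: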

$ gcc -fsanitize=address -fno-omit-frame-pointer -O1 -g -o main main.cpp
$ ./main
=================================================================
==5385==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000fe44 at pc 0x0000004007d4 bp 0x7ffddf0bafb0 sp 0x7ffddf0bafa0
READ of size 4 at 0x61400000fe44 thread T0
    #0 0x4007d3 in main addresssanitizer_demo/main.cpp:4
    #1 0x7f297fc2282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x4006b8 in _start (addresssanitizer_demo/main+0x4006b8)
0x61400000fe44 is located 4 bytes inside of 400-byte region [0x61400000fe40,0x61400000ffd0)
freed by thread T0 here:
    #0 0x7f2980065caa in operator delete[](void*) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99caa)
    #1 0x4007a8 in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:3
    #2 0x7f297fc2282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
previously allocated by thread T0 here:
    #0 0x7f29800656b2 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x996b2)
    #1 0x400798 in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:2
    #2 0x7f297fc2282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
SUMMARY: AddressSanitizer: heap-use-after-free addresssanitizer_demo/main.cpp:4 main
Shadow bytes around the buggy address:
  0x0c287fff9f70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fa0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c287fff9fc0: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c287fff9fd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9fe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9ff0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c287fffa000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fffa010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==5385==ABORTING

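After detecting the memory error, AddressSanitizer prints an error message to stderr and exits with a non-zero exit code. AddressSanitizer exits on the first detected error. This is by design:

  • This approach allows AddressSanitizer to produce faster and smaller generated code (both by ~5%).
  • Fixing bugs becomes unavoidable. AddressSanitizer does not produce false alarms. Once a memory corruption occurs, the program is in an inconsistent state, which can lead to confusing results and potentially misleading subsequent reports.

If your process is sandboxed and you are running on OS X 10.10 or earlier, you need to set the DYLD_INSERT_LIBRARIES environment variable and point it to the ASan library that is packaged with the compiler used to build the executable. (You can find the library by searching for dynamic libraries with asan in their name.) If the environment variable is not set, the process will try to re-exec. Also keep in mind that when you move the executable to another machine, the ASan library has to be copied over as well.

If -g is omitted at compile time, the executable lacks debug information, and the report printed by AddressSanitizer cannot show the source line of the error, as shown below: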

$ gcc -fsanitize=address -fno-omit-frame-pointer -O1  -o main main.cpp
$ ./main
=================================================================
==5403==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000fe44 at pc 0x0000004007d4 bp 0x7ffd981fce20 sp 0x7ffd981fce10
READ of size 4 at 0x61400000fe44 thread T0
    #0 0x4007d3 in main (addresssanitizer_demo/main+0x4007d3)
    #1 0x7f9de8af482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x4006b8 in _start (addresssanitizer_demo/main+0x4006b8)
0x61400000fe44 is located 4 bytes inside of 400-byte region [0x61400000fe40,0x61400000ffd0)
freed by thread T0 here:
    #0 0x7f9de8f37caa in operator delete[](void*) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99caa)
    #1 0x4007a8 in main (addresssanitizer_demo/main+0x4007a8)
    #2 0x7f9de8af482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
previously allocated by thread T0 here:
    #0 0x7f9de8f376b2 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x996b2)
    #1 0x400798 in main (addresssanitizer_demo/main+0x400798)
    #2 0x7f9de8af482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
SUMMARY: AddressSanitizer: heap-use-after-free ??:0 main
Shadow bytes around the buggy address:
  0x0c287fff9f70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fa0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c287fff9fc0: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c287fff9fd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9fe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9ff0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c287fffa000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fffa010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==5403==ABORTING

The -g option above can also be replaced with -ggdb, which produces as much debugging information usable by gdb as possible.

$ gcc -fsanitize=address -fno-omit-frame-pointer -O1 -ggdb -o main main.cpp
$ ./main
=================================================================
==5495==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000fe44 at pc 0x0000004007d4 bp 0x7fff7014ca10 sp 0x7fff7014ca00
READ of size 4 at 0x61400000fe44 thread T0
    #0 0x4007d3 in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:4
    #1 0x7fdddb19982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x4006b8 in _start (addresssanitizer_demo/main+0x4006b8)
0x61400000fe44 is located 4 bytes inside of 400-byte region [0x61400000fe40,0x61400000ffd0)
freed by thread T0 here:
    #0 0x7fdddb5dccaa in operator delete[](void*) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x99caa)
    #1 0x4007a8 in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:3
    #2 0x7fdddb19982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
previously allocated by thread T0 here:
    #0 0x7fdddb5dc6b2 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x996b2)
    #1 0x400798 in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:2
    #2 0x7fdddb19982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
SUMMARY: AddressSanitizer: heap-use-after-free /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:4 main
Shadow bytes around the buggy address:
  0x0c287fff9f70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fa0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c287fff9fc0: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c287fff9fd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9fe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9ff0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c287fffa000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fffa010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==5495==ABORTING

Compiling with LLVM/Clang is essentially the same as with GCC above:

$ clang++ -fsanitize=address -fno-omit-frame-pointer -O1 -g -o test main.cpp
$ ./test
=================================================================
==5508==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000fe44 at pc 0x00000050770f bp 0x7ffea20137b0 sp 0x7ffea20137a8
READ of size 4 at 0x61400000fe44 thread T0
    #0 0x50770e in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:4:10
    #1 0x7fdeb802382f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
    #2 0x4192b8 in _start (addresssanitizer_demo/test+0x4192b8)
0x61400000fe44 is located 4 bytes inside of 400-byte region [0x61400000fe40,0x61400000ffd0)
freed by thread T0 here:
    #0 0x504c20 in operator delete[](void*) (addresssanitizer_demo/test+0x504c20)
    #1 0x5076de in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:3:3
    #2 0x7fdeb802382f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
previously allocated by thread T0 here:
    #0 0x5045a0 in operator new[](unsigned long) (addresssanitizer_demo/test+0x5045a0)
    #1 0x5076d3 in main /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:2:16
    #2 0x7fdeb802382f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
SUMMARY: AddressSanitizer: heap-use-after-free /home/hanpfei0306/data/MyProjects/addresssanitizer_demo/main.cpp:4:10 in main
Shadow bytes around the buggy address:
  0x0c287fff9f70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9f90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fa0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fff9fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c287fff9fc0: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x0c287fff9fd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9fe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c287fff9ff0: fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa
  0x0c287fffa000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c287fffa010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==5508==ABORTING

readelf shows how the executables generated by GCC and LLVM differ:

# Executable generated by GCC
$ readelf -s -W main | grep asan
Symbol table '.dynsym' contains 15 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __asan_report_load4
    10: 0000000000400650     0 FUNC    GLOBAL DEFAULT  UND __asan_init_v4
    44: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS asan_preinit.cc
    57: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __asan_report_load4
    62: 0000000000400650     0 FUNC    GLOBAL DEFAULT  UND __asan_init_v4
    69: 0000000000600dd0     8 OBJECT  GLOBAL HIDDEN    19 __local_asan_preinit
# Executable generated by LLVM
$ readelf -s -W test | grep asan
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __asan_default_options
    16: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __asan_on_error
    49: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __asan_default_suppressions
    66: 0000000000426740   466 FUNC    GLOBAL DEFAULT   13 __asan_stack_malloc_0
    70: 00000000004269b0   504 FUNC    GLOBAL DEFAULT   13 __asan_stack_malloc_1
. . . . . .
   271: 0000000000428000   134 FUNC    GLOBAL DEFAULT   13 __asan_stack_free_8
   272: 000000000042b1e0    44 FUNC    GLOBAL DEFAULT   13 __asan_register_image_globals
   273: 00000000004d7550    44 FUNC    GLOBAL DEFAULT   13 __asan_report_load4_noabort
   275: 00000000004282a0   134 FUNC    GLOBAL DEFAULT   13 __asan_stack_free_9
. . . . . .
   635: 00000000004d7460    44 FUNC    GLOBAL DEFAULT   13 __asan_report_load2
   636: 00000000004d74f0    44 FUNC    GLOBAL DEFAULT   13 __asan_report_load4
   644: 00000000004d7580    44 FUNC    GLOBAL DEFAULT   13 __asan_report_load8
. . . . . .
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS asan_allocator.cc.o
    38: 0000000000419370   254 FUNC    LOCAL  DEFAULT   13 _ZN6__asanL10RZSize2LogEj
    39: 0000000000421c90   214 FUNC    LOCAL  DEFAULT   13 _ZN11__sanitizer20SizeClassAllocator64ILm105553116266496ELm4398046511104ELm0ENS_12SizeClassMapILm17ELm128ELm16EEEN6__asan20AsanMapUnmapCallbackEE15DeallocateBatchEPNS_14AllocatorStatsEmPNS2_13TransferBatchE.isra.32
    40: 0000000000419470  1241 FUNC    LOCAL  DEFAULT   13 _ZN6__asan9AsanChunk8UsedSizeEb.part.19
  1162: 0000000000422670  1101 FUNC    WEAK   HIDDEN    13 _ZN11__sanitizer28SizeClassAllocatorLocalCacheINS_20SizeClassAllocator64ILm105553116266496ELm4398046511104ELm0ENS_12SizeClassMapILm17ELm128ELm16EEEN6__asan20AsanMapUnmapCallbackEEEE6RefillEPS6_m
  1178: 00000000004d74f0    44 FUNC    GLOBAL DEFAULT   13 __asan_report_load4
  1194: 00000000004cff10    57 FUNC    GLOBAL HIDDEN    13 _ZN6__asan10AsanTSDGetEv
  1196: 00000000004d6ea0     8 FUNC    GLOBAL DEFAULT   13 __asan_get_report_sp
  1202: 00000000004a75e0    63 FUNC    GLOBAL HIDDEN    13 _ZN6__asan13SetThreadNameEPKc
  1212: 00000000004d7b50    96 FUNC    GLOBAL DEFAULT   13 __asan_load1_noabort
. . . . . .
  2202: 00000000004d9780    96 FUNC    GLOBAL DEFAULT   13 __asan_init
  2208: 000000000041a720  2468 FUNC    GLOBAL HIDDEN    13 _ZN6__asan22FindHeapChunkByAddressEm
  2219: 00000000004d9800     7 FUNC    GLOBAL HIDDEN    13 _ZN6__asan20GetMallocContextSizeEv
. . . . . .
  3790: 0000000000419ac0    19 FUNC    GLOBAL HIDDEN    13 _ZN6__asan13AsanChunkView11IsAllocatedEv
  3796: 00000000004d08e0    97 FUNC    GLOBAL HIDDEN    13 _ZN6__asan25ThreadNameWithParenthesisEPNS_17AsanThreadContextEPcm

Look at the symbols that appear in both executables, __asan_report_load4 and __asan_init (__asan_init_v4 in the GCC build). In the GCC-built executable they are undefined (UND) and resolved from the shared ASan library at load time; in the LLVM-built executable they are defined inside the binary itself. In other words, GCC links the ASan run-time dynamically, while LLVM links it statically.

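Symbolizing the output

AddressSanitizer collects call stacks on the following events:

  • malloc and free
  • Thread creation
  • Failures

malloc and free happen relatively frequently, so it is important to unwind the call stack fast. AddressSanitizer uses a simple unwinder that relies on frame pointers.

If you do not care about malloc/free call stacks, simply disable the unwinder completely with the malloc_context_size=0 runtime flag.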

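Every stack frame needs to be symbolized (provided, of course, that the binary was compiled with debug information). Given a PC (program counter value), we need to print:

#0xabcdf function_name file_name.cc:1234

AddressSanitizer uses llvm-symbolizer from the Clang package to symbolize stack traces (note that ideally the llvm-symbolizer version should match the ASan run-time library). To make AddressSanitizer symbolize its output, set the ASAN_SYMBOLIZER_PATH environment variable to point to the llvm-symbolizer binary, or make sure llvm-symbolizer is in your $PATH: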

$ ASAN_SYMBOLIZER_PATH=/usr/local/bin/llvm-symbolizer ./a.out
==9442== ERROR: AddressSanitizer heap-use-after-free on address 0x7f7ddab8c084 at pc 0x403c8c bp 0x7fff87fb82d0 sp 0x7fff87fb82c8
READ of size 4 at 0x7f7ddab8c084 thread T0
    #0 0x403c8c in main example_UseAfterFree.cc:4
    #1 0x7f7ddabcac4d in __libc_start_main ??:0
0x7f7ddab8c084 is located 4 bytes inside of 400-byte region [0x7f7ddab8c080,0x7f7ddab8c210)
freed by thread T0 here:
    #0 0x404704 in operator delete[](void*) ??:0
    #1 0x403c53 in main example_UseAfterFree.cc:4
    #2 0x7f7ddabcac4d in __libc_start_main ??:0
previously allocated by thread T0 here:
    #0 0x404544 in operator new[](unsigned long) ??:0
    #1 0x403c43 in main example_UseAfterFree.cc:2
    #2 0x7f7ddabcac4d in __libc_start_main ??:0
==9442== ABORTING

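The llvm-symbolizer tool is part of the llvm package; for installation on Ubuntu, see LLVM Debian/Ubuntu nightly packages.

If that does not work, you can use a separate script to symbolize the result offline (online symbolization can be force-disabled by setting ASAN_OPTIONS=symbolize=0, or by setting an empty ASAN_SYMBOLIZER_PATH environment variable with $ export ASAN_SYMBOLIZER_PATH=):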

$ ASAN_OPTIONS=symbolize=0 ./a.out 2> log
$ projects/compiler-rt/lib/asan/scripts/asan_symbolize.py / < log | c++filt
==9442== ERROR: AddressSanitizer heap-use-after-free on address 0x7f7ddab8c084 at pc 0x403c8c bp 0x7fff87fb82d0 sp 0x7fff87fb82c8
READ of size 4 at 0x7f7ddab8c084 thread T0
    #0 0x403c8c in main example_UseAfterFree.cc:4
    #1 0x7f7ddabcac4d in __libc_start_main ??:0
...

The script takes an optional argument, a file prefix. The substring .*prefix is stripped from the file names.

The c++filt above demangles the function names; the same can be achieved by passing -d to the asan_symbolize.py script.

On OS X you may need to run dsymutil on the binary to get file:line information in AddressSanitizer reports.

You can also define your own stack trace format with the stack_trace_format runtime flag. For example:

% ./a.out
...
  #0 0x4b615d in main /home/you/use-after-free.cc:12:3
...
% ASAN_OPTIONS='stack_trace_format="[frame=%n, function=%f, location=%S]"' ./a.out
...
  [frame=0, function=main, location=/home/you/use-after-free.cc:12:3]

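The AddressSanitizer algorithm

Short version

The run-time library replaces the malloc and free functions. The memory around malloc-ed regions (the red zones) is poisoned, that is, marked with special shadow bytes. Memory that is free-d is placed in quarantine and is poisoned as well. Every memory access in the program is transformed by the compiler in the following way.
Before:

*address = ...;  // or: ... = *address;

After:

if (IsPoisoned(address)) {
  ReportError(address, kAccessSize, kIsWrite);
}
*address = ...;  // or: ... = *address;

The tricky part is implementing IsPoisoned very fast and ReportError very compactly. In addition, instrumenting some of the accesses may be proven redundant.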

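Memory mapping

The virtual address space is divided into 2 disjoint classes:

  • Main application memory (Mem): this memory is used by the regular application code.
  • Shadow memory (Shadow): this memory contains the shadow values (or metadata). There is a correspondence between the shadow and the main application memory. Poisoning a byte in the main memory means writing a special value into the corresponding shadow memory.

These 2 classes of memory should be organized in such a way that computing the shadow address (MemToShadow) is fast.

The instrumentation performed by the compiler is the following:

shadow_address = MemToShadow(address);
if (ShadowIsPoisoned(shadow_address)) {
  ReportError(address, kAccessSize, kIsWrite);
}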

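Mapping

AddressSanitizer maps 8 bytes of the application memory to 1 byte of the shadow memory.

Any aligned 8 bytes of the application memory can be in only 9 different states:

  • All 8 bytes in the qword are unpoisoned (i.e. addressable). The shadow value is 0.
  • All 8 bytes in the qword are poisoned (i.e. not addressable). The shadow value is negative.
  • The first k bytes are unpoisoned and the remaining 8-k bytes are poisoned, for k from 1 to 7. The shadow value is k. This is guaranteed by the fact that malloc always returns 8-byte aligned chunks of memory. The only case where different bytes of an aligned qword have different states is the tail of a malloc-ed region. For example, if we call malloc(13), we get one fully unpoisoned qword and one qword whose first 5 bytes are unpoisoned.

The instrumentation looks like this: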

byte *shadow_address = MemToShadow(address);
byte shadow_value = *shadow_address;
if (shadow_value) {
  if (SlowPathCheck(shadow_value, address, kAccessSize)) {
    ReportError(address, kAccessSize, kIsWrite);
  }
}
// Check the cases where we access first k bytes of the qword
// and these k bytes are unpoisoned.
bool SlowPathCheck(shadow_value, address, kAccessSize) {
  last_accessed_byte = (address & 7) + kAccessSize - 1;
  return (last_accessed_byte >= shadow_value);
}
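To make the malloc(13) case concrete, here is a minimal sketch that models the shadow values of a 13-byte region and runs the fast-path and slow-path checks on a few accesses. The helper names ShadowValueForQword and AccessIsReported are made up for this illustration, offsets are relative to the start of the region rather than real addresses, and -1 stands in for whatever negative code the real run-time would store.

```cpp
#include <cstddef>
#include <cstdint>

// Shadow value of the qword_index-th aligned 8-byte word of a malloc-ed
// region that is region_size bytes long (hypothetical helper).
constexpr std::int8_t ShadowValueForQword(std::size_t region_size,
                                          std::size_t qword_index) {
  const std::size_t begin = qword_index * 8;
  if (begin >= region_size) return -1;  // fully poisoned: any negative value
  const std::size_t remaining = region_size - begin;
  if (remaining >= 8) return 0;         // fully addressable
  return static_cast<std::int8_t>(remaining);  // first k bytes addressable
}

// Same logic as the instrumentation: true means the access gets reported.
constexpr bool AccessIsReported(std::size_t region_size, std::size_t offset,
                                std::size_t access_size) {
  const std::int8_t shadow_value =
      ShadowValueForQword(region_size, offset >> 3);
  if (shadow_value == 0) return false;  // fast path: whole qword is fine
  const auto last_accessed_byte =
      static_cast<std::int8_t>((offset & 7) + access_size - 1);
  return last_accessed_byte >= shadow_value;  // slow path
}

// malloc(13): one fully unpoisoned qword, then one with 5 valid bytes.
static_assert(ShadowValueForQword(13, 0) == 0, "first qword");
static_assert(ShadowValueForQword(13, 1) == 5, "second qword");
static_assert(!AccessIsReported(13, 8, 4), "bytes 8..11 are valid");
static_assert(!AccessIsReported(13, 12, 1), "byte 12 is the last valid byte");
static_assert(AccessIsReported(13, 13, 1), "byte 13 is past the end");
static_assert(AccessIsReported(13, 12, 4), "bytes 12..15 cross the end");
```

Because a negative shadow value is smaller than any last_accessed_byte, the same comparison also reports every access to a fully poisoned qword.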

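MemToShadow(ShadowAddr) falls into the ShadowGap region, which is not addressable. So, if the program tries to directly access a memory location in the shadow region, it will crash.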

64-bit

Shadow = (Mem >> 3) + 0x7fff8000;
[0x10007fff8000, 0x7fffffffffff] HighMem
[0x02008fff7000, 0x10007fff7fff] HighShadow
[0x00008fff7000, 0x02008fff6fff] ShadowGap
[0x00007fff8000, 0x00008fff6fff] LowShadow
[0x000000000000, 0x00007fff7fff] LowMem

32-bit

Shadow = (Mem >> 3) + 0x20000000;
[0x40000000, 0xffffffff] HighMem
[0x28000000, 0x3fffffff] HighShadow
[0x24000000, 0x27ffffff] ShadowGap
[0x20000000, 0x23ffffff] LowShadow
[0x00000000, 0x1fffffff] LowMem
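The region boundaries in both tables follow directly from the two mapping formulas. As a sanity check, the sketch below recomputes them at compile time; the function and constant names are made up for this example, and the 32-bit case is evaluated in 64-bit arithmetic purely for convenience.

```cpp
#include <cstdint>

constexpr std::uint64_t kShadowScale = 3;  // 8 application bytes per shadow byte
constexpr std::uint64_t kOffset64 = 0x7fff8000ULL;
constexpr std::uint64_t kOffset32 = 0x20000000ULL;

constexpr std::uint64_t MemToShadow64(std::uint64_t mem) {
  return (mem >> kShadowScale) + kOffset64;
}
constexpr std::uint64_t MemToShadow32(std::uint64_t mem) {
  return (mem >> kShadowScale) + kOffset32;
}

// 64-bit: LowMem maps onto LowShadow, HighMem onto HighShadow.
static_assert(MemToShadow64(0x000000000000ULL) == 0x00007fff8000ULL, "");
static_assert(MemToShadow64(0x00007fff7fffULL) == 0x00008fff6fffULL, "");
static_assert(MemToShadow64(0x10007fff8000ULL) == 0x02008fff7000ULL, "");
static_assert(MemToShadow64(0x7fffffffffffULL) == 0x10007fff7fffULL, "");
// The shadow of the shadow itself is exactly the ShadowGap.
static_assert(MemToShadow64(0x00007fff8000ULL) == 0x00008fff7000ULL, "");
static_assert(MemToShadow64(0x10007fff7fffULL) == 0x02008fff6fffULL, "");

// 32-bit: same structure with the other offset.
static_assert(MemToShadow32(0x00000000ULL) == 0x20000000ULL, "");
static_assert(MemToShadow32(0x1fffffffULL) == 0x23ffffffULL, "");
static_assert(MemToShadow32(0x40000000ULL) == 0x28000000ULL, "");
static_assert(MemToShadow32(0xffffffffULL) == 0x3fffffffULL, "");
static_assert(MemToShadow32(0x20000000ULL) == 0x24000000ULL, "");
static_assert(MemToShadow32(0x3fffffffULL) == 0x27ffffffULL, "");
```

The checks on the shadow-of-shadow addresses illustrate why a direct access to shadow memory crashes: its own shadow address lands in the unmapped ShadowGap.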

Ultra-compact shadow

It is also possible to use an even more compact shadow memory, for example:

Shadow = (Mem >> 7) | kOffset;

This is still experimental.
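To get a feeling for what the shift amount buys, the following sketch compares how much shadow address space is needed for the 47-bit user address range of the 64-bit layout above. ShadowBytes and kUserSpace are names invented for this illustration; the value of kOffset is not given here, so it plays no role in the calculation.

```cpp
#include <cstdint>

// Bytes of shadow needed to describe app_bytes of application memory when
// one shadow byte covers 2^scale application bytes.
constexpr std::uint64_t ShadowBytes(std::uint64_t app_bytes, unsigned scale) {
  return app_bytes >> scale;
}

// 0x000000000000 .. 0x7fffffffffff, as in the 64-bit layout.
constexpr std::uint64_t kUserSpace = 1ULL << 47;

static_assert(ShadowBytes(kUserSpace, 3) == (1ULL << 44),
              "Mem >> 3: 16 TiB of shadow address space");
static_assert(ShadowBytes(kUserSpace, 7) == (1ULL << 40),
              "Mem >> 7: 1 TiB of shadow address space");
```

The price of the compact variant is granularity: one shadow byte then has to describe 128 application bytes instead of 8.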

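Reporting errors

ReportError can be implemented as a call (this is the current default), but there are other, slightly more efficient and/or more compact solutions. At one point the default behaviour was:

  • copy the failure address to %rax (%eax)
  • execute ud2 (generates SIGILL)
  • encode the access type and size in a one-byte instruction that follows ud2

Overall these 3 instructions require 5-6 bytes of machine code.

It is also possible to use just a single instruction (e.g. ud2), but this would require a full disassembler in the run-time library (or some other hacks).

To catch stack buffer overflows, AddressSanitizer instruments the code like this:

Original code:

void foo() {
  char a[8];
  ...
  return;
}

Instrumented code:

void foo() {
  char redzone1[32];  // 32-byte aligned
  char a[8];          // 32-byte aligned
  char redzone2[24];
  char redzone3[32];  // 32-byte aligned
  int  *shadow_base = MemToShadow(redzone1);
  shadow_base[0] = 0xffffffff;  // poison redzone1
  shadow_base[1] = 0xffffff00;  // poison redzone2, unpoison 'a'
  shadow_base[2] = 0xffffffff;  // poison redzone3
  ...
  shadow_base[0] = shadow_base[1] = shadow_base[2] = 0; // unpoison all
  return;
}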
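The three constants written through shadow_base follow from the 8-to-1 mapping: each 32-byte slice of the frame is described by 4 shadow bytes, and 4 shadow bytes are written at once as one int. The sketch below rebuilds the constants from individual shadow bytes. It assumes a little-endian target (as on x86_64), and PackShadowWord, kPoisoned and kAddressable are names invented for this example.

```cpp
#include <cstdint>

constexpr std::uint8_t kPoisoned = 0xff;     // negative when read as a signed byte
constexpr std::uint8_t kAddressable = 0x00;  // all 8 bytes usable

// Packs four consecutive shadow bytes into the 32-bit word that a
// little-endian CPU stores with a single int write (b0 = lowest address).
constexpr std::uint32_t PackShadowWord(std::uint8_t b0, std::uint8_t b1,
                                       std::uint8_t b2, std::uint8_t b3) {
  return static_cast<std::uint32_t>(b0) |
         (static_cast<std::uint32_t>(b1) << 8) |
         (static_cast<std::uint32_t>(b2) << 16) |
         (static_cast<std::uint32_t>(b3) << 24);
}

// redzone1[32] and redzone3[32]: four poisoned shadow bytes each.
static_assert(PackShadowWord(kPoisoned, kPoisoned, kPoisoned, kPoisoned) ==
                  0xffffffffu,
              "shadow_base[0] and shadow_base[2]");
// a[8] followed by redzone2[24]: one addressable byte, then three poisoned.
static_assert(PackShadowWord(kAddressable, kPoisoned, kPoisoned, kPoisoned) ==
                  0xffffff00u,
              "shadow_base[1]");
```

The low byte of shadow_base[1] is the only zero in the whole frame, so a[0] to a[7] are the only bytes the function may touch between the two redzones.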

Example of instrumented code (x86_64)

# long load8(long *a) { return *a; }
0000000000000030 <load8>:
  30:  48 89 f8                mov    %rdi,%rax
  33:  48 c1 e8 03             shr    $0x3,%rax
  37:  80 b8 00 80 ff 7f 00    cmpb   $0x0,0x7fff8000(%rax)
  3e:  75 04                   jne    44 <load8+0x14>
  40:  48 8b 07                mov    (%rdi),%rax   <<<<<< original load
  43:  c3                      retq
  44:  52                      push   %rdx
  45:  e8 00 00 00 00          callq  __asan_report_load8
# int  load4(int *a)  { return *a; }
0000000000000000 <load4>:
   0:  48 89 f8                mov    %rdi,%rax
   3:  48 89 fa                mov    %rdi,%rdx
   6:  48 c1 e8 03             shr    $0x3,%rax
   a:  83 e2 07                and    $0x7,%edx
   d:  0f b6 80 00 80 ff 7f    movzbl 0x7fff8000(%rax),%eax
  14:  83 c2 03                add    $0x3,%edx
  17:  38 c2                   cmp    %al,%dl
  19:  7d 03                   jge    1e <load4+0x1e>
  1b:  8b 07                   mov    (%rdi),%eax    <<<<<< original load
  1d:  c3                      retq
  1e:  84 c0                   test   %al,%al
  20:  74 f9                   je     1b <load4+0x1b>
  22:  50                      push   %rax
  23:  e8 00 00 00 00          callq  __asan_report_load4

Unaligned accesses

The current compact mapping does not catch unaligned accesses that are only partially out of bounds:

int *x = new int[2]; // 8 bytes: [0,7].
int *u = (int*)((char*)x + 6);
*u = 1;  // Access to range [6-9]

A viable solution is described in https://github.com/google/sanitizers/issues/100, but it comes at a performance cost.
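The reason the access slips through can be replayed with the shadow check itself. The sketch below models the 8-byte array plus the qword after it; kShadow and AccessIsReported are names made up for this illustration, offsets are relative to x, and -1 stands in for the real redzone code.

```cpp
#include <cstddef>
#include <cstdint>

// Shadow for new int[2] (offsets 0..7, fully addressable) followed by the
// first redzone qword (offsets 8..15, poisoned).
constexpr std::int8_t kShadow[2] = {0, -1};

// Same check as the instrumentation: only the shadow byte of the qword that
// contains the FIRST accessed byte is consulted.
constexpr bool AccessIsReported(std::size_t offset, std::size_t access_size) {
  const std::int8_t shadow_value = kShadow[offset >> 3];
  if (shadow_value == 0) return false;
  const auto last_accessed_byte =
      static_cast<std::int8_t>((offset & 7) + access_size - 1);
  return last_accessed_byte >= shadow_value;
}

static_assert(!AccessIsReported(4, 4), "x[1]: in bounds, not reported");
static_assert(AccessIsReported(8, 4), "x[2]: aligned overflow, reported");
static_assert(!AccessIsReported(6, 4),
              "bytes 6..9: overflows by 2 bytes but is NOT reported");
```

The access at offset 6 starts in a qword whose shadow value is 0, so the fast path accepts it without ever looking at the shadow byte of the following qword.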

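Run-time library

Malloc

The run-time library replaces malloc/free and provides error reporting functions such as __asan_report_load8.

malloc allocates the requested amount of memory surrounded by redzones. The shadow values that correspond to the redzones are poisoned, and the shadow values for the main memory region are cleared.

free poisons the shadow values of the entire region and puts the chunk of memory into a quarantine queue, so that this chunk will not be returned by malloc again for some time.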
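The behaviour described above can be sketched with a toy allocator. This is only an illustrative model, not the real run-time: "addresses" are byte offsets into an imaginary arena, the class ToyAsanHeap and all of its members are invented for this example, the redzone size and the quarantine limit are arbitrary, and the shadow codes merely mirror the fa/fd values from the legend.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

constexpr std::size_t kRedzoneQwords = 2;  // arbitrary redzone size
constexpr std::int8_t kRedzone = -6;       // 0xfa as a signed byte
constexpr std::int8_t kFreed = -3;         // 0xfd as a signed byte

class ToyAsanHeap {
 public:
  explicit ToyAsanHeap(std::size_t quarantine_limit)
      : quarantine_limit_(quarantine_limit) {}

  // Returns the offset of the first user byte of a new size-byte region.
  std::size_t Malloc(std::size_t size) {
    const std::size_t qwords = (size + 7) / 8;
    std::size_t begin = 0;
    if (!TakeReusable(qwords, &begin)) {
      begin = shadow_.size() + kRedzoneQwords;  // left redzone
      shadow_.resize(begin + qwords + kRedzoneQwords, kRedzone);  // right one
    }
    for (std::size_t i = 0; i < qwords; ++i) shadow_[begin + i] = 0;
    if (size % 8 != 0) {  // partially addressable tail qword
      shadow_[begin + qwords - 1] = static_cast<std::int8_t>(size % 8);
    }
    live_[begin] = qwords;
    return begin * 8;
  }

  // Poisons the whole region and parks it in the quarantine.
  void Free(std::size_t offset) {
    const std::size_t begin = offset / 8;
    const std::size_t qwords = live_.at(begin);  // throws on an unknown chunk
    live_.erase(begin);
    for (std::size_t i = 0; i < qwords; ++i) shadow_[begin + i] = kFreed;
    quarantine_.push_back(Chunk{begin, qwords});
    if (quarantine_.size() > quarantine_limit_) {  // oldest chunk leaves
      reusable_.push_back(quarantine_.front());
      quarantine_.pop_front();
    }
  }

  // One-byte version of the shadow check.
  bool IsPoisoned(std::size_t offset) const {
    const std::size_t qword = offset / 8;
    if (qword >= shadow_.size()) return true;
    const std::int8_t shadow_value = shadow_[qword];
    return shadow_value != 0 &&
           static_cast<std::int8_t>(offset % 8) >= shadow_value;
  }

 private:
  struct Chunk {
    std::size_t begin;
    std::size_t qwords;
  };

  bool TakeReusable(std::size_t qwords, std::size_t *begin) {
    for (auto it = reusable_.begin(); it != reusable_.end(); ++it) {
      if (it->qwords == qwords) {
        *begin = it->begin;
        reusable_.erase(it);
        return true;
      }
    }
    return false;
  }

  std::size_t quarantine_limit_;
  std::vector<std::int8_t> shadow_;       // one entry per 8 arena bytes
  std::map<std::size_t, std::size_t> live_;  // begin qword -> size in qwords
  std::deque<Chunk> quarantine_;
  std::vector<Chunk> reusable_;
};
```

With a quarantine limit of 1, a freed 13-byte chunk stays poisoned and is not handed out by the next Malloc(13); it only becomes reusable after a second chunk has been freed and pushed it out of the queue. That delay is what lets a use-after-free be caught shortly after the free.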

References

  • Clang AddressSanitizer
  • AddressSanitizerAlgorithm
  • llvm-symbolizer
  • Address Sanitizer usage
  • sanitizers
  • AddressSanitizerExampleHeapOutOfBounds
  • AddressSanitizerCallStack
