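AddressSanitizer is a tool developed by Google for detecting memory access errors such as use-after-free, as well as memory leaks. It is built into GCC versions >= 4.8 and can be used with both C and C++ code. AddressSanitizer uses runtime instrumentation to track memory allocations, which means you must build your code with AddressSanitizer enabled to take advantage of its features.
There is extensive documentation on the AddressSanitizer GitHub wiki.
Memory leaks increase the total memory used by a program. It is important to free memory properly once it is no longer needed. For small programs, losing a few bytes here and there may not seem like a big deal. For long-running programs that use gigabytes of memory, however, avoiding leaks becomes increasingly important. If your program fails to free memory it no longer needs, it may run out of memory, causing the application to terminate early. AddressSanitizer can help detect these leaks.
In addition, AddressSanitizer can detect use-after-free errors. A use-after-free error occurs when a program tries to read or write memory that has already been freed. This is undefined behavior and can lead to data corruption, incorrect results, or even a program crash.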
Building With Address Sanitizer
We need to use gcc to build our code, so we will load the gcc module:
module load gnu/9.1.0
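The -fsanitize=address flag tells the compiler to add AddressSanitizer.
In addition, because of some environment configuration settings on OSC systems, we must also link against ASan statically. This is done with the -static-libasan or -l:libasan.a flag.
It is helpful to compile your code with debug symbols. If debug symbols are present, AddressSanitizer will print line numbers. To do this, add the -g flag. In addition, if you find that the stack traces do not look quite right, the -fno-omit-frame-pointer flag may help.
In a single command, this looks like either of the following: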
gcc main.c -o main -fsanitize=address -l:libasan.a -g
gcc main.c -o main -fsanitize=address -static-libasan -g
Or, split into separate compile and link stages:
gcc -c main.c -fsanitize=address -g
gcc main.o -o main -fsanitize=address -static-libasan
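Note that both the compile and link steps need the -fsanitize=address flag, but only the link step needs -static-libasan. If your build system is more complex, it may make sense to put these flags in the CFLAGS and LDFLAGS environment variables.
That's it!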
Examples
No Leak
First, let's look at a program with no memory leak (noleak.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    char *s = malloc(100);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    free(s);
    return 0;
}
To build it, we run:
gcc noleak.c -o noleak -fsanitize=address -static-libasan -g
Running it gives the following output:
string is: Hello world!
Looks right! Because there is no memory leak in this program, AddressSanitizer prints nothing. But what if there is a leak?
Missing free
Let's look at the program above again, but this time with the call to free removed (leak.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    char *s = malloc(100);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    return 0;
}
Then, build it:
gcc leak.c -o leak -fsanitize=address -static-libasan -g
Output:
string is: Hello world!
=================================================================
==235624==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 100 byte(s) in 1 object(s) allocated from:
#0 0x4eaaa8 in __interceptor_malloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:144
#1 0x5283dd in main /users/PZS0710/edanish/test/asan/leak.c:6
#2 0x2b0c29909544 in __libc_start_main (/lib64/libc.so.6+0x22544)
SUMMARY: AddressSanitizer: 100 byte(s) leaked in 1 allocation(s).
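This is AddressSanitizer's leak report. It detected that 100 bytes were allocated but never freed. Looking at the stack trace it provides, we can see that the memory was allocated on line 6 of leak.c.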
Use After Free
Suppose we found the leak above in our code and want to fix it. We need to add the call to free. But what if we put it in the wrong place?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    char *s = malloc(100);
    free(s);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    return 0;
}
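The above (uaf.c) is clearly wrong. In this contrived example, the allocated memory (pointed to by "s") is written to and read from after it has been freed.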
To Build:
gcc uaf.c -o uaf -fsanitize=address -static-libasan -g
Building and running it, we get the following report from AddressSanitizer:
=================================================================
==244157==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b0000000f0 at pc 0x00000047a560 bp 0x7ffcdf0d59f0 sp 0x7ffcdf0d51a0
WRITE of size 13 at 0x60b0000000f0 thread T0
#0 0x47a55f in __interceptor_memcpy ../../.././libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790
#1 0x528403 in main /users/PZS0710/edanish/test/asan/uaf.c:8
#2 0x2b47dd204544 in __libc_start_main (/lib64/libc.so.6+0x22544)
#3 0x405f5c (/users/PZS0710/edanish/test/asan/uaf+0x405f5c)
0x60b0000000f0 is located 0 bytes inside of 100-byte region [0x60b0000000f0,0x60b000000154)
freed by thread T0 here:
#0 0x4ea6f7 in __interceptor_free ../../.././libsanitizer/asan/asan_malloc_linux.cc:122
#1 0x5283ed in main /users/PZS0710/edanish/test/asan/uaf.c:7
#2 0x2b47dd204544 in __libc_start_main (/lib64/libc.so.6+0x22544)
previously allocated by thread T0 here:
#0 0x4eaaa8 in __interceptor_malloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:144
#1 0x5283dd in main /users/PZS0710/edanish/test/asan/uaf.c:6
#2 0x2b47dd204544 in __libc_start_main (/lib64/libc.so.6+0x22544)
SUMMARY: AddressSanitizer: heap-use-after-free ../../.././libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790 in __interceptor_memcpy
Shadow bytes around the buggy address:
0x0c167fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c167fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c167fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c167fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c167fff8000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c167fff8010: fd fd fd fd fd fa fa fa fa fa fa fa fa fa[fd]fd
0x0c167fff8020: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
0x0c167fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c167fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c167fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c167fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==244157==ABORTING
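This is a bit intimidating. It looks like a lot is going on here, but it is not as bad as it seems. Starting at the top, we see what AddressSanitizer detected: in this case, a "WRITE" of 13 bytes (from our strcpy). Immediately after that comes a stack trace of where the write happened. It tells us that the write occurred on line 8 of uaf.c, in the function named "main".
Next, AddressSanitizer reports where the memory is located. We can ignore this for now, but depending on your use case it may be useful information.
The two key pieces of information follow. AddressSanitizer tells us where the memory was freed (the "freed by thread T0 here:" section), giving us another stack trace that shows the memory was freed on line 7. It then reports where the memory was originally allocated ("previously allocated by thread T0 here:"), on line 6 of uaf.c.
This is probably enough information to start debugging the problem. The rest of the report gives details about how the memory is laid out and the exact addresses that were accessed or written. You probably do not need to pay much attention to this part. For most use cases it gets a bit "in the weeds".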
Heap Overflow
AddressSanitizer can also detect heap overflows. Consider the following code (overflow.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, const char *argv[]) {
    // whoops, forgot c strings are null-terminated
    // and not enough memory was allocated for the copy
    char *s = malloc(12);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    free(s);
    return 0;
}
The string "Hello world!" is 13 characters long including the null terminator, but we only allocated 12 bytes, so the copy overflows the allocated buffer. To build this:
gcc overflow.c -o overflow -fsanitize=address -static-libasan -g -Wall
Then, running it, we get the following report from AddressSanitizer:
==168232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000003c at pc 0x000000423454 bp 0x7ffdd58700e0 sp 0x7ffdd586f890
WRITE of size 13 at 0x60200000003c thread T0
#0 0x423453 in __interceptor_memcpy /apps_src/gnu/8.4.0/src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737
#1 0x5097c9 in main /users/PZS0710/edanish/test/asan/overflow.c:8
#2 0x2ad93cbd7544 in __libc_start_main (/lib64/libc.so.6+0x22544)
#3 0x405d7b (/users/PZS0710/edanish/test/asan/overflow+0x405d7b)
0x60200000003c is located 0 bytes to the right of 12-byte region [0x602000000030,0x60200000003c)
allocated by thread T0 here:
#0 0x4cd5d0 in __interceptor_malloc /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x5097af in main /users/PZS0710/edanish/test/asan/overflow.c:7
#2 0x2ad93cbd7544 in __libc_start_main (/lib64/libc.so.6+0x22544)
SUMMARY: AddressSanitizer: heap-buffer-overflow /apps_src/gnu/8.4.0/src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737 in __interceptor_memcpy
Shadow bytes around the buggy address:
0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa 00 fa fa fa 00[04]fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==168232==ABORTING
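This is similar to the use-after-free report we saw above. It tells us that a heap buffer overflow occurred, then goes on to report where the write happened and where the memory was originally allocated. Again, the rest of the report describes the layout of the heap and is probably less important for your use case.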
C++ Delete Mismatch
AddressSanitizer can also be used with C++ code. Consider the following code (bad_delete.cxx):
#include <iostream>
#include <cstring>

int main(int argc, const char *argv[]) {
    char *cstr = new char[100];
    strcpy(cstr, "Hello World");
    std::cout << cstr << std::endl;

    delete cstr;
    return 0;
}
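What is the problem here? The memory pointed to by "cstr" was allocated with new[]. Array allocations must be deleted with the delete[] operator, not "delete".
To build this code, simply use g++ instead of gcc:
g++ bad_delete.cxx -o bad_delete -fsanitize=address -static-libasan -g
Running it, we get the following output: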
Hello World
=================================================================
==257438==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60b000000040
#0 0x4d0a78 in operator delete(void*, unsigned long) /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_new_delete.cc:151
#1 0x509ea8 in main /users/PZS0710/edanish/test/asan/bad_delete.cxx:9
#2 0x2b8232878544 in __libc_start_main (/lib64/libc.so.6+0x22544)
#3 0x40642b (/users/PZS0710/edanish/test/asan/bad_delete+0x40642b)
0x60b000000040 is located 0 bytes inside of 100-byte region [0x60b000000040,0x60b0000000a4)
allocated by thread T0 here:
#0 0x4cf840 in operator new[](unsigned long) /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_new_delete.cc:93
#1 0x509e5f in main /users/PZS0710/edanish/test/asan/bad_delete.cxx:5
#2 0x2b8232878544 in __libc_start_main (/lib64/libc.so.6+0x22544)
SUMMARY: AddressSanitizer: alloc-dealloc-mismatch /apps_src/gnu/8.4.0/src/libsanitizer/asan/asan_new_delete.cc:151 in operator delete(void*, unsigned long)
==257438==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
==257438==ABORTING
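This is similar to the other AddressSanitizer output we have seen. This time, it tells us there is a mismatch between new and delete. It prints a stack trace of where the delete happened (line 9) and one of where the allocation happened (line 5).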
Performance
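The documentation states:
This tool is very fast. The average slowdown of the instrumented program is ~2x
AddressSanitizer is much faster than comparable analysis tools such as valgrind. This makes it practical to use on HPC codes.
However, if you find that AddressSanitizer is too slow for your code, you can use a compiler directive to disable it for specific functions. That way, you can use AddressSanitizer on the colder parts of your code while manually auditing the hot paths.
The compiler directive to skip instrumenting a function is: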
__attribute__((no_sanitize_address))
