1. Preface

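I recently read several papers in this area and noticed that, although most open-source datasets are drawn from SARD and NVD, each one is organized and formatted differently. On top of that, there are many kinds of vulnerabilities. While reproducing a vulnerability-label extraction algorithm, I was unsure how to match the tokens of so many vulnerability types. The simplest approach for now is to define a vulnerability feature set H = {strcpy, fgets, memcpy, sprintf, strcat, ...} and match it against the source line by line, word by word. Similar work was already done five or six years ago, however, and a discussion with my advisor made it clear that I lacked a deep understanding of both the datasets and the vulnerabilities themselves. The dataset used in the SySeVR paper is comparatively comprehensive, so I use it here as study material, hoping to understand better how vulnerabilities are represented.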
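The word-by-word matching against H described above could look roughly like the sketch below. The names RISKY_TOKENS and count_risky_tokens are made up for illustration, and the token list is only the handful of functions named in the text. It matches whole identifiers only, so my_strcpy or strncpy do not count as strcpy; it does not skip comments or string literals, which a real labeling script would have to handle.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature set H; extend it as needed. */
static const char *const RISKY_TOKENS[] = {
    "strcpy", "fgets", "memcpy", "sprintf", "strcat"
};
static const size_t RISKY_TOKEN_COUNT =
    sizeof(RISKY_TOKENS) / sizeof(RISKY_TOKENS[0]);

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Returns 1 if the identifier [start, start + len) is a member of H. */
static int is_risky_token(const char *start, size_t len)
{
    for (size_t i = 0; i < RISKY_TOKEN_COUNT; i++) {
        if (strlen(RISKY_TOKENS[i]) == len &&
            strncmp(RISKY_TOKENS[i], start, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Counts the identifiers on one source line that match H exactly. */
int count_risky_tokens(const char *line)
{
    int hits = 0;
    const char *p = line;

    while (*p != '\0') {
        if (is_ident_char((unsigned char)*p)) {
            const char *start = p;
            while (is_ident_char((unsigned char)*p)) {
                p++;
            }
            if (is_risky_token(start, (size_t)(p - start))) {
                hits++;
            }
        } else {
            p++;
        }
    }
    return hits;
}
```

Calling count_risky_tokens on each line of a file gives a crude per-line label: a non-zero count marks the line as a candidate.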

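PS: The dataset covers 40 major categories, more than a hundred vulnerability types, and over a hundred thousand source files, so this post will keep being updated.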

2. Vulnerable Code Analysis

2.1 Organization

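At first this categorization left me confused. The raw dataset is very messy: the .h files apparently were not removed, C and C++ are not separated, and the data comes from many sources, so it is not as neatly organized as the dataset of another paper I looked at.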

2.2 SARD

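In the SARD dataset every source file is wrapped in its own folder, which makes it easy to generate graphs in batch with joern. I do not know why the directory numbers skip. I keep the same numbering as the dataset here so that readers can cross-reference. The sources within each small category are all much alike, so I analyze only one per category.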

000\117

int main(int argc, char *argv[])
{
    char buf[10];
    /*  BAD  */
    buf[10] = 'A';
    return 0;
}

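This is an out-of-bounds array write. The program defines a character array buf of size 10, so the valid indices run from 0 to 9. The line marked /* BAD */, however, assigns 'A' to index 10, one element past the end of the array.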

001\003

int main(int argc, char *argv[])
{
    int inc_value;
    int loop_counter;
    char buf[10];
    inc_value = 4105 - (4105 - 1);
    for (loop_counter = 0; loop_counter <= 4105; loop_counter += inc_value) {
        /*  BAD  */
        buf[loop_counter] = 'A';
    }
    return 0;
}

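buf is only 10 bytes long. inc_value evaluates to 4105 - (4105 - 1) = 1, so loop_counter starts at 0 and increases by 1 on each iteration until it exceeds 4105. As soon as it reaches 10, the write to buf[loop_counter] lands outside the buffer, and every later iteration up to index 4105 does the same. These out-of-bounds writes can overwrite other important data, break the program's normal behavior, or even lead to security problems.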
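To make the arithmetic concrete, here is a small sketch that counts how many iterations of a loop of this shape index past the buffer. The helper name count_oob_writes is invented for this post; it only simulates the index, it does not touch any buffer, and it assumes last is well below SIZE_MAX so that the counter cannot wrap.

```c
#include <stddef.h>

/* Counts how many iterations of
 *     for (i = 0; i <= last; i += step) buf[i] = ...;
 * would use an index outside a buffer of buf_len elements.
 * A step of 0 would never terminate, so it is rejected up front. */
size_t count_oob_writes(size_t buf_len, size_t last, size_t step)
{
    size_t oob = 0;

    if (step == 0) {
        return 0;
    }
    for (size_t i = 0; i <= last; i += step) {
        if (i >= buf_len) {
            oob++;
        }
    }
    return oob;
}
```

With the values from the sample (buffer of 10, upper bound 4105, step 4105 - (4105 - 1)), the loop runs 4106 times and only the first 10 writes are in bounds.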

002\026

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

typedef class cont_o cont;

class cont_o
{
private:
    string name;
public:
    cont_o(const string& n): name(n) { }
    string getName() { return name; }
    ~cont_o() {}
};

int main(int argc, const char *argv[])
{
    if (argc > 1) {
        cont container(argv[1]);
        ifstream in(container.getName().c_str());
        char temp[100];
        while (!in.getline(temp, 100).fail() && !in.eof()) {
            cout << temp << endl;
        }
        cout << temp << endl;
    }
    return 0;
}

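The code above contains a potential file path injection vulnerability. It shows up in two places:

The constructor of cont takes a string and stores it in the name member without checking or filtering it. If a malicious user passes a crafted string on the command line, it may contain special characters such as path separators (/ or \), so the program ends up opening a path chosen by the attacker.
In main, the container's name is converted to a C-style string and passed to the ifstream constructor, which opens the file. Nothing validates or filters the path along the way, so the attacker can inject an arbitrary file path.

A path injection flaw like this lets the caller read any file the process has access to, including sensitive ones.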
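One way to picture the missing check is a filter that accepts only plain file names. The sketch below is an assumption about what such a check might look like, not something taken from the dataset; is_plain_filename is a hypothetical helper, and a real program would more likely resolve the path and compare it against an allowed directory.

```c
#include <string.h>

/* Returns 1 if name is a plain file name with no directory component,
 * 0 otherwise. Rejects POSIX and Windows separators and drive prefixes. */
int is_plain_filename(const char *name)
{
    if (name == NULL || name[0] == '\0') {
        return 0;
    }
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        return 0;
    }
    if (strpbrk(name, "/\\:") != NULL) {
        return 0;
    }
    return 1;
}
```

In the sample, such a check would sit between reading argv[1] and constructing the ifstream.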

061\944

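PS: This group of test cases is the most pleasant to look at in terms of how it is built and organized, although at first it seems much more complicated than the earlier ones. Each file contains conditional macro definitions, one bad function, and two fixed good functions. This kind of data has to be preprocessed with a script of your own.

(The sample code uses the notion of a Sink, which I do not fully understand yet, but the vulnerable location is marked with comments in the code.)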
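As a rough idea of the preprocessing mentioned above, the sketch below pulls out the region guarded by #ifndef OMITBAD, tracking nested #if/#endif pairs so that an inner #ifdef _WIN32 block does not end the region early. The function name extract_bad_region is hypothetical, and the sketch assumes the file still has its original line breaks, with each directive at the start of its own line.

```c
#include <stddef.h>
#include <string.h>

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t') {
        p++;
    }
    return p;
}

/* Copies the lines between "#ifndef OMITBAD" and its matching "#endif"
 * from src into out (capacity out_size bytes, NUL-terminated).
 * Returns 0 on success, -1 if the region is missing, unterminated,
 * or does not fit in out. */
int extract_bad_region(const char *src, char *out, size_t out_size)
{
    size_t used = 0;
    int depth = 0;               /* 0 means outside the OMITBAD region */
    const char *line = src;

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) + 1 : strlen(line);
        const char *p = skip_blanks(line);

        if (depth == 0) {
            if (strncmp(p, "#ifndef OMITBAD", 15) == 0) {
                depth = 1;
            }
        } else {
            if (strncmp(p, "#if", 3) == 0) {
                depth++;         /* covers #if, #ifdef and #ifndef */
            } else if (strncmp(p, "#endif", 6) == 0) {
                depth--;
            }
            if (depth == 0) {
                return 0;        /* reached the matching #endif */
            }
            if (used + len + 1 > out_size) {
                return -1;
            }
            memcpy(out + used, line, len);
            used += len;
            out[used] = '\0';
        }
        line += len;
    }
    return -1;
}
```

The same loop with OMITGOOD in place of OMITBAD would collect the two good functions, giving one vulnerable and one patched sample per file.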

/* TEMPLATE GENERATED TESTCASE FILE
Filename: CWE114_Process_Control__w32_char_connect_socket_05.c
Label Definition File: CWE114_Process_Control__w32.label.xml
Template File: sources-sink-05.tmpl.c
*/
/*
 * @description
 * CWE: 114 Process Control
 * BadSource: connect_socket Read data using a connect socket (client side)
 * GoodSource: Hard code the full pathname to the library
 * Sink:
 *    BadSink : Load a dynamic link library
 * Flow Variant: 05 Control flow: if(staticTrue) and if(staticFalse)
 *
 * */

#include "std_testcase.h"

#include <wchar.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#include <direct.h>
#pragma comment(lib, "ws2_32") /* include ws2_32.lib when linking */
#define CLOSE_SOCKET closesocket
#else /* NOT _WIN32 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#define INVALID_SOCKET -1
#define SOCKET_ERROR -1
#define CLOSE_SOCKET close
#define SOCKET int
#endif

#define TCP_PORT 27015
#define IP_ADDRESS "127.0.0.1"

/* The two variables below are not defined as "const", but are never
 * assigned any other value, so a tool should be able to identify that
 * reads of these will always return their initialized values.
 */
static int staticTrue = 1; /* true */
static int staticFalse = 0; /* false */

#ifndef OMITBAD

void CWE114_Process_Control__w32_char_connect_socket_05_bad()
{
    char * data;
    char dataBuffer[100] = "";
    data = dataBuffer;
    if(staticTrue) {
        {
#ifdef _WIN32
            WSADATA wsaData;
            int wsaDataInit = 0;
#endif
            int recvResult;
            struct sockaddr_in service;
            char *replace;
            SOCKET connectSocket = INVALID_SOCKET;
            size_t dataLen = strlen(data);
            do {
#ifdef _WIN32
                if (WSAStartup(MAKEWORD(2,2), &wsaData) != NO_ERROR) {
                    break;
                }
                wsaDataInit = 1;
#endif
                /* POTENTIAL FLAW: Read data using a connect socket */
                connectSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
                if (connectSocket == INVALID_SOCKET) {
                    break;
                }
                memset(&service, 0, sizeof(service));
                service.sin_family = AF_INET;
                service.sin_addr.s_addr = inet_addr(IP_ADDRESS);
                service.sin_port = htons(TCP_PORT);
                if (connect(connectSocket, (struct sockaddr*)&service, sizeof(service)) == SOCKET_ERROR) {
                    break;
                }
                /* Abort on error or the connection was closed, make sure to recv one
                 * less char than is in the recv_buf in order to append a terminator */
                /* Abort on error or the connection was closed */
                recvResult = recv(connectSocket, (char *)(data + dataLen), sizeof(char) * (100 - dataLen - 1), 0);
                if (recvResult == SOCKET_ERROR || recvResult == 0) {
                    break;
                }
                /* Append null terminator */
                data[dataLen + recvResult / sizeof(char)] = '\0';
                /* Eliminate CRLF */
                replace = strchr(data, '\r');
                if (replace) {
                    *replace = '\0';
                }
                replace = strchr(data, '\n');
                if (replace) {
                    *replace = '\0';
                }
            } while (0);
            if (connectSocket != INVALID_SOCKET) {
                CLOSE_SOCKET(connectSocket);
            }
#ifdef _WIN32
            if (wsaDataInit) {
                WSACleanup();
            }
#endif
        }
    }
    {
        HMODULE hModule;
        /* POTENTIAL FLAW: If the path to the library is not specified, an attacker may be able to
         * replace his own file with the intended library */
        hModule = LoadLibraryA(data);
        if (hModule != NULL) {
            FreeLibrary(hModule);
            printLine("Library loaded and freed successfully");
        } else {
            printLine("Unable to load library");
        }
    }
}

#endif /* OMITBAD */

#ifndef OMITGOOD

/* goodG2B1() - use goodsource and badsink by changing the staticTrue to staticFalse */
static void goodG2B1()
{
    char * data;
    char dataBuffer[100] = "";
    data = dataBuffer;
    if(staticFalse) {
        /* INCIDENTAL: CWE 561 Dead Code, the code below will never run */
        printLine("Benign, fixed string");
    } else {
        /* FIX: Specify the full pathname for the library */
        strcpy(data, "C:\\Windows\\System32\\winsrv.dll");
    }
    {
        HMODULE hModule;
        /* POTENTIAL FLAW: If the path to the library is not specified, an attacker may be able to
         * replace his own file with the intended library */
        hModule = LoadLibraryA(data);
        if (hModule != NULL) {
            FreeLibrary(hModule);
            printLine("Library loaded and freed successfully");
        } else {
            printLine("Unable to load library");
        }
    }
}

/* goodG2B2() - use goodsource and badsink by reversing the blocks in the if statement */
static void goodG2B2()
{
    char * data;
    char dataBuffer[100] = "";
    data = dataBuffer;
    if(staticTrue) {
        /* FIX: Specify the full pathname for the library */
        strcpy(data, "C:\\Windows\\System32\\winsrv.dll");
    }
    {
        HMODULE hModule;
        /* POTENTIAL FLAW: If the path to the library is not specified, an attacker may be able to
         * replace his own file with the intended library */
        hModule = LoadLibraryA(data);
        if (hModule != NULL) {
            FreeLibrary(hModule);
            printLine("Library loaded and freed successfully");
        } else {
            printLine("Unable to load library");
        }
    }
}

void CWE114_Process_Control__w32_char_connect_socket_05_good()
{
    goodG2B1();
    goodG2B2();
}

#endif /* OMITGOOD */

/* Below is the main(). It is only used when building this testcase on
 * its own for testing or for building a binary to use in testing binary
 * analysis tools. It is not used when compiling all the testcases as one
 * application, which is how source code analysis tools are tested.
 */
#ifdef INCLUDEMAIN

int main(int argc, char * argv[])
{
    /* seed randomness */
    srand( (unsigned)time(NULL) );
#ifndef OMITGOOD
    printLine("Calling good()...");
    CWE114_Process_Control__w32_char_connect_socket_05_good();
    printLine("Finished good()");
#endif /* OMITGOOD */
#ifndef OMITBAD
    printLine("Calling bad()...");
    CWE114_Process_Control__w32_char_connect_socket_05_bad();
    printLine("Finished bad()");
#endif /* OMITBAD */
    return 0;
}

#endif

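The code does two things: first it reads data from a network service through a connected socket; then it uses that data as the path of a library to load. The risk is that the library path is not fixed by the program: whoever controls the data sent over the socket decides which library gets loaded.

Flagged statement: recvResult = recv(connectSocket, (char *)(data + dataLen), sizeof(char) * (100 - dataLen - 1), 0);

To explain the code first: this line uses recv() to receive data from the connected socket connectSocket and stores it in the buffer that data points to.

Let us go through the arguments one by one:

connectSocket: a valid socket descriptor representing the network connection.
(char *)(data + dataLen): the destination address for the received data. data is a character pointer to the receive buffer. Adding the offset dataLen to data places the new bytes in the unused part of the buffer.
sizeof(char) * (100 - dataLen - 1): the maximum number of bytes to receive. 100 - dataLen - 1 is the space still available in the buffer, leaving one byte for the terminator. sizeof(char) is the size of one character, which is 1 byte.
0: the flags for the receive operation. Passing 0 means no extra options.

This line is flagged because it is the source of untrusted data, not because it overflows. dataBuffer is initialized to the empty string, so strlen(data) returns 0 and dataLen is 0. recv is therefore asked for at most 99 bytes, and the terminator written at data[dataLen + recvResult] still lands inside the 100-byte array.

The real problem is what happens to the data afterwards: the bytes received from the network are passed unvalidated to LoadLibraryA(data), which is the sink named in the file header. In the template's terms, the source is where tainted data enters (recv) and the sink is where it is used dangerously (LoadLibraryA). The two good functions keep the same sink but replace the source with a hard-coded full path.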

062\002

Because the full file is too long, only the vulnerable function is shown here.

void CWE114_Process_Control__w32_char_console_15_bad()
{
    char * data;
    char dataBuffer[100] = "";
    data = dataBuffer;
    switch(6) {
    case 6:
    {
        /* Read input from the console */
        size_t dataLen = strlen(data);
        /* if there is room in data, read into it from the console */
        if (100-dataLen > 1) {
            /* POTENTIAL FLAW: Read data from the console */
            if (fgets(data+dataLen, (int)(100-dataLen), stdin) != NULL) {
                /* The next few lines remove the carriage return from the string that is
                 * inserted by fgets() */
                dataLen = strlen(data);
                if (dataLen > 0 && data[dataLen-1] == '\n') {
                    data[dataLen-1] = '\0';
                }
            } else {
                printLine("fgets() failed");
                /* Restore NUL terminator if fgets fails */
                data[dataLen] = '\0';
            }
        }
    }
    break;
    default:
        /* INCIDENTAL: CWE 561 Dead Code, the code below will never run */
        printLine("Benign, fixed string");
        break;
    }
    {
        HMODULE hModule;
        /* POTENTIAL FLAW: If the path to the library is not specified, an attacker may be able to
         * replace his own file with the intended library */
        hModule = LoadLibraryA(data);
        if (hModule != NULL) {
            FreeLibrary(hModule);
            printLine("Library loaded and freed successfully");
        } else {
            printLine("Unable to load library");
        }
    }
}

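The first flagged statement is the fgets call, which reads from the console into data+dataLen, that is, it continues reading at the end of whatever string is already in data. The call itself is bounded: it is given 100-dataLen as the size, so it cannot write past dataBuffer. It is flagged because it is the point where attacker-controlled input enters the program.

The second flagged statement is where the program calls LoadLibraryA with the contents of data as the library name. If the library loads, it is released with FreeLibrary and a success message is printed. Because the attacker controls the input, they can manipulate the value of data and may get a malicious library loaded and executed.

PS: A single piece of code can have several flagged locations, or a vulnerability that spans several lines, which is one of the difficulties in vulnerability detection.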

063\002

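This kind of code is the favorite example in papers. It is very simple and intuitive and gives the impression that you understand it at a glance, which is why the real complexity of the problem often goes unnoticed until you try to reproduce the work yourself.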

void CWE121_Stack_Based_Buffer_Overflow__CWE193_char_alloca_cpy_07_bad()
{
    char * data;
    char * dataBadBuffer = (char *)ALLOCA((10)*sizeof(char));
    char * dataGoodBuffer = (char *)ALLOCA((10+1)*sizeof(char));
    if(staticFive==5) {
        /* FLAW: Set a pointer to a buffer that does not leave room for a NULL terminator when performing
         * string copies in the sinks  */
        data = dataBadBuffer;
        data[0] = '\0'; /* null terminate */
    }
    {
        char source[10+1] = SRC_STRING;
        /* POTENTIAL FLAW: data may not have enough space to hold source */
        strcpy(data, source);
        printLine(data);
    }
}

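The vulnerability is that strcpy does not check the size of the destination buffer; it simply copies the source string, terminator included, into the destination. Here data points to dataBadBuffer, which holds 10 bytes, while source is an 11-byte array (room for 10 characters plus the terminator), so a full-length source string writes one byte past the end of the buffer.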
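The check that strcpy leaves out can be written in a few lines. The following is a minimal sketch, assuming the caller knows the real size of the destination; copy_if_fits is a name made up for this post, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if src and its NUL terminator fit in
 * dst_size bytes. Returns 0 on success, or -1 without touching dst. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t need = strlen(src) + 1;   /* +1 for the terminator */

    if (need > dst_size) {
        return -1;
    }
    memcpy(dst, src, need);
    return 0;
}
```

A 10-character string is rejected for a 10-byte buffer and accepted for an 11-byte one, which is exactly the difference between dataBadBuffer and dataGoodBuffer in the sample.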

064\004

void CWE121_Stack_Based_Buffer_Overflow__CWE805_char_alloca_ncpy_13_bad()
{
    char * data;
    char * dataBadBuffer = (char *)ALLOCA(50*sizeof(char));
    char * dataGoodBuffer = (char *)ALLOCA(100*sizeof(char));
    if(GLOBAL_CONST_FIVE==5) {
        /* FLAW: Set a pointer to a "small" buffer. This buffer will be used in the sinks as a destination
         * buffer in various memory copying functions using a "large" source buffer. */
        data = dataBadBuffer;
        data[0] = '\0'; /* null terminate */
    }
    {
        char source[100];
        memset(source, 'C', 100-1); /* fill with 'C's */
        source[100-1] = '\0'; /* null terminate */
        /* POTENTIAL FLAW: Possible buffer overflow if the size of data is less than the length of source */
        strncpy(data, source, 100-1);
        data[100-1] = '\0'; /* Ensure the destination buffer is null terminated */
        printLine(data);
    }
}

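strncpy copies the contents of source into the buffer that data points to. The length argument is 100-1 = 99, but data points to dataBadBuffer, which is only 50 bytes, so the copy writes 49 bytes past the end of the buffer. The following data[100-1] = '\0' is out of bounds as well. strncpy only limits the copy to the count it is given; it knows nothing about the real size of the destination, so a count derived from the source does not protect against overflow.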

065\000

void CWE121_Stack_Based_Buffer_Overflow__CWE805_struct_declare_memmove_09_bad()
{
    twoIntsStruct * data;
    twoIntsStruct dataBadBuffer[50];
    twoIntsStruct dataGoodBuffer[100];
    if(GLOBAL_CONST_TRUE) {
        /* FLAW: Set a pointer to a "small" buffer. This buffer will be used in the sinks as a destination
         * buffer in various memory copying functions using a "large" source buffer. */
        data = dataBadBuffer;
    }
    {
        twoIntsStruct source[100];
        {
            size_t i;
            /* Initialize array */
            for (i = 0; i < 100; i++) {
                source[i].intOne = 0;
                source[i].intOne = 0;
            }
        }
        /* POTENTIAL FLAW: Possible buffer overflow if data < 100 */
        memmove(data, source, 100*sizeof(twoIntsStruct));
        printStructLine(&data[0]);
    }
}

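memmove copies the contents of the source array into the buffer that data points to. memmove relies on the caller to ensure that the destination is large enough. Here 100*sizeof(twoIntsStruct) bytes are moved into dataBadBuffer, which holds only 50 elements, so the second half of the copy overflows the buffer.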

066\009

void CWE121_Stack_Based_Buffer_Overflow__CWE806_char_declare_ncpy_18_bad()
{
    char * data;
    char dataBuffer[100];
    data = dataBuffer;
    goto source;
source:
    /* FLAW: Initialize data as a large buffer that is larger than the small buffer used in the sink */
    memset(data, 'A', 100-1); /* fill with 'A's */
    data[100-1] = '\0'; /* null terminate */
    {
        char dest[50] = "";
        /* POTENTIAL FLAW: Possible buffer overflow if data is larger than dest */
        strncpy(dest, data, strlen(data));
        dest[50-1] = '\0'; /* Ensure the destination buffer is null terminated */
        printLine(data);
    }
}

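The code first uses memset to fill the buffer behind data with 'A' characters, 100-1 = 99 of them, followed by a terminator. It then copies data into dest with strncpy, using strlen(data) as the length. Since strlen(data) is 99 and dest has room for only 50 characters, the strncpy call overflows dest: the bound is taken from the source rather than from the destination.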
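For comparison, here is a sketch of a copy whose bound comes from the destination instead of the source. copy_truncated is a hypothetical helper, and truncating silently is only one possible policy; rejecting the input outright is another.

```c
#include <stddef.h>
#include <string.h>

/* Copies as much of src as fits into dst (capacity dst_size bytes) and
 * always NUL-terminates when dst_size > 0.
 * Returns the number of characters copied, not counting the terminator. */
size_t copy_truncated(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0) {
        return 0;
    }
    n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;            /* leave room for the terminator */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With the 99-character string from the sample and a 50-byte dest, only 49 characters are copied and the terminator lands on the last byte of dest.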

067\005

void bad()
{
    char * data;
    unionType myUnion;
    char * dataBadBuffer = (char *)ALLOCA(sizeof(OneIntClass));
    char * dataGoodBuffer = (char *)ALLOCA(sizeof(TwoIntsClass));
    /* POTENTIAL FLAW: Initialize data to a buffer smaller than the sizeof(TwoIntsClass) */
    data = dataBadBuffer;
    myUnion.unionFirst = data;
    {
        char * data = myUnion.unionSecond;
        {
            /* The Visual C++ compiler generates a warning if you initialize the class with ().
             * This will cause the compile to default-initialize the object.
             * See http://msdn.microsoft.com/en-us/library/wewb47ee%28v=VS.100%29.aspx
             */
            /* POTENTIAL FLAW: data may not be large enough to hold a TwoIntsClass */
            TwoIntsClass * classTwo = new(data) TwoIntsClass;
            /* Initialize and make use of the class */
            classTwo->intOne = 5;
            classTwo->intTwo = 10; /* POTENTIAL FLAW: If sizeof(data) < sizeof(TwoIntsClass) then this line will be a buffer overflow */
            printIntLine(classTwo->intOne);
            /* skip printing classTwo->intTwo since that could be a buffer overread */
        }
    }
}

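Potential flaw 1: data is initialized to point to dataBadBuffer, which is allocated with sizeof(OneIntClass) bytes and is therefore too small to hold sizeof(TwoIntsClass) bytes.

Potential flaw 2: the memory behind data may not be large enough for a TwoIntsClass object. Placement new constructs a TwoIntsClass in that buffer, and the assignment of 10 to classTwo->intTwo then writes past the end of it. (The comment in the sample writes the condition as sizeof(data) < sizeof(TwoIntsClass); what matters is the size of the buffer data points to, not the size of the pointer.)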

068\001

void bad()
{
    char * data;
    data = NULL;
    if(1) {
        /* FLAW: Did not leave space for a null terminator */
        data = new char[10];
    }
    {
        char source[10+1] = SRC_STRING;
        /* Copy length + 1 to include NUL terminator from source */
        /* POTENTIAL FLAW: data may not have enough space to hold source */
        strncpy(data, source, strlen(source) + 1);
        printLine(data);
        delete [] data;
    }
}

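strncpy copies the contents of the source array into the array that data points to, copying as many bytes as its third argument says. The length passed here is strlen(source) + 1, where strlen(source) is the length of the source string and the + 1 covers the terminating '\0'. Because data was allocated with new char[10], with no room for the terminator, copying a 10-character string plus its terminator overflows the heap buffer by one byte.

Although delete [] data at the end frees the dynamically allocated memory, the overflow has already written beyond the allocation by then, so the behavior is undefined.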

069\012

void bad()
{
    wchar_t * data;
    data = NULL;
    if(staticTrue) {
        /* FLAW: Allocate using new[] and point data to a small buffer that is smaller than the large buffer used in the sinks */
        data = new wchar_t[50];
        data[0] = L'\0'; /* null terminate */
    }
    {
        size_t i;
        wchar_t source[100];
        wmemset(source, L'C', 100-1); /* fill with L'C's */
        source[100-1] = L'\0'; /* null terminate */
        /* POTENTIAL FLAW: Possible buffer overflow if source is larger than data */
        for (i = 0; i < 100; i++) {
            data[i] = source[i];
        }
        data[100-1] = L'\0'; /* Ensure the destination buffer is null terminated */
        printWLine(data);
        delete [] data;
    }
}

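The code first uses new to allocate a dynamic array of 50 wchar_t and stores its address in data. It then fills the source array with 100-1 = 99 copies of L'C' using wmemset and terminates it. The for loop copies source into data element by element. Since data has only 50 elements and the loop copies 100, the copy overflows the heap buffer; the data[100-1] = L'\0' assignment after the loop is out of bounds as well.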

070\001

void bad()
{
    char * data;
    data = NULL;
    data = badSource(data);
    {
        char source[100];
        memset(source, 'C', 100-1); /* fill with 'C's */
        source[100-1] = '\0'; /* null terminate */
        /* POTENTIAL FLAW: Possible buffer overflow if source is larger than sizeof(data)-strlen(data) */
        strcat(data, source);
        printLine(data);
        delete [] data;
    }
}

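strcat appends the source string to the string that data points to. data is first set to NULL and then reassigned from badSource(data). badSource is not included in the excerpt, but judging from the POTENTIAL FLAW comment and the delete [] at the end, it presumably allocates a buffer with new[] that is smaller than source. strcat requires the destination to be a valid NUL-terminated string with enough room after it for the appended text. Appending the 99 characters of source plus a terminator to a buffer that is too small overflows the heap.

The root cause is that data does not have enough memory to hold the contents of source. The fix is to allocate a buffer large enough for the existing string, the source string, and the terminating null character before calling strcat.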

071\002

#define SRC_STRING L"AAAAAAAAAA"

static void badSource(wchar_t * &data)
{
    /* FLAW: Did not leave space for a null terminator */
    data = (wchar_t *)malloc(10*sizeof(wchar_t));
}

void bad()
{
    wchar_t * data;
    data = NULL;
    badSource(data);
    {
        wchar_t source[10+1] = SRC_STRING;
        /* POTENTIAL FLAW: data may not have enough space to hold source */
        wcscpy(data, source);
        printWLine(data);
        free(data);
    }
}

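The code first defines a function named badSource, which uses malloc to allocate space for 10 wchar_t and assigns it to data. No room is left for the string's null terminator (L'\0'), so when wcscpy copies the 10-character SRC_STRING together with its terminator into the memory data points to, it writes one wchar_t past the end of the allocation.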

072\001

void CWE122_Heap_Based_Buffer_Overflow__c_CWE805_wchar_t_memcpy_18_bad()
{
    wchar_t * data;
    data = NULL;
    goto source;
source:
    /* FLAW: Allocate and point data to a small buffer that is smaller than the large buffer used in the sinks */
    data = (wchar_t *)malloc(50*sizeof(wchar_t));
    data[0] = L'\0'; /* null terminate */
    {
        wchar_t source[100];
        wmemset(source, L'C', 100-1); /* fill with L'C's */
        source[100-1] = L'\0'; /* null terminate */
        /* POTENTIAL FLAW: Possible buffer overflow if source is larger than data */
        memcpy(data, source, 100*sizeof(wchar_t));
        data[100-1] = L'\0'; /* Ensure the destination buffer is null terminated */
        printWLine(data);
        free(data);
    }
}

The code first initializes data to NULL and then, at the source label, allocates space for 50 wchar_t with malloc. It then defines an array named source with 100 wchar_t, fills it with L'C' using wmemset, and adds a null terminator at the end. memcpy then copies 100*sizeof(wchar_t) bytes from source into the memory data points to. Since that memory holds only 50 wchar_t, the copy overflows the heap buffer. It can overwrite other memory, crash the program, or be exploited by an attacker to execute malicious code.

073\003

void CWE122_Heap_Based_Buffer_Overflow__c_src_char_cat_12_bad()
{
    char * data;
    data = (char *)malloc(100*sizeof(char));
    if(globalReturnsTrueOrFalse()) {
        /* FLAW: Initialize data as a large buffer that is larger than the small buffer used in the sink */
        memset(data, 'A', 100-1); /* fill with 'A's */
        data[100-1] = '\0'; /* null terminate */
    } else {
        /* FIX: Initialize data as a small buffer that as small or smaller than the small buffer used in the sink */
        memset(data, 'A', 50-1); /* fill with 'A's */
        data[50-1] = '\0'; /* null terminate */
    }
    {
        char dest[50] = "";
        /* POTENTIAL FLAW: Possible buffer overflow if data is larger than sizeof(dest)-strlen(dest)*/
        strcat(dest, data);
        printLine(data);
        free(data);
    }
}

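The code first uses malloc to allocate 100 chars for data. It then branches on the return value of globalReturnsTrueOrFalse: if it is true, data is filled with 99 'A' characters plus a null terminator; if it is false, data is filled with only 49 'A' characters plus a terminator, which is the fixed variant. Next it defines an array named dest of 50 chars and uses strcat to append the contents of data to it. When the true branch is taken, the string in data is longer than the space left in dest, so the append overflows dest and overwrites adjacent memory. Note that although the test case is filed under CWE122 (heap-based), the buffer that overflows here, dest, is a local array; the heap allocation is the source of the data.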

074\000

void CWE122_Heap_Based_Buffer_Overflow__c_src_char_cat_12_bad()
{
    char * data;
    data = (char *)malloc(100*sizeof(char));
    if(globalReturnsTrueOrFalse()) {
        /* FLAW: Initialize data as a large buffer that is larger than the small buffer used in the sink */
        memset(data, 'A', 100-1); /* fill with 'A's */
        data[100-1] = '\0'; /* null terminate */
    } else {
        /* FIX: Initialize data as a small buffer that as small or smaller than the small buffer used in the sink */
        memset(data, 'A', 50-1); /* fill with 'A's */
        data[50-1] = '\0'; /* null terminate */
    }
    {
        char dest[50] = "";
        /* POTENTIAL FLAW: Possible buffer overflow if data is larger than sizeof(dest)-strlen(dest)*/
        strcat(dest, data);
        printLine(data);
        free(data);
    }
}

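Note that the listing above is identical to the one for 073\003 and does not match the description that follows, which is for a CWE124 buffer underwrite sample. In that sample, an array named dataBuffer of 100 chars is defined, filled with 'A' characters using memset, and terminated with a null character. The data pointer is then set to 8 bytes before the start of dataBuffer. Inside a nested block, this pointer is assigned to another local variable also named data. In that scope an array named source, also 100 chars, is defined, filled with 'C' characters using memset, and terminated. A loop then copies the contents of source to the location data points to. Because data was set to a position before the allocated buffer, the copy writes before the start of the buffer (a buffer underwrite) and may overwrite other memory.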

075\004

#ifndef OMITBAD

#include "std_testcase.h"
#include "CWE124_Buffer_Underwrite__new_wchar_t_cpy_83.h"

namespace CWE124_Buffer_Underwrite__new_wchar_t_cpy_83
{
CWE124_Buffer_Underwrite__new_wchar_t_cpy_83_bad::CWE124_Buffer_Underwrite__new_wchar_t_cpy_83_bad(wchar_t * dataCopy)
{
    data = dataCopy;
    {
        wchar_t * dataBuffer = new wchar_t[100];
        wmemset(dataBuffer, L'A', 100-1);
        dataBuffer[100-1] = L'\0';
        /* FLAW: Set data pointer to before the allocated memory buffer */
        data = dataBuffer - 8;
    }
}

CWE124_Buffer_Underwrite__new_wchar_t_cpy_83_bad::~CWE124_Buffer_Underwrite__new_wchar_t_cpy_83_bad()
{
    {
        wchar_t source[100];
        wmemset(source, L'C', 100-1); /* fill with 'C's */
        source[100-1] = L'\0'; /* null terminate */
        /* POTENTIAL FLAW: Possibly copying data to memory before the destination buffer */
        wcscpy(data, source);
        printWLine(data);
        /* INCIDENTAL CWE-401: Memory Leak - data may not point to location
         * returned by new [] so can't safely call delete [] on it */
    }
}
}
#endif /* OMITBAD */

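In the constructor, a wchar_t array named dataBuffer with 100 elements is allocated with new, filled with L'A' characters using wmemset, and terminated with a null character. The data pointer is then set to 8 wchar_t elements before the start of dataBuffer. In the destructor, a wchar_t array named source, also with 100 elements, is defined, filled with L'C' characters, and terminated. wcscpy then copies source to the location data points to. Because data was set to a position before the allocated buffer, the copy writes before the start of the buffer (a buffer underwrite) and may overwrite other memory.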

076\000

#ifndef OMITBAD

static void badSink()
{
    char * data = CWE126_Buffer_Overread__char_alloca_loop_45_badData;
    {
        size_t i, destLen;
        char dest[100];
        memset(dest, 'C', 100-1);
        dest[100-1] = '\0'; /* null terminate */
        destLen = strlen(dest);
        /* POTENTIAL FLAW: using length of the dest where data
         * could be smaller than dest causing buffer overread */
        for (i = 0; i < destLen; i++) {
            dest[i] = data[i];
        }
        dest[100-1] = '\0';
        printLine(dest);
    }
}

void CWE126_Buffer_Overread__char_alloca_loop_45_bad()
{
    char * data;
    char * dataBadBuffer = (char *)ALLOCA(50*sizeof(char));
    char * dataGoodBuffer = (char *)ALLOCA(100*sizeof(char));
    memset(dataBadBuffer, 'A', 50-1); /* fill with 'A's */
    dataBadBuffer[50-1] = '\0'; /* null terminate */
    memset(dataGoodBuffer, 'A', 100-1); /* fill with 'A's */
    dataGoodBuffer[100-1] = '\0'; /* null terminate */
    /* FLAW: Set data pointer to a small buffer */
    data = dataBadBuffer;
    CWE126_Buffer_Overread__char_alloca_loop_45_badData = data;
    badSink();
}

#endif /* OMITBAD */

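Two character pointers, dataBadBuffer and dataGoodBuffer, are declared first, and the ALLOCA macro allocates two buffers of 50 and 100 bytes for them. Both buffers are filled with 'A' characters and terminated with '\0', forming two strings.
Next, dataBadBuffer is assigned to data. This sets up the buffer over-read, because data now points to a buffer of only 50 bytes.
In badSink(), a character array named dest of 100 bytes is declared. It is filled with 'C' characters and terminated with '\0', forming a string.
The length of the dest string is taken and stored in destLen.
A for loop then copies the bytes data points to into dest. The number of iterations is destLen, the length of the dest string.

This is where the problem lies. destLen is 99, which is larger than the 50 bytes data points to. The loop reads destLen bytes from data, so from index 50 on it reads past the end of the source buffer. This is an over-read of data rather than an overflow of dest.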
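A loop that cannot over-read has to be bounded by the smaller of the two sizes. The sketch below assumes the caller can supply the real size of the source buffer, which the sample's badSink cannot see; copy_bounded is a hypothetical name.

```c
#include <stddef.h>

/* Copies min(dest_len, src_len) bytes from src to dest, so it never
 * reads past src and never writes past dest.
 * Returns the number of bytes copied. */
size_t copy_bounded(char *dest, size_t dest_len,
                    const char *src, size_t src_len)
{
    size_t n = dest_len < src_len ? dest_len : src_len;

    for (size_t i = 0; i < n; i++) {
        dest[i] = src[i];
    }
    return n;
}
```

With a 50-byte source and a destination length of 99, the loop stops after 50 bytes and the rest of dest keeps its original 'C' fill.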

077\013

void CWE126_Buffer_Overread__wchar_t_declare_memmove_04_bad()
{
    wchar_t * data;
    wchar_t dataBadBuffer[50];
    wchar_t dataGoodBuffer[100];
    wmemset(dataBadBuffer, L'A', 50-1); /* fill with 'A's */
    dataBadBuffer[50-1] = L'\0'; /* null terminate */
    wmemset(dataGoodBuffer, L'A', 100-1); /* fill with 'A's */
    dataGoodBuffer[100-1] = L'\0'; /* null terminate */
    if(STATIC_CONST_TRUE) {
        /* FLAW: Set data pointer to a small buffer */
        data = dataBadBuffer;
    }
    {
        wchar_t dest[100];
        wmemset(dest, L'C', 100-1);
        dest[100-1] = L'\0'; /* null terminate */
        /* POTENTIAL FLAW: using memmove with the length of the dest where data
         * could be smaller than dest causing buffer overread */
        memmove(dest, data, wcslen(dest)*sizeof(wchar_t));
        dest[100-1] = L'\0';
        printWLine(dest);
    }
}

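Three variables are declared first: data is a wchar_t pointer, dataBadBuffer is a wchar_t array of size 50, and dataGoodBuffer is a wchar_t array of size 100. Both arrays are filled with L'A' characters and terminated with L'\0', forming two strings.
Through the if statement, dataBadBuffer is assigned to data. This sets up the buffer over-read, because data now points to a buffer of only 50 wchar_t elements.
A wchar_t array named dest of size 100 is declared. It is filled with L'C' characters and terminated with L'\0', forming a string.
wcslen(dest) gives the length of the dest string, which is multiplied by sizeof(wchar_t). The result is the number of bytes to move.
memmove copies the contents of the buffer data points to into dest, using that byte count as its length argument.

This is where the problem lies. The dest string is 99 characters long, which is more than the buffer behind data holds. That buffer has only 50 wchar_t elements, but memmove reads wcslen(dest)*sizeof(wchar_t) bytes from it, so it reads past the end of the source buffer. As in the previous case, this is an over-read rather than an overflow.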

078\003

#ifndef OMITBAD

static wchar_t * badSource(wchar_t * data)
{
    {
        wchar_t * dataBuffer = (wchar_t *)malloc(100*sizeof(wchar_t));
        wmemset(dataBuffer, L'A', 100-1);
        dataBuffer[100-1] = L'\0';
        /* FLAW: Set data pointer to before the allocated memory buffer */
        data = dataBuffer - 8;
    }
    return data;
}

void CWE127_Buffer_Underread__malloc_wchar_t_cpy_42_bad()
{
    wchar_t * data;
    data = NULL;
    data = badSource(data);
    {
        wchar_t dest[100*2];
        wmemset(dest, L'C', 100*2-1); /* fill with 'C's */
        dest[100*2-1] = L'\0'; /* null terminate */
        /* POTENTIAL FLAW: Possibly copy from a memory location located before the source buffer */
        wcscpy(dest, data);
        printWLine(dest);
        /* INCIDENTAL CWE-401: Memory Leak - data may not point to location
         * returned by malloc() so can't safely call free() on it */
    }
}

#endif /* OMITBAD */

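A static function named badSource is defined first. It takes a wchar_t pointer data as its parameter and returns a wchar_t pointer.
Inside badSource, malloc allocates a block of 100*sizeof(wchar_t) bytes and the returned pointer is assigned to dataBuffer.
dataBuffer is filled with L'A' characters and terminated with L'\0', forming a string.
This is where the function goes wrong: data is set to 8 wchar_t elements before the allocated block. This is a buffer under-read. When the caller later runs wcscpy(dest, data), the copy starts from memory located before the source buffer, so whatever happens to be stored there is copied into the destination, leading to information disclosure or undefined behavior.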

079\001

#ifndef OMITBAD

static void badSink(wchar_t * data)
{
    {
        wchar_t dest[100];
        wmemset(dest, L'C', 100-1); /* fill with 'C's */
        dest[100-1] = L'\0'; /* null terminate */
        /* POTENTIAL FLAW: Possibly copy from a memory location located before the source buffer */
        memcpy(dest, data, 100*sizeof(wchar_t));
        /* Ensure null termination */
        dest[100-1] = L'\0';
        printWLine(dest);
    }
}

void CWE127_Buffer_Underread__wchar_t_declare_memcpy_44_bad()
{
    wchar_t * data;
    /* define a function pointer */
    void (*funcPtr) (wchar_t *) = badSink;
    wchar_t dataBuffer[100];
    wmemset(dataBuffer, L'A', 100-1);
    dataBuffer[100-1] = L'\0';
    /* FLAW: Set data pointer to before the allocated memory buffer */
    data = dataBuffer - 8;
    /* use the function pointer */
    funcPtr(data);
}

#endif /* OMITBAD */

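This code contains a buffer under-read. It defines a function named badSink that takes a wchar_t pointer data as its parameter. badSink declares a wchar_t array dest of size 100, fills it with 99 consecutive L'C' characters, and puts a null character in the last position. The problem is in the code that follows, where memcpy copies the memory data points to into dest. Earlier, data was set to 8 elements before dataBuffer (data = dataBuffer - 8), so memcpy reads from memory located before the source buffer. The program thus reads memory that does not belong to the buffer, which may cause memory access errors, crashes, or leaks of sensitive information.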

080\017

void CWE134_Uncontrolled_Format_String__char_file_printf_08_bad()
{
    char * data;
    char dataBuffer[100] = "";
    data = dataBuffer;
    if(staticReturnsTrue()) {
        {
            /* Read input from a file */
            size_t dataLen = strlen(data);
            FILE * pFile;
            /* if there is room in data, attempt to read the input from a file */
            if (100-dataLen > 1) {
                pFile = fopen(FILENAME, "r");
                if (pFile != NULL) {
                    /* POTENTIAL FLAW: Read data from a file */
                    if (fgets(data+dataLen, (int)(100-dataLen), pFile) == NULL) {
                        printLine("fgets() failed");
                        /* Restore NUL terminator if fgets fails */
                        data[dataLen] = '\0';
                    }
                    fclose(pFile);
                }
            }
        }
    }
    if(staticReturnsTrue()) {
        /* POTENTIAL FLAW: Do not specify the format allowing a possible format string vulnerability */
        printf(data);
    }
}

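The function declares a char pointer data, initializes a char array dataBuffer of size 100, and points data at the start of dataBuffer. It then calls staticReturnsTrue to decide whether to run the next block. That block first computes the length of the string data points to and stores it in dataLen. If there is still room in data, it tries to read input from a file and append it to the end of data. This is the source of the flaw: the data read from the file is not checked for format specifiers. If the second call to staticReturnsTrue returns true, the next block runs and passes data directly to printf as the format string instead of supplying a fixed format. This is a format string vulnerability: by crafting the file contents, an attacker can read memory with specifiers such as %x or %s, and write to memory with %n.

PS: A format string vulnerability is a common security flaw that arises when a formatting function (printf, sprintf, fprintf, and so on) is given a user-supplied string as its format. Formatting functions output values according to the conversion specifiers in the format string. If the user-supplied string contains specifiers such as "%s" or "%d" and no matching arguments were passed, the function reads whatever happens to be where the arguments would be, and the program performs unintended operations.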
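The usual remedy is to pass untrusted text as an argument to a fixed "%s" format, so that any specifiers inside it are treated as plain characters. The sketch below uses snprintf into a buffer so the effect is easy to check; format_untrusted is a name invented for this post.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats untrusted text as data, never as a format string.
 * Returns what snprintf returns: the length the full output would have,
 * excluding the terminator. */
int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    return snprintf(out, out_size, "%s", untrusted);
}
```

In the sample, the equivalent change is replacing printf(data) with printf("%s", data).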

(To be continued...)
