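Prerequisites

Abstract Syntax Tree

Overview

An AST (Abstract Syntax Tree) is a tree representation of the abstract syntactic structure of a piece of source code. Different languages have different AST structures; for example, the AST used for C or C++ is not the same as the one used for Python.
For example, given this source code: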

#include <stdio.h>

int func(int a, int b)
{
    int i;
    int c = 0;
    for (i = a; i <= b; i++) {
        c += i;
    }
    return c;
}

int main()
{
    int res = func(1, 100);
    printf("res = %d\n", res);
    return 0;
}

Analyzed as a tree diagram:


[Tree diagram with the nodes: Program, FunctionDeclaration, CompoundStatement1, CompoundStatement2, VarDecl, BinaryOperation, DeclRefExpr, ReturnStmt, VarDecl, CallExpr, ReturnStmt, DeclStmt]

To dump the AST with clang while skipping the standard headers:

clang -Xclang -ast-dump -fsyntax-only -nostdinc test.c

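The result is shown in the figure below; the whole structure is a single tree.

Uses

The resulting tree carries a lot of information: function types, parameters and parameter types, variables, variable types, and so on. (Some tools expose only part of this information, so it is worth comparing the output of different analysis tools.)
This data can be used to analyze function structure, implement jump-to-function, analyze specific function vulnerabilities, and so on.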
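As a rough illustration of what a tree walk can yield, the sketch below uses Python's built-in `ast` module on a small Python snippet instead of libclang on C, so the node names differ from clang's. `SAMPLE` and `FunctionCollector` are hypothetical names introduced only for this example.

```python
import ast

# Hypothetical sample: a Python counterpart of the C program above.
SAMPLE = '''
def func(a, b):
    c = 0
    for i in range(a, b + 1):
        c += i
    return c

def main():
    res = func(1, 100)
    print("res =", res)
'''


class FunctionCollector(ast.NodeVisitor):
    """Collect function definitions and direct calls with their line numbers."""

    def __init__(self):
        self.definitions = []  # (name, line)
        self.calls = []        # (name, line)

    def visit_FunctionDef(self, node):
        self.definitions.append((node.name, node.lineno))
        self.generic_visit(node)  # keep walking into the function body

    def visit_Call(self, node):
        # Only plain-name calls such as func(...); attribute calls are skipped.
        if isinstance(node.func, ast.Name):
            self.calls.append((node.func.id, node.lineno))
        self.generic_visit(node)


collector = FunctionCollector()
collector.visit(ast.parse(SAMPLE))
print(collector.definitions)
print(collector.calls)
```

The same idea (visit every node, keep the ones whose kind you care about, record name and position) is what the libclang-based code later in this post does for C and C++.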

LLVM

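Overview

Here I will simply quote an answer from Zhihu:
LLVM is a compiler framework. As a framework, it is held up by a variety of functional modules; you can regard both clang and lld as components of LLVM. "Framework" means that you can build your own modules on top of what LLVM provides and integrate them into the LLVM system to extend it, or simply develop your own software tools and rely on LLVM for the underlying implementation. LLVM consists of a set of libraries and tools. Thanks to this design, it integrates easily with IDEs (an IDE can call the libraries directly for features such as static checking), and it is easy to build tools for all kinds of purposes (a new tool only has to call the libraries it needs).
A detailed introduction is available here.
Because we need to use its interfaces, LLVM and its third-party Python binding library must be installed beforehand.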

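Installing the full package

Download the Windows 64-bit build directly from this URL (I am using Windows 11),
and add this path to the environment variables: ???/???/bin/libclang.dll

Python bindings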

pip install clang

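Getting started

General usage

Lexical analysis

This can be used directly for basic lexical analysis, that is, splitting the source into individual tokens, but it does not produce line numbers or type information.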

from clang.cindex import Index, Config, CursorKind, TypeKind

# Path to libclang.dll; LLVM has to be installed on your machine first.
libclangPath = r"???\???\LLVM\bin\libclang.dll"
Config.set_library_file(libclangPath)

file_path_ = r"your_file_path"
index = Index.create()
tu = index.parse(file_path_)
AST_root_node = tu.cursor  # root cursor of the translation unit

# Lexical analysis: call get_tokens() on the root node.
for token in AST_root_node.get_tokens():
    print(token.spelling)  # spelling is the token's text

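This is only the most basic analysis, with no filtering or distinguishing of more complex attributes, so it is simple and meant for explanation. For real tokenization you can use a dedicated tool such as ctags to analyze variables and functions: it reports not only the types of functions and variables but also their exact locations in the source code, and whether they are global or local.

Generating JSON

This example collects more of the attributes on each node and merges them into a single JSON file. If a node has no spelling, it is probably an operator or a keyword.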

import json
import time
from clang.cindex import Index, Config, CursorKind


class AST_Tree_json:
    def __init__(self, absolute_path):
        self.absolute_path = absolute_path
        self.clang_path = r'??\???\LLVM\bin\libclang.dll'
        Config.set_library_file(self.clang_path)
        self.AST_Root = Index.create().parse(absolute_path).cursor

    def serialize_node(self, cursor):
        # One dict per node: kind, extent and the recursively serialized children.
        node_dict = {
            "kind": str(cursor.kind),
            "location": [cursor.extent.start.line, cursor.extent.start.column,
                         cursor.extent.end.line, cursor.extent.end.column],
            "children": []
        }
        if cursor.spelling:
            node_dict["spelling"] = cursor.spelling
            print('keywords: ', cursor.spelling)
            print('location: ', cursor.extent.start.line, cursor.extent.start.column,
                  cursor.extent.end.line, cursor.extent.end.column)
        for child in cursor.get_children():
            child_dict = self.serialize_node(child)
            node_dict["children"].append(child_dict)
        return node_dict

    def start(self):
        string_res = self.serialize_node(self.AST_Root)
        serialized_json = json.dumps(string_res, indent=4, ensure_ascii=False)
        local_time = time.localtime()
        date_time = time.strftime("%Y_%m_%d_%H_%M_%S", local_time)
        # The with block closes the file, so no explicit close() is needed.
        with open('./res_{}.json'.format(date_time), 'w', encoding='utf-8') as file:
            file.write(serialized_json)
        # It does recognize [] {} ; + - = but cannot get their spelling, and it cannot get values either.
        # print(serialized_json)


if __name__ == '__main__':
    path = r'your_file_path'
    ast_obj = AST_Tree_json(path)
    ast_obj.start()

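Although this generates a JSON file, it is still limited: special characters such as punctuation and operators are not extracted. Even so, the JSON is fairly detailed: it contains every scanned node that has content, along with its exact location.
(start_line, start_column, end_line, end_column) is the position where the node appears: (start line, start column, end line, end column). To get the actual text at that position, you would need to read the source code and slice it at those coordinates.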
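A minimal sketch of that slicing step is shown below. `extract_span` is a hypothetical helper, not part of libclang, and it assumes that lines and columns are 1-based and that the end column points one past the last character; check that assumption against your own dumps before relying on it.

```python
def extract_span(source, start_line, start_column, end_line, end_column):
    """Return the text covered by a 1-based (line, column) span.

    Assumption: the end column is exclusive (one past the last character).
    """
    lines = source.splitlines(keepends=True)
    if start_line == end_line:
        # Single-line span: slice both ends on the same line.
        return lines[start_line - 1][start_column - 1:end_column - 1]
    parts = [lines[start_line - 1][start_column - 1:]]   # tail of the first line
    parts.extend(lines[start_line:end_line - 1])         # full middle lines
    parts.append(lines[end_line - 1][:end_column - 1])   # head of the last line
    return "".join(parts)


code = "int main()\n{\n    return 0;\n}\n"
print(repr(extract_span(code, 1, 5, 1, 9)))   # the identifier on line 1
print(repr(extract_span(code, 2, 1, 4, 2)))   # the function body
```

Handling the single-line case separately matters: without it, a span that starts and ends on the same line would run to the end of that line.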

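Custom usage

Function analysis covers:

  • The kind of function statement (declaration, definition, call)
  • The exact location of the function
  • The text of the function's declaration, definition, and calls
  • The contents and types of the function's parameters and return value
  • The absolute path of the file containing the function

Designing the function information classes

I wrote several classes to hold the filtered information.

  • FunctionDeclaration: function declaration information class
  • FunctionDefinition: function definition information class
  • FunctionCallExpress: function call information class
  • FunctionDump: function data wrapper class
  • DefinitionCallExpressCombiner: class that stitches function definitions and calls together
  • SourceInfo: function data class
  • FunctionPreprocessor: preprocessor class

Source code: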

1.FunctionDeclaration

class FunctionDeclaration:
    def __init__(self, function_name=None, declared_location=None, declared_contents=None, return_types=None,
                 parameter_types=None):
        self.function_name = function_name
        self.declared_location = declared_location
        self.declared_contents = declared_contents
        self.return_types = return_types
        self.parameter_types = parameter_types
        self.kind = 'FUNCTION_DECLARATION'

    def __repr__(self):
        return f"Function name: {self.function_name}\nStatement kind: {self.kind}\nDeclaration location: {self.declared_location}\n" \
               f"Parameter types: {self.parameter_types}\nReturn type: {self.return_types}\n"

2.FunctionDefinition

class FunctionDefinition:
    def __init__(self, function_name=None, definition_location=None, definition_contents=None):
        self.function_name = function_name
        self.definition_location = definition_location
        self.definition_contents = definition_contents
        self.kind = 'FUNCTION_DEFINITION'

    def __repr__(self):
        return f"Function name: {self.function_name}\nStatement kind: {self.kind}\n" \
               f"Definition location: {self.definition_location}\nDefinition contents: {self.definition_contents}\n"

3.FunctionCallExpress

class FunctionCallExpress:
    def __init__(self, function_name=None, call_express_location=None, call_express_contents=None):
        self.function_name = function_name
        self.call_express_location = call_express_location
        self.call_express_contents = call_express_contents
        self.kind = 'FUNCTION_CALLEXPRESS'

    def __repr__(self):
        return f"Function name: {self.function_name}\nStatement kind: {self.kind}\n" \
               f"Call location: {self.call_express_location}\nCall contents: {self.call_express_contents}\n"

4.FunctionDump

from clang.cindex import Index, CursorKind, TypeKind


class FunctionDump:
    def __init__(self, source_path):
        self.index = Index.create()
        self.translation_unit = self.index.parse(source_path)
        self.root_cursor = self.translation_unit.cursor
        self.function_declaration_list = []
        self.function_definition_list = []
        self.function_callexpress_list = []
        self.source_path = source_path

    # Entry point
    def analyseLauncher(self):
        self.analyseRunner(self.root_cursor)

    # Recursive worker
    def analyseRunner(self, cursor):
        if cursor.kind == CursorKind.FUNCTION_DECL or cursor.kind == CursorKind.CXX_METHOD:
            if not cursor.is_definition():
                name = cursor.spelling
                location = (cursor.extent.start.line, cursor.extent.start.column, cursor.extent.end.line,
                            cursor.extent.end.column)
                parameter_types = self.get_parameter_types(cursor)
                return_type = self.get_return_type(cursor)
                function_declaration = FunctionDeclaration(function_name=name, declared_location=location,
                                                           declared_contents=self.get_node_contents(cursor),
                                                           return_types=return_type,
                                                           parameter_types=parameter_types)
                self.function_declaration_list.append(function_declaration)
            definition_cursor = cursor.get_definition()
            if definition_cursor:
                definition_location = (definition_cursor.extent.start.line, definition_cursor.extent.start.column,
                                       definition_cursor.extent.end.line, definition_cursor.extent.end.column)
                definition_contents = self.get_node_contents(definition_cursor)
                function_definition = FunctionDefinition(function_name=definition_cursor.spelling,
                                                         definition_location=definition_location,
                                                         definition_contents=definition_contents)
                self.function_definition_list.append(function_definition)
            # Scan the whole tree for calls to this function.
            self.check_function_calls(self.root_cursor, cursor.spelling)
        for child in cursor.get_children():
            self.analyseRunner(child)

    def check_function_calls(self, cursor, function_name):
        if cursor.kind == CursorKind.CALL_EXPR and cursor.spelling == function_name:
            call_location = (
                cursor.extent.start.line,
                cursor.extent.start.column,
                cursor.extent.end.line,
                cursor.extent.end.column,
            )
            call_contents = self.get_node_contents(cursor)  # text of the call expression
            function_callexpress = FunctionCallExpress(function_name=function_name,
                                                       call_express_location=call_location,
                                                       call_express_contents=call_contents)
            self.function_callexpress_list.append(function_callexpress)
        for child in cursor.get_children():
            self.check_function_calls(child, function_name)

    # Collect parameter types
    def get_parameter_types(self, cursor):
        parameter_types = []
        for arg in cursor.get_arguments():
            arg_type = arg.type.spelling
            parameter_types.append(arg_type)
        if not parameter_types:
            return ["void"]  # "void" marks a function without parameters
        return parameter_types

    # Collect the return type
    def get_return_type(self, cursor):
        result_type = cursor.type
        if cursor.spelling == "main":
            return "int"
        elif result_type.kind == TypeKind.FUNCTIONPROTO:  # function type with a prototype
            return_type = result_type.get_result().spelling
            return return_type
        elif result_type.kind == TypeKind.FUNCTIONNOPROTO:  # function type without a prototype, e.g. `int f()` in C
            return_type = result_type.get_result().spelling
            return return_type
        return None

    # Return the source text of a node
    def get_node_contents(self, cursor):
        with open(self.source_path, 'r', encoding='utf-8') as file:
            contents = file.readlines()
        start_line = cursor.extent.start.line - 1
        start_column = cursor.extent.start.column - 1
        end_line = cursor.extent.end.line - 1
        end_column = cursor.extent.end.column - 1
        cursor_contents = ""
        # Note: a single-line node takes the first branch, so its text runs to the end of that line.
        for line in range(start_line, end_line + 1):
            if line == start_line:
                cursor_contents += contents[line][start_column:]
            elif line == end_line:
                cursor_contents += contents[line][:end_column + 1]
            else:
                cursor_contents += contents[line]
        return cursor_contents

    # Print everything that was collected
    def show_function_details(self):
        print('~~Function declarations~~')
        for item in self.function_declaration_list:
            print(item)
        print('~~Function definitions~~')
        for item in self.function_definition_list:
            print(item)
        print('~~Function calls~~')
        for item in self.function_callexpress_list:
            print(item)

5.DefinitionCallExpressCombiner (combiner class)

import os
import re
from os.path import split


# Combiner
class DefinitionCallExpressCombiner:
    def __init__(self, file_path):
        self.file_path = file_path
        self.main_sign = None
        self.definition_contents = []
        self.mix_contents = []
        self.main_length = 0
        self.offset_length = 0

    def find_all_files(self, filepath):
        directory, _ = os.path.split(filepath)
        file_list = []
        for root, _, files in os.walk(directory):
            for file in files:
                if file.endswith('.c') or file.endswith('.cpp'):
                    file_list.append(os.path.abspath(os.path.join(root, file)))
        return file_list

    def find_all_headers(self, filepath):
        directory, _ = os.path.split(filepath)
        file_list = []
        for root, _, files in os.walk(directory):
            for file in files:
                if file.endswith('.h') or file.endswith('.hh'):
                    path = os.path.abspath(os.path.join(root, file))
                    if self.is_defined(path):
                        file_list.append(path)
        return file_list

    def is_defined(self, file_path):
        # Heuristic: a header containing braces is treated as containing definitions.
        with open(file_path, "r") as file:
            content = file.read()
        return "{" in content or "}" in content

    def has_main_function(self, file_path):
        with open(file_path, "r") as file:
            content = file.read()
        return "int main(" in content

    def getDefinitionCodes(self):
        source_files = self.find_all_files(self.file_path)
        for file_path in source_files:
            with open(file_path, "r") as file:
                content = file.readlines()
            if self.has_main_function(file_path):
                if self.main_sign is None:
                    self.main_sign = file_path
                else:
                    pass
            else:
                if content:
                    # Make sure the last line ends with a newline before concatenating.
                    last_line = content[-1]
                    pattern = r'.*\n'
                    if re.findall(pattern, last_line):
                        pass
                    else:
                        content[-1] = last_line + '\n'
                self.definition_contents += content

    def getDefinitionCodes_(self):
        source_files = self.find_all_files(self.file_path)
        header_files = self.find_all_headers(self.file_path)
        for file_path in header_files:
            with open(file_path, "r") as file:
                content = file.readlines()
            if content:
                last_line = content[-1]
                pattern = r'.*\n'
                if re.findall(pattern, last_line):
                    pass
                else:
                    content[-1] = last_line + '\n'
            self.definition_contents += content
        for file_path in source_files:
            with open(file_path, "r") as file:
                content = file.readlines()
            if self.has_main_function(file_path):
                if self.main_sign is None:
                    self.main_sign = file_path
                else:
                    pass
            else:
                if content:
                    last_line = content[-1]
                    pattern = r'.*\n'
                    if re.findall(pattern, last_line):
                        pass
                    else:
                        content[-1] = last_line + '\n'
                self.definition_contents += content

    def Combiner_(self):
        self.getDefinitionCodes_()
        path, name = split(self.main_sign)
        name = '._' + name
        temp_path = os.path.join(path, name)
        with open(self.main_sign, "r", encoding='utf-8') as main_file:
            main_file_content = main_file.readlines()
        self.main_length = len(main_file_content)
        last_line = self.definition_contents[-1]
        pattern = r'.*\n'
        if re.findall(pattern, last_line):
            pass
        else:
            self.definition_contents[-1] = last_line + '\n'
        if main_file_content:
            self.mix_contents = self.definition_contents + main_file_content
        # Comment out every #include so the temporary file is parsed on its own.
        new_data = ["//" + line if line.startswith("#include") else line for line in self.mix_contents]
        with open(temp_path, 'w', encoding='utf-8') as temp_obj:
            temp_obj.writelines(new_data)
        # Number of lines placed in front of the main file.
        self.offset_length = len(new_data) - self.main_length
        return temp_path

    def Combiner(self):
        self.getDefinitionCodes()
        path, name = split(self.main_sign)
        name = '.' + name
        temp_path = os.path.join(path, name)
        with open(self.main_sign, "r", encoding='utf-8') as main_file:
            main_file_content = main_file.readlines()
        self.main_length = len(main_file_content)
        last_line = self.definition_contents[-1]
        pattern = r'.*\n'
        if re.findall(pattern, last_line):
            pass
        else:
            self.definition_contents[-1] = last_line + '\n'
        if main_file_content:
            self.mix_contents = self.definition_contents + main_file_content
        new_data = ["//" + line if line.startswith("#include") else line for line in self.mix_contents]
        with open(temp_path, 'w', encoding='utf-8') as temp_obj:
            temp_obj.writelines(new_data)
        self.offset_length = len(new_data) - self.main_length
        return temp_path

6.SourceInfo (function data class)

# Data class
class SourceInfo:
    def __init__(self, filepath, source_obj=None, headers_obj_list=None):
        self.filepath = filepath
        self.source_obj = source_obj
        self.headers_obj_list = headers_obj_list

7.FunctionPreprocessor (preprocessor class)

import os
import re
from os.path import split


class FunctionPreprocessor:
    def __init__(self, file_path, keyword=None):
        self.file_path = file_path
        self.target_function_name = keyword
        self.headers_list = None
        self.exclude_headers_list = None
        self.main_flag = None
        self.header_defined = False

    # Create a temporary copy of the file (named '.' + original name) with its #include lines commented out
    def virtualTempFile(self, filename):
        with open(filename, 'r', encoding='utf-8') as file:
            contents = file.readlines()
        temp_contents = []
        # Comment out the header includes
        for item in contents:
            if item.startswith('#include'):
                item = '//' + item  # prefix the include line with a comment marker
            temp_contents.append(item)
        path, name = split(filename)
        name = '.' + name
        new_filename = os.path.join(path, name)
        with open(new_filename, 'w', encoding='utf-8') as file:
            file.writelines(temp_contents)
        return new_filename

    # List the headers that a source file includes with quotes
    def find_dependencies(self, filename):
        with open(filename, 'r', encoding='utf-8') as file:
            contents = file.readlines()
        headers = []
        pattern = r'#include\s*["]\s*(\w+\.h)\s*["]'
        for item in contents:
            match = re.search(pattern, item)
            if match:
                dependency = match.group(1)
                headers.append(dependency)
        return headers

    def find_all_headers(self, filepath):
        directory, _ = os.path.split(filepath)
        for root, _, files in os.walk(directory):
            for file in files:
                if file.endswith('.h') or file.endswith('.hh'):
                    path = os.path.abspath(os.path.join(root, file))
                    if self.is_defined(path):
                        self.header_defined = True

    def is_defined(self, file_path):
        with open(file_path, "r") as file:
            content = file.read()
        return "{" in content or "}" in content

    # Walk all source files of the same kind
    def find_all_files(self, filepath):
        directory, _ = os.path.split(filepath)
        file_list = []
        for root, _, files in os.walk(directory):
            for file in files:
                if file.endswith('.c') or file.endswith('.cpp'):
                    absolute_path = os.path.abspath(os.path.join(root, file))
                    file_list.append(absolute_path)
                    if self.has_main_function(absolute_path):
                        self.main_flag = absolute_path
        return file_list

    def has_main_function(self, file_path):
        with open(file_path, "r") as file:
            content = file.read()
        return "int main(" in content

    def multiCallExpressCombiner(self, filepath):
        combiner = DefinitionCallExpressCombiner(filepath)
        temp_filepath = combiner.Combiner()
        call_analyzer = FunctionDump(temp_filepath)
        call_analyzer.analyseLauncher()
        os.remove(temp_filepath)
        offset = combiner.offset_length
        function_declaration_list = []
        function_definition_list = []
        function_call_express_list = []
        # Keep only items located in the main file and shift their lines back by the offset.
        for item in call_analyzer.function_declaration_list:
            if item.declared_location[0] > offset:
                start_line, start_index, end_line, end_index = item.declared_location
                item.declared_location = (start_line - offset, start_index, end_line - offset, end_index)
                function_declaration_list.append(item)
            else:
                continue
        for item in call_analyzer.function_definition_list:
            if item.definition_location[0] > offset:
                start_line, start_index, end_line, end_index = item.definition_location
                item.definition_location = (start_line - offset, start_index, end_line - offset, end_index)
                function_definition_list.append(item)
            else:
                continue
        for item in call_analyzer.function_callexpress_list:
            if item.call_express_location[0] > offset:
                start_line, start_index, end_line, end_index = item.call_express_location
                item.call_express_location = (start_line - offset, start_index, end_line - offset, end_index)
                function_call_express_list.append(item)
            else:
                continue
        # Overwrite the original lists
        call_analyzer.function_declaration_list = function_declaration_list
        call_analyzer.function_definition_list = function_definition_list
        call_analyzer.function_callexpress_list = function_call_express_list
        return call_analyzer

    def _multiCallExpressCombiner(self, filepath):
        combiner = DefinitionCallExpressCombiner(filepath)
        temp_filepath = combiner.Combiner_()
        call_analyzer = FunctionDump(temp_filepath)
        call_analyzer.analyseLauncher()
        os.remove(temp_filepath)
        offset = combiner.offset_length
        function_declaration_list = []
        function_definition_list = []
        function_call_express_list = []
        for item in call_analyzer.function_declaration_list:
            if item.declared_location[0] > offset:
                start_line, start_index, end_line, end_index = item.declared_location
                item.declared_location = (start_line - offset, start_index, end_line - offset, end_index)
                function_declaration_list.append(item)
            else:
                continue
        for item in call_analyzer.function_definition_list:
            if item.definition_location[0] > offset:
                start_line, start_index, end_line, end_index = item.definition_location
                item.definition_location = (start_line - offset, start_index, end_line - offset, end_index)
                function_definition_list.append(item)
            else:
                continue
        for item in call_analyzer.function_callexpress_list:
            if item.call_express_location[0] > offset:
                start_line, start_index, end_line, end_index = item.call_express_location
                item.call_express_location = (start_line - offset, start_index, end_line - offset, end_index)
                function_call_express_list.append(item)
            else:
                continue
        # Overwrite the original lists
        call_analyzer.function_declaration_list = function_declaration_list
        call_analyzer.function_definition_list = function_definition_list
        call_analyzer.function_callexpress_list = function_call_express_list
        return call_analyzer

    def source_runner(self, init_filename):
        filelist = self.find_all_files(init_filename)
        self.find_all_headers(init_filename)
        source_info_list = []
        if len(filelist) < 2 and not self.header_defined:
            for file in filelist:
                headers_objs = []
                # Source file
                source_path = self.virtualTempFile(file)
                headers_path = self.find_dependencies(source_path)
                path, name = split(source_path)
                for header in headers_path:
                    header_path = path + '/' + header
                    source_path_ = self.virtualTempFile(header_path)
                    headers_analyzer = FunctionDump(source_path_)
                    headers_analyzer.analyseLauncher()
                    # headers_analyzer.show_function_details()
                    headers_objs.append((file, header_path, headers_analyzer))
                    os.remove(source_path_)
                analyzer = FunctionDump(source_path)
                analyzer.analyseLauncher()
                os.remove(source_path)
                # analyzer.show_function_details()
                per_source_info = SourceInfo(filepath=file, source_obj=analyzer, headers_obj_list=headers_objs)
                source_info_list.append(per_source_info)
        elif len(filelist) >= 2 and not self.header_defined:
            for file in filelist:
                headers_objs = []
                if file != self.main_flag:  # is this the file containing main?
                    # Ordinary source file
                    source_path = self.virtualTempFile(file)
                    headers_path = self.find_dependencies(source_path)
                    path, name = split(source_path)
                    for header in headers_path:
                        header_path = path + '/' + header
                        source_path_ = self.virtualTempFile(header_path)
                        headers_analyzer = FunctionDump(source_path_)
                        headers_analyzer.analyseLauncher()
                        # headers_analyzer.show_function_details()
                        headers_objs.append((file, header_path, headers_analyzer))
                        os.remove(source_path_)
                    analyzer = FunctionDump(source_path)
                    analyzer.analyseLauncher()
                    os.remove(source_path)
                else:
                    # The main source file: stitch the other sources in front of it
                    analyzer = self.multiCallExpressCombiner(file)
                per_source_info = SourceInfo(filepath=file, source_obj=analyzer, headers_obj_list=headers_objs)
                source_info_list.append(per_source_info)
        elif self.header_defined:
            for file in filelist:
                headers_objs = []
                if file != self.main_flag:  # is this the file containing main?
                    # Ordinary source file
                    source_path = self.virtualTempFile(file)
                    headers_path = self.find_dependencies(source_path)
                    path, name = split(source_path)
                    for header in headers_path:
                        header_path = path + '/' + header
                        source_path_ = self.virtualTempFile(header_path)
                        headers_analyzer = FunctionDump(source_path_)
                        headers_analyzer.analyseLauncher()
                        headers_objs.append((file, header_path, headers_analyzer))
                        os.remove(source_path_)
                    analyzer = FunctionDump(source_path)
                    analyzer.analyseLauncher()
                    os.remove(source_path)
                else:
                    headers_path = self.find_dependencies(file)
                    path, name = split(file)
                    for header in headers_path:
                        header_path = path + '/' + header
                        source_path_ = self.virtualTempFile(header_path)
                        headers_analyzer = FunctionDump(source_path_)
                        headers_analyzer.analyseLauncher()
                        headers_objs.append((file, header_path, headers_analyzer))
                        os.remove(source_path_)
                    # The main source file: stitch headers and other sources in front of it
                    analyzer = self._multiCallExpressCombiner(file)
                per_source_info = SourceInfo(filepath=file, source_obj=analyzer, headers_obj_list=headers_objs)
                source_info_list.append(per_source_info)
        return source_info_list
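The line-offset bookkeeping used when the other sources are stitched in front of the main file can be tried in isolation. The sketch below is a simplified stand-in, not the classes above: `combine_sources` and `remap_locations` are hypothetical helpers that work on plain lists of lines and location tuples, with no libclang or temporary files involved.

```python
def combine_sources(definition_lines, main_lines):
    """Prepend definition lines to the main file and report the line offset."""
    combined = list(definition_lines) + list(main_lines)
    offset = len(combined) - len(main_lines)  # lines placed in front of main
    return combined, offset


def remap_locations(locations, offset):
    """Keep spans that start inside the main file and shift them back."""
    remapped = []
    for start_line, start_col, end_line, end_col in locations:
        if start_line > offset:  # spans in the prepended part are dropped
            remapped.append((start_line - offset, start_col, end_line - offset, end_col))
    return remapped


definitions = ["int add(int a, int b)\n", "{\n", "    return a + b;\n", "}\n"]
main_file = ["int main()\n", "{\n", "    return add(1, 2);\n", "}\n"]
combined, offset = combine_sources(definitions, main_file)

# Spans as they would be reported against the combined file (1-based lines).
spans = [(1, 1, 4, 2), (5, 1, 8, 2), (7, 12, 7, 21)]
print(offset)
print(remap_locations(spans, offset))
```

Only the line numbers move; columns are untouched because whole lines are prepended.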

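Implementing function jumps

  1. Filter the selected text (selected_text)
  2. Connect the jump menu UI to its handler functions
  3. Write the logic for the three jump functions
    gotoDeclaration: handles the right-click "go to declaration" action
    gotoDefinition: handles the right-click "go to definition" action
    gotoCallExpress: handles the right-click "go to call" action
  4. Obtain the source data
    getFuncAnalyzer: interface for fetching the latest function analysis data; the data is refreshed after the editor contents change or a new file is created.

Source code

1.Filtering the selected string: getSelectdFunctionName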

def getSelectdFunctionName(self, input_string):
    import re
    # Prefer an identifier that is directly followed by "(".
    pattern = r'\b(\w+)\s*\('
    match = re.search(pattern, input_string)
    if match:
        return match.group(1)
    words = re.findall(r'\b\w+\b', input_string)  # extract the words in the string
    for word in words:
        if re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', word):  # does the word follow identifier naming rules?
            return word  # return the first matching word as the function name
    return None
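To see how this filter behaves on different selections, here is a standalone copy that can be run outside the editor class. The function name `get_selected_function_name` and the sample strings are chosen only for this illustration.

```python
import re


def get_selected_function_name(input_string):
    """Standalone copy of the method above, for experimenting outside the editor."""
    match = re.search(r'\b(\w+)\s*\(', input_string)
    if match:
        return match.group(1)
    for word in re.findall(r'\b\w+\b', input_string):
        if re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', word):
            return word
    return None


samples = ["int res = func(1,100);", "printf", "  func  ", "123 + 4", "if (x) foo();"]
for text in samples:
    print(repr(text), "->", get_selected_function_name(text))
```

The last sample shows a limitation: a keyword followed by a parenthesis, such as `if (`, is picked up before the real call, so selecting just the function name gives the most reliable result.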

2.Right-click menu UI logic

def show_context_menu(self, point):
    self.context_menu = self.__editor.createStandardContextMenu()
    # Start from the default options
    self.context_menu.insertSeparator(self.context_menu.actions()[0])
    ui_icon = self.config_ini['main_project']['project_name'] + self.config_ini['ui_img']['ui_turn_to']
    action_goto_declaration = QAction("Go to Declaration", self)
    action_goto_declaration.setIcon(QIcon(ui_icon))
    action_goto_declaration.triggered.connect(self.gotoDeclaration)
    action_goto_definition = QAction("Go to Definition", self)
    action_goto_definition.setIcon(QIcon(ui_icon))
    action_goto_definition.triggered.connect(self.gotoDefinition)
    action_goto_call_express = QAction("Go to Call", self)
    action_goto_call_express.setIcon(QIcon(ui_icon))
    action_goto_call_express.triggered.connect(self.gotoCallExpress)
    # Separator
    self.context_menu.insertSeparator(self.context_menu.actions()[0])
    self.context_menu.insertAction(self.context_menu.actions()[0], action_goto_declaration)
    self.context_menu.insertAction(self.context_menu.actions()[1], action_goto_definition)
    self.context_menu.insertAction(self.context_menu.actions()[2], action_goto_call_express)
    # Show the menu
    self.context_menu.exec_(self.__editor.mapToGlobal(point))

def gotoDeclaration(self):
    self.gotoDeclarationSign.emit()

def gotoDefinition(self):
    self.gotoDefinitionSign.emit()

def gotoCallExpress(self):
    self.gotoCallExpressSign.emit()

text_editor_obj.gotoDeclarationSign.connect(lambda: self.gotoDeclaration(text_editor_obj))
text_editor_obj.gotoDefinitionSign.connect(lambda: self.gotoDefinition(text_editor_obj))
text_editor_obj.gotoCallExpressSign.connect(lambda: self.gotoCallExpress(text_editor_obj))

3.gotoDeclaration gotoDefinition gotoCallExpress

# Jump to declaration
def gotoDeclaration(self, editor):
    position, selected_text = editor.getSelected_Position_Content()
    locations = []
    absolute_path = editor.filepath + '/' + editor.filename
    # Filter the selected text
    selected_text = editor.getSelectdFunctionName(selected_text)
    if self.source_data is None or self.current_source_path is None:
        self.source_data = self.getFuncAnalyzer(editor=editor)
        self.current_source_path = os.path.normpath(absolute_path)
    if self.source_data and self.current_source_path is None:
        self.current_source_path = os.path.normpath(absolute_path)
    elif self.current_source_path and self.current_source_path != os.path.normpath(absolute_path):
        self.current_source_path = os.path.normpath(absolute_path)
    else:
        pass
    location = None
    isSource = True
    # Jumping from a header to a source file
    if '.h' in editor.filename or '.hh' in editor.filename:
        isSource = False
    if self.source_data:
        for data in self.source_data:
            # File name
            isFind = False
            filename = data.filepath
            # Declarations
            function_declaration_list = data.source_obj.function_declaration_list
            # Headers
            headers_obj_list = data.headers_obj_list
            # Search the source file first
            for per_obj in function_declaration_list:
                if selected_text == per_obj.function_name and per_obj.declared_contents:
                    location = per_obj.declared_location
                    isFind = True
                    break
            if not isFind and location is None:
                # Then search the headers
                current_editor = None
                for per_obj in headers_obj_list:
                    filepath, header_path, item = per_obj
                    path, name = split(filepath)
                    path, name_ = split(header_path)
                    # Declarations
                    for i in item.function_declaration_list:
                        if selected_text == i.function_name and i.declared_contents:
                            location = i.declared_location
                            if isSource:
                                self.create_new_open_tab(header_path)
                                current_editor = self.ui.text_editor.currentWidget()
                            else:
                                # Important: already in the header, so stay in the current editor
                                current_editor = editor
                            break
                if location is not None and current_editor is not None:
                    # libclang positions are 1-based; the editor expects 0-based.
                    start_line = location[0] - 1
                    start_index = location[1] - 1
                    end_line = location[2] - 1
                    end_index = location[3] - 1
                    text_location = [(start_line, start_index, end_line, end_index)]
                    current_editor.highlight_function_declaration(text_location)
            elif isFind and location is not None:
                if location is not None:
                    start_line = location[0] - 1
                    start_index = location[1] - 1
                    end_line = location[2] - 1
                    end_index = location[3] - 1
                    text_location = [(start_line, start_index, end_line, end_index)]
                    editor.highlight_function_declaration(text_location)

# Jump to definition
def gotoDefinition(self, editor):
    position, selected_text = editor.getSelected_Position_Content()
    locations = []
    absolute_path = editor.filepath + '/' + editor.filename
    selected_text = editor.getSelectdFunctionName(selected_text)
    if self.source_data is None or self.current_source_path is None:
        self.source_data = self.getFuncAnalyzer(editor=editor)
        self.current_source_path = os.path.normpath(absolute_path)
    if self.source_data and self.current_source_path is None:
        self.current_source_path = os.path.normpath(absolute_path)
    elif self.current_source_path and self.current_source_path != os.path.normpath(absolute_path):
        self.current_source_path = os.path.normpath(absolute_path)
    else:
        pass
    location = None
    isSource = True
    if '.h' in editor.filename or '.hh' in editor.filename:
        isSource = False
    if self.source_data:
        for data in self.source_data:
            # File name
            isFind = False
            filename = data.filepath
            # Definitions
            function_definition_list = data.source_obj.function_definition_list
            # Headers
            headers_obj_list = data.headers_obj_list
            # Search the source file first
            for per_obj in function_definition_list:
                if selected_text == per_obj.function_name and per_obj.definition_contents:
                    location = per_obj.definition_location
                    isFind = True
                    break
            if not isFind and location is None:
                # Then search the headers
                for per_obj in headers_obj_list:
                    filepath, header_path, item = per_obj
                    path, name = split(filepath)
                    path, name_ = split(header_path)
                    # Definitions
                    for i in item.function_definition_list:
                        if selected_text == i.function_name and i.definition_contents:
                            location = i.definition_location
                            if isSource:
                                self.create_new_open_tab(header_path)
                                current_editor = self.ui.text_editor.currentWidget()
                            else:
                                current_editor = editor
                            break
                if location is not None and current_editor is not None:
                    start_line = location[0] - 1
                    start_index = location[1] - 1
                    end_line = location[2] - 1
                    end_index = location[3] - 1
                    text_location = [(start_line, start_index, end_line, end_index)]
                    current_editor.highlight_function_definition(text_location)
            elif isFind and location is not None:
                another_editor = editor
                # Open the defining file in a new tab if it is not the current one.
                if os.path.normpath(absolute_path) != os.path.normpath(filename):
                    self.create_new_open_tab(os.path.normpath(filename))
                    another_editor = self.ui.text_editor.currentWidget()
                if location is not None:
                    start_line = location[0] - 1
                    start_index = location[1] - 1
                    end_line = location[2] - 1
                    end_index = location[3] - 1
                    text_location = [(start_line, start_index, end_line, end_index)]
                    another_editor.highlight_function_definition(text_location)
# Jump to calls
def gotoCallExpress(self, editor):
    position, selected_text = editor.getSelected_Position_Content()
    locations = []
    absolute_path = editor.filepath + '/' + editor.filename
    selected_text = editor.getSelectdFunctionName(selected_text)
    if self.source_data is None or self.current_source_path is None:
        self.source_data = self.getFuncAnalyzer(editor=editor)
        self.current_source_path = os.path.normpath(absolute_path)
    if self.source_data and self.current_source_path is None:
        self.current_source_path = os.path.normpath(absolute_path)
    elif self.current_source_path and self.current_source_path != os.path.normpath(absolute_path):
        self.current_source_path = os.path.normpath(absolute_path)
    else:
        pass
    isSource = True
    if '.h' in editor.filename or '.hh' in editor.filename:
        isSource = False
    if self.source_data:
        for data in self.source_data:
            # File name
            filename = data.filepath
            # Calls
            function_callexpress_list = data.source_obj.function_callexpress_list
            # Reset for every file, otherwise results from earlier files leak in
            locations = []
            for per_obj in function_callexpress_list:
                if selected_text == per_obj.function_name and per_obj.call_express_contents:
                    location = per_obj.call_express_location
                    start_line = location[0] - 1
                    start_index = location[1] - 1
                    end_line = location[2] - 1
                    end_index = location[3] - 1
                    text_location = (start_line, start_index, end_line, end_index)
                    locations.append(text_location)
            if not isSource and locations != []:
                self.create_new_open_tab(filename)
                another_editor = self.ui.text_editor.currentWidget()
                another_editor.highlight_function_call_express(locations)
            elif isSource and locations != []:
                if os.path.normpath(absolute_path) != os.path.normpath(filename):
                    self.create_new_open_tab(os.path.normpath(filename))
                    another_editor = self.ui.text_editor.currentWidget()
                    another_editor.highlight_function_call_express(locations)
                else:
                    editor.highlight_function_call_express(locations)

4.getFuncAnalyzer

def getFuncAnalyzer(self, editor):
    filename = editor.filename
    filepath = editor.filepath
    absolute_path = filepath + '/' + filename
    func_dump = FunctionPreprocessor(absolute_path)
    source_data = func_dump.source_runner(absolute_path)
    return source_data

Basic logic flowchart

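[Flowchart: the rendered diagram was lost in extraction and only its labels remain. Labels in extracted order, duplicates removed: comment out headers, filter cursor node attributes, regex filter, preprocessing, classifier, definition part, main function part, concatenation function, header file, abstract syntax tree, function declaration info class, function definition info class, function call info class, function data wrapper class, function data class, check function, declaration jump function, definition jump function, call jump function, filter, source file, user selection, keyword, pure function name, context menu, go to declaration, go to definition, go to call, signal dispatcher, corresponding slot function, switch tab, highlight the matching position.]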

Drawing flowcharts in Markdown is really painful. Switching to Feishu was my salvation.

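Neat tricks

The interface I use for analysis automatically parses a large number of header files as well, so when I filter the results I get a flood of standard-library function information, which gets in the way of analyzing and handling user-defined functions.
The interface works by taking a code file with content and analyzing it. Passing in a string instead would have been great, but that is not possible. So I comment out the header includes myself, write the result to a temporary file, and hand that file to the analysis class, while the text editor keeps showing the original file's content. As soon as the analysis finishes I delete the temporary file, which creates the illusion that I am analyzing the original source directly.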
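The temporary-file trick can be sketched roughly as follows. This is an illustration under my own assumptions, not the project's actual code: `comment_out_includes` and `analyze_without_headers` are hypothetical names, and `analyze` stands in for whatever callable takes a file path (for example something like `FunctionPreprocessor(path).source_runner(path)`). The includes are commented out rather than deleted so every line keeps its original line number, which matters because the reported locations are later used for highlighting.

```python
import os
import re
import tempfile

# Matches a whole "#include ..." line, allowing leading indentation and "# include".
INCLUDE_RE = re.compile(r'^([ \t]*)(#[ \t]*include\b.*)$', re.MULTILINE)


def comment_out_includes(source_text):
    """Prefix each #include line with // so line numbers stay unchanged."""
    return INCLUDE_RE.sub(r'\1// \2', source_text)


def analyze_without_headers(source_path, analyze):
    """Run analyze(temp_path) on a copy with includes commented out.

    The original file is never modified, and the temporary copy is
    deleted as soon as the analysis returns or raises.
    """
    with open(source_path, encoding='utf-8') as source_file:
        stripped = comment_out_includes(source_file.read())

    # Keep the original extension so the analyzer can still detect the language.
    suffix = os.path.splitext(source_path)[1]
    fd, temp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as temp_file:
            temp_file.write(stripped)
        return analyze(temp_path)
    finally:
        os.remove(temp_path)
```

One consequence of this approach: anything the analyzer reports about file paths will point at the temporary copy, so the caller has to map the results back to the original path itself.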

Demo

Demo image

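Closing ramblings

How this started

This was actually part of my internship project, but I picked out one part (the part that tormented me most while building it) to show here. I found almost no usable reference material, so most of it is hand-rolled, plus some pointers from big daddy ChatGPT 3.5, although its pointers were pretty bad and the code it wrote was worse than my hand-rolled version.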

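A gift for those who come after

Maybe next time they will still be assigned this same project and can use this as a reference, though whether it actually runs is another matter. Do I have a talent for teaching after all? My bug-fixing skills are first-rate... Or maybe some people will be forced to build a text editor on top of this class...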

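Source code

I put it on GitHub, but the config file is encrypted, so please read the README carefully.
My own parts are explained in detail. For the parts that are not mine, figure them out yourself 0-0!
Source code is here
Part 1 is here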
