SystemTap is a Linux tracing and probing tool.

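Official site: https://sourceware.org/systemtap/

User space

To probe user-space programs, SystemTap needs kernel support: either utrace or, on kernel 3.5 and later, the uprobes facility that is built into the mainline kernel.

On kernels older than 3.5 you have to apply the utrace patches yourself.

More information: http://sourceware.org/systemtap/wiki/utrace

Requirements: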

debugging information for the named program

utrace support in the kernel

(1) Begin/end

Probe points:

When a process or thread is created

When a process or thread exits

process.begin

process("PATH").begin

process(PID).begin

process.thread.begin

process("PATH").thread.begin

process(PID).thread.begin

process.end

process("PATH").end

process(PID).end

process.thread.end

process("PATH").thread.end

process(PID).thread.end

(2) Syscall

Probe points:

System call entry

System call return

process.syscall

process("PATH").syscall

process(PID).syscall

process.syscall.return

process("PATH").syscall.return

process(PID).syscall.return

Available context variables:

$syscall // system call number

$argN ($arg1~$arg6) // system call arguments

$return // system call return value (available in .return probes)
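A small target program makes it easier to see what these variables hold. The sketch below is a hypothetical helper (raw_write is not part of the original article) that issues write(2) through the generic syscall(2) wrapper on Linux. A process.syscall probe on a program calling it would be expected to report $syscall equal to SYS_write, with $arg1, $arg2 and $arg3 holding the file descriptor, the buffer address and the length, and $return in the matching .return probe holding the number of bytes written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue write(2) directly so that the syscall number
 * and its three arguments are explicit in the source. */
long raw_write(int fd, const char *msg)
{
    assert(msg != NULL);
    /* SYS_write is the syscall number; fd, msg and strlen(msg) are
     * arguments 1 to 3. The result is the syscall's return value. */
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_write(1, "hi\n") from a program's main() gives the probe one clearly identifiable write to standard output.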

(3) Function/statement

Probe points:

Function entry

Function return

A given line in a source file

A label inside a function

process("PATH").function("NAME")

process("PATH").statement("*@FILE.c:123")

process("PATH").function("*").return

process("PATH").function("myfunc").label("foo")
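For the .label("foo") form, the target needs a function that actually contains that label. The following is a minimal sketch of such a target; the body of myfunc is made up for illustration, and it assumes the program is built with -g and without optimization so that the label is still present in the debug information.

```c
#include <assert.h>

/* Hypothetical probe target: sums 1..n using a labelled loop. */
int myfunc(int n)
{
    int sum = 0;
    int i = 1;

    assert(n >= 0);
foo:                     /* the label that .label("foo") refers to */
    if (i <= n) {
        sum += i;
        i++;
        goto foo;        /* control passes through the label on every iteration */
    }
    return sum;
}
```

A main() that calls myfunc is assumed and omitted here. With it in place, function("myfunc") fires once per call, while the label probe fires each time control reaches foo.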

(4) Absolute variant

Probe points:

A virtual address within a process

process(PID).statement(ADDRESS).absolute

A non-symbolic probe point uses raw, unverified virtual addresses and provides no $variables.

The target PID parameter must identify a running process and ADDRESS must identify a valid instruction address.

This is a guru mode probe.

(5) Target process

Probe points:

Functions in shared libraries (for example, glibc)

Target process mode (invoked with stap -c CMD or -x PID) implicitly restricts all process.* probes to the given child process.

If PATH names a shared library, all processes that map that shared library can be probed.

If dwarf debugging information is installed, try using a command with this syntax:

probe process("/lib64/libc-2.8.so").function("...") { ... }

(6) Instruction probes

Probe points:

A single instruction

A block of instructions

process("PATH").insn

process(PID).insn

process("PATH").insn.block

process(PID).insn.block

The .insn probe is called for every single-stepped instruction of the process described by PID or PATH.

The .insn.block probe is called for every block-stepped instruction of the process described by PID or PATH.

Using this feature will significantly slow process execution.

Count how many instructions a process executes:

stap -e 'global steps; probe process("/bin/ls").insn {steps++} probe end {printf("Total instructions: %d\n", steps)}' -c /bin/ls

(7) Usage

gcc -g3 -o test test.c

stap -L 'process("./test").function("*")' // list the functions and variables in the program

Debug levels (gcc -glevel):

Request debugging information and also use level to specify how much information. The default level is 2.

Level 0 produces no debug information at all. Thus, -g0 negates -g.

Level 1 produces minimal information, enough for making backtraces in parts of the program that you don't plan to debug. This includes descriptions of functions and external variables, but no information about local variables and no line numbers.

Level 3: includes extra information, such as all the macro definitions present in the program.

Advanced features

(1) Building your own script library (tapset)

A tapset is just a script that is designed for reuse by installation into a special directory.

SystemTap attempts to resolve references to global symbols (probes, functions, variables) that are not defined within the script by a systematic search through the tapset library for scripts that define those symbols.

A user may give additional directories with the -I DIR option.

Building your own library:

1. Create a library directory named mylib and add two library files

time-default.stp

    function __time_value() {
      return gettimeofday_us()
    }

time-common.stp

    global __time_vars
    function timer_begin(name) {
      __time_vars[name] = __time_value()
    }
    function timer_end(name) {
      return __time_value() - __time_vars[name]
    }

2. Write the script that uses the library

tapset-time-user.stp

    probe begin {
      timer_begin("bench")
      for (i = 0; i < 1000; i++) ;
      # gettimeofday_us() returns microseconds, so the result is in us
      printf("%d us\n", timer_end("bench"))
      exit()
    }

3. Run it

stap -I mylib/ tapset-time-user.stp

(2) Probe point aliases

Aliases are mainly used to provide an abstraction layer on top of existing probe points.

Probe point aliases allow creation of new probe points from existing ones.

This is useful if the new probe points are named to provide a higher level of abstraction.

Syntax:

probe new_name = existing_name1, existing_name2[, ..., existing_nameN]

{

prepending behavior

}

Example:

    probe syscallgroup.io = syscall.open, syscall.close,
                            syscall.read, syscall.write
    {
      groupname = "io"
    }
    probe syscallgroup.process = syscall.fork, syscall.execve
    {
      groupname = "process"
    }
    probe syscallgroup.*
    {
      groups[execname() . "/" . groupname]++
    }
    global groups
    probe end
    {
      foreach (eg in groups+)
        printf("%s: %d\n", eg, groups[eg])
    }

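(3) Embedded C code

SystemTap provides an "escape hatch" to go beyond what the language can safely offer.

Embedded C code is enclosed in %{ and %}, and the script must be run with the -g option (guru mode).

A THIS macro gives access to the function's arguments and its return value (THIS->arg, THIS->__retvalue). SystemTap 1.8 and later use STAP_ARG_arg and STAP_RETVALUE instead.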

Example:

    %{
    #include <linux/sched.h>
    #include <linux/list.h>
    %}

    /* Print every task's name and PID to the kernel log. */
    function process_list()
    %{
        struct task_struct *p;
        struct list_head *_p, *_n;

        printk("%-20s%-10s\n", "program", "pid");
        /* Walk the global task list starting from the current task;
           the list head itself (current) is not visited. */
        list_for_each_safe(_p, _n, &current->tasks) {
            p = list_entry(_p, struct task_struct, tasks);
            printk("%-20s%-10d\n", p->comm, p->pid);
        }
    %}

    probe begin {
        process_list()
        exit()
    }

stap -g embeded-c.stp

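The process list printed by printk shows up in the dmesg output.

C code is enclosed in %{ ... %}. It can be a standalone top-level block, the body of a function, or just an expression.

(4) Bundled tapset library

SystemTap ships with a very capable tapset library. The main categories are: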

Context Functions

Timestamp Functions

Time utility functions

Shell command functions

Memory Tapset

Task Time Tapset

Scheduler Tapset

IO Scheduler and block IO Tapset

SCSI Tapset

TTY Tapset

Interrupt Request (IRQ) Tapset

Networking Tapset

Socket Tapset

SNMP Information Tapset

Kernel Process Tapset

Signal Tapset

Errno Tapset

Device Tapset

Directory-entry (dentry) Tapset

Logging Tapset

Queue Statistics Tapset

Random functions Tapset

String and data retrieving functions Tapset

String and data writing functions Tapset

Guru tapsets

A collection of standard string functions

Utility functions for using ansi control chars in logs

SystemTap Translator Tapset

Network File Storage Tapsets

Speculation

How it works

(1) Execution flow of a SystemTap script

pass1

During the parsing of the code, it is represented internally in a parse tree.

Preprocessing is performed during this step, and the code is checked for semantic and syntax errors.

pass2

During the elaboration step, the symbols and references in the SystemTap script are resolved.

Also, any tapsets that are referenced in the SystemTap script are imported.

Debug data that is read from the DWARF (a widely used, standardized debugging data format) information, which is produced during kernel compilation, is used to find the addresses for functions and variables referenced in the script, and allows probes to be placed inside functions.

pass3

Takes the output from the elaboration phase and converts it into C source code.

Variables used by multiple probes are protected by locks. Safety checks, and any necessary locking, are handled during the translation. The code is also converted to use the Kprobes API for inserting probe points into the kernel.

pass4

Once the SystemTap script has been translated into a C source file, the code is compiled into a module that can be dynamically loaded and executed in the kernel.

pass5

Once the module is built, SystemTap loads the module into the kernel.

When the module loads, an init routine in the module starts running and begins inserting probes into their proper locations. Hitting a probe causes execution to stop while the handler for that probe is called.

When the handler exits, normal execution continues. The module continues waiting for probes and executing handler code until the script exits, or until the user presses Ctrl-c, at which time SystemTap removes the probes, unloads the module, and exits.

Output from SystemTap is transferred from the kernel through a mechanism called relayfs, and sent to STDOUT.

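(2) SystemTap script execution seen from user space and kernel space

(3) kprobes

Breakpoint instruction: INT 3 (__asm INT 3), machine code 0xCC.

The breakpoint interrupt (INT 3) is a software interrupt. When the CPU executes an INT 3 instruction, it pushes the current flags and program pointer (CS and EIP) onto the stack,

and then calls the handler registered for INT 3 through the interrupt vector table.

INT is the software interrupt instruction. The interrupt vector table maps interrupt numbers to the addresses of their handlers.

INT 3 raises software interrupt 3. In real mode, where each table entry is 4 bytes, the address of its handler is stored at: vector table base + 4 * 3. (Protected mode uses the IDT, whose entries are larger.)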
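The address arithmetic above can be written out as a short C sketch. It assumes the real-mode layout, where each vector table entry is 4 bytes (a 2-byte offset followed by a 2-byte segment). The names ivt_entry_address and IVT_ENTRY_SIZE are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { IVT_ENTRY_SIZE = 4 };  /* real mode: 2-byte offset + 2-byte segment */

/* Address of the vector table slot that stores the handler pointer
 * for the given interrupt number. */
uint32_t ivt_entry_address(uint32_t ivt_base, unsigned vector)
{
    assert(vector < 256);     /* x86 defines 256 interrupt vectors */
    return ivt_base + IVT_ENTRY_SIZE * vector;
}
```

With the table at address 0, as is usual in real mode, the slot for INT 3 sits at offset 12 (0x0C).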

A Kprobe is a general purpose hook that can be inserted almost anywhere in the kernel code.

To allow it to probe an instruction, the first byte of the instruction is replaced with the breakpoint instruction for the architecture being used. When this breakpoint is hit, Kprobe takes over execution, executes its handler code for the probe, and then continues execution at the next instruction.
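The byte replacement can be illustrated with a user-space toy that works on an ordinary byte buffer. This is only a sketch of the idea, not the kernel's implementation: struct toy_probe and its two functions are hypothetical, and the real kprobes code also has to handle single-stepping the saved instruction, concurrency and instruction boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT_OPCODE 0xCC   /* x86 INT 3 */

/* Hypothetical stand-in for a probe record; not the kernel's struct kprobe. */
struct toy_probe {
    uint8_t *addr;        /* first byte of the probed instruction */
    uint8_t  saved_byte;  /* original opcode byte, kept for restoring */
    int      armed;
};

/* Save the first byte of the instruction and overwrite it with INT 3. */
void toy_probe_arm(struct toy_probe *p, uint8_t *insn)
{
    assert(p != NULL && insn != NULL);
    p->addr = insn;
    p->saved_byte = *insn;
    *insn = BREAKPOINT_OPCODE;
    p->armed = 1;
}

/* Put the original byte back, removing the breakpoint. */
void toy_probe_disarm(struct toy_probe *p)
{
    assert(p != NULL);
    if (p->armed) {
        *p->addr = p->saved_byte;
        p->armed = 0;
    }
}
```

Arming a probe on a buffer that starts with 0x55 (push %rbp) leaves 0xCC in the first byte and 0x55 in saved_byte; disarming restores the buffer. Only the first byte changes, which is why a one-byte breakpoint opcode is enough on x86.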

(4) Kernel features it depends on

kprobes/jprobes

return probes

reentrancy

colocated (multiple)

relayfs

scalability (unlocked handlers)

user-space probes
