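Introduction to blktrace

blktrace is a user-space tool that collects detailed information about disk I/O as it passes through the block device layer (the block layer, hence the name "blk trace"), such as when a request is queued, inserted, merged, or completed.

The block device layer is the part labelled "block layer" in the figure below (figure borrowed from 褚霸).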

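How blktrace works

(1) When tracing, blktrace creates one thread per logical CPU on the machine and pins each thread to one logical CPU to collect data.

(2) blktrace opens the block device and calls ioctl with three arguments: the device's file descriptor, the request code _IOWR(0x12,115,struct blk_user_trace_setup), and a pointer to a struct blk_user_trace_setup. This system call hands the setup to the kernel, which creates one trace file per CPU under the debugfs mount point (by default /sys/kernel/debug) and writes trace data into those files through debugfs. Each blktrace thread reads the file that belongs to its CPU.

(3) blktrace is used together with blkparse, which parses the binary data that blktrace produces in its own format.

(4) blkparse only opens the files produced by blktrace, reads the data from them for display, and prints per-CPU statistics at the end. The action letters shown by blkparse (A, U, Q and so on, described below) are not stored as letters: blkparse computes t->action & 0xffff and converts the resulting number into the letter itself.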
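The conversion described in (4) can be sketched as follows. This is a minimal illustration and not blkparse's own code: the helper name action_letter is hypothetical, and the numeric codes are assumed to follow the order of the __BLK_TA_* enum in blktrace_api.h (queue = 1 through remap = 15).

```python
# Hypothetical helper, not part of blktrace/blkparse.
# The numeric codes are an assumption based on the __BLK_TA_* enum order.
ACTION_LETTERS = {
    1: "Q",   # queue
    2: "M",   # back merge
    3: "F",   # front merge
    4: "G",   # get request
    5: "S",   # sleep request
    6: "R",   # requeue
    7: "D",   # issue to driver
    8: "C",   # complete
    9: "P",   # plug
    10: "U",  # unplug by I/O
    11: "T",  # unplug by timer
    12: "I",  # insert
    13: "X",  # split
    14: "B",  # bounce
    15: "A",  # remap
}


def action_letter(action: int) -> str:
    """Return the display letter for a raw 32-bit action value."""
    code = action & 0xFFFF  # keep only the low 16 bits, as described in (4)
    return ACTION_LETTERS.get(code, "?")
```

Whatever is stored in the upper 16 bits is masked away, so action_letter(0x00120001) and action_letter(1) both give "Q" under these assumptions.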

Installing blktrace

1. yum install blktrace

2. Get the source (you can also install from source)

git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git bt

cd bt

make

make install

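Using blktrace

Mounting debugfs

As described in "How blktrace works", blktrace relies on the kernel exporting trace data through the debugfs file system (debugfs lives in memory).

So before using blktrace, mount debugfs first:

mount -t debugfs debugfs /sys/kernel/debug

Alternatively, add the following line to /etc/fstab so that it is mounted automatically at boot:

debug      /sys/kernel/debug           debugfs    defaults    0       0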
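Before starting a trace it can be handy to confirm that debugfs is really mounted. The sketch below assumes the usual /proc/mounts layout (device, mount point, file system type, options, ...); the function name debugfs_mount_point is made up for this illustration.

```python
def debugfs_mount_point(mounts_text):
    """Return the first debugfs mount point found in /proc/mounts-style text.

    Returns None when no debugfs entry is present.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "debugfs":
            return fields[1]
    return None
```

On a Linux machine you would pass it the contents of /proc/mounts, for example debugfs_mount_point(open("/proc/mounts").read()); a result of None suggests the mount command above still needs to be run.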

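Tracing a specific disk or partition with blktrace

See man blktrace for the full syntax; only the common usage is covered here.

Output to files

mkdir test  # blktrace writes its data to the current directory by default. As mentioned in "How blktrace works", there is one thread per logical CPU and each produces a file, so there will be as many files as there are CPUs

blktrace -d /dev/sda -o test1

# Traces /dev/sda. The output files are named test1.blktrace.N, with N running from 0 to (number of CPUs - 1). The files contain binary data and must be parsed with blkparse.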
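As a small illustration of the naming scheme just described, the following sketch lists the file names to expect for a given base name and CPU count. The helper trace_file_names is hypothetical and only mirrors the pattern stated above.

```python
def trace_file_names(basename, cpu_count):
    """List the per-CPU output file names for a given -o base name."""
    return [f"{basename}.blktrace.{cpu}" for cpu in range(cpu_count)]
```

For the command above on a machine with 4 logical CPUs, trace_file_names("test1", 4) yields test1.blktrace.0 through test1.blktrace.3; os.cpu_count() can supply the CPU count on the tracing machine.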

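Output to the terminal

blktrace -d /dev/sda -o - | blkparse -i -

"-" sends the output to standard output (the terminal), but that output is raw binary data and unreadable, so it has to be piped into blkparse and parsed live.

blkparse's -i option takes a file name. blktrace uses "-" as the output name to mean the terminal (this symbol is hard-coded in the source), and blkparse likewise uses "-" to mean that it parses its input from the terminal.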

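Parsing blktrace data with blkparse

See man blkparse for the full syntax; only the common usage is covered here.

Parsing files

blkparse -i test1  # parses every test1.blktrace.N file, N from 0 to (number of CPUs - 1); only files that contain data are included in the statistics

Live parsing

Live parsing is the "Output to the terminal" case shown above for blktrace.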

Example

Terminal 1:

blktrace /dev/sda -o - | blkparse -i -  # leave this running

Terminal 2:

dd if=/dev/zero of=/root/a1 bs=4k count=1000

Terminal 1 shows:

8,0   16     3041    94.435078912   891  A   W 72411584 + 8 <- (8,2) 71884224

8,0   16     3042    94.435079691   891  Q   W 72411584 + 8 [flush-8:0]

8,0   16     3043    94.435080790   891  M   W 72411584 + 8 [flush-8:0]

8,0   16     3044    94.435083089   891  A   W 72411592 + 8 <- (8,2) 71884232

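Interpreting the output

This is the default output format. In the source code, the default format below is printed first, and then, depending on the action, additional fields may or may not follow.

Printed first: -f "%D %2c %8s %5T.%9t %5p %2a %3d "

Each letter has the meaning listed below; the number is the field width in characters, exactly as in printf.

8,0   16     3042    94.435079691   891  Q   W 72411584 + 8 [flush-8:0]

Because the default format starts with -f "%D %2c %8s %5T.%9t %5p %2a %3d ", the fields of this line map as follows:

(1) 8,0 corresponds to %D, the major and minor device numbers

(2) 16 corresponds to %2c, the CPU id

(3) 3042 corresponds to %8s, the sequence number (this is an event counter kept by the tracing machinery; the actual I/O request carries no such number)

(4) 94.435079691 corresponds to %5T.%9t, the timestamp as "seconds.nanoseconds"

(5) 891 corresponds to %5p, the process id

(6) Q corresponds to %2a, the action. The actions are listed in the table below (for example, Q means "IO handled by request queue code"); see the action appendix for more detailed descriptions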

The following table shows the various actions which may be output.

Act Description

A IO was remapped to a different device

B IO bounced

C IO completion

D IO issued to driver

F IO front merged with request on queue

G Get request

I IO inserted onto request queue

M IO back merged with request on queue

P Plug request

Q IO handled by request queue code

S Sleep request

T Unplug due to timeout

U Unplug request

X Split

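(7) W corresponds to %3d, the RWBS field (W means a write). The letters mean the following:

The field contains at least one character from "RWD" (R read, W write, D discard)

It may be followed by characters from "BS" (B barrier, S synchronous)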
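The field mapping in (1) to (7) can be tried out with a short parser. This is only a sketch for the default format described above: it splits on whitespace, so it assumes none of the seven leading fields contains a space. The names parse_header and describe_rwbs are invented for this example.

```python
# Meanings of the RWBS letters as listed in the text.
RWBS_MEANINGS = {
    "R": "read",
    "W": "write",
    "D": "discard",
    "B": "barrier",
    "S": "synchronous",
}


def parse_header(line):
    """Split one default-format blkparse line into its seven leading fields."""
    fields = line.split()
    major, minor = fields[0].split(",")        # %D
    seconds, nanos = fields[3].split(".")      # %5T.%9t
    return {
        "major": int(major),
        "minor": int(minor),
        "cpu": int(fields[1]),                 # %2c
        "sequence": int(fields[2]),            # %8s
        "timestamp_ns": int(seconds) * 1_000_000_000 + int(nanos),
        "pid": int(fields[4]),                 # %5p
        "action": fields[5],                   # %2a
        "rwbs": fields[6],                     # %3d
        "rest": fields[7:],                    # action-specific tail
    }


def describe_rwbs(rwbs):
    """Translate an RWBS string such as 'WS' into a list of words."""
    return [RWBS_MEANINGS.get(ch, "unknown") for ch in rwbs]
```

Applied to the sample line above, parse_header returns cpu 16, sequence 3042, pid 891, action "Q" and rwbs "W", with the remaining tokens (sector, "+", count, process name) left in "rest" for action-specific handling.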

After that, the rest of the line is printed (this is how the source code does it):

switch (act[0]) {
case 'R':   /* Requeue */
case 'C':   /* Complete */
    if (t->action & BLK_TC_ACT(BLK_TC_PC)) {
        char *p = dump_pdu(pdu_buf, pdu_len);
        if (p)
            fprintf(ofp, "(%s) ", p);
        fprintf(ofp, "[%d]\n", t->error);
    } else {
        if (elapsed != -1ULL) {
            if (t_sec(t))
                fprintf(ofp, "%llu + %u (%8llu) [%d]\n",
                    (unsigned long long) t->sector,
                    t_sec(t), elapsed, t->error);
            else
                fprintf(ofp, "%llu (%8llu) [%d]\n",
                    (unsigned long long) t->sector,
                    elapsed, t->error);
        } else {
            if (t_sec(t))
                fprintf(ofp, "%llu + %u [%d]\n",
                    (unsigned long long) t->sector,
                    t_sec(t), t->error);
            else
                fprintf(ofp, "%llu [%d]\n",
                    (unsigned long long) t->sector,
                    t->error);
        }
    }
    break;
case 'D':   /* Issue */
case 'I':   /* Insert */
case 'Q':   /* Queue */
case 'B':   /* Bounce */
    if (t->action & BLK_TC_ACT(BLK_TC_PC)) {
        char *p;
        fprintf(ofp, "%u ", t->bytes);
        p = dump_pdu(pdu_buf, pdu_len);
        if (p)
            fprintf(ofp, "(%s) ", p);
        fprintf(ofp, "[%s]\n", name);
    } else {
        if (elapsed != -1ULL) {
            if (t_sec(t))
                fprintf(ofp, "%llu + %u (%8llu) [%s]\n",
                    (unsigned long long) t->sector,
                    t_sec(t), elapsed, name);
            else
                fprintf(ofp, "(%8llu) [%s]\n", elapsed,
                    name);
        } else {
            if (t_sec(t))
                fprintf(ofp, "%llu + %u [%s]\n",
                    (unsigned long long) t->sector,
                    t_sec(t), name);
            else
                fprintf(ofp, "[%s]\n", name);
        }
    }
    break;
case 'M':   /* Back merge */
case 'F':   /* Front merge */
case 'G':   /* Get request */
case 'S':   /* Sleep request */
    if (t_sec(t))
        fprintf(ofp, "%llu + %u [%s]\n",
            (unsigned long long) t->sector, t_sec(t), name);
    else
        fprintf(ofp, "[%s]\n", name);
    break;
case 'P':   /* Plug */
    fprintf(ofp, "[%s]\n", name);
    break;
case 'U':   /* Unplug IO */
case 'T':   /* Unplug timer */
    fprintf(ofp, "[%s] %u\n", name, get_pdu_int(t));
    break;
case 'A':   /* remap */
    get_pdu_remap(t, &r);
    fprintf(ofp, "%llu + %u <- (%d,%d) %llu\n",
        (unsigned long long) t->sector, t_sec(t),
        MAJOR(r.device_from), MINOR(r.device_from),
        (unsigned long long) r.sector_from);
    break;
case 'X':   /* Split */
    fprintf(ofp, "%llu / %u [%s]\n", (unsigned long long) t->sector,
        get_pdu_int(t), name);
    break;
case 'm':   /* Message */
    fprintf(ofp, "%*s\n", pdu_len, pdu_buf);
    break;
default:
    fprintf(stderr, "Unknown action %c\n", act[0]);
    break;
}

So, in detail:

8,0   16     3042    94.435079691   891  Q   W 72411584 + 8 [flush-8:0]

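Here act[0]='Q'. The 72411584 that follows is the starting sector number relative to device 8,0 (that is, sda); "+ 8" means 8 consecutive sectors starting there (a sector is 512 bytes by default, so 8 sectors is 4K); and the trailing [flush-8:0] is the name of the process.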

8,0   16     3041    94.435078912   891  A   W 72411584 + 8 <- (8,2) 71884224

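Here act[0]='A'. 72411584 is the starting sector number relative to 8,0 (that is, sda), and "(8,2) 71884224" means that the same position, relative to the partition /dev/sda2, is sector 71884224. (Because /dev/sda2 is a partition on the disk sda, a position on sda2 must first be remapped onto sda.)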
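The arithmetic behind these two lines can be checked with a few lines of code. This is a sketch under the same assumption the text makes (512-byte sectors); the function names are made up for the example.

```python
SECTOR_BYTES = 512  # assumed sector size, as stated in the text


def request_bytes(sector_count):
    """Size in bytes of a request that covers sector_count sectors."""
    return sector_count * SECTOR_BYTES


def partition_start_sector(disk_sector, partition_sector):
    """Start of the partition on the whole disk, derived from one remap line.

    disk_sector is the sector after the remap (relative to the disk) and
    partition_sector is the sector before it (relative to the partition).
    """
    return disk_sector - partition_sector
```

With the values from the 'A' line, partition_start_sector(72411584, 71884224) gives 527360, and the second remap line in the trace (72411592 and 71884232) gives the same offset, which is consistent with both requests targeting the same partition. request_bytes(8) gives 4096, the 4K mentioned above.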

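Sector numbers are contiguous across the disk, and the file system divides the disk into blocks that each contain several sectors, so block number = sector number / (number of sectors per block).

From the block number you can find the corresponding inode:

debugfs -R 'icheck block_number' disk_or_partition

For example, if the block number was computed from a sector number relative to sda2, then debugfs -R 'icheck block_number' /dev/sda2 finds the corresponding inode.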

From the inode you can then find out which file it belongs to:
find / -inum your_inode
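The sector-to-block step can be worked through as below. This is a sketch that assumes 512-byte sectors and a 4096-byte file system block; the real block size depends on how the file system was created, so treat both defaults as assumptions. The helper name sector_to_block is hypothetical.

```python
def sector_to_block(partition_sector, block_bytes=4096, sector_bytes=512):
    """Convert a partition-relative sector number to a file system block number.

    block_bytes and sector_bytes are assumed defaults; check the actual
    file system before relying on them.
    """
    sectors_per_block = block_bytes // sector_bytes
    return partition_sector // sectors_per_block
```

For the sda2-relative sector 71884224 from the remap line, sector_to_block(71884224) gives 8985528 under these assumptions, and that is the number you would pass to icheck on /dev/sda2.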

For a worked example, see this post by an engineer at Taobao: http://blog.tao.ma/?p=61

Appendix: meaning of each action

C – complete A previously issued request has been completed. The output

will detail the sector and size of that request, as well as the success or

failure of it.

D – issued A request that previously resided on the block layer queue or in

the io scheduler has been sent to the driver.

I – inserted A request is being sent to the io scheduler for addition to the

internal queue and later service by the driver. The request is fully formed

at this time.

Q – queued This notes intent to queue io at the given location. No real requests

exists yet.

B – bounced The data pages attached to this bio are not reachable by the

hardware and must be bounced to a lower memory location. This causes

a big slowdown in io performance, since the data must be copied to/from

kernel buffers. Usually this can be fixed with using better hardware -

either a better io controller, or a platform with an IOMMU.

m – message Text message generated via kernel call to blk_add_trace_msg.

M – back merge A previously inserted request exists that ends on the boundary

of where this io begins, so the io scheduler can merge them together.

F – front merge Same as the back merge, except this io ends where a previously

inserted requests starts.

G – get request To send any type of request to a block device, a struct request

container must be allocated first.

S – sleep No available request structures were available, so the issuer has to

wait for one to be freed.

P – plug When io is queued to a previously empty block device queue, Linux

will plug the queue in anticipation of future ios being added before this

data is needed.

U – unplug Some request data already queued in the device, start sending

requests to the driver. This may happen automatically if a timeout period

has passed (see next entry) or if a number of requests have been added to

the queue.

T – unplug due to timer If nobody requests the io that was queued after

plugging the queue, Linux will automatically unplug it after a defined

period has passed.

X – split On raid or device mapper setups, an incoming io may straddle a

device or internal zone and needs to be chopped up into smaller pieces

for service. This may indicate a performance problem due to a bad setup

of that raid/dm device, but may also just be part of normal boundary

conditions. dm is notably bad at this and will clone lots of io.

A – remap For stacked devices, incoming io is remapped to device below it in

the io stack. The remap action details what exactly is being remapped to what.
