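To do a good job, one must first sharpen one's tools. This article introduces the common commands of the crash utility on Linux and how to use them.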

Background

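crash was developed by engineers at Red Hat and is mainly used for offline analysis of Linux kernel dump files. It integrates gdb and is very powerful: it can display stack backtraces, the dmesg log, and kernel data structures, disassemble code, and more. crash supports dump formats produced by several tools, such as kdump, LKCD, netdump, and diskdump, and it can also analyze kernel dumps generated on Xen and KVM virtual machines. crash can also debug a live system: simply run crash by itself, and on Ubuntu it reads the running kernel's memory through /proc/kcore.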

Figure: debugging a live system

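crash is tightly coupled to the Linux kernel and is updated continuously as the kernel changes. It is backward compatible: a newer crash can analyze dump files from older kernels. If your kernel is too new for your crash to parse, try installing the latest version of crash.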

Common commands

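The sections below show how to use the common commands, drawing mainly on the crash_whitepaper and the help documentation built into crash. The crash_whitepaper covers the tool's motivation, how to build it, the command categories and their usage, and how to add your own commands; it is a very good reference. I used crash-7.2.6 with gdb-7.6. Inside crash, "help command" shows the detailed help for a command; the full command list is in the appendix.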

Figure: help documentation

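When crash loads a kernel dump file it prints basic system information, such as the task that hit the problem (bash, PID 2613), the amount of system memory (7.9 GB), and the architecture (x86_64). Here we can see that the dump comes from a panic triggered through sysrq.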

KERNEL: ../kernel-src/linux-4.19.53/vmlinux

DUMPFILE: crash/201907070732/dump.201907070732 [PARTIAL DUMP]

CPUS: 4

DATE: Sun Jul 7 07:31:34 2019

UPTIME: 00:10:27

LOAD AVERAGE: 0.14, 0.16, 0.12

TASKS: 584

NODENAME: glbian-OptiPlex-990

RELEASE: 4.19.53

VERSION: #1 SMP Sun Jun 23 11:01:25 CST 2019

MACHINE: x86_64 (3292 Mhz)

MEMORY: 7.9 GB

PANIC: "sysrq: SysRq : Trigger a crash"

PID: 2613

COMMAND: "bash"

TASK: ffff8b7df3cdae00 [THREAD_INFO: ffff8b7df3cdae00]

CPU: 2

STATE: TASK_RUNNING (SYSRQ)

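Viewing the stack

Usually you start by looking at the backtrace (bt) to see where the system died, and then decide where to investigate. Here the exception occurred in the sysrq handler, sysrq_handle_crash.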

crash> bt

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

#0 [ffffa0f442cd7a08] machine_kexec at ffffffff99a69313

#1 [ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9

#2 [ffffa0f442cd7b30] crash_kexec at ffffffff99b3f441

#3 [ffffa0f442cd7b50] oops_end at ffffffff99a32bed

#4 [ffffa0f442cd7b78] no_context at ffffffff99a7997c

#5 [ffffa0f442cd7bd8] __bad_area_nosemaphore at ffffffff99a79d15

#6 [ffffa0f442cd7c20] bad_area at ffffffff99a79f86

#7 [ffffa0f442cd7c48] __do_page_fault at ffffffff99a7a486

#8 [ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d

#9 [ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae

[exception RIP: sysrq_handle_crash+22]

RIP: ffffffff9a034066 RSP: ffffa0f442cd7da8 RFLAGS: 00010286

RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006

RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063

RBP: ffffa0f442cd7da8 R8: 00000000000002f2 R9: 0000000000000007

R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004

R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100

ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

#10 [ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8

#11 [ffffa0f442cd7de0] write_sysrq_trigger at ffffffff9a034cbf

... ...
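For scripted triage it can help to pull the frames out of saved bt output. The following is only a minimal sketch, assuming the bt text has been redirected to a file or string; parse_bt_frames is a hypothetical helper and not part of crash.

```python
import re

# Matches frame lines such as:
#   #0 [ffffa0f442cd7a08] machine_kexec at ffffffff99a69313
FRAME_RE = re.compile(r"#(\d+) \[([0-9a-f]+)\] (\S+) at ([0-9a-f]+)")


def parse_bt_frames(text):
    """Return (frame_no, stack_addr, symbol, text_addr) tuples from bt output."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:  # register dumps and "[exception RIP: ...]" lines are skipped
            frames.append((int(m.group(1)), int(m.group(2), 16),
                           m.group(3), int(m.group(4), 16)))
    return frames


sample = """\
#0 [ffffa0f442cd7a08] machine_kexec at ffffffff99a69313
#1 [ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9
    [exception RIP: sysrq_handle_crash+22]
#10 [ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8
"""

frames = parse_bt_frames(sample)
for no, sp, sym, ip in frames:
    print(f"#{no:<3} {sym:<16} sp={sp:#x} ip={ip:#x}")
```

The regular expression uses search rather than match, so any leading characters before the "#" on a frame line are tolerated.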

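bt also takes options that show the offset within each function (-s), the source file and line number (-l), and the raw contents of each frame (-f), so you can follow along in the source and disassembly and work out function arguments and local variables.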

crash> bt -slf

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

#0 [ffffa0f442cd7a08] machine_kexec+451 at ffffffff99a69313

/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/kernel/machine_kexec_64.c: 346

ffffa0f442cd7a10: 0000a0f442cd7a50 ffff8b7c40000000

ffffa0f442cd7a20: 0000000024001000 ffff8b7c64001000

ffffa0f442cd7a30: 0000000024000000 a05cedc0dfb99200

ffffa0f442cd7a40: a05cedc0dfb99200 ffffa0f442cd7cf8

ffffa0f442cd7a50: 0000000000000009 ffffa0f442cd7cf8

ffffa0f442cd7a60: ffffa0f442cd7b28 ffffffff99b3e6b9

... ...

#8 [ffffa0f442cd7cc0] do_page_fault+45 at ffffffff99a7a60d

/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/mm/fault.c: 1470

ffffa0f442cd7cc8: ffff8b7e6500d140 0000000000000000

ffffa0f442cd7cd8: 0000000000000000 0000000000000000

ffffa0f442cd7ce8: ffffa0f442cd7cf9 ffffffff9a6010ae

#9 [ffffa0f442cd7cf0] page_fault+30 at ffffffff9a6010ae

/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/entry/entry_64.S: 1181

[exception RIP: sysrq_handle_crash+22]

RIP: ffffffff9a034066 RSP: ffffa0f442cd7da8 RFLAGS: 00010286

RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006

RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063

RBP: ffffa0f442cd7da8 R8: 00000000000002f2 R9: 0000000000000007

R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004

R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100

ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

/home/glbian/data/kernel-src/linux-4.19.53/drivers/tty/sysrq.c: 147

ffffa0f442cd7cf8: ffff8b7de5af9100 ffffffff9afa7300

ffffa0f442cd7d08: 0000000000000000 0000000000000004

ffffa0f442cd7d18: ffffa0f442cd7da8 0000000000000063

ffffa0f442cd7d28: ffffffff9b39c3ed 0000000000000000

ffffa0f442cd7d38: 0000000000000007 00000000000002f2

ffffa0f442cd7d48: ffffffff9a034050 0000000000000006

ffffa0f442cd7d58: 0000000000000000 0000000000000096

ffffa0f442cd7d68: 0000000000000063 ffffffffffffffff

ffffa0f442cd7d78: ffffffff9a034066 0000000000000010

ffffa0f442cd7d88: 0000000000010286 ffffa0f442cd7da8

ffffa0f442cd7d98: 0000000000000018 0000000000000000

ffffa0f442cd7da8: ffffa0f442cd7dd8 ffffffff9a0347e8

#10 [ffffa0f442cd7db0] __handle_sysrq+136 at ffffffff9a0347e8

/home/glbian/data/kernel-src/linux-4.19.53/drivers/tty/sysrq.c: 583

ffffa0f442cd7db8: 0000000000000002 fffffffffffffffb

ffffa0f442cd7dc8: ffffa0f442cd7ee8 0000563d45717780

ffffa0f442cd7dd8: ffffa0f442cd7df0 ffffffff9a034cbf

... ...
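The raw words that bt -f prints under the exception frame are the saved register set. As a sketch, assuming the standard x86_64 struct pt_regs field order (r15 first, ss last), the 21 words starting at ffffa0f442cd7cf8 can be mapped back to register names. decode_pt_regs is a hypothetical helper written for this illustration.

```python
# Field order of struct pt_regs on x86_64 (assumed layout; lowest address first).
PT_REGS_FIELDS = [
    "r15", "r14", "r13", "r12", "rbp", "rbx", "r11", "r10", "r9", "r8",
    "rax", "rcx", "rdx", "rsi", "rdi", "orig_rax",
    "rip", "cs", "rflags", "rsp", "ss",
]


def decode_pt_regs(words):
    """Map consecutive 64-bit stack words onto pt_regs field names."""
    if len(words) < len(PT_REGS_FIELDS):
        raise ValueError("need at least 21 words for a full pt_regs")
    return dict(zip(PT_REGS_FIELDS, words))


# Words copied from the bt -f dump, ffffa0f442cd7cf8 .. ffffa0f442cd7d98.
stack_words = [
    0xffff8b7de5af9100, 0xffffffff9afa7300,
    0x0000000000000000, 0x0000000000000004,
    0xffffa0f442cd7da8, 0x0000000000000063,
    0xffffffff9b39c3ed, 0x0000000000000000,
    0x0000000000000007, 0x00000000000002f2,
    0xffffffff9a034050, 0x0000000000000006,
    0x0000000000000000, 0x0000000000000096,
    0x0000000000000063, 0xffffffffffffffff,
    0xffffffff9a034066, 0x0000000000000010,
    0x0000000000010286, 0xffffa0f442cd7da8,
    0x0000000000000018,
]

regs = decode_pt_regs(stack_words)
for name in ("rip", "rsp", "rflags", "rax", "rdi"):
    print(f"{name.upper():<7} {regs[name]:016x}")
```

The decoded values agree with the RIP, RSP, RFLAGS, RAX, and RDI lines that bt prints for the exception frame, which is a quick way to sanity-check a frame when the stack looks suspicious.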

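The dis command disassembles code so you can inspect the logic at a given address: -r lists from the start of the function up to the address, and -f lists from the address to the end of the function.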

crash> dis -r ffffffff9a6010ae

0xffffffff9a601090 <page_fault>: data32 xchg %ax,%ax

0xffffffff9a601093 <page_fault+3>: callq 0xffffffff9a601230

0xffffffff9a601098 <page_fault+8>: mov %rsp,%rdi

0xffffffff9a60109b <page_fault+11>: mov 0x78(%rsp),%rsi

0xffffffff9a6010a0 <page_fault+16>: movq $0xffffffffffffffff,0x78(%rsp)

0xffffffff9a6010a9 <page_fault+25>: callq 0xffffffff99a7a5e0

0xffffffff9a6010ae <page_fault+30>: jmpq 0xffffffff9a601330

crash> dis -f ffffffff9a6010ae

0xffffffff9a6010ae <page_fault+30>: jmpq 0xffffffff9a601330

0xffffffff9a6010b3 <page_fault+35>: nopl (%rax)

0xffffffff9a6010b6 <page_fault+38>: nopw %cs:0x0(%rax,%rax,1)
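The symbol+offset form printed by bt -s and the addresses listed by dis are related by simple arithmetic: the function start is the frame address minus the offset. A small worked sketch using values from the bt -slf output earlier (function_start is just an illustrative name):

```python
def function_start(text_addr, offset):
    """Start address of the function for a 'symbol+offset at address' frame."""
    return text_addr - offset


# (address, decimal offset) pairs copied from the bt -slf output.
frames = {
    "machine_kexec": (0xffffffff99a69313, 451),
    "do_page_fault": (0xffffffff99a7a60d, 45),
    "page_fault": (0xffffffff9a6010ae, 30),
    "sysrq_handle_crash": (0xffffffff9a034066, 22),
}

starts = {name: function_start(addr, off) for name, (addr, off) in frames.items()}
for name, start in starts.items():
    print(f"{name:<20} starts at {start:#x}")
```

For page_fault this gives 0xffffffff9a601090, the first address in the dis -r listing, and for do_page_fault it gives 0xffffffff99a7a5e0, the target of the second callq. The kernel log prints the same offsets in hexadecimal, so sysrq_handle_crash+22 here is sysrq_handle_crash+0x16 there.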

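Sometimes the stack is corrupted. In that case -t or -T dumps every text symbol found on the stack (-t from the last known stack location, -T from just above the task_struct or thread_info, both up to the top of the stack), which often reveals useful clues.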

crash> bt -t

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

START: machine_kexec at ffffffff99a69313

[ffffa0f442cd7a08] machine_kexec at ffffffff99a69313

[ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9

[ffffa0f442cd7ac0] sysrq_handle_crash at ffffffff9a034050

[ffffa0f442cd7af0] sysrq_handle_crash at ffffffff9a034066

[ffffa0f442cd7b30] crash_kexec at ffffffff99b3f441

[ffffa0f442cd7b38] __die at ffffffff99a33375

[ffffa0f442cd7b50] oops_end at ffffffff99a32bed

[ffffa0f442cd7b78] no_context at ffffffff99a7997c

[ffffa0f442cd7bd8] __bad_area_nosemaphore at ffffffff99a79d15

[ffffa0f442cd7c20] bad_area at ffffffff99a79f86

[ffffa0f442cd7c48] __do_page_fault at ffffffff99a7a486

[ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d

[ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae

[ffffa0f442cd7d48] sysrq_handle_crash at ffffffff9a034050

[ffffa0f442cd7d78] sysrq_handle_crash at ffffffff9a034066

[ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8

[ffffa0f442cd7de0] write_sysrq_trigger at ffffffff9a034cbf

[ffffa0f442cd7df8] proc_reg_write at ffffffff99d2a0ee

[ffffa0f442cd7e18] __vfs_write at ffffffff99ca8a0a

[ffffa0f442cd7e40] apparmor_file_permission at ffffffff99e53a0a

[ffffa0f442cd7e50] security_file_permission at ffffffff99e06cf1

[ffffa0f442cd7e78] _cond_resched at ffffffff9a4153f9

[ffffa0f442cd7ea0] vfs_write at ffffffff99ca8d11

[ffffa0f442cd7ed8] ksys_write at ffffffff99ca8fcc

[ffffa0f442cd7f20] __x64_sys_write at ffffffff99ca906a

[ffffa0f442cd7f30] do_syscall_64 at ffffffff99a0428a

[ffffa0f442cd7f50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088

RIP: 00007ff47e1ef154 RSP: 00007ffee9226298 RFLAGS: 00000246

RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff47e1ef154

RDX: 0000000000000002 RSI: 0000563d45717780 RDI: 0000000000000001

RBP: 0000563d45717780 R8: 000000000000000a R9: 0000000000000001

R10: 000000000000000a R11: 0000000000000246 R12: 00007ff47e4cb760

R13: 0000000000000002 R14: 00007ff47e4c72a0 R15: 00007ff47e4c6760

ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

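By default bt shows the current context, which is initially the panic task. bt -a shows the backtrace of the active task on every CPU, and bt -c shows it for the specified CPU.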

crash> bt -c 1

PID: 0 TASK: ffff8b7e64165c00 CPU: 1 COMMAND: "swapper/1"

#0 [fffffe0000034e38] crash_nmi_callback at ffffffff99a5d3d7

#1 [fffffe0000034e48] nmi_handle at ffffffff99a33691

... ...

#12 [ffffa0f440cd7f50] secondary_startup_64 at ffffffff99a000d4

crash> bt -a

PID: 0 TASK: ffffffff9ae13740 CPU: 0 COMMAND: "swapper/0"

... ...

PID: 0 TASK: ffff8b7e64165c00 CPU: 1 COMMAND: "swapper/1"

... ...

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

... ...

PID: 0 TASK: ffff8b7e642c4500 CPU: 3 COMMAND: "swapper/3"

... ...

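You can also use the set command to switch to another task's context and then view that task's stack, including tasks that last ran on other CPUs.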

crash> set 1

PID: 1

COMMAND: "systemd"

TASK: ffff8b7e6413c500 [THREAD_INFO: ffff8b7e6413c500]

CPU: 3

STATE: TASK_INTERRUPTIBLE

crash> bt

PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"

#0 [ffffa0f440c6fce0] __schedule at ffffffff9a414ba7

#1 [ffffa0f440c6fd80] schedule at ffffffff9a41519c

#2 [ffffa0f440c6fd90] schedule_hrtimeout_range_clock at ffffffff9a419691

#3 [ffffa0f440c6fe20] schedule_hrtimeout_range at ffffffff9a4196b3

#4 [ffffa0f440c6fe30] ep_poll at ffffffff99cf8941

#5 [ffffa0f440c6fee0] do_epoll_wait at ffffffff99cf8ae0

#6 [ffffa0f440c6ff20] __x64_sys_epoll_wait at ffffffff99cf8b0e

#7 [ffffa0f440c6ff30] do_syscall_64 at ffffffff99a0428a

#8 [ffffa0f440c6ff50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088

RIP: 00007ffa791c6bb7 RSP: 00007ffc1c00b9d0 RFLAGS: 00000293

RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffa791c6bb7

RDX: 00000000000000eb RSI: 00007ffc1c00ba10 RDI: 0000000000000004

RBP: 00007ffc1c00ba10 R8: 0000000000000000 R9: 7465677261742e79

R10: 00000000ffffffff R11: 0000000000000293 R12: 00000000000000eb

R13: 00000000ffffffff R14: 00007ffc1c00ba10 R15: 0000000000000001

ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b

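System log

The log command displays the kernel message buffer. "log -a" dumps the audit log records still held in kernel audit buffers that have not yet been copied out to the user-space audit daemon.

The output can also be redirected to a file (log > logfile).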

crash> log

... ...

[ 1610.759133] sysrq: SysRq : Trigger a crash

[ 1610.759147] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000

[ 1610.759150] PGD 0 P4D 0

[ 1610.759154] Oops: 0002 [#1] SMP PTI

[ 1610.759159] CPU: 2 PID: 2613 Comm: bash Kdump: loaded Not tainted 4.19.53 #1

[ 1610.759161] Hardware name: Dell Inc. OptiPlex 990/0RVG2C, BIOS A13 04/02/2012

[ 1610.759167] RIP: 0010:sysrq_handle_crash+0x16/0x20

[ 1610.759170] Code: e8 9f fb ff ff e9 c0 fe ff ff 90 90 90 90 90 90 90 90 90 90 66 66 66 66 90 55 48 89 e5 c7 05 85 10 36 01 01 00 00 00 0f ae f8 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 c7 05 40 fa e2 00

[ 1610.759173] RSP: 0018:ffffa0f442cd7da8 EFLAGS: 00010286

[ 1610.759176] RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006

[ 1610.759178] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063

[ 1610.759180] RBP: ffffa0f442cd7da8 R08: 00000000000002f2 R09: 0000000000000007

[ 1610.759182] R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004

[ 1610.759184] R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100

[ 1610.759186] FS: 00007ff47eb0a740(0000) GS:ffff8b7e65880000(0000) knlGS:0000000000000000

[ 1610.759189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[ 1610.759191] CR2: 0000000000000000 CR3: 0000000205db0003 CR4: 00000000000606e0

[ 1610.759193] Call Trace:

[ 1610.759199] __handle_sysrq+0x88/0x140

[ 1610.759203] write_sysrq_trigger+0x2f/0x40

[ 1610.759208] proc_reg_write+0x3e/0x60

[ 1610.759212] __vfs_write+0x3a/0x190

[ 1610.759216] ? apparmor_file_permission+0x1a/0x20

[ 1610.759220] ? security_file_permission+0x31/0xc0

[ 1610.759224] ? _cond_resched+0x19/0x40

[ 1610.759226] vfs_write+0xb1/0x1a0

[ 1610.759229] ksys_write+0x5c/0xe0

[ 1610.759232] __x64_sys_write+0x1a/0x20

[ 1610.759237] do_syscall_64+0x5a/0x120

[ 1610.759241] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 1610.759245] RIP: 0033:0x7ff47e1ef154

[ 1610.759247] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5

[ 1610.759249] RSP: 002b:00007ffee9226298 EFLAGS: 00000246 ORIG_RAX: 0000000000000001

[ 1610.759252] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff47e1ef154

[ 1610.759254] RDX: 0000000000000002 RSI: 0000563d45717780 RDI: 0000000000000001

[ 1610.759256] RBP: 0000563d45717780 R08: 000000000000000a R09: 0000000000000001

[ 1610.759258] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff47e4cb760

[ 1610.759260] R13: 0000000000000002 R14: 00007ff47e4c72a0 R15: 00007ff47e4c6760

[ 1610.759263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcbc aesni_intel snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep aes_x86_64 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi input_leds crypto_simd cryptd snd_seq snd_seq_device snd_timer dcdbas snd glue_helper intel_cstate intel_rapl_perf lpc_ich serio_raw soundcore sch_fq_codel mei_me mei mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit cec rc_core drm_kms_helper psmouse syscopyarea sysfillrect video sysimgblt fb_sys_fops ahci drm libahci e1000e

[ 1610.759320] CR2: 0000000000000000
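The "Oops: 0002" line in the log carries the x86 page-fault error code in hexadecimal. A minimal sketch of decoding its low bits, assuming the usual x86 bit assignments (bit 0 present, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch); decode_oops_code is a hypothetical helper:

```python
import re


def decode_oops_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x01),       # 0 means the page was not present
        "write": bool(code & 0x02),              # 0 means a read access
        "user_mode": bool(code & 0x04),          # 0 means the fault came from kernel mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


line = "[ 1610.759154] Oops: 0002 [#1] SMP PTI"
m = re.search(r"Oops: ([0-9a-f]{4})", line)
info = decode_oops_code(int(m.group(1), 16))
print(info)
```

For this dump the decoded code describes a kernel-mode write to a page that is not present, which fits the "NULL pointer dereference at 0000000000000000" message and CR2 value in the same log.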

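Viewing data structures

struct and union display structures and unions, and they are used the same way. Below are some struct examples. Given an address, the command interprets the memory there as a task_struct and prints it; without an address it shows the structure definition and size.

1 Print a task_struct instance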

crash> task_struct ffff8b7df3cdae00 -x

struct task_struct {

thread_info = {

flags = 0x80000000,

status = 0x0

},

state = 0x0,

stack = 0xffffa0f442cd4000,

usage = {

counter = 0x2

},

... ...

2 Print the task_struct definition and size

struct task_struct {

[0x0] struct thread_info thread_info;

[0x10] volatile long state;

[0x18] void *stack;

... ...

[0x1288] void *security;

[0x12c0] struct thread_struct thread;

}

SIZE: 0x23c0

3 View a member

crash> task_struct.stack_refcount ffff8b7df3cdae00 -xo

struct task_struct {

[ffff8b7df3cdc080] atomic_t stack_refcount;

}
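With -o and an address, the number in brackets is the member's address rather than its offset. Subtracting the task_struct base recovers the offset, as this small sketch shows with the values printed above:

```python
task_base = 0xffff8b7df3cdae00        # TASK address of the bash process
member_addr = 0xffff8b7df3cdc080      # address printed for stack_refcount
struct_size = 0x23c0                  # SIZE reported for task_struct

offset = member_addr - task_base
print(f"stack_refcount offset: {offset:#x}")

# The member must lie inside the structure.
inside = 0 <= offset < struct_size
print("inside task_struct:", inside)
```

The result, 0x1280, sits just before the [0x1288] offset shown for the security member in the definition listing, so the two outputs are consistent with each other.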

4 View a pointer member (-p also dereferences the pointer)

crash> task_struct.mm ffff8b7df3cdae00

mm = 0xffff8b7e5af06600

crash> task_struct.mm ffff8b7df3cdae00 -p

struct mm_struct *mm = 0xffff8b7e5af06600

-> {

{

mmap = 0xffff8b7dec0520c8,

mm_rb = {

rb_node = 0xffff8b7dec003b78

},

vmacache_seqnum = 17,

get_unmapped_area = 0xffffffff99a35760,

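Arrays, per-cpu variables, and several other features are also supported; see the help documentation for details.

Viewing and searching memory

Besides printing data structures, you sometimes need to look at raw memory and search it for a specific data pattern.

1 View the kernel version banner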

crash> rd -a linux_banner

ffffffff9aa00100: Linux version 4.19.53 (glbian@glbian-OptiPlex-990) (gcc vers

ffffffff9aa0013c: ion 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #1 SMP Sun Jun 23

ffffffff9aa00178: 11:01:25 CST 2019

2 View memory contents

crash> rd ffffa0f442cd7a08 32

ffffa0f442cd7a08: ffffffff99a69313 0000a0f442cd7a50 ........Pz.B....

ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000 ...@|......$....

ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000 ...d|......$....

ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200 ......\.......\.

ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009 .|.B............

ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28 .|.B....({.B....

ffffa0f442cd7a68: ffffffff99b3e6b9 ffff8b7de5af9100 ............}...

ffffa0f442cd7a78: ffffffff9afa7300 0000000000000000 .s..............

ffffa0f442cd7a88: 0000000000000004 ffffa0f442cd7da8 .........}.B....

ffffa0f442cd7a98: 0000000000000063 ffffffff9b39c3ed c.........9.....

ffffa0f442cd7aa8: 0000000000000000 0000000000000007 ................

ffffa0f442cd7ab8: 00000000000002f2 ffffffff9a034050 ........P@......

ffffa0f442cd7ac8: 0000000000000006 0000000000000000 ................

ffffa0f442cd7ad8: 0000000000000096 0000000000000063 ........c.......

ffffa0f442cd7ae8: ffffffffffffffff ffffffff9a034066 ........f@......

ffffa0f442cd7af8: 0000000000000010 0000000000010286 ................
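The right-hand column of the rd output is the same memory shown as characters. Because x86_64 is little-endian, the lowest byte of each 64-bit word comes first. A minimal sketch that rebuilds the column from the two hex words of a row, assuming bytes outside the printable ASCII range are shown as a dot (rd_ascii is a hypothetical helper):

```python
import struct


def rd_ascii(words):
    """Render 64-bit words the way the ASCII column of rd shows them."""
    raw = b"".join(struct.pack("<Q", w) for w in words)   # little-endian bytes
    return "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in raw)


row1 = rd_ascii([0xffffffff99a69313, 0x0000a0f442cd7a50])
row2 = rd_ascii([0x0000000000000063, 0xffffffff9b39c3ed])
print(row1)
print(row2)
```

The two rows correspond to addresses ffffa0f442cd7a08 and ffffa0f442cd7a98 in the dump above. The "Pz.B" in the first row is the bytes 50 7a cd 42 of the saved pointer 0000a0f442cd7a50 read from the low end.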

3 Display memory symbolically

crash> rd ffffa0f442cd7a08 32 -s

ffffa0f442cd7a08: machine_kexec+451 0000a0f442cd7a50

ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000

ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000

ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200

ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009

ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28

ffffa0f442cd7a68: __crash_kexec+105 ffff8b7de5af9100

ffffa0f442cd7a78: sysrq_crash_op 0000000000000000

ffffa0f442cd7a88: 0000000000000004 ffffa0f442cd7da8

ffffa0f442cd7a98: 0000000000000063 text.45672+13

ffffa0f442cd7aa8: 0000000000000000 0000000000000007

ffffa0f442cd7ab8: 00000000000002f2 sysrq_handle_crash

ffffa0f442cd7ac8: 0000000000000006 0000000000000000

ffffa0f442cd7ad8: 0000000000000096 0000000000000063

ffffa0f442cd7ae8: ffffffffffffffff sysrq_handle_crash+22

ffffa0f442cd7af8: 0000000000000010 0000000000010286

4 View a specified memory range

crash> rd ffffa0f442cd7a08 -e ffffa0f442cd7a68

ffffa0f442cd7a08: ffffffff99a69313 0000a0f442cd7a50 ........Pz.B....

ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000 ...@|......$....

ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000 ...d|......$....

ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200 ......\.......\.

ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009 .|.B............

ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28 .|.B....({.B....

5 Search a specified range of memory

crash> search -s ffffa0f442cd7a08 -e ffffa0f442cd7db0 ffffffff9b39c3ed

ffffa0f442cd7aa0: ffffffff9b39c3ed

ffffa0f442cd7d28: ffffffff9b39c3ed

6 Search physical memory with a mask

crash> search -p babe0000 -m ffff

1c4cc6530: babec685

21f7d35b8: babe4550

crash>
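The -m option tells search to ignore the bits that are set in the mask, which is why babec685 and babe4550 both match babe0000 with mask ffff. The idea can be sketched over an in-memory buffer as follows; search_masked, the sample buffer, and the base address 0x1000 are all made up for illustration and do not reflect how crash itself is implemented:

```python
import struct


def search_masked(buf, base, value, mask):
    """Scan buf as little-endian 64-bit words; ignore bits set in mask."""
    hits = []
    for off in range(0, len(buf) - 7, 8):
        (word,) = struct.unpack_from("<Q", buf, off)
        if (word & ~mask) == (value & ~mask):
            hits.append((base + off, word))
    return hits


# Hypothetical memory contents: four 64-bit words starting at address 0x1000.
buf = struct.pack("<4Q", 0x1234, 0xbabec685, 0xbabf0000, 0xbabe4550)
hits = search_masked(buf, 0x1000, 0xbabe0000, 0xffff)
for addr, word in hits:
    print(f"{addr:x}: {word:x}")
```

Only the low 16 bits are ignored here, so babf0000 does not match even though it differs from the search value by a single bit.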

Viewing task status

1 View the status of all tasks

crash> ps

PID PPID CPU TASK ST %MEM VSZ RSS COMM

0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]

0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]

0 0 2 ffff8b7e64162e00 RU 0.0 0 0 [swapper/2]

0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]

1 0 3 ffff8b7e6413c500 IN 0.1 225916 9716 systemd

2 0 2 ffff8b7e64138000 IN 0.0 0 0 [kthreadd]

2 Show a task's parent chain

crash> ps -p 2613

PID: 0 TASK: ffffffff9ae13740 CPU: 0 COMMAND: "swapper/0"

PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"

PID: 1081 TASK: ffff8b7e5dc81700 CPU: 1 COMMAND: "gdm3"

PID: 2114 TASK: ffff8b7e584f2e00 CPU: 0 COMMAND: "gdm-session-wor"

PID: 2136 TASK: ffff8b7e63cc4500 CPU: 1 COMMAND: "gdm-x-session"

PID: 2149 TASK: ffff8b7e5dfaae00 CPU: 0 COMMAND: "gnome-session-b"

PID: 2254 TASK: ffff8b7e5e04dc00 CPU: 0 COMMAND: "gnome-shell"

PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"

PID: 2592 TASK: ffff8b7dec05ae00 CPU: 1 COMMAND: "bash"

PID: 2611 TASK: ffff8b7df3f8ae00 CPU: 0 COMMAND: "sudo"

PID: 2612 TASK: ffff8b7dec3b9700 CPU: 3 COMMAND: "su"

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

3 Show a task's children

crash> ps -c 2582

PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"

PID: 2592 TASK: ffff8b7dec05ae00 CPU: 1 COMMAND: "bash"

PID: 2600 TASK: ffff8b7df3f88000 CPU: 0 COMMAND: "bash"

PID: 2787 TASK: ffff8b7df9f80000 CPU: 3 COMMAND: "bash"

4 Show a task's run time

crash> ps -t 2613

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

RUN TIME: 00:00:00

START TIME: 1296209749767

UTIME: 36000000

STIME: 16000000

5 Show the active task on each CPU

crash> ps -A

PID PPID CPU TASK ST %MEM VSZ RSS COMM

0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]

0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]

0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]

2613 2612 2 ffff8b7df3cdae00 RU 0.0 28708 4352 bash

6 Show kernel threads

crash> ps -k

PID PPID CPU TASK ST %MEM VSZ RSS COMM

0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]

0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]

0 0 2 ffff8b7e64162e00 RU 0.0 0 0 [swapper/2]

0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]

2 0 2 ffff8b7e64138000 IN 0.0 0 0 [kthreadd]

7 Show user-space tasks

crash> ps -u

PID PPID CPU TASK ST %MEM VSZ RSS COMM

1 0 3 ffff8b7e6413c500 IN 0.1 225916 9716 systemd

298 1 3 ffff8b7e5879c500 IN 0.4 126508 38028 systemd-journal

318 1 0 ffff8b7e584f5c00 IN 0.1 48004 6360 systemd-udevd

822 1 2 ffff8b7e59c71700 IN 0.1 70756 6176 systemd-resolve

824 1 2 ffff8b7e586e5c00 IN 0.1 146108 5540 systemd-timesyn

834 1 3 ffff8b7e63881700 IN 0.1 146108 5540 sd-resolve

863 1 3 ffff8b7e5d790000 IN 0.1 51612 6112 dbus-daemon

864 1 1 ffff8b7e5d794500 IN 0.1 427264 9404 ModemManager

8 Show last-run timestamps (ps -l prints the raw value, ps -m prints the time elapsed since the task last ran)

crash> ps -l

[1610759003323] [IN] PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"

[1610758998404] [ID] PID: 211 TASK: ffff8b7e585aae00 CPU: 3 COMMAND: "kworker/u32:5"

[1610758938747] [RU] PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

[1610758009873] [IN] PID: 2587 TASK: ffff8b7e06cd5c00 CPU: 2 COMMAND: "gdbus"

crash> ps -m

[0 00:00:00.000] [IN] PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"

[0 00:00:00.000] [ID] PID: 211 TASK: ffff8b7e585aae00 CPU: 3 COMMAND: "kworker/u32:5"

[0 00:00:00.000] [RU] PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

[0 00:00:00.000] [IN] PID: 2587 TASK: ffff8b7e06cd5c00 CPU: 2 COMMAND: "gdbus"

[0 00:00:00.001] [IN] PID: 2138 TASK: ffff8b7e26801700 CPU: 0 COMMAND: "Xorg"
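The raw ps -l values can be compared directly to see how recently each task ran relative to the others. The sketch below assumes the timestamps are in nanoseconds, which is an assumption on my part, though it agrees with the [ 1610.759133] timestamps in the kernel log:

```python
# Last-run timestamps copied from the ps -l output (assumed to be nanoseconds).
last_run = {
    "terminator": 1610759003323,
    "kworker/u32:5": 1610758998404,
    "bash": 1610758938747,
    "gdbus": 1610758009873,
}

newest = max(last_run.values())
ages_ns = {name: newest - ts for name, ts in last_run.items()}

for name, age in sorted(ages_ns.items(), key=lambda item: item[1]):
    print(f"{name:<14} last ran {age / 1000:10.3f} us before the newest entry")

print(f"newest timestamp = {newest / 1e9:.6f} s")
```

All four tasks ran within one millisecond of each other, which is why ps -m shows 00:00:00.000 for each of them.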

9 Show a task's resource limits

crash> ps -r 2613

PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"

RLIMIT CURRENT MAXIMUM

CPU (unlimited) (unlimited)

FSIZE (unlimited) (unlimited)

DATA (unlimited) (unlimited)

STACK 8388608 (unlimited)

CORE 0 (unlimited)

RSS (unlimited) (unlimited)

NPROC 30393 30393

NOFILE 1024 1048576

MEMLOCK 16777216 16777216

AS (unlimited) (unlimited)

LOCKS (unlimited) (unlimited)

SIGPENDING 30393 30393

MSGQUEUE 819200 819200

NICE 0 0

RTPRIO 0 0

RTTIME (unlimited) (unlimited)

Context switching

Some commands, such as bt, depend on the current task context. Use the set command to switch context.

1 Switch to a specific task

crash> set ffff8b7e6413c500

PID: 1

COMMAND: "systemd"

TASK: ffff8b7e6413c500 [THREAD_INFO: ffff8b7e6413c500]

CPU: 3

STATE: TASK_INTERRUPTIBLE

crash> bt

PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"

#0 [ffffa0f440c6fce0] __schedule at ffffffff9a414ba7

#1 [ffffa0f440c6fd80] schedule at ffffffff9a41519c

#2 [ffffa0f440c6fd90] schedule_hrtimeout_range_clock at ffffffff9a419691

#3 [ffffa0f440c6fe20] schedule_hrtimeout_range at ffffffff9a4196b3

#4 [ffffa0f440c6fe30] ep_poll at ffffffff99cf8941

#5 [ffffa0f440c6fee0] do_epoll_wait at ffffffff99cf8ae0

#6 [ffffa0f440c6ff20] __x64_sys_epoll_wait at ffffffff99cf8b0e

#7 [ffffa0f440c6ff30] do_syscall_64 at ffffffff99a0428a

#8 [ffffa0f440c6ff50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088

RIP: 00007ffa791c6bb7 RSP: 00007ffc1c00b9d0 RFLAGS: 00000293

RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffa791c6bb7

RDX: 00000000000000eb RSI: 00007ffc1c00ba10 RDI: 0000000000000004

RBP: 00007ffc1c00ba10 R8: 0000000000000000 R9: 7465677261742e79

R10: 00000000ffffffff R11: 0000000000000293 R12: 00000000000000eb

R13: 00000000ffffffff R14: 00007ffc1c00ba10 R15: 0000000000000001

ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b

2 Switch back to the panic task

crash> set -p

PID: 2613

COMMAND: "bash"

TASK: ffff8b7df3cdae00 [THREAD_INFO: ffff8b7df3cdae00]

CPU: 2

STATE: TASK_RUNNING (SYSRQ)

Loading module symbols

1 List the loaded modules

crash> mod

MODULE NAME SIZE OBJECT FILE

ffffffffc019d0c0 vfio_iommu_type1 24576 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01b0b40 rc_core 45056 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01fcbc0 usbhid 49152 (not loaded) [CONFIG_KALLSYMS]

ffffffffc0207580 libahci 32768 (not loaded) [CONFIG_KALLSYMS]

2 Load symbols for all modules

crash> mod -S

MODULE NAME SIZE OBJECT FILE

ffffffffc019d0c0 vfio_iommu_type1 24576 /lib/modules/4.19.53/kernel/drivers/vfio/vfio_iommu_type1.ko

ffffffffc01a4440 uas 24576 /lib/modules/4.19.53/kernel/drivers/usb/storage/uas.ko

ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko

ffffffffc01e76c0 e1000e 249856 /lib/modules/4.19.53/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko

ffffffffc01fcbc0 usbhid 49152 /lib/modules/4.19.53/kernel/drivers/hid/usbhid/usbhid.ko

3 Load symbols for a specific module

crash> mod -s rc_core /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko

MODULE NAME SIZE OBJECT FILE

ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko

crash> mod

MODULE NAME SIZE OBJECT FILE

ffffffffc019d0c0 vfio_iommu_type1 24576 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko

ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]

ffffffffc01fcbc0 usbhid 49152 (not loaded) [CONFIG_KALLSYMS]

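Other commands

There are many more commands aimed at specific kernel subsystems, such as kmem, vm, tree, list, and pte. See the command list in the appendix; I will study them further as I use them.

Extending crash with commands

crash also lets users add their own debugging commands. You can add a new command directly to the crash source, but the more common approach is to build a shared library and load it dynamically with extend. The help documentation includes a simple example: create test.c in the crash source directory, copy the sample code into it, and compile it. The help documentation gives the compile command in this general form: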

gcc -nostartfiles -shared -rdynamic -o echo.so echo.c -fPIC -D<machine-type> $(TARGET_CFLAGS)

crash> sys

KERNEL: ../../kernel-src/linux-4.19.53/vmlinux

DUMPFILE: 201907070732/dump.201907070732 [PARTIAL DUMP]

CPUS: 4

DATE: Sun Jul 7 07:31:34 2019

UPTIME: 00:10:27

LOAD AVERAGE: 0.14, 0.16, 0.12

TASKS: 584

NODENAME: glbian-OptiPlex-990

RELEASE: 4.19.53

VERSION: #1 SMP Sun Jun 23 11:01:25 CST 2019

MACHINE: x86_64 (3292 Mhz)

MEMORY: 7.9 GB

PANIC: "sysrq: SysRq : Trigger a crash"

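The sys command shows the machine architecture. For my machine the machine-type is X86_64, so the compile command is:

gcc -shared -rdynamic -o test.so test.c -fPIC -DX86_64 -D_FILE_OFFSET_BITS=64

This produces test.so, which can be loaded directly with extend. After it loads, the help menu shows a new echo command, and we can develop our own commands based on the echo example.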

crash> extend ../../src/crash-7.2.6/test.so

../../src/crash-7.2.6/test.so: shared object loaded

crash> extend

SHARED OBJECT COMMANDS

../../src/crash-7.2.6/test.so echo

crash> help

* extend mach runq union

alias files mod search vm

ascii foreach mount set vtop

bpf fuser net sig waitq

bt gdb p struct whatis

btop help ps swap wr

dev ipcs pte sym q

dis irq ptob sys

echo kmem ptov task

eval list rd timer

exit log repeat tree

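Conclusion

System crashes are usually very tricky problems. Solving them takes solid knowledge of the kernel and the relevant subsystems, combined with analysis using crash. Above all, experience has to be built up through practice.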

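Appendix

crash command list

Command   Function
*         Pointer-to shortcut
alias     Command aliases
ascii     Translate a hex string to ASCII and show the ASCII chart
bpf       eBPF (extended Berkeley Packet Filter) data
bt        Stack backtrace
btop      Convert a byte address to a page number
dev       Device data
dis       Disassemble
eval      Calculator
exit      Exit
extend    Extend the command set
files     Open files
foreach   Run a command over multiple tasks
fuser     File users
gdb       Run a gdb command
help      Help
ipcs      System V IPC facilities
irq       IRQ data
kmem      Kernel memory
list      Linked list
log       System message buffer
mach      Machine-specific data
mod       Module information and symbol loading
mount     Mounted filesystem data
net       Network command
p         Print the value of an expression
ps        Process status information
pte       Translate a page table entry
ptob      Convert a page number to a byte address
ptov      Convert a physical address to a virtual address
rd        Read memory
repeat    Repeat a command
runq      Tasks on the run queues
search    Search memory
set       Set the task context or crash internal variables
sig       Task signal handling
struct    Structure contents
swap      Swap information
sym       Translate between symbols and virtual addresses
sys       System information
task      task_struct and thread_info contents
timer     Timer queue data
tree      Radix tree or red-black tree
union     Union contents
vm        Virtual memory
vtop      Convert a virtual address to a physical address
waitq     Tasks on a wait queue
whatis    Look up symbol and type information
wr        Write memory
q         Exit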
