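Linux crash, system crashes - an introduction to the crash tool
To do a good job, one must first sharpen one's tools. This article introduces what the most commonly used commands of the crash tool on Linux do and how to use them.
Background
crash was developed by engineers at Red Hat, mainly for offline analysis of Linux kernel dump files. It integrates gdb and is very powerful: it can show stack traces, the dmesg log, kernel data structures, disassembly and more. crash supports the dump formats produced by several tools, such as kdump, LKCD, netdump and diskdump, and it can also analyze kernel dumps generated on Xen and KVM virtual machines. crash can also debug a live system: simply run crash, and (on Ubuntu, for example) it reads the running kernel's memory image from /proc/kcore.
Figure: debugging a live system
crash is tightly coupled to the Linux kernel and is updated continuously as the kernel changes. It is backward compatible: a newer crash can analyze dump files from older kernels. If your kernel is too new for your crash to parse, try installing the latest version of crash.
Common commands
The sections below cover the commonly used commands, based mainly on crash_whitepaper and the help documentation built into crash. crash_whitepaper describes the motivation behind the tool, how to build it, how the commands are classified and used, and how to add your own commands; it is a very good reference. I used crash-7.2.6 with gdb-7.6. While working you can run "help command" to read the detailed help for a command; the full command list is in the appendix.
Figure: help documentation
When crash loads a kernel dump file it prints basic system information, such as the task that hit the problem (bash, PID 2613), the amount of system memory (7.9 GB) and the architecture (x86_64). Here you can see that this dump comes from a panic triggered through sysrq.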
KERNEL: ../kernel-src/linux-4.19.53/vmlinux
DUMPFILE: crash/201907070732/dump.201907070732 [PARTIAL DUMP]
CPUS: 4
DATE: Sun Jul 7 07:31:34 2019
UPTIME: 00:10:27
LOAD AVERAGE: 0.14, 0.16, 0.12
TASKS: 584
NODENAME: glbian-OptiPlex-990
RELEASE: 4.19.53
VERSION: #1 SMP Sun Jun 23 11:01:25 CST 2019
MACHINE: x86_64 (3292 Mhz)
MEMORY: 7.9 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 2613
COMMAND: "bash"
TASK: ffff8b7df3cdae00 [THREAD_INFO: ffff8b7df3cdae00]
CPU: 2
STATE: TASK_RUNNING (SYSRQ)
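Viewing the stack
You usually start by looking at the stack (bt) to see where the system died, which then sets the direction of the investigation. In this dump the exception occurred inside the sysrq handler.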
crash> bt
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
#0 [ffffa0f442cd7a08] machine_kexec at ffffffff99a69313
#1 [ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9
#2 [ffffa0f442cd7b30] crash_kexec at ffffffff99b3f441
#3 [ffffa0f442cd7b50] oops_end at ffffffff99a32bed
#4 [ffffa0f442cd7b78] no_context at ffffffff99a7997c
#5 [ffffa0f442cd7bd8] __bad_area_nosemaphore at ffffffff99a79d15
#6 [ffffa0f442cd7c20] bad_area at ffffffff99a79f86
#7 [ffffa0f442cd7c48] __do_page_fault at ffffffff99a7a486
#8 [ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d
#9 [ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff9a034066 RSP: ffffa0f442cd7da8 RFLAGS: 00010286
RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063
RBP: ffffa0f442cd7da8 R8: 00000000000002f2 R9: 0000000000000007
R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004
R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8
#11 [ffffa0f442cd7de0] write_sysrq_trigger at ffffffff9a034cbf
... ...
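When a backtrace is long, it can help to pull the frames into a script. The following is only a sketch, assuming the bt output has been saved as text; the names FRAME_RE and parse_bt are made up for this illustration and are not part of crash.

```python
import re

# Matches frame lines such as "#3 [ffffa0f442cd7b50] oops_end at ffffffff99a32bed".
# An optional "+offset" after the symbol (as printed by "bt -s") is tolerated.
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+"
    r"(?P<sym>[\w.]+)(?:\+\d+)?\s+at\s+(?P<addr>[0-9a-f]+)"
)

def parse_bt(text):
    """Return a list of (frame number, stack address, symbol, text address)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (int(m["num"]), int(m["sp"], 16), m["sym"], int(m["addr"], 16))
            )
    return frames

sample = """\
#8 [ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d
#9 [ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae
[exception RIP: sysrq_handle_crash+22]
#10 [ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8
"""

frames = parse_bt(sample)
for num, sp, sym, addr in frames:
    print(f"#{num:<2} {sym:<20} sp={sp:#x} pc={addr:#x}")
```

The exception line is skipped on purpose because it is not a numbered frame; register lines are ignored for the same reason.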
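You can also add options to show function offsets, the source file each function lives in and the raw contents of every frame, so that you can compare against the source and assembly code and work out function arguments and local variables.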
crash> bt -slf
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
#0 [ffffa0f442cd7a08] machine_kexec+451 at ffffffff99a69313
/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/kernel/machine_kexec_64.c: 346
ffffa0f442cd7a10: 0000a0f442cd7a50 ffff8b7c40000000
ffffa0f442cd7a20: 0000000024001000 ffff8b7c64001000
ffffa0f442cd7a30: 0000000024000000 a05cedc0dfb99200
ffffa0f442cd7a40: a05cedc0dfb99200 ffffa0f442cd7cf8
ffffa0f442cd7a50: 0000000000000009 ffffa0f442cd7cf8
ffffa0f442cd7a60: ffffa0f442cd7b28 ffffffff99b3e6b9
... ...
#8 [ffffa0f442cd7cc0] do_page_fault+45 at ffffffff99a7a60d
/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/mm/fault.c: 1470
ffffa0f442cd7cc8: ffff8b7e6500d140 0000000000000000
ffffa0f442cd7cd8: 0000000000000000 0000000000000000
ffffa0f442cd7ce8: ffffa0f442cd7cf9 ffffffff9a6010ae
#9 [ffffa0f442cd7cf0] page_fault+30 at ffffffff9a6010ae
/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/entry/entry_64.S: 1181
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff9a034066 RSP: ffffa0f442cd7da8 RFLAGS: 00010286
RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063
RBP: ffffa0f442cd7da8 R8: 00000000000002f2 R9: 0000000000000007
R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004
R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
/home/glbian/data/kernel-src/linux-4.19.53/drivers/tty/sysrq.c: 147
ffffa0f442cd7cf8: ffff8b7de5af9100 ffffffff9afa7300
ffffa0f442cd7d08: 0000000000000000 0000000000000004
ffffa0f442cd7d18: ffffa0f442cd7da8 0000000000000063
ffffa0f442cd7d28: ffffffff9b39c3ed 0000000000000000
ffffa0f442cd7d38: 0000000000000007 00000000000002f2
ffffa0f442cd7d48: ffffffff9a034050 0000000000000006
ffffa0f442cd7d58: 0000000000000000 0000000000000096
ffffa0f442cd7d68: 0000000000000063 ffffffffffffffff
ffffa0f442cd7d78: ffffffff9a034066 0000000000000010
ffffa0f442cd7d88: 0000000000010286 ffffa0f442cd7da8
ffffa0f442cd7d98: 0000000000000018 0000000000000000
ffffa0f442cd7da8: ffffa0f442cd7dd8 ffffffff9a0347e8
#10 [ffffa0f442cd7db0] __handle_sysrq+136 at ffffffff9a0347e8
/home/glbian/data/kernel-src/linux-4.19.53/drivers/tty/sysrq.c: 583
ffffa0f442cd7db8: 0000000000000002 fffffffffffffffb
ffffa0f442cd7dc8: ffffa0f442cd7ee8 0000563d45717780
ffffa0f442cd7dd8: ffffa0f442cd7df0 ffffffff9a034cbf
... ...
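In the frame #0 dump above, the last word shown (at ffffa0f442cd7a68) is ffffffff99b3e6b9, which is exactly the address bt prints for frame #1, and ffffa0f442cd7a68 is frame #1's bracketed stack address. A small sketch that checks this from the raw text, assuming two 64-bit words per line as in the listing; parse_dump is a name invented here:

```python
def parse_dump(text):
    """Map each stack address to the 64-bit word stored there."""
    memory = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue
        base = int(head.strip(), 16)
        # Each line holds consecutive 8-byte words starting at `base`.
        for i, word in enumerate(rest.split()):
            memory[base + 8 * i] = int(word, 16)
    return memory

dump = """\
ffffa0f442cd7a50: 0000000000000009 ffffa0f442cd7cf8
ffffa0f442cd7a60: ffffa0f442cd7b28 ffffffff99b3e6b9
"""

memory = parse_dump(dump)
frame1_slot = 0xffffa0f442cd7a68  # bracketed address of frame #1 in the bt output
print(hex(memory[frame1_slot]))   # the return address stored in that slot
```

Reading a frame this way is how you locate saved values on the stack before matching them against the disassembly.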
You can use the dis command to disassemble and inspect the code logic at a given address.
crash> dis -r ffffffff9a6010ae
0xffffffff9a601090 <page_fault>: data32 xchg %ax,%ax
0xffffffff9a601093 <page_fault+3>: callq 0xffffffff9a601230
0xffffffff9a601098 <page_fault+8>: mov %rsp,%rdi
0xffffffff9a60109b <page_fault+11>: mov 0x78(%rsp),%rsi
0xffffffff9a6010a0 <page_fault+16>: movq $0xffffffffffffffff,0x78(%rsp)
0xffffffff9a6010a9 <page_fault+25>: callq 0xffffffff99a7a5e0
0xffffffff9a6010ae <page_fault+30>: jmpq 0xffffffff9a601330
crash> dis -f ffffffff9a6010ae
0xffffffff9a6010ae <page_fault+30>: jmpq 0xffffffff9a601330
0xffffffff9a6010b3 <page_fault+35>: nopl (%rax)
0xffffffff9a6010b6 <page_fault+38>: nopw %cs:0x0(%rax,%rax,1)
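The symbol+offset notation from bt -s ties the backtrace to the disassembly: subtracting the offset from the address gives the start of the function. A minimal sketch of that arithmetic using the values from this dump (function_start is a helper name made up for the example):

```python
def function_start(address, offset):
    """Start address of a function, given 'symbol+offset at address'."""
    return address - offset

# bt -slf printed: page_fault+30 at ffffffff9a6010ae
page_fault = function_start(0xffffffff9a6010ae, 30)
# bt -slf printed: do_page_fault+45 at ffffffff99a7a60d
do_page_fault = function_start(0xffffffff99a7a60d, 45)

print(hex(page_fault))
print(hex(do_page_fault))

# Offsets of individual instructions relative to the start of page_fault.
for insn in (0xffffffff9a601093, 0xffffffff9a6010a9, 0xffffffff9a6010ae):
    print(f"{insn:#x} = page_fault+{insn - page_fault}")
```

The computed start of do_page_fault matches the target of the second callq in the listing, which confirms that the instruction just before the return address is the call into do_page_fault.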
Sometimes the stack has been corrupted. In that case -t/-T can dump every text symbol found on the stack, which often reveals useful clues.
crash> bt -t
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
START: machine_kexec at ffffffff99a69313
[ffffa0f442cd7a08] machine_kexec at ffffffff99a69313
[ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9
[ffffa0f442cd7ac0] sysrq_handle_crash at ffffffff9a034050
[ffffa0f442cd7af0] sysrq_handle_crash at ffffffff9a034066
[ffffa0f442cd7b30] crash_kexec at ffffffff99b3f441
[ffffa0f442cd7b38] __die at ffffffff99a33375
[ffffa0f442cd7b50] oops_end at ffffffff99a32bed
[ffffa0f442cd7b78] no_context at ffffffff99a7997c
[ffffa0f442cd7bd8] __bad_area_nosemaphore at ffffffff99a79d15
[ffffa0f442cd7c20] bad_area at ffffffff99a79f86
[ffffa0f442cd7c48] __do_page_fault at ffffffff99a7a486
[ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d
[ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae
[ffffa0f442cd7d48] sysrq_handle_crash at ffffffff9a034050
[ffffa0f442cd7d78] sysrq_handle_crash at ffffffff9a034066
[ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8
[ffffa0f442cd7de0] write_sysrq_trigger at ffffffff9a034cbf
[ffffa0f442cd7df8] proc_reg_write at ffffffff99d2a0ee
[ffffa0f442cd7e18] __vfs_write at ffffffff99ca8a0a
[ffffa0f442cd7e40] apparmor_file_permission at ffffffff99e53a0a
[ffffa0f442cd7e50] security_file_permission at ffffffff99e06cf1
[ffffa0f442cd7e78] _cond_resched at ffffffff9a4153f9
[ffffa0f442cd7ea0] vfs_write at ffffffff99ca8d11
[ffffa0f442cd7ed8] ksys_write at ffffffff99ca8fcc
[ffffa0f442cd7f20] __x64_sys_write at ffffffff99ca906a
[ffffa0f442cd7f30] do_syscall_64 at ffffffff99a0428a
[ffffa0f442cd7f50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088
RIP: 00007ff47e1ef154 RSP: 00007ffee9226298 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff47e1ef154
RDX: 0000000000000002 RSI: 0000563d45717780 RDI: 0000000000000001
RBP: 0000563d45717780 R8: 000000000000000a R9: 0000000000000001
R10: 000000000000000a R11: 0000000000000246 R12: 00007ff47e4cb760
R13: 0000000000000002 R14: 00007ff47e4c72a0 R15: 00007ff47e4c6760
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
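By default bt shows the context of the task that hit the problem. bt -a shows the stack of the active task on every CPU, and bt -c shows it for a specified CPU.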
crash> bt -c 1
PID: 0 TASK: ffff8b7e64165c00 CPU: 1 COMMAND: "swapper/1"
#0 [fffffe0000034e38] crash_nmi_callback at ffffffff99a5d3d7
#1 [fffffe0000034e48] nmi_handle at ffffffff99a33691
... ...
#12 [ffffa0f440cd7f50] secondary_startup_64 at ffffffff99a000d4
crash> bt -a
PID: 0 TASK: ffffffff9ae13740 CPU: 0 COMMAND: "swapper/0"
... ...
PID: 0 TASK: ffff8b7e64165c00 CPU: 1 COMMAND: "swapper/1"
... ...
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
... ...
PID: 0 TASK: ffff8b7e642c4500 CPU: 3 COMMAND: "swapper/3"
... ...
You can also use the set command to change the task context and then look at the stack of a task on another CPU.
crash> set 1
PID: 1
COMMAND: "systemd"
TASK: ffff8b7e6413c500 [THREAD_INFO: ffff8b7e6413c500]
CPU: 3
STATE: TASK_INTERRUPTIBLE
crash> bt
PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"
#0 [ffffa0f440c6fce0] __schedule at ffffffff9a414ba7
#1 [ffffa0f440c6fd80] schedule at ffffffff9a41519c
#2 [ffffa0f440c6fd90] schedule_hrtimeout_range_clock at ffffffff9a419691
#3 [ffffa0f440c6fe20] schedule_hrtimeout_range at ffffffff9a4196b3
#4 [ffffa0f440c6fe30] ep_poll at ffffffff99cf8941
#5 [ffffa0f440c6fee0] do_epoll_wait at ffffffff99cf8ae0
#6 [ffffa0f440c6ff20] __x64_sys_epoll_wait at ffffffff99cf8b0e
#7 [ffffa0f440c6ff30] do_syscall_64 at ffffffff99a0428a
#8 [ffffa0f440c6ff50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088
RIP: 00007ffa791c6bb7 RSP: 00007ffc1c00b9d0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffa791c6bb7
RDX: 00000000000000eb RSI: 00007ffc1c00ba10 RDI: 0000000000000004
RBP: 00007ffc1c00ba10 R8: 0000000000000000 R9: 7465677261742e79
R10: 00000000ffffffff R11: 0000000000000293 R12: 00000000000000eb
R13: 00000000ffffffff R14: 00007ffc1c00ba10 R15: 0000000000000001
ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b
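System log
The log command shows the kernel log. "log -a" dumps the audit records that are still in the kernel audit buffers and have not yet been copied out to the user-space audit daemon.
The output can also be redirected to a file (log > logfile).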
crash> log
... ...
[ 1610.759133] sysrq: SysRq : Trigger a crash
[ 1610.759147] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1610.759150] PGD 0 P4D 0
[ 1610.759154] Oops: 0002 [#1] SMP PTI
[ 1610.759159] CPU: 2 PID: 2613 Comm: bash Kdump: loaded Not tainted 4.19.53 #1
[ 1610.759161] Hardware name: Dell Inc. OptiPlex 990/0RVG2C, BIOS A13 04/02/2012
[ 1610.759167] RIP: 0010:sysrq_handle_crash+0x16/0x20
[ 1610.759170] Code: e8 9f fb ff ff e9 c0 fe ff ff 90 90 90 90 90 90 90 90 90 90 66 66 66 66 90 55 48 89 e5 c7 05 85 10 36 01 01 00 00 00 0f ae f8 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 c7 05 40 fa e2 00
[ 1610.759173] RSP: 0018:ffffa0f442cd7da8 EFLAGS: 00010286
[ 1610.759176] RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006
[ 1610.759178] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063
[ 1610.759180] RBP: ffffa0f442cd7da8 R08: 00000000000002f2 R09: 0000000000000007
[ 1610.759182] R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004
[ 1610.759184] R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100
[ 1610.759186] FS: 00007ff47eb0a740(0000) GS:ffff8b7e65880000(0000) knlGS:0000000000000000
[ 1610.759189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1610.759191] CR2: 0000000000000000 CR3: 0000000205db0003 CR4: 00000000000606e0
[ 1610.759193] Call Trace:
[ 1610.759199] __handle_sysrq+0x88/0x140
[ 1610.759203] write_sysrq_trigger+0x2f/0x40
[ 1610.759208] proc_reg_write+0x3e/0x60
[ 1610.759212] __vfs_write+0x3a/0x190
[ 1610.759216] ? apparmor_file_permission+0x1a/0x20
[ 1610.759220] ? security_file_permission+0x31/0xc0
[ 1610.759224] ? _cond_resched+0x19/0x40
[ 1610.759226] vfs_write+0xb1/0x1a0
[ 1610.759229] ksys_write+0x5c/0xe0
[ 1610.759232] __x64_sys_write+0x1a/0x20
[ 1610.759237] do_syscall_64+0x5a/0x120
[ 1610.759241] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1610.759245] RIP: 0033:0x7ff47e1ef154
[ 1610.759247] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[ 1610.759249] RSP: 002b:00007ffee9226298 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1610.759252] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff47e1ef154
[ 1610.759254] RDX: 0000000000000002 RSI: 0000563d45717780 RDI: 0000000000000001
[ 1610.759256] RBP: 0000563d45717780 R08: 000000000000000a R09: 0000000000000001
[ 1610.759258] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff47e4cb760
[ 1610.759260] R13: 0000000000000002 R14: 00007ff47e4c72a0 R15: 00007ff47e4c6760
[ 1610.759263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcbc aesni_intel snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep aes_x86_64 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi input_leds crypto_simd cryptd snd_seq snd_seq_device snd_timer dcdbas snd glue_helper intel_cstate intel_rapl_perf lpc_ich serio_raw soundcore sch_fq_codel mei_me mei mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit cec rc_core drm_kms_helper psmouse syscopyarea sysfillrect video sysimgblt fb_sys_fops ahci drm libahci e1000e
[ 1610.759320] CR2: 0000000000000000
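The "Oops: 0002" line in the log carries the x86 page-fault error code, whose low bits say what kind of access faulted. The sketch below decodes those bits according to the x86 architecture definition; decode_page_fault and the dictionary keys are names chosen for this example only.

```python
def decode_page_fault(error_code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if error_code & 0x1 else "page not present",
        # bit 1: 0 = read, 1 = write
        "access": "write" if error_code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if error_code & 0x4 else "kernel",
        # bit 3: reserved bit set in a page-table entry
        "reserved_bit": bool(error_code & 0x8),
        # bit 4: fault was an instruction fetch
        "instruction_fetch": bool(error_code & 0x10),
    }

# The value is printed in hexadecimal in the Oops line.
info = decode_page_fault(int("0002", 16))
print(info)
```

For this dump the result is a kernel-mode write to a page that is not present, which fits the NULL pointer dereference message and CR2 = 0 in the log.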
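Viewing data structures
The struct and union commands display structures and unions respectively and are used in the same way; below are some struct examples (crash also accepts the structure name alone as a command, as these examples do).
Given an address, the contents at that address are parsed and printed as a task_struct; without an address, the structure definition and its size are shown.
1 Print a task_struct structure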
crash> task_struct ffff8b7df3cdae00 -x
struct task_struct {
thread_info = {
flags = 0x80000000,
status = 0x0
},
state = 0x0,
stack = 0xffffa0f442cd4000,
usage = {
counter = 0x2
},
... ...
2 Print the task_struct definition and size
struct task_struct {
[0x0] struct thread_info thread_info;
[0x10] volatile long state;
[0x18] void *stack;
... ...
[0x1288] void *security;
[0x12c0] struct thread_struct thread;
}
SIZE: 0x23c0
3 View a member
crash> task_struct.stack_refcount ffff8b7df3cdae00 -xo
struct task_struct {
[ffff8b7df3cdc080] atomic_t stack_refcount;
}
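When -o is combined with an address, the bracketed value is the member's virtual address rather than its offset. The offset inside the structure can be recovered by subtracting the task_struct address, as in this small sketch (member_offset is a name invented for the illustration):

```python
def member_offset(struct_address, member_address):
    """Offset of a member, given the struct base and the member's address."""
    return member_address - struct_address

task = 0xffff8b7df3cdae00             # task_struct address passed to the command
stack_refcount = 0xffff8b7df3cdc080   # address printed for stack_refcount

offset = member_offset(task, stack_refcount)
print(hex(offset))

# Sanity check against the SIZE reported for task_struct (0x23c0).
assert 0 <= offset < 0x23c0
```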
4 View a pointer member
crash> task_struct.mm ffff8b7df3cdae00
mm = 0xffff8b7e5af06600
crash> task_struct.mm ffff8b7df3cdae00 -p
struct mm_struct *mm = 0xffff8b7e5af06600
-> {
{
mmap = 0xffff8b7dec0520c8,
mm_rb = {
rb_node = 0xffff8b7dec003b78
},
vmacache_seqnum = 17,
get_unmapped_area = 0xffffffff99a35760,
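You can also view array contents, per-cpu variables and more; see the help documentation for details.
Viewing and searching memory
Besides printing data structures, you sometimes need to view and search raw memory, for example to check whether a specified data pattern is present.
1 View the kernel version banner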
crash> rd -a linux_banner
ffffffff9aa00100: Linux version 4.19.53 (glbian@glbian-OptiPlex-990) (gcc vers
ffffffff9aa0013c: ion 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #1 SMP Sun Jun 23
ffffffff9aa00178: 11:01:25 CST 2019
2 View memory contents
crash> rd ffffa0f442cd7a08 32
ffffa0f442cd7a08: ffffffff99a69313 0000a0f442cd7a50 ........Pz.B....
ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000 ...@|......$....
ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000 ...d|......$....
ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200 ......\.......\.
ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009 .|.B............
ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28 .|.B....({.B....
ffffa0f442cd7a68: ffffffff99b3e6b9 ffff8b7de5af9100 ............}...
ffffa0f442cd7a78: ffffffff9afa7300 0000000000000000 .s..............
ffffa0f442cd7a88: 0000000000000004 ffffa0f442cd7da8 .........}.B....
ffffa0f442cd7a98: 0000000000000063 ffffffff9b39c3ed c.........9.....
ffffa0f442cd7aa8: 0000000000000000 0000000000000007 ................
ffffa0f442cd7ab8: 00000000000002f2 ffffffff9a034050 ........P@......
ffffa0f442cd7ac8: 0000000000000006 0000000000000000 ................
ffffa0f442cd7ad8: 0000000000000096 0000000000000063 ........c.......
ffffa0f442cd7ae8: ffffffffffffffff ffffffff9a034066 ........f@......
ffffa0f442cd7af8: 0000000000000010 0000000000010286 ................
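The right-hand column of the rd output is the same memory shown as characters. Because x86_64 is little-endian, the bytes appear in the reverse order of the hex digits. The following sketch reproduces that column for a pair of 64-bit words; ascii_column is a helper written for this example, and printing non-printable bytes as "." is assumed from the listing above.

```python
import struct

def ascii_column(*words):
    """Render 64-bit words the way the character column of rd shows them."""
    # "<Q" packs each word as 8 bytes in little-endian order.
    raw = b"".join(struct.pack("<Q", w) for w in words)
    return "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in raw)

# First line of the rd output: ffffffff99a69313 0000a0f442cd7a50
print(ascii_column(0xffffffff99a69313, 0x0000a0f442cd7a50))
# A later line: ffffa0f442cd7cf8 ffffa0f442cd7b28
print(ascii_column(0xffffa0f442cd7cf8, 0xffffa0f442cd7b28))
```

This is handy when looking for strings in memory: a readable run in the character column has to be read left to right, while the hex words hold it byte-reversed.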
3 Display memory with symbols resolved
crash> rd ffffa0f442cd7a08 32 -s
ffffa0f442cd7a08: machine_kexec+451 0000a0f442cd7a50
ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000
ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000
ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200
ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009
ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28
ffffa0f442cd7a68: __crash_kexec+105 ffff8b7de5af9100
ffffa0f442cd7a78: sysrq_crash_op 0000000000000000
ffffa0f442cd7a88: 0000000000000004 ffffa0f442cd7da8
ffffa0f442cd7a98: 0000000000000063 text.45672+13
ffffa0f442cd7aa8: 0000000000000000 0000000000000007
ffffa0f442cd7ab8: 00000000000002f2 sysrq_handle_crash
ffffa0f442cd7ac8: 0000000000000006 0000000000000000
ffffa0f442cd7ad8: 0000000000000096 0000000000000063
ffffa0f442cd7ae8: ffffffffffffffff sysrq_handle_crash+22
ffffa0f442cd7af8: 0000000000000010 0000000000010286
4 View a specified memory range
crash> rd ffffa0f442cd7a08 -e ffffa0f442cd7a68
ffffa0f442cd7a08: ffffffff99a69313 0000a0f442cd7a50 ........Pz.B....
ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000 ...@|......$....
ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000 ...d|......$....
ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200 ......\.......\.
ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009 .|.B............
ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28 .|.B....({.B....
5 Search a specified memory range
crash> search -s ffffa0f442cd7a08 -e ffffa0f442cd7db0 ffffffff9b39c3ed
ffffa0f442cd7aa0: ffffffff9b39c3ed
ffffa0f442cd7d28: ffffffff9b39c3ed
6 Search physical memory with a mask
crash> search -p babe0000 -m ffff
1c4cc6530: babec685
21f7d35b8: babe4550
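With -m, the bits set in the mask are ignored during the comparison, so "babe0000 -m ffff" matches any word whose upper half is babe. The sketch below only illustrates that matching rule as described in the help text; it is not crash's actual implementation, and matches is a name made up here.

```python
def matches(word, pattern, mask):
    """True if word equals pattern once the masked bits are ignored."""
    # Forcing the masked bits to 1 on both sides removes them from the comparison.
    return (word | mask) == (pattern | mask)

pattern, mask = 0xbabe0000, 0xffff

for word in (0xbabec685, 0xbabe4550, 0xbabf0000):
    print(f"{word:#x}: {matches(word, pattern, mask)}")
```

The first two values are the hits reported above; the third differs in the unmasked upper bits and is rejected.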
crash>
Viewing task state
1 View the state of all tasks
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]
0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]
0 0 2 ffff8b7e64162e00 RU 0.0 0 0 [swapper/2]
0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]
1 0 3 ffff8b7e6413c500 IN 0.1 225916 9716 systemd
2 0 2 ffff8b7e64138000 IN 0.0 0 0 [kthreadd]
2 View the parent chain of a task
crash> ps -p 2613
PID: 0 TASK: ffffffff9ae13740 CPU: 0 COMMAND: "swapper/0"
PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"
PID: 1081 TASK: ffff8b7e5dc81700 CPU: 1 COMMAND: "gdm3"
PID: 2114 TASK: ffff8b7e584f2e00 CPU: 0 COMMAND: "gdm-session-wor"
PID: 2136 TASK: ffff8b7e63cc4500 CPU: 1 COMMAND: "gdm-x-session"
PID: 2149 TASK: ffff8b7e5dfaae00 CPU: 0 COMMAND: "gnome-session-b"
PID: 2254 TASK: ffff8b7e5e04dc00 CPU: 0 COMMAND: "gnome-shell"
PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
PID: 2592 TASK: ffff8b7dec05ae00 CPU: 1 COMMAND: "bash"
PID: 2611 TASK: ffff8b7df3f8ae00 CPU: 0 COMMAND: "sudo"
PID: 2612 TASK: ffff8b7dec3b9700 CPU: 3 COMMAND: "su"
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
3 View child tasks
crash> ps -c 2582
PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
PID: 2592 TASK: ffff8b7dec05ae00 CPU: 1 COMMAND: "bash"
PID: 2600 TASK: ffff8b7df3f88000 CPU: 0 COMMAND: "bash"
PID: 2787 TASK: ffff8b7df9f80000 CPU: 3 COMMAND: "bash"
4 View task run times
crash> ps -t 2613
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
RUN TIME: 00:00:00
START TIME: 1296209749767
UTIME: 36000000
STIME: 16000000
5 View the active task on each CPU
crash> ps -A
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]
0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]
0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]
2613 2612 2 ffff8b7df3cdae00 RU 0.0 28708 4352 bash
6 View kernel threads
crash> ps -k
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]
0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]
0 0 2 ffff8b7e64162e00 RU 0.0 0 0 [swapper/2]
0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]
2 0 2 ffff8b7e64138000 IN 0.0 0 0 [kthreadd]
7 View user-space tasks
crash> ps -u
PID PPID CPU TASK ST %MEM VSZ RSS COMM
1 0 3 ffff8b7e6413c500 IN 0.1 225916 9716 systemd
298 1 3 ffff8b7e5879c500 IN 0.4 126508 38028 systemd-journal
318 1 0 ffff8b7e584f5c00 IN 0.1 48004 6360 systemd-udevd
822 1 2 ffff8b7e59c71700 IN 0.1 70756 6176 systemd-resolve
824 1 2 ffff8b7e586e5c00 IN 0.1 146108 5540 systemd-timesyn
834 1 3 ffff8b7e63881700 IN 0.1 146108 5540 sd-resolve
863 1 3 ffff8b7e5d790000 IN 0.1 51612 6112 dbus-daemon
864 1 1 ffff8b7e5d794500 IN 0.1 427264 9404 ModemManager
8 View last-run timestamps
crash> ps -l
[1610759003323] [IN] PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
[1610758998404] [ID] PID: 211 TASK: ffff8b7e585aae00 CPU: 3 COMMAND: "kworker/u32:5"
[1610758938747] [RU] PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
[1610758009873] [IN] PID: 2587 TASK: ffff8b7e06cd5c00 CPU: 2 COMMAND: "gdbus"
crash> ps -m
[0 00:00:00.000] [IN] PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
[0 00:00:00.000] [ID] PID: 211 TASK: ffff8b7e585aae00 CPU: 3 COMMAND: "kworker/u32:5"
[0 00:00:00.000] [RU] PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
[0 00:00:00.000] [IN] PID: 2587 TASK: ffff8b7e06cd5c00 CPU: 2 COMMAND: "gdbus"
[0 00:00:00.001] [IN] PID: 2138 TASK: ffff8b7e26801700 CPU: 0 COMMAND: "Xorg"
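The raw values printed by ps -l are easier to relate to the kernel log once converted to seconds. The sketch below assumes the timestamps are nanoseconds on the same clock as the log timestamps; that assumption is consistent with the numbers in this dump but is not confirmed by the original text. The helper name split_timestamp is invented for the example.

```python
NS_PER_SEC = 1_000_000_000

def split_timestamp(timestamp_ns):
    """Split a nanosecond timestamp into (seconds, nanoseconds)."""
    return divmod(timestamp_ns, NS_PER_SEC)

bash_last_run = 1610758938747      # ps -l value for PID 2613
sec, nsec = split_timestamp(bash_last_run)
print(f"bash last ran at {sec}.{nsec:09d} s")

# The first sysrq message in the log is stamped [ 1610.759133], i.e. in microseconds:
panic_us = 1610 * 1_000_000 + 759133
gap_us = panic_us - bash_last_run // 1000
print(f"gap before the sysrq message: {gap_us} us")
```

Under that assumption, bash was last scheduled a fraction of a millisecond before the sysrq message, which is why ps -m shows 00:00:00.000 for it.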
9 View task resource limits
crash> ps -r 2613
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
RLIMIT CURRENT MAXIMUM
CPU (unlimited) (unlimited)
FSIZE (unlimited) (unlimited)
DATA (unlimited) (unlimited)
STACK 8388608 (unlimited)
CORE 0 (unlimited)
RSS (unlimited) (unlimited)
NPROC 30393 30393
NOFILE 1024 1048576
MEMLOCK 16777216 16777216
AS (unlimited) (unlimited)
LOCKS (unlimited) (unlimited)
SIGPENDING 30393 30393
MSGQUEUE 819200 819200
NICE 0 0
RTPRIO 0 0
RTTIME (unlimited) (unlimited)
Context switching
Some commands, such as bt, depend on the current task context. The set command switches that context.
1 Switch to a specified task
crash> set ffff8b7e6413c500
PID: 1
COMMAND: "systemd"
TASK: ffff8b7e6413c500 [THREAD_INFO: ffff8b7e6413c500]
CPU: 3
STATE: TASK_INTERRUPTIBLE
crash> bt
PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"
#0 [ffffa0f440c6fce0] __schedule at ffffffff9a414ba7
#1 [ffffa0f440c6fd80] schedule at ffffffff9a41519c
#2 [ffffa0f440c6fd90] schedule_hrtimeout_range_clock at ffffffff9a419691
#3 [ffffa0f440c6fe20] schedule_hrtimeout_range at ffffffff9a4196b3
#4 [ffffa0f440c6fe30] ep_poll at ffffffff99cf8941
#5 [ffffa0f440c6fee0] do_epoll_wait at ffffffff99cf8ae0
#6 [ffffa0f440c6ff20] __x64_sys_epoll_wait at ffffffff99cf8b0e
#7 [ffffa0f440c6ff30] do_syscall_64 at ffffffff99a0428a
#8 [ffffa0f440c6ff50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088
RIP: 00007ffa791c6bb7 RSP: 00007ffc1c00b9d0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffa791c6bb7
RDX: 00000000000000eb RSI: 00007ffc1c00ba10 RDI: 0000000000000004
RBP: 00007ffc1c00ba10 R8: 0000000000000000 R9: 7465677261742e79
R10: 00000000ffffffff R11: 0000000000000293 R12: 00000000000000eb
R13: 00000000ffffffff R14: 00007ffc1c00ba10 R15: 0000000000000001
ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b
2 Switch back to the panic task
crash> set -p
PID: 2613
COMMAND: "bash"
TASK: ffff8b7df3cdae00 [THREAD_INFO: ffff8b7df3cdae00]
CPU: 2
STATE: TASK_RUNNING (SYSRQ)
Loading module symbols
1 View the currently loaded modules
crash> mod
MODULE NAME SIZE OBJECT FILE
ffffffffc019d0c0 vfio_iommu_type1 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01b0b40 rc_core 45056 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01fcbc0 usbhid 49152 (not loaded) [CONFIG_KALLSYMS]
ffffffffc0207580 libahci 32768 (not loaded) [CONFIG_KALLSYMS]
2 Load symbols for all modules
crash> mod -S
MODULE NAME SIZE OBJECT FILE
ffffffffc019d0c0 vfio_iommu_type1 24576 /lib/modules/4.19.53/kernel/drivers/vfio/vfio_iommu_type1.ko
ffffffffc01a4440 uas 24576 /lib/modules/4.19.53/kernel/drivers/usb/storage/uas.ko
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
ffffffffc01e76c0 e1000e 249856 /lib/modules/4.19.53/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
ffffffffc01fcbc0 usbhid 49152 /lib/modules/4.19.53/kernel/drivers/hid/usbhid/usbhid.ko
3 Load symbols for a specified module
crash> mod -s rc_core /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
MODULE NAME SIZE OBJECT FILE
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
crash> mod
MODULE NAME SIZE OBJECT FILE
ffffffffc019d0c0 vfio_iommu_type1 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01fcbc0 usbhid 49152 (not loaded) [CONFIG_KALLSYMS]
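If only a few modules matter for the investigation, the mod output can be filtered to see which ones still lack symbols. This is a rough sketch, assuming the output has been saved as text and keeps the "(not loaded)" marker shown above; modules_without_symbols is a hypothetical helper name.

```python
def modules_without_symbols(mod_output):
    """Return the names of modules whose object file is not loaded."""
    missing = []
    for line in mod_output.splitlines():
        fields = line.split()
        # Data rows look like: <address> <name> <size> (not loaded) [...]
        if len(fields) >= 3 and "(not loaded)" in line:
            missing.append(fields[1])
    return missing

sample = """\
MODULE NAME SIZE OBJECT FILE
ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]
"""

missing = modules_without_symbols(sample)
for name in missing:
    # Each of these could then be loaded with "mod -s <name>" inside crash.
    print(f"mod -s {name}")
```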
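Other commands
There are many more commands aimed at particular kernel subsystems, such as kmem, vm, tree, list and pte; see the command list in the appendix. I will study them further as I use them.
Extending commands
crash also lets users add their own debugging commands. You can add a new command directly to the crash source code, but the more common approach is to build a shared library and load it dynamically with extend. The help documentation includes a simple example (echo): create a test.c in the crash source directory, copy the example code into it, and compile it. The help text gives the generic compile command:
gcc -nostartfiles -shared -rdynamic -o echo.so echo.c -fPIC -D<machine-type> $(TARGET_CFLAGS)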
crash> sys
KERNEL: ../../kernel-src/linux-4.19.53/vmlinux
DUMPFILE: 201907070732/dump.201907070732 [PARTIAL DUMP]
CPUS: 4
DATE: Sun Jul 7 07:31:34 2019
UPTIME: 00:10:27
LOAD AVERAGE: 0.14, 0.16, 0.12
TASKS: 584
NODENAME: glbian-OptiPlex-990
RELEASE: 4.19.53
VERSION: #1 SMP Sun Jun 23 11:01:25 CST 2019
MACHINE: x86_64 (3292 Mhz)
MEMORY: 7.9 GB
PANIC: "sysrq: SysRq : Trigger a crash"
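The sys command shows the machine architecture. My machine is x86_64, so the machine-type is X86_64 and the compile command is:
gcc -shared -rdynamic -o test.so test.c -fPIC -DX86_64 -D_FILE_OFFSET_BITS=64
This produces test.so, which can be loaded directly with extend. Once it is loaded, the help menu shows an additional echo command. We can then develop our own commands starting from the echo example.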
crash> extend ../../src/crash-7.2.6/test.so
../../src/crash-7.2.6/test.so: shared object loaded
crash> extend
SHARED OBJECT COMMANDS
../../src/crash-7.2.6/test.so echo
crash> help
* extend mach runq union
alias files mod search vm
ascii foreach mount set vtop
bpf fuser net sig waitq
bt gdb p struct whatis
btop help ps swap wr
dev ipcs pte sym q
dis irq ptob sys
echo kmem ptov task
eval list rd timer
exit log repeat tree
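Conclusion
System crashes are usually very tricky problems. Solving them requires solid knowledge of the kernel and the relevant subsystems, combined with analysis in crash. In short, experience has to be built up through practice.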
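Appendix
crash command list
Command    Function
*          Pointer-to shortcut
alias      Command aliases
ascii      ASCII translation and character table
bpf        eBPF - extended Berkeley Packet Filter
bt         Stack backtrace
btop       Bytes to page number
dev        Device data
dis        Disassemble
eval       Calculator
exit       Exit
extend     Extend the command set
files      Open files
foreach    Run a command over multiple tasks
fuser      Users of a file
gdb        Run a gdb command
help       Help
ipcs       System V IPC facilities
irq        IRQ data
kmem       Kernel memory
list       Linked lists
log        Kernel message buffer
mach       Machine (platform) information
mod        Module information and symbol loading
mount      Mounted filesystem data
net        Network command
p          Print the value of an expression
ps         Process status information
pte        Translate a page table entry
ptob       Page number to bytes
ptov       Physical to virtual address
rd         Read memory
repeat     Repeat a command
runq       Tasks on the run queues
search     Search memory
set        Set the task context or crash internal variables
sig        Task signal handling
struct     Structure contents
swap       Swap device information
sym        Translate between symbols and virtual addresses
sys        System information
task       task_struct and thread_info contents
timer      Timer queue data
tree       Radix trees and red-black trees
union      Union contents
vm         Virtual memory
vtop       Virtual to physical address
waitq      Tasks on a wait queue
whatis     Look up symbol table data or type information
wr         Write memory
q          Exit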