nmi_watchdog Explained

[NMI watchdog is available for x86 and x86-64 architectures]

Is your system locking up unpredictably? No keyboard activity, just a frustrating complete hard lockup? Do you want to help us debugging such lockups? If all yes then this document is definitely for you.

On many x86/x86-64 type hardware there is a feature that enables us to generate 'watchdog NMI interrupts'.  (NMI: Non Maskable Interrupt which get executed even if the system is otherwise locked up hard). This can be used to debug hard kernel lockups.  By executing periodic NMI interrupts, the kernel can monitor whether any CPU has locked up, and print out debugging messages if so.

In order to use the NMI watchdog, you need to have APIC support in your kernel. For SMP kernels, APIC support gets compiled in automatically. For UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and features -> IO-APIC support on uniprocessors) in your kernel config. CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC. CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain kernel debugging options, such as Kernel Stack Meter or Kernel Tracer, may implicitly disable the NMI watchdog.]

For x86-64, the needed APIC is always compiled in.

Using local APIC (nmi_watchdog=2) needs the first performance register, so you can't use it for other purposes (such as high precision performance profiling.) However, at least oprofile and the perfctr driver disable the local APIC NMI watchdog automatically.

To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot parameter.  Eg. the relevant lilo.conf entry:

        append="nmi_watchdog=1"

For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1. For UP machines without an IO-APIC use nmi_watchdog=2, this only works for some processor types.  If in doubt, boot with nmi_watchdog=1 and check the NMI count in /proc/interrupts; if the count is zero then reboot with nmi_watchdog=2 and check the NMI count.  If it is still zero then log a problem, you probably have a processor that needs to be added to the nmi code.

A 'lockup' is the following scenario: if any CPU in the system does not execute the period local timer interrupt for more than 5 seconds, then the NMI handler generates an oops and kills the process. This 'controlled crash' (and the resulting kernel messages) can be used to debug the lockup. Thus whenever the lockup happens, wait 5 seconds and the oops will show up automatically. If the kernel produces no messages then the system has crashed so hard (eg. hardware-wise) that either it cannot even accept NMI interrupts, or the crash has made the kernel unable to print messages.

Be aware that when using local APIC, the frequency of NMI interrupts it generates, depends on the system load. The local APIC NMI watchdog, lacking a better source, uses the "cycles unhalted" event. As you may guess it doesn't tick when the CPU is in the halted state (which happens when the system is idle), but if your system locks up on anything but the "hlt" processor instruction, the watchdog will trigger very soon as the "cycles unhalted" event will happen every clock tick. If it locks up on "hlt", then you are out of luck -- the event will not happen at all and the watchdog won't trigger. This is a shortcoming of the local APIC watchdog -- unfortunately there is no "clock ticks" event that would work all the time. The I/O APIC watchdog is driven externally and has no such shortcoming. But its NMI frequency is much higher, resulting in a more significant hit to the overall system performance.

On x86 nmi_watchdog is disabled by default so you have to enable it with a boot time parameter.

It's possible to disable the NMI watchdog in run-time by writing "0" to /proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable the NMI watchdog. Notice that you still need to use "nmi_watchdog=" parameter at boot time.

NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally on x86 SMP boxes.

The kernel document quoted above covers the background of the NMI watchdog and how it is typically used. It is a very good write-up.
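The two /proc interfaces mentioned in the quoted document can be exercised from a small user-space program. The following is a minimal sketch and not part of the original article: the function names nmi_count_from_line, nmi_count_from_file and nmi_watchdog_set are made up for illustration, and the parser assumes the usual /proc/interrupts layout in which the NMI row starts with "NMI:" followed by one count per CPU. Writing to /proc/sys/kernel/nmi_watchdog normally requires root.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counts on one line of /proc/interrupts.
 * Returns -1 if the line is not the "NMI:" row.
 * (Hypothetical helper; assumes "NMI:  n0  n1 ...  [description]".)
 */
long long nmi_count_from_line(const char *line)
{
    long long total = 0;

    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, "NMI:", 4) != 0)
        return -1;
    line += 4;

    for (;;) {
        char *end;
        unsigned long long n = strtoull(line, &end, 10);

        if (end == line)   /* no further number: description text or end of line */
            break;
        total += (long long)n;
        line = end;
    }
    return total;
}

/*
 * Scan a file in /proc/interrupts format and return the total NMI count,
 * or -1 if the file cannot be read or has no NMI row.
 * Limitation of this sketch: rows longer than the buffer are only
 * partially counted.
 */
long long nmi_count_from_file(const char *path)
{
    char buf[4096];
    long long total = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(buf, sizeof buf, f)) {
        total = nmi_count_from_line(buf);
        if (total >= 0)
            break;
    }
    fclose(f);
    return total;
}

/*
 * Write "1" (enable) or "0" (disable) to a sysctl-style file such as
 * /proc/sys/kernel/nmi_watchdog. Returns 0 on success, -1 on failure.
 */
int nmi_watchdog_set(const char *path, int enable)
{
    int ok;
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, if nmi_count_from_file("/proc/interrupts") returns 0 after booting with nmi_watchdog=1, that is the case in which the quoted document suggests rebooting with nmi_watchdog=2 and checking again.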

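Call chain:

ENTRY(nmi) in arch/x86/kernel/entry_64.S -> do_nmi (arch/x86/kernel/traps.c) -> default_do_nmi -> nmi_watchdog_tick (if it finds that the timer interrupt has not advanced its counter for 5 seconds, meaning the CPU has not been scheduling, it raises an oops and prints the call stack through die_nmi)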

void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
	unsigned long flags;

	if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
		return;

	/*
	 * We are in trouble anyway, lets at least try
	 * to get a message out.
	 */
	flags = oops_begin();
	printk(KERN_EMERG "%s", str);
	printk(" on CPU%d, ip %08lx, registers:\n",
		smp_processor_id(), regs->ip);
	show_registers(regs);
	oops_end(flags, regs, 0);
	if (do_panic || panic_on_oops)
		panic("Non maskable interrupt");
	nmi_exit();
	local_irq_enable();
	do_exit(SIGBUS);
}
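
The check that precedes the call to die_nmi can be pictured with a simplified user-space model. This is a sketch under stated assumptions, not the kernel's code: struct wd_cpu, wd_nmi_tick and the nmi_hz parameter (watchdog NMIs per second) are hypothetical names, and details of the real nmi_watchdog_tick are left out. On each NMI the handler compares the CPU's timer interrupt count with the value seen on the previous NMI; if the count has not moved for 5 seconds' worth of NMIs, the CPU is treated as locked up.

```c
#include <stdbool.h>

#define LOCKUP_SECONDS 5   /* threshold quoted in the document above */

/* Per-CPU watchdog state (hypothetical struct for this sketch). */
struct wd_cpu {
    unsigned long last_timer_irqs; /* timer interrupt count seen at the previous NMI */
    unsigned int  alert_counter;   /* consecutive NMIs without timer progress */
};

/*
 * Model of one watchdog NMI arriving on one CPU.
 *   timer_irqs: that CPU's current local timer interrupt count
 *   nmi_hz:     number of watchdog NMIs per second
 * Returns true once the timer count has stood still for LOCKUP_SECONDS,
 * which is the point where a real handler would report the lockup.
 */
bool wd_nmi_tick(struct wd_cpu *cpu, unsigned long timer_irqs, unsigned int nmi_hz)
{
    if (timer_irqs == cpu->last_timer_irqs) {
        /* No timer interrupt since the last NMI: one more strike. */
        cpu->alert_counter++;
        return cpu->alert_counter >= LOCKUP_SECONDS * nmi_hz;
    }

    /* The timer interrupt ran, so the CPU is alive: reset the strikes. */
    cpu->last_timer_irqs = timer_irqs;
    cpu->alert_counter = 0;
    return false;
}
```

With nmi_hz set to 1, five consecutive calls with an unchanged timer_irqs value make wd_nmi_tick return true on the fifth call; any call with a changed value clears the counter. This is why a CPU spinning with interrupts disabled is caught while a merely busy CPU is not.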

References:

nmi_watchdog.txt

How nmi_watchdog works (used to detect deadlocks that occur with interrupts disabled)

http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=14528823&id=4215546
