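Watchdogs originated on early embedded devices, where programs frequently ran away (because of electromagnetic interference, for example). A dedicated hardware watchdog was therefore added: at regular intervals it checks whether a certain parameter has been set, and if it finds that the parameter has not been set, it concludes that the system has failed and forces a reboot.

Watchdog is the software watchdog Android uses to keep an eye on SystemServer. Which doors does it guard? Mainly those of a few important services: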

ActivityManagerService

PowerManagerService

WindowManagerService

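Once it finds that a service is in trouble, it kills system_server, which in turn causes zygote to kill itself along with it, and the Java world ends up being restarted.

So how does system_server put Watchdog to work for itself?

The interaction between system_server and Watchdog can be summarized in three steps: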

Watchdog.getInstance().init()

Watchdog.getInstance().start()

Watchdog.getInstance().addMonitor()

All three steps are simple. Let's look at the first one.

1. Creating and initializing Watchdog

getInstance creates the Watchdog:

public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }

    return sWatchdog;
}

private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check. Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker. It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
}
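
To make the role of each HandlerChecker more concrete, here is a minimal plain-Java sketch of the idea. It is not the AOSP implementation: a single-threaded ExecutorService stands in for an Android Handler, and the class name CheckerSketch and its methods are hypothetical. What it tries to show is that the watchdog posts a tiny task onto the watched thread; if the task gets to run, the thread is still processing its queue, and if it never completes, the thread is stuck.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a HandlerChecker; an executor plays the role of the Handler. */
public class CheckerSketch {
    private final ExecutorService watchedThread; // stand-in for the thread behind a Handler
    private volatile boolean completed = true;

    public CheckerSketch(ExecutorService watchedThread) {
        this.watchedThread = watchedThread;
    }

    /** Post a marker task to the watched thread; it flips the flag once it gets to run. */
    public void scheduleCheck() {
        if (!completed) {
            return; // a previous check is still pending, so don't queue another one
        }
        completed = false;
        watchedThread.execute(() -> completed = true);
    }

    public boolean isCompleted() {
        return completed;
    }

    /** Returns true if the watched thread handled the check within timeoutMillis. */
    public static boolean probe(ExecutorService thread, long timeoutMillis) {
        CheckerSketch checker = new CheckerSketch(thread);
        checker.scheduleCheck();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!checker.isCompleted() && System.nanoTime() < deadline) {
            try {
                Thread.sleep(5); // poll; the real watchdog waits on its own lock instead
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return checker.isCompleted();
    }
}
```

Calling CheckerSketch.probe on an idle executor should return true almost immediately, while an executor whose only thread is blocked in a long task should make it return false once the timeout elapses.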

Next, let's see what the init function does:

public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;

    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}

2. Getting the Watchdog running

SystemServer calls Watchdog's start function. Because Watchdog is a thread, this causes its run function to be executed on a separate thread.

public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            long timeout = CHECK_INTERVAL;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }

            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval. Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                }
                continue;
            }

            // something is overdue!
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then kill this process so that the system will restart.
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

        ArrayList<Integer> pids = new ArrayList<Integer>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Pass !waitedHalf so that just in case we somehow wind up here without having
        // dumped the halfway stacks, we properly re-initialize the trace file.
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

        // Give some extra time to make sure the stack traces get written.
        // The system's been hanging for a minute, another second or two won't hurt much.
        SystemClock.sleep(2000);

        // Pull our own kernel thread stacks as well if we're configured for that
        if (RECORD_KERNEL_THREADS) {
            dumpKernelStackTraces();
        }

        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
        doSysRq('w');
        doSysRq('l');

        // Try to add the error to the dropbox, but assuming that the ActivityManager
        // itself may be deadlocked. (which has happened, causing this statement to
        // deadlock and the watchdog as a whole to be ineffective)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                public void run() {
                    mActivity.addErrorToDropBox(
                            "watchdog", null, "system_server", null, null,
                            subject, null, stack, null);
                }
            };
        dropboxThread.start();
        try {
            dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            for (int i=0; i<blockedCheckers.size(); i++) {
                Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                StackTraceElement[] stackTrace
                        = blockedCheckers.get(i).getThread().getStackTrace();
                for (StackTraceElement element: stackTrace) {
                    Slog.w(TAG, "    at " + element);
                }
            }
            Slog.w(TAG, "*** GOODBYE!");
            // Something really is wrong this time, so the process kills itself.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }
}

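In short: at regular intervals the watchdog posts a monitor message to another thread, and that thread checks the health of each service. The watchdog waits for the result; if the check still has not returned once the timeout expires, it kills system_server.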
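
The four-way decision in the loop above can be sketched as a pure function of how long a check has been pending. This is only an illustration under assumptions: the thresholds (half the timeout and the full timeout) are inferred from the WAITED_HALF name and the "waited half the deadlock-detection interval" comment, and the class CompletionStateSketch, its constant values, and the methods stateFor and worstState are hypothetical rather than copied from Watchdog.

```java
public class CompletionStateSketch {
    // State names mirror the constants used in run(); the numeric values are assumed here.
    public static final int COMPLETED = 0;
    public static final int WAITING = 1;
    public static final int WAITED_HALF = 2;
    public static final int OVERDUE = 3;

    /** Classify one checker from its completion flag and how long it has been pending. */
    public static int stateFor(boolean completed, long pendingMillis, long timeoutMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (pendingMillis < timeoutMillis / 2) {
            return WAITING;
        }
        if (pendingMillis < timeoutMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** The overall state is the worst (largest) state among all checkers. */
    public static int worstState(int[] states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

Taking the worst state across all checkers means a single blocked thread is enough to push the loop toward the stack dump and, eventually, the kill path.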

3. Queuing up for inspection

To be checked by the watchdog, a service has to implement the Monitor interface:

public interface Monitor {
    void monitor();
}

WindowManagerService, for example:

public class WindowManagerService extends IWindowManager.Stub
        implements Watchdog.Monitor, WindowManagerPolicy.WindowManagerFuncs

Watchdog will then call their monitor functions to check them.

So how is a service's health judged? Taking WindowManagerService as the example, let's first see how it hands itself over to the watchdog for inspection:

// Add ourself to the Watchdog monitors.
// (done in the constructor: the service adds itself to Watchdog's check queue)
Watchdog.getInstance().addMonitor(this);
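
As a rough sketch of what registering amounts to, the following plain-Java class keeps a list of monitors and calls them one after another. The class MonitorRegistrySketch and its runAll method are hypothetical and stand in for the bookkeeping Watchdog does internally; only the Monitor interface shape is taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

public class MonitorRegistrySketch {
    /** Same shape as the Monitor interface shown in the text. */
    public interface Monitor {
        void monitor();
    }

    private final List<Monitor> monitors = new ArrayList<>();

    /** Services call this once, typically while they are being constructed. */
    public synchronized void addMonitor(Monitor m) {
        monitors.add(m);
    }

    /** Calls every registered monitor in registration order; returns how many were run. */
    public int runAll() {
        List<Monitor> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(monitors); // copy so callbacks run outside our lock
        }
        for (Monitor m : snapshot) {
            m.monitor(); // blocks here if the service's lock is held by a stuck thread
        }
        return snapshot.size();
    }
}
```

Because the monitors run sequentially on one thread, a single monitor that blocks keeps runAll from returning, which is exactly the condition the watchdog's timeout is meant to catch.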

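And what does Watchdog actually check when it calls each monitor function? Let's look at the monitor function WindowManagerService implements.

WindowManagerService:

@Override
public void monitor() {
    // So all monitor checks is whether the service has deadlocked
    synchronized (mWindowMap) { }
}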

So what Watchdog fears most is a deadlocked system service, and in that situation the only remedy it has is to kill the system process.
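
The empty synchronized block works as a probe because it can only complete once the lock is free. The sketch below demonstrates that in plain Java; the class LockProbeSketch and the method lockIsResponsive are hypothetical names, and the helper thread plus latch are just one way to put a timeout around the probe, not how Watchdog itself does it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockProbeSketch {
    /**
     * Runs an empty synchronized block on a helper thread and reports whether it
     * finished within timeoutMillis. The block completes only if the lock is free.
     */
    public static boolean lockIsResponsive(Object lock, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread probe = new Thread(() -> {
            synchronized (lock) { } // same trick as monitor(): acquire, then release
            done.countDown();
        }, "lock-probe");
        probe.setDaemon(true); // don't keep the JVM alive if the lock is never released
        probe.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a free lock the probe returns true right away; if another thread is sitting inside a synchronized block on the same object, the probe times out and returns false.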

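Note: I have run into this situation only once. The cause was a function that held a lock but did not return for a long time; it did not return because it had to interact with hardware, and the hardware did not respond in time.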
