Platform: RK3368

OS: Android 6.0

Kernel: 3.10.0


文章目录

  • 现象
  • 调试分析
  • 代码分析Wifif enableNetwork流程:
  • 关于AsyncChannel

现象

logcat Watchdog系统重启日志:

    10-13 17:17:23.106   571   614 W IPCThreadState: Waiting for thread to be free. mExecutingThreadsCount=15 mMaxThreads=1510-13 17:17:34.195   571   922 I Process : Sending signal. PID: 571 SIG: 310-13 17:17:34.196   571   576 I art     : Thread[2,tid=576,WaitingInMainSignalCatcherLoop,Thread*=0x55840f6430,peer=0x12c010a0,"Signal Catcher"]: reacting to signal 310-13 17:17:34.875   571   625 E SensorsHal: poll() failed (Interrupted system call)10-13 17:17:36.779   571   656 D WifiConfigStore: Retrieve network priorities after PNO.10-13 17:17:36.781   571   576 I art     : Wrote stack traces to '/data/anr/traces.txt'10-13 17:17:40.746   571   922 I Watchdog_N: dumpKernelStacks10-13 17:17:40.859   571   922 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)10-13 17:17:40.859   571   922 W Watchdog: foreground thread stack trace:10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Binder.blockUntilThreadAvailable(Native Method)10-13 17:17:40.860   571   922 W Watchdog:     at com.android.server.Watchdog$BinderThreadMonitor.monitor(Watchdog.java:200)10-13 17:17:40.860   571   922 W Watchdog:     at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:172)10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:739)10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:95)10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Looper.loop(Looper.java:148)10-13 17:17:40.860   571   922 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:61)10-13 17:17:40.860   571   922 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:46)10-13 17:17:40.860   571   922 W Watchdog: *** GOODBYE!10-13 17:17:40.860   571   922 I Process : Sending signal. PID: 571 SIG: 910-13 17:17:40.977   199   199 E installd: eof10-13 17:17:40.977   199   199 E installd: failed to read size10-13 17:17:40.977   199   199 I installd: closing connection10-13 17:17:40.978   188   188 I lowmemorykiller: ActivityManager disconnected10-13 17:17:40.978   188   188 I lowmemorykiller: Closing Activity Manager data connection10-13 17:17:40.988   189   189 I ServiceManager: service 'lock_settings' died10-13 17:17:40.988   189   189 I ServiceManager: service 'network_management' died10-13 17:17:40.988   189   189 I ServiceManager: service 'clipboard' died10-13 17:17:40.988   189   189 I ServiceManager: service 'batterystats' died

kernel 日志:

    [ 1012.641269]  [3:       watchdog:  922] SysRq : Show Blocked State[ 1012.641365]  [3:       watchdog:  922]   task                        PC stack   pid father[ 1012.641535]  [3:       watchdog:  922] rknand_bg       D ffffffc0000856c0     0   153      2 0x00000000[ 1012.641570]  [3:       watchdog:  922] Call trace:[ 1012.641613]  [3:       watchdog:  922] [<ffffffc0000856c0>] __switch_to+0x80/0x8c[ 1012.641650]  [3:       watchdog:  922] [<ffffffc0008347dc>] __schedule+0x400/0x608[ 1012.641682]  [3:       watchdog:  922] [<ffffffc000834a48>] schedule+0x64/0x70[ 1012.641714]  [3:       watchdog:  922] [<ffffffc000832e34>] schedule_timeout+0x1cc/0x1fc[ 1012.641770]  [3:       watchdog:  922] [<ffffffbffc000ca0>] nand_blktrans_bg_thread+0x28c/0x2fc [rk30xxnand_ko][ 1012.641807]  [3:       watchdog:  922] [<ffffffc0000c8024>] kthread+0xac/0xb8[ 1012.641835]  [3:       watchdog:  922] rknand          D ffffffc0000856c0     0   154      2 0x00000000[ 1012.641867]  [3:       watchdog:  922] Call trace:[ 1012.641897]  [3:       watchdog:  922] [<ffffffc0000856c0>] __switch_to+0x80/0x8c[ 1012.641928]  [3:       watchdog:  922] [<ffffffc0008347dc>] __schedule+0x400/0x608[ 1012.641959]  [3:       watchdog:  922] [<ffffffc000834a48>] schedule+0x64/0x70[ 1012.641989]  [3:       watchdog:  922] [<ffffffc000832e34>] schedule_timeout+0x1cc/0x1fc[ 1012.642041]  [3:       watchdog:  922] [<ffffffbffc001854>] nand_blktrans_thread+0x1a8/0x61c [rk30xxnand_ko][ 1012.642074]  [3:       watchdog:  922] [<ffffffc0000c8024>] kthread+0xac/0xb8[ 1012.642632]  [3:       watchdog:  922] Sched Debug Version: v0.10, 3.10.0 #441...[ 1012.836914]  [1:  kworker/u16:0:    6] binder: release 571:604 transaction 20644 in, still active[ 1012.837026]  [1:  kworker/u16:0:    6] binder: send failed reply for transaction 20644 to 711:1430[ 1012.837147]  [1:  kworker/u16:0:    6] binder: release 571:605 transaction 38506 in, still activell active

调试分析

看日志只能知道是MonitorChecker超时了,BinderThreadMonitor阻塞导致的。此时只看logcat日志是无法定位到更深层次的原因,就要看traces文件日志了;

触发ANR,dump trace文件的场景如下:

  • Service Timeout:比如前台服务在20s内未执行完成
  • BroadcastQueue Timeout:比如前台广播在10s内未执行完成
  • ContentProvider Timeout:内容提供者,在publish过超时10s
  • InputDispatching Timeout: 输入事件分发超时5s,包括按键和触摸事件

其实还有一种情况也会生成trace文件,那就是触发Framework的Watchdog的时候。

ANR traces 文件默认是由SystemProperties dalvik.vm.stack-trace-file指定的:
dalvik.vm.stack-trace-file = /data/anr/traces.txt

根据日志10-13 17:17:36.781 571 576 I art : Wrote stack traces to ‘/data/anr/traces.txt’,我们查找trace文件对应时间段的日志,发现多个binder在Native Waiting阻塞,是APP在调用WifiManager.enableNetwork,达到32个:

        "Binder_1" prio=5 tid=8 Waiting| group="main" sCount=1 dsCount=0 obj=0x12ca00a0 self=0x55841284e0| sysTid=604 nice=0 cgrp=default sched=0/0 handle=0x7f829f4450| state=S schedstat=( 0 0 0 ) utm=33 stm=17 core=0 HZ=100| stack=0x7f828f8000-0x7f828fa000 stackSize=1013KB| held mutexes=at java.lang.Object.wait!(Native method)- waiting on <0x091b4b3e> (a java.lang.Object)at com.android.internal.util.AsyncChannel$SyncMessenger.sendMessageSynchronously(AsyncChannel.java:820)- locked <0x091b4b3e> (a java.lang.Object)at com.android.internal.util.AsyncChannel$SyncMessenger.-wrap0(AsyncChannel.java:-1)at com.android.internal.util.AsyncChannel.sendMessageSynchronously(AsyncChannel.java:654)at com.android.internal.util.AsyncChannel.sendMessageSynchronously(AsyncChannel.java:699)at com.android.server.wifi.WifiStateMachine.syncEnableNetwork(WifiStateMachine.java:2328)at com.android.server.wifi.WifiServiceImpl.enableNetwork(WifiServiceImpl.java:1034)at android.net.wifi.IWifiManager$Stub.onTransact(IWifiManager.java:141)at android.os.Binder.execTransact(Binder.java:453)

经过多次测试分析发现一些:

  1. 每次都是开机16分钟后出现"Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)";
  2. 都是出现"W IPCThreadState: Waiting for thread to be free. mExecutingThreadsCount=15 mMaxThreads=15",即Binder线程达到最大值,已经没有线程去处理新的binder消息,导致binder transaction failed.
  3. 结合trace文件分析,发现确实有多个Binder处于Waiting状态,都是阻塞在com.android.server.wifi.WifiServiceImpl.enableNetwork;
  4. 发现有个APP会频繁的调用enableNetwork,而network id是-1,问题可能是由此引发的.

代码分析Wifif enableNetwork流程:

frameworks/base/wifi/java/android/net/wifi/WifiManager.java

     * @return {@code true} if the operation succeeded*/public boolean enableNetwork(int netId, boolean disableOthers) {final boolean pin = disableOthers && mTargetSdkVersion < Build.VERSION_CODES.LOLLIPOP;if (pin) {registerPinningNetworkCallback();}boolean success;try {success = mService.enableNetwork(netId, disableOthers);} catch (RemoteException e) {success = false;}if (pin && !success) {unregisterPinningNetworkCallback();}return success;}

APP 调用WifiManager.enableNetwork,通过AIDL与服务端进行跨进程通信。

frameworks/base/wifi/java/android/net/wifi/IWifiManager.aidl

    interface IWifiManager{int getSupportedFeatures();WifiActivityEnergyInfo reportActivityInfo();List<WifiConfiguration> getConfiguredNetworks();List<WifiConfiguration> getPrivilegedConfiguredNetworks();WifiConfiguration getMatchingWifiConfig(in ScanResult scanResult);int addOrUpdateNetwork(in WifiConfiguration config);boolean removeNetwork(int netId);boolean enableNetwork(int netId, boolean disableOthers);

服务端:

frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiServiceImpl.java

        /*** See {@link android.net.wifi.WifiManager#enableNetwork(int, boolean)}* @param netId the integer that identifies the network configuration* to the supplicant* @param disableOthers if true, disable all other networks.* @return {@code true} if the operation succeeded*/public boolean enableNetwork(int netId, boolean disableOthers) {enforceChangePermission();if (mWifiStateMachineChannel != null) {return mWifiStateMachine.syncEnableNetwork(mWifiStateMachineChannel, netId,disableOthers);} else {Slog.e(TAG, "mWifiStateMachineChannel is not initialized");return false;}}

frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiStateMachine.java

        /*** Enable a network** @param netId         network id of the network* @param disableOthers true, if all other networks have to be disabled* @return {@code true} if the operation succeeds, {@code false} otherwise*/public boolean syncEnableNetwork(AsyncChannel channel, int netId, boolean disableOthers) {Log.d(TAG, "syncEnableNetwork "+channel);Message resultMsg = channel.sendMessageSynchronously(CMD_ENABLE_NETWORK, netId,disableOthers ? 1 : 0);boolean result = (resultMsg.arg1 != FAILURE);resultMsg.recycle();Log.d(TAG, "syncEnableNetwork "+channel +" result="+result);return result;}
    class ConnectModeState extends State {@Overridepublic void enter() {connectScanningService();}@Overridepublic boolean processMessage(Message message) {WifiConfiguration config;int netId;boolean ok;boolean didDisconnect;String bssid;String ssid;NetworkUpdateResult result;logStateAndMessage(message, getClass().getSimpleName());switch (message.what) {case WifiMonitor.ASSOCIATION_REJECTION_EVENT:......case CMD_ENABLE_NETWORK:boolean disableOthers = message.arg2 == 1;netId = message.arg1;config = mWifiConfigStore.getWifiConfiguration(netId);if (config == null) {loge("No network with id = " + netId);messageHandlingStatus = MESSAGE_HANDLING_STATUS_FAIL;break;}// Tell autojoin the user did try to select to that network// However, do NOT persist the choice by bumping the priority of the networkif (disableOthers) {mWifiAutoJoinController.updateConfigurationHistory(netId, true, false);// Set the last selected configuration so as to allow the system to// stick the last user choice without persisting the choicemWifiConfigStore.setLastSelectedConfiguration(netId);// Remember time of last connection attemptlastConnectAttemptTimestamp = System.currentTimeMillis();mWifiConnectionStatistics.numWifiManagerJoinAttempt++;}// Cancel auto roam requestsautoRoamSetBSSID(netId, "any");int uid = message.sendingUid;ok = mWifiConfigStore.enableNetwork(netId, disableOthers, uid);if (!ok) {messageHandlingStatus = MESSAGE_HANDLING_STATUS_FAIL;}replyToMessage(message, message.what, ok ? SUCCESS : FAILURE);break;case CMD_ENABLE_ALL_NETWORKS:

看到问题了,ConnectModeState处理CMD_ENABLE_NETWORK命令时,如果没有找到netId对应的网络,直接break了,而AsyncChannel的客户端还一直在等应答。如果APP频繁的调用enableNetwork,而且network id是-1,就会导致“W IPCThreadState: Waiting for thread to be free. mExecutingThreadsCount=15 mMaxThreads=15”.

修改后的代码:

    --- a/service/java/com/android/server/wifi/WifiStateMachine.java+++ b/service/java/com/android/server/wifi/WifiStateMachine.java@@ -7356,6 +7358,7 @@ public class WifiStateMachine extends StateMachine implements WifiNative.WifiPnoif (config == null) {loge("No network with id = " + netId);messageHandlingStatus = MESSAGE_HANDLING_STATUS_FAIL;+                        replyToMessage(message, message.what, FAILURE);break;}

关于AsyncChannel

AsyncChannel为两个Handler之间的异步通道,Handler可能在同一个进程中,也可能在另一个进程中。AysncChannel可以使用两种协议样式。

  • 第一种是简单的请求/应答协议,其中服务器不需要知道是哪个客户机发出请求。在简单的请求/应答协议中,客户端/源端向服务器/目的地发送请求。服务器使用replyToMessage方法。

  • 第二种是服务器/目的地也需要知道它连接的是哪个客户端。例如,服务器需要将未经请求的消息发送回客户机。或者服务器为每个客户机保留不同的状态。在这个模型中,服务器还将使用连接方法。

frameworks/base/core/java/com/android/internal/util/AsyncChannel.java

    private static Message sendMessageSynchronously(Messenger dstMessenger, Message msg) {SyncMessenger sm = SyncMessenger.obtain();try {if (dstMessenger != null && msg != null) {msg.replyTo = sm.mMessenger;synchronized (sm.mHandler.mLockObject) {dstMessenger.send(msg);sm.mHandler.mLockObject.wait();}} else {sm.mHandler.mResultMsg = null;}} catch (InterruptedException e) {sm.mHandler.mResultMsg = null;} catch (RemoteException e) {sm.mHandler.mResultMsg = null;}Message resultMsg = sm.mHandler.mResultMsg;sm.recycle();return resultMsg;}

系统死机重启调试分析相关推荐

  1. 恒玄BES调试笔记-BES2500死机重启分析手段

    有时候会遇到死机重启等问题,一般可以通过分析lst,map文件可以看出点端倪! lst文件的生成,其实是在编译命令的基础上加上all lst参数即可,这两个文件都可以在out目录下找到. make T ...

  2. STM32 软件重启导致死机的情况分析

    一.背景 在一个项目中需要用到看门狗和软件重启,但是在工作过程中发现存在一种情况会导致STM32死机,经过分析是软件重启过程中复位信号被外部电路强制拉高,无法在复位后4个时钟拉低复位管脚,因此,将该复 ...

  3. 如果计算机黑屏请分析原因是什么原因,电脑系统黑屏死机故障原因分析处理

    电脑黑屏的问题经常发生,其中的原因涉及到很多方面,网罗计算机里面所有的软件与硬件,系统死机故障多半表现为黑屏,这类故障与显示器.显示卡关系很密切,同时系统主板.CPU.CACHE.内存条,电源等部件的 ...

  4. Linux拷贝数据死机了,Linux系统“死机”时怎么办?

    如果问题能够再现,那么问题已经解决 80% 了.对于操作系统核心而言,如果有问题的再现方法,那么可以说是已经解决 99% 了.经常遇到的问题是系统可以正常运行一段时间,然后死机.如果不好再现问题,那么 ...

  5. linux进程假死的原因_Linux系统死机的原因及解决办法

    系统出现死机,一般分为两种情况:一是硬件问题;二是软件问题. 一.硬件问题 可以考虑分析以下几点: 1.不要超频CPU,如果已经超频的先回复到原来的频率 虽然平时运行没有问题,但可能在高负载的使用中就 ...

  6. 插入u盘需重启计算机,电脑一插U盘就死机重启如何解决

    要使用U盘我们首先就需要将U盘插入电脑,但是很多朋友却往往连这第一步都完不成,因为他们刚把U盘插入电脑后,电脑就死机重启了.其实对于U盘等外部设备连接引起的电脑死机问题,我们应该具体问题具体分析,对症 ...

  7. android 常见死机问题-如何分析

    android 常见死机问题--log分析 ============================================================================== ...

  8. android 常见死机问题--log分析

    android 常见死机问题--log分析 ============================================================================== ...

  9. ubuntu16.04系统死机,鼠标可移动,但点击无反应,键盘无反应,解决办法总结

    ubuntu16.04系统死机,鼠标可移动,但点击无反应,键盘无反应,解决办法总结 Ubuntu16.04下这两天频繁死机,情况是鼠标可移动,但点击没反应,键盘按键也没反应,不想强制重启关机,毕竟可能 ...

最新文章

  1. 《拥抱变化——社交网络时代的企业转型之道》一构筑社会信任
  2. nginx reload内存碎片问题-(一)
  3. Oracle的分页实现
  4. 操作系统之进程管理:18、预防死锁
  5. 效果好,速度快!DenseNAS:密集连接搜索空间下的高灵活度网络结构搜索
  6. Spring Session + Redis实现分布式Session共享
  7. [51单片机] SPI nRF24L01无线 [可以放在2个单片机里实现通信]
  8. 关于使用JAVA正则表达式报java.lang.StackOverflowError错误问题
  9. Maven 在eclipse中如何配置
  10. winform 属性
  11. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition(SPP-net)
  12. 苹果手机如何用计算机,电脑怎么控制iPhone手机
  13. 为什么阿里巴巴的企业Logo是它?
  14. linux的dve界面如何debug,VCS课时3:使用DVE进行Debug
  15. css 语音,CSS3新特性一览-- 语音
  16. html如何让图片不失真,图片放大不失真的几种方法
  17. Nginx学习(四)
  18. 使用组策略将用户文件夹重定向到D盘
  19. 认识一下网关,路由器,交换机,猫
  20. 弘辽科技:B站携手阿里入股如涵 UP主里要诞生下一个李佳琦?

热门文章

  1. go 获得 mysql 实际运行 SQL_mysql对标准sql的goup by进行了扩展 | 学步园
  2. 生物统计学(biostatistics)学习笔记(五)卡方检验
  3. uniapp微信小程序canvas生成简单海报并下载
  4. PHP中htmlentities和 htmlspecialchars区别
  5. java实现的本地文件管理器——仿Windows文件资源管理器
  6. unescape的C++实现
  7. 组合导航原理剖析(二):惯性导航方法与应用综述
  8. Vue环境下,点击按钮进行上下页的切换
  9. 绩效考核主义该缓行了
  10. WIN7 I2TP/IPSEC连接 错误809