This article is based on Android 7.1.

I. Starting the SignalCatcher Thread

1.1 StartSignalCatcher

runtime.cc

void Runtime::InitNonZygoteOrPostFork(JNIEnv* env, bool is_system_server, NativeBridgeAction action, const char* isa) {
  ...
  StartSignalCatcher();
  ...
}

From the code above, the SignalCatcher thread is started in InitNonZygoteOrPostFork.

runtime.cc

void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher(stack_trace_file_);
  }
}

If the current process is not zygote, a SignalCatcher is created. It follows that the zygote process itself has no SignalCatcher thread, which you can confirm with adb shell ps -t.

1.2 Creating the SignalCatcher

1.2.1 SignalCatcher(…)

signal_catcher.cc

SignalCatcher::SignalCatcher(const std::string& stack_trace_file)
    : stack_trace_file_(stack_trace_file),
      lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);

  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {
    cond_.Wait(self);
  }
}

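CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread") effectively calls pthread_create(&pthread_, nullptr, &Run, this) and treats a non-zero return value as a fatal error. It creates a new thread that runs Run(this), and pthread_ holds the new thread's handle. The constructor then waits on cond_ until the new thread has set thread_, so when the constructor returns, the Signal Catcher thread is already attached to the runtime.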
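
The "spawn, then wait until the new thread has published itself" handshake can be sketched in portable C++. This is a hypothetical example, not ART code: std::thread, std::mutex and std::condition_variable stand in for pthread_create and ART's Mutex/ConditionVariable, and the class name CatcherHandshake is made up.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch (not ART code) of the SignalCatcher constructor handshake.
class CatcherHandshake {
 public:
  CatcherHandshake() : worker_(&CatcherHandshake::Run, this) {
    // Counterpart of `while (thread_ == nullptr) cond_.Wait(self);`
    std::unique_lock<std::mutex> lock(mu_);
    cond_.wait(lock, [this] { return worker_id_ != std::thread::id(); });
  }

  ~CatcherHandshake() { worker_.join(); }

  std::thread::id worker_id() {
    std::lock_guard<std::mutex> lock(mu_);
    return worker_id_;
  }

 private:
  void Run() {
    // In ART, the thread first attaches itself to the runtime as "Signal Catcher".
    {
      std::lock_guard<std::mutex> lock(mu_);
      worker_id_ = std::this_thread::get_id();  // counterpart of `thread_ = self;`
    }
    cond_.notify_all();  // counterpart of `cond_.Broadcast(self);`
  }

  // Declared before worker_ so they are constructed before the thread starts.
  std::mutex mu_;
  std::condition_variable cond_;
  std::thread::id worker_id_;  // default value means "not started yet"
  std::thread worker_;
};
```

Using a predicate with wait() plays the same role as ART's while loop: it guards against spurious wakeups and against the worker finishing before the constructor starts waiting.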

1.2.2 SignalCatcher::Run

signal_catcher.cc

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  // Attach the current thread to the running JavaVM
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // See 1.2.3
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

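So Run attaches the thread to the runtime, adds the signals it wants to sigwait() for (SIGQUIT and SIGUSR1), and calls WaitForSignal to wait for one to arrive. It then dispatches on the signal number: SIGQUIT goes to HandleSigQuit and SIGUSR1 goes to HandleSigUsr1. After every wakeup it checks ShouldHalt() and, if the halt flag is set, detaches and exits.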
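
The wait-and-dispatch loop can be sketched with plain POSIX calls. This is a hypothetical example, not ART code: MiniCatcher, CatcherLoop and StartCatcher are invented names, SIGUSR1 and SIGUSR2 stand in for SIGQUIT and SIGUSR1 (so the sketch never uses SIGQUIT), and the counters stand in for HandleSigQuit() and HandleSigUsr1(). A Linux/POSIX system with pthreads is assumed.

```cpp
#include <atomic>
#include <pthread.h>
#include <signal.h>

// Hypothetical sketch of a SignalCatcher-style loop (not ART code).
// SIGUSR1/SIGUSR2 stand in for SIGQUIT/SIGUSR1; the counters stand in for
// HandleSigQuit()/HandleSigUsr1().
struct MiniCatcher {
  std::atomic<bool> halt{false};
  std::atomic<int> usr1_count{0};
  std::atomic<int> usr2_count{0};
};

namespace {
sigset_t CatcherSignals() {
  sigset_t signals;
  sigemptyset(&signals);
  sigaddset(&signals, SIGUSR1);
  sigaddset(&signals, SIGUSR2);
  return signals;
}
}  // namespace

void* CatcherLoop(void* arg) {
  auto* catcher = static_cast<MiniCatcher*>(arg);
  const sigset_t signals = CatcherSignals();
  while (true) {
    int signal_number = 0;
    if (sigwait(&signals, &signal_number) != 0) continue;  // retry on error
    if (catcher->halt.load()) return nullptr;  // like ShouldHalt()
    switch (signal_number) {
      case SIGUSR1: catcher->usr1_count.fetch_add(1); break;
      case SIGUSR2: catcher->usr2_count.fetch_add(1); break;
      default: break;
    }
  }
}

// Blocks the signals in the calling thread first, so the catcher thread inherits
// the mask, then starts the loop. Returns false if either step fails.
bool StartCatcher(MiniCatcher* catcher, pthread_t* thread) {
  const sigset_t signals = CatcherSignals();
  if (pthread_sigmask(SIG_BLOCK, &signals, nullptr) != 0) return false;
  return pthread_create(thread, nullptr, CatcherLoop, catcher) == 0;
}
```

To stop the loop, set halt and then send one more signal; as in Run(), the flag is only checked after the wait returns.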

1.2.3 WaitForSignal

signal_catcher.cc

int SignalCatcher::WaitForSignal(Thread* self, SignalSet& signals) {
  ScopedThreadStateChange tsc(self, kWaitingInMainSignalCatcherLoop);

  // Signals for sigwait() must be blocked but not ignored.  We
  // block signals like SIGQUIT for all threads, so the condition
  // is met.  When the signal hits, we wake up, without any signal
  // handlers being invoked.
  int signal_number = signals.Wait();
  if (!ShouldHalt()) {
    // Let the user know we got the signal, just in case the system's too screwed for us to
    // actually do what they want us to do...
    LOG(INFO) << *self << ": reacting to signal " << signal_number;

    // If anyone's holding locks (which might prevent us from getting back into state Runnable), say so...
    Runtime::Current()->DumpLockHolders(LOG(INFO));
  }

  return signal_number;
}

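Note the comment above: the signals passed to sigwait() must be blocked, but not ignored. Because signals such as SIGQUIT are blocked for all threads (see the next section), this condition holds. When a signal arrives, the waiting thread wakes up and no signal handler is invoked.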
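
A single-threaded sketch of the "blocked, so no handler runs" behavior: it installs a handler for SIGUSR2 (standing in for SIGQUIT), blocks the signal, raises it, and then collects it with sigwait(). The handler never runs because the blocked signal stays pending until sigwait() consumes it. ConsumeBlockedSignal and g_handler_ran are hypothetical names; POSIX/Linux is assumed.

```cpp
#include <signal.h>

// Hypothetical sketch (not ART code). SIGUSR2 stands in for SIGQUIT.
namespace {
volatile sig_atomic_t g_handler_ran = 0;
void Handler(int) { g_handler_ran = 1; }
}  // namespace

// Returns the signal number reported by sigwait(), or -1 on error.
int ConsumeBlockedSignal() {
  struct sigaction sa = {};
  sa.sa_handler = Handler;  // a real handler is installed...
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR2, &sa, nullptr) != 0) return -1;

  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGUSR2);
  sigset_t old;
  if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0) return -1;

  raise(SIGUSR2);  // ...but the signal is blocked, so it only becomes pending

  int signal_number = -1;
  if (sigwait(&set, &signal_number) != 0) signal_number = -1;  // consumes it
  pthread_sigmask(SIG_SETMASK, &old, nullptr);  // restore the caller's mask
  return signal_number;
}
```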

1.2.4 BlockSignals

runtime.cc

bool Runtime::Init(RuntimeArgumentMap&& runtime_options_in) {
  ...
  BlockSignals();
  ...
}

void Runtime::BlockSignals() {
  SignalSet signals;
  signals.Add(SIGPIPE);
  // SIGQUIT is used to dump the runtime's state (including stack traces).
  signals.Add(SIGQUIT);
  // SIGUSR1 is used to initiate a GC.
  signals.Add(SIGUSR1);
  signals.Block();
}

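These signals are blocked while the VM is being created, in Runtime::Init. Because a new thread inherits the signal mask of the thread that creates it, threads created afterwards also start with SIGPIPE, SIGQUIT and SIGUSR1 blocked.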
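
The mask inheritance that makes a single BlockSignals() call sufficient can be checked directly. This hypothetical sketch (not ART code; ChildInheritsBlockedSignal and ReportBlocked are invented names) blocks SIGUSR1 in the calling thread, creates a thread, and has that thread query its own mask.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical sketch (not ART code): a thread created after pthread_sigmask()
// starts with the creator's signal mask.
namespace {
void* ReportBlocked(void* out) {
  sigset_t current;
  pthread_sigmask(SIG_BLOCK, nullptr, &current);  // query only, changes nothing
  *static_cast<bool*>(out) = sigismember(&current, SIGUSR1) == 1;
  return nullptr;
}
}  // namespace

// Returns true if a thread created after blocking SIGUSR1 also has it blocked.
bool ChildInheritsBlockedSignal() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGUSR1);
  sigset_t old;
  pthread_sigmask(SIG_BLOCK, &set, &old);

  bool child_blocked = false;
  pthread_t t;
  bool ok = pthread_create(&t, nullptr, ReportBlocked, &child_blocked) == 0 &&
            pthread_join(t, nullptr) == 0;

  pthread_sigmask(SIG_SETMASK, &old, nullptr);  // restore the caller's mask
  return ok && child_blocked;
}
```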

II. HandleSigQuit

When a SIGQUIT signal (signal 3) arrives, signal_catcher->HandleSigQuit() is called to dump runtime information and stack traces.

2.1 SignalCatcher::HandleSigQuit

signal_catcher.cc

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  // ----- pid 2830 at 2017-11-16 11:22:53 -----
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  // Cmd line: system_server
  DumpCmdLine(os);

  std::string fingerprint = runtime->GetFingerprint();
  // Build fingerprint: 'Xiaomi/cancro_wc_lte/cancro:6.0.1/MMB29M/1.1.1:user/test-keys'
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  // ABI: 'arm'
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  // Build type: optimized
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);

  if ((false)) {
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  // ----- end 2830 -----
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}
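
The text layout HandleSigQuit() produces can be reproduced by a small pure function. This is a hypothetical sketch: FormatSigQuitDump is an invented name, the runtime queries (getpid(), GetIsoDate(), GetFingerprint() and so on) become parameters, the "Cmd line:" prefix is taken from the sample comment above rather than from DumpCmdLine's source, and body stands in for everything DumpForSigQuit() writes.

```cpp
#include <sstream>
#include <string>
#include <sys/types.h>

// Hypothetical sketch of the header/footer layout written by HandleSigQuit().
std::string FormatSigQuitDump(pid_t pid, const std::string& iso_date,
                              const std::string& cmdline, const std::string& fingerprint,
                              const std::string& abi, bool debug_build,
                              const std::string& body) {
  std::ostringstream os;
  os << "\n" << "----- pid " << pid << " at " << iso_date << " -----\n";
  os << "Cmd line: " << cmdline << "\n";  // simplified stand-in for DumpCmdLine()
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << abi << "'\n";
  os << "Build type: " << (debug_build ? "debug" : "optimized") << "\n";
  os << body;  // what DumpForSigQuit() would append
  os << "----- end " << pid << " -----\n";
  return os.str();
}
```

Because the header and footer both carry the pid, a parser can split a trace file containing dumps from several processes on the "----- pid" and "----- end" lines.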

2.2 DumpForSigQuit

runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  // Zygote loaded classes=4188 post zygote classes=3570
  GetClassLinker()->DumpForSigQuit(os);
  // Intern table: 59686 strong; 10043 weak
  GetInternTable()->DumpForSigQuit(os);
  // JNI: CheckJNI is off; globals=1993 (plus 2995 weak)
  // Libraries: /system/lib/hw/gralloc.msm8974.so ...
  GetJavaVM()->DumpForSigQuit(os);
  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  TrackedAllocators::Dump(os);
  os << "\n";

  thread_list_->DumpForSigQuit(os);
  BaseMutex::DumpAll(os);
}

thread_list_->DumpForSigQuit(os) is the key call: it is the one that dumps the stack traces.

2.3 thread_list_->DumpForSigQuit

2.3.1 ThreadList::DumpForSigQuit

thread_list.cc

void ThreadList::DumpForSigQuit(std::ostream& os) {
  {
    ScopedObjectAccess soa(Thread::Current());
    // Only print if we have samples.
    if (suspend_all_historam_.SampleSize() > 0) {
      Histogram<uint64_t>::CumulativeData data;
      suspend_all_historam_.CreateHistogram(&data);
      suspend_all_historam_.PrintConfidenceIntervals(os, 0.99, data);  // Dump time to suspend.
    }
  }
  bool dump_native_stack = Runtime::Current()->GetDumpNativeStackOnSigQuit();
  Dump(os, dump_native_stack);
  // Dump the stack traces of threads in this process that are not attached to the runtime
  DumpUnattachedThreads(os, dump_native_stack);
}

2.3.2 ThreadList::Dump

thread_list.cc

void ThreadList::Dump(std::ostream& os, bool dump_native_stack) {
  {
    MutexLock mu(Thread::Current(), *Locks::thread_list_lock_);
    os << "DALVIK THREADS (" << list_.size() << "):\n";
  }
  DumpCheckpoint checkpoint(&os, dump_native_stack);
  size_t threads_running_checkpoint;
  {
    // Use SOA to prevent deadlocks if multiple threads are calling Dump() at the same time.
    ScopedObjectAccess soa(Thread::Current());
    threads_running_checkpoint = RunCheckpoint(&checkpoint);
  }
  if (threads_running_checkpoint != 0) {
    checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint);
  }
}

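Dump creates a DumpCheckpoint object named checkpoint, calls RunCheckpoint(&checkpoint), and, if the returned count is non-zero, waits in WaitForThreadsToRunThroughCheckpoint. Let's first look at what DumpCheckpoint is.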

2.3.3 DumpCheckpoint

thread_list.cc

class DumpCheckpoint FINAL : public Closure {
 public:
  DumpCheckpoint(std::ostream* os, bool dump_native_stack)
      : os_(os),
        barrier_(0),
        backtrace_map_(dump_native_stack ? BacktraceMap::Create(getpid()) : nullptr),
        dump_native_stack_(dump_native_stack) {}

  void Run(Thread* thread) OVERRIDE {
    Thread* self = Thread::Current();
    std::ostringstream local_os;
    {
      ScopedObjectAccess soa(self);
      // 1. Dump the stack trace and other thread information
      if (!timeout_threads_.empty()
          && find(timeout_threads_.begin(), timeout_threads_.end(), thread) != timeout_threads_.end()) {
        Thread::DumpState(local_os, thread, thread->GetTid(), true);
      } else {
        thread->Dump(local_os, dump_native_stack_, backtrace_map_.get());
      }
    }
    local_os << "\n";
    {
      // Use the logging lock to ensure serialization when writing to the common ostream.
      MutexLock mu(self, *Locks::logging_lock_);
      *os_ << local_os.str();
    }
    // 2. After a thread finishes its dump in Run, it notifies barrier_, which decrements count_.
    //    When count_ reaches 0, every thread has finished dumping. Pass() also removes the
    //    finished thread from the barrier's list of threads still being waited on.
    barrier_.Pass(self, thread);
  }

 private:
  // The common stream that will accumulate all the dumps.
  std::ostream* const os_;
  // The barrier to be passed through and for the requestor to wait upon.
  Barrier barrier_;
  // A backtrace map, so that all threads use a shared info and don't reacquire/parse separately.
  std::unique_ptr<BacktraceMap> backtrace_map_;
  // Whether we should dump the native stack.
  const bool dump_native_stack_;
  std::list<Thread*> timeout_threads_;
};

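As we can see:

  • Constructing a DumpCheckpoint only initializes its member variables.
  • DumpCheckpoint::Run does two things:
    • It is where the dump is actually performed.
    • When a thread finishes its dump in Run, it calls barrier_.Pass(), which decrements the barrier's count_; once count_ reaches 0, every thread has finished dumping. Pass() also removes the finished thread from the barrier's list of threads still being waited on.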
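
The output half of Run() follows a common pattern: format privately, then hold a shared lock only long enough to append. This hypothetical sketch (not ART code; SharedDump and DumpConcurrently are invented, the "frame N" lines are placeholders for a real stack trace, and the barrier is left out) shows why per-thread dumps never interleave in the common stream.

```cpp
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch of the output pattern in DumpCheckpoint::Run().
class SharedDump {
 public:
  // `frames` placeholder lines stand in for a real stack trace.
  void Append(const std::string& thread_name, int frames) {
    std::ostringstream local_os;  // formatted without holding the lock
    local_os << "\"" << thread_name << "\"\n";
    for (int i = 0; i < frames; ++i) local_os << "  frame " << i << "\n";
    local_os << "\n";
    std::lock_guard<std::mutex> lock(logging_lock_);  // like Locks::logging_lock_
    os_ << local_os.str();  // one append per thread, so blocks stay contiguous
  }

  std::string str() {
    std::lock_guard<std::mutex> lock(logging_lock_);
    return os_.str();
  }

 private:
  std::mutex logging_lock_;
  std::ostringstream os_;
};

// Each worker dumps "itself" into the shared stream, as runnable threads do when
// they reach their checkpoint.
std::string DumpConcurrently(int thread_count, int frames) {
  SharedDump dump;
  std::vector<std::thread> threads;
  for (int t = 0; t < thread_count; ++t) {
    threads.emplace_back([&dump, t, frames] { dump.Append("worker-" + std::to_string(t), frames); });
  }
  for (auto& th : threads) th.join();
  return dump.str();
}
```

The order of the blocks depends on scheduling, which matches real traces: thread order in a dump is not guaranteed.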

Next, let's look at what RunCheckpoint does.

2.3.4 ThreadList::RunCheckpoint

thread_list.cc

size_t ThreadList::RunCheckpoint(Closure* checkpoint_function, bool isDumpCheckpoint) {  // isDumpCheckpoint defaults to false
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertNotExclusiveHeld(self);
  Locks::thread_list_lock_->AssertNotHeld(self);
  Locks::thread_suspend_count_lock_->AssertNotHeld(self);

  std::vector<Thread*> suspended_count_modified_threads;
  size_t count = 0;
  {
    // Call a checkpoint function for each thread, threads which are suspend get their checkpoint
    // manually called.
    MutexLock mu(self, *Locks::thread_list_lock_);
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    if (isDumpCheckpoint) {
      ((DumpCheckpoint *)checkpoint_function)->SetThreadList(self, list_);
    }
    // 1. Handle each thread in list_ according to its state
    count = list_.size();
    for (const auto& thread : list_) {
      if (thread != self) {
        while (true) {
          if (thread->RequestCheckpoint(checkpoint_function)) {
            // This thread will run its checkpoint some time in the near future.
            break;
          } else {
            // Suspended thread: raise its suspend count first, then add it to
            // suspended_count_modified_threads, which is processed below.
            if (thread->GetState() == kRunnable) {
              // Spurious fail, try again.
              continue;
            }
            thread->ModifySuspendCount(self, +1, nullptr, false);
            suspended_count_modified_threads.push_back(thread);
            break;
          }
        }
      }
    }
  }

  // Run the checkpoint on ourself while we wait for threads to suspend.
  // 2. For the Signal Catcher thread itself, the checkpoint function's Run is called here,
  //    which dumps this thread.
  checkpoint_function->Run(self);

  // Run the checkpoint on the suspended threads.
  for (const auto& thread : suspended_count_modified_threads) {
    if (!thread->IsSuspended()) {
      if (ATRACE_ENABLED()) {
        std::ostringstream oss;
        thread->ShortDump(oss);
        ATRACE_BEGIN((std::string("Waiting for suspension of thread ") + oss.str()).c_str());
      }
      // Busy wait until the thread is suspended.
      const uint64_t start_time = NanoTime();
      do {
        ThreadSuspendSleep(kThreadSuspendInitialSleepUs);
      } while (!thread->IsSuspended());
      const uint64_t total_delay = NanoTime() - start_time;
      // Shouldn't need to wait for longer than 1000 microseconds.
      constexpr uint64_t kLongWaitThreshold = MsToNs(1);
      ATRACE_END();
    }
    // We know for sure that the thread is suspended at this point.
    // 3. Run checkpoint_function's Run for the suspended thread (on this thread)
    checkpoint_function->Run(thread);
    {
      MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
      // 4. The thread has been dumped: lower its suspend count by 1
      thread->ModifySuspendCount(self, -1, nullptr, false);
    }
  }

  {
    // 5. Imitate ResumeAll, threads may be waiting on Thread::resume_cond_ since we raised their
    // suspend count. Now the suspend_count_ is lowered so we must do the broadcast.
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    Thread::resume_cond_->Broadcast(self);
  }

  // The return value is the size of the thread list
  return count;
}
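
To make the split concrete, here is a hypothetical single-threaded model of RunCheckpoint(). It is not ART code: ModelThread and RunCheckpointModel are invented, a bool replaces the real thread state, queuing a std::function replaces RequestCheckpoint(), and the suspend-count bump, busy wait and resume broadcast are left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical single-threaded model of how RunCheckpoint() splits the work.
struct ModelThread {
  std::string name;
  bool runnable;
  // Pending checkpoint requests; a runnable thread would run these itself.
  std::vector<std::function<void(ModelThread&)>> checkpoint_functions;
};

// Returns list.size(), mirroring `count = list_.size()` in RunCheckpoint().
std::size_t RunCheckpointModel(std::vector<ModelThread>& list, std::size_t self_index,
                               const std::function<void(ModelThread&)>& checkpoint) {
  std::vector<ModelThread*> suspended;  // like suspended_count_modified_threads
  for (std::size_t i = 0; i < list.size(); ++i) {
    if (i == self_index) continue;
    if (list[i].runnable) {
      list[i].checkpoint_functions.push_back(checkpoint);  // like RequestCheckpoint()
    } else {
      suspended.push_back(&list[i]);
    }
  }
  checkpoint(list[self_index]);                     // the requester dumps itself
  for (ModelThread* t : suspended) checkpoint(*t);  // and each suspended thread
  return list.size();
}
```

Usage: with threads {"main" (runnable), "Signal Catcher" (the caller), "HeapTaskDaemon" (suspended)}, the caller dumps itself and HeapTaskDaemon immediately, while main's dump only happens once main runs its queued function.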

2.3.5 WaitForThreadsToRunThroughCheckpoint

thread_list.cc

class DumpCheckpoint FINAL : public Closure {
 public:
  void WaitForThreadsToRunThroughCheckpoint(size_t threads_running_checkpoint) {
    Thread* self = Thread::Current();
    ThreadState new_state = kWaitingForCheckPointsToRun;
    if (Locks::abort_lock_->IsExclusiveHeld(self) && self->GetState() == kRunnable) {
      new_state = kRunnable;
    }
    ScopedThreadStateChange tsc(self, new_state);
    bool timed_out = barrier_.Increment(self, threads_running_checkpoint, kDumpWaitTimeout);
    if (timed_out) {
      // Avoid a recursive abort.
      LOG(ERROR) << "Unexpected time out during dump checkpoint.";
      std::list<Thread*> list = barrier_.GetThreadList(self);
      timeout_threads_.assign(list.begin(), list.end());
      {
        // abnormal dump
        MutexLock mu(self, *Locks::logging_lock_);
        *os_ << " ------- " << timeout_threads_.size() << " threads dump checkpoint timed out --------\n\n";
      }
      for (const auto& thread : timeout_threads_) {
        bool contains = false;
        {
          MutexLock mu(self, *Locks::thread_list_lock_);
          std::list<Thread*> thread_list = Runtime::Current()->GetThreadList()->GetList();
          contains = find(thread_list.begin(), thread_list.end(), thread) != thread_list.end();
        }
        // 1. detached thread should have already passed the barrier
        // 2. only kRunnable thread have been set a checkpoint function
        // 3. non kRunnable thread is dumped by this thread, will not timeout
        if (contains && thread->HasCheckpointFunction(this)) {
          thread->RunCheckpointFunction();
        }
      }
    }
  }
};

barrier.cc

bool Barrier::Increment(Thread* self, int delta, uint32_t timeout_ms) {
  MutexLock mu(self, lock_);
  SetCountLocked(self, count_ + delta);
  bool timed_out = false;
  if (count_ != 0) {
    uint32_t timeout_ns = 0;
    uint64_t abs_timeout = NanoTime() + MsToNs(timeout_ms);
    for (;;) {
      timed_out = condition_.TimedWait(self, timeout_ms, timeout_ns);
      if (timed_out || count_ == 0) return timed_out;
      // Compute time remaining on timeout.
      uint64_t now = NanoTime();
      int64_t time_left = abs_timeout - now;
      if (time_left <= 0) return true;
      timeout_ns = time_left % (1000*1000);
      timeout_ms = time_left / (1000*1000);
    }
  }
  return timed_out;
}

Increment returns in two cases: when the wait times out (it then returns true), or when count_ == 0, that is, when every thread has finished its dump.

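So WaitForThreadsToRunThroughCheckpoint waits for all threads to finish dumping and handles the threads that have not finished by the timeout specially: it records them in timeout_threads_, writes a "threads dump checkpoint timed out" line, and runs any checkpoint function still installed on such a thread from the waiting thread. For those threads, Run then prints only their state via Thread::DumpState instead of a full dump.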
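
The barrier semantics can be sketched on top of std::condition_variable. This is a hypothetical example, not ART's Barrier: TimedBarrier is an invented name, and wait_until() with an absolute deadline replaces the manual "compute time remaining" loop in Barrier::Increment. In this sketch, Pass() calls that happen before Increment() simply drive the count negative. That is consistent with barrier_ being constructed with 0 in DumpCheckpoint, since the Signal Catcher passes it for itself and for the suspended threads inside RunCheckpoint(), before WaitForThreadsToRunThroughCheckpoint() calls Increment().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch of a counting barrier with a timed wait (not ART code).
class TimedBarrier {
 public:
  // Called by each participant when it is done (like Barrier::Pass).
  void Pass() {
    std::lock_guard<std::mutex> lock(mu_);
    --count_;  // may go negative if Pass() happens before Increment()
    if (count_ == 0) cv_.notify_all();
  }

  // Adds `delta` expected participants and waits until the count returns to 0.
  // Returns true if the wait timed out, like Barrier::Increment.
  bool Increment(int delta, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    count_ += delta;
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    bool done = cv_.wait_until(lock, deadline, [this] { return count_ == 0; });
    return !done;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_ = 0;
};
```

In real use, Pass() is called from the dumping threads while the requester blocks in Increment(); the absolute deadline keeps spurious wakeups from extending the total wait.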

2.3.6 RequestCheckpoint

thread.cc

bool Thread::RequestCheckpoint(Closure* function) {
  union StateAndFlags old_state_and_flags;
  old_state_and_flags.as_int = tls32_.state_and_flags.as_int;
  if (old_state_and_flags.as_struct.state != kRunnable) {
    return false;  // 1. Fail, thread is suspended and so can't run a checkpoint.
  }

  uint32_t available_checkpoint = kMaxCheckpoints;
  for (uint32_t i = 0 ; i < kMaxCheckpoints; ++i) {
    if (tlsPtr_.checkpoint_functions[i] == nullptr) {
      available_checkpoint = i;
      break;
    }
  }
  if (available_checkpoint == kMaxCheckpoints) {
    // 2. No checkpoint functions available, we can't run a checkpoint
    return false;
  }
  // 3. Install checkpoint_function
  tlsPtr_.checkpoint_functions[available_checkpoint] = function;

  // Checkpoint function installed now install flag bit.
  // We must be runnable to request a checkpoint.
  DCHECK_EQ(old_state_and_flags.as_struct.state, kRunnable);
  union StateAndFlags new_state_and_flags;
  new_state_and_flags.as_int = old_state_and_flags.as_int;
  new_state_and_flags.as_struct.flags |= kCheckpointRequest;
  bool success = tls32_.state_and_flags.as_atomic_int.CompareExchangeStrongSequentiallyConsistent(
      old_state_and_flags.as_int, new_state_and_flags.as_int);
  if (UNLIKELY(!success)) {
    // The thread changed state before the checkpoint was installed.
    CHECK_EQ(tlsPtr_.checkpoint_functions[available_checkpoint], function);
    tlsPtr_.checkpoint_functions[available_checkpoint] = nullptr;
  } else {
    CHECK_EQ(ReadFlag(kCheckpointRequest), true);
    TriggerSuspend();
  }
  return success;
}

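This method only applies to threads in the kRunnable state. It stores checkpoint_function in a free slot of the thread's checkpoint_functions and then sets the kCheckpointRequest flag with a compare-and-swap on the thread's state word. When the thread reaches a checkpoint, it runs the functions in checkpoint_functions, which in our case means DumpCheckpoint::Run. It returns false if the thread is not runnable, if no slot is free, or if the thread changed state before the flag could be set; RunCheckpoint then treats the thread as suspended (or retries if it is in fact still runnable).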
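
The install-then-publish logic can be sketched with std::atomic. This is a hypothetical model, not ART code: FlagThread is invented, and the packed layout (flags in the low 16 bits, state in the high 16 bits) and the constant values are assumptions for illustration, not ART's definitions. TriggerSuspend() is omitted.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Hypothetical sketch of RequestCheckpoint(). The bit layout and constants below
// are illustrative assumptions, not ART's actual StateAndFlags definition.
constexpr uint32_t kRunnableState = 1;
constexpr uint32_t kSuspendedState = 2;
constexpr uint32_t kCheckpointRequestFlag = 1u << 1;
constexpr uint32_t kMaxCheckpointSlots = 3;

using Closure = void (*)();

struct FlagThread {
  std::atomic<uint32_t> state_and_flags{kRunnableState << 16};  // state high, flags low
  std::array<Closure, kMaxCheckpointSlots> checkpoint_functions{};  // all nullptr

  static uint32_t State(uint32_t v) { return v >> 16; }

  bool RequestCheckpoint(Closure function) {
    uint32_t old_value = state_and_flags.load();
    if (State(old_value) != kRunnableState) return false;  // requester must run it instead
    uint32_t slot = kMaxCheckpointSlots;
    for (uint32_t i = 0; i < kMaxCheckpointSlots; ++i) {
      if (checkpoint_functions[i] == nullptr) { slot = i; break; }
    }
    if (slot == kMaxCheckpointSlots) return false;  // no free slot
    checkpoint_functions[slot] = function;  // install the function first...
    uint32_t new_value = old_value | kCheckpointRequestFlag;
    // ...then publish the flag, but only if the state word is unchanged since we read it.
    if (!state_and_flags.compare_exchange_strong(old_value, new_value)) {
      checkpoint_functions[slot] = nullptr;  // thread changed state: undo the install
      return false;
    }
    return true;
  }
};
```

The compare-and-swap is what makes the state check meaningful: a thread that leaves the runnable state between the load and the swap causes the request to fail instead of leaving a flag on a thread that will never see it.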

2.4 Summary

From the analysis above:

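  • In DumpForSigQuit, RunCheckpoint does most of the work. It dumps threads in one of two ways, depending on whether they are suspended or kRunnable:

    • Suspended threads have their suspend count raised and are collected in suspended_count_modified_threads; DumpCheckpoint::Run (that is, the dump) is then executed for each of them. In this case the dump of every suspended thread runs on the “Signal Catcher” thread, which also dumps itself.
    • For kRunnable threads, RequestCheckpoint installs a checkpoint_function; when such a thread reaches a checkpoint, it runs the functions in its checkpoint_functions. In this case each thread performs its own dump.
  • After RunCheckpoint returns, checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint) is called:
    • threads_running_checkpoint is the size of the whole thread list, that is, the number of threads that need to be dumped.
    • Each thread calls barrier_.Pass() after finishing its dump in Run, which decrements count_; count_ reaching 0 means every thread has finished. This method waits for barrier_'s count to reach 0, and if that does not happen before the timeout, the threads that have not finished are handled specially.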
