This article is based on Android 7.1.

1. Starting the SignalCatcher thread

1.1 StartSignalCatcher

runtime.cc

void Runtime::InitNonZygoteOrPostFork(JNIEnv* env, bool is_system_server,
                                      NativeBridgeAction action, const char* isa) {
  ...
  StartSignalCatcher();
  ...
}

As shown above, the SignalCatcher thread is started from InitNonZygoteOrPostFork.

runtime.cc

void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher(stack_trace_file_);
  }
}

A SignalCatcher is created only when the process is not zygote. It follows that the zygote process has no SignalCatcher thread, which can be confirmed with adb shell ps -t.

1.2 Creating the SignalCatcher

1.2.1 SignalCatcher(…)

signal_catcher.cc

SignalCatcher::SignalCatcher(const std::string& stack_trace_file)
    : stack_trace_file_(stack_trace_file),
      lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);

  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {
    cond_.Wait(self);
  }
}

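CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread") effectively calls pthread_create(&pthread_, nullptr, &Run, this): it creates a new thread that executes Run(this) and stores the new thread's ID in pthread_. The constructor then waits on cond_ until the new thread has set thread_ (see 1.2.2).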
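
The handshake in this constructor (create the thread, then block until the thread has published itself) can be sketched with standard C++ primitives. This is only a minimal sketch under assumptions: the class name Worker and the members started_ and published_id_ are hypothetical, and std::thread, std::mutex and std::condition_variable stand in for the raw pthread and ART's own Mutex and ConditionVariable.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for SignalCatcher: the constructor returns only after
// the new thread has published itself, mirroring `while (thread_ == nullptr)`.
class Worker {
 public:
  Worker() {
    thread_ = std::thread(&Worker::Run, this);
    std::unique_lock<std::mutex> lock(mu_);
    // The predicate form re-checks the condition after every wakeup,
    // so spurious wakeups are harmless.
    cv_.wait(lock, [this] { return started_; });
  }

  ~Worker() { thread_.join(); }

  // ID published by the worker thread before the constructor returned.
  std::thread::id id() const { return published_id_; }

 private:
  void Run() {
    std::lock_guard<std::mutex> lock(mu_);
    published_id_ = std::this_thread::get_id();
    started_ = true;
    cv_.notify_all();  // Counterpart of cond_.Broadcast(self).
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool started_ = false;
  std::thread::id published_id_;
  std::thread thread_;
};
```

Once a Worker has been constructed, id() is guaranteed to return the worker thread's ID, because the write happened under the same mutex the constructor waited on.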

1.2.2 SignalCatcher::Run

signal_catcher.cc

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  // Attach the current thread to the current JavaVM.
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // See 1.2.3.
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
      case SIGQUIT:
        signal_catcher->HandleSigQuit();
        break;
      case SIGUSR1:
        signal_catcher->HandleSigUsr1();
        break;
      default:
        LOG(ERROR) << "Unexpected signal %d" << signal_number;
        break;
    }
  }
}

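As shown above, Run adds the signals it wants to sigwait() on (SIGQUIT and SIGUSR1) to a set, calls WaitForSignal to wait for one of them to arrive, and then dispatches on the signal number.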

1.2.3 WaitForSignal

signal_catcher.cc

int SignalCatcher::WaitForSignal(Thread* self, SignalSet& signals) {
  ScopedThreadStateChange tsc(self, kWaitingInMainSignalCatcherLoop);

  // Signals for sigwait() must be blocked but not ignored.  We
  // block signals like SIGQUIT for all threads, so the condition
  // is met.  When the signal hits, we wake up, without any signal
  // handlers being invoked.
  int signal_number = signals.Wait();
  if (!ShouldHalt()) {
    // Let the user know we got the signal, just in case the system's too screwed for us to
    // actually do what they want us to do...
    LOG(INFO) << *self << ": reacting to signal " << signal_number;

    // If anyone's holding locks (which might prevent us from getting back into state Runnable), say so...
    Runtime::Current()->DumpLockHolders(LOG(INFO));
  }

  return signal_number;
}

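Note the comment above: the signals passed to sigwait() must be blocked, but not ignored. Signals such as SIGQUIT are blocked for all threads (see the next section), so this condition is met. When a signal arrives, the waiting thread wakes up and no signal handler is invoked.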
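
The block-then-sigwait() pattern described in that comment can be tried in isolation. The following is a minimal sketch that uses plain POSIX calls instead of ART's SignalSet wrapper; the function name BlockRaiseAndWait is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Blocks `signo` for the calling thread, makes it pending on that thread,
// then collects it synchronously with sigwait(). No signal handler runs.
// Returns the collected signal number, or -1 if any call fails.
int BlockRaiseAndWait(int signo) {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, signo);

  // A signal must be blocked (not ignored) before sigwait() can collect it.
  if (pthread_sigmask(SIG_BLOCK, &set, nullptr) != 0) return -1;

  // Direct the signal at this thread; since it is blocked, it stays pending
  // instead of triggering its default action.
  if (pthread_kill(pthread_self(), signo) != 0) return -1;

  int received = 0;
  if (sigwait(&set, &received) != 0) return -1;
  return received;
}
```

Calling BlockRaiseAndWait(SIGUSR1) returns SIGUSR1 without terminating the process, even though the default action of SIGUSR1 is to terminate, because the signal is consumed by sigwait() while it is still blocked.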

1.2.4 BlockSignals

runtime.cc

bool Runtime::Init(RuntimeArgumentMap&& runtime_options_in) {
  ...
  BlockSignals();
  ...
}

void Runtime::BlockSignals() {
  SignalSet signals;
  signals.Add(SIGPIPE);
  // SIGQUIT is used to dump the runtime's state (including stack traces).
  signals.Add(SIGQUIT);
  // SIGUSR1 is used to initiate a GC.
  signals.Add(SIGUSR1);
  signals.Block();
}

These signals are blocked while the runtime is being created, in Runtime::Init.
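
Blocking the signals once, early in Runtime::Init, is enough to cover threads created later, because under POSIX a thread created with pthread_create inherits the signal mask of its creator. The sketch below checks that property with plain POSIX calls; the function names are hypothetical and not part of ART.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Thread body: reports whether SIGQUIT is blocked in this thread's mask.
static void* ReportSigQuitBlocked(void* out) {
  sigset_t current;
  sigemptyset(&current);
  // With a null `set` argument, pthread_sigmask only queries the mask.
  pthread_sigmask(SIG_BLOCK, nullptr, &current);
  *static_cast<bool*>(out) = (sigismember(&current, SIGQUIT) == 1);
  return nullptr;
}

// Blocks SIGQUIT in the caller, then checks that a thread created afterwards
// starts with SIGQUIT blocked as well.
bool ChildInheritsBlockedSigQuit() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGQUIT);
  if (pthread_sigmask(SIG_BLOCK, &set, nullptr) != 0) return false;

  bool blocked_in_child = false;
  pthread_t child;
  if (pthread_create(&child, nullptr, &ReportSigQuitBlocked, &blocked_in_child) != 0) {
    return false;
  }
  pthread_join(child, nullptr);
  return blocked_in_child;
}
```

With every thread blocking SIGQUIT, a process-directed SIGQUIT stays pending until the one thread sitting in sigwait() collects it.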

2. HandleSigQuit

When SIGQUIT (signal 3) is received, signal_catcher->HandleSigQuit() is called to dump runtime information and stack traces.

2.1 SignalCatcher::HandleSigQuit

signal_catcher.cc

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  // ----- pid 2830 at 2017-11-16 11:22:53 -----
  os << "\n"
     << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  // Cmd line: system_server
  DumpCmdLine(os);

  std::string fingerprint = runtime->GetFingerprint();
  // Build fingerprint: 'Xiaomi/cancro_wc_lte/cancro:6.0.1/MMB29M/1.1.1:user/test-keys'
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  // ABI: 'arm'
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";
  // Build type: optimized
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);

  if ((false)) {
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  // ----- end 2830 -----
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}

2.2 DumpForSigQuit

runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  // Zygote loaded classes=4188 post zygote classes=3570
  GetClassLinker()->DumpForSigQuit(os);
  // Intern table: 59686 strong; 10043 weak
  GetInternTable()->DumpForSigQuit(os);
  // JNI: CheckJNI is off; globals=1993 (plus 2995 weak)
  // Libraries: /system/lib/hw/gralloc.msm8974.so ...
  GetJavaVM()->DumpForSigQuit(os);
  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  TrackedAllocators::Dump(os);
  os << "\n";

  thread_list_->DumpForSigQuit(os);
  BaseMutex::DumpAll(os);
}

thread_list_->DumpForSigQuit(os) is the key call: it dumps the stack traces.

2.3 thread_list_->DumpForSigQuit

2.3.1 ThreadList::DumpForSigQuit

thread_list.cc

void ThreadList::DumpForSigQuit(std::ostream& os) {
  {
    ScopedObjectAccess soa(Thread::Current());
    // Only print if we have samples.
    if (suspend_all_historam_.SampleSize() > 0) {
      Histogram<uint64_t>::CumulativeData data;
      suspend_all_historam_.CreateHistogram(&data);
      suspend_all_historam_.PrintConfidenceIntervals(os, 0.99, data);  // Dump time to suspend.
    }
  }
  bool dump_native_stack = Runtime::Current()->GetDumpNativeStackOnSigQuit();
  Dump(os, dump_native_stack);
  // Dump the stack traces of threads in this process that are not attached to the runtime.
  DumpUnattachedThreads(os, dump_native_stack);
}

2.3.2 ThreadList::Dump

thread_list.cc

void ThreadList::Dump(std::ostream& os, bool dump_native_stack) {
  {
    MutexLock mu(Thread::Current(), *Locks::thread_list_lock_);
    os << "DALVIK THREADS (" << list_.size() << "):\n";
  }
  DumpCheckpoint checkpoint(&os, dump_native_stack);
  size_t threads_running_checkpoint;
  {
    // Use SOA to prevent deadlocks if multiple threads are calling Dump() at the same time.
    ScopedObjectAccess soa(Thread::Current());
    threads_running_checkpoint = RunCheckpoint(&checkpoint);
  }
  if (threads_running_checkpoint != 0) {
    checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint);
  }
}

As shown above, Dump creates a DumpCheckpoint object named checkpoint and then calls RunCheckpoint(&checkpoint). Let's first look at what DumpCheckpoint is.

2.3.3 DumpCheckpoint

thread_list.cc

class DumpCheckpoint FINAL : public Closure {
 public:
  DumpCheckpoint(std::ostream* os, bool dump_native_stack)
      : os_(os),
        barrier_(0),
        backtrace_map_(dump_native_stack ? BacktraceMap::Create(getpid()) : nullptr),
        dump_native_stack_(dump_native_stack) {}

  void Run(Thread* thread) OVERRIDE {
    Thread* self = Thread::Current();
    std::ostringstream local_os;
    {
      ScopedObjectAccess soa(self);
      // 1. Dump the traces and related state.
      if (!timeout_threads_.empty()
          && find(timeout_threads_.begin(), timeout_threads_.end(), thread) != timeout_threads_.end()) {
        Thread::DumpState(local_os, thread, thread->GetTid(), true);
      } else {
        thread->Dump(local_os, dump_native_stack_, backtrace_map_.get());
      }
    }
    local_os << "\n";
    {
      // Use the logging lock to ensure serialization when writing to the common ostream.
      MutexLock mu(self, *Locks::logging_lock_);
      *os_ << local_os.str();
    }
    // 2. Once a thread has been dumped in Run, barrier_ is notified and its count_ is
    // decremented by 1. When count_ reaches 0, every thread has finished its dump. The
    // finished thread is also removed from thread_list_.
    barrier_.Pass(self, thread);
  }

 private:
  // The common stream that will accumulate all the dumps.
  std::ostream* const os_;
  // The barrier to be passed through and for the requestor to wait upon.
  Barrier barrier_;
  // A backtrace map, so that all threads use a shared info and don't reacquire/parse separately.
  std::unique_ptr<BacktraceMap> backtrace_map_;
  // Whether we should dump the native stack.
  const bool dump_native_stack_;
  std::list<Thread*> timeout_threads_;
};

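Two things are worth noting:

The DumpCheckpoint constructor only initializes member variables.
The Run method of DumpCheckpoint does two things:
- It is where the dump is actually performed.
- After a thread has been dumped in Run, barrier_ is notified and its count_ is decremented by 1. When count_ reaches 0, every thread has finished its dump. The thread that finished is also removed from thread_list_.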
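
The output side of Run (format into a thread-local stream first, then append to the shared stream under a lock) can be illustrated outside ART. This is a simplified sketch under assumptions: SharedDump, Append and DumpFromThreads are hypothetical names, a std::mutex plays the role of logging_lock_, and a plain counter stands in for barrier_.Pass().

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical, simplified analogue of the output path in DumpCheckpoint::Run.
class SharedDump {
 public:
  void Append(int thread_index) {
    std::ostringstream local;
    local << "thread " << thread_index << "\n";  // Formatting happens unlocked.
    std::lock_guard<std::mutex> lock(mu_);       // Plays the role of logging_lock_.
    out_ << local.str();                         // Each entry lands in one piece.
    ++finished_;                                 // Stand-in for barrier_.Pass().
  }

  std::string str() {
    std::lock_guard<std::mutex> lock(mu_);
    return out_.str();
  }

  int finished() {
    std::lock_guard<std::mutex> lock(mu_);
    return finished_;
  }

 private:
  std::mutex mu_;
  std::ostringstream out_;
  int finished_ = 0;
};

// Runs Append() concurrently on `n` threads and returns the combined dump.
std::string DumpFromThreads(int n) {
  SharedDump dump;
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&dump, i] { dump.Append(i); });
  }
  for (auto& t : threads) t.join();
  assert(dump.finished() == n);
  return dump.str();
}
```

Building the text locally keeps the lock held only for the final append, so slow stack walking in one thread does not stall the others, and entries from different threads never interleave.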

Next, let's see what RunCheckpoint does.

2.3.4 ThreadList::RunCheckpoint

thread_list.cc

size_t ThreadList::RunCheckpoint(Closure* checkpoint_function, bool isDumpCheckpoint) {
  // isDumpCheckpoint defaults to false.
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertNotExclusiveHeld(self);
  Locks::thread_list_lock_->AssertNotHeld(self);
  Locks::thread_suspend_count_lock_->AssertNotHeld(self);

  std::vector<Thread*> suspended_count_modified_threads;
  size_t count = 0;
  {
    // Call a checkpoint function for each thread, threads which are suspend get their checkpoint
    // manually called.
    MutexLock mu(self, *Locks::thread_list_lock_);
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    if (isDumpCheckpoint) {
      ((DumpCheckpoint*)checkpoint_function)->SetThreadList(self, list_);
    }
    // 1. Handle each thread in list_ according to its state.
    count = list_.size();
    for (const auto& thread : list_) {
      if (thread != self) {
        while (true) {
          if (thread->RequestCheckpoint(checkpoint_function)) {
            // This thread will run its checkpoint some time in the near future.
            break;
          } else {
            // For a suspended thread, first raise its suspend count, then add it to
            // suspended_count_modified_threads, which is processed further below.
            if (thread->GetState() == kRunnable) {
              // Spurious fail, try again.
              continue;
            }
            thread->ModifySuspendCount(self, +1, nullptr, false);
            suspended_count_modified_threads.push_back(thread);
            break;
          }
        }
      }
    }
  }

  // Run the checkpoint on ourself while we wait for threads to suspend.
  // 2. For the Signal Catcher thread itself, call the checkpoint function's Run here
  // to dump the thread.
  checkpoint_function->Run(self);

  // Run the checkpoint on the suspended threads.
  for (const auto& thread : suspended_count_modified_threads) {
    if (!thread->IsSuspended()) {
      if (ATRACE_ENABLED()) {
        std::ostringstream oss;
        thread->ShortDump(oss);
        ATRACE_BEGIN((std::string("Waiting for suspension of thread ") + oss.str()).c_str());
      }
      // Busy wait until the thread is suspended.
      const uint64_t start_time = NanoTime();
      do {
        ThreadSuspendSleep(kThreadSuspendInitialSleepUs);
      } while (!thread->IsSuspended());
      const uint64_t total_delay = NanoTime() - start_time;
      // Shouldn't need to wait for longer than 1000 microseconds.
      constexpr uint64_t kLongWaitThreshold = MsToNs(1);
      ATRACE_END();
    }
    // We know for sure that the thread is suspended at this point.
    // 3. For a suspended thread, call the checkpoint function's Run on its behalf.
    checkpoint_function->Run(thread);
    {
      MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
      // 4. The thread has been dumped, so lower its suspend count by 1 again.
      thread->ModifySuspendCount(self, -1, nullptr, false);
    }
  }

  {
    // 5. Imitate ResumeAll, threads may be waiting on Thread::resume_cond_ since we raised their
    // suspend count. Now the suspend_count_ is lowered so we must do the broadcast.
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    Thread::resume_cond_->Broadcast(self);
  }

  // The return value is the size of the thread list.
  return count;
}

2.3.5 WaitForThreadsToRunThroughCheckpoint

thread_list.cc

class DumpCheckpoint FINAL : public Closure {
 public:
  void WaitForThreadsToRunThroughCheckpoint(size_t threads_running_checkpoint) {
    Thread* self = Thread::Current();
    ThreadState new_state = kWaitingForCheckPointsToRun;
    if (Locks::abort_lock_->IsExclusiveHeld(self) && self->GetState() == kRunnable) {
      new_state = kRunnable;
    }
    ScopedThreadStateChange tsc(self, new_state);
    bool timed_out = barrier_.Increment(self, threads_running_checkpoint, kDumpWaitTimeout);
    if (timed_out) {
      // Avoid a recursive abort.
      LOG(ERROR) << "Unexpected time out during dump checkpoint.";
      std::list<Thread*> list = barrier_.GetThreadList(self);
      timeout_threads_.assign(list.begin(), list.end());
      {
        // abnormal dump
        MutexLock mu(self, *Locks::logging_lock_);
        *os_ << " ------- " << timeout_threads_.size()
             << " threads dump checkpoint timed out --------\n\n";
      }
      for (const auto& thread : timeout_threads_) {
        bool contains = false;
        {
          MutexLock mu(self, *Locks::thread_list_lock_);
          std::list<Thread*> thread_list = Runtime::Current()->GetThreadList()->GetList();
          contains = find(thread_list.begin(), thread_list.end(), thread) != thread_list.end();
        }
        // 1. detached thread should have already passed the barrier
        // 2. only kRunnable thread have been set a checkpoint function
        // 3. non kRunnable thread is dumped by this thread, will not timeout
        if (contains && thread->HasCheckpointFunction(this)) {
          thread->RunCheckpointFunction();
        }
      }
    }
  }
};

barrier.cc

bool Barrier::Increment(Thread* self, int delta, uint32_t timeout_ms) {
  MutexLock mu(self, lock_);
  SetCountLocked(self, count_ + delta);
  bool timed_out = false;
  if (count_ != 0) {
    uint32_t timeout_ns = 0;
    uint64_t abs_timeout = NanoTime() + MsToNs(timeout_ms);
    for (;;) {
      timed_out = condition_.TimedWait(self, timeout_ms, timeout_ns);
      if (timed_out || count_ == 0) return timed_out;
      // Compute time remaining on timeout.
      uint64_t now = NanoTime();
      int64_t time_left = abs_timeout - now;
      if (time_left <= 0) return true;
      timeout_ns = time_left % (1000*1000);
      timeout_ms = time_left / (1000*1000);
    }
  }
  return timed_out;
}

Increment returns in two cases: on timeout, or when count_ == 0 (that is, every thread has finished its dump).

It follows that WaitForThreadsToRunThroughCheckpoint waits for all threads to finish their dumps, and applies special handling to threads that fail to finish before the timeout.
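
The counting scheme behind Pass and Increment can be sketched with standard C++ primitives. This is a simplified illustration, not ART's Barrier: the class name CountingBarrier is hypothetical, and std::condition_variable::wait_for replaces the manual remaining-time loop. Note that the count starts at 0 (barrier_(0) in DumpCheckpoint), so threads that finish before the requester calls Increment drive it negative, and Increment then adds the expected number of threads back.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified analogue of the barrier used by DumpCheckpoint.
class CountingBarrier {
 public:
  // Called by each thread once its work is done: count_ - 1.
  void Pass() {
    std::lock_guard<std::mutex> lock(mu_);
    --count_;
    if (count_ == 0) cv_.notify_all();
  }

  // Adds `delta` to the count, then waits until the count reaches zero or the
  // timeout expires. Returns true on timeout, matching Barrier::Increment.
  bool Increment(int delta, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    count_ += delta;
    // wait_for returns the predicate's final value: false means the deadline
    // passed while count_ was still non-zero.
    bool reached_zero = cv_.wait_for(lock, timeout, [this] { return count_ == 0; });
    return !reached_zero;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_ = 0;
};
```

If two threads call Pass() and the requester then calls Increment(2, ...), the count is already zero and the call returns false immediately. If only one thread has passed, the call waits for the full timeout and returns true.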

2.3.6 RequestCheckpoint

thread.cc

bool Thread::RequestCheckpoint(Closure* function) {
  union StateAndFlags old_state_and_flags;
  old_state_and_flags.as_int = tls32_.state_and_flags.as_int;
  if (old_state_and_flags.as_struct.state != kRunnable) {
    return false;  // 1. Fail, thread is suspended and so can't run a checkpoint.
  }

  uint32_t available_checkpoint = kMaxCheckpoints;
  for (uint32_t i = 0 ; i < kMaxCheckpoints; ++i) {
    if (tlsPtr_.checkpoint_functions[i] == nullptr) {
      available_checkpoint = i;
      break;
    }
  }
  if (available_checkpoint == kMaxCheckpoints) {
    // 2. No checkpoint functions available, we can't run a checkpoint
    return false;
  }
  // 3. Install the checkpoint function.
  tlsPtr_.checkpoint_functions[available_checkpoint] = function;

  // Checkpoint function installed now install flag bit.
  // We must be runnable to request a checkpoint.
  DCHECK_EQ(old_state_and_flags.as_struct.state, kRunnable);
  union StateAndFlags new_state_and_flags;
  new_state_and_flags.as_int = old_state_and_flags.as_int;
  new_state_and_flags.as_struct.flags |= kCheckpointRequest;
  bool success = tls32_.state_and_flags.as_atomic_int.CompareExchangeStrongSequentiallyConsistent(
      old_state_and_flags.as_int, new_state_and_flags.as_int);
  if (UNLIKELY(!success)) {
    // The thread changed state before the checkpoint was installed.
    CHECK_EQ(tlsPtr_.checkpoint_functions[available_checkpoint], function);
    tlsPtr_.checkpoint_functions[available_checkpoint] = nullptr;
  } else {
    CHECK_EQ(ReadFlag(kCheckpointRequest), true);
    TriggerSuspend();
  }
  return success;
}

This method effectively targets kRunnable threads. It installs the checkpoint function on the thread, and when the thread reaches a checkpoint it runs the functions stored in checkpoint_functions. In our case that means DumpCheckpoint::Run.
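
The core of RequestCheckpoint is a compare-and-swap on a single word that packs the thread state together with its flags, so the flag is set only if the thread was still runnable at the moment of the update. The sketch below isolates that idea. The bit layout, the constant values and the function names are assumptions made for illustration and do not match ART's real StateAndFlags encoding.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical encoding: low 16 bits hold flags, high 16 bits hold the state.
constexpr uint32_t kCheckpointRequestFlag = 1u << 1;
constexpr uint32_t kStateRunnable = 1u;
constexpr uint32_t kStateWaiting = 2u;

constexpr uint32_t Pack(uint32_t state, uint32_t flags) { return (state << 16) | flags; }
constexpr uint32_t StateOf(uint32_t word) { return word >> 16; }

// Sets the checkpoint flag only if the thread is observed runnable and its
// state-and-flags word is unchanged between the read and the update.
// Returns false if the thread is not runnable or changed state in between.
bool TryRequestCheckpoint(std::atomic<uint32_t>& state_and_flags) {
  uint32_t observed = state_and_flags.load();
  if (StateOf(observed) != kStateRunnable) return false;
  uint32_t desired = observed | kCheckpointRequestFlag;
  // Default memory order is sequentially consistent.
  return state_and_flags.compare_exchange_strong(observed, desired);
}
```

A failed exchange corresponds to the "thread changed state before the checkpoint was installed" branch: the caller undoes the installation and, in RunCheckpoint, either retries or falls back to the suspended-thread path.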

2.4 Summary

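The analysis above shows the following:

During DumpForSigQuit, RunCheckpoint does most of the work. It dumps threads in one of two ways, depending on whether a thread is suspended or kRunnable:
- A suspended thread is recorded in suspended_count_modified_threads. Later, DumpCheckpoint::Run (the dump itself) is executed for every thread in suspended_count_modified_threads. In this case the dump of each suspended thread runs on the "Signal Catcher" thread.
- A kRunnable thread receives a RequestCheckpoint call, which installs the checkpoint function on it. When the thread reaches a checkpoint, it runs the functions in checkpoint_functions. In this case the dump runs on each thread itself.
After RunCheckpoint returns, checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint) is called:
- threads_running_checkpoint is the size of the whole thread list, that is, the number of threads that need to be dumped.
- After a thread has been dumped in Run, barrier_ is notified and its count_ is decremented by 1. When count_ reaches 0, every thread has finished its dump, and each finished thread is removed from thread_list_. The wait here is for the count of barrier_ to reach 0. If that does not happen before the timeout, the threads that timed out receive special handling.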
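
The two dump paths summarized above can be modeled in a few lines. This is a single-threaded toy model under assumptions, not ART code: FakeThread, RunCheckpointModel and the log strings are hypothetical, and a std::function stands in for the installed checkpoint function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a thread as seen by the requester.
struct FakeThread {
  std::string name;
  bool runnable;
  std::function<void()> pending_checkpoint;  // Installed only for runnable threads.
};

// Models RunCheckpoint: a runnable thread only gets the closure installed and
// runs it later by itself; a suspended thread is dumped by the requester.
// Returns the number of threads, like the real function's return value.
std::size_t RunCheckpointModel(std::vector<FakeThread>& threads,
                               std::vector<std::string>& log) {
  for (auto& thread : threads) {
    if (thread.runnable) {
      const std::string name = thread.name;
      thread.pending_checkpoint = [&log, name] { log.push_back(name + " dumped by itself"); };
    } else {
      log.push_back(thread.name + " dumped by requester");
    }
  }
  return threads.size();
}
```

In the model, the entry for a runnable thread appears in the log only once that thread invokes its pending_checkpoint, which is the moment the real thread would reach a checkpoint and pass the barrier.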
