测试

根据 Thread.setDefaultUncaughtExceptionHandler 的方法文档

Uncaught exception handling is controlled first by the thread, then by the thread's ThreadGroup object and finally by the default uncaught exception handler. If the thread does not have an explicit uncaught exception handler set, and the thread's thread group (including parent thread groups) does not specialize its uncaughtException method, then the default handler's uncaughtException method will be invoked.

public void uncaughtException(Thread t, Throwable e) {
    if (parent != null) {
        parent.uncaughtException(t, e);
    } else {
        Thread.UncaughtExceptionHandler ueh =
            Thread.getDefaultUncaughtExceptionHandler();
        if (ueh != null) {
            ueh.uncaughtException(t, e);
        } else if (!(e instanceof ThreadDeath)) {
            System.err.print("Exception in thread \\\\""
                             + t.getName() + "\\\\" ");
            e.printStackTrace(System.err);
        }
    }
}

当发生 Uncaught Exception 时,将会按照 Thread.uncaughtExceptionHandler -> ThreadGroup.uncaughtException -> Thread.defaultUncaughtExceptionHandler 的优先级次序去寻找异常处理器

而 ThreadGroup.uncaughtException 的默认实现仅仅是像事件冒泡那样把异常往上传递,跑到 root ThreadGroup 后中止冒泡并交由 DefaultUncaughtExceptionHandler 处理 or 打印至标准错误流,所以可以把 ThreadGroup.uncaughtException 当作透明的层忽略之

下面仅考虑 Thread.uncaughtExceptionHandler、DefaultUncaughtExceptionHandler 和 主线程 三个因素

主线程

Thread.uncaughtExceptionHandler/DefaultUncaughtExceptionHandler 可以捕获异常,但无法改变 app 被 blocked 住,然后出现 ANR(即使点击 等待 依然被 blocked 住),点击 确定 后被 kill 的命运

2021-06-12 11:13:29.464 28055-28055/com.example.myapplication D/AndroidRuntime: Shutting down VM

    --------- beginning of crash
2021-06-12 11:13:29.465 28055-28055/com.example.myapplication E/AndroidRuntime: FATAL EXCEPTION: main
    Process: com.example.myapplication, PID: 28055
    java.lang.RuntimeException: Test Exception
        at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:58)
        at com.example.myapplication.MainActivity.onCreate$lambda-2(MainActivity.kt:30)
        at com.example.myapplication.MainActivity.lambda$hUPUYmntOngyO5ji3KzmjKQ19D4(Unknown Source:0)
        at com.example.myapplication.-$$Lambda$MainActivity$hUPUYmntOngyO5ji3KzmjKQ19D4.onClick(Unknown Source:0)
        at android.view.View.performClick(View.java:7509)
        at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1119)
        at android.view.View.performClickInternal(View.java:7486)
        at android.view.View.access$3600(View.java:841)
        at android.view.View$PerformClick.run(View.java:28709)
        at android.os.Handler.handleCallback(Handler.java:938)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:236)
        at android.app.ActivityThread.main(ActivityThread.java:8061)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:656)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:967)

// 如果 Thread.uncaughtExceptionHandler != null or DefaultUncaughtExceptionHandler != null 则能够在自己的 UncaughtExceptionHandler 里捕获异常
2021-06-12 11:13:29.471 28055-28055/com.example.myapplication E/cyrus: main-2 UncaughtExceptionHandler: Test Exception
    java.lang.RuntimeException: Test Exception
        at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:58)
        at com.example.myapplication.MainActivity.onCreate$lambda-2(MainActivity.kt:30)
        at com.example.myapplication.MainActivity.lambda$hUPUYmntOngyO5ji3KzmjKQ19D4(Unknown Source:0)
        at com.example.myapplication.-$$Lambda$MainActivity$hUPUYmntOngyO5ji3KzmjKQ19D4.onClick(Unknown Source:0)
        at android.view.View.performClick(View.java:7509)
        at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1119)
        at android.view.View.performClickInternal(View.java:7486)
        at android.view.View.access$3600(View.java:841)
        at android.view.View$PerformClick.run(View.java:28709)
        at android.os.Handler.handleCallback(Handler.java:938)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:236)
        at android.app.ActivityThread.main(ActivityThread.java:8061)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:656)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:967)

// app 没有立刻崩溃,但是进入 blocked 状态,点击 app 没响应然后触发 ANR
2021-06-12 11:17:10.449 1764-28190/? I/ActivityManager: Collecting stacks for pid 28055
2021-06-12 11:17:10.449 1764-28190/? I/system_server: libdebuggerd_client: started dumping process 28055
2021-06-12 11:17:10.450 697-697/? I/tombstoned: registered intercept for pid 28055 and type kDebuggerdJavaBacktrace
2021-06-12 11:17:10.450 28055-28065/com.example.myapplication I/e.myapplicatio: Thread[6,tid=28065,WaitingInMainSignalCatcherLoop,Thread*=0xb400007271841000,peer=0x13780260,"Signal Catcher"]: reacting to signal 3
2021-06-12 11:17:10.528 697-697/? I/tombstoned: received crash request for pid 28055
2021-06-12 11:17:10.528 697-697/? I/tombstoned: found intercept fd 512 for pid 28055 and type kDebuggerdJavaBacktrace
2021-06-12 11:17:10.528 28055-28065/com.example.myapplication I/e.myapplicatio: Wrote stack traces to tombstoned
2021-06-12 11:17:10.529 1764-28190/? I/system_server: libdebuggerd_client: done dumping process 28055

// 需要一段时间来 dump ANR 信息
2021-06-12 11:17:15.839 1764-28190/? E/ActivityManager: ANR in com.example.myapplication (com.example.myapplication/.MainActivity)
    PID: 28055
    Reason: Input dispatching timed out (com.example.myapplication/com.example.myapplication.MainActivity, 7fc1f1 com.example.myapplication/com.example.myapplication.MainActivity (server) is not responding. Waited 8001ms for MotionEvent(action=DOWN))
    Parent: com.example.myapplication/.MainActivity
    Load: 0.13 / 0.47 / 0.83
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=15206960
    full avg10=0.00 avg60=0.00 avg300=0.00 total=5127529
    ----- End output from /proc/pressure/memory -----

    CPU usage from 0ms to 5796ms later (2021-06-12 11:17:09.982 to 2021-06-12 11:17:15.778):
      0.1% 1465/media.codec: 0% user + 0% kernel / faults: 38284 minor
      18% 1764/system_server: 8.2% user + 10% kernel / faults: 4408 minor
      0% 1509/media.swcodec: 0% user + 0% kernel / faults: 21804 minor
      0% 899/media.hwcodec: 0% user + 0% kernel / faults: 7313 minor
      0.1% 5899/com.sohu.inputmethod.sogou: 0.1% user + 0% kernel / faults: 1684 minor
      2% 895/kworker/u16:16: 0% user + 2% kernel
      2% 2930/com.android.phone: 1.2% user + 0.8% kernel / faults: 1835 minor
      1.8% 1284/adbd: 0.5% user + 1.3% kernel
      0% 1426/media.extractor: 0% user + 0% kernel / faults: 3197 minor
      1.8% 25102/kworker/u16:5: 0% user + 1.8% kernel
      1.5% 982/surfaceflinger: 0.1% user + 1.3% kernel / faults: 41 minor
      0% 7493/kworker/u16:14: 0% user + 0% kernel
      0% 28055/com.example.myapplication: 0% user + 0% kernel / faults: 1974 minor
      1.2% 5242/com.android.nfc: 1% user + 0.1% kernel / faults: 796 minor
      1% 18959/com.viomi.fridge.vertical: 0.8% user + 0.1% kernel / faults: 17 minor
      1% 28105/kworker/u16:0: 0% user + 1% kernel
      0.8% 584/logd: 0.3% user + 0.5% kernel
      0.8% 836/[email protected]: 0.3% user + 0.5% kernel / faults: 115 minor
      0.8% 1359/cnss_diag: 0.6% user + 0.1% kernel
      0.8% 25780/com.xiaomi.market: 0.6% user + 0.1% kernel / faults: 736 minor 2 major
      0.6% 806/[email protected]: 0% user + 0.6% kernel / faults: 238 minor 2 major
      0.6% 873/[email protected]: 0.3% user + 0.3% kernel / faults: 33 minor
      0% 1443/mediaserver: 0% user + 0% kernel / faults: 70 minor
      0.6% 16518/com.miui.player: 0.1% user + 0.5% kernel
      0% 1/init: 0% user + 0% kernel
      0% 697/tombstoned: 0% user + 0% kernel
      0% 799/[email protected]_64: 0% user + 0% kernel / faults: 26 minor
      0% 949/audioserver: 0% user + 0% kernel / faults: 56 minor
      0.5% 20092/kworker/u17:0: 0% user + 0.5% kernel
      0.5% 28100/logcat: 0% user + 0.5% kernel
      0.3% 9/rcu_preempt: 0% user + 0.3% kernel
      0.3% 495/crtc_commit:131: 0% user + 0.3% kernel
      0.3% 534/irq/303-fts: 0% user + 0.3% kernel
      0.3% 704/statsd: 0.1% user + 0.1% kernel / faults: 27 minor
      0.3% 705/netd: 0.1% user + 0.1% kernel / faults: 62 minor
      0% 1367/drmserver: 0% user + 0% kernel / faults: 16 minor
      0.3% 2635/com.android.systemui: 0.3% user + 0% kernel / faults: 27 minor
      0.3% 3547/irq/32-90b6400.: 0% user + 0.3% kernel
      0.3% 6938/kworker/u17:2: 0% user + 0.3% kernel
      0.3% 8546/com.tencent.mm:toolsmp: 0.1% user + 0.1% kernel / faults: 5 minor
      0.3% 13715/com.tencent.mm: 0.1% user + 0.1% kernel / faults: 5 minor
      0.1% 10/rcu_sched: 0% user + 0.1% kernel
      0.1% 12/rcuop/0: 0% user + 0.1% kernel
      0% 13/rcuos/0: 0% user + 0% kernel
      0.1% 30/rcuop/2: 0% user + 0.1% kernel
      0% 31/rcuos/2: 0% user + 0% kernel
      0.1% 38/rcuop/3: 0% user + 0.1% kernel
      0% 66/migration/7: 0% user + 0% kernel
      0.1% 586/servicemanager: 0.1% user + 0% kernel
      0.1% 598/[email protected]: 0% user + 0.1% kernel / faults: 11 minor
      0.1% 628/vold: 0% user + 0.1% kernel / faults: 29 minor
      0.1% 664/ipacm: 0% user + 0.1% kernel
      0.1% 676/jbd2/sda31-8: 0% user + 0.1% kernel
      0% 793/android.hardware.audio.service: 0% user + 0% kernel / faults: 39 minor
      0% 798/[email protected]: 0% user + 0% kernel / faults: 11 minor
2021-06-12 11:17:15.839 1764-28190/? E/ActivityManager:   0.1% 807/[email protected]: 0% user + 0.1% kernel / faults: 9 minor
      0% 828/[email protected]: 0% user + 0% kernel / faults: 65 minor
      0.1% 851/[email protected]: 0.1% user + 0% kernel
      0% 1365/cameraserver: 0% user + 0% kernel / faults: 70 minor
      0% 1440/media.metrics: 0% user + 0% kernel / faults: 36 minor 1 major
      0% 1572/gatekeeperd: 0% user + 0% kernel / faults: 28 minor 7 major
      0% 1605/[email protected]: 0% user + 0% kernel / faults: 16 minor
      0.1% 2251/cds_ol_rx_threa: 0% user + 0.1% kernel
      0.1% 2884/com.qualcomm.qti.devicestatisticsservice: 0.1% user + 0% kernel / faults: 1 minor
      0.1% 3549/irq/33-90cd000.: 0% user + 0.1% kernel
      0.1% 6156/com.xiaomi.xmsf: 0.1% user + 0% kernel / faults: 27 minor
      0.1% 8036/com.tencent.mm:appbrand0: 0.1% user + 0% kernel / faults: 6 minor
      0.1% 8047/com.tencent.mm:appbrand1: 0% user + 0.1% kernel / faults: 6 minor
      0.1% 8864/com.xiaomi.joyose: 0.1% user + 0% kernel
      0.1% 14440/com.tencent.mm:push: 0% user + 0.1% kernel / faults: 8 minor
      0.1% 15807/com.miui.personalassistant: 0% user + 0.1% kernel / faults: 9 minor
      0.1% 27963/kworker/2:3: 0% user + 0.1% kernel
      0.1% 28106/kworker/u16:3: 0% user + 0.1% kernel
      0.1% 28171/kworker/0:1: 0% user + 0.1% kernel
    14% TOTAL: 7.1% user + 6.5% kernel + 0.1% iowait + 0.5% irq + 0.2% softirq
    CPU usage from 41ms to 439ms later (2021-06-12 11:17:10.023 to 2021-06-12 11:17:10.422) with 99% awake:
      51% 1764/system_server: 18% user + 33% kernel / faults: 929 minor
        39% 28190/AnrConsumer: 9% user + 30% kernel
        6% 1789/android.ui: 3% user + 3% kernel
        3% 2347/InputDispatcher: 3% user + 0% kernel
      2.5% 66/migration/7: 0% user + 2.5% kernel
      2.6% 534/irq/303-fts: 0% user + 2.6% kernel
      2.6% 584/logd: 2.6% user + 0% kernel
      2.7% 873/[email protected]: 0% user + 2.7% kernel / faults: 6 minor
        2.7% 873/[email protected]: 0% user + 2.7% kernel
       +0% 28191/POSIX timer 269: 0% user + 0% kernel
       +0% 28192/POSIX timer 269: 0% user + 0% kernel
      2.7% 895/kworker/u16:16: 0% user + 2.7% kernel
      2.8% 982/surfaceflinger: 2.8% user + 0% kernel
      2.8% 1284/adbd: 0% user + 2.8% kernel
    8.9% TOTAL: 2.8% user + 5.7% kernel + 0.3% irq

// 点击 ANR 对话框的确定按钮杀死 app
2021-06-12 11:18:58.935 1764-1789/? I/ActivityManager: Killing 28055:com.example.myapplication/u0a161 (adj 0): user request after error
2021-06-12 11:18:58.937 1764-1789/? I/Process: PerfMonitor : current process sending signal quiet. PID: 28055 SIG: 9
2021-06-12 11:18:58.938 1764-1802/? I/Process: PerfMonitor : current process killing process group. PID: 28055
2021-06-12 11:18:58.965 706-706/? I/Zygote: Process 28055 exited due to signal 9 (Killed)
2021-06-12 11:18:58.966 1764-1802/? I/libprocessgroup: Successfully killed process cgroup uid 10161 pid 28055 in 28ms
2021-06-12 11:18:58.970 882-882/? I/[email protected]: killProcess is called for pid : 28055
2021-06-12 11:18:58.970 1764-5448/? W/ANRStateManager: clear state, but process isn't exist. hash=92035998 uid=10161 pid=28055 state=16

子线程且 UEH != null

app 没有发生 ANR 也没有崩溃,且无论 DefaultUncaughtExceptionHandler 是否为 null,Thread.uncaughtExceptionHandler 都能够有限捕获异常,说明线程的 UncaughtExceptionHandler 比默认的 UncaughtExceptionHandler 优先级要高

2021-06-12 17:20:41.503 20615-22015/com.example.myapplication E/AndroidRuntime: FATAL EXCEPTION: Thread-10
    Process: com.example.myapplication, PID: 20615
    java.lang.RuntimeException: Test Exception
        at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:75)
        at com.example.myapplication.MainActivity$onCreate$4$1.invoke(MainActivity.kt:38)
        at com.example.myapplication.MainActivity$onCreate$4$1.invoke(MainActivity.kt:36)
        at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

2021-06-12 17:20:41.509 20615-22015/com.example.myapplication E/cyrus: Thread-10-3890 UncaughtExceptionHandler: Test Exception
    java.lang.RuntimeException: Test Exception
        at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:75)
        at com.example.myapplication.MainActivity$onCreate$4$1.invoke(MainActivity.kt:38)
        at com.example.myapplication.MainActivity$onCreate$4$1.invoke(MainActivity.kt:36)
        at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

两个 UEH 都置空 or DUEH != null

app 没有发生 ANR 也没有崩溃

    --------- beginning of crash
2021-06-12 11:06:58.817 27816-27929/com.example.myapplication E/AndroidRuntime: FATAL EXCEPTION: Thread-3
    Process: com.example.myapplication, PID: 27816
    java.lang.RuntimeException: Test Exception
        at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:58)
        at com.example.myapplication.MainActivity$onCreate$2$1.invoke(MainActivity.kt:24)
        at com.example.myapplication.MainActivity$onCreate$2$1.invoke(MainActivity.kt:22)
        at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

// DefaultUncaughtExceptionHandler != null 则能够捕获异常
2021-06-12 17:03:51.701 20615-20884/com.example.myapplication E/cyrus: Thread-2-3882 DefaultUncaughtExceptionHandler: Test Exception
    java.lang.RuntimeException: Test Exception
        at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:75)
        at com.example.myapplication.MainActivity$onCreate$2$1.invoke(MainActivity.kt:24)
        at com.example.myapplication.MainActivity$onCreate$2$1.invoke(MainActivity.kt:22)
        at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

// 当 DefaultUncaughtExceptionHandler == null 时异常被输出到标准异常流
2021-06-12 17:13:54.170 20615-21405/com.example.myapplication W/System.err: Exception in thread "Thread-8" java.lang.RuntimeException: Test Exception
2021-06-12 17:13:54.171 20615-21405/com.example.myapplication W/System.err:     at com.example.myapplication.MainActivityKt.throwException(MainActivity.kt:75)
2021-06-12 17:13:54.171 20615-21405/com.example.myapplication W/System.err:     at com.example.myapplication.MainActivity$onCreate$1$1.invoke(MainActivity.kt:17)
2021-06-12 17:13:54.171 20615-21405/com.example.myapplication W/System.err:     at com.example.myapplication.MainActivity$onCreate$1$1.invoke(MainActivity.kt:15)
2021-06-12 17:13:54.171 20615-21405/com.example.myapplication W/System.err:     at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

子线程且为默认 DUEH

默认的 DefaultUncaughtExceptionHandler 是 KillApplicationHandler,它会杀死 app

// 除了有上一节的日志外,还会有以下日志并且 app 被 kill
// app process 收到信号 SIGKILL(9) 被迫退出,app 立刻崩溃掉
2021-06-12 11:06:58.870 27816-27929/com.example.myapplication I/Process: Sending signal. PID: 27816 SIG: 9
2021-06-12 11:06:58.910 1764-1802/? I/Process: PerfMonitor : current process killing process group. PID: 27816
2021-06-12 11:06:58.911 1764-2936/? I/ActivityManager: Process com.example.myapplication (pid 27816) has died: prcp CRE
2021-06-12 11:06:58.911 706-706/? I/Zygote: Process 27816 exited due to signal 9 (Killed)
2021-06-12 11:06:58.912 1764-1802/? I/libprocessgroup: Successfully killed process cgroup uid 10161 pid 27816 in 0ms
2021-06-12 11:06:58.917 1764-2936/? W/ANRStateManager: clear state, but process isn't exist. hash=224220282 uid=10161 pid=27816 state=16
2021-06-12 11:06:58.919 882-8554/? I/[email protected]: killProcess is called for pid : 27816

总结

所在线程 子条件 结果
主线程 始终会被 blocked 住,然后发生 ANR,最后被杀死
子线程 两个 UncaughtExceptionHandler 都置空
有任意一个自定义的 UncaughtExceptionHandler app 没事
默认 KillApplicationHandler 捕获到异常并杀死 app
// art/runtime/jni/jni_internal.cc
static jint Throw(JNIEnv* env, jthrowable java_exception) {
  ScopedObjectAccess soa(env);
  ObjPtr<mirror::Throwable> exception = soa.Decode<mirror::Throwable>(java_exception);
  if (exception == nullptr) {
    return JNI_ERR;
  }
  soa.Self()->SetException(exception);
  return JNI_OK;
}

// art/runtime/thread.cc
void Thread::SetException(ObjPtr<mirror::Throwable> new_exception) {
  CHECK(new_exception != nullptr);
  // TODO: DCHECK(!IsExceptionPending());
  tlsPtr_.exception = new_exception.Ptr();
}

// art/runtime/thread.h
mirror::Throwable* exception;   // The pending exception or null.

代码跟踪抛出 UE 时发生了什么

有一个 API 可以抛出异常:JNIEnv->Throw,所以我猜当 java 层发生 uncaught exception 时相当于调用了它

这个方法的实现很简单,就是把 exception 记录在 Thread::tlsPtr_::exception

Thread.start()
Thread.nativeCreate()

// java_lang_Thread.cc
static void Thread_nativeCreate(JNIEnv* env, jclass, jobject java_thread, jlong stack_size,
                                jboolean daemon) {
  // There are sections in the zygote that forbid thread creation.
  Runtime* runtime = Runtime::Current();
  if (runtime->IsZygote() && runtime->IsZygoteNoThreadSection()) {
    jclass internal_error = env->FindClass("java/lang/InternalError");
    CHECK(internal_error != nullptr);
    env->ThrowNew(internal_error, "Cannot create threads in zygote");
    return;
  }

  Thread::CreateNativeThread(env, java_thread, stack_size, daemon == JNI_TRUE);
}

// art/runtime/thread.cc
// 最终调用 pthread_create 创建线程,新线程的入口是 Thread::CreateCallback
void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
  CHECK(java_peer != nullptr);
  Thread* self = static_cast<JNIEnvExt*>(env)->GetSelf();

  if (VLOG_IS_ON(threads)) {
    ScopedObjectAccess soa(env);

    ArtField* f = jni::DecodeArtField(WellKnownClasses::java_lang_Thread_name);
    ObjPtr<mirror::String> java_name =
        f->GetObject(soa.Decode<mirror::Object>(java_peer))->AsString();
    std::string thread_name;
    if (java_name != nullptr) {
      thread_name = java_name->ToModifiedUtf8();
    } else {
      thread_name = "(Unnamed)";
    }

    VLOG(threads) << "Creating native thread for " << thread_name;
    self->Dump(LOG_STREAM(INFO));
  }

  Runtime* runtime = Runtime::Current();

  // Atomically start the birth of the thread ensuring the runtime isn't shutting down.
  bool thread_start_during_shutdown = false;
  {
    MutexLock mu(self, *Locks::runtime_shutdown_lock_);
    if (runtime->IsShuttingDownLocked()) {
      thread_start_during_shutdown = true;
    } else {
      runtime->StartThreadBirth();
    }
  }
  if (thread_start_during_shutdown) {
    ScopedLocalRef<jclass> error_class(env, env->FindClass("java/lang/InternalError"));
    env->ThrowNew(error_class.get(), "Thread starting during runtime shutdown");
    return;
  }

  Thread* child_thread = new Thread(is_daemon);
  // Use global JNI ref to hold peer live while child thread starts.
  child_thread->tlsPtr_.jpeer = env->NewGlobalRef(java_peer);
  stack_size = FixStackSize(stack_size);

  // Thread.start is synchronized, so we know that nativePeer is 0, and know that we're not racing
  // to assign it.
  env->SetLongField(java_peer, WellKnownClasses::java_lang_Thread_nativePeer,
                    reinterpret_cast<jlong>(child_thread));

  // Try to allocate a JNIEnvExt for the thread. We do this here as we might be out of memory and
  // do not have a good way to report this on the child's side.
  std::string error_msg;
  std::unique_ptr<JNIEnvExt> child_jni_env_ext(
      JNIEnvExt::Create(child_thread, Runtime::Current()->GetJavaVM(), &error_msg));

  int pthread_create_result = 0;
  if (child_jni_env_ext.get() != nullptr) {
    pthread_t new_pthread;
    pthread_attr_t attr;
    child_thread->tlsPtr_.tmp_jni_env = child_jni_env_ext.get();
    CHECK_PTHREAD_CALL(pthread_attr_init, (&attr), "new thread");
    CHECK_PTHREAD_CALL(pthread_attr_setdetachstate, (&attr, PTHREAD_CREATE_DETACHED),
                       "PTHREAD_CREATE_DETACHED");
    CHECK_PTHREAD_CALL(pthread_attr_setstacksize, (&attr, stack_size), stack_size);
    pthread_create_result = pthread_create(&new_pthread,
                                           &attr,
                                           Thread::CreateCallback,
                                           child_thread);
    CHECK_PTHREAD_CALL(pthread_attr_destroy, (&attr), "new thread");

    if (pthread_create_result == 0) {
      // pthread_create started the new thread. The child is now responsible for managing the
      // JNIEnvExt we created.
      // Note: we can't check for tmp_jni_env == nullptr, as that would require synchronization
      //       between the threads.
      child_jni_env_ext.release();  // NOLINT pthreads API.
      return;
    }
  }

  // Either JNIEnvExt::Create or pthread_create(3) failed, so clean up.
  {
    MutexLock mu(self, *Locks::runtime_shutdown_lock_);
    runtime->EndThreadBirth();
  }
  // Manually delete the global reference since Thread::Init will not have been run. Make sure
  // nothing can observe both opeer and jpeer set at the same time.
  child_thread->DeleteJPeer(env);
  delete child_thread;
  child_thread = nullptr;
  // TODO: remove from thread group?
  env->SetLongField(java_peer, WellKnownClasses::java_lang_Thread_nativePeer, 0);
  {
    std::string msg(child_jni_env_ext.get() == nullptr ?
        StringPrintf("Could not allocate JNI Env: %s", error_msg.c_str()) :
        StringPrintf("pthread_create (%s stack) failed: %s",
                                 PrettySize(stack_size).c_str(), strerror(pthread_create_result)));
    ScopedObjectAccess soa(env);
    soa.Self()->ThrowOutOfMemoryError(msg.c_str());
  }
}

// art/runtime/thread.cc
void* Thread::CreateCallback(void* arg) {
  Thread* self = reinterpret_cast<Thread*>(arg);
  Runtime* runtime = Runtime::Current();
  if (runtime == nullptr) {
    LOG(ERROR) << "Thread attaching to non-existent runtime: " << *self;
    return nullptr;
  }
  {
    // TODO: pass self to MutexLock - requires self to equal Thread::Current(), which is only true
    //       after self->Init().
    MutexLock mu(nullptr, *Locks::runtime_shutdown_lock_);
    // Check that if we got here we cannot be shutting down (as shutdown should never have started
    // while threads are being born).
    CHECK(!runtime->IsShuttingDownLocked());
    // Note: given that the JNIEnv is created in the parent thread, the only failure point here is
    //       a mess in InitStackHwm. We do not have a reasonable way to recover from that, so abort
    //       the runtime in such a case. In case this ever changes, we need to make sure here to
    //       delete the tmp_jni_env, as we own it at this point.
    CHECK(self->Init(runtime->GetThreadList(), runtime->GetJavaVM(), self->tlsPtr_.tmp_jni_env));
    self->tlsPtr_.tmp_jni_env = nullptr;
    Runtime::Current()->EndThreadBirth();
  }
  {
    ScopedObjectAccess soa(self);
    self->InitStringEntryPoints();

    // Copy peer into self, deleting global reference when done.
    CHECK(self->tlsPtr_.jpeer != nullptr);
    self->tlsPtr_.opeer = soa.Decode<mirror::Object>(self->tlsPtr_.jpeer).Ptr();
    // Make sure nothing can observe both opeer and jpeer set at the same time.
    self->DeleteJPeer(self->GetJniEnv());
    self->SetThreadName(self->GetThreadName()->ToModifiedUtf8().c_str());

    ArtField* priorityField = jni::DecodeArtField(WellKnownClasses::java_lang_Thread_priority);
    self->SetNativePriority(priorityField->GetInt(self->tlsPtr_.opeer));

    runtime->GetRuntimeCallbacks()->ThreadStart(self);

    // Unpark ourselves if the java peer was unparked before it started (see
    // b/28845097#comment49 for more information)

    ArtField* unparkedField = jni::DecodeArtField(
        WellKnownClasses::java_lang_Thread_unparkedBeforeStart);
    bool should_unpark = false;
    {
      // Hold the lock here, so that if another thread calls unpark before the thread starts
      // we don't observe the unparkedBeforeStart field before the unparker writes to it,
      // which could cause a lost unpark.
      art::MutexLock mu(soa.Self(), *art::Locks::thread_list_lock_);
      should_unpark = unparkedField->GetBoolean(self->tlsPtr_.opeer) == JNI_TRUE;
    }
    if (should_unpark) {
      self->Unpark();
    }

    // 重点在这里,执行 Thread.run()
    // Invoke the 'run' method of our java.lang.Thread.
    ObjPtr<mirror::Object> receiver = self->tlsPtr_.opeer;
    jmethodID mid = WellKnownClasses::java_lang_Thread_run;
    ScopedLocalRef<jobject> ref(soa.Env(), soa.AddLocalReference<jobject>(receiver));
    InvokeVirtualOrInterfaceWithJValues(soa, ref.get(), mid, nullptr);
  }

  // Thread.run() 返回后就销毁此线程
  // Detach and delete self.
  Runtime::Current()->GetThreadList()->Unregister(self);

  return nullptr;
}

// art/runtime/thread_list.cc
void ThreadList::Unregister(Thread* self) {
  DCHECK_EQ(self, Thread::Current());
  CHECK_NE(self->GetState(), kRunnable);
  Locks::mutator_lock_->AssertNotHeld(self);

  VLOG(threads) << "ThreadList::Unregister() " << *self;

  {
    MutexLock mu(self, *Locks::thread_list_lock_);
    ++unregistering_count_;
  }

  // Any time-consuming destruction, plus anything that can call back into managed code or
  // suspend and so on, must happen at this point, and not in ~Thread. The self->Destroy is what
  // causes the threads to join. It is important to do this after incrementing unregistering_count_
  // since we want the runtime to wait for the daemon threads to exit before deleting the thread
  // list.
  self->Destroy();

  // If tracing, remember thread id and name before thread exits.
  Trace::StoreExitingThreadInfo(self);

  uint32_t thin_lock_id = self->GetThreadId();
  while (true) {
    // Remove and delete the Thread* while holding the thread_list_lock_ and
    // thread_suspend_count_lock_ so that the unregistering thread cannot be suspended.
    // Note: deliberately not using MutexLock that could hold a stale self pointer.
    {
      MutexLock mu(self, *Locks::thread_list_lock_);
      if (!Contains(self)) {
        std::string thread_name;
        self->GetThreadName(thread_name);
        std::ostringstream os;
        DumpNativeStack(os, GetTid(), nullptr, "  native: ", nullptr);
        LOG(ERROR) << "Request to unregister unattached thread " << thread_name << "\\\\n" << os.str();
        break;
      } else {
        MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
        if (!self->IsSuspended()) {
          list_.remove(self);
          break;
        }
      }
    }
    // In the case where we are not suspended yet, sleep to leave other threads time to execute.
    // This is important if there are realtime threads. b/111277984
    usleep(1);
    // We failed to remove the thread due to a suspend request, loop and try again.
  }
  delete self;

  // Release the thread ID after the thread is finished and deleted to avoid cases where we can
  // temporarily have multiple threads with the same thread id. When this occurs, it causes
  // problems in FindThreadByThreadId / SuspendThreadByThreadId.
  ReleaseThreadId(nullptr, thin_lock_id);

  // Clear the TLS data, so that the underlying native thread is recognizably detached.
  // (It may wish to reattach later.)
#ifdef __BIONIC__
  __get_tls()[TLS_SLOT_ART_THREAD_SELF] = nullptr;
#else
  CHECK_PTHREAD_CALL(pthread_setspecific, (Thread::pthread_key_self_, nullptr), "detach self");
  Thread::self_tls_ = nullptr;
#endif

  // Signal that a thread just detached.
  MutexLock mu(nullptr, *Locks::thread_list_lock_);
  --unregistering_count_;
  Locks::thread_exit_cond_->Broadcast(nullptr);
}

// art/runtime/thread.cc
void Thread::Destroy() {
  Thread* self = this;
  DCHECK_EQ(self, Thread::Current());

  if (tlsPtr_.jni_env != nullptr) {
    {
      ScopedObjectAccess soa(self);
      MonitorExitVisitor visitor(self);
      // On thread detach, all monitors entered with JNI MonitorEnter are automatically exited.
      tlsPtr_.jni_env->monitors_.VisitRoots(&visitor, RootInfo(kRootVMInternal));
    }
    // Release locally held global references which releasing may require the mutator lock.
    if (tlsPtr_.jpeer != nullptr) {
      // If pthread_create fails we don't have a jni env here.
      tlsPtr_.jni_env->DeleteGlobalRef(tlsPtr_.jpeer);
      tlsPtr_.jpeer = nullptr;
    }
    if (tlsPtr_.class_loader_override != nullptr) {
      tlsPtr_.jni_env->DeleteGlobalRef(tlsPtr_.class_loader_override);
      tlsPtr_.class_loader_override = nullptr;
    }
  }

  if (tlsPtr_.opeer != nullptr) {
    ScopedObjectAccess soa(self);

    // 销毁线程的时候会检查一下有没 pending exception,也就是此线程在执行代码过程中发生的 uncaught exception
    // We may need to call user-supplied managed code, do this before final clean-up.
    HandleUncaughtExceptions(soa);
    RemoveFromThreadGroup(soa);
    Runtime* runtime = Runtime::Current();
    if (runtime != nullptr) {
      runtime->GetRuntimeCallbacks()->ThreadDeath(self);
    }

    // this.nativePeer = 0;
    if (Runtime::Current()->IsActiveTransaction()) {
      jni::DecodeArtField(WellKnownClasses::java_lang_Thread_nativePeer)
          ->SetLong<true>(tlsPtr_.opeer, 0);
    } else {
      jni::DecodeArtField(WellKnownClasses::java_lang_Thread_nativePeer)
          ->SetLong<false>(tlsPtr_.opeer, 0);
    }

    // Thread.join() is implemented as an Object.wait() on the Thread.lock object. Signal anyone
    // who is waiting.
    ObjPtr<mirror::Object> lock =
        jni::DecodeArtField(WellKnownClasses::java_lang_Thread_lock)->GetObject(tlsPtr_.opeer);
    // (This conditional is only needed for tests, where Thread.lock won't have been set.)
    if (lock != nullptr) {
      StackHandleScope<1> hs(self);
      Handle<mirror::Object> h_obj(hs.NewHandle(lock));
      ObjectLock<mirror::Object> locker(self, h_obj);
      locker.NotifyAll();
    }
    tlsPtr_.opeer = nullptr;
  }

  {
    ScopedObjectAccess soa(self);
    Runtime::Current()->GetHeap()->RevokeThreadLocalBuffers(this);
  }
  // Mark-stack revocation must be performed at the very end. No
  // checkpoint/flip-function or read-barrier should be called after this.
  if (kUseReadBarrier) {
    Runtime::Current()->GetHeap()->ConcurrentCopyingCollector()->RevokeThreadLocalMarkStack(this);
  }
}

// art/runtime/thread.cc
void Thread::HandleUncaughtExceptions(ScopedObjectAccessAlreadyRunnable& soa) {
  if (!IsExceptionPending()) {
    return;
  }
  ScopedLocalRef<jobject> peer(tlsPtr_.jni_env, soa.AddLocalReference<jobject>(tlsPtr_.opeer));
  ScopedThreadStateChange tsc(this, kNative);

  // Get and clear the exception.
  ScopedLocalRef<jthrowable> exception(tlsPtr_.jni_env, tlsPtr_.jni_env->ExceptionOccurred());
  tlsPtr_.jni_env->ExceptionClear();

  // 如果存在 pending exception/uncaught exception,则执行 Thread.dispatchUncaughtException()
  // Call the Thread instance's dispatchUncaughtException(Throwable)
  tlsPtr_.jni_env->CallVoidMethod(peer.get(),
      WellKnownClasses::java_lang_Thread_dispatchUncaughtException,
      exception.get());

  // If the dispatchUncaughtException threw, clear that exception too.
  tlsPtr_.jni_env->ExceptionClear();
}

接下来我猜想埋点在代码里的异常检查流程在发现 pending exception != null 后,会中断字节码的执行(Thread.run())从而回到 native 代码

让我们从开启一个线程 Thread.start() 看看这个流程

线程进入 VM 的入口点是 Thread.run(),执行完毕(或者发生 uncaught exception 被中断字节码的执行)退出 VM 回到 native 代码后,就执行销毁线程的流程:ThreadList::Unregister -> Thread::Destroy,其中 HandleUncaughtExceptions 会检查是否有 uncaught exception/pending exception,有的话再次进入 VM 执行 Thread.dispatchUncaughtException