Series index

[Week 1] Andrew Ng's team, AI for Medical Diagnosis: course notes

[Week 1] Andrew Ng's team, AI for Medical Diagnosis: course lab

[Week 1] Andrew Ng's team, AI for Medical Diagnosis: graded assignment

[Week 2] Andrew Ng's team, AI for Medical Diagnosis: course notes

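Contents

Series index

Preface

I. Packages

II. Overview

III. Metrics

1. True positives, false positives, true negatives, and false negatives

2. Accuracy

3. Prevalence

4. Sensitivity and specificity

5. Positive predictive value (PPV) and negative predictive value (NPV)

6. ROC curve

IV. Confidence intervals

V. Precision-recall curve

VI. F1 score

VII. Calibration

Summary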

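Preface

Evaluation of Diagnostic Models
Welcome to the second assignment of Course 1. In this assignment, we will work with the results of the X-ray classification model we developed in the previous assignment. To make the data processing more manageable, we will use a subset of our training and validation datasets. We will also use our manually labeled test dataset of 420 X-rays.

As a reminder, our dataset contains X-rays covering 14 different conditions that can be diagnosed from an X-ray. We will evaluate our performance on each of these classes using the classification metrics we learned in lecture.

By the end of this assignment you will have learned about:

1. Accuracy
2. Prevalence
3. Specificity and sensitivity
4. PPV and NPV
5. ROC curve and AUCROC (c-statistic)
6. Confidence intervals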

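I. Packages
In this assignment, we will use the following packages:

numpy is a popular library for scientific computing
matplotlib is a plotting library compatible with numpy
pandas is what we use to manipulate our data
sklearn will be used to measure the performance of our model
Run the next cell to import all the necessary packages as well as the custom util functions.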

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import util
from public_tests import *
from test_utils import *

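II. Overview

We will go through our evaluation metrics in the following order.

TP, TN, FP, FN
Accuracy
Prevalence
Sensitivity and specificity
PPV and NPV
AUC (area under the curve)
Confidence intervals

Let's take a quick peek at our dataset. The data is stored in two CSV files called train_preds.csv and valid_preds.csv. We have precomputed the model outputs for our test cases. We will work with these predictions and the true class labels throughout the assignment.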

train_results = pd.read_csv("data/train_preds.csv")
valid_results = pd.read_csv("data/valid_preds.csv")

# the labels in our dataset
class_labels = ['Cardiomegaly','Emphysema','Effusion','Hernia','Infiltration','Mass','Nodule','Atelectasis','Pneumothorax','Pleural_Thickening','Pneumonia','Fibrosis','Edema','Consolidation']

# the labels for prediction values in our dataset
pred_labels = [l + "_pred" for l in class_labels]

Extract the labels (y) and the predictions (pred).

y = valid_results[class_labels].values
pred = valid_results[pred_labels].values

Run the next cell to view them side by side.

# let's take a peek at our dataset
valid_results[np.concatenate([class_labels, pred_labels])].head()

	Infiltration	Nodule	...	Infiltration_pred	Mass_pred	Nodule_pred	Atelectasis_pred	Pneumothorax_pred	Pleural_Thickening_pred	Pneumonia_pred	Fibrosis_pred	Edema_pred	Consolidation_pred
0	0	0	...	0.256020	0.266928	0.312440	0.460342	0.079453	0.271495	0.276861	0.398799	0.015867	0.156320
1	1	1	...	0.382199	0.176825	0.465807	0.489424	0.084595	0.377318	0.363582	0.638024	0.025948	0.144419
2	0	0	...	0.427727	0.115513	0.249030	0.035105	0.238761	0.167095	0.166389	0.262463	0.007758	0.125790
3	0	0	...	0.158596	0.259460	0.334870	0.266489	0.073371	0.229834	0.191281	0.344348	0.008559	0.119153
4	0	0	...	0.536762	0.198797	0.273110	0.186771	0.242122	0.309786	0.411771	0.244666	0.126930	0.342409

5 rows × 28 columns

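To get further insight into the details of our dataset, here is a bar chart of the number of samples for each label in the validation dataset: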

plt.xticks(rotation=90)
plt.bar(x = class_labels, height= y.sum(axis=0));

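It appears that the number of samples per class in our dataset is imbalanced. In particular, very few patients in our dataset are diagnosed with Hernia.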

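III. Metrics

1. True positives, false positives, true negatives, and false negatives

The most basic statistics to compute from the model predictions are the true positives, true negatives, false positives, and false negatives.
As the names suggest:
True Positive (TP): the model classifies the example as positive, and the actual label is also positive.
False Positive (FP): the model classifies the example as positive, but the actual label is negative.
True Negative (TN): the model classifies the example as negative, and the actual label is also negative.
False Negative (FN): the model classifies the example as negative, but the actual label is positive.
We will count the number of TP, FP, TN, and FN in the given data. All of our metrics can be built on top of these four statistics.
Recall that the model outputs real numbers between 0 and 1.
To compute binary class predictions, we need to convert these to either 0 or 1.
We will do this using a threshold value th.
Any model output at or above th is set to 1, and any output below th is set to 0.
All of our metrics (except for AUC at the end) will depend on the choice of this threshold.
Exercise 1 - true positives, false positives, true negatives, and false negatives
Fill in the functions to compute the TP, FP, TN, and FN for a given threshold below.
The first one has been done for you.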

# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def true_positives(y, pred, th=0.5):
    """
    Count true positives.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        TP (int): true positives
    """
    TP = 0

    # get thresholded predictions
    thresholded_preds = pred >= th

    # compute TP
    TP = np.sum((y == 1) & (thresholded_preds == 1))

    return TP

def true_negatives(y, pred, th=0.5):
    """
    Count true negatives.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        TN (int): true negatives
    """
    TN = 0

    # get thresholded predictions
    thresholded_preds = pred >= th

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # compute TN
    TN = np.sum((y == 0) & (thresholded_preds == 0))
    ### END CODE HERE ###

    return TN

def false_positives(y, pred, th=0.5):
    """
    Count false positives.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        FP (int): false positives
    """
    FP = 0

    # get thresholded predictions
    thresholded_preds = pred >= th

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # compute FP
    FP = np.sum((y == 0) & (thresholded_preds == 1))
    ### END CODE HERE ###

    return FP

def false_negatives(y, pred, th=0.5):
    """
    Count false negatives.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        FN (int): false negatives
    """
    FN = 0

    # get thresholded predictions
    thresholded_preds = pred >= th

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # compute FN
    FN = np.sum((y == 1) & (thresholded_preds == 0))
    ### END CODE HERE ###

    return FN

### do not modify this cell
get_tp_tn_fp_fn_test(true_positives, true_negatives, false_positives, false_negatives)

y_test	preds_test	category
0	1	0.8	TP
1	1	0.7	TP
2	0	0.4	TN
3	0	0.3	TN
4	0	0.2	TN
5	0	0.5	FP
6	0	0.6	FP
7	0	0.7	FP
8	0	0.8	FP
9	1	0.1	FN
10	1	0.2	FN
11	1	0.3	FN
12	1	0.4	FN
13	1	0.0	FN

Your functions calculated:
    TP: 2
    TN: 3
    FP: 4
    FN: 5

All tests passed.
All tests passed.
All tests passed.
All tests passed.
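As a cross-check, the same four counts can be read off scikit-learn's `confusion_matrix`. The sketch below reuses the labels and predictions from the test table above; the variable names are illustrative and not part of the assignment's code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Labels and model outputs copied from the test table above.
y_test = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
preds_test = np.array([0.8, 0.7, 0.4, 0.3, 0.2, 0.5, 0.6, 0.7, 0.8,
                       0.1, 0.2, 0.3, 0.4, 0.0])

# Threshold at 0.5 (outputs >= 0.5 count as positive), then flatten the 2x2 matrix.
# For binary labels {0, 1}, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, (preds_test >= 0.5).astype(int)).ravel()
print(tp, tn, fp, fn)  # 2 3 4 5
```

Note that the example with output exactly 0.5 counts as a positive prediction, because the threshold comparison is `>=`.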

Run the next cell to see a summary of evaluation metrics for the model predictions for each class.

util.get_performance_metrics(y, pred, class_labels)

	TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Emphysema	20	869	103	8	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Effusion	99	690	196	15	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Hernia	1	743	255	1	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Infiltration	114	543	265	78	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Mass	40	789	158	13	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Nodule	28	731	220	21	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Atelectasis	64	657	249	30	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumothorax	24	785	183	8	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pleural_Thickening	24	713	259	4	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumonia	14	661	320	5	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Fibrosis	10	725	261	4	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Edema	15	767	213	5	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Consolidation	36	658	297	9	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5

Right now it only has TP, TN, FP, and FN. Throughout this assignment we will fill in all the other metrics to learn more about our model's performance.

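2. Accuracy

Let's use a threshold of 0.5 as the probability cutoff for our predictions for all classes, and calculate our model's accuracy as we normally would in a machine learning problem.
accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)

Exercise 2 - get_accuracy
Use this formula to compute accuracy below: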

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def get_accuracy(y, pred, th=0.5):
    """
    Compute accuracy of predictions at threshold.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        accuracy (float): accuracy of predictions at threshold
    """
    accuracy = 0.0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get TP, FP, TN, FN using our previously defined functions
    # (pass th through so a non-default threshold is respected)
    TP = true_positives(y, pred, th=th)
    FP = false_positives(y, pred, th=th)
    TN = true_negatives(y, pred, th=th)
    FN = false_negatives(y, pred, th=th)

    # Compute accuracy using TP, FP, TN, FN
    accuracy = (TP + TN) / (TP + FP + TN + FN)
    ### END CODE HERE ###

    return accuracy

### do not modify this cell
get_accuracy_test(get_accuracy)

Test Case:

Test Labels:       [1 0 0 1 1]
Test Predictions:  [0.8 0.8 0.4 0.6 0.3]
Threshold:         0.5
Computed Accuracy: 0.6

All tests passed.

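Run the next cell to see the accuracy of the model output for each class, as well as the number of true positives, true negatives, false positives, and false negatives.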

util.get_performance_metrics(y, pred, class_labels, acc=get_accuracy)

TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	0.83	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Emphysema	20	869	103	8	0.889	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Effusion	99	690	196	15	0.789	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Hernia	1	743	255	1	0.744	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Infiltration	114	543	265	78	0.657	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Mass	40	789	158	13	0.829	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Nodule	28	731	220	21	0.759	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Atelectasis	64	657	249	30	0.721	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumothorax	24	785	183	8	0.809	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pleural_Thickening	24	713	259	4	0.737	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumonia	14	661	320	5	0.675	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Fibrosis	10	725	261	4	0.735	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Edema	15	767	213	5	0.782	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Consolidation	36	658	297	9	0.694	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5

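If we were to judge our model's performance based on the accuracy metric, we would say that our model is not very accurate for detecting Infiltration cases (accuracy of 0.657) but is quite accurate for detecting Emphysema (accuracy of 0.889).
But is that really the case?...
Let's imagine a model that simply predicts that no patient has Emphysema, regardless of the patient's measurements. Let's calculate the accuracy of such a model.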

get_accuracy(valid_results["Emphysema"].values, np.zeros(len(valid_results)))

0.972

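As you can see above, such a model would be 97% accurate! That is even better than our deep-learning-based model.
But is this really a good model? Wouldn't this model be wrong 100% of the time whenever the patient actually has the condition?
In the following sections, we will address this concern with more advanced model measures, sensitivity and specificity, which evaluate how well the model predicts positives for patients with the condition and negatives for cases that actually do not have the condition.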
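The 0.972 figure follows directly from the class balance: a model that always predicts negative gets every negative case right and every positive case wrong, so its accuracy equals 1 minus the prevalence. A minimal sketch on synthetic labels (the data here are made up for illustration and are not taken from the assignment):

```python
import numpy as np

# Hypothetical labels: 28 positives out of 1000, roughly the Emphysema prevalence.
y_toy = np.zeros(1000, dtype=int)
y_toy[:28] = 1

always_negative = np.zeros(len(y_toy))   # model output 0 for every patient
predicted = always_negative >= 0.5       # thresholded predictions (all False)

accuracy = np.mean(predicted == (y_toy == 1))
sensitivity = np.sum(predicted & (y_toy == 1)) / np.sum(y_toy == 1)

print(accuracy)     # 0.972
print(sensitivity)  # 0.0
```

The high accuracy hides the fact that not a single positive case is detected.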

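3. Prevalence

Another important concept is prevalence.
In a medical context, prevalence is the proportion of people in the population who have the disease (or condition, etc.).
In machine learning terms, this is the proportion of positive examples. The expression for prevalence is:
$$\text{prevalence} = \frac{1}{N} \sum_{i} y_i$$

where $y_i = 1$ when example $i$ is "positive" (has the disease).
Exercise 3 - get_prevalence
Let's measure the prevalence of each disease: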

# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def get_prevalence(y):
    """
    Compute prevalence.

    Args:
        y (np.array): ground truth, size (n_examples)
    Returns:
        prevalence (float): prevalence of positive cases
    """
    prevalence = 0.0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    prevalence = np.sum(y == 1) / np.size(y)
    ### END CODE HERE ###

    return prevalence

### do not modify this cell
get_prevalence_test(get_prevalence)

Test Case:

Test Labels:          [1 0 0 1 1 0 0 0 0 1]
Computed Prevalence:  0.4

All tests passed.

util.get_performance_metrics(y, pred, class_labels, acc=get_accuracy, prevalence=get_prevalence)

TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	0.83	0.017	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Emphysema	20	869	103	8	0.889	0.028	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Effusion	99	690	196	15	0.789	0.114	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Hernia	1	743	255	1	0.744	0.002	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Infiltration	114	543	265	78	0.657	0.192	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Mass	40	789	158	13	0.829	0.053	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Nodule	28	731	220	21	0.759	0.049	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Atelectasis	64	657	249	30	0.721	0.094	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumothorax	24	785	183	8	0.809	0.032	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pleural_Thickening	24	713	259	4	0.737	0.028	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumonia	14	661	320	5	0.675	0.019	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Fibrosis	10	725	261	4	0.735	0.014	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Edema	15	767	213	5	0.782	0.02	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Consolidation	36	658	297	9	0.694	0.045	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	Not Defined	0.5

Hernia has a prevalence of 0.002, which makes it the rarest of the conditions studied in our dataset.

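4. Sensitivity and specificity

Sensitivity and specificity are two of the most prominent numbers used to measure diagnostic tests.

Sensitivity is the probability that our test outputs positive given that the case is actually positive.
Specificity is the probability that the test outputs negative given that the case is actually negative.
We can express these easily in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

sensitivity = TP/(TP+FN)

specificity = TN/(FP+TN)

Exercise 4 - get_sensitivity and get_specificity
Let's calculate the sensitivity and specificity of our model: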

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def get_sensitivity(y, pred, th=0.5):
    """
    Compute sensitivity of predictions at threshold.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        sensitivity (float): probability that our test outputs positive given that the case is actually positive
    """
    sensitivity = 0.0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get TP and FN using our previously defined functions
    # (pass th through so a non-default threshold is respected)
    TP = true_positives(y, pred, th=th)
    FN = false_negatives(y, pred, th=th)

    # use TP and FN to compute sensitivity
    sensitivity = TP / (TP + FN)
    ### END CODE HERE ###

    return sensitivity

def get_specificity(y, pred, th=0.5):
    """
    Compute specificity of predictions at threshold.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        specificity (float): probability that the test outputs negative given that the case is actually negative
    """
    specificity = 0.0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get TN and FP using our previously defined functions
    TN = true_negatives(y, pred, th=th)
    FP = false_positives(y, pred, th=th)

    # use TN and FP to compute specificity
    specificity = TN / (TN + FP)
    ### END CODE HERE ###

    return specificity

### do not modify this cell
get_sensitivity_specificity_test(get_sensitivity, get_specificity)

Test Case:

Test Labels:           [1 0 0 1 1]
Test Predictions:      [1 0 0 1 1]
Threshold:             0.5
Computed Sensitivity:  0.6666666666666666
Computed Specificity:  0.5

All tests passed.
All tests passed.

util.get_performance_metrics(y, pred, class_labels, acc=get_accuracy, prevalence=get_prevalence, sens=get_sensitivity, spec=get_specificity)

TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	0.83	0.017	0.941	0.828	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Emphysema	20	869	103	8	0.889	0.028	0.714	0.894	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Effusion	99	690	196	15	0.789	0.114	0.868	0.779	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Hernia	1	743	255	1	0.744	0.002	0.5	0.744	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Infiltration	114	543	265	78	0.657	0.192	0.594	0.672	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Mass	40	789	158	13	0.829	0.053	0.755	0.833	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Nodule	28	731	220	21	0.759	0.049	0.571	0.769	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Atelectasis	64	657	249	30	0.721	0.094	0.681	0.725	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumothorax	24	785	183	8	0.809	0.032	0.75	0.811	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pleural_Thickening	24	713	259	4	0.737	0.028	0.857	0.734	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Pneumonia	14	661	320	5	0.675	0.019	0.737	0.674	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Fibrosis	10	725	261	4	0.735	0.014	0.714	0.735	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Edema	15	767	213	5	0.782	0.02	0.75	0.783	Not Defined	Not Defined	Not Defined	Not Defined	0.5
Consolidation	36	658	297	9	0.694	0.045	0.8	0.689	Not Defined	Not Defined	Not Defined	Not Defined	0.5

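Note that specificity and sensitivity do not depend on the prevalence of the positive class in the dataset.

This is because each statistic is computed only within people of the same class:
Sensitivity only considers the output on people in the positive class.
Similarly, specificity only considers the output on people in the negative class.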
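A small sketch can make this concrete. Below, a toy dataset is modified by repeating every positive example several times, which raises the prevalence but leaves the model's behavior on each individual example unchanged. The helper name and the data are assumptions for illustration only.

```python
import numpy as np

def sens_spec(y, pred, th=0.5):
    """Return (sensitivity, specificity) at threshold th."""
    positive = pred >= th
    tp = np.sum((y == 1) & positive)
    fn = np.sum((y == 1) & ~positive)
    tn = np.sum((y == 0) & ~positive)
    fp = np.sum((y == 0) & positive)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 4 positives (3 detected) and 6 negatives (4 correctly rejected).
y_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred_toy = np.array([0.9, 0.8, 0.6, 0.2, 0.1, 0.3, 0.4, 0.2, 0.7, 0.9])

# Repeat each positive example 5 times to raise the prevalence.
pos = y_toy == 1
y_rich = np.concatenate([np.repeat(y_toy[pos], 5), y_toy[~pos]])
pred_rich = np.concatenate([np.repeat(pred_toy[pos], 5), pred_toy[~pos]])

print(y_toy.mean(), sens_spec(y_toy, pred_toy))
print(y_rich.mean(), sens_spec(y_rich, pred_rich))
```

Both lines report the same sensitivity (0.75) and specificity (about 0.667), while the prevalence rises from 0.4 to about 0.77.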

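5. Positive predictive value (PPV) and negative predictive value (NPV)

Diagnostically, however, sensitivity and specificity are not directly helpful. Sensitivity, for example, tells us the probability that our test outputs positive given that the person already has the condition. Here, we are conditioning on the very thing we would like to find out (whether the patient has the condition)!

What would be more helpful is the probability that the person has the disease given that our test outputs positive. That brings us to positive predictive value (PPV) and negative predictive value (NPV).

Positive predictive value (PPV) is the probability that subjects with a positive screening test truly have the disease.
Negative predictive value (NPV) is the probability that subjects with a negative screening test truly do not have the disease.
Again, we can formulate these in terms of true positives, true negatives, false positives, and false negatives:

PPV = TP/(TP+FP)

NPV = TN/(TN+FN)

Exercise 5 - get_ppv and get_npv
Let's calculate PPV and NPV for our model: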

# UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def get_ppv(y, pred, th=0.5):
    """
    Compute PPV of predictions at threshold.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        PPV (float): positive predictive value of predictions at threshold
    """
    PPV = 0.0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get TP and FP using our previously defined functions
    # (pass th through so a non-default threshold is respected)
    TP = true_positives(y, pred, th=th)
    FP = false_positives(y, pred, th=th)

    # use TP and FP to compute PPV
    PPV = TP / (TP + FP)
    ### END CODE HERE ###

    return PPV

def get_npv(y, pred, th=0.5):
    """
    Compute NPV of predictions at threshold.

    Args:
        y (np.array): ground truth, size (n_examples)
        pred (np.array): model output, size (n_examples)
        th (float): cutoff value for positive prediction from model
    Returns:
        NPV (float): negative predictive value of predictions at threshold
    """
    NPV = 0.0

    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get TN and FN using our previously defined functions
    TN = true_negatives(y, pred, th=th)
    FN = false_negatives(y, pred, th=th)

    # use TN and FN to compute NPV
    NPV = TN / (TN + FN)
    ### END CODE HERE ###

    return NPV

### do not modify this cell
get_ppv_npv_test(get_ppv, get_npv)

Test Case:

Test Labels:       [1 0 0 1 1]
Test Predictions:  [1 0 0 1 1]
Threshold:         0.5
Computed PPV:      0.6666666666666666
Computed NPV:      0.5

All tests passed.
All tests passed.

util.get_performance_metrics(y, pred, class_labels, acc=get_accuracy, prevalence=get_prevalence, sens=get_sensitivity, spec=get_specificity, ppv=get_ppv, npv=get_npv)

TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	0.83	0.017	0.941	0.828	0.086	0.999	Not Defined	Not Defined	0.5
Emphysema	20	869	103	8	0.889	0.028	0.714	0.894	0.163	0.991	Not Defined	Not Defined	0.5
Effusion	99	690	196	15	0.789	0.114	0.868	0.779	0.336	0.979	Not Defined	Not Defined	0.5
Hernia	1	743	255	1	0.744	0.002	0.5	0.744	0.004	0.999	Not Defined	Not Defined	0.5
Infiltration	114	543	265	78	0.657	0.192	0.594	0.672	0.301	0.874	Not Defined	Not Defined	0.5
Mass	40	789	158	13	0.829	0.053	0.755	0.833	0.202	0.984	Not Defined	Not Defined	0.5
Nodule	28	731	220	21	0.759	0.049	0.571	0.769	0.113	0.972	Not Defined	Not Defined	0.5
Atelectasis	64	657	249	30	0.721	0.094	0.681	0.725	0.204	0.956	Not Defined	Not Defined	0.5
Pneumothorax	24	785	183	8	0.809	0.032	0.75	0.811	0.116	0.99	Not Defined	Not Defined	0.5
Pleural_Thickening	24	713	259	4	0.737	0.028	0.857	0.734	0.085	0.994	Not Defined	Not Defined	0.5
Pneumonia	14	661	320	5	0.675	0.019	0.737	0.674	0.042	0.992	Not Defined	Not Defined	0.5
Fibrosis	10	725	261	4	0.735	0.014	0.714	0.735	0.037	0.995	Not Defined	Not Defined	0.5
Edema	15	767	213	5	0.782	0.02	0.75	0.783	0.066	0.994	Not Defined	Not Defined	0.5
Consolidation	36	658	297	9	0.694	0.045	0.8	0.689	0.108	0.987	Not Defined	Not Defined	0.5

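Notice that despite very high sensitivity and accuracy, the PPV of the predictions can still be very low.

This is the case with Edema, for example.

The sensitivity for Edema is 0.75.
However, given that the model predicted positive, the probability that a person has Edema (its PPV) is only 0.066!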
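The low PPV follows from Bayes' rule: for fixed sensitivity and specificity, PPV depends strongly on prevalence. The sketch below plugs in the rounded Edema values from the table; the function name is hypothetical and the second prevalence is an invented comparison point.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence                # P(test+, disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test+, no disease)
    return true_pos / (true_pos + false_pos)

# Rounded Edema values from the table: sensitivity 0.75, specificity 0.783, prevalence 0.02.
print(round(ppv_from_rates(0.75, 0.783, 0.02), 3))  # 0.066

# Same test characteristics at a hypothetical prevalence of 0.5.
print(round(ppv_from_rates(0.75, 0.783, 0.5), 3))   # 0.776
```

With only 2% of patients affected, false positives from the large negative group outnumber the true positives, which is why the PPV stays low even though the test itself is fairly sensitive.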

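6. ROC curve

So far we have been operating under the assumption that our model's predictions of 0.5 and above should be treated as positive and the rest as negative. This, however, is a rather arbitrary choice. One way to see this is to look at a very informative visualization called the receiver operating characteristic (ROC) curve.

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The ideal point is at the top left, with a true positive rate of 1 and a false positive rate of 0. The various points on the curve are generated by gradually changing the threshold.

Let's look at this curve for our model: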

util.get_curve(y, pred, class_labels)

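The area under the ROC curve is also called AUCROC or the C-statistic and is a measure of goodness of fit. In the medical literature, this number also gives the probability that a randomly selected patient who experienced a condition had a higher risk score than a patient who did not experience it. It summarizes the model output across all thresholds and gives a good sense of the discriminative power of a given model.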
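The probabilistic reading of the AUC can be checked numerically. The sketch below, on synthetic scores that are an assumption of this example, compares scikit-learn's `roc_auc_score` with the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as one half.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data: positives tend to score higher than negatives.
y_toy = np.concatenate([np.ones(50, dtype=int), np.zeros(200, dtype=int)])
scores = np.concatenate([rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 200)])

pos_scores = scores[y_toy == 1]
neg_scores = scores[y_toy == 0]

# Compare every positive score with every negative score.
diff = pos_scores[:, None] - neg_scores[None, :]
pairwise_auc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

print(pairwise_auc, roc_auc_score(y_toy, scores))
```

The two printed numbers agree, which is why the AUC does not depend on any single threshold.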

Let's use the roc_auc_score metric function from sklearn to add this score to our metrics table.

from sklearn.metrics import roc_auc_score
util.get_performance_metrics(y, pred, class_labels, acc=get_accuracy, prevalence=get_prevalence, sens=get_sensitivity, spec=get_specificity, ppv=get_ppv, npv=get_npv, auc=roc_auc_score)

TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	0.83	0.017	0.941	0.828	0.086	0.999	0.933	Not Defined	0.5
Emphysema	20	869	103	8	0.889	0.028	0.714	0.894	0.163	0.991	0.935	Not Defined	0.5
Effusion	99	690	196	15	0.789	0.114	0.868	0.779	0.336	0.979	0.891	Not Defined	0.5
Hernia	1	743	255	1	0.744	0.002	0.5	0.744	0.004	0.999	0.644	Not Defined	0.5
Infiltration	114	543	265	78	0.657	0.192	0.594	0.672	0.301	0.874	0.696	Not Defined	0.5
Mass	40	789	158	13	0.829	0.053	0.755	0.833	0.202	0.984	0.888	Not Defined	0.5
Nodule	28	731	220	21	0.759	0.049	0.571	0.769	0.113	0.972	0.745	Not Defined	0.5
Atelectasis	64	657	249	30	0.721	0.094	0.681	0.725	0.204	0.956	0.781	Not Defined	0.5
Pneumothorax	24	785	183	8	0.809	0.032	0.75	0.811	0.116	0.99	0.826	Not Defined	0.5
Pleural_Thickening	24	713	259	4	0.737	0.028	0.857	0.734	0.085	0.994	0.868	Not Defined	0.5
Pneumonia	14	661	320	5	0.675	0.019	0.737	0.674	0.042	0.992	0.762	Not Defined	0.5
Fibrosis	10	725	261	4	0.735	0.014	0.714	0.735	0.037	0.995	0.801	Not Defined	0.5
Edema	15	767	213	5	0.782	0.02	0.75	0.783	0.066	0.994	0.856	Not Defined	0.5
Consolidation	36	658	297	9	0.694	0.045	0.8	0.689	0.108	0.987	0.799	Not Defined	0.5

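IV. Confidence intervals

Of course, our dataset is only a sample of the real world, and the values we computed for all of the above metrics are estimates of the real-world values. It would be good to quantify the uncertainty that comes from sampling our dataset. We will do this using confidence intervals. A 95% confidence interval for an estimate ŝ of a parameter s is an interval I = (a, b) such that, 95% of the time when the experiment is run, the true value s is contained in I. More concretely, if we were to run the experiment many times, the fraction of those experiments for which I contains the true parameter would tend towards 95%.

While some estimates come with methods for computing the confidence interval analytically, more complicated statistics, such as the AUC, are difficult to treat that way. For these we can use a method called the bootstrap. The bootstrap estimates the uncertainty by resampling the dataset with replacement. For each resampling i, we get a new estimate ŝ_i. We can then estimate the distribution of ŝ by using the distribution of ŝ_i over our bootstrap samples.

In the code below, we create bootstrap samples and compute the sample AUC from each of them. Note that we use stratified random sampling (sampling separately from the positive and negative classes) to make sure that members of each class are represented.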

def bootstrap_auc(y, pred, classes, bootstraps=100, fold_size=1000):
    statistics = np.zeros((len(classes), bootstraps))

    for c in range(len(classes)):
        # build the per-class frame directly from the two columns
        df = pd.DataFrame({'y': y[:, c], 'pred': pred[:, c]})
        # get positive examples for stratified sampling
        df_pos = df[df.y == 1]
        df_neg = df[df.y == 0]
        prevalence = len(df_pos) / len(df)
        for i in range(bootstraps):
            # stratified sampling of positive and negative examples
            pos_sample = df_pos.sample(n=int(fold_size * prevalence), replace=True)
            neg_sample = df_neg.sample(n=int(fold_size * (1 - prevalence)), replace=True)

            y_sample = np.concatenate([pos_sample.y.values, neg_sample.y.values])
            pred_sample = np.concatenate([pos_sample.pred.values, neg_sample.pred.values])
            score = roc_auc_score(y_sample, pred_sample)
            statistics[c][i] = score
    return statistics

statistics = bootstrap_auc(y, pred, class_labels)

Now we can compute confidence intervals from the sample statistics that we computed.

util.print_confidence_intervals(class_labels, statistics)

	Mean AUC (CI 5%-95%)
Cardiomegaly	0.93 (0.90-0.96)
Emphysema	0.94 (0.91-0.96)
Effusion	0.89 (0.87-0.91)
Hernia	0.65 (0.30-0.98)
Infiltration	0.70 (0.66-0.73)
Mass	0.89 (0.84-0.92)
Nodule	0.75 (0.69-0.81)
Atelectasis	0.78 (0.75-0.81)
Pneumothorax	0.82 (0.74-0.89)
Pleural_Thickening	0.87 (0.79-0.92)
Pneumonia	0.76 (0.68-0.84)
Fibrosis	0.81 (0.75-0.87)
Edema	0.86 (0.81-0.89)
Consolidation	0.80 (0.75-0.85)

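As you can see, our confidence intervals are much wider for some classes than for others. Hernia, for example, has an interval of around (0.30 - 0.98), which indicates that we cannot be certain it is better than chance (at 0.5).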
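The implementation of `util.print_confidence_intervals` is not shown here. One common way to turn bootstrap statistics into an interval is the percentile method, sketched below on made-up AUC samples; whether the util function does exactly this is an assumption, and `percentile_interval` is a hypothetical helper.

```python
import numpy as np

def percentile_interval(samples, lower=5, upper=95):
    """Return (mean, low, high) using the percentile bootstrap method."""
    samples = np.asarray(samples)
    low, high = np.percentile(samples, [lower, upper])
    return samples.mean(), low, high

# Hypothetical bootstrap AUCs for one class (a stand-in for one row of `statistics`).
rng = np.random.default_rng(0)
fake_aucs = np.clip(rng.normal(0.85, 0.03, size=100), 0, 1)

mean_auc, low, high = percentile_interval(fake_aucs)
print(f"{mean_auc:.2f} ({low:.2f}-{high:.2f})")
```

Note that the range from the 5th to the 95th percentile covers 90% of the bootstrap samples; using the 2.5th and 97.5th percentiles instead would give a 95% interval.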

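V. Precision-recall curve

Precision-recall is an informative measure of prediction success when the classes in the data are heavily imbalanced.

In information retrieval:

Precision is a measure of result relevancy, and is equivalent to the PPV we defined earlier.
Recall is a measure of how many truly relevant results are returned, and is equivalent to the sensitivity measure we defined earlier.

The precision-recall curve (PRC) shows the trade-off between precision and recall for different thresholds. A large area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate and high recall relates to a low false negative rate.

High scores for both show that the classifier is returning accurate results (high precision) as well as returning a majority of all positive results (high recall).

Run the following cell to generate a PRC: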

util.get_curve(y, pred, class_labels, curve='prc')
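`util.get_curve` is a course helper whose code is not shown. As a rough sketch of what a PRC involves, scikit-learn's `precision_recall_curve` returns the precision and recall at each candidate threshold; the synthetic data below are an assumption of this example.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(0)

# Synthetic, imbalanced data: about 5% positives with somewhat higher scores.
y_toy = (rng.uniform(size=2000) < 0.05).astype(int)
scores = np.clip(rng.normal(0.3 + 0.3 * y_toy, 0.15), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_toy, scores)
ap = average_precision_score(y_toy, scores)  # one summary of the area under the PRC

# precision and recall have one more entry than thresholds: the final point
# (precision=1, recall=0) is appended by convention and has no threshold.
print(len(precision), len(recall), len(thresholds))
print(round(ap, 3))
```

Plotting `recall` on the x-axis against `precision` on the y-axis gives the curve for one class.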

VI. F1 score

The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst value at 0.
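Written in terms of the counts used throughout this assignment, F1 = 2 * PPV * sensitivity / (PPV + sensitivity), which simplifies to 2 * TP / (2 * TP + FP + FN). A short sketch (the function name is hypothetical) computes the value for Cardiomegaly from the counts in the performance table:

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision (PPV) and recall (sensitivity)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Cardiomegaly counts from the table: TP=16, FP=169, FN=1.
print(round(f1_from_counts(16, 169, 1), 3))  # 0.158
```

Because the harmonic mean is dominated by the smaller of the two values, the low precision (0.086) keeps F1 low despite a sensitivity of 0.941.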

Again, we can simply use sklearn's f1_score utility metric function to add this measure to our performance table.

from sklearn.metrics import f1_score
util.get_performance_metrics(y, pred, class_labels, acc=get_accuracy, prevalence=get_prevalence, sens=get_sensitivity, spec=get_specificity, ppv=get_ppv, npv=get_npv, auc=roc_auc_score,f1=f1_score)

TP	TN	FP	FN	Accuracy	Prevalence	Sensitivity	Specificity	PPV	NPV	AUC	F1	Threshold

Cardiomegaly	16	814	169	1	0.83	0.017	0.941	0.828	0.086	0.999	0.933	0.158	0.5
Emphysema	20	869	103	8	0.889	0.028	0.714	0.894	0.163	0.991	0.935	0.265	0.5
Effusion	99	690	196	15	0.789	0.114	0.868	0.779	0.336	0.979	0.891	0.484	0.5
Hernia	1	743	255	1	0.744	0.002	0.5	0.744	0.004	0.999	0.644	0.008	0.5
Infiltration	114	543	265	78	0.657	0.192	0.594	0.672	0.301	0.874	0.696	0.399	0.5
Mass	40	789	158	13	0.829	0.053	0.755	0.833	0.202	0.984	0.888	0.319	0.5
Nodule	28	731	220	21	0.759	0.049	0.571	0.769	0.113	0.972	0.745	0.189	0.5
Atelectasis	64	657	249	30	0.721	0.094	0.681	0.725	0.204	0.956	0.781	0.314	0.5
Pneumothorax	24	785	183	8	0.809	0.032	0.75	0.811	0.116	0.99	0.826	0.201	0.5
Pleural_Thickening	24	713	259	4	0.737	0.028	0.857	0.734	0.085	0.994	0.868	0.154	0.5
Pneumonia	14	661	320	5	0.675	0.019	0.737	0.674	0.042	0.992	0.762	0.079	0.5
Fibrosis	10	725	261	4	0.735	0.014	0.714	0.735	0.037	0.995	0.801	0.07	0.5
Edema	15	767	213	5	0.782	0.02	0.75	0.783	0.066	0.994	0.856	0.121	0.5
Consolidation	36	658	297	9	0.694	0.045	0.8	0.689	0.108	0.987	0.799	0.19	0.5

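VII. Calibration

When performing classification, we often want not only to predict the class label but also to obtain a probability for each label. Ideally, this probability gives us some kind of confidence in the prediction. To observe how well the probabilities generated by our model align with the real probabilities, we can plot what is called a calibration curve.

To generate a calibration plot, we first bucket our predictions into a fixed number of separate bins (e.g. 5) between 0 and 1. We then calculate one point for each bin: the x-value is the mean of the probabilities that our model assigned to the points in that bin, and the y-value is the fraction of true positives among the points in that bin. We then plot these points on a linear plot. A well-calibrated model has a calibration curve that nearly aligns with the y=x line.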
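The binning procedure described above can be sketched by hand as follows. This is a simplified illustration with equal-width bins on synthetic data, not the library implementation; `calibration_points` is a hypothetical helper name.

```python
import numpy as np

def calibration_points(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, fraction of positives) per non-empty bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index for each prediction; a prediction of exactly 1.0 goes in the last bin.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            mean_pred.append(y_prob[in_bin].mean())
            frac_pos.append(y_true[in_bin].mean())
    return np.array(mean_pred), np.array(frac_pos)

# Synthetic, perfectly calibrated predictions: each label is 1 with probability y_prob.
rng = np.random.default_rng(0)
y_prob = rng.uniform(size=20000)
y_true = (rng.uniform(size=20000) < y_prob).astype(int)

mean_pred, frac_pos = calibration_points(y_true, y_prob)
print(np.round(mean_pred, 2))
print(np.round(frac_pos, 2))
```

For this well-calibrated toy model the two printed arrays are nearly identical, so the points fall close to the y=x line.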

The sklearn library has a utility, calibration_curve, for generating calibration plots. Let's use it to take a look at our model's calibration:

from sklearn.calibration import calibration_curve
def plot_calibration_curve(y, pred):
    plt.figure(figsize=(20, 20))
    for i in range(len(class_labels)):
        plt.subplot(4, 4, i + 1)
        fraction_of_positives, mean_predicted_value = calibration_curve(y[:,i], pred[:,i], n_bins=20)
        plt.plot([0, 1], [0, 1], linestyle='--')
        plt.plot(mean_predicted_value, fraction_of_positives, marker='.')
        plt.xlabel("Predicted Value")
        plt.ylabel("Fraction of Positives")
        plt.title(class_labels[i])
    plt.tight_layout()
    plt.show()

plot_calibration_curve(y, pred)

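As the plots above show, for most predictions our model's calibration plot does not resemble that of a well-calibrated model. How can we fix that?...

Thankfully, there is a very useful method called Platt scaling, which works by fitting a logistic regression model to our model's scores. To build this model, we will use the training portion of our dataset to fit the linear model and then use that model to calibrate the predictions for our test portion.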

from sklearn.linear_model import LogisticRegression as LR

y_train = train_results[class_labels].values
pred_train = train_results[pred_labels].values
pred_calibrated = np.zeros_like(pred)

for i in range(len(class_labels)):
    lr = LR(solver='liblinear', max_iter=10000)
    lr.fit(pred_train[:, i].reshape(-1, 1), y_train[:, i])
    pred_calibrated[:, i] = lr.predict_proba(pred[:, i].reshape(-1, 1))[:,1]

plot_calibration_curve(y[:,], pred_calibrated)

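Congratulations! That was a lot of metrics to get familiar with. We hope that you now feel more confident in your understanding of medical diagnostic evaluation and that you will test your models correctly in your future work :)

Summary

Evaluation metrics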
