DeLong test

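I recently needed to test whether two AUCs differ significantly. The DeLong test is one of the more common methods for this, so I looked into how it works.
Principle:
1. Suppose two different models classify tumors as benign or malignant: VGG gives an AUC of $A_1$ and an SVM gives an AUC of $A_2$. The DeLong test first computes the difference between the two AUCs, $\theta = A_1 - A_2$.
2. It then estimates the variances of $A_1$ and $A_2$, $var(A_1)$ and $var(A_2)$, and their covariance $cov(A_1, A_2)$. I do not understand the variance and covariance calculation deeply, so I will not explain it here.
3. Next, compute the $z$ value. The denominator is the standard error of $\theta$, that is, the square root of its variance:
$$z = \frac{A_1 - A_2}{\sqrt{var(A_1) + var(A_2) - 2cov(A_1, A_2)}}$$
4. Finally, treat $z$ as following a standard normal distribution and run the significance test to get a p-value. If p < 0.05, the two AUCs differ significantly.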
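
Steps 3 and 4 can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the GitHub repository: the function name `delong_z_test` and the numeric inputs are made up for the example. The denominator is the square root of the variance of the difference, which is what makes $z$ approximately standard normal under the null hypothesis.

```python
import math


def delong_z_test(auc_1, auc_2, var_1, var_2, cov_12):
    """Two-sided z test for the difference of two correlated AUCs.

    Hypothetical helper for illustration; the inputs are assumed to come
    from a DeLong variance/covariance estimate computed elsewhere.
    """
    var_diff = var_1 + var_2 - 2.0 * cov_12
    if var_diff <= 0.0:
        raise ValueError("variance of the AUC difference must be positive")
    z = (auc_1 - auc_2) / math.sqrt(var_diff)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up numbers, only to show the call
z, p = delong_z_test(0.90, 0.85, 0.0004, 0.0005, 0.0002)
print("z = %.3f, p = %.4f" % (z, p))
```

A positive covariance shrinks the denominator, so two models scored on the same cases can differ significantly even when their individual confidence intervals overlap.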

Link to the code I found on GitHub:
delong test.
Reference articles on the DeLong test:
"Comparing the Areas Under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach"
"Fast Implementation of DeLong's Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves"
The GitHub code had some problems, so I made a few simple changes. I then ran both MedCalc and the code in this post on my own data; the AUC, CI, and p-value were the same in both.
Figure: results from the Python code

Figure: results from MedCalc

The code is below.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import scipy.stats

# AUC comparison adapted from
# https://github.com/Netflix/vmaf/


def compute_midrank(x):
    """Computes midranks.

    Args:
        x - a 1D numpy array
    Returns:
        array of midranks
    """
    J = np.argsort(x)
    Z = x[J]
    N = len(x)
    T = np.zeros(N, dtype=float)
    i = 0
    while i < N:
        j = i
        # Tied values share the average of their ranks
        while j < N and Z[j] == Z[i]:
            j += 1
        T[i:j] = 0.5 * (i + j - 1)
        i = j
    T2 = np.empty(N, dtype=float)
    # Note(kazeevn) +1 is due to Python using 0-based indexing
    # instead of 1-based in the AUC formula in the paper
    T2[J] = T + 1
    return T2


def compute_midrank_weight(x, sample_weight):
    """Computes weighted midranks.

    Args:
        x - a 1D numpy array
        sample_weight - a 1D numpy array of weights, aligned with x
    Returns:
        array of midranks
    """
    J = np.argsort(x)
    Z = x[J]
    cumulative_weight = np.cumsum(sample_weight[J])
    N = len(x)
    T = np.zeros(N, dtype=float)
    i = 0
    while i < N:
        j = i
        while j < N and Z[j] == Z[i]:
            j += 1
        T[i:j] = cumulative_weight[i:j].mean()
        i = j
    T2 = np.empty(N, dtype=float)
    T2[J] = T
    return T2


def fastDeLong(predictions_sorted_transposed, label_1_count, sample_weight=None):
    if sample_weight is None:
        return fastDeLong_no_weights(predictions_sorted_transposed, label_1_count)
    else:
        return fastDeLong_weights(predictions_sorted_transposed, label_1_count, sample_weight)


def fastDeLong_weights(predictions_sorted_transposed, label_1_count, sample_weight):
    """The fast version of DeLong's method for computing the covariance of
    unadjusted AUC.

    Args:
        predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples]
            sorted such as the examples with label "1" are first
    Returns:
        (AUC value, DeLong covariance)
    Reference:
        @article{sun2014fast,
          title={Fast Implementation of DeLong's Algorithm for
                 Comparing the Areas Under Correlated Receiver Operating
                 Characteristic Curves},
          author={Xu Sun and Weichao Xu},
          journal={IEEE Signal Processing Letters},
          volume={21}, number={11}, pages={1389--1393},
          year={2014}, publisher={IEEE}
        }
    """
    # Short variables are named as they are in the paper
    m = label_1_count
    n = predictions_sorted_transposed.shape[1] - m
    positive_examples = predictions_sorted_transposed[:, :m]
    negative_examples = predictions_sorted_transposed[:, m:]
    k = predictions_sorted_transposed.shape[0]

    tx = np.empty([k, m], dtype=float)
    ty = np.empty([k, n], dtype=float)
    tz = np.empty([k, m + n], dtype=float)
    for r in range(k):
        tx[r, :] = compute_midrank_weight(positive_examples[r, :], sample_weight[:m])
        ty[r, :] = compute_midrank_weight(negative_examples[r, :], sample_weight[m:])
        tz[r, :] = compute_midrank_weight(predictions_sorted_transposed[r, :], sample_weight)
    total_positive_weights = sample_weight[:m].sum()
    total_negative_weights = sample_weight[m:].sum()
    pair_weights = np.dot(sample_weight[:m, np.newaxis], sample_weight[np.newaxis, m:])
    total_pair_weights = pair_weights.sum()
    aucs = (sample_weight[:m] * (tz[:, :m] - tx)).sum(axis=1) / total_pair_weights
    v01 = (tz[:, :m] - tx[:, :]) / total_negative_weights
    v10 = 1. - (tz[:, m:] - ty[:, :]) / total_positive_weights
    sx = np.cov(v01)
    sy = np.cov(v10)
    delongcov = sx / m + sy / n
    return aucs, delongcov


def fastDeLong_no_weights(predictions_sorted_transposed, label_1_count):
    """The fast version of DeLong's method for computing the covariance of
    unadjusted AUC.

    Args:
        predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples]
            sorted such as the examples with label "1" are first
    Returns:
        (AUC value, DeLong covariance)
    Reference:
        same as fastDeLong_weights (Sun and Xu, IEEE Signal Processing
        Letters, 2014)
    """
    # Short variables are named as they are in the paper
    m = label_1_count
    n = predictions_sorted_transposed.shape[1] - m
    positive_examples = predictions_sorted_transposed[:, :m]
    negative_examples = predictions_sorted_transposed[:, m:]
    k = predictions_sorted_transposed.shape[0]

    tx = np.empty([k, m], dtype=float)
    ty = np.empty([k, n], dtype=float)
    tz = np.empty([k, m + n], dtype=float)
    for r in range(k):
        tx[r, :] = compute_midrank(positive_examples[r, :])
        ty[r, :] = compute_midrank(negative_examples[r, :])
        tz[r, :] = compute_midrank(predictions_sorted_transposed[r, :])
    aucs = tz[:, :m].sum(axis=1) / m / n - float(m + 1.0) / 2.0 / n
    v01 = (tz[:, :m] - tx[:, :]) / n
    v10 = 1.0 - (tz[:, m:] - ty[:, :]) / m
    sx = np.cov(v01)
    sy = np.cov(v10)
    delongcov = sx / m + sy / n
    return aucs, delongcov


def calc_pvalue(aucs, sigma):
    """Computes the two-sided p-value for the difference of two AUCs.

    Args:
        aucs: 1D array of two AUCs
        sigma: 2x2 DeLong covariance matrix of the AUCs
    Returns:
        p-value as a float
    """
    l = np.array([[1, -1]])
    # l @ sigma @ l.T = var(A1) + var(A2) - 2 cov(A1, A2);
    # 1e-8 guards against division by zero when the two models are identical
    z = np.abs(np.diff(aucs)) / (np.sqrt(np.dot(np.dot(l, sigma), l.T)) + 1e-8)
    pvalue = 2 * scipy.stats.norm.sf(np.abs(z))
    return pvalue.item()


def compute_ground_truth_statistics(ground_truth, sample_weight=None):
    assert np.array_equal(np.unique(ground_truth), [0, 1])
    # Put the examples with label 1 first
    order = (-ground_truth).argsort()
    label_1_count = int(ground_truth.sum())
    if sample_weight is None:
        ordered_sample_weight = None
    else:
        ordered_sample_weight = sample_weight[order]
    return order, label_1_count, ordered_sample_weight


def delong_roc_variance(ground_truth, predictions):
    """Computes ROC AUC variance for a single set of predictions.

    Args:
        ground_truth: np.array of 0 and 1
        predictions: np.array of floats of the probability of being class 1
    """
    sample_weight = None
    order, label_1_count, ordered_sample_weight = compute_ground_truth_statistics(
        ground_truth, sample_weight)
    predictions_sorted_transposed = predictions[np.newaxis, order]
    aucs, delongcov = fastDeLong(predictions_sorted_transposed, label_1_count)
    assert len(aucs) == 1, "There is a bug in the code, please forward this to the developers"
    return aucs[0], delongcov


def delong_roc_test(ground_truth, predictions_one, predictions_two):
    """Computes the p-value for the hypothesis that two ROC AUCs are different.

    Args:
        ground_truth: np.array of 0 and 1
        predictions_one: predictions of the first model,
            np.array of floats of the probability of being class 1
        predictions_two: predictions of the second model,
            np.array of floats of the probability of being class 1
    """
    sample_weight = None
    order, label_1_count, ordered_sample_weight = compute_ground_truth_statistics(ground_truth)
    predictions_sorted_transposed = np.vstack((predictions_one, predictions_two))[:, order]
    aucs, delongcov = fastDeLong(predictions_sorted_transposed, label_1_count, sample_weight)
    return calc_pvalue(aucs, delongcov)


def delong_roc_ci(y_true, y_pred, alpha=0.95):
    """Computes the AUC and its confidence interval at level alpha."""
    aucs, auc_cov = delong_roc_variance(y_true, y_pred)
    auc_std = np.sqrt(auc_cov)
    # For alpha = 0.95 this gives the quantiles [0.025, 0.975]
    lower_upper_q = np.abs(np.array([0, 1]) - (1 - alpha) / 2)
    ci = scipy.stats.norm.ppf(lower_upper_q, loc=aucs, scale=auc_std)
    # An AUC lies in [0, 1], so clip the normal-approximation interval
    ci[ci > 1] = 1
    ci[ci < 0] = 0
    return aucs, ci


# Example usage
y_true = np.load('path/to/your/data')
y_pred_1 = np.load('path/to/your/data')
y_pred_2 = np.load('path/to/your/data')
alpha = .95

# p-value for the difference between the two models
pvalue = delong_roc_test(y_true, y_pred_1, y_pred_2)
# AUC and confidence interval of the first model
auc_1, ci = delong_roc_ci(y_true, y_pred_1, alpha)

print('95% AUC CI:', ci)
print('AUC:', auc_1)
print('p_value:', pvalue)
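
For readers who want to see what the variance and covariance estimate actually computes, the sketch below is a direct O(mn) version of DeLong's structural components in plain Python. It is an illustration under assumptions, not the repository's code: the names `psi`, `sample_cov`, and `naive_delong` and the toy data are hypothetical. On small inputs its output should agree with `fastDeLong` above, so it can serve as a cross-check.

```python
def psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 on a tie, 0 otherwise."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def naive_delong(labels, scores_by_model):
    """Return (aucs, cov), where cov[r][s] estimates cov(AUC_r, AUC_s)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    aucs, v10, v01 = [], [], []
    for scores in scores_by_model:
        # One structural component per positive example ...
        v10_r = [sum(psi(scores[i], scores[j]) for j in neg) / n for i in pos]
        # ... and one per negative example
        v01_r = [sum(psi(scores[i], scores[j]) for i in pos) / m for j in neg]
        v10.append(v10_r)
        v01.append(v01_r)
        # The AUC is the mean of the positive-side components
        aucs.append(sum(v10_r) / m)
    k = len(scores_by_model)
    cov = [[sample_cov(v10[r], v10[s]) / m + sample_cov(v01[r], v01[s]) / n
            for s in range(k)] for r in range(k)]
    return aucs, cov


# Toy data, invented for the example: 3 positives followed by 3 negatives
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b = [0.7, 0.6, 0.55, 0.65, 0.3, 0.1]
aucs, cov = naive_delong(labels, [model_a, model_b])
print("AUCs:", aucs)
print("var(A1), var(A2), cov(A1, A2):", cov[0][0], cov[1][1], cov[0][1])
```

The diagonal of `cov` holds the two AUC variances and the off-diagonal entry is their covariance, which are exactly the quantities that enter the $z$ statistic.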
