Basic Concepts

ROC stands for Receiver Operating Characteristic. The vertical axis of a ROC curve is the True Positive Rate (TPR), and the horizontal axis is the False Positive Rate (FPR).

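TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

Here TP, FN, FP, and TN are the counts of true positives, false negatives, false positives, and true negatives.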
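As a minimal sketch of these two ratios, the snippet below counts the four confusion-matrix cells from hard 0/1 predictions and derives TPR and FPR from them. The function name `tpr_fpr` and the toy labels are made up for illustration, and 1 is assumed to mark the positive class.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp / (tp + fn), fp / (fp + tn)


# Toy data (hypothetical): 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print("TPR = %.3f, FPR = %.3f" % (tpr, fpr))
```

Each point on a ROC curve is one such (FPR, TPR) pair, obtained at one decision threshold.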

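When comparing learners, if the ROC curve of one learner is completely enclosed by the curve of another, the latter can be said to outperform the former. If the two ROC curves cross, it is hard to say in general which learner is better. A more reasonable criterion in that case is to compare the area under the ROC curve, that is, the AUC (Area Under ROC Curve).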
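The crossing case can be illustrated with a small sketch. It assumes two hypothetical learners, A and B, whose ROC curves are given as a few (FPR, TPR) points, and computes each AUC with the trapezoidal rule. The helper `trapezoid_auc` and the curve values are invented for illustration and do not come from any real model.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear curve; fpr must be sorted in ascending order."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height
    return area


# Two hypothetical ROC curves sampled at the same FPR values.
fpr = [0.0, 0.2, 0.6, 1.0]
tpr_a = [0.0, 0.7, 0.80, 1.0]  # learner A is higher at low FPR
tpr_b = [0.0, 0.4, 0.95, 1.0]  # learner B is higher at high FPR, so the curves cross

auc_a = trapezoid_auc(fpr, tpr_a)
auc_b = trapezoid_auc(fpr, tpr_b)
print("AUC A = %.2f, AUC B = %.2f" % (auc_a, auc_b))
```

Neither curve encloses the other here, so the AUC gives a single number to compare. It still hides where on the FPR axis each learner does better.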

Example

Let's look at the official sklearn example: Receiver Operating Characteristic (ROC) with cross validation.

"""
=============================================================
Receiver Operating Characteristic (ROC) with cross validation
=============================================================

Example of Receiver Operating Characteristic (ROC) metric to evaluate
classifier output quality using cross-validation.

ROC curves typically feature true positive rate on the Y axis, and false
positive rate on the X axis. This means that the top left corner of the plot is
the "ideal" point - a false positive rate of zero, and a true positive rate of
one. This is not very realistic, but it does mean that a larger area under the
curve (AUC) is usually better.

The "steepness" of ROC curves is also important, since it is ideal to maximize
the true positive rate while minimizing the false positive rate.

This example shows the ROC response of different datasets, created from K-fold
cross-validation. Taking all of these curves, it is possible to calculate the
mean area under curve, and see the variance of the curve when the
training set is split into different subsets. This roughly shows how the
classifier output is affected by changes in the training data, and how
different the splits generated by K-fold cross-validation are from one another.

.. note::

    See also :func:`sklearn.metrics.roc_auc_score`,
             :func:`sklearn.model_selection.cross_val_score`,
             :ref:`sphx_glr_auto_examples_model_selection_plot_roc.py`,

"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.metrics import auc
# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay.from_estimator
# (available since scikit-learn 1.0) is used here in its place.
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import StratifiedKFold

# #############################################################################
# Data IO and generation

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Keep only classes 0 and 1 to get a binary problem
X, y = X[y != 2], y[y != 2]
n_samples, n_features = X.shape

# Add noisy features
random_state = np.random.RandomState(0)
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# #############################################################################
# Classification and ROC analysis

# Run classifier with cross-validation and plot ROC curves
cv = StratifiedKFold(n_splits=6)
classifier = svm.SVC(kernel='linear', probability=True,
                     random_state=random_state)

tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)

fig, ax = plt.subplots()
for i, (train, test) in enumerate(cv.split(X, y)):
    classifier.fit(X[train], y[train])
    viz = RocCurveDisplay.from_estimator(classifier, X[test], y[test],
                                         name='ROC fold {}'.format(i),
                                         alpha=0.3, lw=1, ax=ax)
    # np.interp replaces the scipy.interp alias, which is no longer
    # available in recent SciPy releases
    interp_tpr = np.interp(mean_fpr, viz.fpr, viz.tpr)
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(viz.roc_auc)

ax.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',
        label='Chance', alpha=.8)

# Average the interpolated curves over the folds
mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(aucs)
ax.plot(mean_fpr, mean_tpr, color='b',
        label=r'Mean ROC (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc, std_auc),
        lw=2, alpha=.8)

# Shade one standard deviation around the mean curve
std_tpr = np.std(tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
ax.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,
                label=r'$\pm$ 1 std. dev.')

ax.set(xlim=[-0.05, 1.05], ylim=[-0.05, 1.05],
       title="Receiver operating characteristic example")
ax.legend(loc="lower right")
plt.show()
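The averaging step in the example is easier to follow on a tiny input. Each fold yields ROC points at different FPR values, so the curves are first interpolated onto a shared FPR grid and then averaged pointwise. The sketch below assumes two hypothetical folds with made-up ROC points and a five-point grid, and it leaves out the plotting.

```python
import numpy as np

# Hypothetical per-fold ROC points (FPR, TPR); the FPR values differ between folds.
folds = [
    (np.array([0.0, 0.50, 1.0]), np.array([0.0, 0.8, 1.0])),
    (np.array([0.0, 0.25, 1.0]), np.array([0.0, 0.6, 1.0])),
]

# Shared FPR grid: 0, 0.25, 0.5, 0.75, 1.
mean_fpr = np.linspace(0, 1, 5)

# Linearly interpolate every fold's TPR onto the shared grid.
tprs = [np.interp(mean_fpr, fpr, tpr) for fpr, tpr in folds]

# Pointwise mean and standard deviation across folds.
mean_tpr = np.mean(tprs, axis=0)
std_tpr = np.std(tprs, axis=0)

print(mean_tpr)
print(std_tpr)
```

The full example does the same with 100 grid points and six folds, and pins the endpoints of the averaged curve to (0, 0) and (1, 1).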

Output
