On Trusting the Model

In deep learning, there are safety-critical applications where a bare prediction is simply not helpful enough. If a patient without symptoms is diagnosed with a serious, time-sensitive illness, can the doctor trust the model enough to begin immediate treatment? In diagnosis, we are currently in a phase between purely human expertise and fully autonomous AI: deep learning models can outperform human experts on their own, but cooperation between human experts and AI is the optimal strategy.

Human experts must gauge the certainty behind the deep learning model’s predictions if they are to provide an additional layer of judgement to the diagnosis. And to gauge the trust we can put in the model, we must be able to measure the different types of uncertainty in its predictions.

Modelling Uncertainty

A deep learning model trained on an infinite amount of perfect data for an infinite amount of time would, in principle, reach 100% certainty. In the real world, however, we have neither perfect data nor an infinite amount of it, and this is what causes the uncertainty of deep learning models.

We call the uncertainty that comes from less-than-perfect data aleatoric uncertainty. Even if we had an infinite amount of such data, the model would still not perform perfectly; it is the uncertainty stemming from noisy data.

And when we have high-quality data but still do not perform perfectly, we are dealing with epistemic uncertainty: the uncertainty due to imperfect parameter values.

A measure of aleatoric uncertainty becomes more important in large-data tasks, since more data explains away epistemic uncertainty. In small datasets, however, epistemic uncertainty proves to be the greater issue, especially in biomedical settings, where we often work with a small amount of well-prepared, high-quality data.

Aleatoric uncertainty can be measured by adding a term directly to the loss function, so that the model predicts both the output and the uncertainty of that output. Epistemic uncertainty is trickier, since it comes from the model itself: if we tried to measure epistemic uncertainty the same way, the model would face the impossible task of predicting the imperfection of its own parameters.
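
The loss-term approach can be sketched as follows. This is a minimal NumPy illustration of the Gaussian negative-log-likelihood form commonly used for this purpose (the function name and toy numbers are mine, not from the original article): the network outputs a mean and a log-variance per input, and noisy samples can be explained away by a larger predicted variance.

```python
import numpy as np

def heteroscedastic_nll(y_true, pred_mean, pred_log_var):
    """Gaussian negative log-likelihood with a learned per-sample variance.

    The network outputs two values per input: a mean and a log-variance.
    Noisy samples can be "explained away" by predicting a large variance,
    but the 0.5 * log_var term penalizes claiming high noise everywhere.
    """
    inv_var = np.exp(-pred_log_var)
    return np.mean(0.5 * inv_var * (y_true - pred_mean) ** 2
                   + 0.5 * pred_log_var)

# For a sample the mean fits poorly, admitting more noise lowers the loss:
y, mu = np.array([1.0]), np.array([3.0])
loss_confident = heteroscedastic_nll(y, mu, np.array([0.0]))  # claims low noise
loss_hedged = heteroscedastic_nll(y, mu, np.array([2.0]))     # claims higher noise
```

In a real model, `pred_mean` and `pred_log_var` would be two output heads of the network, trained jointly by minimizing this loss, so the per-input aleatoric uncertainty comes out at no extra inference cost.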

For the remainder of this article, I will focus on epistemic rather than aleatoric uncertainty. Both can be measured in a single model, but I find epistemic uncertainty far more significant in most biomedical and other safety-critical applications.

Measuring Epistemic Uncertainty

Let’s make a toy example and create some training data. Suppose we have 10 data points: the x-values are evenly spaced, and each y-value is determined by adding some random noise to x.

This is our training data; let’s fit three models (degree-10 polynomials) to it:

Three models, all fitting the training data perfectly, and all evidently different.

In this toy example, each model fits the training data perfectly. This means that for any input identical to one of the training points, the model predicts its y-value exactly. However, if we take any x-value other than those in the training set, the predictions will be wildly off, depending on which model we use.

This is the intuition behind measuring epistemic uncertainty: we trained three models of the same form on the same training data, and we got three different models. If we give each model the input 0.25, for example, the first model predicts 20, the second about -20, and the third around -60. There is a high standard deviation between these predictions, meaning no single model accurately represents that data point.
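
The full toy example can be sketched in NumPy. A degree-10 polynomial has 11 coefficients while the 10 points impose only 10 constraints, so infinitely many polynomials fit the data exactly; here we construct three of them from the null space of the Vandermonde matrix (the noise scale, x-range, and coefficient offsets are my own choices, so the exact predictions will differ from the numbers quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 training points: evenly spaced x, y = x plus some random noise.
x = np.linspace(-1.0, 1.0, 10)
y = x + 0.1 * rng.standard_normal(10)

# Degree-10 fit: 11 coefficients against 10 constraints, so the exact
# fits form a line in coefficient space. Take the minimum-norm exact
# fit and move along a null-space direction of the Vandermonde matrix
# to get three different models that all fit the data perfectly.
V = np.vander(x, 11)                       # shape (10, 11)
w_min = np.linalg.pinv(V) @ y              # minimum-norm exact fit
null_dir = np.linalg.svd(V)[2][-1]         # V @ null_dir ~ 0
models = [w_min + a * null_dir for a in (0.0, 5.0, -5.0)]

for w in models:                           # every model reproduces the data
    assert np.allclose(V @ w, y)

# Off the training points the models disagree; the spread of their
# predictions is the epistemic uncertainty at that input.
preds = [np.polyval(w, 0.25) for w in models]
epistemic_std = float(np.std(preds))
```

The three models agree exactly on the training points and disagree everywhere else; `epistemic_std` is precisely the spread described above.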

The epistemic uncertainty can thus be defined as the standard deviation of these three predictions, since wildly different predictions from otherwise similar models suggest that each model is merely guessing at that data point.

We can execute this procedure in practice by training several (usually 10) different models on the same training data and, at inference time, taking the standard deviation of the models’ predictions to estimate the epistemic uncertainty.

However, training 10 different models is computationally expensive and sometimes infeasible for deep learning models trained on giant datasets. Fortunately, there is a simple alternative that uses a single model to estimate epistemic uncertainty: applying dropout at inference time.

Dropout regularization, for those who are unfamiliar, is a technique that literally drops out random neurons of the network for each batch during training. It is usually turned off at inference time, but if we turn it on and predict the test example 10 times, we can effectively simulate the 10-model approach, since each random dropout mask essentially yields a different model.

This approach is called Monte Carlo dropout, and it is currently the standard for estimating epistemic uncertainty. It has drawbacks, namely that it requires nearly ten times more computation at inference time than a standard prediction without an uncertainty measurement. Monte Carlo dropout is therefore impractical for many real-time applications, which instead fall back on quicker but often less effective methods of measuring uncertainty.
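
The mechanics can be sketched in NumPy with a toy two-layer network (the weights are random stand-ins for a trained model; in a real framework you would simply keep the dropout layers active at inference, e.g. by calling the model with `training=True` in Keras):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer regression network with fixed, pretend-trained weights.
W1, b1 = rng.standard_normal((1, 64)), np.zeros(64)
W2, b2 = rng.standard_normal((64, 1)), np.zeros(1)

def mc_dropout_predict(x, p=0.5, T=10):
    """Run T stochastic forward passes with dropout left ON at inference.

    Each pass drops a different random subset of hidden units, so the T
    outputs behave like predictions from T slightly different models.
    Their mean is the prediction; their spread estimates the epistemic
    uncertainty.
    """
    outputs = []
    for _ in range(T):
        h = np.maximum(x @ W1 + b1, 0.0)       # ReLU hidden layer
        mask = rng.random(h.shape) >= p        # random dropout mask
        h = h * mask / (1.0 - p)               # inverted-dropout scaling
        outputs.append(h @ W2 + b2)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)

prediction, epistemic_std = mc_dropout_predict(np.array([[0.25]]))
```

Setting `T=10` matches the ten-model simulation described above; a larger `T` gives a smoother estimate at proportionally higher inference cost.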

Caveat

We have measured the epistemic uncertainty, but in reality we must remain uncertain about that very uncertainty measure. Indeed, we can gauge the quality of the measurement by graphing the predicted standard deviation against the absolute error. If the aleatoric uncertainty is negligible, there should be a linear relation between our epistemic uncertainty estimate and the absolute error between the predictions and the labels of a test set on a regression task.
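
This check can be sketched as follows, with synthetic stand-ins for the per-example uncertainty estimates and absolute errors (real values would come from a model and a labelled test set; the arrays here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-example quantities on a held-out regression set:
# both the uncertainty estimate and the absolute error are driven by
# a shared underlying difficulty, plus independent noise.
difficulty = rng.random(200)
pred_std = difficulty + 0.1 * rng.random(200)   # MC-dropout std per example
abs_err = difficulty + 0.1 * rng.random(200)    # |prediction - label|

# A useful uncertainty estimate correlates positively with the actual
# error; a perfect one would trace a straight, positively sloping line.
r = np.corrcoef(pred_std, abs_err)[0, 1]
```

A correlation near 1 means the uncertainty estimate tracks the real error well; in practice the scatter is noisier, which is exactly the imperfection discussed below.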

Estimate of the epistemic uncertainty measurement’s accuracy on a regression task (SAMPL dataset)

A perfect uncertainty measurement would be a straight, positively sloping line, so although the uncertainty measurement evidently correlates with absolute error (which is what we want), the measurement is imperfect.

This imperfect measurement is better than nothing, but what we have really done is add an uncertain proxy to represent uncertainty. The proxy reduces uncertainty, given the positive correlation, but some uncertainty remains.

Implications

What does this mean for doctors who might use uncertainty measurements? For now, take the predictions of a deep learning model, whether the actual label predictions or the uncertainty measurements, with a grain of salt. The model cannot contextualize circumstances as a human expert can, so if a doctor finds that the model assigns high uncertainty to a case with evidently low aleatoric uncertainty (i.e. the data is high-quality and unequivocal), he or she would do well to doubt the prediction on grounds of epistemic uncertainty. High epistemic uncertainty means the model was not trained optimally for the context of that particular case and is simply not generalizing well.

Indeed, one study directly supports this logic by showing that human experts perform worse than deep learning algorithms on cases the algorithm predicts with low uncertainty, but outperform the algorithm on high-uncertainty cases.

Uncertainty measurements are metrics only for the purpose of human understanding; they do not aid model performance at all. The performance of human-AI systems, on the other hand, benefits directly from uncertainty measurements.

Thus, measuring model uncertainty is not just a matter of keeping humans passively informed; it directly improves the performance of the broader system.

Citations

  1. Measuring Uncertainty

  2. Real-time Epistemic Uncertainty

  3. Human-augmented deep learning

And a thank you to DeepChem tutorials for originally introducing many of these concepts.

Translated from: https://medium.com/swlh/on-trusting-the-model-e67c94b5d205
