Bayesian Deep Neural Networks

Recently I came across an interesting paper, “Deep Ensembles: A Loss Landscape Perspective” by Fort, Hu, and Lakshminarayanan. In this article, I will break down the paper, summarise its findings, and go through some of the techniques and strategies the authors used, which are useful for understanding models and their learning process. I will also go over some possible extensions to the paper. You can also find my annotations on the paper down below.

The Theory

The authors conjectured (correctly) that Deep Ensembles (ensembles of deep learning models) outperform Bayesian Neural Networks because “popular scalable variational Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space.”

In simple words, a Bayesian neural network trained from a single initialization will reach one of the peaks and stop. A deep ensemble will explore different modes, which reduces error in practice. In picture form:

Depending on its hyperparameters, a single run of a Bayesian network will find one of the paths (colors) and its mode, so it won’t explore the rest of the parameter space. A deep ensemble, on the other hand, will explore all the paths, and therefore get a better picture of the weight space (and its solutions). To understand why this translates to better predictions, consider the following illustration.

3 Possible Solutions. The area colored red is what it gets wrong

In the diagram, we have 3 possible solution spaces, one for each of the trajectories. The optimized mode for each gives a score of, say, 90%. Each mode is unable to solve a certain kind of problem (highlighted in red). A Bayesian network will get to either A, B, or C in a run, while a Deep Ensemble will be able to train over all 3.
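To see why combining A, B, and C can help, here is a toy sketch in plain Python. The three prediction lists are made up so that each model is 90% accurate and wrong on a different input. Majority voting is used only to keep the example small; deep ensembles are usually combined by averaging predicted probabilities, and the names `accuracy` and `majority_vote` are chosen here for illustration.

```python
from collections import Counter

def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

def majority_vote(*model_preds):
    """For each input, return the label predicted by most models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]

targets = [1] * 10  # toy labels: ten inputs, all of class 1

# Each hypothetical model is wrong on a different input (its "red area").
model_a = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
model_b = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
model_c = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

ensemble = majority_vote(model_a, model_b, model_c)
print([accuracy(m, targets) for m in (model_a, model_b, model_c)])  # [0.9, 0.9, 0.9]
print(accuracy(ensemble, targets))  # 1.0
```

The gain comes entirely from the errors falling on different inputs. If the three models made the same mistakes, as checkpoints from a single trajectory tend to, the vote would be no better than any one of them.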

The Techniques

They tested their hypothesis using several strategies, which let them approach the problem from different perspectives. I will go through the details of each.

Cosine Similarity

Cosine similarity is defined as the “measure of similarity between two non-zero vectors of an inner product space. It is defined to equal the cosine of the angle between them, which is also the same as the inner product of the same vectors normalized to both have length 1.” It is derived from the dot product between vectors. Imagine 3 texts: A, B, and C. A and C are large documents on a similar topic, and B is a very short summary of A. A and C might end up having a low Euclidean distance because they have a lot of overlapping words or phrases, while A and B will have a larger distance because of the difference in size. The cosine similarities paint a different picture, however, since A and B would have a small angle (and thus high similarity) between them.
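The A, B, C example can be sketched in a few lines of plain Python. The word counts below are invented for illustration (they are not from the paper), and `cosine_similarity` and `euclidean_distance` are helper names chosen here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical word counts over a 4-word vocabulary (illustrative only).
doc_a = [40, 30, 20, 10]  # long document
doc_b = [4, 3, 2, 1]      # short summary of A: same proportions, 10x shorter
doc_c = [35, 20, 30, 25]  # long document on a similar topic

print("A vs B:", cosine_similarity(doc_a, doc_b), euclidean_distance(doc_a, doc_b))
print("A vs C:", cosine_similarity(doc_a, doc_c), euclidean_distance(doc_a, doc_c))
```

With these numbers, A is closer to C in Euclidean distance but has a higher cosine similarity with B, because cosine similarity ignores vector length. Assuming each checkpoint's weights are flattened into one long vector, the same function can compare two checkpoints in weight space.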

This diagram is used to show that “checkpoints along a trajectory are largely similar both in the weight space and the function space.” Checkpoint 30 and Checkpoint 25 have high similarity (red) while 30 and 5 have relatively low similarity (grey). An interesting thing to note is that the lowest labeled point is a similarity of 0.68. This goes to show how quickly models


Disagreement in Function Space

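Defined as “the fraction of points the checkpoints disagree on”, or,

$$\frac{1}{N} \sum_{n=1}^{N} \left[ f(x_n; \theta_1) \neq f(x_n; \theta_2) \right]$$

where $f(x; \theta)$ denotes the class label predicted by the network with weights $\theta$ for input $x$, $N$ is the number of points, $\theta_1$ and $\theta_2$ are the weights of the two checkpoints being compared, and the bracket equals 1 when the two labels differ and 0 otherwise.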

Disagreement is roughly the complement of the similarity score and showcases the difference between checkpoints along the same trajectory more directly. Both similarity and disagreement are calculated over a single run of a single model. The highest band on the disagreement scale starts at 0.45, so disagreement is relatively low, consistent with the similarity map.
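As a rough sketch of how such a disagreement map could be computed, the snippet below compares predicted labels from three checkpoints of one run. The labels are made up for illustration, and `disagreement` is a helper name chosen here, not code from the paper.

```python
def disagreement(preds_1, preds_2):
    """Fraction of inputs on which two lists of predicted labels differ."""
    if len(preds_1) != len(preds_2):
        raise ValueError("prediction lists must have the same length")
    return sum(p != q for p, q in zip(preds_1, preds_2)) / len(preds_1)

# Hypothetical predicted class labels on the same 8 inputs,
# keyed by checkpoint number (not real data).
checkpoints = {
    5:  [0, 1, 1, 2, 0, 2, 1, 0],
    25: [0, 1, 2, 2, 0, 1, 1, 0],
    30: [0, 1, 2, 2, 0, 1, 1, 1],
}

# Pairwise disagreement matrix, one row and column per checkpoint.
epochs = sorted(checkpoints)
matrix = [[disagreement(checkpoints[i], checkpoints[j]) for j in epochs]
          for i in epochs]
for epoch, row in zip(epochs, matrix):
    print(epoch, row)
```

In this toy data, checkpoints 25 and 30 differ on 1 of 8 inputs (0.125), while checkpoints 5 and 30 differ on 3 of 8 (0.375), mirroring the pattern in the similarity map.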

Together, disagreement and similarity show that points along the same trajectory make very similar predictions. The third technique is used to show that different trajectories can take very different paths, and thus end up unable to solve different kinds of problems.

Plotting Different Random Initializations Using t-SNE

As stated before, this diagram is used to show how different random initializations differ in function space. t-SNE is used to project the high-dimensional data into lower (human-understandable) dimensions. It is an alternative to PCA. Unlike PCA, t-SNE is non-linear and probabilistic in nature. It is also far more computationally expensive (especially with lots of samples and high dimensionality). In this context, t-SNE arguably makes more sense than PCA because the map from a network's weights to its predictions is not linear either, so a non-linear embedding can preserve neighborhood structure that a linear projection might miss. The researchers applied some preprocessing to keep the costs down. The steps they took specifically were, “for each checkpoint we take the softmax output for a set of examples, flatten the vector and use it to represent the model’s predictions. The t-SNE algorithm is then used to reduce it to a 2D point in the t-SNE plot. Figure 2(c) shows that the functions explored by different trajectories (denoted by circles with different colors) are far away, while functions explored within a single trajectory (circles with the same color) tend to be much more similar.”
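The preprocessing step in that quote can be sketched as follows. The softmax values, the run names, and the helper `flatten_predictions` are all made up for illustration. The snippet only builds the per-checkpoint feature vectors and does not run t-SNE itself.

```python
def flatten_predictions(softmax_outputs):
    """Concatenate per-example class probabilities into one flat vector."""
    return [p for example in softmax_outputs for p in example]

# Hypothetical softmax outputs, keyed by (trajectory, checkpoint).
# Each value holds 3 examples with 2 class probabilities each (made-up numbers).
outputs = {
    ("run_a", 10): [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    ("run_a", 20): [[0.95, 0.05], [0.1, 0.9], [0.7, 0.3]],
    ("run_b", 10): [[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]],
}

labels = list(outputs)
features = [flatten_predictions(outputs[key]) for key in labels]

# One row per checkpoint, n_examples * n_classes columns.
print(len(features), len(features[0]))
```

Each row of `features` is what would then be handed to a t-SNE implementation (for example scikit-learn's `sklearn.manifold.TSNE` with `n_components=2`) to get one 2D point per checkpoint, colored by the trajectory name stored in `labels`.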

Subspace Sampling and Diversity Measurement

The last two things the paper implemented were subspace sampling and diversity measurement. In the paper, subspace sampling means drawing additional sets of weights from a low-dimensional region around a single trained solution (for example a random subspace, a dropout subspace, or a Gaussian fitted along the trajectory) instead of training a new network from scratch. The sampled solutions can themselves be combined into an ensemble. The details of the sampling methods are in the paper. The samples confirmed the t-SNE results: samples drawn around one solution stay in its neighborhood, while different random initializations go along different paths.

3 different subspace sampling methods, all lead to distinct neighborhoods

The diversity score quantifies the difference between two functions (a base solution and a sampled one) by measuring the fraction of data points on which their predictions differ. This simple approach is enough to validate the premise.

Both the sampling and accuracy-diversity plots are further evidence for the hypothesis.

Combined, these techniques show two things:

Different initializations of a network will cause a model to reach and get stuck at different modes (peaks). This will let it solve certain problems and not others. Points along the trajectory leading to a mode will have very similar predictions.
Ensembles are able to learn from multiple modes, and thus improve their performance.

Extensions

This paper did a great job of popping the hood on the learning processes of Deep Ensembles and Bayesian Networks. An analysis of the learning curves and validation curves would have been interesting. Furthermore, it would be interesting to see how deep learning ensembles stack up against Random Forests or other ensembles. Performing a similar analysis on the learning processes of these would allow us to create mixed ensembles that might be good for solving complex problems.

Highlighted Paper

Below is the paper. I have highlighted what I thought was important and added definitions for some important concepts. Hope it helps.

Please leave your feedback on this article below. If this was useful to you, please share it and follow me here. Additionally, check out my YouTube channel, where I will be posting videos breaking down different concepts. I will also be streaming on Twitch, answering questions and having discussions there, so please leave a follow. If you would like to work with me, email me at devanshverma425@gmail.com or reach out to me on LinkedIn.

Original article: https://medium.com/swlh/why-deep-learning-ensembles-outperform-bayesian-neural-networks-dba2cd34da24
