Note: this post is my interpretation of the original paper, pieced together with material found online. It is not guaranteed to be accurate and is offered for reference only; corrections are welcome. Please do not repost for commercial purposes. Thanks!

Contents

  • Preface
  • 1. Core Idea
    • 1.1. Concept: the Dimpled Manifold Model (DMM)
    • 1.2. DMM's Account of the Training Process
    • 1.3. DMM's Account of Adversarial Perturbations
    • 1.4. DMM's Account of Adversarial Training
  • 2. Explanations
    • 2.1. What are adversarial examples?
    • 2.2. Why are adversarial examples so close to benign samples?
    • 2.3. Why do adversarial perturbations look like random noise?
    • 2.4. Why do robustness and accuracy trade off?
  • 3. Related Work
    • 3.1. The Nonlinearity View
    • 3.2. The Local Linearity View
    • 3.3. Robust and Non-Robust Features
  • Personal Remarks
    • 1. Strengths
    • 2. Weaknesses
    • 3. Open Questions

Preface

  • Paper: The Dimpled Manifold Model of Adversarial Examples in Machine Learning
  • First author: Adi Shamir, Turing Award laureate and evergreen figure of information security
  • Conference talk: A New Theory of Adversarial Examples in Machine Learning (recommended)
  • Background concepts: What is a latent space? What is a manifold? What is the information bottleneck? What is an adversarial example? What is adversarial training?

1. Core Idea

This section presents the framework proposed in the paper together with the experiments that support it.

1.1. Concept: the Dimpled Manifold Model (DMM)

  • Motivation: …, none of these qualitative ideas seems to provide a simple, intuitive explanation that can be experimentally tested for adversarial examples’ existence and bizarre properties. (No existing account gives a simple, intuitive, testable explanation of why adversarial examples exist and why they have such bizarre properties.)
  • Goal: This paper aims not to propose new adversarial attacks or defenses but to propose a new comprehensive framework for thinking about adversarial examples. (The aim is to offer a new way of thinking about adversarial examples, not new attacks or defenses.)

Original text: We can approximate this manifold by using a high quality autoencoding DNN which first compresses the given image into a $k$-dimensional latent space, and then decompresses this latent space into a $k$-dimensional image manifold in the $n$-dimensional input space. Note that by using a DNN with ReLU activation functions, we can guarantee that the approximated manifold will be a continuous piecewise linear surface within the image cube, with a well-defined $k$-dimensional basis for the local linear subspace at any given point on it. In typical autoencoders, $k$ can be one to two orders of magnitude smaller than $n$, and their outputs are visually very similar to their inputs. While we do not fully understand the global structure of this manifold, we expect it to be fairly benign. This is because image compression is achieved primarily by eliminating some high spatial frequencies and exploiting the non-uniform distribution of small patches in natural images.

The authors use an autoencoder to explain the Dimpled Manifold Model (DMM), starting from the following setup: assume the input space is $n$-dimensional; the DNN first compresses an image into a $k$-dimensional latent space, and then decompresses this representation back into a $k$-dimensional image manifold embedded in the original $n$-dimensional input space. Note the additional constraints:

  • By using ReLU activation functions, the manifold the DNN produces is a continuous piecewise linear surface. ⟹ Such a surface is fairly benign, with a well-defined local linear subspace at every point, although the distribution of images on it may be non-uniform.
  • Moreover, $k$ is typically one to two orders of magnitude smaller than $n$.
  • Also, since image compression works mainly by discarding high spatial frequencies and exploiting the non-uniform distribution of small patches in natural images, the samples on this image manifold are by default assumed to be natural, benign ones. (A minimal autoencoder sketch follows below.)
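To make the setup concrete, here is a minimal PyTorch sketch of such a ReLU autoencoder. The dimensions ($n = 784$, as for flattened MNIST, and $k = 32$) and all layer sizes are my own illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

class ManifoldAutoencoder(nn.Module):
    """Approximates the k-dimensional image manifold inside the n-dimensional
    input space, as in the quoted passage (a sketch, not the authors' code)."""
    def __init__(self, n: int = 784, k: int = 32):
        super().__init__()
        # ReLU activations make the decoded surface continuous piecewise linear.
        self.encoder = nn.Sequential(nn.Linear(n, 256), nn.ReLU(), nn.Linear(256, k))
        self.decoder = nn.Sequential(nn.Linear(k, 256), nn.ReLU(), nn.Linear(256, n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Outputs can be clamped to [0, 1] afterwards to stay in the image cube.
        return self.decoder(self.encoder(x))

# k is one to two orders of magnitude smaller than n (here 784 / 32 ≈ 25x).
ae = ManifoldAutoencoder()
x = torch.rand(8, 784)       # a batch of flattened pseudo-images
x_on_manifold = ae(x)        # the reconstruction lies on the learned k-dim manifold
```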

Original text: Next, we note that while decision boundaries are $(n-1)$-dimensional objects, their quality is judged only by their performance on some natural images within the tiny $k$-dimensional image manifold. In fact, during the training, networks try to utilize the vast perpendicular subspace (on which they are not judged) to make it easier for them to place the decision boundary within the image manifold correctly. We visually demonstrate this mental image for the low dimensional case of $k=2$ and $n=3$ in the middle part of Fig. 1, in which a 2D image manifold floats in the middle of the 3D cube of possible inputs at height $z=0.5$. According to the simplistic mental image, when we add to the input space a new third dimension we could expect a 1D grey decision boundary as in the left image to be extended upwards and downwards (for all $0 \le z \le 1$) into a vertical 2D wall that separates the red and blue clusters. However, in our synthetic simulation of a DNN on the synthetic dataset in the middle figure (and in the vast majority of real image settings), we got the decision boundary depicted on the right side of Fig. 1, which clings very closely to the whole image manifold, except for shallow dimples that gently undulate below the red clusters, and above the blue clusters.

As shown below, the authors simulate a DNN on a three-dimensional input space whose data lie on a two-dimensional manifold, and visualize the resulting decision boundary. ⟹ In this binary classification setting, the decision boundary in 3D undulates up and down like ripples around the manifold; the authors call this configuration a dimpled manifold.

A 2D manifold and its decision boundary in a 3D input space
(Left: red and blue spheres are samples of the two classes; right: low regions of the surface are red, high regions blue)

1.2. DMM's Account of the Training Process

Original text: We call this conceptual framework the Dimpled Manifold Model (DMM) and note that it is based on two testable claims about how decision boundaries evolve during the training process:

  1. Training a DNN proceeds in two distinct phases: a (typically fast) clinging phase which brings the decision boundary very close to the image manifold, followed by a (typically slower) dimpling phase which creates shallow bumps in the decision boundary that try to move the boundary to the correct side of the training examples
  2. To help the training, DNNs develop large gradients in the confidence levels in the vicinity of the training examples, which point roughly perpendicularly to the image manifold

Summarizing the experiments in the paper, DNN training consists of two consecutive phases (consistent with the information bottleneck theory; see the conference talk for details):

  1. Clinging phase (fast)
    Early in training, the network is mostly extracting features rather than learning their relation to the labels, so in this first phase the decision boundary quickly moves close to the image manifold.
  2. Dimpling phase (slow)
    Later in training, the network focuses on the labels of the training samples; the decision boundary undulates around the manifold and gradually forms the dimpled manifold.

As shown below, the decision boundary starts out random, quickly fits the data manifold (the first 30 epochs), and then undergoes many small adjustments that carve out the up-and-down dimples (epochs 30 to 100).

Evolution of the decision boundary for a 1D manifold in a 2D input space
(Red and blue dots are training data of the two classes; the grey line is the decision boundary)

The same two-phase process also plays out in a 3D input space, visualized below.

Evolution of the decision boundary for a 2D manifold in a 3D input space
(Red and blue spheres are training data of the two classes; the terrain surface is the decision boundary)

Regarding the second phase, the authors highlight the following point:

Original text: Note that by developing a large derivative in the vertical direction, the network can bend the decision boundary more gently. This bending makes it easier to gain accuracy with a simpler decision boundary that uses shallower dimples that pass on the correct sides of neighboring training examples of opposite classes. In addition, the gentle bending of the decision boundary can create larger bumps that cover multiple training examples along with regions between these examples (where no hammer blows are applied), leading to a possible generalization phenomenon. Finally, the above description explains why different architectures, complexities, and classifiers are likely to have decision boundaries that almost coincide with the image manifold (but can be quite different on other parts of the input domain, in which there are no training or test images).

In the example below, red samples should lie above the decision boundary and blue samples below it. Wherever this requirement is violated, a force (the black arrows in the figure) pulls the boundary toward the desired configuration. This pull acts on the region around each data point, not merely on the point itself.

A 2D decision boundary being fitted
(Red and blue dots are training data of the two classes; the grey line is the decision boundary; arrows show the direction in which it is being pulled)

Intuitively, when the dimples carved into the decision boundary are large (as in the figure above), the model generalizes better but its accuracy is lower; conversely, the closer the boundary clings to the image manifold, the higher its accuracy on the training data and the shallower the dimples become (see the final 100-epoch snapshot in the 2D evolution figure above).

Talk takeaway: neural networks prefer flat decision boundaries that stay close to the image manifold.

1.3. DMM's Account of Adversarial Perturbations

The DMM further explains why most (label-flipping) adversarial perturbations are roughly perpendicular to the image manifold. The experiments are shown in the two figures below, where circles are original samples, triangles are adversarial examples, and black arrows mark the perturbation directions.

Adversarial perturbation directions for 1D data under a 2D decision boundary
Adversarial perturbation directions for 2D data under a 3D decision boundary

These results show that adversarial perturbations are almost perpendicular to the decision boundary. (My understanding: since the decision boundary ends up nearly parallel to, i.e. clinging to, the image manifold late in training, the gradient is largest in the direction perpendicular to the manifold, and the distance that must be traversed to cross the boundary is smallest; this is exactly what an $L_2$ attack optimizes for. The toy experiment below lets you check this.)
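Below is a small self-contained experiment (my own construction, not the authors' code) that mirrors the synthetic setup of Fig. 1: a 2D data manifold, the plane $z = 0.5$, embedded in a 3D input space. After training, we measure how much of the input gradient points along the off-manifold $z$-axis; the DMM predicts this perpendicular component should dominate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Data manifold: the plane z = 0.5 inside the 3D input cube; the label
# depends only on the first (on-manifold) coordinate.
xy = torch.rand(2000, 2)
x = torch.cat([xy, torch.full((2000, 1), 0.5)], dim=1)
y = (xy[:, 0] > 0.5).float()

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    F.binary_cross_entropy_with_logits(net(x).squeeze(1), y).backward()
    opt.step()

# How perpendicular to the manifold is the loss gradient at the data points?
x_t = x[:200].clone().requires_grad_(True)
loss = F.binary_cross_entropy_with_logits(net(x_t).squeeze(1), y[:200])
grad, = torch.autograd.grad(loss, x_t)
cos_off = grad[:, 2].abs() / (grad.norm(dim=1) + 1e-12)    # |cos(grad, z-axis)|
print(f"mean |cos(grad, z-axis)| = {cos_off.mean():.3f}")  # near 1 if mostly perpendicular
```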

The authors further examine where in the image the adversarial perturbation lives (on the manifold versus off it). The three figures below show, for PGD attacks on MNIST, CIFAR10, and ImageNet respectively: the adversarial examples (first row of each figure), the images with only the on-manifold component of the perturbation kept (second row), and the images with only the off-manifold component kept (third row).

MNIST adversarial examples
CIFAR10 adversarial examples
ImageNet adversarial examples

Interpreting adversarial perturbations through the manifold:

  • In the third column of each figure, projecting the adversarial perturbation onto the image manifold yields information with some visual meaning, whereas the off-manifold component looks like random noise. (One way to compute this on/off split is sketched below.)
  • The fourth column (showing how the network's logits change) reveals that the on-manifold component barely affects the final logits or their ordering, while the off-manifold component flips the logits just as the full adversarial perturbation does.
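The following sketch shows one way such a split could be computed with an autoencoder like the one in Section 1.1: project the perturbation onto the local tangent space of the decoded manifold, spanned by the columns of the decoder's Jacobian. This is my minimal reconstruction of the idea, not the authors' implementation; `encoder` and `decoder` are assumed to map 1-D tensors between sizes $n$ and $k$.

```python
import torch
from torch.autograd.functional import jacobian

def split_perturbation(delta, x, encoder, decoder):
    """Split a perturbation delta (shape [n]) into on-manifold and
    off-manifold components at the point x (a sketch, see lead-in)."""
    z = encoder(x)                # k-dimensional latent code of x
    J = jacobian(decoder, z)      # [n, k]; columns span the local tangent space
    # Least-squares projection of delta onto the column space of J.
    coeffs = torch.linalg.lstsq(J, delta.unsqueeze(1)).solution
    on_manifold = (J @ coeffs).squeeze(1)
    off_manifold = delta - on_manifold
    return on_manifold, off_manifold
```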

Furthermore, as the next three figures show, restricting a PGD attack to on-manifold directions versus off-manifold directions requires different perturbation sizes ($L_2$ distance) to match the effect of an unconstrained adversarial example: the on-manifold attack needs a considerably larger perturbation distance.

(My take: visually, on low-resolution datasets such as MNIST and CIFAR10 the enlarged perturbation is still noticeable, but on ImageNet the two are hard to tell apart.)

MNIST adversarial examples
CIFAR10 adversarial examples
ImageNet adversarial examples

1.4. DMM's Account of Adversarial Training

Original text: As adversarial examples are within a short distance from the data points (and therefore from the data manifold), one can assume that the clinging phase will not change much due to the definition of adversarial examples as the train set. Therefore, the major effect of the adversarial training is within the dimples phase. After the clinging phase, the adversarial direction is perpendicular to the data manifold. Therefore, the effect of the adversarial training on the decision boundary is a deepening of the dimples, as one can see in Figure 3. Note that when the dimples get deep enough, the best adversarial direction (one calculated using a gradient with respect to the input) changes. While the shortest way to cross the boundary was previously almost orthogonal to the manifold, the dimples are deeper after adversarial training. As a result, the gradient will have a more significant on-manifold component. However, taking a slightly larger step in the same off-manifold direction will also result in an adversarial example, just a bit further.

As the previous section explained, adversarial perturbations are essentially perpendicular to the image manifold, so an adversarial example (see the blue segment in panel 2 below) crosses the original decision boundary within a tiny distance. When training additionally uses these adversarial examples with their correct labels, the network re-fits the boundary during the second (dimpling) phase, as in panel 3, which deepens the dimples and increases the model's robustness. (A minimal training-loop sketch appears after the figure below.)

How the decision boundary evolves under adversarial training
(C and G denote the two classes; the grey line is the original decision boundary, the green line the boundary after adversarial training)
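For reference, here is a minimal PGD adversarial-training loop in the style of Madry et al. It is my own sketch of the standard recipe, not code from the paper; the hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=10):
    """L_inf PGD: step along the sign of the input gradient, projecting
    back into the eps-ball around x after every step."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, opt, x, y):
    """Train on worst-case examples so that the dimpling phase pushes the
    boundary past them (deepening the dimples, as described above)."""
    x_adv = pgd_attack(model, x, y)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```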

In the talk, the authors also used the DMM to interpret experiments similar to Madry et al.'s; see this link for details.


2. Explanations

In this section, the authors use the DMM to explain the counterintuitive properties of adversarial examples.

2.1. What are adversarial examples?

Original text: What are these adversarial examples? How can it be that next to any cat image, there is also an image of guacamole and vice versa? The answer is that all the real cat and guacamole images reside on the tiny image manifold. However, “below” and “above” the manifold, there are vast half-spaces of pseudo-images recognized by the network as cats and guacamoles even though they do not look like ones. The adversarial examples we generate are such pseudo-images. Note that when we consider multi-class classifiers in an $n$-dimensional input space, there are multiple $(n-1)$-dimensional decision boundaries between pairs of classes. In this case, any two decision boundaries have roughly perpendicular “up” and “down” directions.

My understanding:
According to the paper's theory, images of the correct class lie on the corresponding class manifold, i.e. the blue or red disks in the figure below. Adversarial examples (the green regions) lie outside the correct trained manifold (they need not carry the visual features of any class), in a region that the classifier assigns to a different class than the original sample's; the paper calls such inputs pseudo-images.

Talk takeaway: adversarial examples arise from the mismatch between the small $k$-dimensional image manifold and the much larger $n$-dimensional input space.

Abstract 2D illustration
(Red and blue: training samples of the two labels; the green region is the space of potential adversarial examples)

2.2. Why are adversarial examples so close to benign samples?

Original text: Why are the adversarial examples so close to the original images? As explained above, DNNs prefer to have large perpendicular derivatives in order to have shallower dimples that make it easier to undulate the decision boundary around the training examples gently. The tiny distance is a direct consequence of this large gradient since it suffices to move a short distance to significantly affect the confidence level.

My understanding:
Because DNNs tend to develop large gradients (perpendicular derivatives) near the training samples, the decision boundary generally stays close to the image manifold (the paper's "shallow dimples"); as a result, moving a short distance along the high-gradient direction already changes the classification confidence drastically. (The first-order estimate below makes this quantitative.)
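To see why a large gradient directly implies a short distance, here is the standard first-order estimate (my own addition, assuming the classifier $f$ is locally linear near $x$, with the decision boundary at $f = 0$):

$$
f(x + \delta) \approx f(x) + \nabla f(x)^{\top} \delta
\quad\Longrightarrow\quad
\min_{f(x+\delta)=0} \|\delta\|_2 \approx \frac{|f(x)|}{\|\nabla f(x)\|_2},
$$

so doubling the perpendicular derivative roughly halves the distance to the nearest adversarial example.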

2.3. Why do adversarial perturbations look like random noise?

Original text: Why don’t the adversarial perturbations resemble the target class? When we use an adversarial attack to modify a cat into guacamole, why doesn’t the perturbation we use look green and mushy? Most adversarial perturbations look like a featureless small-magnitude random noise. In the new mental image, we are moving roughly perpendicularly to the direction of guacamole images. For example, if a unit vector towards the nearest guacamole image is $(1, 0, 0, \dots, 0)$, then a random unit vector in a perpendicular direction has the form $(0, x_2, x_3, \dots, x_n)$, in which each $x_i$ is a tiny positive or negative value around $O(1/\sqrt{n})$. Such an adversarial perturbation looks (especially in $L_\infty$ norm) like the random salt and pepper perturbations we see in the standard demonstrations of adversarial examples, rather than a depiction of guacamole.

My understanding:
In this theory, an adversarial example is obtained by moving a benign sample toward the target class, but the perturbation itself is essentially a random unit vector perpendicular to that direction (the paper gives a simplified example; the translation below follows the conference talk).

Example (translated): suppose the unit vector toward the nearest target-class image is simply $(1, 0, 0, \dots, 0)$; a random unit vector perpendicular to it has the form $(0, x_2, x_3, \dots, x_n)$, where each non-zero entry is a tiny positive or negative value roughly in the interval $(-1/\sqrt{n}, +1/\sqrt{n})$. ⟹ The resulting adversarial perturbation looks like random noise.
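A quick numeric sanity check of the $O(1/\sqrt{n})$ claim (my own snippet; the dimension $n$ is an arbitrary illustrative value):

```python
import torch

torch.manual_seed(0)
n = 100_000                 # input dimension (e.g. number of pixels)

# Random unit vector perpendicular to e1 = (1, 0, ..., 0).
v = torch.randn(n)
v[0] = 0.0                  # force orthogonality to the target-class direction
v = v / v.norm()

print(f"typical |x_i| : {v.abs().mean():.2e}")   # on the order of 1/sqrt(n)
print(f"1/sqrt(n)     : {n ** -0.5:.2e}")
print(f"L_inf norm    : {v.abs().max():.2e}")    # even the largest entry is tiny
```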

2.4. Why do robustness and accuracy trade off?

Original text: It has been experimentally demonstrated that more robust networks tend to be less accurate. Why do robustness and accuracy trade-off? In the new model, ease of training and the existence of nearby adversarial examples are two sides of the same coin. When we train a network, we keep the images stationary and move the decision boundary around them by creating dimples; when we create adversarial examples, we keep the decision boundary stationary and move the images to its other side. Allowing a large perpendicular derivative makes the training easier since we do not have to bend the decision boundary around the training examples sharply. However, such a large derivative also creates very close adversarial examples. Any attempt to robustify a network by limiting all its directional derivatives will make it harder to train and thus less accurate.

My understanding:
A large perpendicular derivative makes the DNN easier to train, and (under the DMM) the decision boundary then clings more tightly to the image manifold, which also makes it easier to exploit for generating adversarial examples. Any attempt to robustify the network by limiting or reshaping its directional derivatives therefore makes it harder to train and lowers its classification accuracy. (One concrete way to impose such a limit is sketched below.)
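As one concrete instance of "limiting all directional derivatives", here is an input-gradient penalty (my illustration of the general idea; this particular regularizer is not proposed in the paper):

```python
import torch
import torch.nn.functional as F

def gradient_penalty_loss(model, x, y, lam=1.0):
    """Cross-entropy plus a penalty on the input-gradient norm, which
    suppresses large perpendicular derivatives at the cost of a harder
    optimization problem (the robustness/accuracy trade-off above)."""
    x = x.detach().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(ce, x, create_graph=True)  # keep graph for 2nd backprop
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()
    return ce + lam * penalty
```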


3. Related Work

This section briefly lists the classic hypotheses about why adversarial examples exist; more detailed analyses are easy to find elsewhere.

3.1. The Nonlinearity View

This view comes from Szegedy et al.'s paper Intriguing properties of neural networks, which was also the first to point out that neural networks are threatened by adversarial examples. The idea is that because the network is highly nonlinear (the output units are highly nonlinear functions of their inputs), it overfits: it learns features specific to the non-adversarial training samples rather than the truly generalizing features that are needed. They also argued that adversarial examples are low-probability events, not broadly distributed across the input space.

(My take: this is quite intuitive. Training data is never as comprehensive as real data, and even though a neural network adopts highly nonlinear functions to handle complex tasks, it still cannot separate all data perfectly. The better it fits the training set, the more prone it is to overfitting, so the emergence of adversarial examples is hardly surprising.)

3.2. The Local Linearity View

This view comes from Goodfellow et al.'s paper Explaining and Harnessing Adversarial Examples, which was written largely to rebut the views above. The rebuttal itself is notable: they designed a new, fast method for generating adversarial examples (FGSM). The detailed reasoning and mathematical derivation are easy to find, so I will not repeat them here; a minimal sketch of FGSM follows below.
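For completeness, here is the one-step FGSM attack as usually implemented (my own sketch of the standard method; eps is illustrative):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: a single step of size eps along the
    sign of the input gradient of the loss (Goodfellow et al.)."""
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()
```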

(My take: this was my initial intuition as well. The data is high-dimensional, and given how a network computes (dot products accumulated with weights and biases), a tiny change to the input can, like the butterfly effect, have an enormous impact on the final output.)

3.3. Robust and Non-Robust Features

This view comes from Ilyas et al.'s paper Adversarial Examples Are Not Bugs, They Are Features. They find that neural networks largely learn the relation between labels and non-robust features (ones that carry no meaning for the human eye), and adversarial examples exploit precisely such features, as the figure below shows. The paper's perspective is striking; recommended reading.

Robust and non-robust features
(Top right: the extracted robust features; bottom right: the extracted non-robust features)

The authors regard this view as complementary to the DMM proposed in this paper: robust versus non-robust analyzes the phenomenon from the human perspective, while the DMM approaches it geometrically. On-manifold data can be identified with robust features and off-manifold data with non-robust features, and the two sets of conclusions then agree. In the original words:

Original text: The two perspectives can be viewed as complementary rather than contradictory, since they try to describe the same phenomenon using different languages: we use geometric intuition whereas Ilyas et al. takes a more human-centric approach. Clearly, adversarial perturbations which are mostly orthogonal to the image manifold lead to off-manifold adversarial examples. We can thus associate the off-manifold dimensions to the “non-robust features” of Ilyas et al. On the other hand, the on-manifold dimensions correspond more closely to the human-noticeable “robust features”.


Personal Remarks

1. Strengths

  • On the proposed model: visualizing neural-network decision boundaries from 2D and 3D perspectives is intuitive and concise;
  • On adversarial examples: it plausibly explains their existence, and additionally demonstrates and explores the relation between adversarial perturbations and the manifold;
  • On adversarial training: it gives a vivid explanation of why adversarial training works.

2. Weaknesses

  • On the proposed model: the examples in the paper are idealized (simple manifolds) and narrow (binary classification), and no supporting theory accompanies them;
  • On adversarial examples: the experiments only cover $L_2$ attacks (the appendix briefly discusses why $L_\infty$ attacks look like random perturbations); $L_0$ and $L_1$ attacks (which exploit the relation between individual pixels and the output) and black-box generation methods (which do not rely on gradients) are not studied.

3. Open Questions

  • On the image manifold: don't the manifolds of different classes overlap? If they do, what would the fitted decision boundary look like?
  • On the proposed model: in genuinely high-dimensional spaces, does the decision boundary still cling to and wrap around the manifold this tightly?
