SAGAN: Self-Attention Generative Adversarial Networks

Original paper: Self-Attention Generative Adversarial Networks

Authors: Han Zhang et al.

GitHub implementation: PyTorch implementation

Abstract:

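This paper proposes Self-Attention Generative Adversarial Networks (SAGAN), which use an attention mechanism to model long-range dependencies in image generation. (1) Traditional convolutional GANs generate high-resolution details only from spatially local points in lower-resolution feature maps, whereas SAGAN can generate details using cues from all feature locations. (2) The discriminator can check that highly detailed features in distant parts of the image are consistent with each other. In addition, recent work has shown that generator conditioning affects GAN performance, so the authors apply spectral normalization to the GAN generator. Together these changes give better results.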

Problem:

Convolutional GANs have much more difficulty modeling some image classes than others when trained on multi-class datasets. For example, while the state-of-the-art ImageNet GAN model excels at synthesizing image classes with few structural constraints (e.g. ocean, sky and landscape classes, which are distinguished more by texture than by geometry), it fails to capture geometric or structural patterns that occur consistently in some classes (for example, dogs are often drawn with realistic fur texture but without clearly defined separate feet).

In other words, when trained on multi-class datasets, convolutional GANs find some image classes much harder to model than others: they easily synthesize images with few structural constraints (such as starry skies and oceans) but struggle with classes that have consistent geometric structure.

Current image generation models generally struggle to balance local detail against global structure.

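Possible causes:

One possible explanation is that previous models rely heavily on convolution to model the dependencies between different image regions. Because the convolution operator has a local receptive field, long-range dependencies can only be processed after passing through several convolutional layers. This can prevent the model from learning long-range dependencies for several reasons:

(1) a small model may not be able to represent these dependencies;

(2) the optimization algorithm may have trouble discovering parameter values that carefully coordinate multiple layers to capture them;

(3) increasing the size of the convolution kernels increases the representational capacity of the network, but gives up the parameter and computational efficiency of local convolutions.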
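To get a feel for how slowly the receptive field grows, here is a small back-of-the-envelope sketch. It assumes stacked stride-1 convolutions with no dilation or pooling, which is a simplification (real GAN generators and discriminators also use strided or upsampling layers), and the function names are made up for this illustration.

```python
import math


def receptive_field(num_layers, kernel_size=3):
    """Receptive field width (in pixels) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def layers_to_span(distance, kernel_size=3):
    """Fewest stacked layers so that one output unit sees two pixels `distance` apart."""
    return math.ceil(distance / (kernel_size - 1))


# Two pixels at opposite ends of a 128-pixel row are 127 pixels apart.
layers_3x3 = layers_to_span(127, kernel_size=3)
layers_7x7 = layers_to_span(127, kernel_size=7)
print(layers_3x3, layers_7x7, receptive_field(layers_3x3))
```

Under these assumptions, relating the two ends of a 128-pixel row takes dozens of 3x3 layers. Larger kernels reduce the depth but cost more parameters per layer, which is the trade-off described in point (3).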

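Solution (the advantages of SAGAN):

Self-attention strikes a better balance between the ability to model long-range dependencies and computational and statistical efficiency. The self-attention module computes the response at a position as a weighted sum of the features at all positions, where the weights (the attention vectors) are computed at only a small computational cost.

The paper proposes Self-Attention Generative Adversarial Networks (SAGAN), which introduce a self-attention mechanism into convolutional GANs.

(1) The self-attention module handles long-range, multi-level dependencies well (it can find dependencies across image regions).

(2) When generating an image, the generator coordinates the fine details at every location with the fine details in distant parts of the image.

(3) The discriminator can also more accurately enforce complicated geometric constraints on the global image structure.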

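SAGAN: (with reference to https://blog.csdn.net/mx54039q/article/details/80896054)

The paper notes that most GANs are built from convolutional layers, which can only process information in a local neighborhood. It therefore adopts a non-local model, so that the generator and discriminator can efficiently model relationships between widely separated spatial regions.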

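Self-attention mechanism

(1) f(x), g(x), and h(x) are all ordinary 1x1 convolutions; they differ only in the number of output channels (C/8 for f and g, C for h);

(2) the output of f(x) is transposed and multiplied by the output of g(x), and the result is normalized with softmax to obtain an attention map;

(3) the attention map is matrix-multiplied with h(x), so that every output position is a weighted sum over all positions, which gives the self-attention feature maps.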
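Written as equations, the three steps look roughly as follows. The notation (x_i for the C-dimensional feature at position i, and W_f, W_g, W_h for the weights of the 1x1 convolutions) follows the first version of the paper as I understand it rather than anything defined in this post, with N = W x H positions:

```latex
\begin{aligned}
f(x_i) &= W_f x_i, \qquad g(x_i) = W_g x_i, \qquad h(x_i) = W_h x_i \\
s_{ij} &= f(x_i)^{\top} g(x_j) \\
\beta_{j,i} &= \frac{\exp(s_{ij})}{\sum_{k=1}^{N} \exp(s_{kj})} \\
o_j &= \sum_{i=1}^{N} \beta_{j,i}\, h(x_i)
\end{aligned}
```

Here \beta_{j,i} indicates how much the model attends to position i when synthesizing position j, and for every j the weights \beta_{j,1}, ..., \beta_{j,N} sum to 1.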

TensorFlow code

    # TensorFlow 1.x method; conv and hw_flatten are assumed to be helper functions from the same implementation.
    def attention(self, x, ch):
        f = conv(x, ch // 8, kernel=1, stride=1, sn=self.sn, scope='f_conv')  # [bs, h, w, c']
        g = conv(x, ch // 8, kernel=1, stride=1, sn=self.sn, scope='g_conv')  # [bs, h, w, c']
        h = conv(x, ch, kernel=1, stride=1, sn=self.sn, scope='h_conv')  # [bs, h, w, c]

        # N = h * w
        s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)  # [bs, N, N]

        beta = tf.nn.softmax(s, axis=-1)  # attention map

        o = tf.matmul(beta, hw_flatten(h))  # [bs, N, C]
        gamma = tf.get_variable("gamma", [1], initializer=tf.constant_initializer(0.0))

        o = tf.reshape(o, shape=x.shape)  # [bs, h, w, C]
        x = gamma * o + x

        return x

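Meaning of the parameters

(1) f(x) has output shape [C/8, W, H] and g(x) has output shape [C/8, W, H]. To allow matrix multiplication, the width and height of each feature map are flattened into a single dimension of length N (N = W x H), so the outputs of f(x) and g(x) become [C/8, N];

(2) multiplying the transpose of f(x) by g(x) gives a matrix S of shape [N, N], with S_ij = f(x_i)^T g(x_j). S can be viewed as a correlation matrix that holds the correlation between every pair of pixels on the H x W feature map;

(3) S is normalized with softmax to obtain the \beta matrix. In the TensorFlow code, the softmax runs along the rows of g(x)^T f(x), which is the transpose of S, so each row of \beta (a vector of length N that sums to 1) represents one attention pattern;

(4) these N attention patterns are applied to h(x): each output pixel is a weighted sum over the whole feature map, with weights taken from the \beta matrix, which gives N new pixels as the output O.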

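The final output is:

y_i = \gamma o_i + x_i

where \gamma is initialized to 0, so the network first relies on the local neighborhood and then gradually learns to assign more weight to the non-local evidence.

The reason for this is that we want the network to learn the easy tasks first and then progressively take on more complex ones. In SAGAN, the proposed attention module is applied to both the generator and the discriminator, which are trained in an alternating fashion by minimizing the adversarial loss.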
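To make the shapes and the role of \gamma concrete, here is a minimal NumPy sketch of the attention block for a single feature map. It is not the author's code: the 1x1 convolutions are modeled as per-pixel matrix multiplications without bias, the weights are random, and the names `self_attention`, `w_f`, `w_g`, and `w_h` are assumptions made for this illustration.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_f, w_g, w_h, gamma):
    """Self-attention over one feature map.

    x: [H, W, C] feature map
    w_f, w_g: [C, C // 8] weights of the 1x1 convolutions f and g
    w_h: [C, C] weights of the 1x1 convolution h
    gamma: scalar weight of the attention branch
    """
    height, width, channels = x.shape
    flat = x.reshape(height * width, channels)   # [N, C], N = H * W
    f = flat @ w_f                               # [N, C/8]
    g = flat @ w_g                               # [N, C/8]
    h = flat @ w_h                               # [N, C]
    s = g @ f.T                                  # [N, N] pairwise scores
    beta = softmax(s, axis=-1)                   # each row sums to 1
    o = (beta @ h).reshape(height, width, channels)
    return gamma * o + x, beta


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 16))
w_f = 0.1 * rng.standard_normal((16, 2))
w_g = 0.1 * rng.standard_normal((16, 2))
w_h = 0.1 * rng.standard_normal((16, 16))

y0, beta = self_attention(x, w_f, w_g, w_h, gamma=0.0)  # gamma at its initial value
y1, _ = self_attention(x, w_f, w_g, w_h, gamma=1.0)     # attention branch switched on
print(beta.shape, np.allclose(y0, x), np.allclose(y1, x))
```

With \gamma = 0 the block returns its input unchanged, so at the start of training the network behaves like a plain convolutional GAN; the attention branch only contributes once \gamma moves away from 0.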

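Techniques for stabilizing GAN training:

(1) Spectral normalization for both generator and discriminator
Applying spectral normalization to both networks allows fewer discriminator updates per generator update, which reduces the computational cost of training, and makes training more stable.

(2) Imbalanced learning rate for generator and discriminator updates
Two-timescale update rule (TTUR)
During training, G and D are given different learning rates to balance the training speed of the two networks.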
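As a rough illustration of TTUR, the sketch below applies one alternating update with separate learning rates. The values 4e-4 for D and 1e-4 for G are assumed here to match what the SAGAN paper reports, plain gradient descent stands in for the Adam optimizer, and the gradients are toy numbers; `sgd_update` and `ttur_step` are hypothetical names.

```python
# Assumed learning rates (the SAGAN paper reports 4e-4 for D and 1e-4 for G).
LR_D = 4e-4
LR_G = 1e-4


def sgd_update(params, grads, lr):
    """One plain gradient-descent step; a stand-in for the Adam update."""
    return [p - lr * g for p, g in zip(params, grads)]


def ttur_step(d_params, d_grads, g_params, g_grads):
    """Update D and then G once each, every network with its own learning rate."""
    d_params = sgd_update(d_params, d_grads, LR_D)
    g_params = sgd_update(g_params, g_grads, LR_G)
    return d_params, g_params


# Toy example: identical parameters and gradients for both networks.
d_new, g_new = ttur_step([1.0], [0.5], [1.0], [0.5])
d_move = 1.0 - d_new[0]  # distance D moved in this step
g_move = 1.0 - g_new[0]  # distance G moved in this step
print(d_move, g_move)
```

In a real training loop the generator gradients would be recomputed after the discriminator update; the toy only shows that, for the same gradient, D takes a step four times larger than G.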

Experimental results:

(1) Evaluation of the two stabilization techniques

(2) Evaluation of the self-attention mechanism

Spectral normalization code (TensorFlow)

    # Usage inside a transposed-convolution layer: normalize the kernel before applying it.
    w = tf.get_variable("kernel", shape=[kernel, kernel, channels, x.get_shape()[-1]], initializer=weight_init, regularizer=weight_regularizer)
    x = tf.nn.conv2d_transpose(x, filter=spectral_norm(w), output_shape=output_shape, strides=[1, stride, stride, 1], padding=padding)

    def spectral_norm(w, iteration=1):
        w_shape = w.shape.as_list()
        w = tf.reshape(w, [-1, w_shape[-1]])

        u = tf.get_variable("u", [1, w_shape[-1]], initializer=tf.truncated_normal_initializer(), trainable=False)

        u_hat = u
        v_hat = None
        for i in range(iteration):
            # power iteration
            # Usually iteration = 1 will be enough
            v_ = tf.matmul(u_hat, tf.transpose(w))
            v_hat = l2_norm(v_)

            u_ = tf.matmul(v_hat, w)
            u_hat = l2_norm(u_)

        sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))
        w_norm = w / sigma

        # Persist the updated u before returning the normalized kernel.
        with tf.control_dependencies([u.assign(u_hat)]):
            w_norm = tf.reshape(w_norm, w_shape)

        return w_norm

Excerpted from: https://github.com/taki0112/Self-Attention-GAN-Tensorflow/blob/master/ops.py
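The same power iteration can be checked outside TensorFlow. The NumPy sketch below mirrors the structure of `spectral_norm` above but returns the updated `u` instead of assigning it to a variable. The test matrix is built with known singular values (3, 1, 0.5) so the estimate can be compared against the true answer, and 20 iterations are used here only to make the toy converge tightly; all of this is an illustrative assumption and not part of the original implementation.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean norm."""
    return v / (np.linalg.norm(v) + eps)


def spectral_norm(w, u, iterations=1):
    """Return (w / sigma, updated u, sigma); needs iterations >= 1.

    sigma is the power-iteration estimate of the largest singular value
    of w reshaped to [-1, out_channels]; u is a [1, out_channels] vector.
    """
    w_mat = w.reshape(-1, w.shape[-1])
    u_hat = u
    for _ in range(iterations):
        v_hat = l2_normalize(u_hat @ w_mat.T)  # [1, rows]
        u_hat = l2_normalize(v_hat @ w_mat)    # [1, out_channels]
    sigma = (v_hat @ w_mat @ u_hat.T).item()
    return w / sigma, u_hat, sigma


rng = np.random.default_rng(0)
# Build a 6x4 matrix whose nonzero singular values are exactly 3, 1, and 0.5.
q_left, _ = np.linalg.qr(rng.standard_normal((6, 3)))
q_right, _ = np.linalg.qr(rng.standard_normal((4, 3)))
w = q_left @ np.diag([3.0, 1.0, 0.5]) @ q_right.T

u0 = rng.standard_normal((1, 4))
w_sn, u, sigma = spectral_norm(w, u0, iterations=20)
print(sigma, np.linalg.norm(w_sn, 2))
```

In training, `u` is carried over from step to step, which is why a single iteration per step is usually enough: the weights change slowly, so the estimate keeps improving across steps.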
