• paper: Self-Attention Generative Adversarial Networks

Contents

  • Self-attention
  • Spectral normalization for both generator and discriminator
  • Imbalanced learning rate for generator and discriminator updates

Self-attention

Motivation:

  • Since the convolution operator has a local receptive field, long range dependencies can only be processed after passing through several convolutional layers. This could prevent learning about long-term dependencies for a variety of reasons:

    • (i) a small model may not be able to represent them
    • (ii) optimization algorithms may have trouble discovering parameter values that carefully coordinate multiple layers to capture these dependencies
    • (iii) these parameterizations may be statistically brittle and prone to failure when applied to previously unseen inputs.
  • Increasing the size of the convolution kernels can increase the representational capacity of the network but doing so also loses the computational and statistical efficiency obtained by using local convolutional structure.

SAGAN

  • SAGAN allows attention-driven, long-range dependency modeling for image generation tasks (convolution kernels readily capture local information, whereas SAGAN introduces long-range dependencies through the attention mechanism).
  • In SAGAN, the proposed attention module is applied to both the generator and the discriminator.
    • (1) Generator: Details can be generated using cues from all feature locations.
    • (2) Discriminator: the discriminator can check that highly detailed features in distant portions of the image are consistent with each other.
    • Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

Self-attention - computing global spatial information

  • $x \in \mathbb{R}^{C \times N}$: image features from the previous hidden layer, where $C$ is the number of channels and $N$ is the number of feature locations.
  • $f(x) = W_f x,\ g(x) = W_g x$: transform $x$ into two feature spaces $f$ (key) and $g$ (query) to calculate the attention

    $$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{\top} g(x_j)$$

    • $\beta_{j,i}$ indicates the extent to which the model attends to the $i$th location when synthesizing the $j$th region
    • $W_g \in \mathbb{R}^{\bar{C} \times C},\ W_f \in \mathbb{R}^{\bar{C} \times C}$; the attention map is $N \times N$
  • The output of the attention layer is $o = (o_1, o_2, \dots, o_j, \dots, o_N) \in \mathbb{R}^{C \times N}$, where

    $$o_j = W_v \Big( \sum_{i=1}^{N} \beta_{j,i}\, W_h x_i \Big)$$

    • $W_h \in \mathbb{R}^{\bar{C} \times C},\ W_v \in \mathbb{R}^{C \times \bar{C}}$

$W_g, W_f, W_h, W_v$ are implemented as $1 \times 1$ convolutions. We did not notice any significant performance decrease when reducing the channel number $\bar{C}$ to $C/k$, where $k = 1, 2, 4, 8$, after a few training epochs on ImageNet. For memory efficiency, we choose $k = 8$ (i.e., $\bar{C} = C/8$) in all experiments.
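As a concrete illustration, here is a minimal PyTorch sketch of this attention computation ($1 \times 1$ convolutions for $W_f, W_g, W_h, W_v$, $\bar{C} = C/k$, and an $N \times N$ attention map). The module name `SelfAttentionCore` and its exact layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionCore(nn.Module):
    """Computes the attention output o for a B x C x H x W feature map."""
    def __init__(self, in_channels, k=8):
        super().__init__()
        c_bar = in_channels // k                         # C_bar = C / 8 in the paper
        self.f = nn.Conv2d(in_channels, c_bar, 1)        # key:   W_f (1x1 conv)
        self.g = nn.Conv2d(in_channels, c_bar, 1)        # query: W_g (1x1 conv)
        self.h = nn.Conv2d(in_channels, c_bar, 1)        # W_h    (1x1 conv)
        self.v = nn.Conv2d(c_bar, in_channels, 1)        # W_v    (1x1 conv), back to C channels

    def forward(self, x):
        b, _, height, width = x.shape
        n = height * width                               # N feature locations
        f = self.f(x).view(b, -1, n)                     # B x C_bar x N
        g = self.g(x).view(b, -1, n)                     # B x C_bar x N
        h = self.h(x).view(b, -1, n)                     # B x C_bar x N
        s = torch.bmm(f.transpose(1, 2), g)              # s_ij = f(x_i)^T g(x_j), B x N x N
        beta = F.softmax(s, dim=1)                       # normalize over i for each j
        o = torch.bmm(h, beta)                           # o_j = sum_i beta_ji * h(x_i), B x C_bar x N
        return self.v(o.view(b, -1, height, width))      # B x C x H x W
```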


Self-attention - combining global spatial information with local information

  • In addition, we further multiply the output of the attention layer by a scale parameter and add back the input feature map. Therefore, the final output is

    $$y_i = \gamma\, o_i + x_i,$$

    where $\gamma$ is a learnable scalar initialized to 0 (a short code sketch follows this list).
  • Introducing the learnable γ γ γ allows the network to first rely on the cues in the local neighborhood – since this is easier – and then gradually learn to assign more weight to the non-local evidence.
    • The intuition for why we do this is straightforward: we want to learn the easy task first and then progressively increase the complexity of the task.
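A short sketch of this final combination, assuming PyTorch and the `SelfAttentionCore` module sketched above (the wrapper name `SelfAttention` is illustrative):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Full layer: y_i = gamma * o_i + x_i, with gamma learnable and initialized to 0."""
    def __init__(self, in_channels):
        super().__init__()
        self.attn = SelfAttentionCore(in_channels)     # attention output o (see sketch above)
        self.gamma = nn.Parameter(torch.zeros(1))      # starts at 0: the layer is initially a no-op

    def forward(self, x):
        # The network first relies on local cues, then gradually weights non-local evidence.
        return self.gamma * self.attn(x) + x
```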

Loss

  • In SAGAN, the generator and the discriminator are trained in an alternating fashion by minimizing the hinge version of the adversarial loss (Lim & Ye, 2017; Tran et al., 2017; Miyato et al., 2018 (SNGAN)),

    $$L_D = -\mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\min(0, -1 + D(x, y))\big] - \mathbb{E}_{z \sim p_z,\, y \sim p_{\text{data}}}\big[\min(0, -1 - D(G(z), y))\big]$$

    $$L_G = -\mathbb{E}_{z \sim p_z,\, y \sim p_{\text{data}}}\big[D(G(z), y)\big]$$
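A minimal sketch of these hinge losses, assuming PyTorch logits `d_real` and `d_fake` from the discriminator on real and generated samples (the function names are illustrative):

```python
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # L_D = E[max(0, 1 - D(x))] + E[max(0, 1 + D(G(z)))]
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # L_G = -E[D(G(z))]
    return -d_fake.mean()
```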

Spectral normalization for both generator and discriminator

  • In SNGAN, spectral normalization (SN) is applied only to $D$. SAGAN applies spectral normalization to both the generator and the discriminator (a short sketch follows this list).

    • Spectral normalization in the generator can prevent the escalation of parameter magnitudes and avoid unusual gradients.
    • We find empirically that spectral normalization of both generator and discriminator makes it possible to use fewer discriminator updates per generator update, thus significantly reducing the computational cost of training. The approach also shows more stable training behavior.
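A minimal sketch of wrapping layers in both networks with PyTorch's built-in `torch.nn.utils.spectral_norm`; the layer shapes below are illustrative placeholders, not SAGAN's actual architecture:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# In the generator: wrap (transposed) conv layers with spectral normalization.
gen_block = nn.Sequential(
    spectral_norm(nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)

# In the discriminator: the same wrapper around its conv layers.
disc_block = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
)
```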

Imbalanced learning rate for generator and discriminator updates

  • In previous work, regularization of the discriminator (SNGAN; WGAN-GP) often slows down the GANs’ learning process.

    • In practice, methods using regularized discriminators typically require multiple (e.g., 5) discriminator update steps per generator update step during training.
  • Independently, Heusel et al. (2017) have advocated using separate learning rates (TTUR, the two time-scale update rule) for the generator and the discriminator.
  • We propose using TTUR specifically to compensate for the problem of slow learning in a regularized discriminator, making it possible to use fewer discriminator steps per generator step. Using this approach, we are able to produce better results given the same wall-clock time (see the optimizer sketch after this list).
    • lr for Discriminator: 0.0004
    • lr for Generator: 0.0001
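A minimal sketch of these imbalanced learning rates with two Adam optimizers, assuming PyTorch; `generator` and `discriminator` are placeholders for the actual models, and `betas=(0.0, 0.9)` is an assumption about the Adam setting, commonly used in SAGAN-style training:

```python
import torch
import torch.nn as nn

generator = nn.Linear(128, 3 * 32 * 32)        # placeholder model
discriminator = nn.Linear(3 * 32 * 32, 1)      # placeholder model

# TTUR: discriminator learns 4x faster than the generator.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
```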
