• paper: Self-Attention Generative Adversarial Networks

Contents

  • Self-attention
  • Spectral normalization for both generator and discriminator
  • Imbalanced learning rate for generator and discriminator updates

Self-attention

Motivation:

  • Since the convolution operator has a local receptive field, long range dependencies can only be processed after passing through several convolutional layers. This could prevent learning about long-term dependencies for a variety of reasons:

    • (i) a small model may not be able to represent them
    • (ii) optimization algorithms may have trouble discovering parameter values that carefully coordinate multiple layers to capture these dependencies
    • (iii) these parameterizations may be statistically brittle and prone to failure when applied to previously unseen inputs.
  • Increasing the size of the convolution kernels can increase the representational capacity of the network but doing so also loses the computational and statistical efficiency obtained by using local convolutional structure.

SAGAN

  • SAGAN allows attention-driven, long-range dependency modeling for image generation tasks (convolution kernels readily capture local information, whereas SAGAN introduces long-range dependencies through the attention mechanism).
  • In SAGAN, the proposed attention module is applied to both the generator and the discriminator.
    • (1) Generator: Details can be generated using cues from all feature locations.
    • (2) Discriminator: the discriminator can check that highly detailed features in distant portions of the image are consistent with each other.
    • Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

Self-attention - computing global spatial information

  • $x \in \mathbb{R}^{C \times N}$: image features from the previous hidden layer, where $C$ is the number of channels and $N$ is the number of feature locations.
  • $f(x) = W_f x,\ g(x) = W_g x$: transform $x$ into two feature spaces $f$ (key) and $g$ (query) to calculate the attention

    $$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{\top} g(x_j)$$

    • $\beta_{j,i}$ indicates the extent to which the model attends to the $i$th location when synthesizing the $j$th region
    • $W_g \in \mathbb{R}^{\bar{C} \times C},\ W_f \in \mathbb{R}^{\bar{C} \times C}$; the attention map is $N \times N$
  • The output of the attention layer is $o = (o_1, o_2, \dots, o_j, \dots, o_N) \in \mathbb{R}^{C \times N}$, where

    $$o_j = W_v \Big( \sum_{i=1}^{N} \beta_{j,i}\, W_h x_i \Big)$$

    • $W_h \in \mathbb{R}^{\bar{C} \times C},\ W_v \in \mathbb{R}^{C \times \bar{C}}$

$W_g, W_f, W_h, W_v$ are implemented as $1 \times 1$ convolutions. We did not notice any significant performance decrease when reducing the channel number $\bar{C}$ to $C/k$, where $k = 1, 2, 4, 8$, after a few training epochs on ImageNet. For memory efficiency, we choose $k = 8$ (i.e., $\bar{C} = C/8$) in all experiments.
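As a concrete illustration, here is a minimal PyTorch sketch of this attention computation ($1 \times 1$ convolutions for $W_f, W_g, W_h, W_v$, $\bar{C} = C/k$, and an $N \times N$ attention map). The module name `SelfAttentionCore` and its exact layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionCore(nn.Module):
    """Computes the attention output o for a B x C x H x W feature map."""
    def __init__(self, in_channels, k=8):
        super().__init__()
        c_bar = in_channels // k                         # C_bar = C / 8 in the paper
        self.f = nn.Conv2d(in_channels, c_bar, 1)        # key:   W_f (1x1 conv)
        self.g = nn.Conv2d(in_channels, c_bar, 1)        # query: W_g (1x1 conv)
        self.h = nn.Conv2d(in_channels, c_bar, 1)        # W_h    (1x1 conv)
        self.v = nn.Conv2d(c_bar, in_channels, 1)        # W_v    (1x1 conv), back to C channels

    def forward(self, x):
        b, _, height, width = x.shape
        n = height * width                               # N feature locations
        f = self.f(x).view(b, -1, n)                     # B x C_bar x N
        g = self.g(x).view(b, -1, n)                     # B x C_bar x N
        h = self.h(x).view(b, -1, n)                     # B x C_bar x N
        s = torch.bmm(f.transpose(1, 2), g)              # s_ij = f(x_i)^T g(x_j), B x N x N
        beta = F.softmax(s, dim=1)                       # normalize over i for each j
        o = torch.bmm(h, beta)                           # o_j = sum_i beta_ji * h(x_i), B x C_bar x N
        return self.v(o.view(b, -1, height, width))      # B x C x H x W
```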


Self-attention - combining global spatial information with local information

  • In addition, we further multiply the output of the attention layer by a scale parameter and add back the input feature map. Therefore, the final output is

    $$y_i = \gamma\, o_i + x_i,$$

    where $\gamma$ is a learnable scalar initialized to 0 (a short code sketch follows this list).
  • Introducing the learnable γ γ γ allows the network to first rely on the cues in the local neighborhood – since this is easier – and then gradually learn to assign more weight to the non-local evidence.
    • The intuition for why we do this is straightforward: we want to learn the easy task first and then progressively increase the complexity of the task.
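A short sketch of this final combination, assuming PyTorch and the `SelfAttentionCore` module sketched above (the wrapper name `SelfAttention` is illustrative):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Full layer: y_i = gamma * o_i + x_i, with gamma learnable and initialized to 0."""
    def __init__(self, in_channels):
        super().__init__()
        self.attn = SelfAttentionCore(in_channels)     # attention output o (see sketch above)
        self.gamma = nn.Parameter(torch.zeros(1))      # starts at 0: the layer is initially a no-op

    def forward(self, x):
        # The network first relies on local cues, then gradually weights non-local evidence.
        return self.gamma * self.attn(x) + x
```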

Loss

  • In SAGAN, the generator and the discriminator are trained in an alternating fashion by minimizing the hinge version of the adversarial loss (Lim & Ye, 2017; Tran et al., 2017; Miyato et al., 2018 (SNGAN)),

    $$L_D = -\mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\min(0, -1 + D(x, y))\big] - \mathbb{E}_{z \sim p_z,\, y \sim p_{\text{data}}}\big[\min(0, -1 - D(G(z), y))\big]$$

    $$L_G = -\mathbb{E}_{z \sim p_z,\, y \sim p_{\text{data}}}\big[D(G(z), y)\big]$$
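A minimal sketch of these hinge losses, assuming PyTorch logits `d_real` and `d_fake` from the discriminator on real and generated samples (the function names are illustrative):

```python
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # L_D = E[max(0, 1 - D(x))] + E[max(0, 1 + D(G(z)))]
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # L_G = -E[D(G(z))]
    return -d_fake.mean()
```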

Spectral normalization for both generator and discriminator

  • In SNGAN, spectral normalization (SN) is applied only to $D$. SAGAN applies spectral normalization to both the generator and the discriminator (a short sketch follows this list).

    • Spectral normalization in the generator can prevent the escalation of parameter magnitudes and avoid unusual gradients.
    • We find empirically that spectral normalization of both generator and discriminator makes it possible to use fewer discriminator updates per generator update, thus significantly reducing the computational cost of training. The approach also shows more stable training behavior.
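A minimal sketch of wrapping layers in both networks with PyTorch's built-in `torch.nn.utils.spectral_norm`; the layer shapes below are illustrative placeholders, not SAGAN's actual architecture:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# In the generator: wrap (transposed) conv layers with spectral normalization.
gen_block = nn.Sequential(
    spectral_norm(nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)

# In the discriminator: the same wrapper around its conv layers.
disc_block = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
)
```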

Imbalanced learning rate for generator and discriminator updates

  • In previous work, regularization of the discriminator (SNGAN; WGAN-GP) often slows down the GANs’ learning process.

    • In practice, methods using regularized discriminators typically require multiple (e.g., 5) discriminator update steps per generator update step during training.
  • Independently, Heusel et al. (2017) have advocated using separate learning rates (TTUR, the two time-scale update rule) for the generator and the discriminator.
  • We propose using TTUR specifically to compensate for the problem of slow learning in a regularized discriminator, making it possible to use fewer discriminator steps per generator step. Using this approach, we are able to produce better results given the same wall-clock time (see the optimizer sketch after this list).
    • lr for Discriminator: 0.0004
    • lr for Generator: 0.0001
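A minimal sketch of these imbalanced learning rates with two Adam optimizers, assuming PyTorch; `generator` and `discriminator` are placeholders for the actual models, and `betas=(0.0, 0.9)` is an assumption about the Adam setting, commonly used in SAGAN-style training:

```python
import torch
import torch.nn as nn

generator = nn.Linear(128, 3 * 32 * 32)        # placeholder model
discriminator = nn.Linear(3 * 32 * 32, 1)      # placeholder model

# TTUR: discriminator learns 4x faster than the generator.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
```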
