Soft Attention and Hard Attention

The following is excerpted from: https://zhuanlan.zhihu.com/p/31547842

1. The introduction of attention:

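Encoding the input X into a single fixed-length vector gives every word in the sentence the same weight. This is unreasonable: without any differentiation between words, model performance tends to drop. The Attention Mechanism was therefore proposed to assign different weights to different parts of the input X, achieving a soft form of differentiation.
In 2015, Kelvin Xu et al. published "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", which introduced Attention into Image Caption. When generating the i-th word of the image description, Attention is used to associate that word with the relevant regions of the image.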

2. Soft Attention:

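The traditional Attention Mechanism is Soft Attention: a deterministic score computation produces the attended encoding of the hidden states. Soft Attention is parameterized and therefore differentiable, so it can be embedded in a model and trained directly. Gradients pass through the Attention Mechanism module and propagate back to the rest of the model.
It is also sometimes called Top-down Attention.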
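A minimal sketch of the deterministic computation behind Soft Attention, in plain Python. The dot-product scoring function, the function names, and the toy vectors are assumptions made for illustration; the source article does not specify them.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, hidden_states):
    """Return (context, weights) for one query over ALL encoder hidden states.

    Assumption: scores are plain dot products between query and each state.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context is a weighted average, so every state contributes
    # and the whole computation stays differentiable.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Toy example (hypothetical values): three 2-dimensional hidden states.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context, weights = soft_attention(query, hidden_states)
print(weights)
print(context)
```

Every step here is an ordinary arithmetic operation with no sampling, which is why gradients can flow through the attention weights to the rest of the model.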

3. Hard Attention:

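Hard Attention, by contrast, is a stochastic process. Instead of taking the entire encoder hidden-layer output as input, Hard Attention samples a subset of the input-side hidden states according to the probabilities Si and computes on that subset only. Because the sampling step is not differentiable, Monte Carlo sampling is needed to estimate the module's gradient so that backpropagation remains possible.
Both kinds of Attention Mechanism have their own advantages, but current research and applications lean toward Soft Attention because it can be differentiated directly and trained with gradient backpropagation.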
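The sampling step and the Monte Carlo gradient estimate described for Hard Attention can be sketched as follows. This uses the score-function (REINFORCE-style) estimator, which is one common choice and an assumption here; the names `hard_attention_sample`, `reinforce_gradient`, and `toy_reward`, as well as the toy data, are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into the sampling probabilities Si."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_attention_sample(probs, hidden_states, rng):
    """Sample ONE hidden state according to probs instead of averaging all."""
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, hidden_states[index]


def reinforce_gradient(scores, hidden_states, reward_fn, rng, num_samples=2000):
    """Monte Carlo estimate of d E[reward] / d scores.

    Uses the identity grad E[r] = E[r * grad log p(index)], where for a
    softmax distribution grad log p(index) = one_hot(index) - probs.
    """
    probs = softmax(scores)
    grad = [0.0] * len(scores)
    for _ in range(num_samples):
        index, picked = hard_attention_sample(probs, hidden_states, rng)
        reward = reward_fn(picked)
        for j in range(len(scores)):
            indicator = 1.0 if j == index else 0.0
            grad[j] += reward * (indicator - probs[j])
    return [g / num_samples for g in grad]


def toy_reward(state):
    """Hypothetical reward: the sum of the sampled state's entries."""
    return sum(state)


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
grad = reinforce_gradient(scores, hidden_states, toy_reward, rng)
print(grad)
```

For this toy setup the exact gradient is p_j * (r_j - E[r]) = [-1/9, -1/9, 2/9], so the estimate should land near those values, with noise that shrinks as `num_samples` grows. That sampling noise is the practical cost of Hard Attention compared with the direct gradients of Soft Attention.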

A CVPR 2018 paper, "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering", applies Hard Attention to image captioning; this is also called Bottom-up Attention.

More on attention

2.2 local / global attention
2.3 Self Attention

There is also a relatively more theoretical article: https://www.cnblogs.com/taojake-ML/p/6113459.html
