Attention Is All You Need

Abstract

The paper proposes a new, simple network architecture, the Transformer, based solely on attention mechanisms.

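Background

1.When a recurrent model computes the hidden state ht, it uses the previous hidden state ht-1 and the input at position t. This inherently sequential dependency prevents parallel computation within a training example.
2.Attention mechanisms allow dependencies to be modeled without regard to their distance in the input or output sequence.
3.Self-attention is an attention mechanism that relates different positions of a single sequence in order to compute a representation of that sequence.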

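Model Architecture

1.The encoder maps an input sequence of symbol representations to a sequence of continuous representations z. Given z, the decoder generates the output sequence of symbols one element at a time. Each step is auto-regressive: when generating the next symbol, the model consumes the previously generated symbols as additional input.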
2.Encoder and Decoder Stacks

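(1)Encoder:
N=6: a stack of 6 identical layers, each with 2 sub-layers. The first is a multi-head self-attention mechanism; the second is a position-wise fully connected feed-forward network. A residual connection is applied around each sub-layer, followed by layer normalization, so the output of each sub-layer is LayerNorm(x + Sublayer(x)).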
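The residual-plus-normalization wrapper around each sub-layer can be sketched in a few lines of NumPy. This is only an illustration: the function names `layer_norm` and `sublayer_connection`, the toy sizes, and the random matrix `W` standing in for a sub-layer are assumptions, and the learned gain and bias of a full layer normalization are left out.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance.
    # The learned gain and bias of a full LayerNorm are omitted in this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sublayer_connection(x, sublayer):
    # Residual connection first, then layer normalization:
    # LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))           # 5 positions, model width 8 (toy sizes)
W = rng.normal(size=(8, 8)) * 0.1     # hypothetical stand-in for a sub-layer
out = sublayer_connection(x, lambda h: h @ W)
print(out.shape)
```

Because the sub-layer output is added back to its input, the sub-layer must preserve the model width, which is why every sub-layer in the stack produces outputs of the same dimension.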

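(2)Decoder:
N=6: a stack of 6 identical layers, each with 3 sub-layers. Compared with an encoder layer, an extra sub-layer is inserted in the middle; it takes the output of the encoder stack and the output of the preceding sub-layer as its inputs. All three sub-layers use the same residual connection followed by layer normalization.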
# This masking, combined with fact that the output embeddings are offset by one position, ensures that the predictions for position i can depend only on the known outputs at positions less than i.
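The decoder's self-attention sub-layer is a masked multi-head attention sub-layer. Because the decoder inputs (the output embeddings) are shifted right by one position, masking out all later positions ensures that the prediction for position i depends only on the known outputs at positions less than i.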
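A minimal NumPy sketch of such a mask follows. It assumes the decoder inputs are already shifted by one position, so letting position i attend to input positions j <= i is the same as depending only on outputs before i. The helper names `causal_mask` and `masked_softmax` are illustrative, and the all-zero score matrix is a toy input.

```python
import numpy as np

def causal_mask(n):
    # mask[i, j] is True where query position i may attend to key position j,
    # that is, where j <= i (lower triangle including the diagonal).
    return np.tril(np.ones((n, n), dtype=bool))

def masked_softmax(scores, mask):
    # Disallowed entries are set to -inf so softmax gives them zero weight.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))             # toy scores: every pair equally compatible
weights = masked_softmax(scores, causal_mask(4))
print(weights)
```

With equal scores, row i spreads its weight uniformly over positions 0..i and puts exactly zero weight on every later position.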

3.Attention
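An attention function maps a query and a set of key-value pairs to an output. The output is a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
(1)Scaled Dot-Product Attention (dot-product attention scaled by 1/√dk)

A set of queries is packed into a matrix Q, and the keys and values likewise into K and V; the output is softmax(QK^T/√dk)V. This is ordinary dot-product attention with an added scaling factor of 1/√dk, which keeps large dot products from pushing the softmax into regions with extremely small gradients when dk is large.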
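The computation above can be sketched in NumPy as follows. The function names, the toy shapes, and the random inputs are assumptions made for illustration; no masking or batching is included.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # compatibility of each query with each key
    weights = softmax(scores, axis=-1)    # one weight distribution per query
    return weights @ V, weights           # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 queries, d_k = 4
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 2))   # 5 values, d_v = 2
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)
```

Each row of `weights` sums to 1, so each output row is a convex combination of the value vectors.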

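(2)Multi-Head Attention

The queries, keys, and values are linearly projected several times with different learned projections, and attention is computed on each projected version in parallel. The results are concatenated and projected once more to give the final output.
· This allows the model to jointly attend to information from different representation subspaces at different positions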
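The project, attend, concatenate, project sequence can be sketched in NumPy as below. This is an illustration under stated assumptions: each of `W_q`, `W_k`, `W_v` is a single (d_model, d_model) matrix whose column blocks act as the per-head projections, the weights are random rather than learned, and the sizes (d_model = 8, h = 2) are toy values. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x_q, x_kv, W_q, W_k, W_v, W_o, h):
    # x_q: (n_q, d_model) supplies the queries; x_kv: (n_k, d_model) supplies
    # the keys and values. All weight matrices are (d_model, d_model).
    n_q, d_model = x_q.shape
    d_head = d_model // h

    def split(x):
        # (n, d_model) -> (h, n, d_head): one slice of the projection per head.
        return x.reshape(x.shape[0], h, d_head).transpose(1, 0, 2)

    Q, K, V = split(x_q @ W_q), split(x_kv @ W_k), split(x_kv @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (h, n_q, n_k)
    heads = softmax(scores) @ V                           # (h, n_q, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return concat @ W_o                                   # final linear projection

rng = np.random.default_rng(0)
d_model, h = 8, 2
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
x = rng.normal(size=(3, d_model))
out = multi_head_attention(x, x, W_q, W_k, W_v, W_o, h)   # self-attention: same source
print(out.shape)
```

Passing the same array as `x_q` and `x_kv` gives self-attention; passing decoder states as `x_q` and encoder outputs as `x_kv` gives the encoder-decoder form.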

（3）Applications of Attention in Model
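· In “encoder-decoder attention” layers, the queries come from the previous decoder layer, and the keys and values come from the output of the encoder
# This allows every position in the decoder to attend over all positions in the input sequence
· In the encoder's self-attention layers, the queries, keys, and values all come from the output of the previous encoder layer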
4. Position-wise Feed-Forward Networks

5.Embeddings and Softmax
6.Positional Encoding
