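A 2016 paper (Supervised Attentions for Neural Machine Translation), again an improvement on the attention mechanism.
Core idea, "supervision": compute the distance between the attention weights and the true alignment, and train with that distance as part of the model loss.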

Introduction

Given the alignments of all the training sentence pairs, we add an alignment distance cost to the objective function.

The classic attention model (fourth time through it, and it reads differently every time)

Alignment module

Given an alignment matrix A for a sentence pair (x, y) in Figure 2 (a), where we have an end-of-source-sentence token <eos> = x_L, and we align all the unaligned target words (y*_3 in this example) to <eos>, also we force y*_m (end-of-target-sentence) to be aligned to x_L with probability one.

Then we conduct two transformations to get the probability distribution matrices ((b) and (c) in Figure 2).

(b) and (c) correspond to normalization and smoothing, respectively.
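The normalization step (b) can be sketched roughly as below. This is only an illustration of the description above, not the authors' code. It assumes that A[t][i] == 1 means target word t is aligned to source word i, that the last column is the source-side <eos>, and that the last row is the target end-of-sentence. The function name normalize_alignment is made up for this note.

```python
def normalize_alignment(A):
    """Turn a 0/1 alignment matrix into row-wise probability distributions.

    Assumed layout (not stated in the notes): rows are target words,
    columns are source words, and the last column is the source <eos>.
    """
    rows = [list(row) for row in A]      # copy so the input is not mutated
    last = len(rows[0]) - 1
    for row in rows:
        if sum(row) == 0:                # unaligned target word -> <eos>
            row[last] = 1
    # Target end-of-sentence is forced onto <eos> with probability one.
    rows[-1] = [0] * last + [1]
    # Each row is divided by its sum so that it sums to one.
    return [[v / sum(row) for v in row] for row in rows]


A = [
    [1, 0, 0],   # aligned to one source word
    [1, 1, 0],   # aligned to two source words
    [0, 0, 0],   # unaligned target word
    [0, 0, 0],   # target end-of-sentence
]
print(normalize_alignment(A))
# [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
```

A target word aligned to two source words ends up with 0.5 on each, which is all that "normalization" means here.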

Smoothing method

Given the original alignment matrix A, we create a matrix A* with all points initialized with zero. Then, for each alignment point A_{t,i} = 1, we update A* by adding a Gaussian distribution, g(µ, σ), with a window size w (t-w, … t … t+w).
Take the A_{1,1} = 1 for example, we have A*_{1,1} += 1, A*_{1,2} += 0.61, and A*_{1,3} += 0.14, with w=2, g(µ, σ)=g(0, 1). Then we normalize each row and get (c).
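To check the numbers in the example, here is a small sketch of the smoothing step. It is my reading of the quoted text, not the paper's implementation. Two assumptions: the Gaussian is unnormalized, exp(-d^2 / (2 σ^2)), which is what makes the center value exactly 1, and the mass is spread along the second index of the same row, as in the worked example. The name smooth_alignment is made up, and indices are 0-based here while the paper counts from 1.

```python
import math


def smooth_alignment(A, w=2, sigma=1.0):
    """Spread every alignment point along its row, then normalize each row.

    Returns (A_star, normalized): the accumulated matrix before
    normalization and the row-normalized result.
    """
    rows, cols = len(A), len(A[0])
    A_star = [[0.0] * cols for _ in range(rows)]
    for t in range(rows):
        for i in range(cols):
            if A[t][i] != 1:
                continue
            # Window of size w on both sides, clipped at the matrix border.
            for j in range(max(0, i - w), min(cols, i + w + 1)):
                A_star[t][j] += math.exp(-((j - i) ** 2) / (2 * sigma ** 2))
    normalized = []
    for row in A_star:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return A_star, normalized


raw, smoothed = smooth_alignment([[1, 0, 0, 0]], w=2, sigma=1.0)
print([round(v, 2) for v in raw[0]])   # [1.0, 0.61, 0.14, 0.0]
```

The printed values match the 1, 0.61, 0.14 quoted above, so the unnormalized-Gaussian reading at least agrees with the paper's example.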

Optimization

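An alignment loss is added on top of the original loss function. The two can be trained jointly or separately (train the alignment first, then the translation).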
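As a rough picture of the joint objective: these notes do not record which distance the paper uses, so the sketch below simply assumes mean squared error between the attention matrix and the reference alignment. The weight lam and both function names are hypothetical.

```python
def alignment_distance(attention, reference):
    """Mean squared error between two matrices of the same shape.

    Squared error is an assumption made for this sketch; any distance
    between the two distributions could be plugged in here.
    """
    total, count = 0.0, 0
    for a_row, r_row in zip(attention, reference):
        for a, r in zip(a_row, r_row):
            total += (a - r) ** 2
            count += 1
    return total / count


def joint_loss(translation_nll, attention, reference, lam=1.0):
    """Translation loss plus a weighted alignment distance (joint training)."""
    return translation_nll + lam * alignment_distance(attention, reference)


attention = [[0.5, 0.5]]   # what the model currently attends to
reference = [[1.0, 0.0]]   # processed alignment used as supervision
print(joint_loss(2.0, attention, reference, lam=0.5))   # 2.125
```

For the separate schedule, one would first minimize alignment_distance alone and only afterwards the translation loss.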

Experimental results

Skipped. The paper is very short, so that's all there is.
The part most worth noting is the Gaussian smoothing it adds.
