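These are my notes on Effective Approaches to Attention-based Neural Machine Translation (Luong et al.), a paper published in 2015 about applying attention.
It is probably less valuable today than it was, but since I had never read it, I am going through it anyway.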

Introduction

In parallel, the concept of “attention” has gained popularity recently in training neural networks, allowing models to learn alignments between different modalities, e.g., between image objects and agent actions in the dynamic control problem (Mnih et al., 2014), between speech frames and text in the speech recognition task, or between visual features of a picture and its text description in the image caption generation task (Xu et al., 2015). In the context of NMT, Bahdanau et al. (2015) has successfully applied such attentional mechanism to jointly translate and align words.

As the quote shows, attention was borrowed into translation from elsewhere: it was first used in multimodal areas such as vision and speech.

In this work, we design, with simplicity and effectiveness in mind, two novel types of attention-based models: a global approach in which all source words are attended and a local one whereby only a subset of source words are considered at a time.

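The paper proposes two ways of applying attention, with simplicity and effectiveness as the core goals:

Global attention: all source words are attended to. It is similar to the original attention model, but simpler.

Local attention: only a subset of source words is attended to. This is the more novel idea, and the authors see it as ***a combination of soft and hard attention***.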

Attention-based Models

While these models differ in how the context vector c_t is derived, they share the same subsequent steps.

The key difference between the attention variants is how the context vector c_t is produced; every other step is the same.
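For reference, the shared steps can be written out as below. This follows the paper's equations as I understand them; W_c and W_s are learned weight matrices whose names come from the paper, not from the notes above, and [c_t; h_t] denotes concatenation.

```latex
\tilde{h}_t = \tanh\left(W_c\,[c_t; h_t]\right), \qquad
p(y_t \mid y_{<t}, x) = \operatorname{softmax}\left(W_s\,\tilde{h}_t\right)
```

So once c_t is available, the attentional hidden state and the output distribution are computed the same way for both the global and the local variant.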

Global Attention

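h̄ denotes an encoder hidden state and h denotes a decoder hidden state.
Concretely, the decoder hidden state at time t is scored against every encoder hidden state, one by one.
There are three ways to compute the score; the third is the one used in prior work.
The difference from prior work (Bahdanau et al.) is that the earlier model used the hidden state at time t-1 together with attention to produce the hidden state at time t, whereas this paper uses the hidden state at time t together with attention to make the prediction. (This is awkward to phrase, but a careful reading of the original confirms it: the two papers apply attention at different stages.) The advantage is that the computation is simpler.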
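To make this concrete, here is a minimal NumPy sketch of one global-attention step, under my own assumptions. The three score functions are what the paper calls dot, general and concat, as far as I recall; the function names, argument names and shapes below are mine and purely illustrative.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def score(h_t, h_src, kind="dot", W_a=None, v_a=None):
    """Score one decoder state h_t (d,) against all encoder states h_src (S, d)."""
    if kind == "dot":
        # h_t^T h_s for every source position s
        return h_src @ h_t
    if kind == "general":
        # h_t^T W_a h_s, with W_a of shape (d, d)
        return (h_src @ W_a.T) @ h_t
    if kind == "concat":
        # v_a^T tanh(W_a [h_t; h_s]), with W_a of shape (k, 2d) and v_a of shape (k,)
        tiled = np.broadcast_to(h_t, h_src.shape)
        return np.tanh(np.concatenate([tiled, h_src], axis=1) @ W_a.T) @ v_a
    raise ValueError(f"unknown score kind: {kind}")

def global_attention(h_t, h_src, W_c, **score_kwargs):
    """One attention step using the decoder state at time t (not t-1)."""
    a_t = softmax(score(h_t, h_src, **score_kwargs))      # alignment weights, (S,)
    c_t = a_t @ h_src                                     # context vector, (d,)
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))   # attentional state, (d,)
    return a_t, c_t, h_tilde
```

Note that h_t is already computed before attention is applied, which is the "attention at time t" ordering described above; the attentional state h_tilde is what goes on to the prediction layer.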

Local Attention

Our local attention mechanism selectively focuses on a small window of context and is differentiable. This approach has an advantage of avoiding the expensive computation incurred in the soft attention and at the same time, is easier to train than the hard attention approach. In concrete details, the model first generates an aligned position p_t for each target word at time t. The context vector c_t is then derived as a weighted average over the set of source hidden states within the window [p_t−D, p_t+D]; D is empirically selected. Unlike the global approach, the local alignment vector a_t is now fixed-dimensional, i.e., ∈ R^{2D+1}.

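First an aligned position p_t is computed for time t, and then the translation at time t is assumed to depend only on the source word at position p_t and the D words on either side of it. (This assumption is clearly not well founded, so one can expect the results not to be especially good.)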
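The window itself is simple index arithmetic. A minimal sketch, assuming 0-based source positions, rounding a real-valued p_t to the nearest integer, and clipping the window at the sentence boundaries; the rounding and the function name `local_window` are my own choices for illustration.

```python
def local_window(p_t, D, src_len):
    """Return the source positions in [p_t - D, p_t + D], clipped to the sentence."""
    center = int(round(p_t))
    lo = max(0, center - D)
    hi = min(src_len - 1, center + D)
    return list(range(lo, hi + 1))
```

Away from the edges the window always holds 2D+1 positions, which is why the local alignment vector has a fixed dimension regardless of the source length; near the start or end of the sentence this sketch simply keeps fewer positions.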

Two ways to compute the aligned position

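Naive assumption (monotonic alignment): the t-th target word aligns with the t-th source word (clearly an even less reasonable assumption).
Predictive alignment: build a model that predicts the aligned position (method below; this also feels rather unreliable to me).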

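The prediction model takes only the decoder hidden state as input and has two learnable parameters.
In the last step of computing the alignment weights, a Gaussian factor is applied, so that words closer to the aligned position p_t receive higher weights (this also strikes me as rather unreliable).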
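Here is a small NumPy sketch of these two pieces, assuming the formulas are as I remember them from the paper: p_t = S · sigmoid(v_p^T tanh(W_p h_t)) with S the source length, and a Gaussian centred at p_t with standard deviation D/2. W_p and v_p are the two learnable parameters; the function names are mine.

```python
import numpy as np

def predict_position(h_t, W_p, v_p, src_len):
    """p_t = S * sigmoid(v_p^T tanh(W_p h_t)), a real number between 0 and S."""
    z = v_p @ np.tanh(W_p @ h_t)
    return src_len / (1.0 + np.exp(-z))

def gaussian_weights(align, positions, p_t, D):
    """Scale alignment weights by a Gaussian centred at p_t (sigma assumed to be D / 2)."""
    sigma = D / 2.0
    positions = np.asarray(positions, dtype=float)
    align = np.asarray(align, dtype=float)
    return align * np.exp(-((positions - p_t) ** 2) / (2.0 * sigma ** 2))
```

Because p_t is a smooth function of h_t and the Gaussian is smooth in p_t, gradients can flow back into W_p and v_p, which is what makes this variant differentiable, unlike hard attention.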

Alignment Coverage: Input-feeding Approach

In our proposed global and local approaches, the attentional decisions are made independently, which is suboptimal. Whereas, in standard MT, a coverage set is often maintained during the translation process to keep track of which source words have been translated. Likewise, in attentional NMTs, alignment decisions should be made jointly taking into account past alignment information. To address that, we propose an input-feeding approach in which attentional vectors h̃_t are concatenated with inputs at the next time steps as illustrated in Figure 4. The effects of having such connections are two-fold: (a) we hope to make the model fully aware of previous alignment choices and (b) we create a very deep network spanning both horizontally and vertically.

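The authors point out that traditional machine translation maintains a ***coverage set*** that tells the model which source words have already been translated (as a latecomer to the field, I had never heard of this).

The hope is therefore that, when aligning, the attention model also knows which words have already been aligned. To that end the paper proposes a dedicated input scheme: when the decoder computes the hidden state for the next time step, it receives both the previous hidden state and the previous step's attentional vector h̃, which carries the alignment information and is concatenated with the regular input.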
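A toy sketch of the input-feeding loop may help. It makes several simplifying assumptions that are mine, not the paper's: a single tanh RNN cell instead of stacked LSTMs, dot-product global attention, and hypothetical parameter names W_x, W_h and W_c.

```python
import numpy as np

def decode_with_input_feeding(embeddings, h_src, params):
    """Toy decoder in which h_tilde from step t-1 is fed into step t.

    embeddings: (T, e) target-side input embeddings
    h_src:      (S, d) encoder hidden states
    params:     W_x (d, e + d), W_h (d, d), W_c (d, 2d)
    """
    W_x, W_h, W_c = params["W_x"], params["W_h"], params["W_c"]
    d = W_h.shape[0]
    h = np.zeros(d)         # decoder hidden state
    h_tilde = np.zeros(d)   # attentional vector from the previous step
    outputs = []
    for x_t in embeddings:
        # Input feeding: concatenate the previous attentional vector with the input.
        rnn_input = np.concatenate([x_t, h_tilde])
        h = np.tanh(W_x @ rnn_input + W_h @ h)
        # Dot-product global attention over the encoder states.
        scores = h_src @ h
        a_t = np.exp(scores - scores.max())
        a_t /= a_t.sum()
        c_t = a_t @ h_src
        h_tilde = np.tanh(W_c @ np.concatenate([c_t, h]))
        outputs.append(h_tilde)
    return np.stack(outputs)
```

The only change relative to a plain attentional decoder is the concatenation at the top of the loop: since h_tilde was computed from the previous step's alignment, the next step can condition on where the model just attended.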

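Experiments

From today's perspective the experimental results in this paper no longer mean much, so I am skipping this part.

Analysis

Likewise, there is not much to say here.

Summary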

In this paper, we propose two simple and effective attentional mechanisms for neural machine translation: the global approach which always looks at all source positions and the local one that only attends to a subset of source positions at a time.

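In my view, the main reason this paper is still valuable today is actually that it streamlined the computation of (global) attention; the so-called local attention and the special input scheme may not be all that good.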
