




Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks 

1. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies.

Note the use of "over" and "exploit".

2. GCN has a multi-layer architecture, with each layer encoding and updating the representation of nodes in the graph using features of immediate neighbors.


Also note the use of "with".


3. Furthermore, following the idea of self-looping in Kipf and Welling (2017), each word is manually set adjacent to itself, i.e. the diagonal values of A are all ones.

Following the idea of …

the diagonal values of A are all ones: a matrix A whose diagonal entries are all 1

set adjacent to itself: add a self-loop
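The self-loop idea quoted above can be sketched in a few lines. This is a minimal illustration, not any paper's code: the helper name `add_self_loops` and the toy 3-word adjacency matrix are assumptions made for the example.

```python
import numpy as np

def add_self_loops(adj):
    """Return a copy of `adj` in which every diagonal entry is 1."""
    adj = np.array(adj, dtype=float)  # np.array copies, so the input is untouched
    np.fill_diagonal(adj, 1.0)        # each word becomes adjacent to itself
    return adj

# Hypothetical 3-word sentence with a single dependency edge between words 0 and 1.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
A_loop = add_self_loops(A)
```

With the diagonal set to one, a node's own features take part in the aggregation alongside those of its neighbors.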

4. Experimental results have indicated that GCN brings benefit to the overall performance by leveraging both syntactical information and long-range word dependencies.

Bring benefit to

Leverage: means "make use of"

5. While attention-based models are promising, they are insufficient to capture syntactical dependencies between context words and the aspect within a sentence.

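This describes the shortcoming of attention-based models: they cannot adequately capture the syntactic dependencies within a sentence. The cause is mainly the long distance between words, although that may not be the whole story: self-attention computes attention over all words in the sentence, so it may recover some of the information lost over long distances.

While: here it means "although"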


1. Our contributions are two-fold. Firstly, we introduce a simple and well-behaved layer-wise propagation rule for neural network models which operate directly on graphs and show how it can be motivated from a first-order approximation of spectral graph convolutions (Hammond et al., 2011). Secondly, we demonstrate how this form of a graph-based neural network model can be used for fast and scalable semi-supervised classification of nodes in a graph. Experiments on a number of datasets demonstrate that our model compares favorably both in classification accuracy and efficiency (measured in wall-clock time) against state-of-the-art methods for semi-supervised learning.


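In essence, GCN is a localized first-order approximation of spectral graph convolution. Another characteristic of GCN is that its cost scales linearly with the number of edges in the graph. Overall, GCN can be used to encode both local graph structure and node features.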
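As a rough sketch of the layer-wise propagation rule mentioned above, the snippet below implements the commonly cited renormalized form ReLU(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I. The function name `gcn_layer`, the dense matrices, and the toy path graph are assumptions for illustration; a practical implementation would use sparse matrices to get the linear-in-edges cost.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_tilde.sum(axis=1)              # degrees of A + I, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale row i and column j by deg^-1/2.
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ feats @ weight, 0.0)

# Hypothetical toy input: a path graph 0 - 1 - 2 with 2-dimensional features.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])
weight = np.eye(2)
hidden = gcn_layer(adj, feats, weight)
```

Each output row mixes a node's own features with those of its immediate neighbors only, which is what "localized first-order" refers to.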

2. Semantic role labeling (SRL) can be informally described as the task of discovering who did what to whom.

Until now, when defining or formalizing a task, I usually wrote "is formalized as ..." or "is defined as a ... problem".

"is described as the task of ..." is another option.


1. In its most general formulation, the model allows every node to attend on every other node, dropping all structural information. We inject the graph structure into the mechanism by performing masked attention: we only compute e_ij for nodes j ∈ N_i, where N_i is some neighborhood of node i in the graph.




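every node to attend on every other node: conveys the sense of nodes attending to one another

Drop all structural information. Note especially the use of "drop"; more ordinary choices here would be "ignore" or "lose".

inject sth into sth by sth: inject some mechanism or structure into ... by some means

"masked attention" as a term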

Neighborhood of node i in the graph
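The masked-attention idea in the quoted sentence can be sketched as follows. The raw scores e_ij are taken as given, and the snippet only shows the masking and normalization step. Treating each node as part of its own neighborhood, the function name `masked_attention`, and the toy matrices are assumptions for the example.

```python
import numpy as np

def masked_attention(scores, adj):
    """Softmax over each row of `scores`, restricted to graph neighbors."""
    n = adj.shape[0]
    mask = (adj + np.eye(n)) > 0                  # neighborhood N_i, including i itself
    masked = np.where(mask, scores, -np.inf)      # non-neighbors get zero weight
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical raw scores e_ij and a path graph 0 - 1 - 2.
scores = np.array([[0.5, 1.0, 2.0],
                   [1.0, 0.0, 3.0],
                   [2.0, 1.0, 0.0]])
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
alpha = masked_attention(scores, adj)
```

Nodes 0 and 2 are not connected, so `alpha[0, 2]` and `alpha[2, 0]` are zero even though their raw scores are the largest in their rows.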

2. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion).

Here "which" refers to the stacked layers.

nodes are able to attend over their neighborhoods’ features.

specifying different weights to different nodes

The use of "without"

3. However, many interesting tasks involve data that can not be represented in a grid-like structure and that instead lies in an irregular domain.

4. This is the case of 3D meshes, social networks, telecommunication networks, biological networks or brain connectomes. Such data can usually be represented in the form of graphs.



One kind of data has a grid-like structure, which CNNs can handle;

the other kind lies in an irregular domain, for example social networks and telecommunication networks.

5. The idea is to compute the hidden representations of each node in the graph, by attending over its neighbors, following a self-attention strategy.


By attending over its neighbors

Following a self-attention strategy

Attention Guided Graph Convolutional Networks for Relation Extraction 

1. However, how to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenging research question.


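"how to do sth" used as the subject

The use of "while": here it means "at the same time"


remains a challenging research question: "remains" is well chosen. Compared with "is", it conveys that this is not just a problem but one that is still open.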

2. Intuitively, we develop a “soft pruning” strategy that transforms the original dependency tree into a fully connected edge-weighted graph.



develop a strategy that
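One way to picture "soft pruning" is to let attention scores between all word pairs play the role of edge weights, so every pair is connected but with a learned strength. The sketch below uses single-head scaled dot-product attention, which is an assumption made for illustration; `soft_adjacency`, the random projections, and the dimensions are hypothetical.

```python
import numpy as np

def soft_adjacency(hidden, w_query, w_key):
    """Return a dense, row-normalized matrix of edge weights between all words."""
    q = hidden @ w_query
    k = hidden @ w_key
    scores = q @ k.T / np.sqrt(q.shape[-1])               # scaled dot-product scores
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical sentence of 4 words with 8-dimensional hidden states.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
w_query = rng.normal(scale=0.1, size=(8, 8))
w_key = rng.normal(scale=0.1, size=(8, 8))
A_soft = soft_adjacency(hidden, w_query, w_key)
```

Unlike a 0/1 adjacency matrix from a hard-pruned tree, every entry of `A_soft` is positive, so no edge is removed outright.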

3. With the help of dense connections, we are able to train the AGGCN model with a large depth, allowing rich local and non-local dependency information to be captured.

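This passage describes what dense connections do for the network. The point is the usual one (dense connections make it possible to train a deeper network and lower the risk of overfitting), but "with the help of" is a nice way to put it.

With the help of

train model with a large depth: sounds much more polished than "deeper network"

local and non-local dependency information

The subject of "allow" is the model, which also reads as more objective.

model allows sth to be done.

allowing rich local and non-local dependency information to be captured. Many paraphrases can be derived from this pattern.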

GCNs are neural networks that operate directly on graph structures



4. Opposite direction of a dependency arc is also included, which means A_ij = 1 and A_ji = 1 if there is an edge going from node i to node j, otherwise A_ij = 0 and A_ji = 0.
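The adjacency rule quoted above translates directly into code. In this minimal sketch, the function name `build_adjacency` and the `(head, dependent)` index pairs are assumptions for the example, not output from a real parser.

```python
import numpy as np

def build_adjacency(num_words, arcs):
    """Build a symmetric 0/1 adjacency matrix from (head, dependent) arcs."""
    adj = np.zeros((num_words, num_words))
    for head, dep in arcs:
        adj[head, dep] = 1.0   # edge going from node i to node j
        adj[dep, head] = 1.0   # the opposite direction is included as well
    return adj

# Hypothetical 3-word sentence whose root (word 1) governs words 0 and 2.
arcs = [(1, 0), (1, 2)]
A_dep = build_adjacency(3, arcs)
```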

5. Our model can be understood as a soft-pruning approach that automatically learns how to selectively attend to the relevant sub-structures useful for the relation extraction task

Sth can be understood as a … approach that

how to selectively attend to the relevant sub-structures useful: note the use of "attend" here

6. Instead of using rule-based pruning, we develop a “soft pruning” strategy in the attention guided layer, which assigns weights to all edges. These weights can be learned by the model in an end-to-end fashion.


Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks 

1. When an aspect term is separated away from its sentiment phrase, it is hard to find the associated sentiment words in a sequence.

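This describes why we want to bring syntax into many NLP tasks; the task here is aspect-level sentiment classification.

Our models are usually built on sequential input and sequential relations, where a piece of text can be far from the key information it depends on. Once the sentence is converted into a syntax tree, however, the two turn out to be directly related. This is why syntactic dependencies are introduced: they reduce, to some extent, the information loss caused by long-distance dependencies.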

2. Unlike these previous methods, our approach represents a sentence as a dependency graph instead of a word sequence.



our approach represents A as a B instead of C


3. We employ a multi-layer graph attention network to propagate sentiment features from important syntax neighbourhood words to the aspect target.



employ sth to do sth

Propagate the … from the important syntax neighbourhood words to the aspect target


Graph Convolution over Pruned Dependency Trees Improves Relation Extraction 

1. To resolve these issues, we further apply a Contextualized GCN (C-GCN) model, where the input word vectors are first fed into a bi-directional long short-term memory (LSTM) network to generate contextualized representations, which are then used as h(0) in the original model.

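This explains C-GCN, which is easy to understand: a Bi-LSTM layer (sometimes called a contextualized layer) is inserted between the word embeddings and the GCN layers. The word embeddings are first passed through the Bi-LSTM and then fed into the GCN, which propagates the contextualized features.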

We note that this relation extraction model is conceptually similar to graph kernel-based models (Zelenko et al., 2003), in that it aims to utilize local dependency tree patterns to inform relation classification.

The use of "in that"

is conceptually similar to: similar in concept to ...

Intuitively, during each graph convolution, each node gathers and summarizes information from its neighboring nodes in the graph.

Describes how a GCN updates node representations: gathers and summarizes information from its neighboring nodes in the graph

However, naively applying the graph convolution operation in Equation (1) could lead to node representations with drastically different magnitudes, since the degree of a token varies a lot. This could bias our sentence representation towards favoring high-degree nodes regardless of the information carried in the node (see details in Section 2.2).


Degree of a token varies a lot

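bias sth towards doing sth: skew sth in favor of doing ...

  • I am not fully sure about this "towards doing" usage; "towards" is a preposition here, so it is followed by a gerund.

regardless of the information carried in the node: "regardless of" means "no matter"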
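The magnitude problem described above is easy to reproduce. The sketch below compares plain sum aggregation with aggregation divided by node degree. Dividing by the degree of A + I is one common normalization and is an assumption here, since the paper's Equation (1) is not reproduced in these notes. The name `aggregate` and the star graph are hypothetical.

```python
import numpy as np

def aggregate(adj, feats, normalize):
    """Sum neighbor features (with self-loops), optionally dividing by degree."""
    a_tilde = adj + np.eye(adj.shape[0])
    out = a_tilde @ feats
    if normalize:
        out = out / a_tilde.sum(axis=1, keepdims=True)  # degree of each node in A + I
    return out

# Hypothetical star graph: token 0 is linked to tokens 1..4, all features are ones.
adj = np.zeros((5, 5))
adj[0, 1:] = 1.0
adj[1:, 0] = 1.0
feats = np.ones((5, 2))
raw = aggregate(adj, feats, normalize=False)
norm = aggregate(adj, feats, normalize=True)
```

Without normalization the hub token ends up with values of 5 while the leaves have 2, purely because of degree. After normalization every token has the same magnitude.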

Densely Connected Convolutional Networks 

DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. 



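Compelling: striking, convincing

Vanishing-gradient problem: gradients shrink toward zero as they pass back through many layers

feature propagation

Encourage feature reuse: can this be understood as every layer's input possibly being selected by later layers and thus preserved?

substantially: greatly (it can also mean "essentially" or "by and large")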

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network.

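As the network gets deeper, information can vanish or "wash out" while it propagates.

  • they create short paths from early layers to later layers: this sentence gets straight to the essence of how the vanish and “wash out” problem is solved.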

Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

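This refers to the series of problems caused by networks getting deeper; dense connections, ResNets, and Highway Networks are all meant to alleviate them.

network topology

share a key characteristic

The following passage describes how densely connected layers work.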

To ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.


each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.
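The connectivity pattern described above can be sketched with plain matrix products. Replacing convolutions with fully connected layers is a simplification made for this example; `dense_block`, the growth rate of 2, and the random weights are hypothetical.

```python
import numpy as np

def dense_block(x, weights):
    """Each layer reads the concatenation of the input and all earlier outputs."""
    features = [x]
    for w in weights:
        layer_input = np.concatenate(features, axis=-1)     # all preceding feature maps
        features.append(np.maximum(layer_input @ w, 0.0))   # new features for later layers
    return np.concatenate(features, axis=-1)

# Hypothetical sizes: 3 input features, 3 layers, each adding 2 new features.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
growth = 2
weights = [rng.normal(size=(3 + i * growth, growth)) for i in range(3)]
out = dense_block(x, weights)
```

The input width of each layer grows by the growth rate per preceding layer (3, 5, 7 here), and the original input is carried through unchanged in the first columns of `out`.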



  • This sentence describes the difference between DenseNets and ResNets

Concatenating feature-maps learned by different layers increases variation in the input of subsequent layers and improves efficiency. This constitutes a major difference between DenseNets and ResNets

constitutes a major difference between A and B: forms the main distinction between A and B

increases variation

There are other notable network architecture innovations which have yielded competitive results.

yielded competitive results: achieved results competitive with other methods

  • An aside: ReNet

We empirically demonstrate DenseNet’s effectiveness on several benchmark datasets and compare with state-of-the-art architectures, especially with ResNet and its variants.


empirically: through experiments

Demonstrate A’s effectiveness on **** (datasets) and compare with ***** (models or methods), especially with **** (models or methods)

DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation 

Emotion recognition in conversation (ERC) has received much attention, lately, from researchers due to its potential widespread applications in diverse areas, such as health-care, education, and human resources.


The current state-of-the-art model in emotion recognition in conversation is (Majumder et al., 2019), where authors introduced a party state and global state based recurrent model for modelling the emotional dynamics.


Thus, there is always a major interplay between the inter-speaker dependency and self-dependency with respect to the emotional dynamics in the conversation

With respect to: regarding, as for

there is always a major interplay between A and B with respect to C


We also represent u_i ∈ R^{D_m} as the feature representation of the utterance, obtained using the feature extraction process described below.



In theory, RNNs like long short-term memory (LSTM) and GRU should propagate long-term contextual information. However, in practice it is not always the case (Bradbury et al., 2017).


In theory: theoretically


in practice it is not always the case: this does not always hold

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling 

For example, one can observe that many arcs in the syntactic dependency graph (shown in black below the sentence in Figure 1) are mirrored in the semantic dependency graph.


A is mirrored in B: A is reflected in B

We believe that one of the reasons for this radical choice is the lack of simple and effective methods for incorporating syntactic information into sequential neural networks (namely, at the level of words).

Incorporate syntactic information into sequential neural networks.


at the level of words

Lack of

sequential neural networks: networks that process the input as a sequence

One layer GCN encodes only information about immediate neighbors and K layers are needed to encode K-order neighborhoods (i.e., information about nodes at most K hops away).



immediate neighbors

encode K-order neighborhoods
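The claim that K layers reach nodes at most K hops away can be checked by tracking which nodes can influence each other after each layer. This sketch only follows reachability, not actual features; `receptive_field` and the 4-node path graph are assumptions for the example.

```python
import numpy as np

def receptive_field(adj, num_layers):
    """reach[i, j] is True if node j can influence node i after `num_layers` layers."""
    n = adj.shape[0]
    step = ((adj + np.eye(n)) > 0).astype(int)   # one layer: self plus immediate neighbors
    reach = np.eye(n, dtype=int)
    for _ in range(num_layers):
        reach = ((reach @ step) > 0).astype(int)
    return reach.astype(bool)

# Hypothetical path graph 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
one_layer = receptive_field(adj, 1)
three_layers = receptive_field(adj, 3)
```

After one layer node 0 only sees itself and node 1; it takes three layers before node 3, three hops away, can influence it.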

  • This contrasts with recurrent and recursive neural networks which, at least in theory, can capture statistical dependencies across unbounded paths in a tree or in a sequence.
  • I did not understand this sentence structure

Interestingly, again unlike recursive neural networks, GCNs do not constrain the graph to be a tree.

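Many earlier methods used recursive neural networks to obtain syntactic and lexical information, but the graph that a GCN operates on is not required to be a tree.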

We believe that there are many applications in NLP, where GCN-based encoders of sentences or even documents can be used to incorporate knowledge about linguistic structures (e.g., representations of syntax, semantics or discourse).


As in standard convolutional networks (LeCun et al., 2001), by stacking GCN layers one can incorporate higher degree neighborhoods


incorporate higher degree neighborhoods

Our simplification captures the intuition that information should be propagated differently along edges depending on whether this is a head-to-dependent or dependent-to-head edge (i.e., along or opposite the corresponding syntactic arc) and whether it is a self-loop.

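This is the best description I have seen of information propagating along the edges of a graph. The point is that the edges in the graph are not all alike; here they fall into three types. The first is head->dep, the dependency relation produced by the syntactic parse; the second is dep->head, the reverse of head->dep; the last is the self-loop, which is a way of preserving a node's own information during propagation.

The paper argues that how information propagates along an edge should depend on the type of the edge: when aggregating neighboring nodes, different weights are learned for different edge types, hence the label-specific parameters.


captures the intuition that + clause: motivated by ****

be propagated differently along edges: spread in different ways along the edges

depending on whether this is A or B and whether it is C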
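The three edge types discussed above can be sketched as a layer with one weight matrix per direction. Biases, edge-label parameters, and gating are left out to keep the example short, so this is a simplified illustration rather than the paper's full model; `directed_gcn_layer` and the toy inputs are hypothetical.

```python
import numpy as np

def directed_gcn_layer(arcs, feats, w_along, w_opposite, w_self):
    """Aggregate messages with separate weights for each edge direction."""
    out = feats @ w_self                        # self-loop: keep the node's own information
    for head, dep in arcs:
        out[dep] += feats[head] @ w_along       # head -> dependent, along the syntactic arc
        out[head] += feats[dep] @ w_opposite    # dependent -> head, opposite the arc
    return np.maximum(out, 0.0)

# Hypothetical 2-word sentence with one arc from word 0 (head) to word 1 (dependent).
feats = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
arcs = [(0, 1)]
only_along = directed_gcn_layer(arcs, feats, np.eye(2), np.zeros((2, 2)), np.zeros((2, 2)))
```

With only the along-arc weights switched on, the dependent receives its head's features while the head receives nothing, which shows that the two directions are treated separately.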

The inability of GCNs to capture dependencies between nodes far away from each other in the graph may seem like a serious problem, especially in the context of SRL: paths between predicates and arguments often include many dependency arcs.


However, when graph convolution is performed on top of LSTM states (i.e., LSTM states serve as input x_v = h_v^(1) to GCN) rather than static word embeddings, GCN may not need to capture more than a couple of hops.

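In response to the GCN limitation above, this says that if the GCN is built on top of an LSTM, that is, if LSTM states rather than static word embeddings serve as the GCN input, then the GCN may not need to capture more than a couple of hops.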



The inability of *** (model) to do sth: the model's shortcoming in doing sth

nodes far away from each other in the graph

Paths between A and B often include many dependency arcs.

a couple of hops.

The classifier predicts semantic roles of words given the predicate while relying on word representations provided by GCN

How should "while" be understood here?


This suggests that extra GCN layers are effective but largely redundant with respect to what LSTMs already capture.


