BiERU: Bidirectional Emotional Recurrent Unit for Conversational Sentiment Analysis

Abstract

  • The main difference between conversational sentiment analysis and single-sentence sentiment analysis is the presence of context information, which may influence the sentiment of an utterance in a dialogue.
  • Existing approaches employ complicated deep learning structures to distinguish the different parties in a conversation and then model the context information.
  • In this paper, we propose a fast, compact and parameter-efficient party-ignorant framework named bidirectional emotional recurrent unit (BiERU) for conversational sentiment analysis.
  • In our system, a generalized neural tensor block followed by a two-channel classifier is designed to perform context compositionality and sentiment classification, respectively.
  • Extensive experiments on three standard datasets demonstrate that our model outperforms the state of the art in most cases.

I. Introduction

II. Related Work

III. Method

  • A. Problem Definition
    1) Given a multi-turn conversation C, the task is to predict the sentiment labels or sentiment intensities of the constituent utterances U1, U2, …, UN.
    2) Taking the interactive emotional database IEMOCAP as an example, the emotion labels include frustrated, excited, angry, neutral, sad and happy.
    3) In general, the task is formulated as a multi-class classification problem over sequential utterances; in some scenarios, it is instead regarded as a regression problem over continuous sentiment intensities.
    4) In this paper, utterances are preprocessed and represented as u_t using the feature extractors described below.

  • B. Textual Feature Extraction
    1) Following DialogueRNN, utterances are first embedded into a vector space and then fed into CNNs for feature extraction.
    2) N-gram features are obtained from each utterance by applying three different convolution filters of sizes 3, 4 and 5, respectively. Each filter has 50 feature maps.
    3) Max-pooling followed by rectified linear unit (ReLU) activation is then used to process the outputs of the convolution operation.
    4) These activation values are concatenated and fed into a 100-dimensional fully connected layer whose outputs serve as the textual utterance representation (see the sketch below). This CNN-based feature extraction network is trained at the utterance level, supervised by the sentiment labels.
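    A minimal PyTorch sketch of this extractor follows. The filter sizes (3, 4, 5), the 50 feature maps per filter, max-pooling followed by ReLU, and the 100-dimensional fully connected output come from the description above; the vocabulary and embedding sizes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextCNNExtractor(nn.Module):
        """CNN-based utterance feature extractor (sketch)."""
        def __init__(self, vocab_size, emb_dim=300, n_maps=50, out_dim=100):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(emb_dim, n_maps, kernel_size=k) for k in (3, 4, 5)]
            )
            self.fc = nn.Linear(3 * n_maps, out_dim)

        def forward(self, tokens):                        # tokens: (batch, seq_len)
            x = self.embedding(tokens).transpose(1, 2)    # (batch, emb_dim, seq_len)
            # Max-pool over time, then ReLU, for each filter size.
            feats = [F.relu(conv(x).max(dim=2).values) for conv in self.convs]
            return self.fc(torch.cat(feats, dim=1))       # utterance vector u_t (100-d)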

  • C. Our Model
    1) Our ERU is illustrated in Note 1 of Fig. 2 and consists of two components, GNTB and EFE.
    2) As mentioned in the introduction, conversational sentiment analysis involves three main steps: obtaining the context representation; incorporating the influence of the context information into an utterance; and extracting emotional features for classification.
    3) In this paper, the ERU is employed in a bidirectional manner (BiERU) to conduct the above sentiment analysis task, reducing some expensive computations and converting the previous three-step task into a two-step task, as shown in Fig. 2.
    4) Similar to bidirectional LSTM (BiLSTM) [29], two ERUs are utilized for the forward and backward passes over the input utterances.
    5) Outputs from the forward and backward ERUs are concatenated for sentiment classification or regression. More concretely, the GNTB encodes the context information and simultaneously incorporates it into an utterance, while the EFE takes the output of GNTB as input and extracts emotional features for classification or regression.
    1) GNTB—Generalized Neural Tensor Block:
    The utterance vector u_t ∈ R^d with the context information incorporated is called the contextual utterance vector p_t ∈ R^d in this paper, where d is the dimension of u_t and p_t. At time t, GNTB (Fig. 3(a)) takes u_t and p_{t-1} as inputs and outputs p_t, a contextual utterance vector. In this process, GNTB first extracts the context information from p_{t-1}; it then incorporates the context information into u_t; finally the contextual utterance vector p_t is obtained.
    The first step captures the context information and the second step integrates it into the current utterance. The combination of these two steps is regarded as context compositionality in this paper. To the best of our knowledge, this is the first work to perform context compositionality in conversational sentiment analysis. GNTB is the core part that achieves the context compositionality. The formulation of GNTB is:

        p_t = f(m_t^T T^[1:k] m_t + W m_t),    m_t = [p_{t-1}; u_t] ∈ R^{2d}    (1)

    where m_t is the concatenation of p_{t-1} and u_t; f is an activation function (e.g., tanh or sigmoid); the tensor T^[1:k] and the matrix W are the parameters used to compute p_t, and the i-th component of the bilinear term is m_t^T T^[i] m_t.
    Each slice T^[i] can be interpreted as capturing a specific type of context compositionality. Each slice W^[i] maps the contextual utterance vector p_t and the utterance vector u_t into the context compositionality space.
    The main advantage over the previous neural tensor network (NTN) [24], which is a special case of the GNTB when k is set to d, is that GNTB is suitable for different structures rather than only the recursive structure, and the space complexity of GNTB is O(kd^2) compared with O(d^3) for NTN.
    To further reduce the number of parameters, we employ a low-rank matrix approximation for each slice T^[i] (see the sketch below).
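    Below is a minimal PyTorch sketch of the GNTB. It assumes a plain rank-r factorization T^[i] ≈ A^[i] B^[i] for the low-rank approximation (the paper's exact parameterization may differ) and uses rank r = 10 as in the experimental settings; k defaults to d so that p_t stays in R^d, as the recurrence requires.

    import torch
    import torch.nn as nn

    class GNTB(nn.Module):
        """Generalized neural tensor block (sketch).

        Computes p_t = f(m_t^T T^[1:k] m_t + W m_t) with m_t = [p_{t-1}; u_t].
        Each slice T^[i] is stored in an assumed rank-r factorization A^[i] B^[i].
        """
        def __init__(self, d, k=None, r=10, f=torch.tanh):
            super().__init__()
            k = k or d  # k = d keeps p_t in R^d for the recurrence
            self.A = nn.Parameter(torch.randn(k, 2 * d, r) * 0.01)
            self.B = nn.Parameter(torch.randn(k, r, 2 * d) * 0.01)
            self.W = nn.Linear(2 * d, k, bias=False)
            self.f = f

        def forward(self, p_prev, u_t):            # p_prev, u_t: (d,)
            m = torch.cat([p_prev, u_t])           # m_t in R^{2d}
            # i-th component of the bilinear term: m^T A^[i] B^[i] m
            bilinear = torch.einsum('e,ker,krf,f->k', m, self.A, self.B, m)
            return self.f(bilinear + self.W(m))    # contextual utterance vector p_t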

    2) EFE—Emotion Feature Extractor:
    We utilize the EFE to refine emotion features from the contextual utterance vector p_t.
    As shown in Fig. 3(b), the EFE is a two-channel model comprising an LSTM cell [7] branch and a one-dimensional CNN [8] branch. The two branches receive the same contextual utterance vector p_t and produce their outputs independently.
    The hidden state h_t of the LSTM cell is regarded as one emotion feature vector.
    The CNN receives p_t as input and outputs the emotion feature vector l_t.
    Finally, the outputs of the LSTM cell branch h_t and the CNN branch l_t are concatenated into an emotion feature vector e_t, which is also the output of the ERU (see the sketch below).
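    A minimal PyTorch sketch of the EFE follows. The LSTM-cell branch and the concatenation match the description above; the hidden size, the CNN channel count and kernel size, and the treatment of p_t as a one-channel length-d signal with global max-pooling are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EFE(nn.Module):
        """Emotion feature extractor (sketch): an LSTM-cell branch and a
        1-D CNN branch both read the same contextual utterance vector p_t,
        and their outputs h_t and l_t are concatenated into e_t."""
        def __init__(self, d, hidden=100, n_maps=50, kernel=3):
            super().__init__()
            self.lstm_cell = nn.LSTMCell(d, hidden)
            self.conv = nn.Conv1d(1, n_maps, kernel_size=kernel)

        def forward(self, p_t, state=None):        # p_t: (d,)
            # LSTM branch: the hidden state h_t is one emotion feature vector.
            h_t, c_t = self.lstm_cell(p_t.unsqueeze(0), state)
            # CNN branch: convolve p_t as a one-channel signal,
            # then global-max-pool into l_t.
            l_t = F.relu(self.conv(p_t.view(1, 1, -1))).max(dim=2).values
            e_t = torch.cat([h_t, l_t], dim=1)     # ERU output e_t
            return e_t, (h_t, c_t)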

    3) Sentiment Classification & Regression:
    Taking the emotion feature e_t as input, we use a linear layer W_c followed by a softmax layer to predict the sentiment labels, where n_class is the number of sentiment labels. This yields the probability distribution S_t over the sentiment labels, and we take the most probable class as the sentiment label of the utterance u_t:

        S_t = softmax(W_c e_t),    ~y_t = argmax(S_t)

    For the sentiment regression task, we use a linear layer W_r to predict the sentiment intensity, obtaining the predicted intensity q_t = W_r e_t.
    Here q_t is a scalar and ~y_t is the predicted sentiment label for utterance u_t.
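    A sketch of the two heads, assuming only what is stated above (a linear map W_c plus softmax for classification, a linear map W_r to a scalar for regression):

    import torch.nn as nn
    import torch.nn.functional as F

    class SentimentHeads(nn.Module):
        """Classification and regression heads (sketch). e_dim is the
        dimension of the emotion feature e_t produced by the EFE."""
        def __init__(self, e_dim, n_class):
            super().__init__()
            self.W_c = nn.Linear(e_dim, n_class)   # linear layer for classification
            self.W_r = nn.Linear(e_dim, 1)         # linear layer for regression

        def classify(self, e_t):
            S_t = F.softmax(self.W_c(e_t), dim=-1) # distribution over labels
            return S_t, S_t.argmax(dim=-1)         # (S_t, predicted label ~y_t)

        def regress(self, e_t):
            return self.W_r(e_t).squeeze(-1)       # predicted intensity q_t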
    4) Training:
    For the classification task, we choose cross-entropy as the loss and use L2-regularization to relieve overfitting:

        Loss = -(1 / Σ_i c(i)) Σ_{i=1}^{N} Σ_{j=1}^{c(i)} log S_{i,j}[y_{i,j}] + λ ||θ||^2

    For the regression task, we choose mean square error (MSE) as the loss, again with L2-regularization:

        Loss = (1 / Σ_i c(i)) Σ_{i=1}^{N} Σ_{j=1}^{c(i)} (q_{i,j} - z_{i,j})^2 + λ ||θ||^2

    where N is the number of samples/conversations, S_{i,j} is the probability distribution over sentiment labels for utterance j of conversation i, y_{i,j} is its ground-truth label, q_{i,j} is the predicted sentiment intensity of utterance j of conversation i, z_{i,j} is its expected sentiment intensity, c(i) is the number of utterances in sample i, λ is the L2-regularization weight, and θ is the set of trainable parameters. We employ stochastic gradient descent (the Adam optimizer) to train our network.
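    A sketch of the two objectives under the definitions above; the λ value is an illustrative assumption:

    import torch
    import torch.nn.functional as F

    # probs stacks S_{i,j} for every utterance (rows), labels holds the
    # gold labels y_{i,j}, preds/targets the intensities q_{i,j}/z_{i,j};
    # lam is the L2 weight lambda.

    def classification_loss(probs, labels, params, lam=1e-4):
        nll = -torch.log(probs[torch.arange(len(labels)), labels]).mean()
        return nll + lam * sum((p ** 2).sum() for p in params)

    def regression_loss(preds, targets, params, lam=1e-4):
        return F.mse_loss(preds, targets) + lam * sum((p ** 2).sum() for p in params)

    # In PyTorch the L2 term is often delegated to the optimizer instead:
    # torch.optim.Adam(model.parameters(), weight_decay=lam)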

  • D. Bidirectional Emotional Recurrent Unit Variants
    Our model has two different forms according to the source of context information, namely the bidirectional emotional recurrent unit with global context (BiERU-gc) and the bidirectional emotional recurrent unit with local context (BiERU-lc).
    1) BiERU-gc:
    According to equation (1), GNTB extracts the context information from p_{t-1}, integrates it into u_t, and thus obtains the contextual utterance vector p_t.
    By the definition of the contextual utterance vector, p_{t-1} is the utterance vector that contains the information of u_{t-1} and p_{t-2}.
    In this case, the contextual utterance vector p_t preserves, in a recurrent manner, context information from all preceding utterances u_1, u_2, …, u_{t-1}.
    As shown in Fig. 2(a), the bidirectional ERU enables p_t to capture context information not only from preceding utterances but also from future utterances u_{t+1}, u_{t+2}, …, u_N. The BiERU in Fig. 2(a) is named BiERU-gc.
    2) BiERU-lc:
    According to equation (1), GNTB extracts context information from the contextual utterance vector p_{t-1}, which contains the context information of all preceding utterances u_1, u_2, …, u_{t-2}. If p_{t-1} is replaced by u_{t-1} in equations (1) and (2), then p_t contains the information of u_{t-1} and u_t. In other words, u_{t-1} is not only an utterance vector but also serves as the context of u_t. As shown in Fig. 2(b), the bidirectional ERU enables p_t to obtain future information from u_{t+1}. In this case, GNTB extracts context information from u_{t-1} and u_{t+1}, the neighboring utterances of u_t. We name this model BiERU-lc. The two variants are contrasted in the sketch below.
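    The following sketch contrasts the two variants, reusing the hypothetical GNTB and EFE modules sketched earlier: the only difference is whether the GNTB context is the recurrent vector p_{t-1} (gc) or the neighboring utterance u_{t-1} (lc), with the backward pass symmetrically supplying u_{t+1}.

    import torch

    def eru_pass(utterances, gntb, efe, variant="lc"):
        """One directional pass (sketch). BiERU-gc feeds GNTB the recurrent
        context p_{t-1}; BiERU-lc feeds it the neighboring utterance u_{t-1}."""
        d = utterances[0].shape[-1]
        p_prev, state, feats = torch.zeros(d), None, []
        for t, u_t in enumerate(utterances):
            if variant == "gc":
                context = p_prev
            else:
                context = utterances[t - 1] if t > 0 else torch.zeros(d)
            p_t = gntb(context, u_t)
            e_t, state = efe(p_t, state)
            feats.append(e_t)
            p_prev = p_t
        return feats

    def bieru(utterances, gntb_f, efe_f, gntb_b, efe_b, variant="lc"):
        fwd = eru_pass(utterances, gntb_f, efe_f, variant)
        bwd = eru_pass(utterances[::-1], gntb_b, efe_b, variant)[::-1]
        # Concatenate forward and backward outputs per utterance.
        return [torch.cat([f, b], dim=1) for f, b in zip(fwd, bwd)]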




IV. Experiments

  • A. Datasets
    We conduct experiments on three datasets, namely AVEC, IEMOCAP and MELD, which are also used by representative models such as DialogueRNN and DialogueGCN. We follow the standard data partitions (details in Table I).
  • B. Baselines and Settings
    To evaluate the performance of our model, we choose the following models as strong baselines, including the state-of-the-art methods.
    a) c-LSTM
    b) CMN
    c) DialogueRNN
    d) DialogueGCN
    e) AGHMN
    f) Settings: All experiments are conducted using the CNN-extracted features described in the Method section. For comparison with the latest dialogue models, we directly use their utterance representations.
    To relieve overfitting, we apply Dropout [35] to the outputs of GNTB and EFE.
    For the nonlinear activation function, we choose the sigmoid function for sentiment classification and the ReLU function for sentiment regression.
    Our model is optimized with the Adam optimizer [30]. Hyperparameters are tuned manually. The batch size is set to 1 and the rank is set to 10 for all experiments. Our model is implemented in PyTorch.
  • C. Results
    1) Comparison with baselines: We compare our model with the baselines on three standard benchmarks for the textual modality. Overall, our model outperforms all baseline methods on these datasets, including state-of-the-art models such as DialogueRNN, DialogueGCN and AGHMN, and significantly exceeds them on some metrics, as shown in Table II.



    2) Comparison between BiERU-gc and BiERU-lc:
    According to the experimental results in Tables II and III, the overall performance of BiERU-lc is better than that of BiERU-gc.
    One possible explanation is that in BiERU-gc the context information of a contextual utterance vector comes from all utterances in the current conversation, whereas in BiERU-lc it comes from the neighboring utterances. In this case, the context information in BiERU-gc contains redundancy, which negatively affects emotion feature extraction.

  • D. Case Study
    Fig. 5 illustrates a conversation snippet classified by the BiERU-lc method.
    In this snippet, person A is initially in a frustrated state while person B acts as a listener. Then person A shifts focus and questions person B about his/her job state. Person B tries to use his/her own experience to help person A get rid of the frustrated state.
    This snippet reveals that the sentiment of a speaker is relatively steady and that the interaction between speakers may change a speaker's sentiment.
    Our BiERU-lc method shows good ability in capturing the speaker's sentiment (turns 9, 11, 12, 14) and the interaction between speakers (turn 10). The sentiment in turn 13 is very subtle: it contains a little frustration since person B is not satisfied with his/her job state.
    However, considering that person B attempts to help person A, turn 13 is more likely to be in a neutral stance.
  • E. Visualization
    We perform a deeper analysis of our proposed model and DialogueRNN by visualizing the learned emotion feature representations on IEMOCAP, as shown in Fig. 6a and Fig. 6b.
    Vectors fed into the last dense layer (followed by softmax for classification) are regarded as the emotion feature representations of utterances. We use principal component analysis [37] to reduce the dimension of the emotion representations from our model (BiERU-lc) and DialogueRNN.
    The emotion representations are reduced to 3 dimensions. In Fig. 6a and Fig. 6b, each color represents a predicted sentiment label, and the same color means the same sentiment label. The figures show that our model is better at extracting emotion features of utterances labeled "happy", which is consistent with the results in Table II. A sketch of this reduction step follows.
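    A sketch of the reduction and plotting step, assuming features and labels have been collected from a forward pass over the test set (both names are hypothetical):

    import numpy as np
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    def plot_emotion_features(features, labels):
        """Reduce emotion features to 3-D with PCA and scatter-plot them,
        one color per predicted sentiment label."""
        # features: one row per utterance, the vector fed into the last dense layer.
        reduced = PCA(n_components=3).fit_transform(np.asarray(features))
        ax = plt.figure().add_subplot(projection="3d")
        ax.scatter(reduced[:, 0], reduced[:, 1], reduced[:, 2], c=labels, cmap="tab10")
        plt.show()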
  • G. Ablation Study
    To further explore our proposed BiERU model, we perform an ablation study on its two main components, i.e., GNTB and EFE. We conduct experiments on the IEMOCAP dataset with the individual GNTB and EFE modules separately, and with their combination, i.e., the complete BiERU. Experimental results on the IEMOCAP dataset are shown in Table IV.
    The performance of GNTB or EFE alone is low in terms of accuracy and F1 score. The reason is that the outputs of GNTB mainly contain context information while the outputs of EFE lack context information. However, when the two modules are combined into the BiERU model, accuracy and F1 score increase dramatically, which proves the effectiveness of our BiERU model. More importantly, the GNTB and EFE modules couple remarkably well to enhance performance.

V. Conclusion

  1. In this paper, we proposed a fast, compact and parameter-efficient party-ignorant framework, the bidirectional emotional recurrent unit (BiERU), for sentiment analysis in conversations. Our proposed generalized neural tensor block (GNTB), adept at context compositionality, reduces the number of parameters and is suitable for different structures. Additionally, our EFE is capable of extracting high-quality emotion features for sentiment analysis. We showed that it is feasible to simplify the model structure and improve performance simultaneously.
  2. Our model outperforms current state-of-the-art models on three standard datasets in most cases. In addition, our method has the ability to model conversations with arbitrary turns and speakers, which we plan to study further in the future.
