SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge

Abstract

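Most existing pre-trained language representation models neglect the linguistic knowledge of texts, which can promote language understanding in NLP tasks. To benefit the downstream tasks of sentiment analysis, we propose a novel language representation model called SentiLARE, which introduces word-level linguistic knowledge into pre-trained models, including part-of-speech (POS) tags and sentiment polarity (inferred from SentiWordNet). We first propose a context-aware sentiment attention mechanism that acquires the sentiment polarity of each word, given its POS tag, by querying SentiWordNet. Then, we devise a new pre-training task called the label-aware masked language model to construct knowledge-aware language representations. Experiments show that SentiLARE obtains state-of-the-art performance on a variety of sentiment analysis tasks.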

Model

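Our task is defined as follows: given a text sequence $X = (x_1, x_2, \cdots, x_n)$ of length $n$, our goal is to obtain a representation of the whole sequence $H = (h_1, h_2, \cdots, h_n)^\top \in \mathbb{R}^{n \times d}$ that captures both contextual information and linguistic knowledge, where $d$ denotes the dimension of the representation vectors.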

Figure 1 gives an overview of our model, which consists of two steps:
1) Acquiring the part-of-speech tag and the sentiment polarity for each word;
2) Conducting pre-training via the label-aware masked language model, which contains two pre-training sub-tasks, i.e., early fusion and late supervision.
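Compared with existing BERT-style pre-trained models, our model enriches the input sequence with linguistic knowledge, namely part-of-speech tags and sentiment polarity, and uses the label-aware masked language model to capture the relationship between sentence-level language representations and word-level linguistic knowledge.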

Linguistic Knowledge Acquisition

Input: a text sequence $X = (x_1, x_2, \cdots, x_n)$, where $x_i\ (1 \le i \le n)$ indicates a word in the vocabulary.
Stanford Log-Linear Part-of-Speech Tagger: obtains the part-of-speech tag $pos_i$ of each word $x_i$. For simplicity, only five tags are considered: verb $(v)$, noun $(n)$, adjective $(a)$, adverb $(r)$, and others $(o)$.
SentiWordNet: querying with the pair $(x_i, pos_i)$ returns $m$ different senses, each of which contains a sense number, a positive/negative score, and a gloss: $(SN_i^{(j)}, P_{score_i}^{(j)}/N_{score_i}^{(j)}, G^{(j)}_i)$, $1 \le j \le m$.
($SN$ denotes the rank of each sense, $P_{score_i}^{(j)}/N_{score_i}^{(j)}$ denote the positive/negative scores given by SentiWordNet, and $G^{(j)}_i$ is the definition of each sense.)
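The five-way tag reduction in the tagger step can be sketched as follows. This is a minimal sketch, assuming the tagger emits Penn Treebank tags (as the English Stanford tagger models do) and that every `VB*`, `NN*`, `JJ*`, and `RB*` tag collapses to v, n, a, and r respectively; the exact mapping used by the authors is not given in these notes, and `coarse_pos` is a hypothetical helper name.

```python
# Hypothetical helper: collapse Penn Treebank tags into the five coarse
# classes v / n / a / r / o. The prefix rules below are an assumption.
PREFIX_TO_COARSE = {
    "VB": "v",  # VB, VBD, VBG, VBN, VBP, VBZ
    "NN": "n",  # NN, NNS, NNP, NNPS
    "JJ": "a",  # JJ, JJR, JJS
    "RB": "r",  # RB, RBR, RBS
}


def coarse_pos(ptb_tag: str) -> str:
    """Map a Penn Treebank tag such as 'VBD' to one of v, n, a, r, o."""
    for prefix, coarse in PREFIX_TO_COARSE.items():
        if ptb_tag.startswith(prefix):
            return coarse
    return "o"  # determiners, pronouns, prepositions, punctuation, ...


if __name__ == "__main__":
    tagged = [("The", "DT"), ("movie", "NN"), ("was", "VBD"),
              ("surprisingly", "RB"), ("good", "JJ")]
    print([(word, coarse_pos(tag)) for word, tag in tagged])
```

The letters v, n, a, r coincide with WordNet's POS codes, which is convenient when the coarse tag is passed on to a SentiWordNet lookup.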
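Inspired by SentiWordNet, we propose a context-aware attention mechanism that simultaneously considers the sense rank and the context-gloss similarity to determine the attention weight of each sense: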
$\alpha_i^{(j)} = \mathrm{softmax}\left(\frac{1}{SN^{(j)}_i} \cdot \mathrm{sim}(X, G^{(j)}_i)\right)$, where the softmax is taken over the senses $j = 1, \ldots, m$.

$\frac{1}{SN^{(j)}_i}$ approximates the impact of sense frequency, because a smaller sense rank indicates that the sense is used more frequently in natural language.
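$\mathrm{sim}(X, G^{(j)}_i)$ denotes the textual similarity between the context and the gloss of each sense, an important feature commonly used in unsupervised word sense disambiguation. To compute the similarity between $X$ and $G^{(j)}_i$, we encode both with Sentence-BERT (SBERT), which achieves state-of-the-art performance on semantic textual similarity tasks, and take the cosine similarity between the resulting vectors: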
$\mathrm{sim}(X, G^{(j)}_i) = \cos(\mathrm{SBERT}(X), \mathrm{SBERT}(G^{(j)}_i))$
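The attention weights and the context-gloss similarity they rely on can be sketched together. This is a minimal sketch, assuming the SBERT vectors for the context and for each gloss have already been computed (for example with the `encode` method of `SentenceTransformer` from the sentence-transformers library; the checkpoint the authors used is not stated here). The 2-dimensional vectors in the checks are toy values, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sense_attention(context_vec: Sequence[float],
                    gloss_vecs: List[Sequence[float]],
                    sense_ranks: List[int]) -> List[float]:
    """alpha^(j) = softmax over j of (1 / SN^(j)) * cos(SBERT(X), SBERT(G^(j))).

    Sense ranks start at 1, so the division is always defined.
    """
    if len(gloss_vecs) != len(sense_ranks):
        raise ValueError("one rank per gloss is required")
    logits = [cosine(context_vec, g) / rank
              for g, rank in zip(gloss_vecs, sense_ranks)]
    top = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two senses whose glosses match the context equally well:
    # the rank-1 sense receives the larger weight.
    print(sense_attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [1, 2]))
```

Dividing by the rank keeps the mechanism parameter-free: a rank-1 sense keeps its full similarity, while a rank-3 sense is scaled to one third before the softmax.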

After obtaining the attention weight of each sense, we compute the sentiment score of each $(x_i, pos_i)$ pair by simply weighting the scores of all the senses:
$s(x_i, pos_i) = \sum\limits_{j=1}^{m} \alpha_i^{(j)} \left(P_{score_i}^{(j)} - N_{score_i}^{(j)}\right)$
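The weighted score is a single dot product between the attention weights and the per-sense score differences. A minimal sketch with a hypothetical function name; the numbers in the example are toy values, not real SentiWordNet entries.

```python
from typing import Sequence


def sentiment_score(alphas: Sequence[float],
                    pos_scores: Sequence[float],
                    neg_scores: Sequence[float]) -> float:
    """s(x_i, pos_i) = sum over j of alpha^(j) * (Pscore^(j) - Nscore^(j))."""
    if not (len(alphas) == len(pos_scores) == len(neg_scores)):
        raise ValueError("one weight and one score pair per sense is required")
    return sum(a * (p - n) for a, p, n in zip(alphas, pos_scores, neg_scores))


if __name__ == "__main__":
    # Toy values for three senses (not taken from SentiWordNet).
    print(sentiment_score([0.5, 0.3, 0.2], [0.75, 0.5, 0.0], [0.0, 0.125, 0.25]))
```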

Finally, the word-level sentiment polarity $polar_i$ for the pair $(x_i, pos_i)$ is assigned Positive / Negative / Neutral when $s(x_i, pos_i)$ is positive / negative / zero, respectively. Note that if no sense can be found for $(x_i, pos_i)$ in SentiWordNet, $polar_i$ is assigned Neutral.
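The assignment rule, including the fallback for words missing from SentiWordNet, fits in a few lines. A minimal sketch; `assign_polarity` is a hypothetical name, and `None` is used here (by assumption) to signal that the lookup returned no senses.

```python
from typing import Optional


def assign_polarity(score: Optional[float]) -> str:
    """Map s(x_i, pos_i) to a word-level polarity label.

    `score` is None when SentiWordNet has no sense for (x_i, pos_i).
    """
    if score is None or score == 0.0:
        return "Neutral"
    return "Positive" if score > 0.0 else "Negative"


if __name__ == "__main__":
    print([assign_polarity(s) for s in (0.4375, -0.5, 0.0, None)])
    # ['Positive', 'Negative', 'Neutral', 'Neutral']
```

The zero test is exact, matching the rule as stated; a small tolerance around zero would be a possible variant, but it is not part of the described method.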

Pre-training Task

The steps above yield the knowledge-enhanced text sequence $X_k = \{(x_i, pos_i, polar_i)\}_{i=1}^{n}$.

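We design a novel supervised pre-training task called the label-aware masked language model (LA-MLM), which introduces the sentence-level sentiment label $l$ into the pre-training phase to capture the dependency between sentence-level language representations and individual words. It contains two separate sub-tasks: early fusion and late supervision.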

Early Fusion

$(h^{EF}_{cls}, h^{EF}_{1}, \ldots, h^{EF}_{n}, h^{EF}_{sep}) = \mathrm{Transformer}(\hat{X}_k, l)$
The input representation $\hat{X}_k$ consists of:

the embeddings used in BERT
the part-of-speech (POS) embedding
the word-level polarity embedding
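How the three parts are combined is not spelled out in these notes; the sketch below assumes an element-wise sum, in the same way BERT adds its own token, segment, and position embeddings. The lookup tables `pos_table` and `polar_table`, the function names, and the 3-dimensional toy vectors are hypothetical; in a real model the tables would be trainable $d$-dimensional parameters. How the sentence-level label $l$ is fused is left out.

```python
from typing import Dict, List

Vector = List[float]


def add_vectors(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all embeddings must share the dimension d")
    return [sum(column) for column in zip(*vectors)]


def knowledge_input_embedding(bert_embedding: Vector, pos_tag: str, polarity: str,
                              pos_table: Dict[str, Vector],
                              polar_table: Dict[str, Vector]) -> Vector:
    """Assumed: input vector = BERT embedding + POS embedding + polarity embedding."""
    return add_vectors(bert_embedding, pos_table[pos_tag], polar_table[polarity])


if __name__ == "__main__":
    # Hypothetical 3-dimensional tables, for illustration only.
    pos_table = {tag: [0.0, 0.0, 0.0] for tag in "vnaro"}
    pos_table["a"] = [0.1, 0.0, 0.0]
    polar_table = {"Positive": [0.0, 0.0, 0.2],
                   "Negative": [0.0, 0.0, -0.2],
                   "Neutral": [0.0, 0.0, 0.0]}
    print(knowledge_input_embedding([1.0, 1.0, 1.0], "a", "Positive",
                                    pos_table, polar_table))
```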

The model has to predict the word, the part-of-speech tag, and the word-level polarity at each masked position.
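Each masked position therefore contributes three classification losses. A minimal sketch, assuming plain softmax cross-entropy heads and an unweighted sum of the three terms (the weighting is not stated in these notes); the logits are toy inputs rather than outputs of a real Transformer, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    top = max(logits)
    log_partition = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_partition - logits[target]


def masked_position_loss(word_logits: Sequence[float], word_id: int,
                         pos_logits: Sequence[float], pos_id: int,
                         polar_logits: Sequence[float], polar_id: int) -> float:
    """Assumed: equal-weight sum of the word, POS and polarity losses at one position."""
    return (cross_entropy(word_logits, word_id)
            + cross_entropy(pos_logits, pos_id)
            + cross_entropy(polar_logits, polar_id))


if __name__ == "__main__":
    # Uniform logits over a 4-word toy vocabulary, 5 POS tags, 3 polarities.
    print(masked_position_loss([0.0] * 4, 2, [0.0] * 5, 0, [0.0] * 3, 1))
```

With uniform logits each term reduces to the log of its class count, which makes the sum easy to check by hand.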

Late Supervision

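Based on the hidden states at [CLS] and at the masked positions, the model is asked to predict the sentence-level label and the word-level information.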
$(h^{LS}_{cls}, h^{LS}_{1}, \ldots, h^{LS}_{n}, h^{LS}_{sep}) = \mathrm{Transformer}(\hat{X}_k)$
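The late-supervision objective adds a sentence-level term on top of the word-level ones. A minimal sketch, assuming a softmax cross-entropy head on $h^{LS}_{cls}$ and that the per-position word-level losses are simply summed with it; the combination rule, the number of sentence-level classes (two in the check), and the function names are assumptions for illustration.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    top = max(logits)
    return top + math.log(sum(math.exp(z - top) for z in logits)) - logits[target]


def late_supervision_loss(cls_logits: Sequence[float], sentence_label: int,
                          masked_position_losses: Sequence[float]) -> float:
    """Sentence-label loss from h_cls^LS plus the word-level losses at masked positions.

    Summing (rather than averaging or weighting) the terms is an assumption.
    """
    return cross_entropy(cls_logits, sentence_label) + sum(masked_position_losses)


if __name__ == "__main__":
    # Toy [CLS] logits for two sentence-level classes and two masked positions.
    print(late_supervision_loss([0.0, 0.0], 1, [1.0, 2.0]))
```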
