Table of Contents

Paper Content
Abstract
1. Introduction
2. Our Approach
- 2.1 Problem and Motivation
- 2.2 Model
- 2.3 Detection Network
- 2.3 Soft Masking
- 2.4 Correction Network
- 2.5 Loss Function (Learning)
3. Experimental Results
- 3.1 Datasets
- 3.2 Baselines (omitted)
- 3.3 Experiment Setting
- 3.4 Main Results

Paper Content

Title: Spelling Error Correction with Soft-Masked BERT

Published: May 2020

Paper: https://arxiv.org/abs/2005.07421

Code (unofficial implementation): https://github.com/quantum00549/SoftMaskedBert

Abstract

The paper uses Soft-Masked BERT for the Chinese Spelling Check (CSC) task. The method is also applicable to other languages.

1. Introduction

Soft-Masked BERT = bidirectional GRU (Bi-GRU) + BERT

The Bi-GRU predicts where the errors are, and BERT corrects them.

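2. Our Approach

2.1 Problem and Motivation

The authors argue that the original BERT is pre-trained with only about 15% of the tokens masked, which is not enough for it to learn how to locate the errors in a sentence, so a new approach is needed.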

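2.2 Model

The model has three parts:

Detection Network: predicts, for each character in the sentence, the probability that it is an error.
Correction Network: replaces the erroneous characters with the correct ones.
Soft Masking: the bridge between the Detection Network and the Correction Network. It masks the embeddings of the original sentence according to the output of the Detection Network.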

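2.3 Detection Network

Input: the embedded character sequence. The embedding is built the same way as in BERT, as the sum of the word embedding, position embedding, and segment embedding.

Architecture: Bi-GRU -> fully connected layer (Linear) -> Sigmoid

Output: the probability that each character is an error. The closer to 1, the more likely the character is wrong.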
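To make the Linear -> Sigmoid step concrete, here is a minimal sketch in plain Python. It assumes the Bi-GRU has already produced a forward and a backward hidden state for each character, concatenates them, and computes $p_i = \sigma(W_d h_i^d + b_d)$. The hidden states, `weight`, and `bias` are made-up toy values, and `detection_head` is a hypothetical helper name, not something from the paper or the linked repository.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: maps a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def detection_head(forward_states, backward_states, weight, bias):
    """Return one error probability per character.

    forward_states / backward_states: per-character hidden states of the two
    GRU directions (toy values here, not real GRU outputs).
    weight, bias: parameters of the linear layer (hypothetical values).
    """
    probs = []
    for h_fwd, h_bwd in zip(forward_states, backward_states):
        h = h_fwd + h_bwd  # list concatenation of both directions
        score = sum(w * x for w, x in zip(weight, h)) + bias
        probs.append(sigmoid(score))
    return probs

# Toy example: 3 characters, hidden size 2 per direction.
forward_states = [[0.1, -0.2], [1.5, 0.9], [0.0, 0.0]]
backward_states = [[0.0, 0.3], [1.2, 0.7], [0.0, 0.0]]
weight = [1.0, 1.0, 1.0, 1.0]
bias = -1.0

error_probs = detection_head(forward_states, backward_states, weight, bias)
```

In a real model the hidden states would come from a trained Bi-GRU and the linear layer would be learned jointly with the rest of the network.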

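2.3 Soft Masking

The Soft Masking module masks the input by weighting: each character embedding is blended with the mask embedding according to its error probability. The formula is:

$e_i^{\prime} = p_i \cdot e_{mask} + (1 - p_i) \cdot e_i$

$e'_i$ : the soft-masked embedding of the $i$-th character.
$p_i$ : the probability that the $i$-th character is an error, $p_i \in [0,1]$ .
$e_{mask}$ : the mask embedding. The paper does not say exactly what it is; the unofficial reimplementation by quantum00549 on GitHub (linked above) makes its own choice.
$e_i$ : the embedding of the $i$-th character.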
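The weighting can be checked with a few numbers. The sketch below applies the soft-masking formula to toy 2-dimensional embeddings; `mask_embedding` is a made-up vector standing in for $e_{mask}$, and `soft_mask` is a hypothetical helper name.

```python
def soft_mask(embeddings, error_probs, mask_embedding):
    """Blend each character embedding with the mask embedding.

    Implements e'_i = p_i * e_mask + (1 - p_i) * e_i, element by element.
    """
    masked = []
    for e_i, p_i in zip(embeddings, error_probs):
        masked.append([p_i * m + (1.0 - p_i) * x
                       for m, x in zip(mask_embedding, e_i)])
    return masked

embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy e_i values
error_probs = [0.0, 0.5, 1.0]                      # toy p_i values
mask_embedding = [0.0, 10.0]                       # hypothetical e_mask

soft_masked = soft_mask(embeddings, error_probs, mask_embedding)
# p_i = 0   -> the original embedding is kept unchanged
# p_i = 0.5 -> the midpoint between e_i and e_mask
# p_i = 1   -> the embedding is fully replaced by e_mask
```

Unlike a hard mask, intermediate probabilities leave part of the original character information in the input to the Correction Network.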

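2.4 Correction Network

Input: the soft-masked input.

Architecture: BERT + fully connected layer (Linear) + Softmax

Output: the corrected characters.

Note: there is a residual connection between BERT and the Linear layer, that is, the input embedding is added to BERT's output:

$h_i^{\prime} = h_i^c + e_i$

$h'_i$ : the input to the Linear layer for the $i$-th character.
$h_i^c$ : the hidden state of BERT's output for the $i$-th character.
$e_i$ : the input embedding of the $i$-th character.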
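As a rough illustration of the residual connection followed by Linear + Softmax, the sketch below works on a single character with a 3-character vocabulary. All numbers, the vocabulary, and the helper name `correct_character` are assumptions made for the example; a real model would use BERT's hidden size and its full vocabulary.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]

def correct_character(bert_hidden, input_embedding, weight, bias, vocab):
    """Residual sum h' = h^c + e, then Linear + Softmax over the vocabulary."""
    h_prime = [h + e for h, e in zip(bert_hidden, input_embedding)]
    logits = [sum(w * x for w, x in zip(row, h_prime)) + b
              for row, b in zip(weight, bias)]
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best], probs

vocab = ["他", "她", "它"]                         # hypothetical tiny vocabulary
bert_hidden = [0.2, 1.0]                           # toy h_i^c
input_embedding = [0.1, 0.5]                       # toy e_i
weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]    # hypothetical Linear weights
bias = [0.0, 0.0, 0.0]

predicted, probs = correct_character(
    bert_hidden, input_embedding, weight, bias, vocab)
```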

2.5 Loss Function (Learning)

Both the Detection Network and the Correction Network use cross-entropy loss:

$\begin{aligned} \mathcal{L}_d &= -\sum_{i=1}^n \log P_d\left(g_i \mid X\right) \\ \mathcal{L}_c &= -\sum_{i=1}^n \log P_c\left(y_i \mid X\right) \end{aligned}$

$\mathcal{L}_d$ : the loss of the Detection Network
$\mathcal{L}_c$ : the loss of the Correction Network

The two are combined as:

$\mathcal{L} = \lambda \cdot \mathcal{L}_c + (1-\lambda) \cdot \mathcal{L}_d$

where $\lambda$ is a hyperparameter in $[0, 1]$.
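The two losses and their combination can be worked through by hand for a two-character sentence. In the sketch below, $g_i$ is taken to be a 0/1 label marking whether character $i$ is an error and $y_i$ the correct character; the probabilities are invented, and `lam=0.8` is only an illustrative value.

```python
import math

def detection_loss(error_probs, labels):
    """L_d: negative log-likelihood of the 0/1 error labels g_i."""
    loss = 0.0
    for p, g in zip(error_probs, labels):
        loss -= math.log(p if g == 1 else 1.0 - p)
    return loss

def correction_loss(char_probs, targets):
    """L_c: negative log-likelihood of the correct characters y_i.

    char_probs[i] maps each candidate character to its predicted probability.
    """
    return -sum(math.log(dist[y]) for dist, y in zip(char_probs, targets))

def joint_loss(l_c, l_d, lam):
    """L = lambda * L_c + (1 - lambda) * L_d."""
    return lam * l_c + (1.0 - lam) * l_d

error_probs = [0.1, 0.8]   # toy detection outputs p_i
labels = [0, 1]            # g_i: the second character is an error
char_probs = [{"天": 0.9, "夭": 0.1}, {"气": 0.5, "汽": 0.5}]  # toy softmax outputs
targets = ["天", "气"]     # y_i

l_d = detection_loss(error_probs, labels)
l_c = correction_loss(char_probs, targets)
loss = joint_loss(l_c, l_d, lam=0.8)
```

With $\lambda$ close to 1 the objective is dominated by correction; with $\lambda$ close to 0 it is dominated by detection.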

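3. Experimental Results

3.1 Datasets

Benchmark: SIGHAN

Training set: built by the authors using a confusion table. 15% of the characters in the text are replaced to create artificial errors. Of these replacements, 80% are common homophones of the original character taken from the confusion table, and the remaining 20% are random characters.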
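A small sketch of how such training data could be generated. It assumes the 80/20 split is applied per replaced character, and it uses a tiny made-up `confusion_table` and `vocab`; the real confusion table is not given in this post, and `corrupt` is a hypothetical helper name.

```python
import random

def corrupt(sentence, confusion_table, vocab, rng,
            replace_rate=0.15, homophone_rate=0.8):
    """Return (noisy_sentence, labels); labels[i] is 1 where a character changed."""
    noisy, labels = [], []
    for ch in sentence:
        new_ch = ch
        if rng.random() < replace_rate:
            candidates = confusion_table.get(ch, [])
            if candidates and rng.random() < homophone_rate:
                new_ch = rng.choice(candidates)  # homophone from the table
            else:
                new_ch = rng.choice(vocab)       # arbitrary character
        noisy.append(new_ch)
        labels.append(int(new_ch != ch))
    return "".join(noisy), labels

# Hypothetical confusion table and vocabulary, for illustration only.
confusion_table = {"在": ["再"], "做": ["作"], "的": ["得", "地"]}
vocab = list("我你他在再做作的得地天气")

original = "我在做作业的时候"
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy, labels = corrupt(original, confusion_table, vocab, rng)
```

The pair `(noisy, original)` serves as a training example for the Correction Network, and `labels` as the targets for the Detection Network.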

3.2 Baselines (omitted)

3.3 Experiment Setting

Optimizer: Adam

Learning rate scheduler: none

Learning rate: 2e-5

Bi-GRU hidden size: 256

Batch size: 320

The authors also fine-tuned BERT on the 5 million generated training samples and on the SIGHAN training samples.

3.4 Main Results

Acc: Accuracy
Pre: Precision
Rec: Recall
F1: F1 Score
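For reference, the four metrics can be computed from confusion counts as in the sketch below. The counts are made up for the example and are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical counts, for illustration only.
pre, rec, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
acc = accuracy(tp=8, tn=82, fp=2, fn=8)
```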
