Reading Comprehension with a Pretrained BERT Model
Reading comprehension is an important task in natural language processing. The most common datasets are single-passage, extractive reading comprehension datasets. The task is defined as follows: given a question q and a passage p, produce the answer a to the question based on the content of the passage. Each sample in the dataset is a triple <q, p, a>, for example:

Question q: 乔丹打了多少个赛季
Passage p: 迈克尔.乔丹在NBA打了15个赛季。他在84年进入nba,期间在1993年10月6日第一次退役改打棒球,95年3月18日重新回归,在99年1月13日第二次退役,后于2001年10月31日复出,在03年最终退役…
Reference answers a: ['15个', '15个赛季']

The robustness of a reading comprehension model is one of the key indicators of whether the technology can be deployed at scale in real applications. Although current models achieve good performance on some reading comprehension test sets, their robustness in real applications is still unsatisfactory. The DuReader-robust dataset used in this example is the first Chinese dataset focused on the robustness of reading comprehension models; it examines over-sensitivity, over-stability, and generalization in real application scenarios. For details on the dataset, see the dataset paper or the official competition page; this tutorial is the baseline for the DuReader-robust track of that competition.

The AI Studio platform will install PaddleNLP by default in the future; until then, it can be installed with the following command:

In [1]
!pip install --upgrade "paddlenlp>=2.0.0rc" -i https://pypi.org/simple
Requirement already up-to-date: paddlenlp>=2.0.0rc in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (2.0.0rc12)
Loading a Pretrained Model with One Line of PaddleNLP

Reading comprehension is essentially an answer-extraction task. For each supported pretrained model, PaddleNLP already provides a built-in fine-tuning network for the downstream answer-extraction task. This project takes BERT as an example to show how to fine-tune a pretrained model for answer extraction. The essence of the task is to predict the start and end positions of the answer in the passage, given the input question and passage. The principle of BERT-based answer extraction is illustrated below:

Figure 1: BERT-based answer extraction

paddlenlp.transformers.BertForQuestionAnswering()
One line of code loads the fine-tuning network of pretrained BERT for the answer-extraction task.

paddlenlp.transformers.BertForQuestionAnswering.from_pretrained()
Specify the name of the model you want to use, and the network is built in one line of code.

In [3]
import paddlenlp as ppnlp

# Set the name of the model to use
MODEL_NAME = "bert-base-chinese"
model = ppnlp.transformers.BertForQuestionAnswering.from_pretrained(MODEL_NAME)
[2021-03-14 17:57:52,504] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/bert-base-chinese/bert-base-chinese.pdparams
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))

These warnings are expected: the span-classification head on top of BERT is newly initialized for the downstream task and is not part of the pretrained checkpoint.
Data Processing

Loading the Dataset

PaddleNLP has built-in Chinese and English reading comprehension datasets such as SQuAD and CMRC, which can be loaded in one line with the paddlenlp.datasets.load_dataset() API. This example loads the DuReaderRobust Chinese reading comprehension dataset.

Because DuReaderRobust uses the SQuAD data format, InputFeatures are generated with a sliding window, so one example may correspond to multiple InputFeatures. The answer-extraction task predicts the start and end positions of the answer in the passage, given the question. Since the combined length of the passage and the question may exceed max_seq_length and the answer may appear near the end of the passage, the passage cannot simply be truncated. Instead, an overly long passage is split into several segments with a sliding window, each segment is paired with the question, and the corresponding tokenizer converts each pair into features the model can accept. The doc_stride parameter is the distance the window slides each time (a minimal sketch of this splitting follows the cell output below). The sliding-window process is illustrated here:

Figure 2: Generating InputFeatures with a sliding window

In [4]
train_ds, dev_ds = ppnlp.datasets.load_dataset('dureader_robust', splits=('train', 'dev'))

for idx in range(2):
    print(train_ds[idx]['question'])
    print(train_ds[idx]['context'])
    print(train_ds[idx]['answers'])
    print(train_ds[idx]['answer_starts'])
    print()
仙剑奇侠传3第几集上天界
第35集雪见缓缓张开眼睛,景天又惊又喜之际,长卿和紫萱的仙船驶至,见众人无恙,也十分高兴。众人登船,用尽合力把自身的真气和水分输给她。雪见终于醒过来了,但却一脸木然,全无反应。众人向常胤求助,却发现人世界竟没有雪见的身世纪录。长卿询问清微的身世,清微语带双关说一切上了天界便有答案。长卿驾驶仙船,众人决定立马动身,往天界而去。众人来到一荒山,长卿指出,魔界和天界相连。由魔界进入通过神魔之井,便可登天。众人至魔界入口,仿若一黑色的蝙蝠洞,但始终无法进入。后来花楹发现只要有翅膀便能飞入。于是景天等人打下许多乌鸦,模仿重楼的翅膀,制作数对翅膀状巨物。刚佩戴在身,便被吸入洞口。众人摔落在地,抬头发现魔界守卫。景天和众魔套交情,自称和魔尊重楼相熟,众魔不理,打了起来。
['第35集']
[0]
燃气热水器哪个牌子好
选择燃气热水器时,一定要关注这几个问题:1、出水稳定性要好,不能出现忽热忽冷的现象2、快速到达设定的需求水温3、操作要智能、方便4、安全性要好,要装有安全报警装置 市场上燃气热水器品牌众多,购买时还需多加对比和仔细鉴别。方太今年主打的磁化恒温热水器在使用体验方面做了全面升级:9秒速热,可快速进入洗浴模式;水温持久稳定,不会出现忽热忽冷的现象,并通过水量伺服技术将出水温度精确控制在±0.5℃,可满足家里宝贝敏感肌肤洗护需求;配备CO和CH4双气体报警装置更安全(市场上一般多为CO单气体报警)。另外,这款热水器还有智能WIFI互联功能,只需下载个手机APP即可用手机远程操作热水器,实现精准调节水温,满足家人多样化的洗浴需求。当然方太的磁化恒温系列主要的是增加磁化功能,可以有效吸附水中的铁锈、铁屑等微小杂质,防止细菌滋生,使沐浴水质更洁净,长期使用磁化水沐浴更利于身体健康。
['方太']
[110]
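To make the sliding-window splitting concrete, here is a minimal, self-contained sketch of the windowing arithmetic. It is illustrative only; the real logic lives in prepare_train_features and additionally handles tokenization and answer relabeling.

def sliding_windows(num_context_tokens, window_len, doc_stride):
    """Yield (start, end) token spans that cover the context with overlap."""
    start = 0
    while True:
        end = min(start + window_len, num_context_tokens)
        yield (start, end)
        if end == num_context_tokens:
            break
        start += doc_stride  # consecutive windows overlap by window_len - doc_stride

# Example: a 900-token passage, 512-token windows, stride 128
print(list(sliding_windows(900, 512, 128)))
# [(0, 512), (128, 640), (256, 768), (384, 896), (512, 900)]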
ppnlp.transformers.BertTokenizer

Call BertTokenizer to process the data. The pretrained BERT model handles Chinese text character by character. PaddleNLP provides a built-in tokenizer for every pretrained model; specifying the model name loads the corresponding tokenizer. The tokenizer converts raw input text into a form the model can accept.

In [5]

tokenizer = ppnlp.transformers.BertTokenizer.from_pretrained(MODEL_NAME)
[2021-03-14 17:58:07,484] [    INFO] - Found /home/aistudio/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
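As a quick illustration of what the tokenizer produces, the sketch below encodes one question/passage pair. The text/text_pair call form is an assumption about this PaddleNLP version's BertTokenizer; the exact keyword names may differ.

# Sketch: encode a question/passage pair (call form assumed, see above).
encoded = tokenizer(text="乔丹打了多少个赛季", text_pair="迈克尔.乔丹在NBA打了15个赛季。")
print(encoded['input_ids'])       # token IDs: [CLS] question [SEP] passage [SEP]
print(encoded['token_type_ids'])  # 0 for question tokens, 1 for passage tokens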
Data Processing

The dataset read by the load_dataset() API is by default a MapDataset object. MapDataset is a feature-enhanced version of paddle.io.Dataset, and its built-in map() method is well suited to batch processing of a dataset. map() takes a function that performs the data transformation. Here is how the data is transformed for DuReader-robust:

In [6]
from utils import prepare_train_features, prepare_validation_features
from functools import partial

max_seq_length = 512
doc_stride = 128

train_trans_func = partial(prepare_train_features,
                           max_seq_length=max_seq_length,
                           doc_stride=doc_stride,
                           tokenizer=tokenizer)
train_ds.map(train_trans_func, batched=True)

dev_trans_func = partial(prepare_validation_features,
                         max_seq_length=max_seq_length,
                         doc_stride=doc_stride,
                         tokenizer=tokenizer)
dev_ds.map(dev_trans_func, batched=True)
<paddlenlp.datasets.experimental.dataset.MapDataset at 0x7fe54c08ec10>
In [10]
for idx in range(2):
    print(train_ds[idx]['input_ids'])
    print(train_ds[idx]['token_type_ids'])
    print(train_ds[idx]['overflow_to_sample'])
    print(train_ds[idx]['offset_mapping'])
    print(train_ds[idx]['start_positions'])
    print(train_ds[idx]['end_positions'])
    print()
[101, 803, 1187, 1936, 899, 837, 124, 5018, 1126, 7415, 677, 1921, 4518, 102, 5018, 8198, 7415, 7434, 6224, 5353, 5353, ..., 2802, 749, 6629, 3341, 511, 102]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, ..., 1, 1]
0
[(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (0, 0), (0, 1), (1, 3), (3, 4), (4, 5), ..., (330, 331), (331, 332), (0, 0)]
14
16

[101, 4234, 3698, 4178, 3717, 1690, 1525, 702, 4277, 2094, 1962, 102, 6848, 2885, ..., 6716, 860, 978, 2434, 511, 102]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ..., 1, 1]
1
[(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), ..., (387, 388), (388, 389), (0, 0)]
121
122

From the output above, we can see that the examples in the dataset have been converted into features the model can accept, including input_ids, token_type_ids, the answer's start position, and so on, where:

input_ids: the token IDs of the input text.
token_type_ids: indicates whether each token belongs to the question or the passage (Transformer-style pretrained models support both single-sentence and sentence-pair input).
overflow_to_sample: the index of the example this feature was generated from.
offset_mapping: the start and end character indices of each token in the original text (used to recover the answer text).
start_positions: the start position of the answer within this feature.
end_positions: the end position of the answer within this feature.
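offset_mapping is what turns predicted token positions back into answer text. The helper below is a hedged sketch of that mapping; the full post-processing (n-best spans, length limits) lives in PaddleNLP's SQuAD utilities.

# Sketch: map a token-level (start, end) span back to characters in `context`.
def extract_answer(context, offset_mapping, start_pos, end_pos):
    start_char = offset_mapping[start_pos][0]
    end_char = offset_mapping[end_pos][1]
    return context[start_char:end_char]

# For the first feature above: start_positions=14, end_positions=16, and
# offset_mapping[14] == (0, 1), offset_mapping[16] == (3, 4), so the answer
# is context[0:4] == "第35集".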
Loading the Data

Use the paddle.io.DataLoader interface to load data asynchronously with multiple threads, and use the methods provided in paddlenlp.data to combine the features into batches.

In [11]
import paddle
from paddlenlp.data import Stack, Dict, Pad

batch_size = 8

train_batch_sampler = paddle.io.DistributedBatchSampler(
    train_ds, batch_size=batch_size, shuffle=True)
train_batchify_fn = lambda samples, fn=Dict({
    "input_ids": Pad(axis=0, pad_val=tokenizer.pad_token_id),
    "token_type_ids": Pad(axis=0, pad_val=tokenizer.pad_token_type_id),
    "start_positions": Stack(dtype="int64"),
    "end_positions": Stack(dtype="int64")
}): fn(samples)
train_data_loader = paddle.io.DataLoader(
    dataset=train_ds,
    batch_sampler=train_batch_sampler,
    collate_fn=train_batchify_fn,
    return_list=True)

dev_batch_sampler = paddle.io.BatchSampler(
    dev_ds, batch_size=batch_size, shuffle=False)
dev_batchify_fn = lambda samples, fn=Dict({
    "input_ids": Pad(axis=0, pad_val=tokenizer.pad_token_id),
    "token_type_ids": Pad(axis=0, pad_val=tokenizer.pad_token_type_id)
}): fn(samples)
dev_data_loader = paddle.io.DataLoader(
    dataset=dev_ds,
    batch_sampler=dev_batch_sampler,
    collate_fn=dev_batchify_fn,
    return_list=True)
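To verify that padding and stacking work as intended, one can peek at a single batch. This is a sanity-check sketch; the second dimension depends on the longest sequence in the batch.

# Sketch: inspect one training batch.
input_ids, token_type_ids, start_positions, end_positions = next(iter(train_data_loader))
print(input_ids.shape)        # [batch_size, max_len_in_batch], padded with pad_token_id
print(token_type_ids.shape)   # same shape as input_ids
print(start_positions.shape)  # [batch_size], one gold start index per feature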
Setting the Fine-Tuning Optimization Strategy

The learning rate suitable for Transformer models such as ERNIE/BERT is a dynamic schedule with warmup.

Figure 3: Dynamic learning-rate schedule

In [14]
# Maximum learning rate during training
learning_rate = 3e-5
# Number of training epochs
epochs = 1
# Proportion of steps used for learning-rate warmup
warmup_proportion = 0.1
# Weight decay coefficient, a regularization-like strategy to mitigate overfitting
weight_decay = 0.01

num_training_steps = len(train_data_loader) * epochs
lr_scheduler = ppnlp.transformers.LinearDecayWithWarmup(
    learning_rate, num_training_steps, warmup_proportion)

# Generate parameter names needed to perform weight decay.
# All bias and LayerNorm parameters are excluded.
decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]
optimizer = paddle.optimizer.AdamW(
    learning_rate=lr_scheduler,
    parameters=model.parameters(),
    weight_decay=weight_decay,
    apply_decay_param_fun=lambda x: x in decay_params)
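The schedule warms the learning rate up linearly to its peak and then decays it linearly to zero. The sketch below is our reading of what LinearDecayWithWarmup computes, based on its name and the figure above, not copied from the PaddleNLP source.

# Sketch: learning rate at a given step under linear warmup + linear decay.
def lr_at_step(step, peak_lr, total_steps, warmup_proportion):
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup from 0 to peak_lr
    # linear decay from peak_lr down to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# e.g. with 2200 total steps and 10% warmup, halfway through warmup:
print(lr_at_step(110, 3e-5, 2200, 0.1))  # 1.5e-05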
Designing the Loss Function

Because the BertForQuestionAnswering model splits the sequence_output of BertModel into start_logits and end_logits for output, the loss of the reading-comprehension task consists of a start_loss and an end_loss, and we need to define the loss function ourselves. Predicting the answer's start position and end position can be treated as two separate classification tasks, so the loss function is designed as follows:

In [15]
class CrossEntropyLossForSQuAD(paddle.nn.Layer):
    def __init__(self):
        super(CrossEntropyLossForSQuAD, self).__init__()

    def forward(self, y, label):
        start_logits, end_logits = y   # both of shape [batch_size, seq_len]
        start_position, end_position = label
        start_position = paddle.unsqueeze(start_position, axis=-1)
        end_position = paddle.unsqueeze(end_position, axis=-1)
        start_loss = paddle.nn.functional.softmax_with_cross_entropy(
            logits=start_logits, label=start_position, soft_label=False)
        start_loss = paddle.mean(start_loss)
        end_loss = paddle.nn.functional.softmax_with_cross_entropy(
            logits=end_logits, label=end_position, soft_label=False)
        end_loss = paddle.mean(end_loss)
        loss = (start_loss + end_loss) / 2
        return loss
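A quick sanity check of the loss on random tensors: for untrained random logits the value should be close to log(seq_len) ≈ 3.47. This is a sketch; the names and shapes below are made up for illustration.

# Sketch: exercise the loss with dummy tensors of the documented shapes.
batch_size, seq_len = 4, 32
dummy_start_logits = paddle.randn([batch_size, seq_len])
dummy_end_logits = paddle.randn([batch_size, seq_len])
dummy_starts = paddle.randint(0, seq_len, [batch_size])
dummy_ends = paddle.randint(0, seq_len, [batch_size])

loss_fn = CrossEntropyLossForSQuAD()
loss = loss_fn((dummy_start_logits, dummy_end_logits), (dummy_starts, dummy_ends))
print(float(loss))  # a scalar, roughly log(32) for random logits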
Model Training and Evaluation

Model training usually proceeds through the following steps:

1. Take a batch of data from the dataloader.
2. Feed the batch to the model for the forward computation.
3. Pass the forward results to the loss function and compute the loss.
4. Backpropagate the loss and update the parameters. Repeat the steps above.

After each training epoch, the program calls squad_evaluate() and compute_predictions() from paddlenlp.metrics.squad via evaluate() to assess the current model: compute_predictions() generates submittable answers, and squad_evaluate() returns the evaluation metrics. Both apply to any answer-extraction task that follows the SQuAD data format. These tasks measure the similarity between the predicted and gold answers with F1 and exact match (EM), as reported in the output below.

In [16]
from utils import evaluate

criterion = CrossEntropyLossForSQuAD()
global_step = 0
for epoch in range(1, epochs + 1):
    for step, batch in enumerate(train_data_loader, start=1):
        global_step += 1
        input_ids, segment_ids, start_positions, end_positions = batch
        logits = model(input_ids=input_ids, token_type_ids=segment_ids)
        loss = criterion(logits, (start_positions, end_positions))
        if global_step % 100 == 0:
            print("global step %d, epoch: %d, batch: %d, loss: %.5f"
                  % (global_step, epoch, step, loss))
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.clear_grad()

    # To produce results in the format required by the competition
    # (https://aistudio.baidu.com/aistudio/competition/detail/49?castk=LTE=),
    # pass in test_data_loader and set do_pred=True.
    evaluate(model=model, data_loader=dev_data_loader)

model.save_pretrained('/home/aistudio/checkpoint')
tokenizer.save_pretrained('/home/aistudio/checkpoint')
global step 100, epoch: 1, batch: 100, loss: 3.08420
global step 200, epoch: 1, batch: 200, loss: 2.63844
global step 300, epoch: 1, batch: 300, loss: 1.51924
global step 400, epoch: 1, batch: 400, loss: 2.86848
global step 500, epoch: 1, batch: 500, loss: 1.02958
global step 600, epoch: 1, batch: 600, loss: 1.50782
global step 700, epoch: 1, batch: 700, loss: 1.69180
global step 800, epoch: 1, batch: 800, loss: 0.98495
global step 900, epoch: 1, batch: 900, loss: 1.44308
global step 1000, epoch: 1, batch: 1000, loss: 2.48363
global step 1100, epoch: 1, batch: 1100, loss: 1.39599
global step 1200, epoch: 1, batch: 1200, loss: 1.41797
global step 1300, epoch: 1, batch: 1300, loss: 1.27790
global step 1400, epoch: 1, batch: 1400, loss: 1.38966
global step 1500, epoch: 1, batch: 1500, loss: 1.96879
global step 1600, epoch: 1, batch: 1600, loss: 1.38952
global step 1700, epoch: 1, batch: 1700, loss: 1.88264
global step 1800, epoch: 1, batch: 1800, loss: 0.91366
global step 1900, epoch: 1, batch: 1900, loss: 0.97432
global step 2000, epoch: 1, batch: 2000, loss: 0.36408
global step 2100, epoch: 1, batch: 2100, loss: 1.12839
global step 2200, epoch: 1, batch: 2200, loss: 2.03863
Processing example: 1000
time per 1000: 10.376596450805664
{"exact": 69.08962597035992,"f1": 83.65628439091694,"total": 1417,"HasAns_exact": 69.08962597035992,"HasAns_f1": 83.65628439091694,"HasAns_total": 1417
}问题: 爬行垫什么材质的好
原文: 爬行垫根据中间材料的不同可以分为:XPE爬行垫、EPE爬行垫、EVA爬行垫、PVC爬行垫;其中XPE爬行垫、EPE爬行垫都属于PE材料加保鲜膜复合而成,都是无异味的环保材料,但是XPE爬行垫是品质较好的爬行垫,韩国进口爬行垫都是这种爬行垫,而EPE爬行垫是国内厂家为了减低成本,使用EPE(珍珠棉)作为原料生产的一款爬行垫,该材料弹性差,易碎,开孔发泡防水性弱。EVA爬行垫、PVC爬行垫是用EVA或PVC作为原材料与保鲜膜复合的而成的爬行垫,或者把图案转印在原材料上,这两款爬行垫通常有异味,如果是图案转印的爬行垫,油墨外露容易脱落。 当时我儿子爬的时候,我们也买了垫子,但是始终有味。最后就没用了,铺的就的薄毯子让他爬。
答案: PE材料加保鲜膜

问题: 范冰冰多高真实身高
原文: 真实情况是160-162。她平时谎报的168是因为不离脚穿高水台恨天高(15厘米) 图1她穿着高水台恨天高和刘亦菲一样高,(刘亦菲对外报身高172)范冰冰礼服下厚厚的高水台暴露了她的心机,对比一下两者的鞋子吧 图2 穿着高水台恨天高才和刘德华谢霆锋持平,如果她真的有168,那么加上鞋高,刘和谢都要有180?明显是不可能的。所以刘德华对外报的身高174减去10-15厘米才是范冰冰的真实身高 图3,范冰冰有一次脱鞋上场,这个最说明问题了,看看她的身体比例吧。还有目测一下她手上鞋子的鞋跟有多高多厚吧,至少超过10厘米。
答案: 160-162

问题: 小米6防水等级
原文: 防水作为目前高端手机的标配,特别是苹果也支持防水之后,国产大多数高端旗舰手机都已经支持防水。虽然我们真的不会故意把手机放入水中,但是有了防水之后,用户心里会多一重安全感。那么近日最为火热的小米6防水吗?小米6的防水级别又是多少呢? 小编查询了很多资料发现,小米6确实是防水的,但是为了保持低调,同时为了不被别人说防水等级不够,很多资料都没有标注小米是否防水。根据评测资料显示,小米6是支持IP68级的防水,是绝对能够满足日常生活中的防水需求的。
答案: IP68级

问题: 怀孕多久会有反应
原文: 这位朋友你好,女性出现妊娠反应一般是从6-12周左右,也就是女性怀孕1个多月就会开始出现反应,第3个月的时候,妊辰反应基本结束。 而大部分女性怀孕初期都会出现恶心、呕吐的感觉,这些症状都是因人而异的,除非恶心、呕吐的非常厉害,才需要就医,否则这些都是刚怀孕的的正常症状。1-3个月的时候可以观察一下自己的皮肤,一般女性怀孕初期可能会产生皮肤色素沉淀或是腹壁产生妊娠纹,特别是在怀孕的后期更加明显。 还有很多女性怀孕初期会出现疲倦、嗜睡的情况。怀孕三个月的时候,膀胱会受到日益胀大的子宫的压迫,容量会变小,所以怀孕期间也会有尿频的现象出现。月经停止也是刚怀孕最容易出现的症状,只要是平时月经正常的女性,在性行为后超过正常经期两周,就有可能是怀孕了。 如果你想判断自己是否怀孕,可以看看自己有没有这些反应。当然这也只是多数人的怀孕表现,也有部分女性怀孕表现并不完全是这样,如果你无法确定自己是否怀孕,最好去医院检查一下。
答案: 6-12周左右,也就是女性怀孕1个多月

问题: 研发费用加计扣除比例
原文: 【东奥会计在线——中级会计职称频道推荐】根据《关于提高科技型中小企业研究开发费用税前加计扣除比例的通知》的规定,研发费加计扣除比例提高到75%。|财政部、国家税务总局、科技部发布《关于提高科技型中小企业研究开发费用税前加计扣除比例的通知》。|通知称,为进一步激励中小企业加大研发投入,支持科技创新,就提高科技型中小企业研究开发费用(以下简称研发费用)税前加计扣除比例有关问题发布通知。|通知明确,科技型中小企业开展研发活动中实际发生的研发费用,未形成无形资产计入当期损益的,在按规定据实扣除的基础上,在2017年1月1日至2019年12月31日期间,再按照实际发生额的75%在税前加计扣除;形成无形资产的,在上述期间按照无形资产成本的175%在税前摊销。|科技型中小企业享受研发费用税前加计扣除政策的其他政策口径按照《财政部国家税务总局科技部关于完善研究开发费用税前加计扣除政策的通知》(财税〔2015〕119号)规定执行。|科技型中小企业条件和管理办法由科技部、财政部和国家税务总局另行发布。科技、财政和税务部门应建立信息共享机制,及时共享科技型中小企业的相关信息,加强协调配合,保障优惠政策落实到位。|上一篇文章:关于2016年度企业研究开发费用税前加计扣除政策企业所得税纳税申报问题的公告 下一篇文章:关于提高科技型中小企业研究开发费用税前加计扣除比例的通知
答案: 75%
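With the checkpoint saved above, the fine-tuned model can be reloaded for standalone prediction. The sketch below decodes greedily with an argmax over the start/end logits for brevity; real submissions should use the full compute_predictions post-processing, and the tokenizer call form is the same assumption as in the earlier tokenizer example.

# Sketch: reload the fine-tuned checkpoint and answer a single question.
model = ppnlp.transformers.BertForQuestionAnswering.from_pretrained('/home/aistudio/checkpoint')
tokenizer = ppnlp.transformers.BertTokenizer.from_pretrained('/home/aistudio/checkpoint')
model.eval()

question, context = "乔丹打了多少个赛季", "迈克尔.乔丹在NBA打了15个赛季。"
encoded = tokenizer(text=question, text_pair=context)
input_ids = paddle.to_tensor([encoded['input_ids']])
token_type_ids = paddle.to_tensor([encoded['token_type_ids']])

start_logits, end_logits = model(input_ids=input_ids, token_type_ids=token_type_ids)
start = int(paddle.argmax(start_logits, axis=-1))
end = int(paddle.argmax(end_logits, axis=-1))
tokens = tokenizer.convert_ids_to_tokens(encoded['input_ids'])
print(''.join(tokens[start:end + 1]))  # greedy answer span (ideally "15个赛季")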
More Pretrained Models

PaddleNLP supports not only the BERT pretrained model but also pretrained models such as ERNIE, RoBERTa, and ELECTRA. The table below summarizes the pretrained models currently supported by PaddleNLP. With these models, users can complete tasks such as question answering, sequence classification, and token classification. We also provide 22 sets of pretrained weights, including 11 Chinese language models.

Model: BERT
Tokenizer: BertTokenizer
Supported Tasks: BertModel, BertForQuestionAnswering, BertForSequenceClassification, BertForTokenClassification
Model Names: bert-base-uncased, bert-large-uncased, bert-base-multilingual-uncased, bert-base-cased, bert-base-chinese, bert-base-multilingual-cased, bert-large-cased, bert-wwm-chinese, bert-wwm-ext-chinese

Model: ERNIE
Tokenizer: ErnieTokenizer, ErnieTinyTokenizer
Supported Tasks: ErnieModel, ErnieForQuestionAnswering, ErnieForSequenceClassification, ErnieForTokenClassification
Model Names: ernie-1.0, ernie-tiny, ernie-2.0-en, ernie-2.0-large-en

Model: RoBERTa
Tokenizer: RobertaTokenizer
Supported Tasks: RobertaModel, RobertaForQuestionAnswering, RobertaForSequenceClassification, RobertaForTokenClassification
Model Names: roberta-wwm-ext, roberta-wwm-ext-large, rbt3, rbtl3

Model: ELECTRA
Tokenizer: ElectraTokenizer
Supported Tasks: ElectraModel, ElectraForSequenceClassification, ElectraForTokenClassification
Model Names: electra-small, electra-base, electra-large, chinese-electra-small, chinese-electra-base
Note: The Chinese pretrained models include bert-base-chinese, bert-wwm-chinese, bert-wwm-ext-chinese, ernie-1.0, ernie-tiny, roberta-wwm-ext, roberta-wwm-ext-large, rbt3, rbtl3, chinese-electra-base, chinese-electra-small, and more. For additional pretrained models, see: https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/docs/transformers.md. For how to fine-tune more pretrained models on downstream tasks, see the examples.
