基于Bert的实体关系抽取模型

关注微信公众号：NLP分享汇。【喜欢的扫波关注，每天都在更新自己之前的积累】

文章链接：https://mp.weixin.qq.com/s/OebxnvwjQiVbBZZFL2Un3A

前言

信息抽取(Information Extraction, IE)是从自然语言文本中抽取实体、属性、关系及事件等事实类信息的文本处理技术，是信息检索、智能问答、智能对话等人工智能应用的重要基础，一直受到业界的广泛关注。信息抽取任务涉及命名实体识别、指代消解、关系分类等复杂技术，极具挑战性。而本文旨在介绍如何利用Bert预训练模型进行关系抽取任务。

相关链接

GitHub：https://github.com/yuanxiaosc/Entity-Relation-Extraction

竞赛官网：http://lic2019.ccf.org.cn/

理解难度：★★★★★

解决思路

1、先使用bert搭建的关系分类模型，简单来看就是一个多标签分类任务，类别就是下述的那几种关系；

2、接着用预测出来的关系和文本，使用bert搭建一个实体抽取的模型，其简单来看也是一个分类模型，类别是：

["[Padding]", "[category]", "[##WordPiece]", "[CLS]", "[SEP]", "B-SUB", "I-SUB", "B-OBJ", "I-OBJ", "O"]

【 SUB对应的就是subject，B-SUB就是第一个实体开始的位置，后续的是I-SUB，OBJ就是第二个实体】

关系类型如下：

['丈夫', '上映时间', '专业代码', '主持人', '主演', '主角', '人口数量', '作曲', '作者', '作词', '修业年限', '出品公司', '出版社', '出生地', '出生日期', '创始人', '制片人', '占地面积', '号', '嘉宾', '国籍', '妻子', '字', '官方语言', '导演', '总部地点', '成立日期', '所在城市', '所属专辑', '改编自', '朝代', '歌手', '母亲', '毕业院校', '民族', '气候', '注册资本', '海拔', '父亲', '目', '祖籍', '简称', '编剧', '董事长', '身高', '连载网站', '邮政编码', '面积', '首都']

思路分析

所以第二个模型就是预测每一个tokens的标示，最后根据标示可提取出实体对。
第二个模型是一个多分类任务，我们知道一句话中有可能有多个三元组，为此在进行第二个模型的时候，是先依据第一个模型预测出来的关系类如当前句子预测出3个关系，那么就重复该句话分成3个样本，那么3个样本就对应的是3个多分类单标签任务，为了使实体对和关系对应，所以第二个模型在计算loss的时候是综合考虑了关系和tokens标示的预测的。
说到这里可能会有这样的疑惑？两个模型（所谓的管道），似乎没有交集，都是先处理准备好各自数据，然后各自训练各自的，而且第二个模型同时进行了关系和实体标示预测，那么第一个模型只预测了关系，那么第一个模型存在的意义是什么？直接用第二模型不就可以啦？
- 逻辑是这样的：
  
  训练确实没有交集，各自训练各自的，因为训练样本都是精确的，无可厚非，但是在预测的时候我们只有一句话，要预测出这句话中的所有三元组，如果只采用第二个模型的话，它一句话根据当前的关系只能预测出一种三元组，所以需要第一个模型打前阵，先将句子中有多少种关系预测出来，然后再将句子依照关系分成一句话一个三元组，训练的时候我们是知道每一个句子有多少种关系，但是预测的时候我们并不知道，这就是第一个模型存在的意义，那么我就要用第二个模型同时解决一句话预测所有三元组呢？其实很难，因为首先tokens标记就是问题，怎么将标示和关系对应呢？是吧，所以这里所谓管道，其实是在预测过程体现的，下面就可以看到另外按照上面的思路，该模型bert的输入端一个样本的特征应该是当前句子和关系，而不仅仅是一个句子，不然的话将一句话拆分成多个样本还有什么意义？大家不就都一样啦，正是由于还有关系，才能实现同一句子抽取出不同的实体对（对应当前关系）。
那么输入端是怎么将将关系整合进去的呢？
- 解决办法是这样的：
  
  将关系作为当前句子的下句，还记得text_b吗？第一个模型设为了None,这里将类别平铺成和text_a一样长度，当成一句话作为text_a（当前句子）的下文，该部分具体看convert_single_example：

核心代码部分

模型一：bert搭建的关系分类模型

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,                 labels, num_labels, use_one_hot_embeddings):    """Creates a classification model."""    model = modeling.BertModel(        config=bert_config,        is_training=is_training,        input_ids=input_ids,        input_mask=input_mask,        token_type_ids=segment_ids,        use_one_hot_embeddings=use_one_hot_embeddings)    # In the demo, we are doing a simple classification task on the entire    # segment.    #    # If you want to use the token-level output, use model.get_sequence_output()    # instead.    output_layer = model.get_pooled_output()    hidden_size = output_layer.shape[-1].value    output_weights = tf.get_variable(        "output_weights", [num_labels, hidden_size],        initializer=tf.truncated_normal_initializer(stddev=0.02))    output_bias = tf.get_variable(        "output_bias", [num_labels], initializer=tf.zeros_initializer())    with tf.variable_scope("loss"):        if is_training:            # I.e., 0.1 dropout            output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)        logits_wx = tf.matmul(output_layer, output_weights, transpose_b=True)        logits = tf.nn.bias_add(logits_wx, output_bias)        probabilities = tf.sigmoid(logits)        label_ids = tf.cast(labels, tf.float32)        per_example_loss = tf.reduce_sum(            tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=label_ids), axis=-1)        loss = tf.reduce_mean(per_example_loss)        return loss, per_example_loss, logits, probabilities

模型二：bert搭建一个实体抽取模型

1、关系预测部分

    predicate_output_layer = model.get_pooled_output()    intent_hidden_size = predicate_output_layer.shape[-1].value    predicate_output_weights = tf.get_variable(        "predicate_output_weights", [num_predicate_labels, intent_hidden_size],        initializer=tf.truncated_normal_initializer(stddev=0.02))    predicate_output_bias = tf.get_variable(        "predicate_output_bias", [num_predicate_labels], initializer=tf.zeros_initializer())    with tf.variable_scope("predicate_loss"):        if is_training:            # I.e., 0.1 dropout            predicate_output_layer = tf.nn.dropout(predicate_output_layer, keep_prob=0.9)        predicate_logits = tf.matmul(predicate_output_layer, predicate_output_weights, transpose_b=True)        predicate_logits = tf.nn.bias_add(predicate_logits, predicate_output_bias)        predicate_probabilities = tf.nn.softmax(predicate_logits, axis=-1)        predicate_prediction = tf.argmax(predicate_probabilities, axis=-1, output_type=tf.int32)        predicate_labels = tf.one_hot(predicate_label_id, depth=num_predicate_labels, dtype=tf.float32)        predicate_per_example_loss = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=predicate_logits, labels=predicate_labels), -1)        predicate_loss = tf.reduce_mean(predicate_per_example_loss)

2、序列标注部分

    token_label_output_layer = model.get_sequence_output()    token_label_hidden_size = token_label_output_layer.shape[-1].value    token_label_output_weight = tf.get_variable(        "token_label_output_weights", [num_token_labels, token_label_hidden_size],        initializer=tf.truncated_normal_initializer(stddev=0.02)    )    token_label_output_bias = tf.get_variable(        "token_label_output_bias", [num_token_labels], initializer=tf.zeros_initializer()    )    with tf.variable_scope("token_label_loss"):        if is_training:            token_label_output_layer = tf.nn.dropout(token_label_output_layer, keep_prob=0.9)        token_label_output_layer = tf.reshape(token_label_output_layer, [-1, token_label_hidden_size])        token_label_logits = tf.matmul(token_label_output_layer, token_label_output_weight, transpose_b=True)        token_label_logits = tf.nn.bias_add(token_label_logits, token_label_output_bias)        token_label_logits = tf.reshape(token_label_logits, [-1, FLAGS.max_seq_length, num_token_labels])        token_label_log_probs = tf.nn.log_softmax(token_label_logits, axis=-1)        token_label_one_hot_labels = tf.one_hot(token_label_ids, depth=num_token_labels, dtype=tf.float32)        token_label_per_example_loss = -tf.reduce_sum(token_label_one_hot_labels * token_label_log_probs, axis=-1)        token_label_loss = tf.reduce_sum(token_label_per_example_loss)        token_label_probabilities = tf.nn.softmax(token_label_logits, axis=-1)        token_label_predictions = tf.argmax(token_label_probabilities, axis=-1)        # return (token_label_loss, token_label_per_example_loss, token_label_logits, token_label_predict)    loss = 0.5 * predicate_loss + token_label_loss

基于Bert的实体关系抽取模型相关推荐

AGGCN | 基于图神经网络的关系抽取模型
今天给大家介绍2019年6月发表在ACL上的论文"Attention Guided Graph Convolutional Networks for Relation Extraction& ...
论文学习18-Relation extraction and the influence of automatic named-entity recognition（联合实体关系抽取模型,2007）
文章目录 abstract 1.introduction 3.问题形式化 4.系统架构 5. 命名实体识别 6.关系抽取(核方法) 6.1global context kernel 6.2 local ...
论文学习15-Table Filling Multi-Task Recurrent Neural Network(联合实体关系抽取模型）
文章目录 abstract 1 introduction 2.方法 2.1实体关系表(Figure-2) 2.2 The Table Filling Multi-Task RNN Model 2.3 ...
ACL 2018论文解读 | 基于路径的实体图关系抽取模型
在碎片化阅读充斥眼球的时代,越来越少的人会去关注每篇论文背后的探索和思考. 在这个栏目里,你会快速 get 每篇精选论文的亮点和痛点,时刻紧跟 AI 前沿成果. 点击本文底部的「阅读原文」即刻加入社区 ...
知识图谱从哪儿来？实体关系抽取的现状和未来
12月17日晚,2019年清华特奖获得者之一,清华大学自然语言处理实验室大四本科生高天宇,在智源论坛Live第1期,以<实体关系抽取的现状和未来>为主题,与150位观众进行了在线交流.本文 ...
技术动态 | 知识图谱从哪里来：实体关系抽取的现状与未来
本文作者为:韩旭.高天宇.刘知远.转载自刘知远老师的知乎专栏,文章链接:https://zhuanlan.zhihu.com/p/91762831 最近几年深度学习引发的人工智能浪潮席卷全球,在互联网 ...
基于主体掩码的实体关系抽取方法
点击上方蓝字关注我们基于主体掩码的实体关系抽取方法郑慎鹏1, 陈晓军1, 向阳1, 沈汝超2 1 同济大学电子与信息工程学院,上海 201804 2 上海国际港务(集团)股份有限公司,上海 200 ...
《面向对话的融入交互信息的实体关系抽取》--中文信息学报
实体关系抽取旨在从文本中抽取出实体之间的语义关系,是自然语言处理的一项基本任务.在新闻报道,维基百科等规范文本上,该任务的研究相对丰富且已取得了一定的效果,但面对对话文本的相关研究的还处于起始阶段.相 ...
阿里云医疗实体关系抽取大赛
1.本项目是基于阿里云比赛开放的医疗数据集去做的实体关系抽取.下面会从数据的详情,模型的选取,模型的训练,模型的验证和模型的预测去讲述. 2.数据准备阶段 1.数据来源是阿里云医疗大赛,选取的是其中一 ...

基于Bert的实体关系抽取模型

基于Bert的实体关系抽取模型相关推荐

最新文章

热门文章