Relation Extraction (3) — Dependency-based Models

  • Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation
    • Background
    • Model
  • Bidirectional Recurrent Convolutional Neural Network for Relation Classification
    • Model
      • Contributions
      • Model architecture
    • Code

Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation

Background

Prior work is feature-based or kernel-based. Feature-based methods rely on hand-crafted lexical, syntactic, and semantic features; kernel-based methods require a predefined similarity measure between two data samples, e.g., a kernel defined along the shortest dependency path (SDP) between the two entities.
Existing neural models for this task are mostly single-layer; the paper argues for a deeper architecture.

Model

  1. Split the SDP into two sub-paths at its common ancestor node; each sub-path is processed by its own RNN.
  2. Each RNN draws on four channels: word embeddings, POS embeddings, grammatical relation embeddings, and WordNet embeddings.
  • Word representations: Each word in a given sentence is mapped to a real-valued vector by looking up a word embedding table. Trained unsupervisedly on a large corpus, word embeddings are thought to capture words' syntactic and semantic information well (Mikolov et al., 2013b).
  • Part-of-speech tags: Since word embeddings are obtained on a generic corpus of a large scale, the information they contain may not agree with a specific sentence. This is handled by allying each input word with its POS tag, e.g., noun, verb, etc. The experiments use only a coarse-grained POS category, containing 15 different tags.
  • Grammatical relations: The dependency relations between a governing word and its children make a difference in meaning, and the same word pair may have different dependency relation types. For example, "beats --nsubj--> it" is distinct from "beats --dobj--> it". Thus, it is necessary to capture such grammatical relations in SDPs.
  • WordNet hypernyms: Hyponymy information is also useful for relation classification. To leverage WordNet hypernyms, the authors use a tool developed by Ciaramita and Altun (2006), which assigns each word a hypernym from 41 predefined concepts in WordNet, e.g., noun.food, verb.motion. Given its hypernym, each word gains a more abstract concept, which helps build a linkage between different but conceptually similar words.
  3. Stack four RNN layers on top of each other (e.g., via StackedRNNCells); a sketch of one sub-path encoder is given after this list.
  4. Each RNN layer is followed by a max-pooling layer that gathers information along the sub-path.
  5. All max-pooling outputs are concatenated and fed into a softmax classifier.
  6. To counter the data scarcity a deep architecture suffers from, the training data is augmented by reversing each SDP and flipping its directed relation label (see the second sketch below).
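
A minimal sketch of one sub-path encoder under the assumptions above: the four channel embeddings are concatenated per token, RNN layers are stacked, and max-pooling follows every layer. Names, shapes, and the choice of GRU cells are illustrative, not the authors' code.

```python
import tensorflow as tf

def subpath_encoder(word_ids, pos_ids, rel_ids, hyper_ids, tables,
                    hidden_size, num_layers=4):
    # Look up the four channels and concatenate them per token.
    x = tf.concat([tf.nn.embedding_lookup(tables["word"], word_ids),
                   tf.nn.embedding_lookup(tables["pos"], pos_ids),
                   tf.nn.embedding_lookup(tables["rel"], rel_ids),
                   tf.nn.embedding_lookup(tables["hyper"], hyper_ids)], axis=2)
    pooled = []
    for layer in range(num_layers):
        with tf.variable_scope("rnn_layer_%d" % layer):
            x, _ = tf.nn.dynamic_rnn(tf.nn.rnn_cell.GRUCell(hidden_size),
                                     x, dtype=tf.float32)
        # Max-pooling over the sub-path after every RNN layer.
        pooled.append(tf.reduce_max(x, axis=1))
    # The pooled features of all layers are concatenated for the softmax.
    return tf.concat(pooled, axis=1)
```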

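And a sketch of the data-augmentation step, assuming SemEval-2010 Task 8 style directed labels; the helper name is illustrative:

```python
def augment_sdp(sdp_tokens, directed_label):
    """Reversing the SDP yields a new sample whose directed label is flipped."""
    reversed_tokens = list(reversed(sdp_tokens))
    if directed_label.endswith("(e1,e2)"):
        flipped = directed_label.replace("(e1,e2)", "(e2,e1)")
    elif directed_label.endswith("(e2,e1)"):
        flipped = directed_label.replace("(e2,e1)", "(e1,e2)")
    else:  # undirected classes such as 'Other' stay unchanged
        flipped = directed_label
    return reversed_tokens, flipped
```
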
Bidirectional Recurrent Convolutional Neural Network for Relation Classification

Model

The shortest dependency path (SDP) between the two entities serves as the model input.
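
For reference, a minimal sketch (assumed, not from the paper) of reading the SDP off a dependency parse, given each token's head index:

```python
def shortest_dependency_path(heads, e1, e2):
    """heads[i] is the index of token i's head; the root is its own head."""
    def path_to_root(i):
        path = [i]
        while heads[i] != i:
            i = heads[i]
            path.append(i)
        return path
    p1, p2 = path_to_root(e1), path_to_root(e2)
    ancestors1 = set(p1)
    # Walk up from e2 until an ancestor of e1 is hit: the lowest common ancestor.
    lca = next(node for node in p2 if node in ancestors1)
    return p1[:p1.index(lca) + 1] + p2[:p2.index(lca)][::-1]

heads = [1, 1, 3, 1, 3]  # toy parse: token 1 is the root
print(shortest_dependency_path(heads, 0, 4))  # [0, 1, 3, 4]
```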

Contributions

  1. Propose RCNN: a two-channel LSTM captures global features along the SDP, while a CNN captures local features of adjacent nodes.
  2. Propose BiRCNN, which runs the RCNN along the SDP in both directions so that information is captured from both ends.

Model architecture

  1. The SDP is the input, consisting of words and dependency relations. Words are initialized with pretrained embeddings; relations are initialized randomly. Words and relations pass through separate embedding layers into separate LSTMs, forming a two-channel LSTM (the word channel and the relation channel do not interact).
  2. A convolution layer captures the local features of each dependency unit (two adjacent words plus the relation between them) from the LSTM outputs, and max-pooling then gathers global information (a sketch of this convolution follows the equations below).
  3. Loss function:

    $$J=\sum_{i=1}^{2K+1}\overrightarrow{t_i}\log\overrightarrow{y_i}+\sum_{i=1}^{2K+1}\overleftarrow{t_i}\log\overleftarrow{y_i}+\sum_{i=1}^{K}t_i\log y_i+\lambda\cdot\|\theta\|^2$$

    The first two terms are the cross-entropy losses of the forward and backward directions, the third term is the loss on the concatenation of the forward and backward representations, and the fourth term is the L2 regularizer.
    At test time, the two directions are combined with a weight α:

    $$y_{test}=\alpha\cdot\overrightarrow{y}+(1-\alpha)\cdot\overleftarrow{y}$$

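A minimal sketch of the dependency-unit convolution from step 2, with assumed tensor shapes; this is not the repository's conv_layer:

```python
import tensorflow as tf

def dependency_unit_conv(word_states, rel_states, num_filters):
    # word_states: [batch, n, d_w], LSTM states of the n words on the SDP
    # rel_states:  [batch, n-1, d_r], LSTM states of the n-1 relations
    left = word_states[:, :-1, :]   # word i
    right = word_states[:, 1:, :]   # word i+1
    # Each dependency unit is [word_i ; relation_i ; word_{i+1}].
    units = tf.concat([left, rel_states, right], axis=2)
    # Kernel size 1 over the unit axis: one affine map + tanh per unit.
    conv = tf.layers.conv1d(units, num_filters, kernel_size=1,
                            activation=tf.tanh)
    # Max-pooling over all units gathers the global feature vector.
    return tf.reduce_max(conv, axis=1)
```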
Code

```python
import tensorflow as tf

# l2_collection_name and the helpers lstm_layer, conv_layer, pool_layer,
# softmax_layer, length2, get_prediction are defined elsewhere in the repo.

def model(input_, word_vec_matrix_pretrained, keep_prob, config):
    word_vec = tf.constant(value=word_vec_matrix_pretrained, name="word_vec", dtype=tf.float32)
    rel_vec = tf.Variable(tf.random_uniform([config.rel_size, config.rel_vec_size], -0.05, 0.05), name="rel_vec", dtype=tf.float32)
    # tf.add_to_collection(l2_collection_name, word_vec)
    tf.add_to_collection(l2_collection_name, rel_vec)

    # forward pass: two-channel LSTM + CNN
    with tf.name_scope("look_up_table_f"):
        inputs_words_f = tf.nn.embedding_lookup(word_vec, input_["sdp_words_index"])
        inputs_rels_f = tf.nn.embedding_lookup(rel_vec, input_["sdp_rels_index"])
        inputs_words_f = tf.nn.dropout(inputs_words_f, keep_prob)
        inputs_rels_f = tf.nn.dropout(inputs_rels_f, keep_prob)
    with tf.name_scope("lstm_f"):
        words_lstm_rst_f = lstm_layer(inputs_words_f, length2(input_["sdp_words_index"]), config.word_lstm_hidden_size, config.forget_bias, "word_lstm_f")
        rels_lstm_rst_f = lstm_layer(inputs_rels_f, length2(input_["sdp_rels_index"]), config.rel_lstm_hidden_size, config.forget_bias, "rel_lstm_f")
        tf.summary.histogram("words_lstm_rst_f", words_lstm_rst_f)
        tf.summary.histogram("rels_lstm_rst_f", rels_lstm_rst_f)
    with tf.name_scope("conv_max_pool_f"):
        conv_output_f = conv_layer(words_lstm_rst_f, rels_lstm_rst_f, input_["mask"], config.concat_conv_size, config.conv_out_size, "conv_f")
        pool_output_f = pool_layer(conv_output_f, config)
        tf.summary.histogram("conv_output_f", conv_output_f)
        tf.summary.histogram("pool_output_f", pool_output_f)

    # backward pass: two-channel LSTM + CNN on the reversed SDP
    with tf.name_scope("look_up_table_b"):
        inputs_words_b = tf.nn.embedding_lookup(word_vec, input_["sdp_rev_words_index"])
        inputs_rels_b = tf.nn.embedding_lookup(rel_vec, input_["sdp_rev_rels_index"])
        inputs_words_b = tf.nn.dropout(inputs_words_b, keep_prob)
        inputs_rels_b = tf.nn.dropout(inputs_rels_b, keep_prob)
    with tf.name_scope("lstm_b"):
        words_lstm_rst_b = lstm_layer(inputs_words_b, length2(input_["sdp_rev_words_index"]), config.word_lstm_hidden_size, config.forget_bias, "word_lstm_b")
        rels_lstm_rst_b = lstm_layer(inputs_rels_b, length2(input_["sdp_rev_rels_index"]), config.rel_lstm_hidden_size, config.forget_bias, "rel_lstm_b")
        tf.summary.histogram("words_lstm_rst_b", words_lstm_rst_b)
        tf.summary.histogram("rels_lstm_rst_b", rels_lstm_rst_b)
    with tf.name_scope("conv_max_pool_b"):
        conv_output_b = conv_layer(words_lstm_rst_b, rels_lstm_rst_b, input_["mask"], config.concat_conv_size, config.conv_out_size, "conv_b")
        pool_output_b = pool_layer(conv_output_b, config)
        tf.summary.histogram("conv_output_b", conv_output_b)
        tf.summary.histogram("pool_output_b", pool_output_b)

    with tf.name_scope("softmax"):
        # concatenate the forward and backward pooled features
        pool_concat = tf.concat([pool_output_f, pool_output_b], 1)
        # fine-grained softmax over the 19 directed classes, one per direction
        logits_f, hypothesis_f = softmax_layer(pool_output_f, config.conv_out_size, 19, "softmax_f")
        logits_b, hypothesis_b = softmax_layer(pool_output_b, config.conv_out_size, 19, "softmax_b")
        # coarse-grained softmax over the 10 undirected classes on the concatenation
        logits_concat, hypothesis_concat = softmax_layer(pool_concat, 2 * config.conv_out_size, 10, "softmax_concat")

    # L2 regularization
    regularizers = 0
    vars = tf.get_collection(l2_collection_name)
    for var in vars:
        regularizers += tf.nn.l2_loss(var)

    # loss function: forward + backward + (optional) coarse-grained terms
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits_f, labels=input_["label_fb"])
    loss += tf.nn.softmax_cross_entropy_with_logits(logits=logits_b, labels=input_["label_fb"])
    if config.has_corase_grained:
        loss += tf.nn.softmax_cross_entropy_with_logits(logits=logits_concat, labels=input_["label_concat"])
    loss_avg = tf.reduce_mean(loss) + config.l2 * regularizers

    # gradient clipping
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss_avg, tvars), config.grad_clip)
    # train_op = tf.train.AdamOptimizer(config.lr)
    train_op = tf.train.AdadeltaOptimizer(config.lr)
    optimizer = train_op.apply_gradients(zip(grads, tvars))

    # prediction: alpha-weighted vote of the two directions
    prediction = get_prediction(hypothesis_f, hypothesis_b, config.alpha)
    correct_prediction = tf.equal(prediction, tf.argmax(input_["label_fb"], 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # summaries
    loss_summary = tf.summary.scalar("loss", loss_avg)
    accuracy_summary = tf.summary.scalar("accuracy_summary", accuracy)
    grad_summaries = []
    for g, v in zip(grads, tvars):
        if g is not None:
            grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
            sparsity_summary = tf.summary.histogram("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
            grad_summaries.append(grad_hist_summary)
            grad_summaries.append(sparsity_summary)
    grad_summaries_merged = tf.summary.merge(grad_summaries)
    summary = tf.summary.merge_all()

    return loss_avg, accuracy, prediction, optimizer, summary
```
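
The snippet relies on helpers defined elsewhere in the repository. For illustration, get_prediction should implement the weighted vote y_test = α·y→ + (1−α)·y← from above; a minimal sketch, assuming the backward distribution has already been aligned with the forward label space:

```python
import tensorflow as tf

def get_prediction(hypothesis_f, hypothesis_b, alpha):
    # Weighted combination of the forward and backward class distributions.
    combined = alpha * hypothesis_f + (1.0 - alpha) * hypothesis_b
    return tf.argmax(combined, 1)
```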
