Paper: Clinical Context–Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation

Afzal M, Alam F, Malik K, Malik G
Clinical Context–Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation
J Med Internet Res 2020;22(10):e19810
URL: https://www.jmir.org/2020/10/e19810
DOI: 10.2196/19810

Abstract

*Background:* Deep learning has a clear advantage over traditional methods in automatic text summarization (ATS), but it has not yet been studied for medical text.

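*Objective:* Traditional methods leave basic problems open, such as capturing clinical context, assessing the quality of evidence, and selecting passages that fit the purpose of the target summary. The paper addresses these problems to provide accurate, concise, and coherent information extraction.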

*Methods:* The paper proposes a framework, Biomed-Summarizer. It provides quality-aware, intelligent, and context-enabled summarization of biomedical text based on Patient/Problem, Intervention, Comparison, and Outcome (PICO).

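Step 1: develop a binary classifier for quality recognition, which filters out low-quality scientific studies;

Step 2: develop a Bi-LSTM as the context-aware classifier, which identifies PICO sentences;

Step 3: semantic similarity. Jaccard similarity is used to measure how close the query is to each PICO text sequence, enriched with semantics from medical ontologies;

Step 4: finally, generate the summary from the high-scoring PICO sequences.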

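*Results:*

1. Quality recognition: 95.41% accuracy (2562/2686);

2. Classification into 5 classes (aim, population, intervention, results, outcome): 93% accuracy (16127/17341);

3. The semantic similarity algorithm improved on the baseline by 8.9%;

4. The generated summaries were rated by three experts along different dimensions and showed a fairly high positive correlation, which indicates that the automatic summarization system is satisfactory.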

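*Conclusions:* The proposed Biomed-Summarizer achieves high accuracy in ATS, enabling seamless curation of research evidence from the biomedical literature for use in clinical decision-making.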

Overall framework

The framework consists of four main parts:

1. data preprocessing

2. quality recognition

3. contextual text classification

4. text summarization

Workflow (example)

Data preprocessing

Prognosis quality recognition (PQR) model

Five features: two data features (title and abstract) and three metadata features (article type, publishing journal, authors).
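As a rough illustration of how the five features could be packaged for a classifier, the sketch below flattens one article into field-prefixed tokens. The `Article` class, the `to_tokens` helper, and the prefixing scheme are assumptions made for these notes; the paper builds its model in RapidMiner and may encode the features differently.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    # Two data features
    title: str
    abstract: str
    # Three metadata features
    article_type: str
    journal: str
    authors: List[str] = field(default_factory=list)


def to_tokens(article: Article) -> List[str]:
    """Flatten an article into field-prefixed tokens (hypothetical encoding)."""
    tokens = []
    for name, text in (("title", article.title), ("abstract", article.abstract)):
        tokens += [f"{name}={w}" for w in re.findall(r"[a-z0-9]+", text.lower())]
    # Metadata values are kept whole, so a journal name stays a single feature.
    tokens.append(f"type={article.article_type.lower()}")
    tokens.append(f"journal={article.journal.lower()}")
    tokens += [f"author={a.lower()}" for a in article.authors]
    return tokens
```

Prefixing each token with its field keeps a word in the title distinct from the same word in the abstract, which is one simple way to let a classifier weigh the two data features separately.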

CCA classification model

The CCA classification model mainly classifies sentences into PICO categories.

ATS scoring

Automatic text summarization

Sentence extraction is implemented with a sentence-scoring mechanism based on a multi-feature matrix.
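To make the multi-feature idea concrete, the sketch below combines the four features this section goes on to list (relevance, study type, venue credibility, freshness) into one score per sentence and ranks by it. The equal weights, the [0, 1] feature ranges, and the names `SentenceFeatures`, `combined_score`, and `rank` are assumptions for illustration; these notes do not record the weighting the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    text: str
    relevance: float          # semantic similarity to the query, in [0, 1]
    study_type: float         # expert-assigned score, assumed scaled to [0, 1]
    venue_credibility: float  # expert-assigned score, assumed scaled to [0, 1]
    freshness: float          # assumed: newer publications score closer to 1


# Hypothetical equal weights; a real system would tune these.
WEIGHTS = {
    "relevance": 0.25,
    "study_type": 0.25,
    "venue_credibility": 0.25,
    "freshness": 0.25,
}


def combined_score(sentence, weights=WEIGHTS):
    """Weighted sum of the feature columns for one sentence."""
    return sum(w * getattr(sentence, name) for name, w in weights.items())


def rank(sentences, weights=WEIGHTS):
    """Sort sentences from highest to lowest combined score."""
    return sorted(sentences, key=lambda s: combined_score(s, weights), reverse=True)
```

A weighted sum is only one possible way to collapse the feature matrix; it is used here because it keeps each feature's contribution easy to inspect.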

Relevance score

Jaccard similarity metric: Jaccard similarity with semantic enrichments (JS2E)

Semantic enrichment: biomedical ontologies (SNOMED CT, MedDRA, NBO, NIFSTD)

Steps:

Step 1: clean the query sentence;

Step 2: annotate the sentence using BioPortal;

Step 3: enrich each token semantically using the "definition," "synonyms," and "prefLabel" fields from the ontologies;

Step 4: retrieve the annotations of the text;

Step 5: build the metatokens data structure;

Step 6: compute the similarity value.
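The six steps can be sketched in a few lines. Here a small in-memory dictionary stands in for the BioPortal annotation service, so `TOY_ONTOLOGY`, `clean`, `enrich`, and `js2e` are hypothetical names and the ontology entries are made up for illustration. Only the overall shape follows the steps above: clean the text, expand each token into a metatoken set, then take the Jaccard similarity |A ∩ B| / |A ∪ B| of the two sets.

```python
import re

# Hypothetical stand-in for ontology lookups (prefLabel and synonyms).
TOY_ONTOLOGY = {
    "aneurysm": {"aneurysm", "aneurysmal dilatation"},
    "aneurysms": {"aneurysm", "aneurysmal dilatation"},
    "intracranial": {"intracranial", "cerebral"},
    "cerebral": {"cerebral", "intracranial"},
}

STOPWORDS = {"a", "an", "the", "of", "in", "is", "it", "does", "how"}


def clean(sentence):
    """Step 1: lowercase, tokenize, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def enrich(tokens):
    """Steps 2-5: build a metatoken set from each token plus its ontology terms."""
    metatokens = set()
    for token in tokens:
        metatokens.add(token)
        metatokens |= TOY_ONTOLOGY.get(token, set())
    return metatokens


def js2e(query, sentence):
    """Step 6: Jaccard similarity over the enriched metatoken sets."""
    a, b = enrich(clean(query)), enrich(clean(sentence))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With plain Jaccard, "intracranial aneurysms" and "cerebral aneurysm" share no tokens. After enrichment with this toy dictionary the two sets overlap in four of five metatokens, so `js2e` returns 0.8, which is the kind of gain semantic enrichment is meant to provide.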

Study Type

Study type plays an important role in clinical practice. It is scored by domain experts.

Venue Credibility

Venue credibility is also scored by domain experts.

Freshness

Text Selection for Summary

(1) PICO-based summary;

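For each PICO part, select the top-K sentences and build the summary from them;

(2) non-PICO-based summary: sentence classification is not considered, and the top-K sentences are taken directly.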
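The difference between the two strategies is easiest to see in code. The sketch below assumes each sentence already carries a PICO label and a final score; the function names, the `(label, score, text)` tuple layout, and the choice of `k` are illustrative rather than taken from the paper.

```python
from collections import defaultdict


def pico_based_summary(sentences, k):
    """Pick the top-k sentences within each PICO class.

    `sentences` is a list of (label, score, text) tuples.
    """
    by_label = defaultdict(list)
    for label, score, text in sentences:
        by_label[label].append((score, text))
    summary = {}
    for label, items in by_label.items():
        items.sort(key=lambda item: item[0], reverse=True)
        summary[label] = [text for _, text in items[:k]]
    return summary


def non_pico_summary(sentences, k):
    """Pick the global top-k sentences, ignoring the PICO labels."""
    ranked = sorted(sentences, key=lambda s: s[1], reverse=True)
    return [text for _, _, text in ranked[:k]]
```

The PICO-based variant guarantees that every class present in the input contributes to the summary, even when its best sentence scores low, while the global top-K can be dominated by a single class.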

Example Case

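Input: “How does family history affect rupture probability in intracranial aneurysms; is it a significant factor?” Summary query: family history of intracranial aneurysm.

First, query extraction;

Second, the PubMed search service: the search returned 239 studies, 130 of them prognosis studies;

Third, the 130 prognosis studies are filtered for quality by the PQR model;

Fourth, the PICO classification model: Aim (32), Patients (9), Intervention (1), Results (168), and Outcome (49);

Fifth, compute the semantic similarity;

Sixth, combine the scores and rank the sentences;

Seventh, generate the summary.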

Experiments

Dataset: BioMed_Summarizer, in the Brain_Aneurysm_Research repository. GitHub URL: https://github.com/smileslab/Brain_Aneurysm_Research/tree/master/BioMed_Summarizer [accessed 2020-10-07]

Tool: RapidMiner

PQR results

CCA results

**Proposed Semantic Similarity Algorithm (JS2E)** results

Evaluation of the extracted text

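Conclusion

This is a fairly comprehensive systems paper, essentially an application of deep learning. Its components are worth consulting when designing a medical summarization system in the future.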

Related work

ATS in the biomedical domain

Summarization falls into two categories: abstractive and extractive.

Survey article: Gambhir M, Gupta V. Recent automatic text summarization techniques: a survey. Artif Intell Rev 2016 Mar 29;47(1):1-66. [doi: 10.1007/s10462-016-9475-9]

Approaches are categorized as statistical-, topic-, graph-, discourse-, and machine learning–based.

An itemset mining–based approach extracts domain concepts to build a graph from which the summary is generated.

Nasr Azadani M, Ghadiri N, Davoodijam E. Graph-based biomedical text summarization: An itemset mining and sentence clustering approach. J Biomed Inform 2018 Aug;84:42-58 [doi: 10.1016/j.jbi.2018.06.005] [Medline: 29906584]

Moradi M. Small-world networks for summarization of biomedical articles. arXiv 2019 Mar 7:1903.02861

Quantifying the informativeness for biomedical literature summarization: An itemset mining method. Comput Methods Programs Biomed 2017 Jul;146:77-89. [doi: 10.1016/j.cmpb.2017.05.011] [Medline: 28688492]

Based on statistical features such as term frequency, sentence position, and similarity with the title

Luhn HP. The Automatic Creation of Literature Abstracts. IBM J Res Dev 1958 Apr;2(2):159-165. [doi: 10.1147/rd.22.0159]

Ferreira R, de Souza Cabral L, Freitas F, Lins RD, de França Silva G, Simske SJ, et al. A multi-document summarization system based on statistics and linguistic treatment. Exp Syst Appl 2014 Oct;41(13):5780-5787. [doi: 10.1016/j.eswa.2014.03.023]

Introducing semantic information from external sources

Ferreira R, de Souza Cabral L, Freitas F, Lins RD, de França Silva G, Simske SJ, et al. A multi-document summarization system based on statistics and linguistic treatment. Exp Syst Appl 2014 Oct;41(13):5780-5787. [doi: 10.1016/j.eswa.2014.03.023]

Lynn HM, Choi C, Kim P. An improved method of automatic text summarization for web contents using lexical chain with semantic-related terms. Soft Comput 2017 Apr 27;22(12):4013-4023. [doi: 10.1007/s00500-017-2612-9]

Identification of PICO elements in text (3 main categories)

Category 1: individual PICO element identification;

Bui DDA, Del Fiol G, Hurdle JF, Jonnalagadda S. Extractive text summarization system to aid data extraction from full text in systematic review development. J Biomed Inform 2016 Dec;64:265-272 [doi: 10.1016/j.jbi.2016.10.014] [Medline: 27989816]

Boudin F, Shi L, Nie J. Improving medical information retrieval with PICO element detection. Springer; 2010 Mar. Presented at: European Conference on Information Retrieval; March 2010; Berlin, Heidelberg. [doi: 10.1007/978-3-642-12275-0_8]

Huang K, Chiang I, Xiao F, Liao C, Liu C, Wong J. PICO element detection in medical text without metadata: are first sentences enough? J Biomed Inform 2013 Oct;46(5):940-946 [doi: 10.1016/j.jbi.2013.07.009] [Medline: 23899909]

Category 2: sentence classification;

Jin D, Szolovits P. PICO Element Detection in Medical Text via Deep Neural Networks. In: BioNLP 2018 Workshop. Association for Computational Linguistics; 2018 Jul. Presented at: Proceedings of the BioNLP 2018 workshop; July 2018; Melbourne, Australia. URL: https://www.aclweb.org/anthology/papers/W/W18/W18-2308/ [doi: 10.18653/v1/w18-2308]

Kim S, Martinez D, Cavedon L, Yencken L. Automatic classification of sentences to support Evidence Based Medicine. BMC Bioinformatics 2011;12(Suppl 2):S5. [doi: 10.1186/1471-2105-12-s2-s5]

Category 3: question and answer with summarization;

Bui DDA, Del Fiol G, Hurdle JF, Jonnalagadda S. Extractive text summarization system to aid data extraction from full text in systematic review development. J Biomed Inform 2016 Dec;64:265-272 [doi: 10.1016/j.jbi.2016.10.014] [Medline: 27989816]

Demner-Fushman D, Lin J. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Comp Ling 2007 Mar;33(1):63-103. [doi: 10.1162/coli.2007.33.1.63]

element level and sentence level

Zlabinger M, Andersson L, Hanbury A, Andersson M, Quasnik V, Brassey J. Medical entity corpus with pico elements and sentiment analysis. In: European Language Resources Association (ELRA). 2018 May. Presented at: Eleventh International Conference on Language Resources and Evaluation (LREC 2018); 2018; Miyazaki, Japan

machine learning and rule-based methods

Chabou S, Iglewski M. PICO Extraction by combining the robustness of machine-learning methods with the rule-based methods. IEEE; 2015 Jun. Presented at: 2015 World Congr Inf Technol Comput Appl WCITCA 2015 Internet IEEE; June 2015; Hammamet, Tunisia. [doi: 10.1109/wcitca.2015.7367038]

supervised distant supervision approach

Wallace B, Kuiper J, Sharma A, Zhu M, Marshall I. Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision. J Mach Learn Res 2016;17:132 [Medline: 27746703]

naïve Bayes–based classifier

Huang K, Chiang I, Xiao F, Liao C, Liu C, Wong J. PICO element detection in medical text without metadata: are first sentences enough? J Biomed Inform 2013 Oct;46(5):940-946 [doi: 10.1016/j.jbi.2013.07.009] [Medline: 23899909]

multiple supervised classification algorithms

Boudin F, Nie J, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inform Decis Mak 2010 May 15;10(1):29 [doi: 10.1186/1472-6947-10-29] [Medline: 20470429]

Quality of biomedical research

Towards automatic recognition of scientifically rigorous clinical research evidence 2009
An overview of the design and methods for retrieving high-quality studies for clinical care. 2005
Developing optimal search strategies for detecting clinically sound prognostic studies in MEDLINE: an analytic survey 2004
Text categorization models for high-quality article retrieval in internal medicine 2005
A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents 2006
MEDLINE clinical queries are robust when searching in recent publishing years 2013 Medical Subject Heading (MeSH) terms
A Deep Learning Method to Automatically Identify Reports of Scientifically Rigorous Clinical Research from the Biomedical Literature: Comparative Analytic Study 2018 Medical Subject Heading (MeSH) terms
Impact of Automatic Query Generation and Quality Recognition Using Deep Learning to Curate Evidence From Biomedical Literature: Empirical Study 2019
Comparison of the time-to-indexing in PubMed between biomedical journals according to impact factor, discipline, and focus. 2017

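Sentence scoring and ranking for summarization

[45]: the most common approach is frequency-based;

[5]: a sentence built around words from the title should receive a higher score;

[4,6,44]: keyword-based techniques;

[3,46-49]: deep learning has also been introduced;

[49]: multidocument summarization

[46]: a query-focused summarization system called AttSum (AttSum: Joint learning of focusing and summarization with neural attention)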
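For reference, the frequency-based idea noted for [45] can be sketched as follows: count how often each word occurs across the document, then score each sentence by the average count of its words. This is a simplified illustration and not the method of any cited paper; the tokenizer, the stopword list, and the function name `frequency_scores` are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def frequency_scores(sentences):
    """Score each sentence by the mean document-level frequency of its words."""
    freq = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores
```

Averaging rather than summing keeps long sentences from winning on length alone, which is one common design choice for this family of scorers.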

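Glossary

brain aneurysm: (medical) cerebral aneurysm

prognosis: the expected course of a disease, predicted from experience

scientifically sound: scientifically rigorous and reasonable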

happyprince; https://blog.csdn.net/ld326/article/details/115012807
