7.7   Summary

  • Information extraction systems search large bodies of unrestricted text for specific types of entities and relations, and use them to populate well-organized databases. These databases can then be used to find answers for specific questions.
  • The typical architecture for an information extraction system begins by segmenting, tokenizing, and part-of-speech tagging the text. The resulting data is then searched for specific types of entity. Finally, the information extraction system looks at entities that are mentioned near one another in the text, and tries to determine whether specific relationships hold between those entities.
  • Entity recognition is often performed using chunkers, which segment multi-token sequences, and label them with the appropriate entity type. Common entity types include ORGANIZATION, PERSON, LOCATION, DATE, TIME, MONEY, and GPE (geo-political entity).
  • Chunkers can be constructed using rule-based systems, such as the RegexpParser class provided by NLTK, or using machine-learning techniques, such as the ConsecutiveNPChunker presented in this chapter. In either case, part-of-speech tags are often a very important feature when searching for chunks.
  • Although chunkers are specialized to create relatively flat data structures, where no two chunks are allowed to overlap, they can be cascaded together to build nested structures.
  • Relation extraction can be performed using either rule-based systems, which typically look for specific patterns in the text that connect entities and the intervening words, or machine-learning systems, which typically attempt to learn such patterns automatically from a training corpus.
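
To make the chunking points above concrete, here is a minimal sketch of a rule-based noun-phrase chunker that works on part-of-speech tagged tokens and emits IOB labels. It is not NLTK's RegexpParser: the function name np_chunk and the hard-coded rule (an optional DT, any tags starting with JJ, then one or more tags starting with NN) are assumptions chosen for illustration, and the input is assumed to be already tokenized and tagged.

```python
def np_chunk(tagged):
    """Label each (word, tag) pair with an IOB chunk tag.

    Hypothetical hand-written rule, roughly NP: <DT>? <JJ.*>* <NN.*>+
    Chunks are flat and never overlap.
    """
    result = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Optional determiner.
        if tagged[j][1] == "DT":
            j += 1
        # Any number of adjectives.
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        # One or more nouns are required to close the chunk.
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            # Tokens i..k-1 form one chunk: first is B-NP, the rest I-NP.
            for idx in range(i, k):
                word, tag = tagged[idx]
                result.append((word, tag, "B-NP" if idx == i else "I-NP"))
            i = k
        else:
            # No noun found, so the current token is outside any chunk.
            word, tag = tagged[i]
            result.append((word, tag, "O"))
            i += 1
    return result


sentence = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
for word, tag, iob in np_chunk(sentence):
    print(word, tag, iob)
```

The part-of-speech tags are the only feature this rule consults, which is why tagging precedes chunking in the typical pipeline. A machine-learning chunker would replace the hand-written rule with a classifier trained on IOB-labeled data.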

Reposted from: https://www.cnblogs.com/yuxc/archive/2012/02/09/2344461.html