Python Natural Language Processing Study Notes (64): 7.5 Named Entity Recognition
7.5 Named Entity Recognition
At the start of this chapter, we briefly introduced named entities (NEs). Named entities are definite noun phrases that refer to specific types of individuals, such as organizations, persons, dates, and so on. Table 7.4 lists some of the more commonly used types of NEs. These should be self-explanatory, except for "Facility": human-made artifacts in the domains of architecture and civil engineering; and "GPE": geo-political entities such as city, state/province, and country.
[Table 7.4: commonly used types of named entities]
The goal of a named entity recognition (NER) system is to identify all textual mentions of the named entities. This can be broken down into two sub-tasks: identifying the boundaries of the NE, and identifying its type. While named entity recognition is frequently a prelude to identifying relations in Information Extraction, it can also contribute to other tasks. For example, in Question Answering (QA), we try to improve the precision of Information Retrieval by recovering not whole pages, but just those parts which contain an answer to the user's question. Most QA systems take the documents returned by standard Information Retrieval, and then attempt to isolate the minimal text snippet in the document containing the answer. Now suppose the question was "Who was the first President of the US?", and one of the documents that was retrieved contained the following passage:
(5) The Washington Monument is the most prominent structure in Washington, D.C. and one of the city's early attractions. It was built in honor of George Washington, who led the country to independence and then became its first President.
Analysis of the question leads us to expect that an answer should be of the form "X was the first President of the US", where X is not only a noun phrase, but also refers to a named entity of type PER. This should allow us to ignore the first sentence in the passage. While it contains two occurrences of Washington, named entity recognition should tell us that neither of them has the correct type.
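The filtering step described above can be sketched in a few lines. This is only an illustration: the entity labels below are assigned by hand (an assumption, not the output of any NER system), and candidate_answers is a hypothetical helper name.

```python
# Hand-labelled entity mentions for passage (5). The labels are illustrative
# assumptions, not the output of a real NER system.
passage = [
    (
        "The Washington Monument is the most prominent structure in "
        "Washington, D.C. and one of the city's early attractions.",
        [("Washington Monument", "FACILITY"), ("Washington, D.C.", "GPE")],
    ),
    (
        "It was built in honor of George Washington, who led the country "
        "to independence and then became its first President.",
        [("George Washington", "PER")],
    ),
]


def candidate_answers(annotated_sentences, expected_type):
    """Return every entity mention whose type matches the expected answer type."""
    return [
        mention
        for _sentence, mentions in annotated_sentences
        for mention, entity_type in mentions
        if entity_type == expected_type
    ]


print(candidate_answers(passage, "PER"))  # ['George Washington']
```

Because neither mention in the first sentence has type PER, that sentence contributes no candidates.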
How do we go about identifying named entities? One option would be to look up each word in an appropriate list of names. For example, in the case of locations, we could use a gazetteer, or geographical dictionary, such as the Alexandria Gazetteer or the Getty Gazetteer. However, doing this blindly runs into problems, as shown in Figure 7.12.
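To see why blind lookup is error-prone, here is a minimal sketch using a one-entry gazetteer. Both the GAZETTEER set and the naive_location_lookup function are hypothetical and exist only for illustration; a real gazetteer would be far larger.

```python
# A tiny, hypothetical gazetteer; real ones hold far more place names.
GAZETTEER = {"washington"}


def naive_location_lookup(tokens, gazetteer):
    """Return (index, token) for every token whose lowercase form is in the gazetteer."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in gazetteer]


first = "The Washington Monument is the most prominent structure in Washington".split()
second = "It was built in honor of George Washington".split()

print(naive_location_lookup(first, GAZETTEER))   # [(1, 'Washington'), (9, 'Washington')]
print(naive_location_lookup(second, GAZETTEER))  # [(7, 'Washington')]
```

The lookup marks all three occurrences as locations, although one belongs to the name of a monument and another to the name of a person.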
Eddy N B-PER
Bonte N I-PER
is V O
woordvoerder N O
van Prep O
diezelfde Pron O
Hogeschool N B-ORG
. Punc O
In this representation, there is one token per line, each with its part-of-speech tag and its named entity tag. Based on this training corpus, we can construct a tagger that can be used to label new sentences, and use the nltk.chunk.conlltags2tree() function to convert the tag sequences into a chunk tree.
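As a small sketch of that conversion, the sentence above can be written as a list of (word, part-of-speech tag, named entity tag) triples and passed to nltk.chunk.conlltags2tree(). The variable names are my own, and the example assumes NLTK is installed; no corpus download is needed for this step.

```python
from nltk.chunk import conlltags2tree

# The example sentence as (word, POS tag, IOB named entity tag) triples.
iob_sentence = [
    ("Eddy", "N", "B-PER"),
    ("Bonte", "N", "I-PER"),
    ("is", "V", "O"),
    ("woordvoerder", "N", "O"),
    ("van", "Prep", "O"),
    ("diezelfde", "Pron", "O"),
    ("Hogeschool", "N", "B-ORG"),
    (".", "Punc", "O"),
]

# B-/I- tags are grouped into labelled subtrees; tokens tagged O stay
# directly under the root node.
tree = conlltags2tree(iob_sentence)
print(tree)
```

The resulting tree has a PER subtree spanning "Eddy Bonte" and an ORG subtree for "Hogeschool".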
NLTK provides a classifier that has already been trained to recognize named entities, accessed with the function nltk.ne_chunk(). If we set the parameter binary=True, then named entities are just tagged as NE; otherwise, the classifier adds category labels such as PERSON, ORGANIZATION, and GPE.
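The sketch below shows one way this might be used. It assumes NLTK is installed and that the pretrained chunker model has been fetched with nltk.download() (the exact resource names vary between NLTK versions). The helpers chunk_entities and entity_spans are hypothetical names introduced here, not part of NLTK.

```python
import nltk
from nltk import Tree


def chunk_entities(tagged_sentence, binary=False):
    """Run NLTK's pretrained NE chunker on a list of (word, POS tag) pairs.

    Requires the chunker model to be downloaded first (assumption: via
    nltk.download(); resource names depend on the NLTK version).
    """
    return nltk.ne_chunk(tagged_sentence, binary=binary)


def entity_spans(tree):
    """Collect (label, text) for every labelled chunk directly under the root."""
    spans = []
    for node in tree:
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            spans.append((node.label(), text))
    return spans


# Hand-built tree in the shape ne_chunk returns, so that the helper can be
# tried without downloading the model.
example = Tree("S", [
    Tree("PERSON", [("George", "NNP"), ("Washington", "NNP")]),
    ("led", "VBD"),
    ("the", "DT"),
    ("country", "NN"),
])
print(entity_spans(example))  # [('PERSON', 'George Washington')]
```

With binary=True, every chunk in the returned tree would carry the label NE instead of a category such as PERSON.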