[Java] Processing English text with Stanford CoreNLP (POS tagging, lemmatization, parsing, etc.)

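This post is a walkthrough for learning how to use Stanford CoreNLP for natural language processing.

Environment: 64-bit Windows 7, NetBeans, Java 1.8 or later.

CoreNLP version: 3.6.0. Download stanford-corenlp-full-2015-12-09.zip from http://stanfordnlp.github.io/CoreNLP/.

Stanford CoreNLP features: tokenization (tokenize), sentence splitting (ssplit), part-of-speech tagging (pos), lemmatization (lemma, not available for Chinese), named entity recognition (ner), syntactic parsing (parse), sentiment analysis (sentiment), coreference resolution, and more.

Supported languages: Chinese, English, French, German, Spanish, Arabic, and others.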

Usage:

1. Create a new project in NetBeans.

2. Unzip stanford-corenlp-full-2015-12-09.zip and add the following jars to the project's libraries:

slf4j-api.jar
slf4j-simple.jar
stanford-corenlp-3.6.0.jar
stanford-corenlp-3.6.0-javadoc.jar
stanford-corenlp-3.6.0-models.jar
stanford-corenlp-3.6.0-sources.jar
xom.jar
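Before running anything, it can help to confirm that the jars are actually visible to the project. The following is a minimal sketch, not part of the original post: the class name ClasspathCheck and the helper isOnClasspath are hypothetical, and the three class names are assumed to be representative classes from stanford-corenlp-3.6.0.jar, slf4j-api.jar and xom.jar respectively. The models jar ships model files, which this check does not cover.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // on the current classpath, false otherwise.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed representative classes, one per library jar
        String[] required = {
            "edu.stanford.nlp.pipeline.StanfordCoreNLP", // stanford-corenlp-3.6.0.jar
            "org.slf4j.Logger",                          // slf4j-api.jar
            "nu.xom.Document"                            // xom.jar
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If any line reports MISSING, the corresponding jar has probably not been added to the project's libraries.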

3. Create a class with the following code:

package corenlp;

/**
 * Purpose: practice using CoreNLP on English text.
 * Date: 2016-04-22 14:03:42
 */
import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
// import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class CoreNLP {

    public static void main(String[] args) {
        /*
         * Create a StanfordCoreNLP object with these annotators:
         * tokenize (tokenization), ssplit (sentence splitting), pos (POS tagging),
         * lemma (lemmatization), ner (named entity recognition),
         * parse (syntactic parsing), dcoref (coreference resolution).
         */
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); // seven annotators
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props); // annotators run in the listed order

        String text = "This is a test.";            // input text
        Annotation document = new Annotation(text); // wrap the text in an Annotation with no results yet
        pipeline.annotate(document);                // run all seven annotators on the text

        // sentences now holds all analysis results; iterate over it to read them.
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        System.out.println("word\tpos\tlemma\tner");
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                String word = token.get(TextAnnotation.class);          // token text
                String pos = token.get(PartOfSpeechAnnotation.class);   // POS tag
                String ne = token.get(NamedEntityTagAnnotation.class);  // named entity label
                String lemma = token.get(LemmaAnnotation.class);        // lemma
                System.out.println(word + "\t" + pos + "\t" + lemma + "\t" + ne);
            }

            // Parse tree of the sentence
            Tree tree = sentence.get(TreeAnnotation.class);
            System.out.println(tree.toString());

            // Dependency graph of the sentence
            SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(dependencies);
        }

        // Coreference chains for the whole document (retrieved but not printed here)
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    }
}

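Explanation: the code passes the text string to Stanford CoreNLP, and each StanfordCoreNLP component (annotator) processes it in turn.

After processing, sentences contains all of the analysis results, which can be read by iterating over it.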
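Because the annotator list is just a comma-separated property value, a lighter pipeline can be configured by naming fewer annotators. The sketch below builds such a Properties object using only java.util, so it runs without the CoreNLP jars; the class name AnnotatorConfig and the helper forAnnotators are hypothetical names, not part of CoreNLP. The result would be passed to new StanfordCoreNLP(props) in the same way as in the listing in step 3.

```java
import java.util.Properties;

public class AnnotatorConfig {

    // Hypothetical helper: joins annotator names into the comma-separated
    // value stored under the "annotators" property key.
    static Properties forAnnotators(String... annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", String.join(", ", annotators));
        return props;
    }

    public static void main(String[] args) {
        // Only the annotators needed for tokens, POS tags and lemmas;
        // ner, parse and dcoref are left out.
        Properties props = forAnnotators("tokenize", "ssplit", "pos", "lemma");
        System.out.println(props.getProperty("annotators")); // prints "tokenize, ssplit, pos, lemma"
    }
}
```

With a reduced list like this, only the annotations produced by the named annotators are available when iterating over the sentences, so calls that read NER labels or parse trees would need to be dropped from the loop.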

4. Output:
