Stanford CoreNLP: Penn Chinese Treebank phrase categories (23 tags)
Phrase tags (17)
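| Tag | Description | Notes |
|---|---|---|
| ADJP | Adjective phrase | Adjective phrase projected from JJ |
| ADVP | Adverbial phrase headed by AD | Adverb phrase headed by an adverb (AD); functions as an adverbial |
| CLP | Classifier phrase | Classifier (measure word) phrase |
| CP | Clause headed by C (complementizer) | Clause headed by a complementizer; covers complement clauses and relative clauses |
| DNP | Phrase formed by "XP + DEG" | Phrase built from XP + DEG (associative 的) |
| DP | Determiner phrase | Determiner phrase |
| DVP | Phrase formed by "XP + DEV" | Phrase built from XP + DEV (地) |
| FRAG | Fragment | Fragment |
| IP | Inflectional phrase | Simple clause headed by I (INFL or another inflectional element) |
| LCP | Phrase formed by "XP + LC" | Phrase headed by a localizer (LC) |
| LST | List marker | Marker phrase for items in an explanatory list |
| NP | Noun phrase | Noun phrase |
| PP | Prepositional phrase | Prepositional phrase |
| PRN | Parenthetical | Parenthetical |
| QP | Quantifier phrase | Numeral phrase: a structure built from numerals and measure words |
| UCP | Unidentical coordination phrase | Coordination of constituents of unlike categories |
| VP | Verb phrase | Verb phrase |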
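To see where these tags appear in parser output, here is a minimal Java sketch (not part of CoreNLP) that scans a bracketed, Penn-style tree string and lists the labels of phrase-level nodes, i.e. nodes whose first child is itself bracketed, so preterminals such as (NN 经济) are skipped. The class `PhraseLabelScanner`, its methods, and the sample tree used in testing are illustrative assumptions; real code would normally walk CoreNLP's `Tree` objects rather than raw strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: list phrase-level labels in a bracketed tree string. */
final class PhraseLabelScanner {

    /** The 17 CTB phrase tags from the table above. */
    static final Set<String> CTB_PHRASE_TAGS = Set.of(
            "ADJP", "ADVP", "CLP", "CP", "DNP", "DP", "DVP", "FRAG", "IP",
            "LCP", "LST", "NP", "PP", "PRN", "QP", "UCP", "VP");

    /** Returns phrase labels in the order their opening brackets appear. */
    static List<String> phraseLabels(String tree) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < tree.length(); i++) {
            if (tree.charAt(i) != '(') {
                continue; // words and closing brackets are skipped
            }
            // read the label that directly follows the opening bracket
            int j = i + 1;
            while (j < tree.length() && !isDelimiter(tree.charAt(j))) {
                j++;
            }
            String label = tree.substring(i + 1, j);
            // a node is phrase-level if its first child starts with another bracket
            int k = j;
            while (k < tree.length() && Character.isWhitespace(tree.charAt(k))) {
                k++;
            }
            if (!label.isEmpty() && k < tree.length() && tree.charAt(k) == '(') {
                labels.add(label);
            }
        }
        return labels;
    }

    static boolean isCtbPhraseTag(String label) {
        return CTB_PHRASE_TAGS.contains(label);
    }

    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '(' || c == ')';
    }
}
```

For a tree such as (ROOT (IP (NP (NR 中国)) (VP (VV 发展) (NP (NN 经济))))), `phraseLabels` returns ROOT, IP, NP, VP, NP. ROOT is the wrapper node the parser puts on top of every tree, which is why it is not one of the 17 phrase tags.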
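Verb compound tags (6)
VCD: coordinated verb compound, e.g. (VCD (VV 投资) (VV 办厂)) ("invest" + "set up a factory")
VCP: VV + VC, i.e. a verb followed by the copula 是
VNV: A-not-A or A-one-A form (A不A, A一A), e.g. (VNV (VV 能) (AD 不) (VV 能))
VPT: potential form V得R or V不R, e.g. (VPT (VV 得) (AD 不) (VV 到))
VRD: verb-resultative compound, where the second element expresses the result of the first, e.g. (VRD (VV 呈现) (VV 出)); (VP (VRD (VV 联合) (VV 起来)))
VSB: modifier + head compound; the first element is an intransitive verb, and no adjunct or aspect marker intervenes between the two elements, e.g. (VSB (VV 加速) (VV 建设)); (VP (VSB (VV 仰头) (VV 望去)))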
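The VNV and VPT definitions above are stated as surface shapes (A不A / A一A and V得R / V不R). A rough Java heuristic for those shapes might look like the sketch below. The class and method names are hypothetical, it inspects characters rather than a parse, `looksLikeVpt` only handles a single-character V and R, and every character is assumed to lie in the Basic Multilingual Plane. The Chinese characters are written as Unicode escapes so the file compiles under any source encoding.

```java
/** Hypothetical surface-shape checks for two CTB verb-compound types (not CoreNLP code). */
final class VerbCompoundPattern {
    private static final char BU = '\u4E0D'; // "not" (bu)
    private static final char YI = '\u4E00'; // "one" (yi)
    private static final char DE = '\u5F97'; // potential marker (de)

    /** A-not-A / A-one-A: odd length, middle char is bu or yi, and both halves are equal. */
    static boolean looksLikeVnv(String s) {
        if (s.length() < 3 || s.length() % 2 == 0) {
            return false;
        }
        int mid = s.length() / 2;
        char m = s.charAt(mid);
        return (m == BU || m == YI) && s.substring(0, mid).equals(s.substring(mid + 1));
    }

    /** V-de-R / V-bu-R, restricted to a single-character V and a single-character R. */
    static boolean looksLikeVpt(String s) {
        if (s.length() != 3) {
            return false;
        }
        char m = s.charAt(1);
        // "A bu A" has the VNV shape, so it is excluded here
        return (m == DE || m == BU) && !looksLikeVnv(s);
    }
}
```

With this sketch, 能不能, 看一看 and 喜欢不喜欢 pass `looksLikeVnv`, while 得不到 (the VPT example above) and 看得见 pass `looksLikeVpt`. A surface check like this cannot tell a real compound from an accidental character sequence; in CTB the decision is made by the annotation.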
NP
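A phrase whose head is a noun. Grammatically, the term is used in two senses: (1) a phrase defined by its syntactic role, i.e. a chunk that serves as the subject, object, etc. of the sentence, which can carry an additional function tag such as NP-SBJ or NP-OBJ; (2) an entity or attribute in a knowledge base; such chunks are called baseNPs.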
VP
A semantic chunk centered on a verb, formed together with the elements that modify, restrict, or are coordinated with it.
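Sense (1) of NP relies on function tags appended to the category, as in NP-SBJ or NP-OBJ, so code that processes treebank labels usually has to separate the basic category from those tags. Below is a hypothetical Java helper (not a CoreNLP API), written under three assumptions: tags are introduced by '-' or '=', purely numeric parts are coindexation indices rather than function tags, and labels that begin with '-' (such as -NONE- for empty elements) are kept whole.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: split a treebank label into basic category and function tags. */
final class FunctionTagSplitter {

    /** "NP-SBJ" gives "NP"; "-NONE-" is returned unchanged. */
    static String basicCategory(String label) {
        if (label.startsWith("-")) {
            return label; // empty-element tags such as -NONE- are kept whole
        }
        int cut = firstAnnotationIndex(label);
        return cut < 0 ? label : label.substring(0, cut);
    }

    /** "NP-SBJ-1" gives [SBJ]; a plain "NP" gives an empty list. */
    static List<String> functionTags(String label) {
        List<String> tags = new ArrayList<>();
        int cut = label.startsWith("-") ? -1 : firstAnnotationIndex(label);
        if (cut < 0) {
            return tags;
        }
        for (String part : label.substring(cut + 1).split("[-=]")) {
            // skip empty pieces and numeric coindexation indices such as the 1 in NP-SBJ-1
            if (!part.isEmpty() && !part.chars().allMatch(Character::isDigit)) {
                tags.add(part);
            }
        }
        return tags;
    }

    private static int firstAnnotationIndex(String label) {
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '-' || c == '=') {
                return i;
            }
        }
        return -1;
    }
}
```

Grouping nodes by `basicCategory` lets NP-SBJ and NP-OBJ both count as NP, while `functionTags` recovers the SBJ/OBJ role that sense (1) refers to.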
Head rules in the CoreNLP source code (ChineseHeadFinder)
// Excerpt from the constructor of edu.stanford.nlp.trees.international.pennchinese.ChineseHeadFinder.
// left, right, rightdis, leftExceptPunct and rightExceptPunct are defined earlier in that class.
nonTerminalInfo.put("ROOT", new String[][]{{left, "IP"}});
nonTerminalInfo.put("PAIR", new String[][]{{left, "IP"}});

// Major syntactic categories
nonTerminalInfo.put("ADJP", new String[][]{{left, "JJ", "ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
nonTerminalInfo.put("ADVP", new String[][]{{left, "AD", "CS", "ADVP", "JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
nonTerminalInfo.put("CLP", new String[][]{{right, "M", "CLP"}});
// nonTerminalInfo.put("CP", new String[][]{{left, "WHNP", "IP", "CP", "VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "WHNP", "WHPP"}, rightExceptPunct}); // the (syntax-oriented) right-first head rule
// nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "ADVP", "CP", "IP", "VP", "M"}}); // the (syntax-oriented) right-first head rule
nonTerminalInfo.put("DNP", new String[][]{{right, "DEG", "DEC"}, rightExceptPunct}); // according to tgrep2, first preparation, all DNPs have a DEG daughter
nonTerminalInfo.put("DP", new String[][]{{left, "DT", "DP"}}); // there's one instance of DP adjunction
nonTerminalInfo.put("DVP", new String[][]{{right, "DEV", "DEC"}}); // DVP always has DEV under it
nonTerminalInfo.put("FRAG", new String[][]{{right, "VV", "NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});
nonTerminalInfo.put("IP", new String[][]{{left, "VP", "IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
nonTerminalInfo.put("LCP", new String[][]{{right, "LC", "LCP"}}); // there's a bit of LCP adjunction
nonTerminalInfo.put("LST", new String[][]{{right, "CD", "PU"}}); // covers all examples
nonTerminalInfo.put("NP", new String[][]{{right, "NN", "NR", "NT", "NP", "PN", "CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun. Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites. Finally, note that this doesn't give any special treatment of coordination.
nonTerminalInfo.put("PP", new String[][]{{left, "P", "PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
// cdm 2006: PRN changed to not choose punctuation. Helped parsing (if not significantly)
// nonTerminalInfo.put("PRN", new String[][]{{left, "PU"}}); // presumably left/right doesn't matter
nonTerminalInfo.put("PRN", new String[][]{{left, "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"}, {rightdis, "NN", "NR", "NT", "FW"}});
// cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
nonTerminalInfo.put("QP", new String[][]{{right, "QP", "CLP", "CD", "OD", "NP", "NT", "M"}}); // there's some QP adjunction
// add OD?
nonTerminalInfo.put("UCP", new String[][]{{left, }}); // an alternative would be "PU", "CC"
nonTerminalInfo.put("VP", new String[][]{{left, "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE", "IP", "VSB", "VCP", "VRD", "VNV"}, leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
// add BA, LB, as needed

// verb compounds
nonTerminalInfo.put("VCD", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // could easily be right instead
nonTerminalInfo.put("VCP", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // not much info from documentation
nonTerminalInfo.put("VRD", new String[][]{{left, "VCD", "VRD", "VV", "VA", "VC", "VE"}}); // definitely left
nonTerminalInfo.put("VSB", new String[][]{{right, "VCD", "VSB", "VV", "VA", "VC", "VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
nonTerminalInfo.put("VNV", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // left/right doesn't matter
nonTerminalInfo.put("VPT", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // activity verb is to the left

// some POS tags apparently sit where phrases are supposed to be
nonTerminalInfo.put("CD", new String[][]{{right, "CD"}});
nonTerminalInfo.put("NN", new String[][]{{right, "NN"}});
nonTerminalInfo.put("NR", new String[][]{{right, "NR"}});

// I'm adding these POS tags to do primitive morphology for character-level
// parsing. It shouldn't affect anything else because heads of preterminals are not
// generally queried - GMA
nonTerminalInfo.put("VV", new String[][]{{left}});
nonTerminalInfo.put("VA", new String[][]{{left}});
nonTerminalInfo.put("VC", new String[][]{{left}});
nonTerminalInfo.put("VE", new String[][]{{left}});

// new for ctb6.
nonTerminalInfo.put("FLR", new String[][]{rightExceptPunct});

// new for CTB9
nonTerminalInfo.put("DFL", new String[][]{rightExceptPunct});
nonTerminalInfo.put("EMO", new String[][]{leftExceptPunct}); // left/right doesn't matter
nonTerminalInfo.put("INC", new String[][]{leftExceptPunct});
nonTerminalInfo.put("INTJ", new String[][]{leftExceptPunct});
nonTerminalInfo.put("OTH", new String[][]{leftExceptPunct});
nonTerminalInfo.put("SKIP", new String[][]{leftExceptPunct});
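Each entry above maps a parent category to one or more rules: the first string of a rule gives a search direction and the remaining strings are child categories in priority order. The Java sketch below re-implements that idea in simplified form, for example to show how the NP rule picks the rightmost NN. It follows the direction names used by CoreNLP's head finders as we understand them ("left"/"right" try each listed category in turn across all children, the "dis" variants take the first child in scan order that matches any listed category, the "except" variants take the first child not in the list), and it assumes that `rightExceptPunct` stands for {"rightexcept", "PU"}. It is not CoreNLP's implementation: `HeadRuleSketch` is a hypothetical class, it compares plain labels without stripping function tags, and if no rule matches it simply falls back to the leftmost or rightmost child according to the last rule's direction.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical re-implementation of Collins-style head rules (not CoreNLP code). */
final class HeadRuleSketch {

    /**
     * @param rules    rules tried in order; rules[r][0] is the direction and the rest are categories
     * @param children child labels, left to right
     * @return index of the chosen head child
     */
    static int findHead(String[][] rules, List<String> children) {
        for (String[] rule : rules) {
            int idx = locate(rule, children);
            if (idx >= 0) {
                return idx;
            }
        }
        // last resort: leftmost or rightmost child, following the last rule's direction
        String lastDir = rules[rules.length - 1][0];
        return lastDir.startsWith("left") ? 0 : children.size() - 1;
    }

    private static int locate(String[] rule, List<String> children) {
        String dir = rule[0];
        List<String> cats = Arrays.asList(rule).subList(1, rule.length);
        int n = children.size();
        boolean fromLeft = dir.startsWith("left");
        switch (dir) {
            case "left":
            case "right":
                // category priority first: for each listed category, scan all children
                for (String cat : cats) {
                    for (int k = 0; k < n; k++) {
                        int i = fromLeft ? k : n - 1 - k;
                        if (children.get(i).equals(cat)) {
                            return i;
                        }
                    }
                }
                return -1;
            case "leftdis":
            case "rightdis":
                // position first: first child in scan order matching any listed category
                for (int k = 0; k < n; k++) {
                    int i = fromLeft ? k : n - 1 - k;
                    if (cats.contains(children.get(i))) {
                        return i;
                    }
                }
                return -1;
            case "leftexcept":
            case "rightexcept":
                // first child in scan order that is NOT one of the listed categories
                for (int k = 0; k < n; k++) {
                    int i = fromLeft ? k : n - 1 - k;
                    if (!cats.contains(children.get(i))) {
                        return i;
                    }
                }
                return -1;
            default:
                throw new IllegalArgumentException("unknown direction: " + dir);
        }
    }
}
```

With the NP rule {right, NN, NR, NT, NP, PN, CP}, children [ADJP, NN, NN] give index 2, the rightmost NN. With the IP rule {left, VP, IP} followed by {rightexcept, PU}, children [NP, VP, PU] give index 1 (the VP), and [NP, ADVP, PU] fall through to the second rule and give index 1 (the ADVP), skipping the final punctuation.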