NLPIR/ICTCLAS is a Chinese word segmentation system (http://ictclas.nlpir.org).
PyNLPIR is a Python wrapper for that system (http://pynlpir.readthedocs.io...).

Installation steps:
① pip install pynlpir
② pynlpir update

Chinese word segmentation example from the official documentation:

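import pynlpir

# Initialize the NLPIR engine before calling any other function.
pynlpir.open()

# Named "text" rather than "str" so the built-in str type is not shadowed.
text = '欢迎科研人员、技术工程师、企事业单位与个人参与 NLPIR 平台的建设工作。'

# segment() returns a list of (word, part-of-speech) tuples.
result = pynlpir.segment(text)
print(result)
# output: [('欢迎', 'verb'), ('科研', 'noun'), ('人员', 'noun'), ('、', 'punctuation mark'), ('技术', 'noun'), ('工程师', 'noun'), ('、', 'punctuation mark'), ('企事业', 'noun'), ('单位', 'noun'), ('与', 'conjunction'), ('个人', 'noun'), ('参与', 'verb'), ('NLPIR', 'noun'), ('平台', 'noun'), ('的', 'particle'), ('建设', 'verb'), ('工作', 'verb'), ('。', 'punctuation mark')]

# Release the resources held by NLPIR when finished.
pynlpir.close()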

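Problems you may run into:
① raise RuntimeError("NLPIR function 'NLPIR_Init' failed.")

Solution:
Go to the https://github.com/NLPIR-team... repository,
download a license file, for example the NLPIR.user file under the NLPIR-ICTCLAS word segmentation system license,
and use it to replace the file of the same name in path_to_local_python/Lib/site-packages/pynlpir/Data, which updates the license.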
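The replacement step can also be scripted. The following is only a minimal sketch, assuming NLPIR.user has already been downloaded by hand from the repository mentioned above. The helper names find_pynlpir_data_dir and install_license are hypothetical and are not part of PyNLPIR. The old license is backed up before it is overwritten.

```python
import importlib.util
import shutil
from pathlib import Path


def find_pynlpir_data_dir():
    """Locate pynlpir's Data directory without importing the package."""
    spec = importlib.util.find_spec("pynlpir")
    if spec is None or spec.origin is None:
        raise RuntimeError("pynlpir is not installed in this environment")
    # spec.origin points at pynlpir/__init__.py; Data sits next to it.
    return Path(spec.origin).parent / "Data"


def install_license(new_license, data_dir):
    """Copy a downloaded NLPIR.user into data_dir, keeping a backup.

    Returns the path of the installed license file.
    """
    new_license = Path(new_license)
    data_dir = Path(data_dir)
    if not new_license.is_file():
        raise FileNotFoundError(new_license)
    target = data_dir / "NLPIR.user"
    if target.exists():
        # Keep the old license next to the new one in case a rollback is needed.
        shutil.copy2(target, data_dir / "NLPIR.user.bak")
    shutil.copy2(new_license, target)
    return target
```

A call might look like install_license("Downloads/NLPIR.user", find_pynlpir_data_dir()), where the download path is a placeholder. After that, pynlpir.open() can be tried again.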

Chinese stop word list:

["啊","阿","哎","哎呀","哎哟","唉","俺","俺们","按","按照","吧","吧哒","把","罢了","被","本","本着","比","比方","比如","鄙人","彼","彼此","边","别","别的","别说","并","并且","不比","不成","不单","不但","不独","不管","不光","不过","不仅","不拘","不论","不怕","不然","不如","不特","不惟","不问","不只","朝","朝着","趁","趁着","乘","冲","除","除此之外","除非","除了","此","此间","此外","从","从而","打","待","但","但是","当","当着","到","得","的","的话","等","等等","地","第","叮咚","对","对于","多","多少","而","而况","而且","而是","而外","而言","而已","尔后","反过来","反过来说","反之","非但","非徒","否则","嘎","嘎登","该","赶","个","各","各个","各位","各种","各自","给","根据","跟","故","故此","固然","关于","管","归","果然","果真","过","哈","哈哈","呵","和","何","何处","何况","何时","嘿","哼","哼唷","呼哧","乎","哗","还是","还有","换句话说","换言之","或","或是","或者","极了","及","及其","及至","即","即便","即或","即令","即若","即使","几","几时","己","既","既然","既是","继而","加之","假如","假若","假使","鉴于","将","较","较之","叫","接着","结果","借","紧接着","进而","尽","尽管","经","经过","就","就是","就是说","据","具体地说","具体说来","开始","开外","靠","咳","可","可见","可是","可以","况且","啦","来","来着","离","例如","哩","连","连同","两者","了","临","另","另外","另一方面","论","嘛","吗","慢说","漫说","冒","么","每","每当","们","莫若","某","某个","某些","拿","哪","哪边","哪儿","哪个","哪里","哪年","哪怕","哪天","哪些","哪样","那","那边","那儿","那个","那会儿","那里","那么","那么些","那么样","那时","那些","那样","乃","乃至","呢","能","你","你们","您","宁","宁可","宁肯","宁愿","哦","呕","啪达","旁人","呸","凭","凭借","其","其次","其二","其他","其它","其一","其余","其中","起","起见","岂但","恰恰相反","前后","前者","且","然而","然后","然则","让","人家","任","任何","任凭","如","如此","如果","如何","如其","如若","如上所述","若","若非","若是","啥","上下","尚且","设若","设使","甚而","甚么","甚至","省得","时候","什么","什么样","使得","是","是的","首先","谁","谁知","顺","顺着","似的","虽","虽然","虽说","虽则","随","随着","所","所以","他","他们","他人","它","它们","她","她们","倘","倘或","倘然","倘若","倘使","腾","替","通过","同","同时","哇","万一","往","望","为","为何","为了","为什么","为着","喂","嗡嗡","我","我们","呜","呜呼","乌乎","无论","无宁","毋宁","嘻","吓","相对而言","像","向","向着","嘘","呀","焉","沿","沿着","要","要不","要不然","要不是","要么","要是","也","也罢","也好","一","一般","一旦","一方面","一来","一切","一样","一则","依","依照","矣","以","以便","以及","以免","以至","以至于","以致","抑或","因","因此","因而","因为","哟","用","由","由此可见","由于","有","有的","有关","有些","又","于","于是","于是乎","与","与此同时","与否","与其","越是","云云","哉","再说","再者","在","在下","咱","
咱们","则","怎","怎么","怎么办","怎么样","怎样","咋","照","照着","者","这","这边","这儿","这个","这会儿","这就是说","这里","这么","这么点儿","这么些","这么样","这时","这些","这样","正如","吱","之","之类","之所以","之一","只是","只限","只要","只有","至","至于","诸位","着","着呢","自","自从","自个儿","自各儿","自己","自家","自身","综上所述","总的来看","总的来说","总的说来","总而言之","总之","纵","纵令","纵然","纵使","遵照","作为","兮","呃","呗","咚","咦","喏","啐","喔唷","嗬","嗯","嗳","啊哈","啊呀","啊哟","挨次","挨个","挨家挨户","挨门挨户","挨门逐户","挨着","按理","按期","按时","按说","暗地里","暗中","暗自","昂然","八成","白白","半","梆","保管","保险","饱","背地里","背靠背","倍感","倍加","本人","本身","甭","比起","比如说","比照","毕竟","必","必定","必将","必须","便","别人","并非","并肩","并没","并没有","并排","并无","勃然","不","不必","不常","不大","不得","不得不","不得了","不得已","不迭","不定","不对","不妨","不管怎样","不会","不仅仅","不仅仅是","不经意","不可开交","不可抗拒","不力","不了","不料","不满","不免","不能不","不起","不巧","不然的话","不日","不少","不胜","不时","不是","不同","不能","不要","不外","不外乎","不下","不限","不消","不已","不亦乐乎","不由得","不再","不择手段","不怎么","不曾","不知不觉","不止","不止一次","不至于","才","才能","策略地","差不多","差一点","常","常常","常言道","常言说","常言说得好","长此下去","长话短说","长期以来","长线","敞开儿","彻夜","陈年","趁便","趁机","趁热","趁势","趁早","成年","成年累月","成心","乘机","乘胜","乘势","乘隙","乘虚","诚然","迟早","充分","充其极","充其量","抽冷子","臭","初","出","出来","出去","除此","除此而外","除此以外","除开","除去","除却","除外","处处","川流不息","传","传说","传闻","串行","纯","纯粹","此后","此中","次第","匆匆","从不","从此","从此以后","从古到今","从古至今","从今以后","从宽","从来","从轻","从速","从头","从未","从无到有","从小","从新","从严","从优","从早到晚","从中","从重","凑巧","粗","存心","达旦","打从","打开天窗说亮话","大","大不了","大大","大抵","大都","大多","大凡","大概","大家","大举","大略","大面儿上","大事","大体","大体上","大约","大张旗鼓","大致","呆呆地","带","殆","待到","单","单纯","单单","但愿","弹指之间","当场","当儿","当即","当口儿","当然","当庭","当头","当下","当真","当中","倒不如","倒不如说","倒是","到处","到底","到了儿","到目前为止","到头","到头来","得起","得天独厚","的确","等到","叮当","顶多","定","动不动","动辄","陡然","都","独","独自","断然","顿时","多次","多多","多多少少","多多益善","多亏","多年来","多年前","而后","而论","而又","尔等","二话不说","二话没说","反倒","反倒是","反而","反手","反之亦然","反之则","方","方才","方能","放量","非常","非得","分期","分期分批","分头","奋勇","愤然","风雨无阻","逢","弗","甫","嘎嘎","该当","概","赶快","赶早不赶晚","敢","敢情","敢于","刚","刚才","刚好","刚巧","高低","格外","隔日","隔夜","个人","各式","更","更加","更进一步","更为","公然","共","共总","够瞧的","姑且","古来","故而","故意",
"固","怪","怪不得","惯常","光","光是","归根到底","归根结底","过于","毫不","毫无","毫无保留地","毫无例外","好在","何必","何尝","何妨","何苦","何乐而不为","何须","何止","很","很多","很少","轰然","后来","呼啦","忽地","忽然","互","互相","哗啦","话说","还","恍然","会","豁然","活","伙同","或多或少","或许","基本","基本上","基于","极","极大","极度","极端","极力","极其","极为","急匆匆","即将","即刻","即是说","几度","几番","几乎","几经","既...又","继之","加上","加以","间或","简而言之","简言之","简直","见","将才","将近","将要","交口","较比","较为","接连不断","接下来","皆可","截然","截至","藉以","借此","借以","届时","仅","仅仅","谨","进来","进去","近","近几年来","近来","近年来","尽管如此","尽可能","尽快","尽量","尽然","尽如人意","尽心竭力","尽心尽力","尽早","精光","经常","竟","竟然","究竟","就此","就地","就算","居然","局外","举凡","据称","据此","据实","据说","据我所知","据悉","具体来说","决不","决非","绝","绝不","绝顶","绝对","绝非","均","喀","看","看来","看起来","看上去","看样子","可好","可能","恐怕","快","快要","来不及","来得及","来讲","来看","拦腰","牢牢","老","老大","老老实实","老是","累次","累年","理当","理该","理应","历","立","立地","立刻","立马","立时","联袂","连连","连日","连日来","连声","连袂","临到","另方面","另行","另一个","路经","屡","屡次","屡次三番","屡屡","缕缕","率尔","率然","略","略加","略微","略为","论说","马上","蛮","满","没","没有","每逢","每每","每时每刻","猛然","猛然间","莫","莫不","莫非","莫如","默默地","默然","呐","那末","奈","难道","难得","难怪","难说","内","年复一年","凝神","偶而","偶尔","怕","砰","碰巧","譬如","偏偏","乒","平素","颇","迫于","扑通","其后","其实","奇","齐","起初","起来","起首","起头","起先","岂","岂非","岂止","迄","恰逢","恰好","恰恰","恰巧","恰如","恰似","千","万","千万","千万千万","切","切不可","切莫","切切","切勿","窃","亲口","亲身","亲手","亲眼","亲自","顷","顷刻","顷刻间","顷刻之间","请勿","穷年累月","取道","去","权时","全都","全力","全年","全然","全身心","然","人人","仍","仍旧","仍然","日复一日","日见","日渐","日益","日臻","如常","如此等等","如次","如今","如期","如前所述","如上","如下","汝","三番两次","三番五次","三天两头","瑟瑟","沙沙","上","上来","上去","一.","一一","一下","一个","一些","一何","一则通过","一天","一定","一时","一次","一片","一番","一直","一致","一起","一转眼","一边","一面","上升","上述","上面","下","下列","下去","下来","下面","不一","不久","不变","不可","不够","不尽","不尽然","不敢","不断","不若","不足","与其说","专门","且不说","且说","严格","严重","个别","中小","中间","丰富","为主","为什麽","为止","为此","主张","主要","举行","乃至于","之前","之后","之後","也就是说","也是","了解","争取","二来","云尔","些","亦","产生","人","人们","什麽","今","今后","今天","今年","今後","介于","从事","他是","他的","代替","以上","以下","以为","以前","以后","以外","以後","以故","以期","以来","任务","企图","伟大","似乎","但凡","何以","余
外","你是","你的","使","使用","依据","依靠","便于","促进","保持","做到","傥然","儿","允许","元/吨","先不先","先后","先後","先生","全体","全部","全面","共同","具体","具有","兼之","再","再其次","再则","再有","再次","再者说","决定","准备","凡","凡是","出于","出现","分别","则甚","别处","别是","别管","前此","前进","前面","加入","加强","十分","即如","却","却不","原来","又及","及时","双方","反应","反映","取得","受到","变成","另悉","只","只当","只怕","只消","叫做","召开","各人","各地","各级","合理","同一","同样","后","后者","后面","向使","周围","呵呵","咧","唯有","啷当","喽","嗡","嘿嘿","因了","因着","在于","坚决","坚持","处在","处理","复杂","多么","多数","大力","大多数","大批","大量","失去","她是","她的","好","好的","好象","如同","如是","始而","存在","孰料","孰知","它们的","它是","它的","安全","完全","完成","实现","实际","宣布","容易","密切","对应","对待","对方","对比","小","少数","尔","尔尔","尤其","就是了","就要","属于","左右","巨大","巩固","已","已矣","已经","巴","巴巴","帮助","并不","并不是","广大","广泛","应当","应用","应该","庶乎","庶几","开展","引起","强烈","强调","归齐","当前","当地","当时","形成","彻底","彼时","往往","後来","後面","得了","得出","得到","心里","必然","必要","怎奈","怎麽","总是","总结","您们","您是","惟其","意思","愿意","成为","我是","我的","或则","或曰","战斗","所在","所幸","所有","所谓","扩大","掌握","接著","数/","整个","方便","方面","无","无法","既往","明显","明确","是不是","是以","是否","显然","显著","普通","普遍","曾","曾经","替代","最","最后","最大","最好","最後","最近","最高","有利","有力","有及","有所","有效","有时","有点","有的是","有着","有著","末##末","本地","来自","来说","构成","某某","根本","欢迎","欤","正值","正在","正巧","正常","正是","此地","此处","此时","此次","每个","每天","每年","比及","比较","没奈何","注意","深入","清楚","满足","然後","特别是","特殊","特点","犹且","犹自","现代","现在","甚且","甚或","甚至于","用来","由是","由此","目前","直到","直接","相似","相信","相反","相同","相对","相应","相当","相等","看出","看到","看看","看见","真是","真正","眨眼","矣乎","矣哉","知道","确定","种","积极","移动","突出","突然","立即","竟而","第二","类如","练习","组成","结合","继后","继续","维持","考虑","联系","能否","能够","自后","自打","至今","至若","致","般的","良好","若夫","若果","范围","莫不然","获得","行为","行动","表明","表示","要求","规定","觉得","譬喻","认为","认真","认识","许多","设或","诚如","说明","说来","说说","诸","诸如","谁人","谁料","贼死","赖以","距","转动","转变","转贴","达到","迅速","过去","过来","运用","还要","这一来","这次","这点","这种","这般","这麽","进入","进步","进行","适应","适当","适用","逐步","逐渐","通常","造成","遇到","遭到","遵循","避免","那般","那麽","部分","采取","里面","重大","重新","重要","针对","问题","防止","附近","限制","随后","随时","随著","难道说","集中","需要","非特","非独","高兴","若
果 "]
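One way the list might be used is to filter segmentation results. The sketch below is illustrative only: STOP_WORDS holds just four entries taken from the list above, the input is an excerpt of the pynlpir.segment() output shown earlier, and remove_stop_words is a hypothetical helper, not a PyNLPIR function. It also drops tokens tagged 'punctuation mark', which is an extra assumption about what the reader wants.

```python
# A small subset of the stop word list above, for illustration only.
STOP_WORDS = {"欢迎", "与", "个人", "的"}


def remove_stop_words(tagged_words, stop_words):
    """Drop stop words and punctuation from (word, pos) pairs."""
    return [
        word
        for word, pos in tagged_words
        if word not in stop_words and pos != "punctuation mark"
    ]


# Excerpt of the pynlpir.segment() output from the example earlier in this post.
segmented = [
    ("欢迎", "verb"), ("科研", "noun"), ("人员", "noun"),
    ("、", "punctuation mark"), ("与", "conjunction"), ("个人", "noun"),
    ("参与", "verb"), ("NLPIR", "noun"), ("平台", "noun"),
    ("的", "particle"), ("建设", "verb"), ("工作", "verb"),
    ("。", "punctuation mark"),
]

content_words = remove_stop_words(segmented, STOP_WORDS)
print(content_words)
```

Storing the stop words in a set keeps each membership check constant-time, which matters once the full list of well over a thousand entries is loaded.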
