NLTK Chinese sentence splitting: How can NLTK's sentence tokenization be improved?

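The great thing about the Punkt algorithm of Kiss and Strunk (2006) is that it is unsupervised. So, given a new text, you should retrain the model on it and then apply it to that text, for example:

>>> from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters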

>>> text = "An ambitious campus expansion plan was proposed by Fr. Vernon F. Gallagher in 1952. Assumption Hall, the first student dormitory, was opened in 1954, and Rockwell Hall was dedicated in November 1958, housing the schools of business and law. It was during the tenure of F. Henry J. McAnulty that Fr. Gallagher's ambitious plans were put to action."

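# Train a new model on the text. Passing the text to the constructor stores the
# learned parameters on the tokenizer; depending on the NLTK version, calling
# tokenizer.train(text) on an existing instance may only return them.

>>> tokenizer = PunktSentenceTokenizer(text)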

# It automatically learns the abbreviations.

>>> tokenizer._params.abbrev_types

{'f', 'fr', 'j'}

# Use the customized tokenizer.

>>> tokenizer.tokenize(text)

['An ambitious campus expansion plan was proposed by Fr. Vernon F. Gallagher in 1952.', 'Assumption Hall, the first student dormitory, was opened in 1954, and Rockwell Hall was dedicated in November 1958, housing the schools of business and law.', "It was during the tenure of F. Henry J. McAnulty that Fr. Gallagher's ambitious plans were put to action."]
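If the text is too short for Punkt to learn an abbreviation by itself, the abbreviation can also be supplied by hand. The following is a minimal sketch, not part of the original answer: the sample sentence and the variable names are made up for illustration, and it assumes that `PunktSentenceTokenizer` accepts a ready-made `PunktParameters` object in place of training text.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Hypothetical sample text, chosen only for illustration.
sample = "Fr. Gallagher proposed the plan. It was approved."

# Seed the parameters by hand. Abbreviations are stored lowercase,
# without the trailing period.
params = PunktParameters()
params.abbrev_types.update({"fr"})

# Passing a PunktParameters object instead of a string skips training
# and uses the supplied parameters as they are.
seeded = PunktSentenceTokenizer(params)
sentences = seeded.tokenize(sample)
print(sentences)
```

Because "fr" is registered as an abbreviation, the period after "Fr." is not treated as a sentence boundary. Hand-seeded abbreviations can complement training on a larger corpus from the same domain, which remains the main way to improve the splits.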
