sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程)

https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

https://www.pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/?completed=/stemming-nltk-tutorial/

# -*- coding: utf-8 -*-
"""
Created on Sun Nov 13 09:14:13 2016@author: daxiong
"""
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer#训练数据
train_text=state_union.raw("2005-GWBush.txt")
#测试数据
sample_text=state_union.raw("2006-GWBush.txt")
'''Punkt is designed to learn parameters (a list of abbreviations, etc.) unsupervised from a corpus similar to the target domain. The pre-packaged models may therefore be unsuitable: use PunktSentenceTokenizer(text) to learn parameters from the given text
'''
#我们现在训练punkttokenizer(分句器)
custom_sent_tokenizer=PunktSentenceTokenizer(train_text)
#训练后,我们可以使用punkttokenizer(分句器)
tokenized=custom_sent_tokenizer.tokenize(sample_text)'''
nltk.pos_tag(["fire"]) #pos_tag(列表)
Out[19]: [('fire', 'NN')]
'''#文本词性标记函数
def process_content():try:for i in tokenized[0:5]:words=nltk.word_tokenize(i)tagged=nltk.pos_tag(words)print(tagged)except Exception as e:print(str(e))process_content()

One of the more powerful aspects of the NLTK module is the Part of Speech tagging that it can do for you. This means labeling words in a sentence as nouns, adjectives, verbs...etc. Even more impressive, it also labels by tense, and more. Here's a list of the tags, what they mean, and some examples:

POS tag list:CC coordinating conjunction
CD  cardinal digit
DT  determiner
EX  existential there (like: "there is" ... think of it like "there exists") FW foreign word IN preposition/subordinating conjunction JJ adjective 'big' JJR adjective, comparative 'bigger' JJS adjective, superlative 'biggest' LS list marker 1) MD modal could, will NN noun, singular 'desk' NNS noun plural 'desks' NNP proper noun, singular 'Harrison' NNPS proper noun, plural 'Americans' PDT predeterminer 'all the kids' POS possessive ending parent's PRP personal pronoun I, he, she PRP$ possessive pronoun my, his, hers RB adverb very, silently, RBR adverb, comparative better RBS adverb, superlative best RP particle give up TO to go 'to' the store. UH interjection errrrrrrrm VB verb, base form take VBD verb, past tense took VBG verb, gerund/present participle taking VBN verb, past participle taken VBP verb, sing. present, non-3d take VBZ verb, 3rd person sing. present takes WDT wh-determiner which WP wh-pronoun who, what WP$ possessive wh-pronoun whose WRB wh-abverb where, when

How might we use this? While we're at it, we're going to cover a new sentence tokenizer, called the PunktSentenceTokenizer. This tokenizer is capable of unsupervised machine learning, so you can actually train it on any body of text that you use. First, let's get some imports out of the way that we're going to use:

import nltk
from nltk.corpus import state_union from nltk.tokenize import PunktSentenceTokenizer

Now, let's create our training and testing data:

train_text = state_union.raw("2005-GWBush.txt") sample_text = state_union.raw("2006-GWBush.txt")

One is a State of the Union address from 2005, and the other is from 2006 from past President George W. Bush.

Next, we can train the Punkt tokenizer like:

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

Then we can actually tokenize, using:

tokenized = custom_sent_tokenizer.tokenize(sample_text)

Now we can finish up this part of speech tagging script by creating a function that will run through and tag all of the parts of speech per sentence like so:

def process_content():try: for i in tokenized[:5]: words = nltk.word_tokenize(i) tagged = nltk.pos_tag(words) print(tagged) except Exception as e: print(str(e)) process_content()

The output should be a list of tuples, where the first element in the tuple is the word, and the second is the part of speech tag. It should look like:

[('PRESIDENT', 'NNP'), ('GEORGE', 'NNP'), ('W.', 'NNP'), ('BUSH', 'NNP'), ("'S", 'POS'), ('ADDRESS', 'NNP'), ('BEFORE', 'NNP'), ('A', 'NNP'), ('JOINT', 'NNP'), ('SESSION', 'NNP'), ('OF', 'NNP'), ('THE', 'NNP'), ('CONGRESS', 'NNP'), ('ON', 'NNP'), ('THE', 'NNP'), ('STATE', 'NNP'), ('OF', 'NNP'), ('THE', 'NNP'), ('UNION', 'NNP'), ('January', 'NNP'), ('31', 'CD'), (',', ','), ('2006', 'CD'), ('THE', 'DT'), ('PRESIDENT', 'NNP'), (':', ':'), ('Thank', 'NNP'), ('you', 'PRP'), ('all', 'DT'), ('.', '.')] [('Mr.', 'NNP'), ('Speaker', 'NNP'), (',', ','), ('Vice', 'NNP'), ('President', 'NNP'), ('Cheney', 'NNP'), (',', ','), ('members', 'NNS'), ('of', 'IN'), ('Congress', 'NNP'), (',', ','), ('members', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('Supreme', 'NNP'), ('Court', 'NNP'), ('and', 'CC'), ('diplomatic', 'JJ'), ('corps', 'NNS'), (',', ','), ('distinguished', 'VBD'), ('guests', 'NNS'), (',', ','), ('and', 'CC'), ('fellow', 'JJ'), ('citizens', 'NNS'), (':', ':'), ('Today', 'NN'), ('our', 'PRP$'), ('nation', 'NN'), ('lost', 'VBD'), ('a', 'DT'), ('beloved', 'VBN'), (',', ','), ('graceful', 'JJ'), (',', ','), ('courageous', 'JJ'), ('woman', 'NN'), ('who', 'WP'), ('called', 'VBN'), ('America', 'NNP'), ('to', 'TO'), ('its', 'PRP$'), ('founding', 'NN'), ('ideals', 'NNS'), ('and', 'CC'), ('carried', 'VBD'), ('on', 'IN'), ('a', 'DT'), ('noble', 'JJ'), ('dream', 'NN'), ('.', '.')] [('Tonight', 'NNP'), ('we', 'PRP'), ('are', 'VBP'), ('comforted', 'VBN'), ('by', 'IN'), ('the', 'DT'), ('hope', 'NN'), ('of', 'IN'), ('a', 'DT'), ('glad', 'NN'), ('reunion', 'NN'), ('with', 'IN'), ('the', 'DT'), ('husband', 'NN'), ('who', 'WP'), ('was', 'VBD'), ('taken', 'VBN'), ('so', 'RB'), ('long', 'RB'), ('ago', 'RB'), (',', ','), ('and', 'CC'), ('we', 'PRP'), ('are', 'VBP'), ('grateful', 'JJ'), ('for', 'IN'), ('the', 'DT'), ('good', 'NN'), ('life', 'NN'), ('of', 'IN'), ('Coretta', 'NNP'), ('Scott', 'NNP'), ('King', 'NNP'), ('.', '.')] [('(', 'NN'), ('Applause', 'NNP'), ('.', '.'), (')', ':')] [('President', 'NNP'), ('George', 'NNP'), ('W.', 'NNP'), ('Bush', 'NNP'), ('reacts', 'VBZ'), ('to', 'TO'), ('applause', 'VB'), ('during', 'IN'), ('his', 'PRP$'), ('State', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('Union', 'NNP'), ('Address', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Capitol', 'NNP'), (',', ','), ('Tuesday', 'NNP'), (',', ','), ('Jan', 'NNP'), ('.', '.')]

At this point, we can begin to derive meaning, but there is still some work to do. The next topic that we're going to cover is chunking, which is where we group words, based on their parts of speech, into hopefully meaningful groups.

python风控评分卡建模和风控常识

https://study.163.com/course/introduction.htm?courseId=1005214003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

转载于:https://www.cnblogs.com/webRobot/p/6080069.html

自然语言15_Part of Speech Tagging with NLTK相关推荐

  1. 自然语言12_Tokenizing Words and Sentences with NLTK

    sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&am ...

  2. 自然语言处理(四): Part of Speech Tagging

    目录 1. What is Part of Speech (POS)? 词性是什么 2. Information Extraction 信息提取 2.1 POS Open Classes 2.2 PO ...

  3. python和nltk自然语言处理 pdf_NLTK基础教程:用NLTK和Python库构建机器学习应用 完整版pdf...

    本书主要介绍如何通过NLTK库与一些Python库的结合从而实现复杂的NLP任务和机器学习应用.全书共分为10章.第1章对NLP进行了简单介绍.第2章.第3章和第4章主要介绍一些通用的预处理技术.专属 ...

  4. Lecture 5 Part of Speech Tagging

    目录 POS application: Information Extraction 词性应用:信息提取 POS Open Class 开放类词性 Problem of word classes: A ...

  5. 第四篇:Part of Speech Tagging 词性标注

    词性也就是单词类别,形态类别,句法类别 名词,动词,形容词等. POS告诉了我们单词和他的邻居的一些信息,简单举例: 名词前常有限定词 名词前有动词 content 作为名词,发音为 CONtent ...

  6. 自然语言14_Stemming words with NLTK

    sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&am ...

  7. python自然语言处理工具NLTK各个包的意思和作用总结

    [转]http://www.myexception.cn/perl-python/464414.html [原]Python NLP实战之一:环境准备 最近正在学习Python,看了几本关于Pytho ...

  8. python自然语言处理 分词_Python 自然语言处理(基于jieba分词和NLTK)

    Python 自然语言处理(基于jieba分词和NLTK) 发布时间:2018-05-11 11:39, 浏览次数:1038 , 标签: Python jieba NLTK ----------欢迎加 ...

  9. 「自然语言处理」如何快速理解?有这篇文章就够了!

    原文来源:codeburst.io 作者:Pramod Chandrayan 「雷克世界」编译:嗯~阿童木呀.我是卡布达 现如今,在更多情况下,我们是以比特和字节为生,而不是依靠交换情感.我们使用一种 ...

最新文章

  1. 树莓派4装Ubuntu
  2. 【译】Go语言声明语法
  3. 将vim打造成IDE编程环境
  4. boost::integer::extended_euclidean用法的测试程序
  5. HDU 4870 Rating 高斯消元法
  6. oracle改成归档模式_oracle 11g开启归档模式及修改归档目录
  7. nginx php空白页 fastcgi_param
  8. 小心Lombok用法中的坑
  9. linux管理防火墙开放端口
  10. Linux 命令(75)—— uptime 命令
  11. oracle-pl/sql之二
  12. DSP应用技术(第一章)
  13. c++程序查重系统设计思路
  14. 一个微信投票小程序防止刷票的想法
  15. matlab求解常微分方程组——dsolve与ode45
  16. 把Excel转换成word文档有什么简单的方法
  17. 云计算被指变相占土地 专家称去伪存真
  18. 一个文字类RPG游戏框架(走过路过别错过)C++
  19. 品酒论三国之一(帅才的典型特征)
  20. mysql CONCAT和DATE_ADD函数的使用

热门文章

  1. 调度算法为何被阿里如此重视?
  2. java 参数传递_java中方法的参数传递机制
  3. 认识 lib 目录里的 .so 文件
  4. 一种基于游戏引擎的AR模式探讨(下)
  5. 零基础学Python(第十三章 元组)
  6. MySQL sysdate()函数 不走索引的问题
  7. 有不同列数的两个表的UNION
  8. Python学习教程:0基础学Python?手把手教你从变量和赋值语句学
  9. 爬取及分析天猫商城冈本评论(二)数据处理
  10. BZOJ 1305 dance跳舞(最大流+二分答案)