
https://www.pythonprogramming.net/stemming-nltk-tutorial/?completed=/stop-words-nltk-tutorial/

# -*- coding: utf-8 -*-
"""
Created on Sun Nov 13 09:14:13 2016

@author: daxiong
"""
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

# Create a Porter stemmer instance
ps = PorterStemmer()
'''
ps.stem('emancipation')
Out[17]: 'emancip'

ps.stem('love')
Out[18]: 'love'

ps.stem('loved')
Out[19]: 'love'

ps.stem('loving')
Out[20]: 'love'
'''
example_words = ['python', 'pythoner', 'pythoning', 'pythoned', 'pythonly']
example_text = "Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of bad captivity."
# Split the text into sentences
list_sentences = sent_tokenize(example_text)
# Split the text into words
list_words = word_tokenize(example_text)
# Stem each word
list_stemWords = [ps.stem(w) for w in example_words]
''' ['python', 'python', 'python', 'python', 'pythonli'] '''
list_stemWords1 = [ps.stem(w) for w in list_words]

Stemming words with NLTK

Stemming is a form of normalization. Many variations of a word carry the same core meaning and differ only in tense or other inflection.

We stem to shorten lookups and to normalize sentences.

Consider:

I was taking a ride in the car.
I was riding in the car.

These two sentences mean the same thing. "in the car" is the same. "I was" is the same. The "was ... -ing" construction marks a past, ongoing action in both cases, so is it truly necessary to differentiate between ride and riding when we are just trying to figure out what this past activity was?

No, not really.
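As a rough check of this claim, the sketch below stems both sentences and compares the results. The helper `stem_sentence` is a hypothetical name introduced here, and it assumes a simple lowercase-and-split tokenization rather than `word_tokenize`, so that it runs without downloading any NLTK data.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()


def stem_sentence(sentence):
    # Assumption: lowercase, drop the trailing period, split on whitespace.
    tokens = sentence.lower().rstrip(".").split()
    return [ps.stem(t) for t in tokens]


a = stem_sentence("I was taking a ride in the car.")
b = stem_sentence("I was riding in the car.")

print(a)
print(b)

# "ride" and "riding" reduce to the same stem, so both sentences
# end up sharing that token after stemming.
print(ps.stem("ride") == ps.stem("riding"))
```

After stemming, both token lists contain the same stem for the activity, which is all a downstream lookup needs.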

This is just one minor example, but imagine every word in the English language, every possible tense and affix you can put on a word. Having individual dictionary entries per version would be highly redundant and inefficient, especially since, once we convert to numbers, the "value" is going to be identical.
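To make the redundancy concrete, here is a minimal sketch that counts dictionary entries before and after stemming. It uses the same example words that appear later in this tutorial; the `Counter`-based "dictionary" is only an illustration chosen here, not something the original tutorial prescribes.

```python
from collections import Counter

from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

# One entry per surface form versus one entry per stem.
raw_counts = Counter(words)
stem_counts = Counter(ps.stem(w) for w in words)

print(len(raw_counts), "distinct surface forms")
print(len(stem_counts), "distinct stems")
print(stem_counts)
```

Five separate entries collapse into far fewer, and the counts for the shared stem accumulate in one place.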

One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.

First, we're going to grab and define our stemmer:

from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

ps = PorterStemmer()

Now, let's choose some words with a similar stem, like:

example_words = ["python","pythoner","pythoning","pythoned","pythonly"]

Next, we can easily stem by doing something like:

for w in example_words:
    print(ps.stem(w))

Our output:

python
python
python
python
pythonli

Now let's try stemming a typical sentence, rather than some words:

new_text = "It is important to by very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."
words = word_tokenize(new_text)
for w in words:
    print(ps.stem(w))

Now our result is:

It
is
import
to
by
veri
pythonli
while
you
are
python
with
python
.
All
python
have
python
poorli
at
least
onc
.

Next up, we're going to discuss something a bit more advanced from the NLTK module, Part of Speech tagging, where we can use the NLTK module to identify the parts of speech for each word in a sentence.


Reposted from: https://www.cnblogs.com/webRobot/p/6079947.html
