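For long passages of English text stored in a .txt file, you can open the file with open(), build a stopword list by hand, and filter the stopwords out of the tokenized text. (A general-purpose stopword list is included below.)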

from nltk import word_tokenize  # tokenizer

# Open the txt file to be processed
with open('E:\\DATA\\520only abstract.txt', 'r', encoding='UTF-8') as f:
    t = f.read()  # store the file contents in t

word_tokens = word_tokenize(t)  # tokenize t
# Hand-built stopword list
stopwords = [
    'd', 'll', 'm', 're', 's', 't', 've', 'ZT', 'ZZ', 'a', 'able', 'about', 'above', 'abst',
    'accordance', 'according', 'accordingly', 'across', 'act', 'actually', 'added', 'adj',
    'adopted', 'affected', 'affecting', 'affects', 'after', 'afterwards', 'again', 'against',
    'ah', 'all', 'allow', 'allows', 'almost', 'alone', 'along', 'already', 'also', 'although',
    'always', 'am', 'among', 'amongst', 'an', 'and', 'announce', 'another', 'any', 'anybody',
    'anyhow', 'anymore', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'apart',
    'apparently', 'appear', 'appreciate', 'appropriate', 'approximately', 'are', 'area',
    'areas', 'aren', 'arent', 'arise', 'around', 'as', 'aside', 'ask', 'asked', 'asking',
    'asks', 'associated', 'at', 'auth', 'available', 'away', 'awfully', 'back', 'backed',
    'backing', 'backs', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been',
    'before', 'beforehand', 'began', 'begin', 'beginning', 'beginnings', 'begins', 'behind',
    'being', 'beings', 'believe', 'below', 'beside', 'besides', 'best', 'better', 'between',
    'beyond', 'big', 'biol', 'both', 'brief', 'briefly', 'but', 'by', 'ca', 'came', 'can',
    'cannot', 'cant', 'case', 'cases', 'cause', 'causes', 'certain', 'certainly', 'changes',
    'clear', 'clearly', 'co', 'com', 'come', 'comes', 'concerning', 'consequently', 'consider',
    'considering', 'containing', 'contains', 'corresponding', 'could', 'couldnt', 'course',
    'currently', 'date', 'definitely', 'describe', 'described', 'despite', 'did', 'differ',
    'different', 'differently', 'discuss', 'do', 'does', 'doing', 'done', 'down', 'downed',
    'downing', 'downs', 'downwards', 'due', 'during', 'each', 'early', 'ed', 'edu', 'effect',
    'eg', 'eight', 'eighty', 'either', 'else', 'elsewhere', 'end', 'ended', 'ending', 'ends',
    'enough', 'entirely', 'especially', 'et', 'et-al', 'etc', 'even', 'evenly', 'ever',
    'every', 'everybody', 'everyone', 'everything', 'everywhere', 'ex', 'exactly', 'example',
    'except', 'face', 'faces', 'fact', 'facts', 'far', 'felt', 'few', 'ff', 'fifth', 'find',
    'finds', 'first', 'five', 'fix', 'followed', 'following', 'follows', 'for', 'former',
    'formerly', 'forth', 'found', 'four', 'from', 'full', 'fully', 'further', 'furthered',
    'furthering', 'furthermore', 'furthers', 'gave', 'general', 'generally', 'get', 'gets',
    'getting', 'give', 'given', 'gives', 'giving', 'go', 'goes', 'going', 'gone', 'good',
    'goods', 'got', 'gotten', 'great', 'greater', 'greatest', 'greetings', 'group', 'grouped',
    'grouping', 'groups', 'had', 'happens', 'hardly', 'has', 'have', 'having', 'he', 'hed',
    'hello', 'help', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'heres',
    'hereupon', 'hers', 'herself', 'hes', 'hi', 'hid', 'high', 'higher', 'highest', 'him',
    'himself', 'his', 'hither', 'home', 'hopefully', 'how', 'howbeit', 'however', 'hundred',
    'id', 'ie', 'if', 'ignored', 'im', 'immediate', 'immediately', 'importance', 'important',
    'in', 'inasmuch', 'inc', 'include', 'indeed', 'index', 'indicate', 'indicated',
    'indicates', 'information', 'inner', 'insofar', 'instead', 'interest', 'interested',
    'interesting', 'interests', 'into', 'invention', 'inward', 'is', 'isn', 't', 'it', 'itd',
    'its', 'itself', 'j', 'just', 'k', 'keep', 'keeps', 'kept', 'keys', 'kg', 'kind', 'km',
    'knew', 'know', 'known', 'knows', 'large', 'largely', 'last', 'lately', 'later', 'latest',
    'latter', 'latterly', 'least', 'less', 'lest', 'let', 'lets', 'like', 'liked', 'likely',
    'line', 'little', 'long', 'longer', 'longest', 'look', 'looking', 'looks', 'ltd', 'm',
    'made', 'mainly', 'make', 'makes', 'making', 'man', 'many', 'may', 'maybe', 'me', 'mean',
    'means', 'meantime', 'meanwhile', 'member', 'members', 'men', 'merely', 'mg', 'might',
    'million', 'miss', 'ml', 'more', 'moreover', 'most', 'mostly', 'mr', 'mrs', 'much', 'mug',
    'must', 'my', 'myself', 'n', 'na', 'name', 'namely', 'nay', 'nd', 'near', 'nearly',
    'necessarily', 'necessary', 'need', 'needed', 'needing', 'needs', 'neither', 'never',
    'nevertheless', 'new', 'newer', 'newest', 'next', 'nine', 'ninety', 'no', 'nobody', 'non',
    'none', 'nonetheless', 'noone', 'nor', 'normally', 'nos', 'not', 'noted', 'nothing',
    'novel', 'now', 'nowhere', 'number', 'numbers', 'o', 'obtain', 'obtained', 'obviously',
    'of', 'off', 'often', 'oh', 'ok', 'okay', 'old', 'older', 'oldest', 'omitted', 'on',
    'once', 'one', 'ones', 'only', 'onto', 'open', 'opened', 'opening', 'opens', 'or', 'ord',
    'order', 'ordered', 'ordering', 'orders', 'other', 'others', 'otherwise', 'ought', 'our',
    'ours', 'ourselves', 'out', 'outside', 'over', 'overall', 'owing', 'own', 'p', 'page',
    'pages', 'part', 'parted', 'particular', 'particularly', 'parting', 'parts', 'past', 'per',
    'perhaps', 'place', 'placed', 'places', 'please', 'plus', 'point', 'pointed', 'pointing',
    'take', 'taken', 'taking', 'tell', 'tends', 'th', 'than', 'thank', 'thanks', 'thanx',
    'that', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'thence', 'there',
    'thereafter', 'thereby', 'thered', 'therefore', 'therein', 'thereof', 'therere', 'theres',
    'thereto', 'thereupon', 'these', 'they', 'theyd', 'theyre', 'thing', 'things', 'think',
    'thinks', 'third', 'this', 'thorough', 'thoroughly', 'those', 'thou', 'though', 'thoughh',
    'thought', 'thoughts', 'thousand', 'three', 'throug', 'through', 'throughout', 'thru',
    'thus', 'til', 'tip', 'to', 'today', 'together', 'too', 'took', 'toward', 'towards',
    'tried', 'tries', 'truly', 'try', 'trying', 'ts', 'turn', 'turned', 'turning', 'turns',
    'twice', 'two', 'un', 'under', 'unfortunately', 'unless', 'unlike', 'unlikely', 'until',
    'unto', 'up', 'upon', 'ups', 'us', 'use', 'used', 'useful', 'usefully', 'usefulness',
    'uses', 'using', 'usually', 'uucp', 'value', 'various', 'very', 'via', 'viz', 'vol',
    'vols', 'vs', 'want', 'wanted', 'wanting', 'wants', 'was', 'way', 'ways', 'we', 'wed',
    'welcome', 'well', 'wells', 'went', 'were', 'what', 'whatever', 'whats', 'when', 'whence',
    'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'wheres', 'whereupon',
    'wherever', 'whether', 'which', 'while', 'whim', 'whither', 'who', 'whod', 'whoever',
    'whole', 'whom', 'whomever', 'whos', 'whose', 'why', 'widely', 'will', 'willing', 'wish',
    'with', 'within', 'without', 'wonder', 'words', 'work', 'worked', 'working', 'world',
    'would', 'www', 'year', 'years', 'yes', 'yet', 'you', 'youd', 'young', 'younger',
    'youngest', 'your', 'youre', 'yours', 'yourself', 'yourselves', 'z', 'zero', 'zt', 'zz',
]
filtered_sentence = []  # output list

# Keep only the tokens that are not in the stopword list
for w in word_tokens:
    if w not in stopwords:
        filtered_sentence.append(w)

# Print the result
print("\n\nFiltered Sentence \n\n")
print(" ".join(filtered_sentence))
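
One thing to watch with the loop above: the comparison is case-sensitive, so a token such as "The" is kept even though 'the' is in the list. The following is a minimal sketch of a case-insensitive variant, not part of the original code. The helper name remove_stopwords and the sample data are hypothetical, and str.split() stands in for word_tokenize so the snippet runs without NLTK.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens whose lowercase form is not a stopword."""
    # Lowercase the stopwords once and store them in a set,
    # so each membership test is fast and ignores case.
    stopword_set = {s.lower() for s in stopwords}
    return [tok for tok in tokens if tok.lower() not in stopword_set]


# Hypothetical sample data; str.split() replaces word_tokenize here.
sample_stopwords = ['the', 'is', 'a', 'of', 'ZT']
sample_tokens = "The result is a model of protein folding".split()

filtered = remove_stopwords(sample_tokens, sample_stopwords)
print(" ".join(filtered))  # prints: result model protein folding
```

With the real data, the same helper could be called as remove_stopwords(word_tokens, stopwords). Whether ignoring case is appropriate depends on the corpus, for example if capitalized abbreviations should be kept.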
