NLTK

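NLTK is a Python library for processing text. Written knowledge adds up to an enormous volume of data, which is why this post belongs to the big-data series.
This post only scratches the surface. I have not worked in this area yet and wrote it out of occasional curiosity.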

Load everything from NLTK's book module

  • The book module contains all of the sample data
from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text1
<Text: Moby Dick by Herman Melville 1851>
text2
<Text: Sense and Sensibility by Jane Austen 1811>

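Searching a text or topic

  1. concordance finds every occurrence of a word in a text and prints each one with its surrounding context.
  2. similar finds words that appear in contexts similar to those of the search word; this could serve relevance ranking in a search engine.
  3. common_contexts shows the contexts shared by two or more words.
  4. dispersion_plot draws a dispersion plot of where words occur in a text.
text1.concordance('monstrous') # look up the word 'monstrous' in text1
# concordance
# UK [kən'kɔːd(ə)ns]  US [kən'kɔrdns]
# n. harmony, agreement; an index of the words used in a work or by an author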
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
text2.concordance('affection')
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This 
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if 
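
The output above is a keyword-in-context listing: every hit is centred in a fixed-width window. The following is a minimal sketch of that idea using only plain Python lists. It is not NLTK's implementation; the function name, the `width` parameter and the sample sentence are illustrative assumptions.

```python
def concordance(tokens, word, width=30):
    """Return one keyword-in-context line per occurrence of `word`.

    Matching ignores case; `width` (a positive number) is how many
    characters of context are kept on each side of the keyword.
    """
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Rebuilding the context strings for every hit is slow on large
        # texts, but keeps the sketch short.
        left = ' '.join(tokens[:i])[-width:]
        right = ' '.join(tokens[i + 1:])[:width]
        # Right-justify the left context so the keywords line up.
        lines.append(left.rjust(width) + ' ' + token + ' ' + right)
    return lines


# Hypothetical sample text, already split into tokens.
tokens = 'It was a monstrous size , and a most Monstrous fable .'.split()
for line in concordance(tokens, 'monstrous', width=12):
    print(line)
```

Both the lowercase and the capitalised form are matched, which mirrors the 'Monstrous Pictures' hit in the text1 listing.
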
text1.similar('monstrous')
true contemptible christian abundant few part mean careful puzzled
mystifying passing curious loving wise doleful gamesome singular
delightfully perilous fearless
text2.similar('monstrous')
very so exceedingly heartily a as good great extremely remarkably
sweet vast amazingly
text2.common_contexts(['monstrous','very'])
a_pretty am_glad a_lucky is_pretty be_glad
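
Both similar and common_contexts rest on the same notion: the context of a word is the pair of words on either side of it. Below is a rough sketch of that notion in plain Python. It is an assumption-laden illustration rather than NLTK's algorithm (NLTK's scoring differs), and `context_index`, `common_contexts`, `similar` and the sample sentence are hypothetical names and data.

```python
from collections import defaultdict


def context_index(tokens):
    """Map each lowercased word to the set of (previous, next) pairs seen around it."""
    lowered = [t.lower() for t in tokens]
    contexts = defaultdict(set)
    for i in range(1, len(lowered) - 1):
        contexts[lowered[i]].add((lowered[i - 1], lowered[i + 1]))
    return contexts


def common_contexts(contexts, words):
    """Contexts in which every word in the non-empty list `words` occurs."""
    shared = set.intersection(*(contexts.get(w, set()) for w in words))
    return sorted(shared)


def similar(contexts, word):
    """Other words, ordered by how many contexts they share with `word`."""
    target = contexts.get(word, set())
    scores = {w: len(ctx & target) for w, ctx in contexts.items() if w != word}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, n in ranked if n > 0]


sample = ('I am very glad . I am monstrous glad . '
          'She is a very pretty girl . She is a monstrous pretty girl .').split()
ctx = context_index(sample)
print(common_contexts(ctx, ['monstrous', 'very']))
print(similar(ctx, 'monstrous'))
```

In this sample 'very' and 'monstrous' share the contexts am_glad and a_pretty, the same kind of overlap that the text2 output reports.
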
# Show where a word occurs in the text: how many words from the beginning each occurrence appears.
# Each stripe represents an instance of a word,
# and each row represents the entire text.
text4.dispersion_plot(['citizens','democracy','freedom','duties','America','liberty'])
# dispersion
# UK [dɪ'spɜːʃ(ə)n]  US [dɪ'spɝʒn]
# n. scattering; [statistics] deviation; dispersal
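
A dispersion plot needs only the token offsets at which each word occurs. The sketch below computes those offsets and renders one row as text instead of drawing a figure. It is an illustration under stated assumptions, not NLTK's plotting code; `offsets`, `dispersion_row`, the `bins` parameter and the sample tokens are hypothetical.

```python
def offsets(tokens, word):
    """Token positions at which `word` occurs (exact, case-sensitive match)."""
    return [i for i, t in enumerate(tokens) if t == word]


def dispersion_row(tokens, word, bins=20):
    """Text version of one plot row: '|' where the word falls in a bin, '.' elsewhere."""
    row = ['.'] * bins
    for i in offsets(tokens, word):
        # Scale the token offset to a bin index in the range 0..bins-1.
        row[i * bins // len(tokens)] = '|'
    return ''.join(row)


# Hypothetical 10-token text with 'liberty' at the very start and very end.
sample = ['liberty'] + ['x'] * 8 + ['liberty']
print(offsets(sample, 'liberty'))
print(dispersion_row(sample, 'liberty', bins=5))
```
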

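print(text3.generate('monstrous'))
None
# generate() returned None here, so this call produced no generated text in the NLTK release used for this post.
# Later NLTK releases take the seed as a list of tokens through the text_seed keyword.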

Counting vocabulary

len(text3)
44764
sorted(set(text3))
['!',"'",'(',')',',',',)','.','.)',':',';',';)','?','?)','A','Abel','Abelmizraim','Abidah','Abide','Abimael','Abimelech','Abr','Abrah','Abraham','Abram','Accad','Achbor','Adah','Adam','Adbeel','Admah','Adullamite','After','Aholibamah','Ahuzzath','Ajah','Akan','All','Allonbachuth','Almighty','Almodad','Also','Alvah','Alvan','Am','Amal','Amalek','Amalekites','Ammon','Amorite','Amorites','Amraphel','An','Anah','Anamim','And','Aner','Angel','Appoint','Aram','Aran','Ararat','Arbah','Ard','Are','Areli','Arioch','Arise','Arkite','Arodi','Arphaxad','Art','Arvadite','As','Asenath','Ashbel','Asher','Ashkenaz','Ashteroth','Ask','Asshur','Asshurim','Assyr','Assyria','At','Atad','Avith','Baalhanan','Babel','Bashemath','Be','Because','Becher','Bedad','Beeri','Beerlahairoi','Beersheba','Behold','Bela','Belah','Benam','Benjamin','Beno','Beor','Bera','Bered','Beriah','Bethel','Bethlehem','Bethuel','Beware','Bilhah','Bilhan','Binding','Birsha','Bless','Blessed','Both','Bow','Bozrah','Bring','But','Buz','By','Cain','Cainan','Calah','Calneh','Can','Cana','Canaan','Canaanite','Canaanites','Canaanitish','Caphtorim','Carmi','Casluhim','Cast','Cause','Chaldees','Chedorlaomer','Cheran','Cherubims','Chesed','Chezib','Come','Cursed','Cush','Damascus','Dan','Day','Deborah','Dedan','Deliver','Diklah','Din','Dinah','Dinhabah','Discern','Dishan','Dishon','Do','Dodanim','Dothan','Drink','Duke','Dumah','Earth','Ebal','Eber','Edar','Eden','Edom','Edomites','Egy','Egypt','Egyptia','Egyptian','Egyptians','Ehi','Elah','Elam','Elbethel','Eldaah','EleloheIsrael','Eliezer','Eliphaz','Elishah','Ellasar','Elon','Elparan','Emins','En','Enmishpat','Eno','Enoch','Enos','Ephah','Epher','Ephra','Ephraim','Ephrath','Ephron','Er','Erech','Eri','Es','Esau','Escape','Esek','Eshban','Eshcol','Ethiopia','Euphrat','Euphrates','Eve','Even','Every','Except','Ezbon','Ezer','Fear','Feed','Fifteen','Fill','For','Forasmuch','Forgive','From','Fulfil','G','Gad','Gaham','Galeed','Gatam','Gather','Gaza','Gentiles','Gera','Gerar'
,'Gershon','Get','Gether','Gihon','Gilead','Girgashites','Girgasite','Give','Go','God','Gomer','Gomorrah','Goshen','Guni','Hadad','Hadar','Hadoram','Hagar','Haggi','Hai','Ham','Hamathite','Hamor','Hamul','Hanoch','Happy','Haran','Hast','Haste','Have','Havilah','Hazarmaveth','Hazezontamar','Hazo','He','Hear','Heaven','Heber','Hebrew','Hebrews','Hebron','Hemam','Hemdan','Here','Hereby','Heth','Hezron','Hiddekel','Hinder','Hirah','His','Hitti','Hittite','Hittites','Hivite','Hobah','Hori','Horite','Horites','How','Hul','Huppim','Husham','Hushim','Huz','I','If','In','Irad','Iram','Is','Isa','Isaac','Iscah','Ishbak','Ishmael','Ishmeelites','Ishuah','Isra','Israel','Issachar','Isui','It','Ithran','Jaalam','Jabal','Jabbok','Jac','Jachin','Jacob','Jahleel','Jahzeel','Jamin','Japhe','Japheth','Jared','Javan','Jebusite','Jebusites','Jegarsahadutha','Jehovahjireh','Jemuel','Jerah','Jetheth','Jetur','Jeush','Jezer','Jidlaph','Jimnah','Job','Jobab','Jokshan','Joktan','Jordan','Joseph','Jubal','Judah','Judge','Judith','Kadesh','Kadmonites','Karnaim','Kedar','Kedemah','Kemuel','Kenaz','Kenites','Kenizzites','Keturah','Kiriathaim','Kirjatharba','Kittim','Know','Kohath','Kor','Korah','LO','LORD','Laban','Lahairoi','Lamech','Lasha','Lay','Leah','Lehabim','Lest','Let','Letushim','Leummim','Levi','Lie','Lift','Lo','Look','Lot','Lotan','Lud','Ludim','Luz','Maachah','Machir','Machpelah','Madai','Magdiel','Magog','Mahalaleel','Mahalath','Mahanaim','Make','Malchiel','Male','Mam','Mamre','Man','Manahath','Manass','Manasseh','Mash','Masrekah','Massa','Matred','Me','Medan','Mehetabel','Mehujael','Melchizedek','Merari','Mesha','Meshech','Mesopotamia','Methusa','Methusael','Methuselah','Mezahab','Mibsam','Mibzar','Midian','Midianites','Milcah','Mishma','Mizpah','Mizraim','Mizz','Moab','Moabites','Moreh','Moreover','Moriah','Muppim','My','Naamah','Naaman','Nahath','Nahor','Naphish','Naphtali','Naphtuhim','Nay','Nebajoth','Neither','Night','Nimrod','Nineveh','Noah','Nod','Not','Now','O','Obal','Of
','Oh','Ohad','Omar','On','Onam','Onan','Only','Ophir','Our','Out','Padan','Padanaram','Paran','Pass','Pathrusim','Pau','Peace','Peleg','Peniel','Penuel','Peradventure','Perizzit','Perizzite','Perizzites','Phallu','Phara','Pharaoh','Pharez','Phichol','Philistim','Philistines','Phut','Phuvah','Pildash','Pinon','Pison','Potiphar','Potipherah','Put','Raamah','Rachel','Rameses','Rebek','Rebekah','Rehoboth','Remain','Rephaims','Resen','Return','Reu','Reub','Reuben','Reuel','Reumah','Riphath','Rosh','Sabtah','Sabtech','Said','Salah','Salem','Samlah','Sarah','Sarai','Saul','Save','Say','Se','Seba','See','Seeing','Seir','Sell','Send','Sephar','Serah','Sered','Serug','Set','Seth','Shalem','Shall','Shalt','Shammah','Shaul','Shaveh','She','Sheba','Shebah','Shechem','Shed','Shel','Shelah','Sheleph','Shem','Shemeber','Shepho','Shillem','Shiloh','Shimron','Shinab','Shinar','Shobal','Should','Shuah','Shuni','Shur','Sichem','Siddim','Sidon','Simeon','Sinite','Sitnah','Slay','So','Sod','Sodom','Sojourn','Some','Spake','Speak','Spirit','Stand','Succoth','Surely','Swear','Syrian','Take','Tamar','Tarshish','Tebah','Tell','Tema','Teman','Temani','Terah','Thahash','That','The','Then','There','Therefore','These','They','Thirty','This','Thorns','Thou','Thus','Thy','Tidal','Timna','Timnah','Timnath','Tiras','To','Togarmah','Tola','Tubal','Tubalcain','Twelve','Two','Unstable','Until','Unto','Up','Upon','Ur','Uz','Uzal','We','What','When','Whence','Where','Whereas','Wherefore','Which','While','Who','Whose','Whoso','Why','Wilt','With','Woman','Ye','Yea','Yet','Zaavan','Zaphnathpaaneah','Zar','Zarah','Zeboiim','Zeboim','Zebul','Zebulun','Zemarite','Zepho','Zerah','Zibeon','Zidon','Zillah','Zilpah','Zimran','Ziphion','Zo','Zoar','Zohar','Zuzims','a','abated','abide','able','abode','abomination','about','above','abroad','absent','abundantly','accept','accepted','according','acknowledged','activity','add','adder','afar','afflict','affliction','afraid','after','afterward','afterwards','aga','again'
,'against','age','aileth','air','al','alive','all','almon','alo','alone','aloud','also','altar','altogether','always','am','among','amongst','an','and','angel','angels','anger','angry','anguish','anointedst','anoth','another','answer','answered','any','anything','appe','appear','appeared','appease','appoint','appointed','aprons','archer','archers','are','arise','ark','armed','arms','army','arose','arrayed','art','artificer','as','ascending','ash','ashamed','ask','asked','asketh','ass','assembly','asses','assigned','asswaged','at','attained','audience','avenged','aw','awaked','away','awoke','back','backward','bad','bade','badest','badne','bak','bake','bakemeats','baker','bakers','balm','bands','bank','bare','barr','barren','basket','baskets','battle','bdellium','be','bear','beari','bearing','beast','beasts','beautiful','became','because','become','bed','been','befall','befell','before','began','begat','beget','begettest','begin','beginning','begotten','beguiled','beheld','behind','behold','being','believed','belly','belong','beneath','bereaved','beside','besides','besought','best','betimes','better','between','betwixt','beyond','binding','bird','birds','birthday','birthright','biteth','bitter','blame','blameless','blasted','bless','blessed','blesseth','blessi','blessing','blessings','blindness','blood','blossoms','bodies','boldly','bondman','bondmen','bondwoman','bone','bones','book','booths','border','borders','born','bosom','both','bottle','bou','boug','bough','bought','bound','bow','bowed','bowels','bowing','boys','bracelets','branches','brass','bre','breach','bread','breadth','break','breaketh','breaking','breasts','breath','breathed','breed','brethren','brick','brimstone','bring','brink','broken','brook','broth','brother','brought','brown','bruise','budded','build','builded','built','bulls','bundle','bundles','burdens','buried','burn','burning','burnt','bury','buryingplace','business','but','butler','butlers','butlership','butter','buy','by','cakes','calf','call
','called','came','camel','camels','camest','can','cannot','canst','captain','captive','captives','carcases','carried','carry','cast','castles','catt','cattle','caught','cause','caused','cave','cease','ceased','certain','certainly','chain','chamber','change','changed','changes','charge','charged','chariot','chariots','chesnut','chi','chief','child','childless','childr','children','chode','choice','chose','circumcis','circumcise','circumcised','citi','cities','city','clave','clean','clear','cleave','clo','closed','clothed','clothes','cloud','clusters','co','coat','coats','coffin','cold',...]
len(set(text3))
2789
len(text3)/len(set(text3))
16.050197203298673
text3.count('smote')
5
100*text4.count('a')/len(text4)
1.4643016433938312
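def lexical_diversity(text):
    # lexical: UK ['leksɪk(ə)l], US ['lɛksɪkl]; adj. relating to vocabulary; relating to dictionaries or lexicography
    # diversity: UK [daɪ'vɜːsɪtɪ; dɪ-], US [dɪˈvəsɪti]; n. variety; difference
    # Average number of times each distinct token is used
    return len(text)/len(set(text))
def percentage(count, total):
    return 100*count/total
print('Lexical diversity of text3: {}'.format(lexical_diversity(text3)))
print("Percentage of text4 taken up by the word 'a': {}".format(percentage(text4.count('a'),len(text4))))
Lexical diversity of text3: 16.050197203298673
Percentage of text4 taken up by the word 'a': 1.4643016433938312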
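
To make the two measures concrete, here is the same arithmetic on a token list small enough to check by hand. The sample list is made up for illustration. The functions follow the definitions used in this post, where the diversity figure is tokens per distinct token.

```python
def lexical_diversity(text):
    # Total tokens divided by distinct tokens: the average number of
    # times each distinct token is used.
    return len(text) / len(set(text))


def percentage(count, total):
    return 100 * count / total


# Hypothetical sample: 8 tokens, 6 of them distinct, 'the' used 3 times.
sample = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'end']
print(lexical_diversity(sample))                      # 8 / 6
print(percentage(sample.count('the'), len(sample)))   # 100 * 3 / 8
```

With this definition a larger figure means more repetition: text3's value of about 16 says each distinct token is used about 16 times on average.
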

Lists

sent1 = ['Call', 'me','Ishmael','.']
print('Contents of sent1: {}'.format(sent1))
print('Length of sent1: {}'.format(len(sent1)))
print('Lexical diversity of sent1: {}'.format(lexical_diversity(sent1)))
Contents of sent1: ['Call', 'me', 'Ishmael', '.']
Length of sent1: 4
Lexical diversity of sent1: 1.0
sent1,sent2,sent3,sent4 # these lists are predefined by nltk.book
(['Call', 'me', 'Ishmael', '.'],['The','family','of','Dashwood','had','long','been','settled','in','Sussex','.'],['In','the','beginning','God','created','the','heaven','and','the','earth','.'],['Fellow','-','Citizens','of','the','Senate','and','of','the','House','of','Representatives',':'])
sent4+sent1
['Fellow','-','Citizens','of','the','Senate','and','of','the','House','of','Representatives',':','Call','me','Ishmael','.']
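sent1.append('Some') # append() changes the list in place and returns None; this line was evidently run four times
sent1
['Call', 'me', 'Ishmael', '.', 'Some', 'Some', 'Some', 'Some']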

Indexing lists

type(text4)
nltk.text.Text
text4[173]
'awaken'
text4.index('awaken')
173
text5[16715:16735]
['U86','thats','why','something','like','gamefly','is','so','good','because','you','can','actually','play','a','full','game','without','buying','it']
text6[1600:1625]
['We',"'",'re','an','anarcho','-','syndicalist','commune','.','We','take','it','in','turns','to','act','as','a','sort','of','executive','officer','for','the','week']
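
The session above looks a token up by position, finds a position by token, and takes slices. The short sketch below shows the same three operations on an ordinary Python list, with made-up tokens. It also shows that index() reports only the first match.

```python
# Hypothetical token list in which 'take' occurs twice.
tokens = ['We', 'take', 'it', 'in', 'turns', 'to', 'take', 'the', 'chair']

first = tokens.index('take')  # position of the first match only
every = [i for i, t in enumerate(tokens) if t == 'take']  # all positions
window = tokens[first:first + 3]  # the slice end is exclusive

print(first, every, window)

# index() raises ValueError when the token is absent.
try:
    tokens.index('anarcho')
except ValueError:
    print('not found')
```
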

Variables

sent1 = ['Call','me','Ishmael','.']
my_sent = ['Bravely','bold','Sir','Robin',',','rode','forth','from','Camelot','.']
noun_phrase = my_sent[1:4]
print('Sliced list: noun_phrase -> {}'.format(noun_phrase))
wOrDs = sorted(noun_phrase)
print('Sorted list: wOrDs -> {}'.format(wOrDs))
Sliced list: noun_phrase -> ['bold', 'Sir', 'Robin']
Sorted list: wOrDs -> ['Robin', 'Sir', 'bold']

Strings

name = 'bright'
print('First letter of name: {}'.format(name[0]))
print(name[:4])
print(name*2)
print(name + '!')
First letter of name: b
brig
brightbright
bright!
' '.join(['Monty', 'Python'])
'Monty Python'
'Monty Python'.split()
['Monty', 'Python']
saying = ['After','all','is','said','and','done','more','is','said','than','done']
tokens = set(saying)
tokens = sorted(tokens)
tokens[-2:]
['said', 'than']
fdist1 = FreqDist(text1)
vocabulary1 = fdist1.keys()
type(vocabulary1)
dict_keys
fdist1.plot(50, cumulative=True)
#Cumulative frequency plot for the 50 most frequently used words in Moby Dick, which
#account for nearly half of the tokens.

fdist1.hapaxes() #the words that occur once only
['Herman','Melville',']','ETYMOLOGY','Late','Consumptive','School','threadbare','lexicons','mockingly','flags','mortality','signification','HACKLUYT','Sw','HVAL','roundness','Dut','Ger','WALLEN','WALW','IAN','RICHARDSON','KETOS','GREEK','CETUS','LATIN','WHOEL','ANGLO','SAXON','WAL','HWAL','SWEDISH','ICELANDIC','BALEINE','BALLENA','FEGEE','ERROMANGOAN','Librarian','painstaking','burrower','grub','Vaticans','stalls','higgledy','piggledy','gospel','promiscuously','commentator','belongest','sallow','Pale','Sherry','loves','bluntly','Subs','thankless','Hampton','Court','hie','refugees','pampered','Michael','Raphael','unsplinterable','GENESIS','JOB','JONAH','punish','ISAIAH','soever','cometh','incontinently','perisheth','PLUTARCH','MORALS','breedeth','Whirlpooles','Balaene','arpens','PLINY','Scarcely','TOOKE','LUCIAN','TRUE','catched','OCTHER','VERBAL','TAKEN','MOUTH','ALFRED','890','gudgeon','retires','MONTAIGNE','APOLOGY','RAIMOND','SEBOND','Nick','RABELAIS','cartloads','STOWE','ANNALS','LORD','BACON','Touching','ork','DEATH','sovereignest','bruise','HAMLET','leach','Mote','availle','returne','againe','worker','Dinting','paine','thro','maine','FAERIE','Immense','til','DAVENANT','PREFACE','GONDIBERT','spermacetti','Hosmannus','Nescio','VIDE','Spencer','Talus','flail','threatens','jav','lins','WALLER','SUMMER','ISLANDS','Commonwealth','Civitas','OPENING','SENTENCE','HOBBES','LEVIATHAN','Silly','Mansoul','chewing','sprat','PILGRIM','PROGRESS','Created','PARADISE','LOST','---"','Hugest','Stretched','Draws','FULLLER','PROFANE','HOLY','STATE','DRYDEN','ANNUS','MIRABILIS','aground','EDGE','TEN','SPITZBERGEN','PURCHAS','wantonness','fuzzing','vents','HERBERT','INTO','ASIA','AFRICA','SCHOUTEN','SIXTH','CIRCUMNAVIGATION','Elbe','ducat','herrings','GREENLAND','Several','Fife','Anno','1652','Pitferren','SIBBALD','FIFE','KINROSS','Myself','Sperma','ceti','fierceness','RICHARD','STRAFFORD','LETTER','BERMUDAS','PHIL','TRANS','1668','PRIMER','COWLEY','1729','"...','frequendy','insuppor
table','disorder','ULLOA','SOUTH','AMERICA','sylphs','petticoat','Oft','Tho','RAPE','LOCK','NAT','wales','JOHNSON','COOK','dung','lime','juniper','UNO','VON','TROIL','LETTERS','BANKS','SOLANDER','1772','Nantuckois','JEFFERSON','MEMORIAL','MINISTER','REFERENCE','PARLIAMENT','SOMEWHERE','guarding','protecting','robbers','BLACKSTONE','Rodmond','suspends','attends','FALCONER','Bright','roofs','domes','rockets','Around','unwieldy','COWPER','VISIT','LONDON','HUNTER','DISSECTION','SMALL','SIZED','aorta','gushing','PALEY','THEOLOGY','mammiferous','hind','BARON','CUVIER','COLNETT','PURPOSE','EXTENDING','SPERMACETI','Floundered','chace','peopling','Gather','Led','instincts','trackless','Assaulted','voracious','spiral','MONTGOMERY','WORLD','FLOOD','Paean','fatter','Flounders','CHARLES','LAMB','TRIUMPH','1690','OBED','Susan','HAWTHORNE','TWICE','bespeak','raal','COOPER','PILOT','Berlin','Gazette','ECKERMANN','CONVERSATIONS','GOETHE','ESSEX','WAS','ATTACKED','FINALLY','DESTROYED','OWEN','CHACE','FIRST','SAID','VESSEL','YORK','1821','piping','dimmed','phospher','ELIZABETH','OAKES','SMITH','amounted','440','SCORESBY','Mad','agonies','endures','infuriated','rears','snaps','propelled','observers','opportunities','habitudes','BEALE','offensively','artful','mischievous','FREDERICK','DEBELL','1840','October','Raise','ay','THAR','bowes','os','ROSS','ETCHINGS','CRUIZE','1846','Globe','transactions','relate','HUSSEY','SURVIVORS','parried','MISSIONARY','JOURNAL','TYERMAN','boldest','persevering','REPORT','DANIEL','SPEECH','SENATE','APPLICATION','ERECTION','BREAKWATER','CAPTORS','WHALEMAN','ADVENTURES','BIOGRAPHY','GATHERED','HOMEWARD','COMMODORE','PREBLE','REV','CHEEVER','MUTINEER','BROTHER','ANOTHER','MCCULLOCH','COMMERCIAL','reciprocal','clews','SOMETHING','UNPUBLISHED','CURRENTS','Pedestrians','recollect','gateways','VOYAGER','ARCTIC','NEWSPAPER','TAKING','RETAKING','HOBOMACK','MIRIAM','FISHERMAN','appliance','RIBS','TRUCKS','Terra','Del','Fuego','DARWIN','NATURALIST',";--'",'!\'"','WHA
RTON','Loomings','spleen','regulating','circulation','Whenever','drizzly','hypos','philosophical','Cato','Manhattoes','reefs','downtown','gazers','Circumambulate','Corlears','Coenties','Slip','Whitehall','Posted','sentinels','spiles','pier','lath','counters','desks','loitering','shady','Inlanders','lanes','alleys','attract','dale','dreamiest','shadiest','quietest','enchanting','Saco','crucifix','Deep','mazy','Tiger','Tennessee','Rockaway','Persians','deity','Narcissus','ungraspable','hazy','quarrelsome','offices','abominate','toils','trials','barques','schooners','broiling','buttered','judgmatically','peppered','reverentially','idolatrous','dotings','ibis','roasted','bake','plumb','Van','Rensselaers','Randolphs','Hardicanutes','lording','tallest','decoction','Seneca','Stoics','Testament','promptly','rub','infliction','BEING','PAID','urbane','ills','monied','consign','prevalent','violate','Pythagorean','commonalty','police','surveillance','programme','solo','CONTESTED','ELECTION','PRESIDENCY','UNITED','STATES','ISHMAEL','BLOODY','AFFGHANISTAN','managers','genteel','comedies','farces','cunningly','disguises','cajoling','unbiased','freewill','discriminating','overwhelming','undeliverable','itch','forbidden','ignoring','lodges','Carpet','Bag','Manhatto','candidates','penalties','Tyre','Carthage','imported','cobblestones','bitingly','shouldering','price','fervent','asphaltic','pavement','flinty','projections','soles','Too','cheapest','cheeriest','invitingly','particles','peer','Angel','Doom','wailing','gnashing','Wretched','entertainment','Moving','emigrant','poverty','creak','lodgings','zephyr','hob','toasting','observest','sashless','glazier','reasonest','chinks','crannies','lint','chattering','shiverings','cob','redder','Orion','glitters','conservatories','president','temperance','blubbering','straggling','wainscots','reminding','oilpainting','besmoked','defaced','unequal','crosslights','hags','delineate','bewitched','ponderings','boggy','soggy','squitchy','froze','he
ath','icebound','represents','Horner','foundered','clubs','harvesting','hacking','horrifying','Mixed','Nathan','Swain','corkscrew','Blanco','sojourning','fireplaces','duskier','cockpits','rarities','Projecting','Within','shelves','flasks','bustles','deliriums','Abominable','tumblers','cylinders','goggling','deceitfully','tapered','Parallel','pecked','footpads','Fill','shilling','examining','SKRIMSHANDER','accommodated','unoccupied','haint','pose','whalin','decidedly','objectionable','wander','Battery','ruminating','adorning','potatoes','sartainty','diabolically','steaks','undress','looker','rioting','Grampus','seed','Feegees','tramping','Enveloped','bedarned','eruption','officiating','brimmers','complained','potion','colds','catarrhs','liquor','arrantest','topers','obstreperously','aloof','desirous','hilarity','coffer','Southerner','mountaineers','Alleghanian','missed','supernaturally','congratulate','multiply','bachelor','abominated','tidiest','bedwards','shan','tablecloth','Skrimshander','bump','spraining','eider','yoking','rickety','whirlwinds','knockings','dismissed','popped','cherishing','chuckled','chuckle','mightily','catches','bamboozingly','overstocked','toothpick','rayther','BROWN','slanderin','farrago','BROKE','Sartain','Mt','Hecla','persist','mystifying','unsay','criminal','Wall','purty','sarmon','rips','tellin','bought','balmed','curios','sellin','inions','fooling','idolators','Depend','reg','lar','spliced','Johnny','sprawling','Arter','glim','jiffy','irresolute','vum','WON','Folding','scrutiny','porcupine','moccasin','ponchos','parade','rainy','remembering','commended','cobs','Nod','footfall','unlacing','blackish','plasters','inkling','Placing','crammed','scalp','mildewed','Ignorance','parent','nonplussed','undressing','checkered','Thirty','frogs','quaked','wrapall','dreadnaught','fumbled','Remembering','manikin','tenpin','andirons','jambs','bricks','appropriate','applying','hastier','withdrawals','antics','devotee','extinguishing','unceremoniously','b
agged','sportsman','woodcock','uncomfortableness','deliberating','puffed','sang','Stammering','conjured','responses','debel','flourishing','Angels','flourishings','peddlin','sleepe','grunted','gettee','motioning','comely','insured','Counterpane','parti','triangles','interminable','caper','supperless','21st','hemisphere','sigh','Sixteen','ached','coaches','stockinged','slippering','misbehaviour','unendurable','stepmothers','misfortunes','steeped','shudderingly','confounding','soberly','recurred','predicament','unlock','bridegroom','clasp','hugged','rouse','snore','scratch','Throwing','expostulations','unbecomingness','matrimonial','dawning','overture','innate','compliment','civility','rudeness','toilette','dressing','donning','gaspings','booting','caterpillar','outlandishness','manners','education','undergraduate','dreamt','cowhide','pinched','curtains','indecorous','contented','restricting','donned','lathering','unsheathes','whets','Rogers','cutlery','Afterwards','baton','Breakfast','pleasantly','bountifully','laughable','bosky','unshorn','gowns','toasted','lingers','tarried','barred','Grub','Park','assurance','polish','occasioned','embarrassed','bashfulness','duelled','winking','tastes','sheepishly','bashful','icicle','admirer','cordially','grappling','genteelly','eschewed','undivided','6','circulating','nondescripts','Chestnut','jostle','Regent','Lascars','Bombay','Apollo','Feegeeans','Tongatobooarrs','Erromanggoans','Pannangians','Brighggians','weekly','Vermonters','stalwart','frames','felled','strutting','wester','bombazine','cloak','mow','gloves','joins','outfit','waistcoats','Hay','Seed','tract','dearest','pave','eggs','patrician','parks','scraggy','scoria','Herr','dowers','nieces','reservoirs','maples','bountiful','proffer','passer','cones','blossoms','superinduced','carnation','Salem','sweethearts','Puritanic','Whaleman','Wrapping','Each','quote','TALBOT','Near','Desolation','1st','SISTER','ROBERT','WILLIS','ELLERY','NATHAN','COLEMAN','WALTER','CANNY','SETH'
,'GLEIG','Forming','ELIZA','31st','MARBLE','SHIPMATES','EZEKIEL','HARDY','AUGUST','3d','1833','WIDOW','Shaking','glazed','Affected','relatives','unhealing','sympathetically','wounds','bleed','blanks',...]
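
A frequency distribution is a table of token counts. The sketch below reproduces the three things used above (the most frequent tokens, the hapaxes, and the share of the text covered by the top tokens) with `collections.Counter` from the standard library. The sample sentence and variable names are illustrative assumptions, and none of the figures refer to Moby Dick.

```python
from collections import Counter

# Hypothetical 10-token sample.
tokens = 'the whale and the sea and the ship sail on'.split()
freq = Counter(tokens)

# The two most frequent tokens with their counts.
top = freq.most_common(2)

# Hapaxes: tokens that occur exactly once.
hapaxes = sorted(w for w, n in freq.items() if n == 1)

# Fraction of all tokens accounted for by the top two, which is what
# the last point of a cumulative frequency plot over them would show.
share = sum(n for _, n in top) / len(tokens)

print(top)
print(hapaxes)
print(share)
```
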

Fine-grained selection of words

  1. the set of all w such that w is an element of V (the vocabulary) and w has property P
    \(\{\, w \mid w \in V \text{ and } P(w) \,\}\)
  2. The corresponding Python expression is given:
    [w for w in V if p(w)]
V = set(text1)
long_words = [w for w in V if len(w)>15]
sorted(long_words)
['CIRCUMNAVIGATION','Physiognomically','apprehensiveness','cannibalistically','characteristically','circumnavigating','circumnavigation','circumnavigations','comprehensiveness','hermaphroditical','indiscriminately','indispensableness','irresistibleness','physiognomically','preternaturalness','responsibilities','simultaneousness','subterraneousness','supernaturalness','superstitiousness','uncomfortableness','uncompromisedness','undiscriminating','uninterpenetratingly']
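
The property P(w) can combine several conditions. As a hedged illustration, the sketch below selects words that are both long and repeated by joining a length test to a count test. The sample sentence and the thresholds are made up for the example.

```python
from collections import Counter

# Hypothetical sample text.
tokens = ('the whaling voyage was long and the whaling captain was stern '
          'but the harpooneers admired the captain').split()
freq = Counter(tokens)
vocabulary = set(tokens)

# P(w): w has more than 6 characters.
long_words = sorted(w for w in vocabulary if len(w) > 6)

# P(w): w has more than 6 characters and occurs more than once.
long_and_repeated = sorted(w for w in vocabulary if len(w) > 6 and freq[w] > 1)

print(long_words)
print(long_and_repeated)
```

Adding the count condition drops words that are long but occur only once, which is one way to filter out the rare oddities that a pure length test lets through.
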

This post is drawn from Natural Language Processing with Python.

Reposted from: https://www.cnblogs.com/brightyuxl/p/8973951.html
