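In the earlier post on word cloud visualization we outlined the methods and steps for generating a word cloud. Here we walk through some examples from the official wordcloud site to show how attractive word clouds can be.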


Single Word

Make a word cloud with a single word that’s repeated.

import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = "square"

x, y = np.ogrid[:300, :300]

# Define a circle centred at (150, 150) with radius 130, so that the mask is circular
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)

# Set repeat=True to make a word cloud with a single word that's repeated
wc = WordCloud(background_color="white", repeat=True, mask=mask)
# Generate the word cloud
wc.generate(text)

plt.axis("off")
# Display the image using bilinear interpolation
plt.imshow(wc, interpolation="bilinear")
plt.show()
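
To see what the mask in the example above actually contains, the following minimal sketch rebuilds it with NumPy alone and inspects a few pixels. The helper name `circular_mask` and its parameters are illustrative assumptions, not part of wordcloud.

```python
import numpy as np


def circular_mask(size=300, center=150, radius=130):
    """Return an int array that is 255 outside the circle and 0 inside it."""
    x, y = np.ogrid[:size, :size]
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return 255 * outside.astype(int)


mask = circular_mask()
print(mask.shape)      # (300, 300)
print(mask[150, 150])  # 0   -> centre of the circle, words may be drawn here
print(mask[0, 0])      # 255 -> corner, treated as masked out
```

Only the two values 0 and 255 occur, so the array works as a simple stencil: white (255) cells are left empty.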

Create wordcloud with Arabic

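Next we generate a word cloud from Arabic text.
This needs two third-party libraries:

  • bidi.algorithm (provided by the python-bidi package)
  • arabic_reshaper

So we first install both packages:
pip install python-bidi arabic_reshaper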

import os
import codecs
from wordcloud import WordCloud
import arabic_reshaper
from bidi.algorithm import get_display

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = os.path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# Read the whole text.
f = codecs.open(os.path.join(d, 'arabicwords.txt'), 'r', 'utf-8')

# Make text readable for a non-Arabic library like wordcloud
text = arabic_reshaper.reshape(f.read())
text = get_display(text)

# Generate a word cloud image
wordcloud = WordCloud(font_path='fonts/NotoNaskhArabic/NotoNaskhArabic-Regular.ttf').generate(text)

# Export to an image
wordcloud.to_file("arabic_example.png")
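
The reshaping and `get_display` steps are needed because Arabic is written right to left, which a library that is not aware of Arabic script does not handle by itself. As a small standard-library illustration (not part of the original example; the helper `bidi_classes` is a name made up for this sketch), the Unicode bidirectional class of each character can be inspected like this:

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin "a" followed by the Arabic letter BEH (U+0628)
classes = bidi_classes("a\u0628")
print(classes)  # ['L', 'AL']
```

`L` marks a left-to-right character and `AL` an Arabic letter, which the bidi algorithm treats as right-to-left.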

Minimal Example

Generating a square wordcloud from the US constitution using default arguments.

import os

from os import path
from wordcloud import WordCloud

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# Read the whole text.
text = open(path.join(d, 'constitution.txt')).read()

# Generate a word cloud image
wordcloud = WordCloud().generate(text)

# Display the generated image:
# the matplotlib way:
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

# lower max_font_size
wordcloud = WordCloud(max_font_size=40).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# The pil way (if you don't have matplotlib)
# image = wordcloud.to_image()
# image.show()


Masked wordcloud

Using a mask you can generate wordclouds in arbitrary shapes.

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import os

from wordcloud import WordCloud, STOPWORDS

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# read the mask image
# taken from
# http://www.stencilry.org/stencils/movies/alice%20in%20wonderland/255fk.jpg
alice_mask = np.array(Image.open(path.join(d, "alice_mask.png")))

stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask,
               stopwords=stopwords, contour_width=3, contour_color='steelblue')

# generate word cloud
wc.generate(text)

# store to file
wc.to_file(path.join(d, "alice.png"))

# show
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
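
If a mask image has a background that is only nearly white (for instance after lossy compression), those pixels would presumably not count as masked out, since only pure white is treated that way. A minimal sketch of one way to clean such a mask with NumPy follows; the helper `binarize_mask`, its threshold of 250, and the tiny sample array are all assumptions made for illustration.

```python
import numpy as np


def binarize_mask(gray, threshold=250):
    """Map near-white pixels to 255 (masked out) and everything else to 0."""
    gray = np.asarray(gray)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


# Tiny hypothetical 3x3 grayscale "image": light border, darker centre.
gray = np.array([[255, 252, 255],
                 [251,  10, 249],
                 [255, 255, 255]])
mask = binarize_mask(gray)
print(mask)
```

The result could then be passed as `mask=` in place of the raw image array.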


Using frequency

Using a dictionary of word frequency.

import multidict as multidict

import numpy as np

import os
import re
from PIL import Image
from os import path
from wordcloud import WordCloud
import matplotlib.pyplot as plt


def getFrequencyDictForText(sentence):
    fullTermsDict = multidict.MultiDict()
    tmpDict = {}

    # making dict for counting frequencies
    for text in sentence.split(" "):
        word = text.lower()
        # skip common stopwords; fullmatch is used so that words which merely
        # start with one of these (e.g. "alice", "into") are not dropped
        if re.fullmatch("a|the|an|to|in|for|of|or|by|with|is|on|that|be", word):
            continue
        # look up the lower-cased key so repeated words accumulate correctly
        val = tmpDict.get(word, 0)
        tmpDict[word] = val + 1
    for key in tmpDict:
        fullTermsDict.add(key, tmpDict[key])
    return fullTermsDict


def makeImage(text):
    alice_mask = np.array(Image.open("alice_mask.png"))

    wc = WordCloud(background_color="white", max_words=1000, mask=alice_mask)
    # generate word cloud
    wc.generate_from_frequencies(text)

    # show
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

text = open(path.join(d, 'alice.txt'), encoding='utf-8')
text = text.read()
makeImage(getFrequencyDictForText(text))
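
As an alternative sketch, the frequency dictionary can also be built with the standard library's `collections.Counter`. The stopword set `STOP`, the helper `word_frequencies`, and the sample sentence are assumptions made for this illustration; since a `Counter` is a dict subclass, it should be usable with `generate_from_frequencies`, which expects a mapping from word to frequency.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the sketch.
STOP = {"a", "an", "the", "to", "in", "of", "is"}


def word_frequencies(text, stopwords=STOP):
    """Count lower-cased words, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


freqs = word_frequencies("The cat and the hat. A cat is a cat!")
print(freqs["cat"])   # 3
print(freqs["hat"])   # 1
```

Splitting with a regular expression rather than on single spaces also strips punctuation, so "cat!" and "cat" are counted as the same word.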

Image-colored wordcloud

You can color a word-cloud by using an image-based coloring strategy implemented in ImageColorGenerator. It uses the average color of the region occupied by the word in a source image. You can combine this with masking - pure-white will be interpreted as ‘don’t occupy’ by the WordCloud object when passed as mask. If you want white as a legal color, you can just pass a different image to “mask”, but make sure the image shapes line up.

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import os

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# read the mask / color image taken from
# http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010
alice_coloring = np.array(Image.open(path.join(d, "alice_color.png")))
stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)
# generate word cloud
wc.generate(text)

# create coloring from image
image_colors = ImageColorGenerator(alice_coloring)

# show
fig, axes = plt.subplots(1, 3)
axes[0].imshow(wc, interpolation="bilinear")
# recolor wordcloud and show
# we could also give color_func=image_colors directly in the constructor
axes[1].imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
axes[2].imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
for ax in axes:
    ax.set_axis_off()
plt.show()
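
The idea of colouring a word with the average colour of the region it occupies can be sketched with NumPy alone. This is only an illustration of the principle, not ImageColorGenerator's actual code; the helper `average_region_color` and the tiny test image are assumptions.

```python
import numpy as np


def average_region_color(image, top, left, height, width):
    """Return the mean (R, G, B) of a rectangular region as ints."""
    region = image[top:top + height, left:left + width, :3]
    mean = region.reshape(-1, 3).mean(axis=0)
    return tuple(int(c) for c in mean)


# Hypothetical 2x4 image: left half pure red, right half pure blue.
image = np.zeros((2, 4, 3), dtype=np.uint8)
image[:, :2] = (255, 0, 0)
image[:, 2:] = (0, 0, 255)

print(average_region_color(image, 0, 0, 2, 2))  # (255, 0, 0)
print(average_region_color(image, 0, 0, 2, 4))  # (127, 0, 127)
```

A word sitting entirely on the red half stays red, while one spanning both halves gets a blended purple.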

Emoji Example

A simple example that shows how to include emoji. Note that this example does not seem to work on OS X, but does work correctly in Ubuntu.

There are 3 important steps to follow to include emoji:

  1. Read the text input with io.open instead of the built-in open. This ensures that it is loaded as UTF-8.
  2. Override the regular expression used by word cloud to parse the text into words. The default expression will only match ASCII words.
  3. Override the default font to something that supports emoji. The included Symbola font includes black and white outlines for most emoji.

There are currently issues with the PIL/Pillow library that seem to prevent it from functioning correctly on OS X (https://github.com/python-pillow/Pillow/issues/1774), so try this on Ubuntu if you are having problems.

import io
import os
import string
from os import path
from wordcloud import WordCloud

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# It is important to use io.open to correctly load the file as UTF-8
# (the encoding is passed explicitly so this does not depend on the platform default)
text = io.open(path.join(d, 'happy-emoji.txt'), encoding='utf-8').read()

# the regex used to detect words is a combination of normal words, ascii art, and emojis
# 2+ consecutive letters (also include apostrophes), e.g. It's
normal_word = r"(?:\w[\w']+)"
# 2+ consecutive punctuations, e.g. :)
ascii_art = r"(?:[{punctuation}][{punctuation}]+)".format(punctuation=string.punctuation)
# a single character that is not alpha_numeric or other ascii printable
emoji = r"(?:[^\s])(?<![\w{ascii_printable}])".format(ascii_printable=string.printable)
regexp = r"{normal_word}|{ascii_art}|{emoji}".format(normal_word=normal_word, ascii_art=ascii_art,
                                                     emoji=emoji)

# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'fonts', 'Symbola', 'Symbola.ttf')
wc = WordCloud(font_path=font_path, regexp=regexp).generate(text)

# Display the generated image:
# the matplotlib way:
import matplotlib.pyplot as plt
plt.imshow(wc)
plt.axis("off")
plt.show()
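
To check what the combined regular expression from the example actually picks out, it can be tried on a short string with the standard `re` module, without wordcloud or a font. The sample sentence below is made up for this sketch.

```python
import re
import string

# Same three building blocks as in the example above.
normal_word = r"(?:\w[\w']+)"
ascii_art = r"(?:[{p}][{p}]+)".format(p=string.punctuation)
emoji = r"(?:[^\s])(?<![\w{a}])".format(a=string.printable)
regexp = r"{}|{}|{}".format(normal_word, ascii_art, emoji)

# Hypothetical sample text containing a word with an apostrophe,
# a one-letter word, an ASCII smiley, and two emoji (U+1F600).
sample = "It's a happy day :) \U0001F600 \U0001F600"
tokens = re.findall(regexp, sample)
print(tokens)  # ["It's", 'happy', 'day', ':)', '😀', '😀']
```

The single-letter word "a" is dropped because normal words need at least two characters, while each emoji is matched as a token of its own.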

Using custom colors

Using the recolor method and custom coloring functions.

import numpy as np
from PIL import Image
from os import path
import matplotlib.pyplot as plt
import os
import random

from wordcloud import WordCloud, STOPWORDS


def grey_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    return "hsl(0, 0%%, %d%%)" % random.randint(60, 100)


# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# read the mask image taken from
# http://www.stencilry.org/stencils/movies/star%20wars/storm-trooper.gif
mask = np.array(Image.open(path.join(d, "stormtrooper_mask.png")))

# movie script of "a new hope"
# http://www.imsdb.com/scripts/Star-Wars-A-New-Hope.html
# May the lawyers deem this fair use.
text = open(path.join(d, 'a_new_hope.txt')).read()

# pre-processing the text a little bit
text = text.replace("HAN", "Han")
text = text.replace("LUKE'S", "Luke")

# adding movie script specific stopwords
stopwords = set(STOPWORDS)
stopwords.add("int")
stopwords.add("ext")

wc = WordCloud(max_words=1000, mask=mask, stopwords=stopwords, margin=10,
               random_state=1).generate(text)
# store default colored image
default_colors = wc.to_array()
plt.title("Custom colors")
plt.imshow(wc.recolor(color_func=grey_color_func, random_state=3),
           interpolation="bilinear")
wc.to_file("a_new_hope.png")
plt.axis("off")
plt.figure()
plt.title("Default colors")
plt.imshow(default_colors, interpolation="bilinear")
plt.axis("off")
plt.show()
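
A custom coloring function is just a callable that returns a color string, so it can be tried by hand before passing it to recolor. The sketch below is a variant of `grey_color_func` that draws from the supplied `random_state` instead of the global `random` module, on the assumption that wordcloud passes a `random.Random` instance there; that assumption and the sample word are mine, not from the original example.

```python
import random
import re


def grey_color_func(word, font_size=None, position=None, orientation=None,
                    random_state=None, **kwargs):
    """Return a light grey as an HSL string, using random_state if given."""
    rng = random_state if random_state is not None else random
    return "hsl(0, 0%%, %d%%)" % rng.randint(60, 100)


# Called by hand here; wordcloud would normally supply these arguments.
first = grey_color_func("force", random_state=random.Random(3))
second = grey_color_func("force", random_state=random.Random(3))
print(first == second)  # True: same seed, same shade
```

Using the passed-in generator makes the shades reproducible for a given seed, which the global `random.randint` call does not guarantee.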

Image-colored wordcloud with boundary map

A slightly more elaborate version of an image-colored wordcloud that also takes edges in the image into account. Recreating an image similar to the parrot example.

import os
from PIL import Image

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_gradient_magnitude

from wordcloud import WordCloud, ImageColorGenerator

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = os.path.dirname(__file__) if "__file__" in locals() else os.getcwd()

# load wikipedia text on rainbow
text = open(os.path.join(d, 'wiki_rainbow.txt'), encoding="utf-8").read()

# load image. This has been modified in gimp to be brighter and have more saturation.
parrot_color = np.array(Image.open(os.path.join(d, "parrot-by-jose-mari-gimenez2.jpg")))
# subsample by factor of 3. Very lossy but for a wordcloud we don't really care.
parrot_color = parrot_color[::3, ::3]

# create mask  white is "masked out"
parrot_mask = parrot_color.copy()
parrot_mask[parrot_mask.sum(axis=2) == 0] = 255

# some finesse: we enforce boundaries between colors so they get less washed out.
# For that we do some edge detection in the image
edges = np.mean([gaussian_gradient_magnitude(parrot_color[:, :, i] / 255., 2) for i in range(3)], axis=0)
parrot_mask[edges > .08] = 255

# create wordcloud. A bit sluggish, you can subsample more strongly for quicker rendering
# relative_scaling=0 means the frequencies in the data are reflected less
# accurately but it makes a better picture
wc = WordCloud(max_words=2000, mask=parrot_mask, max_font_size=40, random_state=42, relative_scaling=0)

# generate word cloud
wc.generate(text)
plt.imshow(wc)

# create coloring from image
image_colors = ImageColorGenerator(parrot_color)
wc.recolor(color_func=image_colors)
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation="bilinear")
wc.to_file("parrot_new.png")

plt.figure(figsize=(10, 10))
plt.title("Original Image")
plt.imshow(parrot_color)

plt.figure(figsize=(10, 10))
plt.title("Edge map")
plt.imshow(edges)
plt.show()
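
The boundary trick above boils down to "compute an edge strength, then paint strong edges white in the mask". The sketch below shows that step on a toy array. Note that it swaps in plain finite differences (`np.gradient`) for SciPy's Gaussian gradient magnitude, so no smoothing is applied; the helper `edge_strength` and the toy image are assumptions.

```python
import numpy as np


def edge_strength(channel):
    """Gradient magnitude of a 2-D array via finite differences."""
    gy, gx = np.gradient(channel.astype(float))
    return np.hypot(gx, gy)


# Hypothetical 4x6 single-channel image: dark left half, bright right half.
img = np.zeros((4, 6))
img[:, 3:] = 1.0

edges = edge_strength(img)
mask = np.zeros(img.shape, dtype=np.uint8)
mask[edges > 0.08] = 255   # same threshold as in the example above

print(mask[0].tolist())  # [0, 0, 255, 255, 0, 0]
```

The two columns on either side of the colour boundary become white, so no word can straddle the boundary.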




Create wordcloud with Chinese

wordcloud is a very good tool, but on its own it is not enough to create a Chinese word cloud. This example shows how to use wordcloud with Chinese text. First, you need a Chinese word segmentation library: jieba, a popular Chinese word segmentation tool for Python. You can install it with 'pip install jieba'. As the code shows, using wordcloud together with jieba is very convenient.

import jieba
# jieba.enable_parallel(4)
# Setting up parallel processes: 4, but unable to run on Windows
from os import path
from imageio import imread
import matplotlib.pyplot as plt
import os
# jieba.load_userdict("txt\userdict.txt")
# add userdict by load_userdict()
from wordcloud import WordCloud, ImageColorGenerator

# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()

stopwords_path = d + '/wc_cn/stopwords_cn_en.txt'
# Chinese fonts must be set
font_path = d + '/fonts/SourceHanSerif/SourceHanSerifK-Light.otf'

# the paths to save the wordcloud images
imgname1 = d + '/wc_cn/LuXun.jpg'
imgname2 = d + '/wc_cn/LuXun_colored.jpg'
# read the mask / color image taken from
back_coloring = imread(path.join(d, d + '/wc_cn/LuXun_color.jpg'))

# Read the whole text.
text = open(path.join(d, d + '/wc_cn/CalltoArms.txt'), encoding='utf-8').read()

# if you want to use wordcloud, you need it
# add userdict by add_word()
userdict_list = ['阿Q', '孔乙己', '单四嫂子']


# The function for processing text with jieba
def jieba_processing_txt(text):
    for word in userdict_list:
        jieba.add_word(word)

    mywordlist = []
    seg_list = jieba.cut(text, cut_all=False)
    liststr = "/ ".join(seg_list)

    with open(stopwords_path, encoding='utf-8') as f_stop:
        f_stop_text = f_stop.read()
        f_stop_seg_list = f_stop_text.splitlines()

    for myword in liststr.split('/'):
        if not (myword.strip() in f_stop_seg_list) and len(myword.strip()) > 1:
            mywordlist.append(myword)
    return ' '.join(mywordlist)


wc = WordCloud(font_path=font_path, background_color="white", max_words=2000, mask=back_coloring,
               max_font_size=100, random_state=42, width=1000, height=860, margin=2,)

wc.generate(jieba_processing_txt(text))

# create coloring from image
image_colors_default = ImageColorGenerator(back_coloring)

plt.figure()
# recolor wordcloud and show
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()

# save wordcloud
wc.to_file(path.join(d, imgname1))

# create coloring from image
image_colors_byImg = ImageColorGenerator(back_coloring)

# show
# we could also give color_func=image_colors directly in the constructor
plt.imshow(wc.recolor(color_func=image_colors_byImg), interpolation="bilinear")
plt.axis("off")
plt.figure()
plt.imshow(back_coloring, interpolation="bilinear")
plt.axis("off")
plt.show()

# save wordcloud
wc.to_file(path.join(d, imgname2))
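
The filtering step inside `jieba_processing_txt` can be tried without jieba or any data files. In this minimal sketch the token list stands in for the output of `jieba.cut()`, and the stopword set, token list, and helper name `filter_tokens` are all hypothetical.

```python
def filter_tokens(tokens, stopwords):
    """Keep tokens longer than one character that are not stopwords."""
    kept = []
    for token in tokens:
        token = token.strip()
        if len(token) > 1 and token not in stopwords:
            kept.append(token)
    return " ".join(kept)


# Hypothetical pre-segmented tokens, standing in for jieba.cut() output.
tokens = ["阿Q", "的", "故事", "我们", "孔乙己"]
stopwords = {"我们", "的"}
result = filter_tokens(tokens, stopwords)
print(result)  # 阿Q 故事 孔乙己
```

Using a set for the stopwords keeps each membership test fast, whereas the list returned by `splitlines()` in the example is scanned linearly for every token.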


Colored by Group Example

Generating a word cloud that assigns colors to words based on a predefined mapping from colors to words

from wordcloud import (WordCloud, get_single_color_func)
import matplotlib.pyplot as plt


class SimpleGroupedColorFunc(object):
    """Create a color function object which assigns EXACT colors
       to certain words based on the color to words mapping

       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.

       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """

    def __init__(self, color_to_words, default_color):
        self.word_to_color = {word: color
                              for (color, words) in color_to_words.items()
                              for word in words}

        self.default_color = default_color

    def __call__(self, word, **kwargs):
        return self.word_to_color.get(word, self.default_color)


class GroupedColorFunc(object):
    """Create a color function object which assigns DIFFERENT SHADES of
       specified colors to certain words based on the color to words mapping.

       Uses wordcloud.get_single_color_func

       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.

       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """

    def __init__(self, color_to_words, default_color):
        self.color_func_to_words = [
            (get_single_color_func(color), set(words))
            for (color, words) in color_to_words.items()]

        self.default_color_func = get_single_color_func(default_color)

    def get_color_func(self, word):
        """Returns a single_color_func associated with the word"""
        try:
            color_func = next(
                color_func for (color_func, words) in self.color_func_to_words
                if word in words)
        except StopIteration:
            color_func = self.default_color_func

        return color_func

    def __call__(self, word, **kwargs):
        return self.get_color_func(word)(word, **kwargs)


text = """The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""

# Since the text is small collocations are turned off and text is lower-cased
wc = WordCloud(collocations=False).generate(text.lower())

color_to_words = {
    # words below will be colored with a green single color function
    '#00ff00': ['beautiful', 'explicit', 'simple', 'sparse',
                'readability', 'rules', 'practicality',
                'explicitly', 'one', 'now', 'easy', 'obvious', 'better'],
    # will be colored with a red single color function
    'red': ['ugly', 'implicit', 'complex', 'complicated', 'nested',
            'dense', 'special', 'errors', 'silently', 'ambiguity',
            'guess', 'hard']
}

# Words that are not in any of the color_to_words values
# will be colored with a grey single color function
default_color = 'grey'

# Create a color function with single tone
# grouped_color_func = SimpleGroupedColorFunc(color_to_words, default_color)

# Create a color function with multiple tones
grouped_color_func = GroupedColorFunc(color_to_words, default_color)

# Apply our color function
wc.recolor(color_func=grouped_color_func)

# Plot
plt.figure()
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
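
The core of the exact-color variant is inverting a color-to-words mapping into a word-to-color lookup. That step can be checked on its own with plain Python; the shortened mapping and the function name `color_for` below are assumptions made for this sketch.

```python
# Shortened, hypothetical version of the mapping used above.
color_to_words = {
    "#00ff00": ["beautiful", "simple"],
    "red": ["ugly", "complex"],
}
default_color = "grey"

# Invert the mapping: each word points at the color of its group.
word_to_color = {word: color
                 for color, words in color_to_words.items()
                 for word in words}


def color_for(word, **kwargs):
    """Return the group color for word, or the default if it has none."""
    return word_to_color.get(word, default_color)


print(color_for("simple"))   # #00ff00
print(color_for("ugly"))     # red
print(color_for("python"))   # grey
```

Accepting `**kwargs` lets the same function absorb extra keyword arguments such as font size or position, mirroring the `__call__` signature of the classes in the example.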


Reference: https://amueller.github.io/word_cloud/index.html
