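Data Visualization (2nd Edition), Part 3, Chapter 7: Part-to-Whole

International Women's Day, March 8, is here, so here is a picture from an era full of ideals.

Summary

This blog series provides teaching resources based on the book Data Visualization (2nd Edition). This post covers the worked examples for Chapter 7, part-to-whole visualization.

Visualization perspective: part-to-whole

Code implementation

Venn diagrams

References: https://www.jb51.net/article/238729.htm
https://pypi.org/project/matplotlib-venn/
Matplotlib has no built-in function for drawing Venn diagrams, but other developers have already built two packages for this on top of matplotlib.patches and matplotlib.path.
Install matplotlib_venn: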

pip install matplotlib_venn -i https://pypi.tuna.tsinghua.edu.cn/simple

The package provides four main functions: venn2, venn2_circles, venn3 and venn3_circles.

venn3

Venn diagram 1


from matplotlib import pyplot as plt
from matplotlib_venn import venn3

# Example 1
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
plt.figure(figsize=(4, 4))  # set the figure size
plt.title("Venn diagram example")
# subsets order: (Abc, aBc, ABc, abC, AbC, aBC, ABC)
v = venn3(subsets=(20, 10, 5, 15, 5, 10, 5),
          set_labels=('Logic', 'Art Appreciation', 'College Chinese'),
          set_colors=('magenta', 'cyan', 'b'))
plt.show()

Output:

from matplotlib import pyplot as plt
from matplotlib_venn import venn3

# Example 2
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
plt.figure(figsize=(4, 4))  # set the figure size
v = venn3(subsets=(20, 10, 5, 15, 5, 10, 5),
          set_labels=('Logic', 'Art Appreciation', 'College Chinese'),
          set_colors=('magenta', 'cyan', 'b'))
plt.show()
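
The seven numbers passed as subsets follow the region order (Abc, aBc, ABc, abC, AbC, aBC, ABC). As a rough illustration, the sketch below derives that tuple from three plain Python sets using only set operations. The function name venn3_region_sizes and the sample student IDs are made up for this example. venn3 can also take the three sets directly, so this is only meant to show what each position counts.

```python
def venn3_region_sizes(a, b, c):
    """Return region sizes in the order (Abc, aBc, ABc, abC, AbC, aBC, ABC)."""
    return (
        len(a - b - c),      # Abc: only in A
        len(b - a - c),      # aBc: only in B
        len((a & b) - c),    # ABc: in A and B, not in C
        len(c - a - b),      # abC: only in C
        len((a & c) - b),    # AbC: in A and C, not in B
        len((b & c) - a),    # aBC: in B and C, not in A
        len(a & b & c),      # ABC: in all three
    )


# Hypothetical student IDs enrolled in each course
logic = {1, 2, 3, 4}
art = {3, 4, 5}
chinese = {4, 5, 6, 7}

sizes = venn3_region_sizes(logic, art, chinese)
print(sizes)
```

Because the seven regions are disjoint, their sizes add up to the number of distinct students, which is a quick sanity check before plotting.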

venn2

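# Import dependencies
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn2_circles

# The subsets parameter accepts any of the five formats below; note how they differ.
# Size order for the numeric forms: (Ab, aB, AB)
subset = [
    [{1, 2, 3}, {1, 2, 4}],        # list of two sets
    ({1, 2, 3}, {1, 2, 4}),        # tuple of two sets
    {'10': 1, '01': 1, '11': 2},   # dict: A only, B only, shared by A and B
    (3, 3, 2),                     # tuple of region sizes (A only, B only, shared); not the same diagram as the set forms
    [3, 3, 2],                     # list of region sizes (A only, B only, shared)
]

for i in subset:
    my_dpi = 100
    # Control the figure size while rendering at high resolution
    plt.figure(figsize=(500 / my_dpi, 500 / my_dpi), dpi=my_dpi)
    g = venn2(subsets=i,  # only the data is required for a default Venn diagram
              set_colors=("#098154", "#c72e29"),  # circle colors; the overlap color cannot be set here
              alpha=0.6,  # transparency
              normalize_to=1.0,  # total area the circles are normalized to
              )
    g = venn2_circles(subsets=i,
                      linestyle='--', linewidth=0.8, color="black"  # outline style, width and color
                      )
    plt.title('subsets=%s' % str(i))
    plt.show()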
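
The set-based and dict formats in the venn2 example describe the same diagram, while (3, 3, 2) does not: the numeric forms give the sizes of the exclusive regions, not the sizes of the sets themselves. A small sketch that converts two sets into the dict form makes the difference visible. The helper name venn2_region_dict is hypothetical and not part of matplotlib_venn.

```python
def venn2_region_dict(a, b):
    """Convert two sets into the {'10', '01', '11'} region-size dict."""
    return {
        '10': len(a - b),   # elements in A only
        '01': len(b - a),   # elements in B only
        '11': len(a & b),   # elements in both A and B
    }


regions = venn2_region_dict({1, 2, 3}, {1, 2, 4})
as_tuple = (regions['10'], regions['01'], regions['11'])
print(regions)
print(as_tuple)
```

Both sets have three elements, yet the region tuple is not (3, 3, 2), because two of the elements are shared.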

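Pie charts

plt.pie(x, explode=None, labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=0, radius=1, counterclock=True, wedgeprops=None, textprops=None, center=(0, 0), frame=False, rotatelabels=False, *, normalize=None, data=None)

x: sequence or array giving the size of each wedge; each wedge's share is x / sum(x).
explode: if not None, an array of length len(x) that sets how far each wedge is offset from the center, that is, the gap used to pull a wedge out for emphasis.
labels: sequence of strings providing a label for each wedge.
colors: sequence of colors, one per wedge.
autopct: if a format string, each wedge is labeled with fmt % pct; if a function, it is called with pct and its return value is used as the label.
shadow: whether to draw a shadow beneath the pie.
startangle: angle by which the start of the pie is rotated, counterclockwise from the x-axis.
pctdistance (default 0.6): ratio, relative to the radius, between the center of each wedge and the start of the text generated by autopct; values greater than 1 place the text outside the circle.
labeldistance (default 1.1): radial distance at which the labels are drawn; like pctdistance, it appears to be a ratio of the radius. If set to None, labels are not drawn but are stored for use in legend().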
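
To make the autopct description concrete: with the default normalization, each wedge's percentage is 100 * x / sum(x), and the label is produced either by applying the format string as fmt % pct or by calling the function with pct. The sketch below reproduces that arithmetic in plain Python. make_autopct is a hypothetical helper written for this example; the function it returns could be passed as autopct=... to plt.pie.

```python
def make_autopct(sizes):
    """Build an autopct callable that shows the percentage and the raw count."""
    total = sum(sizes)

    def autopct(pct):
        # Recover the absolute value from the percentage matplotlib passes in
        count = round(pct * total / 100)
        return '%.1f%% (%d)' % (pct, count)

    return autopct


sizes = [60, 40, 30]
percentages = [100 * s / sum(sizes) for s in sizes]

# Format-string form: the label is fmt % pct
fmt_labels = ['%1.1f%%' % p for p in percentages]
# Callable form: the label is whatever the function returns
func_labels = [make_autopct(sizes)(p) for p in percentages]

print(fmt_labels)
print(func_labels)
```

The callable form is useful when the label should carry more than the percentage, such as the underlying count.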

# -*- coding:UTF-8 -*-
from matplotlib import pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

# Example 1
labels = 'French', 'Italian', 'German'  # categories
sizes = [60, 40, 30]  # count for each category
fig = plt.figure(figsize=(4, 4))
ax1 = fig.add_subplot(111)
color = ['tomato', 'Gold', 'DeepSkyBlue']
ax1.pie(sizes,
        labels=labels,
        labeldistance=0.5,
        colors=color,
        textprops=dict(color='black'),  # text color
        autopct='%1.1f%%',  # show percentage labels
        pctdistance=0.7)  # distance of the percentage labels from the center
ax1.axis('equal')
plt.title('Learners of minor foreign languages')
plt.show()

from matplotlib import pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

# Example 2
labels = 'Python team', 'Java team', 'C team', 'Go team'
sizes = [25, 45, 30, 10]
fig = plt.figure(figsize=(4, 4))
explode = (0.1, 0, 0, 0)  # pull the first wedge out
ax2 = fig.add_subplot(111)
ax2.pie(sizes,
        explode=explode,  # offset of each wedge
        labels=labels,
        autopct='%1.1f%%')
plt.title('Programming language usage (exploded pie chart)')
plt.show()

from matplotlib import pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

# Example 3
labels = 'Male', 'Female'  # categories
sizes = [30, 70]  # count for each category
fig = plt.figure(figsize=(4, 4))
ax1 = fig.add_subplot(111)
color = ['RoyalBlue', 'DeepPink']
ax1.pie(sizes,
        labels=labels,
        labeldistance=0.5,
        colors=color,
        textprops=dict(color='white'),  # text color
        autopct='%1.1f%%',  # show percentage labels
        pctdistance=0.7)  # distance of the percentage labels from the center
ax1.axis('equal')
plt.title('User gender split for an app')
plt.show()

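Donut charts

Reference:
"[python] Drawing donut charts with matplotlib"
That article is worth a close look.

In wedgeprops, the width parameter sets the thickness of the ring, so the radius of the hole is radius minus width; edgecolor sets the color of the wedge borders.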

import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

# Example 1
labels = ['Sichuan', 'Hebei', 'Beijing', 'Chongqing', 'Tianjin']  # categories
A1 = [36980.22, 35964.00, 28000.40, 19500.00, 18595.38]
color = ['yellow', 'cyan', 'lightblue', 'lightgreen', 'pink']
wedges1, texts1, autotexts1 = plt.pie(A1, autopct='%3.1f%%', radius=1, pctdistance=0.8,
                                      colors=color, startangle=180, textprops=dict(color='black'),
                                      wedgeprops=dict(width=0.4, edgecolor='w'))
plt.legend(wedges1, labels, fontsize=12, title='Region', loc='center right',
           bbox_to_anchor=(1, 0, 0.3, 1))
plt.title('2017 GDP of five regions')
plt.show()

import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

# Example 2
labels = ['Daily goods', 'Dining', 'Transport', 'Savings', 'Other']  # categories
A1 = [1000, 1500, 500, 2000, 1000]
color = ['lightpink', 'hotpink', 'MediumPurple', 'Lavender', 'seashell']
wedges1, texts1, autotexts1 = plt.pie(A1, autopct='%3.1f%%', radius=1, pctdistance=0.8,
                                      colors=color, startangle=180, textprops=dict(color='black'),
                                      wedgeprops=dict(width=0.4, edgecolor='w'))
plt.legend(wedges1, labels, fontsize=12, title='Spending category', loc='center right',
           bbox_to_anchor=(1.1, 0, 0.3, 1))
plt.title('Personal spending donut chart')
plt.show()

Sunburst charts

References:
https://pyecharts.org/#/zh-cn/intro
https://pyecharts.org/#/zh-cn/basic_charts?id=sunburst%ef%bc%9a%e6%97%ad%e6%97%a5%e5%9b%be

from pyecharts.charts import Sunburst
from pyecharts import options as opts

data = [
    opts.SunburstItem(
        name="Company A",
        children=[
            opts.SunburstItem(
                name="Menswear",
                value=15,
                children=[
                    opts.SunburstItem(name="Tops", value=8),
                    opts.SunburstItem(name="Bottoms", value=7),
                ],
            ),
            opts.SunburstItem(
                name="Womenswear",
                value=10,
                children=[
                    opts.SunburstItem(name="Shirts", value=5),
                    opts.SunburstItem(name="Skirts", value=1),
                    opts.SunburstItem(name="Trousers", value=4),
                ],
            ),
        ],
    ),
    opts.SunburstItem(
        name="Company B",
        children=[
            opts.SunburstItem(
                name="Shoes",
                children=[
                    opts.SunburstItem(name="Sandals", value=1),
                    opts.SunburstItem(name="Sneakers", value=2),
                ],
            )
        ],
    ),
]

sunburst = (
    Sunburst(init_opts=opts.InitOpts(width="600px", height="600px"))
    .add(series_name="", data_pair=data, radius=[0, "90%"])
    .set_global_opts(title_opts=opts.TitleOpts(title="Sunburst chart example"))
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}"))
    .render("旭日图.html")
)

import os
os.system("旭日图.html")  # on Windows, opens the rendered file with the default application
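
Several parent nodes in the sunburst data (the two companies and the shoes category) carry no value of their own. According to the ECharts documentation, a node with children may omit value, in which case the sum of its children's values is used. The sketch below mimics that roll-up on plain dictionaries so the sizes can be checked before rendering. node_total and the dict layout are assumptions made for this example and are not part of pyecharts.

```python
def node_total(node):
    """Return a node's own value if present, else the sum of its children's totals."""
    if 'value' in node:
        return node['value']
    return sum(node_total(child) for child in node.get('children', []))


# Plain-dict stand-ins for the SunburstItem hierarchy
company_a = {
    'name': 'Company A',
    'children': [
        {'name': 'Menswear', 'value': 15},
        {'name': 'Womenswear', 'value': 10},
    ],
}
company_b = {
    'name': 'Company B',
    'children': [
        {'name': 'Shoes', 'children': [
            {'name': 'Sandals', 'value': 1},
            {'name': 'Sneakers', 'value': 2},
        ]},
    ],
}

total_a = node_total(company_a)
total_b = node_total(company_b)
print(total_a, total_b)
```

With these numbers, Company A takes up a much larger share of the inner ring than Company B.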

Circle packing charts

Reference: https://blog.csdn.net/LuohenYJ/article/details/119006870

pip install circlify==0.15.0

Code:

# Circle packing chart
import circlify
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

data = [{'id': 'China', 'datum': 1015986, 'children': [
    {'id': "Guangdong", 'datum': 110760.94,
     'children': [
         {'id': "Shenzhen", 'datum': 27670.24},
         {'id': "Guangzhou", 'datum': 25019.11}
     ]},
    {'id': "Shanghai", 'datum': 38700.58},
    {'id': "Beijing", 'datum': 36102.6},
    {'id': "Chongqing", 'datum': 25002.79},
    {'id': "Zhejiang", 'datum': 64613,
     'children': [
         {'id': "Hangzhou", 'datum': 16106},
         {'id': "Ningbo", 'datum': 12408.7},
         {'id': "Wenzhou", 'datum': 6870.9}
     ]}
]}]

# Compute the circle positions
circles = circlify.circlify(
    data,
    show_enclosure=False,
    target_enclosure=circlify.Circle(x=0, y=0, r=1)
)

# Set up the canvas
fig, ax = plt.subplots(figsize=(4, 4))

# Set the title
ax.set_title('2020 GDP of selected Chinese provinces and cities')
ax.axis('off')

# Axis limits large enough to contain every circle
lim = max(
    max(
        abs(circle.x) + circle.r,
        abs(circle.y) + circle.r,
    )
    for circle in circles
)
plt.xlim(-lim, lim)
plt.ylim(-lim, lim)

# Draw the top-level circle
for circle in circles:
    if circle.level != 1:
        continue
    x, y, r = circle
    ax.add_patch(plt.Circle((x, y), r, alpha=0.5, linewidth=2, color="yellow"))

# Draw the second-level circles
for circle in circles:
    if circle.level != 2:
        continue
    x, y, r = circle
    ax.add_patch(plt.Circle((x, y), r, alpha=0.5, linewidth=2, color="lightblue"))

# Draw and label the third-level circles
for circle in circles:
    if circle.level != 3:
        continue
    x, y, r = circle
    label = circle.ex["id"]
    ax.add_patch(plt.Circle((x, y), r, alpha=0.5, linewidth=2, color="green"))
    plt.annotate(label, (x, y), ha='center', color="black")

# Label the second-level circles
for circle in circles:
    if circle.level != 2:
        continue
    x, y, r = circle
    label = circle.ex["id"]
    plt.annotate(label, (x, y), va='top', ha='center',
                 bbox=dict(edgecolor='blue', pad=.5), fontsize=8)

plt.show()

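Output:

Voronoi usually refers to Thiessen polygons. Thiessen polygons, also known as a Voronoi diagram and named after Georgy Voronoi, are a set of contiguous polygons formed by the perpendicular bisectors of the line segments connecting neighboring points.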

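Treemaps

The main squarify function needs two inputs:
a coordinate system, made up of the origin (x and y) and the width and height (dx and dy);
a list of positive values sorted from largest to smallest and normalized to the total area (that is, dx*dy).
The package can also turn the data into a matplotlib-based treemap visualization.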
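
The normalization step that squarify expects can be sketched in plain Python: sort the values in descending order and scale them so that they sum to the area of the target rectangle, dx * dy. The helper name normalize_to_area is made up for this example; squarify ships its own normalize_sizes function, which is the one to use in real code.

```python
def normalize_to_area(sizes, dx, dy):
    """Sort sizes in descending order and scale them so they sum to dx * dy."""
    total = sum(sizes)
    area = dx * dy
    return [s * area / total for s in sorted(sizes, reverse=True)]


values = [30, 50, 20]
scaled = normalize_to_area(values, dx=100, dy=50)
print(scaled)
```

After scaling, each number is the area in plot units that the corresponding rectangle occupies inside the 100 by 50 canvas.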

pip install squarify

# Import third-party packages
import matplotlib.pyplot as plt
import squarify

# Chinese text and minus sign handling
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly
# Set the label font size (must happen before plotting to take effect)
plt.rc('font', size=9)

# Create the data
name = ['Russia', 'Canada', 'China', 'USA', 'Brazil', 'Australia',
        'India', 'Argentina', 'Kazakhstan', 'Sudan', 'Algeria']
income = [1707.50, 997.1, 960.1, 936.4, 854.7, 774.1, 328.8, 278, 271.1, 250.6, 238.2]

# Plot
colors = ['steelblue', '#9999ff', 'red', 'indianred',
          'green', 'yellow', 'orange', 'lightblue', 'gold', 'lightgreen', 'pink']
plot = squarify.plot(sizes=income,  # data to plot
                     label=name,  # labels
                     color=colors,  # custom colors
                     alpha=0.6,  # transparency
                     value=income,  # add value labels
                     edgecolor='white',  # white rectangle borders
                     linewidth=3  # border width of 3
                     )
# Set the title and its size
plot.set_title('Land area by country (unit: 10,000 sq km)', fontdict={'fontsize': 16})
# Remove the axes
plt.axis('off')
# Remove the ticks on the top and right edges
plt.tick_params(top=False, right=False)
# Show the figure
plt.show()

# Import third-party packages
import matplotlib.pyplot as plt
import squarify

# Chinese text and minus sign handling
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly
# Set the label font size (must happen before plotting to take effect)
plt.rc('font', size=9)

# Example 2
# Create the data
name = ['USA', 'Canada', 'China', 'Germany', 'Italy', 'UK', 'Turkey', 'Spain']
income = [529951, 23318, 83482, 126266, 152271, 79885, 52167, 163027]

# Plot
colors = ['royalblue', 'cyan', 'red', 'violet', 'green', 'yellow', 'orange', 'lightblue']
plot = squarify.plot(sizes=income,  # data to plot
                     label=name,  # labels
                     color=colors,  # custom colors
                     alpha=0.5,  # transparency
                     value=income,  # add value labels
                     linewidth=3  # border width of 3
                     )
# Set the title and its size
plot.set_title('COVID-19 cases as of April 12, 2020', fontdict={'fontsize': 16})
# Remove the axes
plt.axis('off')
# Remove the ticks on the top and right edges
plt.tick_params(top=False, right=False)
# Show the figure
plt.show()

Funnel charts

Funnel chart with matplotlib

# Funnel chart 1
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
# Polygon() takes an ordered sequence of vertices and builds a polygon from them
from matplotlib.collections import PatchCollection

# Set the theme; on matplotlib 3.6 and later this style is named 'seaborn-v0_8-dark'
plt.style.use('seaborn-dark')
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly

data = [135043, 113413, 74909, 10366, 9018, 4151]
phase = ['Total visitors', 'Active visitors', 'Registered users',
         'Booking users', 'Paying users', 'Repeat buyers']
visitor_num = 135043
data1 = [visitor_num / 2 - i / 2 for i in data]  # left offset that centers each bar
data2 = [i + j for i, j in zip(data, data1)]  # right edge of each bar
color_list = ['#5c1d1d', '#892c2c', '#994a4a', '#c56161', '#d48989', '#e2b0b0']  # bar colors

fig, ax = plt.subplots(figsize=(16, 9), facecolor='#f4f4f4')
ax.barh(phase[::-1], data2[::-1], color=color_list, height=0.7)  # bar height set to 0.7
ax.barh(phase[::-1], data1[::-1], color='#f4f4f4', height=0.7)  # same color as the background
ax.axis('off')

polygons = []
for i in range(len(data)):
    # Stage name
    ax.text(0,  # x coordinate
            i,  # y position
            phase[::-1][i],  # text
            color='black', alpha=0.8, size=16, ha="right")
    # Count and share of the first stage
    ax.text(data2[0] / 2,
            i,
            str(data[::-1][i]) + '(' + str(round(data[::-1][i] / data[0] * 100, 1)) + '%)',
            color='black', alpha=0.8, size=18, ha="center")
    if i < 5:
        # Conversion rate from one stage to the next
        ax.text(data2[0] / 2,
                4.4 - i,
                str(round(data[i + 1] / data[i] * 100, 1)) + '%',
                color='black', alpha=0.8, size=16, ha="center")
        # Polygon linking two adjacent bars; the bar height is 0.7, so half of it is 0.35
        polygons.append(Polygon(xy=np.array([(data1[i + 1], 4 + 0.35 - i),
                                             (data2[i + 1], 4 + 0.35 - i),
                                             (data2[i], 5 - 0.35 - i),
                                             (data1[i], 5 - 0.35 - i)])))

# Use add_collection with PatchCollection to add the polygons to the Axes
ax.add_collection(PatchCollection(polygons, facecolor='#e2b0b0', alpha=0.8))
plt.title("Product purchase funnel analysis", fontsize=18)
plt.show()
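
The matplotlib funnel works by drawing each stage as a horizontal bar and then hiding its left part with a background-colored bar, which centers the visible segment. A minimal sketch of that offset arithmetic, using the first three stage counts from the example (the function name centered_spans is hypothetical):

```python
def centered_spans(values):
    """Return (left, right) x-positions that center each bar under the widest one."""
    widest = max(values)
    spans = []
    for v in values:
        left = widest / 2 - v / 2   # width of the hidden, background-colored bar
        spans.append((left, left + v))
    return spans


spans = centered_spans([135043, 113413, 74909])
midpoints = [(left + right) / 2 for left, right in spans]
print(spans)
print(midpoints)
```

Every visible segment shares the same midpoint, which is what produces the symmetric funnel shape.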

Funnel chart with pyecharts

pip install openpyxl

# Funnel chart 2
import os
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Funnel

# Work from the folder that contains this script so the Excel file can be found
os.chdir(os.path.dirname(os.path.realpath(__file__)))
data = pd.read_excel('漏斗图.xlsx', 'Sheet1')
attrs = data['环节'].tolist()  # stage names (column "环节")
attr_value = data['人数'].tolist()  # head counts (column "人数")

c = (
    Funnel()
    .add(
        "Product",
        [list(z) for z in zip(attrs, attr_value)],
        label_opts=opts.LabelOpts(position="inside"),
    )
    .set_global_opts(title_opts=opts.TitleOpts(title="Funnel chart example"))
    .render("漏斗图.html")
)

os.system("漏斗图.html")  # on Windows, opens the rendered file with the default application

Output:

Cure-rate funnel chart with pyecharts

from pyecharts import options as opts
from pyecharts.charts import Funnel

data = [13500, 10000, 9000, 7000]
phase = ['Examined', 'Positive', 'Surgery', 'Cured']

c = (
    Funnel()
    .add("Stage", [list(z) for z in zip(phase, data)])
    .set_global_opts(title_opts=opts.TitleOpts(title="Funnel chart"))
    .render("治愈率漏斗图.html")
)

import os
os.system("治愈率漏斗图.html")  # on Windows, opens the rendered file with the default application
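
A funnel such as the cure-rate chart is usually read through two ratios: the share each stage keeps from the previous stage, and the share it keeps from the first stage. The sketch below computes both for the four stage counts used in the chart. stage_rates is a hypothetical helper, and rounding to one decimal place is an arbitrary choice.

```python
def stage_rates(counts):
    """Return (step_rate, overall_rate) percentages for every stage after the first."""
    rates = []
    for prev, cur in zip(counts, counts[1:]):
        step = round(cur / prev * 100, 1)           # relative to the previous stage
        overall = round(cur / counts[0] * 100, 1)   # relative to the first stage
        rates.append((step, overall))
    return rates


data = [13500, 10000, 9000, 7000]
rates = stage_rates(data)
print(rates)
```

The last overall rate is the share of examined patients who end up in the final stage.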

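Virtual environment command reference

Activate conda

d:\ProgramData\Anaconda3\Scripts\activate

Create a conda environment with a specific Python version

conda create -n py10 python=3.10

List the existing conda environments

conda env list

Switch to the chosen environment

conda activate py10

After activating the conda environment, create a Python virtual environment

python -m venv venv202302

Change the working directory to the folder containing the current file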

import os
print(os.getcwd(),"-----------------")
# os.chdir("./")
os.chdir(os.path.dirname(os.path.realpath(__file__)))
print(os.getcwd(),"-----------------")
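
One caveat with the chdir snippet is that __file__ is not defined in an interactive session and in some notebook environments. A hedged variant using pathlib that falls back to the current directory in that case might look like this; the function name script_dir is made up for this example.

```python
import os
from pathlib import Path


def script_dir():
    """Return the folder containing this script, or the current directory if __file__ is undefined."""
    try:
        return Path(__file__).resolve().parent
    except NameError:
        # No __file__, for example in an interactive session
        return Path.cwd()


target = script_dir()
os.chdir(target)
print(Path.cwd())
```

Relative paths such as an Excel file sitting next to the script then resolve the same way regardless of where the script was launched from.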

Use the Aliyun mirror for a single package install

pip install <package-name> -i https://mirrors.aliyun.com/pypi/simple/

Use the Aliyun mirror permanently

pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

Textbook screenshots