Python for Data Analysis

Chapter 8: Plotting and Visualization

pandas plotting tools

22.5 Plot Formatting

22.5.1 Controlling the Legend

The legend is shown by default; pass legend=False to hide it.
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> df = pd.DataFrame(np.random.randn(1000, 4), index=range(1,1001), columns=list('ABCD'))
>>> df = df.cumsum()
>>> df.plot(legend=False)
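A minimal sketch for checking the effect without looking at the figure. The random data, the seed and the variable names are illustrative assumptions; the check relies on Axes.get_legend(), which returns None when an axes carries no legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to view the plots
import numpy as np
import pandas as pd

# Illustrative data: four cumulative random walks (assumed, not from the original post)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

ax_default = df.plot()             # legend is drawn by default
ax_hidden = df.plot(legend=False)  # legend is suppressed

# get_legend() returns None when no legend was attached to the axes
print(ax_default.get_legend() is not None)
print(ax_hidden.get_legend() is None)
```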

22.5.2 Scales

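Pass logy=True to get a log-scale Y axis, with the ticks labelled as powers of 10. The example takes np.exp of the cumulative sum so that every value is positive, as a log scale requires.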
>>> ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000',periods=1000))
>>> ts = np.exp(ts.cumsum())
>>> ts.plot(logy=True)
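The scale that ended up on the axes can be read back with Axes.get_yscale(). The sketch below assumes illustrative random data and passes an explicit ax so the axes object is easy to inspect.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative strictly positive series (np.exp keeps every value above zero)
rng = np.random.default_rng(0)
ts = pd.Series(
    np.exp(rng.standard_normal(200).cumsum()),
    index=pd.date_range("2000-01-01", periods=200),
)

fig, ax = plt.subplots()
ts.plot(logy=True, ax=ax)

yscale = ax.get_yscale()  # scale name of the Y axis after plotting
xscale = ax.get_xscale()  # the X axis is left untouched by logy
print(yscale, xscale)
```

The companion keywords logx=True and loglog=True apply the same treatment to the X axis or to both axes.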

22.5.3 Plotting on a Secondary Y-axis

To plot data on a secondary y-axis, pass secondary_y=True:
>>> df.A.plot()
>>> df.B.plot(secondary_y=True, style='g')
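The figure now holds two plots that share one X axis: A is scaled against the left Y axis and B against the right Y axis.
To plot only some columns of a DataFrame on the secondary axis, pass their names to secondary_y:
>>> plt.figure()
>>> ax = df.plot(secondary_y=['A', 'B'])  # plot columns A and B against the right Y axis
>>> ax.set_ylabel('CD scale')  # label the left Y axis
>>> ax.right_ax.set_ylabel('AB scale')  # label the right Y axis
df has four columns, A to D. Columns A and B are drawn against the right Y axis, and the Axes returned by df.plot() is kept as ax.
ax.set_ylabel labels the left axis, while ax.right_ax gives access to the right-hand axis.
The legend shows which columns are scaled against the right Y axis: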
—— A(right)
—— B(right)
—— C
—— D
Columns plotted on the secondary y-axis are automatically marked with "(right)" in the legend. To turn off the automatic marking, pass mark_right=False:
>>> df.plot(secondary_y=['A', 'B'], mark_right=False)
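The legend text can be read back to confirm which columns were marked. This is a sketch under assumptions: the data is random, and the helper legend_labels is a hypothetical name introduced here. It looks on both the left and the right axes because the legend may be attached to either.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()


def legend_labels(ax):
    """Return the legend texts, looking on the left axes first, then the right."""
    legend = ax.get_legend()
    if legend is None:
        legend = ax.right_ax.get_legend()
    return [text.get_text() for text in legend.get_texts()]


# Default: secondary columns carry the "(right)" marker
fig, ax = plt.subplots()
df.plot(secondary_y=["A", "B"], ax=ax)
ax.set_ylabel("CD scale")
ax.right_ax.set_ylabel("AB scale")
labels_marked = legend_labels(ax)

# mark_right=False: plain column names only
fig2, ax2 = plt.subplots()
df.plot(secondary_y=["A", "B"], mark_right=False, ax=ax2)
labels_plain = legend_labels(ax2)

print(labels_marked)
print(labels_plain)
```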

22.5.4 Suppressing Tick Resolution Adjustment

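pandas automatically adjusts the tick resolution on the X axis for time series with a regular frequency. In the limited cases where pandas cannot infer the frequency information (for example, in an externally created twinx), you can suppress this behavior for alignment purposes by passing x_compat=True: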
>>> ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000',periods=1000))
>>> df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
>>> df = df.cumsum()
>>> plt.figure()
>>> df.A.plot(x_compat=True)
If more than one plot needs this behavior suppressed, call the use method of pandas.plotting.plot_params in a with statement:
>>> plt.figure()
>>> with pd.plotting.plot_params.use('x_compat', True):  # x_compat=True applies to every plot in the block and is restored on exit
...     df.A.plot(color='r')
...     df.B.plot(color='g')
...     df.C.plot(color='b')

22.5.6 Subplots

Each Series (column) in a DataFrame can be plotted on its own axes by passing subplots=True:
>>> df.plot(subplots=True, figsize=(6, 6))
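With subplots=True the call returns an array of Axes, one per column, which can then be styled individually. A minimal sketch with illustrative random data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

# One axes per column, returned as a 1-D NumPy array
axes = df.plot(subplots=True, figsize=(6, 6))

# Style each subplot separately, here by labelling its Y axis with the column name
for ax, name in zip(axes, df.columns):
    ax.set_ylabel(name)

print(axes.shape)
```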

22.5.7 Using Layout and Targeting Multiple Axes

The layout keyword arranges the subplots in a grid given as (rows, columns). It also works with hist and boxplot (histograms and box plots). Invalid input raises ValueError.
The grid given by layout (rows x columns) must hold at least as many axes as there are subplots. If it can hold more than required, the spare axes are left blank. As with a numpy array's reshape method, you can pass -1 for one dimension to have the number of rows or columns calculated from the other.
>>> df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False)  # 4 plots in a 2x3 grid
>>> df.plot(subplots=True, layout=(2, 2), figsize=(6, 6), sharex=False)  # 4 plots in a 2x2 grid
>>> df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)  # 2 rows; the number of columns is worked out from the number of DataFrame columns
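The three layout rules (automatic dimension, blank spare axes, ValueError for a grid that is too small) can be checked on the returned array of axes. The sketch assumes illustrative random data; the (1, 3) layout is chosen only because it cannot hold four subplots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plots
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

# -1 lets pandas work out the number of columns: 4 subplots in 2 rows
axes_auto = df.plot(subplots=True, layout=(2, -1), sharex=False)

# A 2x3 grid has six slots; the two spare axes are created but hidden
axes_wide = df.plot(subplots=True, layout=(2, 3), sharex=False)
visible = sum(ax.get_visible() for ax in axes_wide.ravel())

# A 1x3 grid is too small for four subplots
try:
    df.plot(subplots=True, layout=(1, 3))
    layout_error = None
except ValueError as exc:
    layout_error = str(exc)

print(axes_auto.shape, axes_wide.shape, visible)
print(layout_error)
```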
You can also pass axes created beforehand, as a list-like, via the ax keyword. This allows more complicated layouts. The number of axes passed must equal the number of subplots being drawn.
When multiple axes are passed via ax, the layout, sharex and sharey keywords do not affect the output. Pass sharex=False and sharey=False explicitly, otherwise you will see a warning.
The example below builds a 4x4 grid of 16 axes, then draws the four columns along the main diagonal and their negated versions along the anti-diagonal:
>>> fig, axes = plt.subplots(4, 4, figsize=(6, 6))
>>> plt.subplots_adjust(wspace=0.5, hspace=0.5)
>>> target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]
>>> target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]
>>> df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False)
>>> (-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False)
Another option is to pass an ax argument to Series.plot() to plot on a particular axis. Here the four plots sit in a 2x2 grid, each with a title in place of a legend, and all drawn in the same line color:
>>> fig, axes = plt.subplots(nrows=2, ncols=2)
>>> df['A'].plot(ax=axes[0,0]); axes[0,0].set_title('A')
>>> df['B'].plot(ax=axes[0,1]); axes[0,1].set_title('B')
>>> df['C'].plot(ax=axes[1,0]); axes[1,0].set_title('C')
>>> df['D'].plot(ax=axes[1,1]); axes[1,1].set_title('D')

22.5.8 Plotting With Error Bars

Horizontal and vertical error bars can be added by passing the xerr and yerr keyword arguments to plot(). The error values can take several forms:
• A DataFrame or dict of errors whose column names (or dict keys) match the columns attribute of the plotted DataFrame, or the name attribute of the Series.
• A str naming the column of the plotted DataFrame that contains the error values.
• Raw values (list, tuple, or np.ndarray), which must have the same length as the plotted DataFrame/Series.
Asymmetrical error bars are also supported, but the errors must then be given as raw values. For a Series of length N, provide a 2xN array holding the lower and upper (or left and right) errors. For a DataFrame, provide an array of shape (number of columns, 2, number of rows).
# Generate the data
>>> ix3 = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], ['foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar']], names=['letter', 'word'])
>>> df3 = pd.DataFrame({'data1': [3, 2, 4, 3, 2, 4, 3, 2], 'data2': [6, 5, 7, 5, 4, 5, 6, 5]}, index=ix3)
# Group by index labels and take the means and standard deviations for each group
>>> gp3 = df3.groupby(level=('letter', 'word'))
>>> means = gp3.mean()
>>> errors = gp3.std()
>>> fig, ax = plt.subplots()  # fresh axes for the bar chart
>>> means.plot.bar(yerr=errors, ax=ax)
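The example above uses symmetric errors taken from a DataFrame. For the asymmetric case, a minimal sketch with a Series and a raw 2xN array follows; the values and names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative values: three bars with different lower and upper errors
s = pd.Series([3.0, 5.0, 4.0], index=["a", "b", "c"])
lower = [0.5, 1.0, 0.2]  # distance from each value down to the lower whisker
upper = [1.0, 0.5, 0.8]  # distance from each value up to the upper whisker

# Row 0 holds the lower errors, row 1 the upper errors: shape (2, N)
asym = np.array([lower, upper])

fig, ax = plt.subplots()
s.plot.bar(yerr=asym, ax=ax, capsize=4)

print(asym.shape)
```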

22.5.9 Plotting Tables

A matplotlib table can be drawn from DataFrame.plot() and Series.plot() with the table keyword, which accepts a bool, DataFrame or Series. The simplest way to draw a table is to pass table=True. The data is transposed to meet matplotlib's default layout.
>>> plt.show()
>>> fig, ax = plt.subplots(1, 1)
>>> df = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
>>> ax.get_xaxis().set_visible(False)  # hide the X-axis ticks
>>> df.plot(table=True, ax=ax)
A table below the plot lists the value of each column at every X position.
>>> fig, ax = plt.subplots(1, 1)
>>> ax.get_xaxis().set_visible(False)
>>> df.plot(table=np.round(df.T, 2), ax=ax)
The table again lists the value of each column at every X position, but this time it is passed explicitly as a transposed two-dimensional table rounded to two decimal places.
Finally, the helper function pandas.plotting.table creates a table from a DataFrame or Series and adds it to a matplotlib Axes. It accepts the same keywords as the matplotlib table.
>>> from pandas.plotting import table
>>> fig, ax = plt.subplots(1, 1)
>>> table(ax, np.round(df.describe(), 2), loc='upper right', colWidths=[0.2, 0.2, 0.2])
>>> df.plot(ax=ax, ylim=(0, 2), legend=None)
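Whether a table was actually attached can be checked through the axes' tables list. The sketch below assumes illustrative random data and uses DataFrame.round in place of np.round, which gives the same rounded table.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])

fig, ax = plt.subplots(1, 1)
ax.get_xaxis().set_visible(False)  # the table takes the place of the X tick labels

# Pass the table explicitly: transposed so that each column becomes a table row
rounded = df.T.round(2)
df.plot(table=rounded, ax=ax)

n_tables = len(ax.tables)  # tables attached to this axes
print(rounded.shape, n_tables)
```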

22.5.10 Colormaps

DataFrame plotting supports the colormap= argument, which accepts either a Matplotlib colormap or the name of a colormap registered with Matplotlib.
Plots often need many line colors, which is what colormap is for. For the functions of the colormap module see http://matplotlib.org/api/cm_api.html; for the colormaps that are available see http://scipy.github.io/old-wiki/pages/Cookbook/Matplotlib/Show_colormaps.
>>> df = pd.DataFrame(np.random.randn(1000, 10), index=range(1,1001))
>>> df = df.cumsum()
>>> plt.figure()
>>> df.plot(colormap='cubehelix')
Alternatives are df.plot(colormap='gist_rainbow') or df.plot(colormap='prism'). If df has many columns, gist_rainbow is a good choice for the line colors.

Colormaps can also be used with other plot types, such as bar charts:
>>> dd = pd.DataFrame(np.random.randn(10, 10)).abs()
>>> dd = dd.cumsum()
>>> plt.figure()
>>> dd.plot.bar(colormap='Greens')
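To see what colormap does, the colors assigned to the lines can be read back: each column receives its own color sampled from the map. A sketch with illustrative random data; plt.get_cmap is used to show that a Colormap object works as well as a name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 5))).cumsum()

# Colormap given by name
ax_named = df.plot(colormap="cubehelix")
colors = [tuple(line.get_color()) for line in ax_named.get_lines()]

# Colormap given as a Matplotlib Colormap object
ax_object = df.plot(colormap=plt.get_cmap("viridis"))

print(len(colors), len(set(colors)))
```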

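Parallel coordinates charts:
>>> from pandas.plotting import parallel_coordinates
>>> # data is not defined in this section: it is assumed to be a DataFrame whose 'Name' column holds the class label (the pandas documentation loads the iris dataset for this)
>>> plt.figure()
>>> parallel_coordinates(data, 'Name', colormap='gist_rainbow')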
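Because data is not defined in the snippet above, here is a self-contained sketch with a small made-up DataFrame. The column names f1 to f3 and the classes x and y are hypothetical stand-ins for real features and labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up data: three numeric features and a class label in 'Name'
data = pd.DataFrame(
    {
        "f1": [1.0, 1.2, 3.0, 3.1],
        "f2": [0.5, 0.7, 2.0, 2.2],
        "f3": [4.0, 3.8, 1.0, 1.1],
        "Name": ["x", "x", "y", "y"],
    }
)

fig, ax = plt.subplots()
# One polyline per row, colored by the value in the 'Name' column
parallel_coordinates(data, "Name", colormap="gist_rainbow", ax=ax)

labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```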

Andrews curves charts:
>>> from pandas.plotting import andrews_curves
>>> # data is the same DataFrame as in the parallel coordinates example, with the class label in 'Name'
>>> plt.figure()
>>> andrews_curves(data, 'Name', colormap='winter')
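An Andrews curve turns each row into a finite Fourier series, f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + ..., evaluated for t between -pi and pi, so rows of the same class tend to produce similar curves. A self-contained sketch with made-up data (the feature and class names are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Made-up data: three numeric features and a class label in 'Name'
data = pd.DataFrame(
    {
        "f1": [1.0, 1.2, 3.0, 3.1],
        "f2": [0.5, 0.7, 2.0, 2.2],
        "f3": [4.0, 3.8, 1.0, 1.1],
        "Name": ["x", "x", "y", "y"],
    }
)

fig, ax = plt.subplots()
# samples sets how many t values each curve is evaluated at
andrews_curves(data, "Name", samples=100, colormap="winter", ax=ax)

curves = ax.get_lines()  # one curve per row of data
print(len(curves), len(curves[0].get_xdata()))
```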

22.6 Plotting directly with matplotlib

>>> price = pd.Series(np.random.randn(150).cumsum(), index=pd.date_range('2000-1-1', periods=150, freq='B'))
>>> ma = price.rolling(20).mean()
>>> mstd = price.rolling(20).std()
>>> plt.figure()
>>> plt.plot(price.index, price, 'k')
>>> plt.plot(ma.index, ma, 'b')
>>> plt.fill_between(mstd.index, ma-2*mstd, ma+2*mstd, color='b', alpha=0.2)
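fill_between shades a band on the plot: the blue line is ma, and the translucent blue band runs from ma-2*mstd to ma+2*mstd on either side of it.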
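One detail worth knowing: a 20-period rolling window has no value for the first 19 points, so the moving average and the band only start at the 20th observation. The sketch below redraws the same figure through the object-oriented API and counts those leading gaps; the random data and the seed are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it to view the plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
price = pd.Series(
    rng.standard_normal(150).cumsum(),
    index=pd.date_range("2000-01-01", periods=150, freq="B"),
)

window = 20
ma = price.rolling(window).mean()   # moving average
mstd = price.rolling(window).std()  # moving standard deviation

fig, ax = plt.subplots()
ax.plot(price.index, price, "k")  # price in black
ax.plot(ma.index, ma, "b")        # moving average in blue
ax.fill_between(mstd.index, ma - 2 * mstd, ma + 2 * mstd, color="b", alpha=0.2)

# The first window - 1 points have no rolling value, so nothing is drawn there
n_missing = int(ma.isna().sum())
print(n_missing)
```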

22.7 Trellis plotting interface

For more plotting examples, see seaborn (https://github.com/mwaskom/seaborn) and the visualization chapter of the pandas documentation (http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html).
