2天学会Pandas

0.导语1.Series2.DataFrame2.1 DataFrame的简单运用3.pandas选择数据3.1 实战筛选3.2 筛选总结4.Pandas设置值4.1 创建数据4.2 根据位置设置loc和iloc4.3 根据条件设置4.4 按行或列设置4.5 添加Series序列(长度必须对齐)4.6 设定某行某列为特定值4.7 修改一整行数据5.Pandas处理丢失数据5.1 创建含NaN的矩阵5.2 删除掉有NaN的行或列5.3 替换NaN值为0或者其他5.4 是否有缺失数据NaN6.Pandas导入导出6.1 导入数据6.2 导出数据7.Pandas合并操作7.1 Pandas合并concat7.2.Pandas 合并 merge7.2.1 定义资料集并打印出7.2.2 依据key column合并,并打印7.2.3 两列合并7.2.4 Indicator设置合并列名称7.2.5 依据index合并7.2.6 解决overlapping的问题8.Pandas plot出图9.学习来源

0.导语

Pandas是基于Numpy构建的，让Numpy为中心的应用变得更加简单。

本文为一篇长文，建议收藏，转发~

1.Series

import pandas as pdimport numpy as np

# Seriess = pd.Series([1,3,6,np.nan,44,1])print(s)'''0     1.01     3.02     6.03     NaN4    44.05     1.0dtype: float64'''# 默认index从0开始,如果想要按照自己的索引设置，则修改index参数,如:index=[3,4,3,7,8,9]

2.DataFrame

2.1 DataFrame的简单运用

# DataFramedates = pd.date_range('2018-08-19',periods=6)# dates = pd.date_range('2018-08-19','2018-08-24') # 起始、结束  与上述等价'''numpy.random.randn(d0, d1, …, dn)是从标准正态分布中返回一个或多个样本值。numpy.random.rand(d0, d1, …, dn)的随机样本位于[0, 1)中。(6,4)表示6行4列数据'''df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=['a','b','c','d'])print(df)# DataFrame既有行索引也有列索引， 它可以被看做由Series组成的大字典。print(df['b'])'''2018-08-19   -0.2174242018-08-20   -1.4210582018-08-21   -0.4245892018-08-22    0.5346752018-08-23   -0.0185252018-08-24    0.635075Freq: D, Name: b, dtype: float64'''# 未指定行标签和列标签的数据df1 = pd.DataFrame(np.arange(12).reshape(3,4))print(df1)# 另一种方式df2 = pd.DataFrame({    'A': [1,2,3,4],    'B': pd.Timestamp('20180819'),    'C': pd.Series([1,6,9,10],dtype='float32'),    'D': np.array([3] * 4,dtype='int32'),    'E': pd.Categorical(['test','train','test','train']),    'F': 'foo'})print(df2)'''   A          B     C  D      E    F0  1 2018-08-19   1.0  3   test  foo1  2 2018-08-19   6.0  3  train  foo2  3 2018-08-19   9.0  3   test  foo3  4 2018-08-19  10.0  3  train  foo'''print(df2.dtypes)'''A             int64B    datetime64[ns]C           float32D             int32E          categoryF            objectdtype: object'''print(df2.index)# RangeIndex(start=0, stop=4, step=1)print(df2.columns)# Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')print(df2.values)'''[[1 Timestamp('2018-08-19 00:00:00') 1.0 3 'test' 'foo'] [2 Timestamp('2018-08-19 00:00:00') 6.0 3 'train' 'foo'] [3 Timestamp('2018-08-19 00:00:00') 9.0 3 'test' 'foo'] [4 Timestamp('2018-08-19 00:00:00') 10.0 3 'train' 'foo']]'''# 数据总结print(df2.describe())'''              A          C    Dcount  4.000000   4.000000  4.0mean   2.500000   6.500000  3.0std    1.290994   4.041452  0.0min    1.000000   1.000000  3.025%    1.750000   4.750000  3.050%    2.500000   7.500000  3.075%    3.250000   9.250000  3.0max    4.000000  10.000000  3.0'''# 翻转数据print(df2.T)# print(np.transpose(df2))等价于上述操作'''axis=1表示行axis=0表示列默认ascending(升序)为Trueascending=True表示升序,ascending=False表示降序下面两行分别表示按行升序与按行降序'''print(df2.sort_index(axis=1,ascending=True))print(df2.sort_index(axis=1,ascending=False))'''   A          B     C  D      E    F0  1 2018-08-19   1.0  3   test  foo1  2 2018-08-19   6.0  3  train  foo2  3 2018-08-19   9.0  3   test  foo3  4 2018-08-19  10.0  3  train  foo     F      E  D     C          B  A0  foo   test  3   1.0 2018-08-19  11  foo  train  3   6.0 2018-08-19  22  foo   test  3   9.0 2018-08-19  33  foo  train  3  10.0 2018-08-19  4'''# 表示按列降序与按列升序print(df2.sort_index(axis=0,ascending=False))print(df2.sort_index(axis=0,ascending=True))'''   A          B     C  D      E    F3  4 2018-08-19  10.0  3  train  foo2  3 2018-08-19   9.0  3   test  foo1  2 2018-08-19   6.0  3  train  foo0  1 2018-08-19   1.0  3   test  foo   A          B     C  D      E    F0  1 2018-08-19   1.0  3   test  foo1  2 2018-08-19   6.0  3  train  foo2  3 2018-08-19   9.0  3   test  foo3  4 2018-08-19  10.0  3  train  foo'''# 对特定列数值排列# 表示对C列降序排列print(df2.sort_values(by='C',ascending=False))

3.pandas选择数据

3.1 实战筛选

import pandas as pdimport numpy as npdates = pd.date_range('20180819', periods=6)df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates, columns=['A','B','C','D'])print(df)# 检索A列print(df['A'])print(df.A)# 选择跨越多行或多列# 选取前3行print(df[0:3])print(df['2018-08-19':'2018-08-21'])'''            A  B   C   D2018-08-19  0  1   2   32018-08-20  4  5   6   72018-08-21  8  9  10  11'''# 根据标签选择数据# 获取特定行或列# 指定行数据print(df.loc['20180819'])'''A    0B    1C    2D    3Name: 2018-08-19 00:00:00, dtype: int32'''# 指定列# 两种方式print(df.loc[:,'A':'B'])print(df.loc[:,['A','B']])'''             A   B2018-08-19   0   12018-08-20   4   52018-08-21   8   92018-08-22  12  132018-08-23  16  172018-08-24  20  21'''# 行与列同时检索print(df.loc['20180819',['A','B']])'''A    0B    1Name: 2018-08-19 00:00:00, dtype: int32'''# 根据序列iloc# 获取特定位置的值print(df.iloc[3,1])print(df.iloc[3:5,1:3]) # 不包含末尾5或3，同列表切片'''             B   C2018-08-22  13  142018-08-23  17  18'''# 跨行操作print(df.iloc[[1,3,5],1:3])'''             B   C2018-08-20   5   62018-08-22  13  142018-08-24  21  22'''# 混合选择print(df.ix[:3,['A','C']])'''            A   C2018-08-19  0   22018-08-20  4   62018-08-21  8  10'''print(df.iloc[:3,[0,2]]) # 结果同上

# 通过判断的筛选print(df[df.A>8])'''             A   B   C   D2018-08-22  12  13  14  152018-08-23  16  17  18  192018-08-24  20  21  22  23'''

3.2 筛选总结

1.iloc与ix区别  总结:相同点：iloc可以取相应的值，操作方便,与ix操作类似。  不同点：ix可以混合选择，可以填入column对应的字符选择，而iloc只能采用index索引，对于列数较多情况下，ix要方便操作许多。2.loc与iloc区别  总结：相同点：都可以索引处块数据  不同点：iloc可以检索对应值,两者操作不同。3.ix与loc、iloc三者的区别  总结：ix是混合loc与iloc操作如下:对比三者操作print(df.loc['20180819','A':'B'])print(df.iloc[0,0:2])print(df.ix[0,'A':'B'])输出结果相同，均为:A    0B    1Name: 2018-08-19 00:00:00, dtype: int32

4.Pandas设置值

4.1 创建数据

import pandas as pdimport numpy as np# 创建数据dates = pd.date_range('20180820',periods=6)df = pd.DataFrame(np.arange(24).reshape(6,4), index=dates, columns=['A','B','C','D'])print(df)'''             A   B   C   D2018-08-20   0   1   2   32018-08-21   4   5   6   72018-08-22   8   9  10  112018-08-23  12  13  14  152018-08-24  16  17  18  192018-08-25  20  21  22  23'''

4.2 根据位置设置loc和iloc

# 根据位置设置loc和ilocdf.iloc[2,2] = 111df.loc['20180820','B'] = 2222print(df)'''             A     B    C   D2018-08-20   0  2222    2   32018-08-21   4     5    6   72018-08-22   8     9  111  112018-08-23  12    13   14  152018-08-24  16    17   18  192018-08-25  20    21   22  23'''

4.3 根据条件设置

# 根据条件设置# 更改B中的数，而更改的位置取决于4的位置，并设相应位置的数为0df.B[df.A>4] = 0print(df)'''             A     B    C   D2018-08-20   0  2222    2   32018-08-21   4     5    6   72018-08-22   8     0  111  112018-08-23  12     0   14  152018-08-24  16     0   18  192018-08-25  20     0   22  23'''

4.4 按行或列设置

# 按行或列设置# 列批处理，F列全改为NaNdf['F'] = np.nanprint(df)

4.5 添加Series序列(长度必须对齐)

df['E'] = pd.Series([1,2,3,4,5,6], index=pd.date_range('20180820',periods=6))print(df)

4.6 设定某行某列为特定值

# 设定某行某列为特定值df.ix['20180820','A'] = 56df.loc['20180820','A'] = 67df.iloc[0,0] = 76

4.7 修改一整行数据

# 修改一整行数据df.iloc[1] = np.nan # df.iloc[1,:]=np.nandf.loc['20180820'] = np.nan # df.loc['20180820,:']=np.nandf.ix[2] = np.nan # df.ix[2,:]=np.nandf.ix['20180823'] = np.nanprint(df)

5.Pandas处理丢失数据

5.1 创建含NaN的矩阵

# Pandas处理丢失数据import pandas as pdimport numpy as np# 创建含NaN的矩阵# 如何填充和删除NaN数据?dates = pd.date_range('20180820',periods=6)df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['A','B','C','D']) # a.reshape(6,4)等价于a.reshape((6,4))df.iloc[0,1] = np.nandf.iloc[1,2] = np.nanprint(df)'''             A     B     C   D2018-08-20   0   NaN   2.0   32018-08-21   4   5.0   NaN   72018-08-22   8   9.0  10.0  112018-08-23  12  13.0  14.0  152018-08-24  16  17.0  18.0  192018-08-25  20  21.0  22.0  23'''

5.2 删除掉有NaN的行或列

# 删除掉有NaN的行或列print(df.dropna()) # 默认是删除掉含有NaN的行print(df.dropna(    axis=0, # 0对行进行操作;1对列进行操作    how='any' # 'any':只要存在NaN就drop掉；'all':必须全部是NaN才drop))'''             A     B     C   D2018-08-22   8   9.0  10.0  112018-08-23  12  13.0  14.0  152018-08-24  16  17.0  18.0  192018-08-25  20  21.0  22.0  23'''# 删除掉所有含有NaN的列print(df.dropna(    axis=1,    how='any'))'''             A   D2018-08-20   0   32018-08-21   4   72018-08-22   8  112018-08-23  12  152018-08-24  16  192018-08-25  20  23'''

5.3 替换NaN值为0或者其他

# 替换NaN值为0或者其他print(df.fillna(value=0))'''             A     B     C   D2018-08-20   0   0.0   2.0   32018-08-21   4   5.0   0.0   72018-08-22   8   9.0  10.0  112018-08-23  12  13.0  14.0  152018-08-24  16  17.0  18.0  192018-08-25  20  21.0  22.0  23'''

5.4 是否有缺失数据NaN

# 是否有缺失数据NaN# 是否为空print(df.isnull())'''                A      B      C      D2018-08-20  False   True  False  False2018-08-21  False  False   True  False2018-08-22  False  False  False  False2018-08-23  False  False  False  False2018-08-24  False  False  False  False2018-08-25  False  False  False  False'''# 是否为NaNprint(df.isna())'''                A      B      C      D2018-08-20  False   True  False  False2018-08-21  False  False   True  False2018-08-22  False  False  False  False2018-08-23  False  False  False  False2018-08-24  False  False  False  False2018-08-25  False  False  False  False'''# 检测某列是否有缺失数据NaNprint(df.isnull().any())'''A    FalseB     TrueC     TrueD    Falsedtype: bool'''# 检测数据中是否存在NaN,如果存在就返回Trueprint(np.any(df.isnull())==True)

6.Pandas导入导出

6.1 导入数据

import pandas as pd # 加载模块# 读取csvdata = pd.read_csv('student.csv')# 打印出dataprint(data)# 前三行print(data.head(3))# 后三行print(data.tail(3))

6.2 导出数据

# 将资料存取成pickledata.to_pickle('student.pickle')# 读取pickle文件并打印print(pd.read_pickle('student.pickle'))

7.Pandas合并操作

7.1 Pandas合并concat

import pandas as pdimport numpy as np

# 定义资料集df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d'])print(df1)'''     a    b    c    d0  0.0  0.0  0.0  0.01  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.0'''print(df2)'''     a    b    c    d0  1.0  1.0  1.0  1.01  1.0  1.0  1.0  1.02  1.0  1.0  1.0  1.0'''print(df3)'''     a    b    c    d0  2.0  2.0  2.0  2.01  2.0  2.0  2.0  2.02  2.0  2.0  2.0  2.0'''# concat纵向合并res = pd.concat([df1,df2,df3],axis=0)

# 打印结果print(res)'''     a    b    c    d0  0.0  0.0  0.0  0.01  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.00  1.0  1.0  1.0  1.01  1.0  1.0  1.0  1.02  1.0  1.0  1.0  1.00  2.0  2.0  2.0  2.01  2.0  2.0  2.0  2.02  2.0  2.0  2.0  2.0'''# 上述合并过程中，index重复，下面给出重置index方法# 只需要将index_ignore设定为True即可res = pd.concat([df1,df2,df3],axis=0,ignore_index=True)

# 打印结果print(res)'''     a    b    c    d0  0.0  0.0  0.0  0.01  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.03  1.0  1.0  1.0  1.04  1.0  1.0  1.0  1.05  1.0  1.0  1.0  1.06  2.0  2.0  2.0  2.07  2.0  2.0  2.0  2.08  2.0  2.0  2.0  2.0'''# join 合并方式#定义资料集df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3])df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4])print(df1)print(df2)'''join='outer',函数默认为join='outer'。此方法是依照column来做纵向合并，有相同的column上下合并在一起，其他独自的column各自成列，原来没有值的位置皆为NaN填充。'''# 纵向"外"合并df1与df2res = pd.concat([df1,df2],axis=0,join='outer')

print(res)

'''     a    b    c    d    e1  0.0  0.0  0.0  0.0  NaN2  0.0  0.0  0.0  0.0  NaN3  0.0  0.0  0.0  0.0  NaN2  NaN  1.0  1.0  1.0  1.03  NaN  1.0  1.0  1.0  1.04  NaN  1.0  1.0  1.0  1.0'''# 修改indexres = pd.concat([df1,df2],axis=0,join='outer',ignore_index=True)

print(res)'''     a    b    c    d    e0  0.0  0.0  0.0  0.0  NaN1  0.0  0.0  0.0  0.0  NaN2  0.0  0.0  0.0  0.0  NaN3  NaN  1.0  1.0  1.0  1.04  NaN  1.0  1.0  1.0  1.05  NaN  1.0  1.0  1.0  1.0'''# join='inner'合并相同的字段# 纵向"内"合并df1与df2res = pd.concat([df1,df2],axis=0,join='inner')# 打印结果print(res)'''     b    c    d1  0.0  0.0  0.02  0.0  0.0  0.03  0.0  0.0  0.02  1.0  1.0  1.03  1.0  1.0  1.04  1.0  1.0  1.0'''# join_axes(依照axes合并)#定义资料集df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3])df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4])print(df1)'''     a    b    c    d1  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.03  0.0  0.0  0.0  0.0'''print(df2)'''     b    c    d    e2  1.0  1.0  1.0  1.03  1.0  1.0  1.0  1.04  1.0  1.0  1.0  1.0'''# 依照df1.index进行横向合并res = pd.concat([df1,df2],axis=1,join_axes=[df1.index])print(res)'''     a    b    c    d    b    c    d    e1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.03  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0'''# 移除join_axes参数,打印结果res = pd.concat([df1,df2],axis=1)print(res)'''     a    b    c    d    b    c    d    e1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.03  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0'''# append(添加数据)# append只有纵向合并，没有横向合并#定义资料集df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d'])s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])# 将df2合并到df1下面,以及重置index,并打印出结果res = df1.append(df2,ignore_index=True)print(res)'''     a    b    c    d0  0.0  0.0  0.0  0.01  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.03  1.0  1.0  1.0  1.04  1.0  1.0  1.0  1.05  1.0  1.0  1.0  1.0'''# 合并多个df,将df2与df3合并至df1的下面,以及重置index,并打印出结果res = df1.append([df2,df3], ignore_index=True)print(res)'''     a    b    c    d0  0.0  0.0  0.0  0.01  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.03  1.0  1.0  1.0  1.04  1.0  1.0  1.0  1.05  1.0  1.0  1.0  1.06  2.0  2.0  2.0  2.07  2.0  2.0  2.0  2.08  2.0  2.0  2.0  2.0'''# 合并series,将s1合并至df1，以及重置index，并打印结果res = df1.append(s1,ignore_index=True)print(res)'''     a    b    c    d0  0.0  0.0  0.0  0.01  0.0  0.0  0.0  0.02  0.0  0.0  0.0  0.03  1.0  2.0  3.0  4.0'''# 总结:两种常用合并方式res = pd.concat([df1, df2, df3], axis=0, ignore_index=True)res1 = df1.append([df2, df3], ignore_index=True)print(res)print(res1)

7.2.Pandas 合并 merge

7.2.1 定义资料集并打印出

import pandas as pd# 依据一组key合并# 定义资料集并打印出left = pd.DataFrame({'key' : ['K0','K1','K2','K3'],                     'A' : ['A0','A1','A2','A3'],                     'B' : ['B0','B1','B2','B3']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],                      'C' : ['C0', 'C1', 'C2', 'C3'],                      'D' : ['D0', 'D1', 'D2', 'D3']})print(left)'''    A   B key0  A0  B0  K01  A1  B1  K12  A2  B2  K23  A3  B3  K3'''print(right)'''    C   D key0  C0  D0  K01  C1  D1  K12  C2  D2  K23  C3  D3  K3'''

7.2.2 依据key column合并,并打印

# 依据key column合并,并打印res = pd.merge(left,right,on='key')print(res)'''    A   B key   C   D0  A0  B0  K0  C0  D01  A1  B1  K1  C1  D12  A2  B2  K2  C2  D23  A3  B3  K3  C3  D3'''# 依据两组key合并#定义资料集并打印出left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],                      'key2': ['K0', 'K1', 'K0', 'K1'],                      'A': ['A0', 'A1', 'A2', 'A3'],                      'B': ['B0', 'B1', 'B2', 'B3']})right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],                       'key2': ['K0', 'K0', 'K0', 'K0'],                       'C': ['C0', 'C1', 'C2', 'C3'],                       'D': ['D0', 'D1', 'D2', 'D3']})print(left)'''    A   B key1 key20  A0  B0   K0   K01  A1  B1   K0   K12  A2  B2   K1   K03  A3  B3   K2   K1'''print(right)'''    C   D key1 key20  C0  D0   K0   K01  C1  D1   K1   K02  C2  D2   K1   K03  C3  D3   K2   K0'''

7.2.3 两列合并

依据key1与key2 columns进行合并

# 依据key1与key2 columns进行合并，并打印出四种结果['left', 'right', 'outer', 'inner']res = pd.merge(left, right, on=['key1', 'key2'], how='inner')print(res)res = pd.merge(left, right, on=['key1', 'key2'], how='outer')print(res)res = pd.merge(left, right, on=['key1', 'key2'], how='left')print(res)res = pd.merge(left, right, on=['key1', 'key2'], how='right')print(res)'''---------------inner方式---------------    A   B key1 key2   C   D0  A0  B0   K0   K0  C0  D01  A2  B2   K1   K0  C1  D12  A2  B2   K1   K0  C2  D2---------------outer方式---------------     A    B key1 key2    C    D0   A0   B0   K0   K0   C0   D01   A1   B1   K0   K1  NaN  NaN2   A2   B2   K1   K0   C1   D13   A2   B2   K1   K0   C2   D24   A3   B3   K2   K1  NaN  NaN5  NaN  NaN   K2   K0   C3   D3---------------left方式---------------    A   B key1 key2    C    D0  A0  B0   K0   K0   C0   D01  A1  B1   K0   K1  NaN  NaN2  A2  B2   K1   K0   C1   D13  A2  B2   K1   K0   C2   D24  A3  B3   K2   K1  NaN  NaN--------------right方式---------------     A    B key1 key2   C   D0   A0   B0   K0   K0  C0  D01   A2   B2   K1   K0  C1  D12   A2   B2   K1   K0  C2  D23  NaN  NaN   K2   K0  C3  D3'''

7.2.4 Indicator设置合并列名称

# Indicatordf1 = pd.DataFrame({'col1':[0,1],'col_left':['a','b']})df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})print(df1)'''   col1 col_left0     0        a1     1        b'''print(df2)'''   col1  col_right0     1          21     2          22     2          2'''

# 依据col1进行合并,并启用indicator=True,最后打印res = pd.merge(df1,df2,on='col1',how='outer',indicator=True)print(res)'''   col1 col_left  col_right      _merge0     0        a        NaN   left_only1     1        b        2.0        both2     2      NaN        2.0  right_only3     2      NaN        2.0  right_only'''# 自定义indicator column的名称,并打印出res = pd.merge(df1,df2,on='col1',how='outer',indicator='indicator_column')print(res)'''   col1 col_left  col_right indicator_column0     0        a        NaN        left_only1     1        b        2.0             both2     2      NaN        2.0       right_only3     2      NaN        2.0       right_only'''

7.2.5 依据index合并

# 依据index合并#定义资料集并打印出left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],                     'B': ['B0', 'B1', 'B2']},                     index=['K0', 'K1', 'K2'])right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],                      'D': ['D0', 'D2', 'D3']},                     index=['K0', 'K2', 'K3'])print(left)'''     A   BK0  A0  B0K1  A1  B1K2  A2  B2'''print(right)'''     C   DK0  C0  D0K2  C2  D2K3  C3  D3'''# 依据左右资料集的index进行合并,how='outer',并打印res = pd.merge(left,right,left_index=True,right_index=True,how='outer')print(res)'''      A    B    C    DK0   A0   B0   C0   D0K1   A1   B1  NaN  NaNK2   A2   B2   C2   D2K3  NaN  NaN   C3   D3'''# 依据左右资料集的index进行合并,how='inner',并打印res = pd.merge(left,right,left_index=True,right_index=True,how='inner')print(res)'''     A   B   C   DK0  A0  B0  C0  D0K2  A2  B2  C2  D2'''

7.2.6 解决overlapping的问题

# 解决overlapping的问题#定义资料集boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'age': [1, 2, 3]})girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'age': [4, 5, 6]})print(boys)'''   age   k0    1  K01    2  K12    3  K2'''print(girls)'''   age   k0    4  K01    5  K02    6  K3'''# 使用suffixes解决overlapping的问题# 比如将上面两个合并时,age重复了,则可通过suffixes设置,以此保证不重复,不同名res = pd.merge(boys,girls,on='k',suffixes=['_boy','_girl'],how='inner')print(res)'''   age_boy   k  age_girl0        1  K0         41        1  K0         5'''

8.Pandas plot出图

import pandas as pdimport numpy as npimport matplotlib.pyplot as plt

data = pd.Series(np.random.randn(1000), index=np.arange(1000))print(data)print(data.cumsum())# data本来就是一个数据，所以我们可以直接plotdata.plot()plt.show()

# np.random.randn(1000,4) 随机生成1000行4列数据# list("ABCD")会变为['A','B','C','D']data = pd.DataFrame(    np.random.randn(1000,4),    index=np.arange(1000),    columns=list("ABCD"))data.cumsum()data.plot()plt.show()

ax = data.plot.scatter(x='A',y='B',color='DarkBlue',label='Class1')# 将之下这个 data 画在上一个 ax 上面data.plot.scatter(x='A',y='C',color='LightGreen',label='Class2',ax=ax)plt.show()

9.学习来源

1.https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/
2.https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/

pandas 中文打印无法对齐_2天学会Pandas相关推荐

Pandas基本操作指南-2天学会pandas
作者:光城Pandas 是基于NumPy 的一种工具,该工具是为了解决数据分析任务而创建的.Pandas 纳入了大量库和一些标准的数据模型,提供了高效地操作大型数据集所需的工具.pandas提供了大量 ...
10分钟上手pythonpandas_【译】10分钟学会Pandas
十分钟学会Pandas 这是关于Pandas的简短介绍主要面向新用户.你可以参考Cookbook了解更复杂的使用方法习惯上,我们这样导入: In [1]: importpandas as pd In ...
十分钟学python-【译】10分钟学会Pandas
十分钟学会Pandas 这是关于Pandas的简短介绍主要面向新用户.你可以参考Cookbook了解更复杂的使用方法习惯上,我们这样导入: In [1]: importpandas as pd In ...
Pandas中文教程
导航索引模块 | 下一个 | pandas 0.19.2 documentation» 目录新功能安装为pandas贡献常见问题(FAQ) 套装概述 10分钟入门pandas 教程食谱 ...
Python 2x 中list 里面的中文打印效果乱码
事情是这样的本来是处理python2x 中list 里面的中文打印为unicode 想处理下打印为中文,处理之后打印的效果中文乱码了代码如下 #!/usr/bin/python # -*- cod ...
斑马打印机-中文打印
中文打印通常有两种方式 1.使用字体库,代价高,使用简单速度快 2.通过图片方式打印,免费使用,速度适中官方有下载Fnthex32.dll https://download.csdn.n ...
十分钟学会pandas《10 Minutes to pandas》
pandas官方网站上的<10 Minutes to pandas>点这里查看,讲解浅显易懂,本文在官网的基础上作了补充.详细的介绍请参考:Cookbook .pandas是非常强大的数据 ...
Zebra中文打印助手
目前使用最多的条码码打印机应该要属Zebra打印机了,很多人在编写标签打印程序的时候有遇到打印中文或非打印机内容语言的问题.官方给出解决方案是使用汉卡来解决.这种方法的优点是打印速度很快,但缺点就是字 ...
解决命令行程序中文英文不对齐的情况
解决命令行程序中文英文不对齐的情况问题描述我用java写了个命令行的工具程序,在程序中我会输出一些提示语句,这些输出语句中有中文,但是奇怪的是中文老是对不齐,强迫症的我看起来很不舒服. 原因 ...

pandas 中文打印无法对齐_2天学会Pandas