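1. Common DataFrame attributes, functions, and indexing methods

1.1 DataFrame overview

A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, etc.). A DataFrame has both a row index and a column index; it can be thought of as a dict of Series that all share the same index. A column can be retrieved as a Series with dict-like notation (frame['col']) or as an attribute (frame.col). The attribute form only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. Rows can be retrieved by position (iloc) or by label (loc).

Assigning to a column that does not exist creates a new column.

>>> del frame['xxx']  # delete a column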
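Below is a minimal sketch of the column operations described above. It assumes a small hypothetical frame whose column names `a`, `b` and `c` are placeholders. Dict-style and attribute access return the same Series, assigning to a missing column creates it, and `del` removes it.

```python
import pandas as pd

# Hypothetical example frame; the column names are placeholders
frame = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]},
                     index=['x', 'y', 'z'])

col_dict = frame['a']  # dict-style access returns a Series
col_attr = frame.a     # attribute access returns an equal Series
print(col_dict.equals(col_attr))  # True

# Assigning to a column that does not exist yet creates it
frame['c'] = frame['a'] * 10
cols_after_add = list(frame.columns)
print(cols_after_add)  # ['a', 'b', 'c']

# Rows: by label with loc, by integer position with iloc
row_by_label = frame.loc['y']
row_by_pos = frame.iloc[1]

# del removes a column in place
del frame['c']
print(list(frame.columns))  # ['a', 'b']
```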

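1.2 Common DataFrame attributes

Attribute	Description
values	The DataFrame's data as a two-dimensional NumPy array
index	Row index
index.name	Name of the row index
columns	Column index
columns.name	Name of the column index
ix	Select rows by label or position; a single label returns the row as a Series (removed in pandas 1.0: use loc for labels, iloc for positions)
ix[[x,y,...], [x,y,...]]	Select rows, then columns (same replacement: loc / iloc)
T	Transpose rows and columns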

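1.3 Common DataFrame functions

1.3.1 Function	Description
DataFrame(data, index=[...], columns=[...])	Construct a DataFrame; index and columns optionally supply the row and column labels
DataFrame(2-D ndarray)	A matrix of data; row and column labels can also be passed
DataFrame(dict of arrays, lists, or tuples)	Each sequence becomes one column; all sequences must have the same length
DataFrame(NumPy structured/record array)	Treated like a dict of arrays
DataFrame(dict of Series)	Each Series becomes a column; if no index is passed explicitly, the Series indexes are unioned to form the row index
DataFrame(dict of dicts)	Each inner dict becomes a column; the inner keys are unioned to form the row index
DataFrame(list of dicts or Series)	Each item becomes a row; the union of the dict keys or Series indexes becomes the column labels
DataFrame(list of lists or tuples)	Treated like a 2-D ndarray
DataFrame(DataFrame)	Reuses the input DataFrame's indexes unless others are passed
DataFrame(NumPy MaskedArray)	Like a 2-D ndarray, but masked values become NA (missing) in the result
df.reindex([x,y,...], fill_value=np.nan)	Return a new object conformed to the new index; labels absent from the old index get fill_value
df.reindex([x,y,...], method=None, limit=None)	Same, but fill new labels using method ('ffill', 'bfill' or 'nearest'); limit caps the number of consecutive fills
df.reindex(index=[x,y,...], columns=[x,y,...])	Reindex rows and columns at once, returning a new object
df.drop(labels, axis=0)	Drop the given labels from the given axis (axis=0 rows, axis=1 columns); returns a new object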
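The `reindex` variants in the table can be illustrated with a sketch like the following. The frame and its gapped integer index are hypothetical. The sketch shows `fill_value` for new labels, `method='ffill'` (which requires a monotonic index) and reindexing both axes at once.

```python
import pandas as pd

# Hypothetical data with gaps in its integer index
df = pd.DataFrame({'v': [1.0, 2.0, 3.0]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4]

# Labels 1 and 3 are new, so they receive fill_value instead of NaN
filled = df.reindex(new_index, fill_value=0.0)

# method='ffill' copies the last valid row forward (the original index must be monotonic)
ffilled = df.reindex(new_index, method='ffill')

# Reindex rows and columns together; the new column 'w' is filled with NaN
both = df.reindex(index=[4, 2], columns=['v', 'w'])

# The original frame is untouched: reindex always returns a new object
print(df.shape)  # (3, 1)
```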

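1.3.2 Sorting function	Description
df.sort_index(axis=0, ascending=True)	Sort by the row index (axis=0) or by the column labels (axis=1)
df.sort_values(by=[a,b,...], ascending=True)	Sort by the values of one or more columns; this replaces the old df.sort_index(by=[a,b,...]), which has been removed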
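`sort_index(by=...)` no longer exists, so multi-column sorting goes through `sort_values`. The sketch below uses hypothetical data in which two rows tie on `grade`, so the second key breaks the tie.

```python
import pandas as pd

# Hypothetical data: rows 'b' and 'a' share grade 7
df = pd.DataFrame({'grade': [7, 6, 7], 'age': [16, 18, 19]},
                  index=['b', 'c', 'a'])

by_rows = df.sort_index()        # rows ordered by label: a, b, c
by_cols = df.sort_index(axis=1)  # columns ordered by label: age, grade

# Grade descending, ties broken by age ascending
by_values = df.sort_values(by=['grade', 'age'], ascending=[False, True])
print(by_values.index.tolist())  # ['b', 'a', 'c']
```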

1.3.3 Summary statistics function	Description
df.count()	Number of non-NaN values per column
df.describe()	Several summary statistics at once
df.min()	Minimum
df.max()	Maximum
df.idxmax(axis=0, skipna=True)	Series of the index labels where each maximum occurs
df.idxmin(axis=0, skipna=True)	Series of the index labels where each minimum occurs
df.quantile(q=0.5, axis=0)	Sample quantile(s); the default q=0.5 gives the median
df.sum(axis=0, skipna=True)	Series of sums
df.mean(axis=0, skipna=True)	Series of means
df.median(axis=0, skipna=True)	Series of medians
df.mad(axis=0, skipna=True)	Series of mean absolute deviations from the mean (removed in pandas 2.0)
df.var(axis=0, skipna=True)	Series of sample variances
df.std(axis=0, skipna=True)	Series of sample standard deviations
df.skew(axis=0, skipna=True)	Sample skewness (third standardized moment)
df.kurt(axis=0, skipna=True)	Sample excess kurtosis (fourth standardized moment)
df.cumsum(axis=0, skipna=True)	Cumulative sum
df.cummin(axis=0, skipna=True)	Cumulative minimum
df.cummax(axis=0, skipna=True)	Cumulative maximum
df.cumprod(axis=0, skipna=True)	Cumulative product
df.diff(periods=1, axis=0)	First difference
df.pct_change(periods=1)	Percent change
Older pandas versions also accepted a level= argument on these reductions. It was removed in pandas 2.0 in favor of df.groupby(level=...).sum() and similar calls.
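`df.mad()` was removed in pandas 2.0. Taking the meaning given in the table (the mean of |value - mean|), it can be reproduced with ordinary arithmetic. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 6.0], 'y': [2.0, 2.0, 2.0, 2.0]})

# Per column: df.mean() is aligned on the column labels
mad_cols = (df - df.mean()).abs().mean()

# Per row: subtract each row's mean, aligned on the row index (axis=0)
mad_rows = df.sub(df.mean(axis=1), axis=0).abs().mean(axis=1)

print(mad_cols['x'])  # 1.5
```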


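1.3.4 Arithmetic function	Description
df.add(df2, axis='columns', fill_value=None)	Element-wise addition; during alignment a value present in only one operand is paired with fill_value (the result stays NaN if both are missing)
df.sub(df2, axis='columns', fill_value=None)	Element-wise subtraction, with the same fill_value rule
df.div(df2, axis='columns', fill_value=None)	Element-wise division, with the same fill_value rule
df.mul(df2, axis='columns', fill_value=None)	Element-wise multiplication, with the same fill_value rule
df.apply(f, axis=0)	Apply f to each column (axis=0) or each row (axis=1), passed in as a 1-D Series
df.applymap(f)	Apply f to every individual element (renamed df.map in pandas 2.1; applymap is deprecated)
df.cumsum(axis=0, skipna=True)	Cumulative sum; returns a DataFrame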
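Here is a sketch of how `fill_value` interacts with alignment. It uses two hypothetical frames that share only part of their row and column labels.

```python
import pandas as pd

# Hypothetical frames: df2 lacks row r2 and column b, df1 lacks row r3
df1 = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]}, index=['r1', 'r2'])
df2 = pd.DataFrame({'a': [10.0, 20.0]}, index=['r1', 'r3'])

# Plain operator: any label missing on either side gives NaN
plain = df1 + df2

# add with fill_value: a value missing on one side is treated as 0,
# but a cell missing on both sides (r3, b) stays NaN
filled = df1.add(df2, fill_value=0)
print(filled.loc['r2', 'a'])  # 2.0
```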

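1.4 DataFrame indexing methods

Indexing	Description
df[val]	Select a single column or a list of columns
df.ix[val]	Select a single row or a set of rows (removed in pandas 1.0: use df.loc[val] for labels or df.iloc[val] for positions)
df.ix[:, val]	Select a single column or a subset of columns (now df.loc[:, val] or df.iloc[:, val])
df.ix[val1, val2]	Select rows and columns at the same time (now df.loc[val1, val2] or df.iloc[val1, val2])
reindex method	Conform one or more axes to a new index
xs method	Select a single row or column by label, returning a Series
icol, irow methods	Select a single column or row by integer position, returning a Series (removed; use df.iloc[:, i] or df.iloc[i])
get_value, set_value	Get or set a single value by row and column label (removed in pandas 1.0; use df.at[row, col], or df.iat[i, j] for positions)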
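The removed `icol`/`irow` calls and the label-based `ix` calls map onto `iloc`, `loc` and `xs` roughly as shown below. This is a sketch on a hypothetical two-row frame, not an exhaustive migration guide.

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({'Name': ['Tom', 'Kim'], 'Age': [18, 16]},
                  index=['No.1', 'No.2'])

col_by_pos = df.iloc[:, 1]          # replaces df.icol(1): the Age column as a Series
row_by_pos = df.iloc[1]             # replaces df.irow(1): row No.2 as a Series
block = df.loc[['No.2'], ['Name']]  # label-based df.ix[...] becomes df.loc[...]
row_xs = df.xs('No.1')              # xs selects a row by label (axis=0 is the default)
```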

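Arithmetic: by default, an arithmetic operation between a DataFrame and a Series matches the Series index against the DataFrame's columns and broadcasts the operation down the rows. If a label appears in only one of the two, the result is reindexed to the union of the labels and the unmatched positions become NaN.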
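The broadcasting rule above can be checked with a short sketch. The frame and the Series `s` and `s2` are hypothetical. The sketch also shows `sub(..., axis=0)`, which matches a Series against the row index instead of the columns.

```python
import pandas as pd

# Hypothetical frame with columns a and b
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# The Series index (a, b) matches df's columns, so s is subtracted from every row
s = pd.Series({'a': 1, 'b': 10})
diff = df - s

# s2 shares only 'a' with df: the result has the union of labels, NaN where unmatched
s2 = pd.Series({'a': 1, 'c': 5})
union = df - s2

# To match a Series against the row index instead, use the method form with axis=0
row_wise = df.sub(df['a'], axis=0)
```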

2. Example: common DataFrame attributes

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
A DataFrame is a tabular data structure containing an ordered collection of columns,
each of which can hold a different value type (numeric, string, boolean, etc.).
A DataFrame has both a row index and a column index; it can be thought of as a dict
of Series that all share the same index.
A column can be retrieved as a Series with dict-like notation (df['col']) or as an attribute (df.col).
Rows can be retrieved by position (iloc) or by label (loc).
Common DataFrame attributes
Attribute     Description
values        The DataFrame's data as a NumPy array
index         Row index
index.name    Name of the row index
columns       Column index
columns.name  Name of the column index
loc / iloc    Select rows (and columns) by label / by position; they replace ix, removed in pandas 1.0
T             Transpose rows and columns
"""
import pandas as pd

if __name__ == '__main__':
    # Keys are listed alphabetically so the column order matches the printed output
    # (pandas >= 0.23 keeps the dict's key order for the columns)
    data = {'Age': [18, 16, 19],
            'Height': [1.6, 1.5, 1.7],
            'Name': ['Tom', 'Kim', 'Andy']}
    ind = ['No.1', 'No.2', 'No.3']
    df = pd.DataFrame(data, index=ind)
    # Age Height Name
    # No.1 18 1.6 Tom
    # No.2 16 1.5 Kim
    # No.3 19 1.7 Andy

    # The DataFrame's values
    v = df.values  # <class 'numpy.ndarray'>, dtype=object because the column types are mixed
    # [[18 1.6 'Tom']
    #  [16 1.5 'Kim']
    #  [19 1.7 'Andy']]

    # Row index; without a user-supplied index, pandas uses the default integer index 0, 1, 2, ...
    ind = df.index  # <class 'pandas.core.indexes.base.Index'>
    # Index(['No.1', 'No.2', 'No.3'], dtype='object')
    # Name of the row index: None until it is set
    iname = df.index.name
    # None
    # Set the name of the row index, then read it back
    df.index.name = 'StudentID'
    iname = df.index.name
    # StudentID

    # Column index
    col = df.columns  # <class 'pandas.core.indexes.base.Index'>
    # Index(['Age', 'Height', 'Name'], dtype='object')
    # Name of the column index: None until it is set
    cname = df.columns.name
    # None
    # Set the name of the column index, then read it back
    df.columns.name = 'StudentInfo'
    cname = df.columns.name
    # StudentInfo

    # iloc[i] returns one row as a Series (formerly df.ix[i])
    ret = df.iloc[0]  # first row, <class 'pandas.core.series.Series'>
    # StudentInfo
    # Age 18
    # Height 1.6
    # Name Tom
    # Name: No.1, dtype: object
    ret = df.iloc[1]  # second row
    # StudentInfo
    # Age 16
    # Height 1.5
    # Name Kim
    # Name: No.2, dtype: object
    ret = df.iloc[-1]  # last row
    # StudentInfo
    # Age 19
    # Height 1.7
    # Name Andy
    # Name: No.3, dtype: object

    # iloc[[rowx, rowy, ...]] selects several rows and returns a DataFrame
    ret = df.iloc[[0, 2]]
    # StudentInfo Age Height Name
    # StudentID
    # No.1 18 1.6 Tom
    # No.3 19 1.7 Andy

    # iloc[[rowx, rowy, ...], [colx, coly, ...]] selects rows, then columns
    ret = df.iloc[[0, 2], [0, 1]]
    # StudentInfo Age Height
    # StudentID
    # No.1 18 1.6
    # No.3 19 1.7

    # T transposes rows and columns
    print('Before transpose:\n', df)
    # Before transpose:
    # StudentInfo Age Height Name
    # StudentID
    # No.1 18 1.6 Tom
    # No.2 16 1.5 Kim
    # No.3 19 1.7 Andy
    print('values before transpose:\n', df.values)
    # values before transpose:
    # [[18 1.6 'Tom']
    #  [16 1.5 'Kim']
    #  [19 1.7 'Andy']]
    dfT = df.T
    print('After transpose:\n', dfT)
    # After transpose:
    # StudentID No.1 No.2 No.3
    # StudentInfo
    # Age 18 16 19
    # Height 1.6 1.5 1.7
    # Name Tom Kim Andy
    print('values after transpose:\n', dfT.values)
    # values after transpose:
    # [[18 16 19]
    #  [1.6 1.5 1.7]
    #  ['Tom' 'Kim' 'Andy']]
    # Transposing also swaps the names of the row and column indexes
    print('index.name before transpose:', df.index.name)
    # StudentID
    print('index.name after transpose:', dfT.index.name)
    # StudentInfo
    print('columns.name before transpose:', df.columns.name)
    # StudentInfo
    print('columns.name after transpose:', dfT.columns.name)
    # StudentID

3. Common DataFrame functions: DataFrame()/reindex()/drop()

def DataFrame_manual():
    '''
    A DataFrame is a table-like structure, similar to a database table, with both a row index and a column index.
    It can be viewed as a dict of Series that share the same index.
    Internally it is stored as two-dimensional and one-dimensional blocks of data.
    '''
    import pandas as pd

    # 1. Creating a DataFrame
    # 1.1 From a dict of equal-length lists or NumPy arrays
    # Keys are listed alphabetically; pandas >= 0.23 keeps the dict's key order for the columns
    data = {'Age': [18, 16, 19],
            'Height': [1.6, 1.5, 1.7],
            'Name': ['Tom', 'Kim', 'Andy']}
    # Default index [0, 1, 2, ...] and default column order
    df = pd.DataFrame(data)
    # Age Height Name
    # 0 18 1.6 Tom
    # 1 16 1.5 Kim
    # 2 19 1.7 Andy
    # Specify the column order
    df = pd.DataFrame(data, columns=['Name', 'Age', 'Height'])
    # Name Age Height
    # 0 Tom 18 1.6
    # 1 Kim 16 1.5
    # 2 Andy 19 1.7
    # Specify the row index
    df = pd.DataFrame(data, index=['1st', '2nd', '3rd'])
    # Age Height Name
    # 1st 18 1.6 Tom
    # 2nd 16 1.5 Kim
    # 3rd 19 1.7 Andy

    # 1.2 From a nested dict
    # The outer keys become the column names and the inner keys become the row labels
    # (the row index is the union of all inner keys)
    data = {'Age': {'1st': 18, '2nd': 16, '3rd': 19},
            'Height': {'1st': 1.6, '2nd': 1.5, '3rd': 1.7},
            'Name': {'1st': 'Tom', '2nd': 'Kim', '3rd': 'Andy'}}
    df = pd.DataFrame(data)  # rows from the inner keys, columns from the outer keys
    # Age Height Name
    # 1st 18 1.6 Tom
    # 2nd 16 1.5 Kim
    # 3rd 19 1.7 Andy
    df = pd.DataFrame(data, index=['3rd', '2nd', '1st'])  # specify the row order
    # Age Height Name
    # 3rd 19 1.7 Andy
    # 2nd 16 1.5 Kim
    # 1st 18 1.6 Tom

    # 2. Accessing a DataFrame
    # 2.1 A single column is returned as a Series, via dict-style or attribute access
    data = {'Name': ['Tom', 'Kim', 'Andy'],
            'Age': [18, 16, 19],
            'Height': [1.6, 1.5, 1.7]}
    df = pd.DataFrame(data, columns=['Name', 'Age', 'Height'], index=['1st', '2nd', '3rd'])
    # Name Age Height
    # 1st Tom 18 1.6
    # 2nd Kim 16 1.5
    # 3rd Andy 19 1.7
    s = df['Name']  # dict-style access
    s = df.Name     # attribute access, same Series
    # 1st Tom
    # 2nd Kim
    # 3rd Andy
    # Name: Name, dtype: object

    # 2.2 Rows: loc selects by label, iloc by integer position (both replace the removed df.ix)
    s = df.loc['1st']  # a single row by label
    # Name Tom
    # Age 18
    # Height 1.6
    # Name: 1st, dtype: object
    s = df.iloc[0]  # the same row by position
    # Name Tom
    # Age 18
    # Height 1.6
    # Name: 1st, dtype: object
    s = df.loc[['3rd', '2nd']]  # several rows by label
    # Name Age Height
    # 3rd Andy 19 1.7
    # 2nd Kim 16 1.5
    s = df.iloc[list(range(3))]  # several rows by position
    # Name Age Height
    # 1st Tom 18 1.6
    # 2nd Kim 16 1.5
    # 3rd Andy 19 1.7

    # 2.3 Value at a column/row intersection: column first, then row
    ret = df['Name']['1st']     # Tom
    ret = df['Name'].iloc[0]    # Tom (df['Name'][0] relied on a positional fallback, deprecated since pandas 2.1)
    ret = df['Age']['1st']      # 18
    ret = df['Age'].iloc[0]     # 18
    ret = df['Height']['1st']   # 1.6
    ret = df['Height'].iloc[0]  # 1.6

    # 2.4 Value at a row/column intersection: row first, then column
    ret = df.loc['1st', 'Name']    # Tom
    ret = df.iloc[0]['Name']       # Tom
    ret = df.loc['1st', 'Age']     # 18
    ret = df.iloc[0]['Age']        # 18
    ret = df.loc['1st', 'Height']  # 1.6
    ret = df.iloc[0]['Height']     # 1.6

    # 3. Modifying a DataFrame
    # 3.1 Adding a column
    data = {'Name': ['Tom', 'Kim', 'Andy'],
            'Age': [18, 16, 19],
            'Height': [1.6, 1.5, 1.7]}
    df = pd.DataFrame(data,
                      columns=['Name', 'Age', 'Height'],
                      index=['1st', '2nd', '3rd'])
    df['Grade'] = 9  # new column 'Grade'; the scalar 9 is broadcast to every row
    # Name Age Height Grade
    # 1st Tom 18 1.6 9
    # 2nd Kim 16 1.5 9
    # 3rd Andy 19 1.7 9
    # 3.2 Replacing a column's values (the list length must equal the number of rows)
    df['Grade'] = [6, 7, 7]
    # Name Age Height Grade
    # 1st Tom 18 1.6 6
    # 2nd Kim 16 1.5 7
    # 3rd Andy 19 1.7 7
    # 3.3 New column 'HighGrade' flagging whether Grade is 7, assigned from a Series
    # (equivalent to df['Grade'] == 7; the Series is aligned on the row labels)
    s = pd.Series([False, True, True], index=['1st', '2nd', '3rd'])
    df['HighGrade'] = s
    # Name Age Height Grade HighGrade
    # 1st Tom 18 1.6 6 False
    # 2nd Kim 16 1.5 7 True
    # 3rd Andy 19 1.7 7 True

    # 4. Naming the row and column indexes
    data = {'Name': ['Tom', 'Kim', 'Andy'],
            'Age': [18, 16, 19],
            'Height': [1.6, 1.5, 1.7]}
    df = pd.DataFrame(data,
                      columns=['Name', 'Age', 'Height'],
                      index=['1st', '2nd', '3rd'])
    df.columns.name = 'Students'
    df.index.name = 'ID'
    # Students Name Age Height
    # ID
    # 1st Tom 18 1.6
    # 2nd Kim 16 1.5
    # 3rd Andy 19 1.7
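The section title lists `drop()`, but the code above does not exercise it. Here is a sketch on the same student data. The label '5th' is a hypothetical missing row used only to show `errors='ignore'`.

```python
import pandas as pd

data = {'Name': ['Tom', 'Kim', 'Andy'],
        'Age': [18, 16, 19],
        'Height': [1.6, 1.5, 1.7]}
df = pd.DataFrame(data, index=['1st', '2nd', '3rd'])

no_row = df.drop('2nd')                    # axis=0 (rows) is the default
no_rows = df.drop(['1st', '3rd'], axis=0)  # a list drops several labels
no_col = df.drop('Height', axis=1)         # axis=1 drops columns
no_col_kw = df.drop(columns=['Height'])    # keyword form, same result
safe = df.drop('5th', errors='ignore')     # unknown labels are skipped instead of raising KeyError
```

Every call returns a new frame and leaves `df` with its three rows and three columns.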

4. DataFrame sorting functions

import pandas as pd


def DataFrame_Sort():
    # Keys are listed alphabetically so the column order matches the printed output
    data = {'Age': {'No.1': 18, 'No.2': 16, 'No.3': 19},
            'Height': {'No.1': 1.6, 'No.2': 1.5, 'No.3': 1.7},
            'Name': {'No.1': 'Tom', 'No.2': 'Kim', 'No.3': 'Andy'}}
    df = pd.DataFrame(data)
    df.index.name = 'ID'
    df.columns.name = 'StudentInfo'
    # StudentInfo Age Height Name
    # ID
    # No.1 18 1.6 Tom
    # No.2 16 1.5 Kim
    # No.3 19 1.7 Andy
    # Sort by the row index, ascending
    ret = df.sort_index(ascending=True)
    # StudentInfo Age Height Name
    # ID
    # No.1 18 1.6 Tom
    # No.2 16 1.5 Kim
    # No.3 19 1.7 Andy
    # Sort by the row index, descending
    ret = df.sort_index(ascending=False)
    # StudentInfo Age Height Name
    # ID
    # No.3 19 1.7 Andy
    # No.2 16 1.5 Kim
    # No.1 18 1.6 Tom
    # Sort by the values of the Age column, ascending
    ret = df.sort_values(by='Age', ascending=True)
    # StudentInfo Age Height Name
    # ID
    # No.2 16 1.5 Kim
    # No.1 18 1.6 Tom
    # No.3 19 1.7 Andy
    # Sort by the values of the Age column, descending
    ret = df.sort_values(by='Age', ascending=False)
    # StudentInfo Age Height Name
    # ID
    # No.3 19 1.7 Andy
    # No.1 18 1.6 Tom
    # No.2 16 1.5 Kim

5. DataFrame summary statistics functions

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: summary-statistics functions of DataFrame
df.count()                        Number of non-NaN values
df.describe()                     Several summary statistics at once
df.min()                          Minimum
df.max()                          Maximum
df.idxmax(axis=0, skipna=True)    Series of the index labels where each maximum occurs
df.idxmin(axis=0, skipna=True)    Series of the index labels where each minimum occurs
df.quantile(axis=0)               Sample quantile (q=0.5, the median, by default)
df.sum(axis=0, skipna=True)       Series of sums
df.mean(axis=0, skipna=True)      Series of means
df.median(axis=0, skipna=True)    Series of medians
df.mad(axis=0, skipna=True)       Series of mean absolute deviations from the mean (removed in pandas 2.0)
df.var(axis=0, skipna=True)       Series of sample variances
df.std(axis=0, skipna=True)       Series of sample standard deviations
df.skew(axis=0, skipna=True)      Sample skewness (third standardized moment)
df.kurt(axis=0, skipna=True)      Sample excess kurtosis (fourth standardized moment)
df.cumsum(axis=0, skipna=True)    Cumulative sum
df.cummin(axis=0, skipna=True)    Cumulative minimum
df.cummax(axis=0, skipna=True)    Cumulative maximum
df.cumprod(axis=0, skipna=True)   Cumulative product
df.diff(axis=0)                   First difference
df.pct_change(axis=0)             Percent change
"""
import pandas as pd

if __name__ == '__main__':
    # Keys are listed alphabetically so the column order matches the printed output
    data = {'Age': [18, 16, 19],
            'Height': [1.6, 1.5, 1.7],
            'Name': ['Tom', 'Kim', 'Andy']}
    ind = ['No.1', 'No.2', 'No.3']
    df = pd.DataFrame(data, index=ind)
    df.index.name = 'ID'
    df.columns.name = 'StudentInfo'
    # StudentInfo Age Height Name
    # ID
    # No.1 18 1.6 Tom
    # No.2 16 1.5 Kim
    # No.3 19 1.7 Andy

    # df.count(): number of non-NaN values per column
    cnt = df.count()
    # StudentInfo
    # Age 3
    # Height 3
    # Name 3
    # dtype: int64

    # df.describe(): several summary statistics at once (count, mean, std, min, quartiles, max);
    # by default only the numeric columns are summarized
    ret = df.describe()  # <class 'pandas.core.frame.DataFrame'>
    # StudentInfo Age Height
    # count 3.000000 3.00
    # mean 17.666667 1.60
    # std 1.527525 0.10
    # min 16.000000 1.50
    # 25% 17.000000 1.55
    # 50% 18.000000 1.60
    # 75% 18.500000 1.65
    # max 19.000000 1.70

    # df.min(): minimum of each column (strings compare alphabetically)
    ret = df.min()
    # StudentInfo
    # Age 16
    # Height 1.5
    # Name Andy
    # dtype: object
    # df.max(): maximum of each column
    ret = df.max()
    # StudentInfo
    # Age 19
    # Height 1.7
    # Name Tom
    # dtype: object

    # A purely numeric frame for the remaining examples (keys again in alphabetical order)
    data = {'Age': [18, 16, 19],
            'Chinese': [50, 99, 70],
            'English': [98, 68, 69],
            'Height': [1.6, 1.5, 1.7],
            'Math': [60, 70, 100]}
    ind = ['No.1', 'No.2', 'No.3']
    df = pd.DataFrame(data, index=ind)
    df.index.name = 'ID'
    df.columns.name = 'Student'
    # Student Age Chinese English Height Math
    # ID
    # No.1 18 50 98 1.6 60
    # No.2 16 99 68 1.5 70
    # No.3 19 70 69 1.7 100

    # df.idxmax(axis=0, skipna=True): index label of the maximum in each column
    # (df.idxmin works the same way for the minimum)
    ret = df.idxmax(axis=0)  # <class 'pandas.core.series.Series'>
    # Student
    # Age No.3
    # Chinese No.2
    # English No.1
    # Height No.3
    # Math No.3
    # dtype: object
    # idxmax(axis=1): column label of the maximum in each row
    ret = df.idxmax(axis=1)  # <class 'pandas.core.series.Series'>
    # ID
    # No.1 English
    # No.2 Chinese
    # No.3 Math
    # dtype: object

    # df.quantile(axis=0): sample quantile; the default q=0.5 gives each column's median
    ret = df.quantile(axis=0)
    # Student
    # Age 18.0
    # Chinese 70.0
    # English 69.0
    # Height 1.6
    # Math 70.0
    # Name: 0.5, dtype: float64

    # df.sum(axis=0, skipna=True): sum of each column
    ret = df.sum(axis=0)
    # Student
    # Age 53.0
    # Chinese 219.0
    # English 235.0
    # Height 4.8
    # Math 230.0
    # dtype: float64
    ret = df.sum(axis=1)  # sum of each row (not meaningful here: the columns have different units)
    # ID
    # No.1 227.6
    # No.2 254.5
    # No.3 259.7
    # dtype: float64

    # df.mean(axis=0, skipna=True): mean of each column
    ret = df.mean(axis=0)
    # Student
    # Age 17.666667
    # Chinese 73.000000
    # English 78.333333
    # Height 1.600000
    # Math 76.666667
    # dtype: float64
    ret = df.mean(axis=1)  # mean of each row (again not meaningful for this data)
    # ID
    # No.1 45.52
    # No.2 50.90
    # No.3 51.94
    # dtype: float64

    # df.median(axis=0, skipna=True): median of each column
    ret = df.median(axis=0)
    # Student
    # Age 18.0
    # Chinese 70.0
    # English 69.0
    # Height 1.6
    # Math 70.0
    # dtype: float64
    ret = df.median(axis=1)  # median of each row
    # ID
    # No.1 50.0
    # No.2 68.0
    # No.3 69.0
    # dtype: float64

    # Mean absolute deviation: the average of |value - mean|. df.mad() computed this
    # but was removed in pandas 2.0, so the same numbers are computed directly here.
    ret = (df - df.mean()).abs().mean()  # column by column, formerly df.mad(axis=0)
    # Student
    # Age 1.111111
    # Chinese 17.333333
    # English 13.111111
    # Height 0.066667
    # Math 15.555556
    # dtype: float64
    ret = df.sub(df.mean(axis=1), axis=0).abs().mean(axis=1)  # row by row, formerly df.mad(axis=1)
    # ID
    # No.1 28.576
    # No.2 33.720
    # No.3 33.272
    # dtype: float64

    # df.var(axis=0, skipna=True): sample variance (ddof=1) of each column
    ret = df.var(axis=0)
    # Student
    # Age 2.333333
    # Chinese 607.000000
    # English 290.333333
    # Height 0.010000
    # Math 433.333333
    # dtype: float64
    ret = df.var(axis=1)  # sample variance of each row
    # ID
    # No.1 1417.552
    # No.2 1657.300
    # No.3 1634.018
    # dtype: float64

    # df.std(axis=0, skipna=True): sample standard deviation of each column
    ret = df.std(axis=0)
    # Student
    # Age 1.527525
    # Chinese 24.637370
    # English 17.039171
    # Height 0.100000
    # Math 20.816660
    # dtype: float64
    ret = df.std(axis=1)  # sample standard deviation of each row
    # ID
    # No.1 37.650392
    # No.2 40.709950
    # No.3 40.422989
    # dtype: float64

    # df.skew(axis=0, skipna=True): sample skewness (third standardized moment) of each column
    ret = df.skew(axis=0)
    # Student
    # Age -0.935220
    # Chinese 0.539824
    # English 1.725342
    # Height 0.000000
    # Math 1.293343
    # dtype: float64
    ret = df.skew(axis=1)  # skewness of each row
    # ID
    # No.1 0.328682
    # No.2 -0.245853
    # No.3 -0.256661
    # dtype: float64

    # df.kurt(axis=0, skipna=True): sample excess kurtosis (fourth standardized moment)
    # Kurtosis needs at least four values, so with only three rows every column is NaN
    ret = df.kurt(axis=0)
    # Student
    # Age NaN
    # Chinese NaN
    # English NaN
    # Height NaN
    # Math NaN
    # dtype: float64
    ret = df.kurt(axis=1)  # kurtosis of each row (five values per row)
    # ID
    # No.1 -0.582437
    # No.2 -2.079006
    # No.3 -1.879115
    # dtype: float64

    # df.cumsum(axis=0, skipna=True): cumulative sum down each column
    ret = df.cumsum(axis=0)
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 50.0 98.0 1.6 60.0
    # No.2 34.0 149.0 166.0 3.1 130.0
    # No.3 53.0 219.0 235.0 4.8 230.0
    ret = df.cumsum(axis=1)  # cumulative sum across each row
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 68.0 166.0 167.6 227.6
    # No.2 16.0 115.0 183.0 184.5 254.5
    # No.3 19.0 89.0 158.0 159.7 259.7

    # df.cummin(axis=0, skipna=True): cumulative minimum down each column
    ret = df.cummin(axis=0)
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 50.0 98.0 1.6 60.0
    # No.2 16.0 50.0 68.0 1.5 60.0
    # No.3 16.0 50.0 68.0 1.5 60.0
    ret = df.cummin(axis=1)  # cumulative minimum across each row
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 18.0 18.0 1.6 1.6
    # No.2 16.0 16.0 16.0 1.5 1.5
    # No.3 19.0 19.0 19.0 1.7 1.7

    # df.cummax(axis=0, skipna=True): cumulative maximum down each column
    ret = df.cummax(axis=0)
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 50.0 98.0 1.6 60.0
    # No.2 18.0 99.0 98.0 1.6 70.0
    # No.3 19.0 99.0 98.0 1.7 100.0
    ret = df.cummax(axis=1)  # cumulative maximum across each row
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 50.0 98.0 98.0 98.0
    # No.2 16.0 99.0 99.0 99.0 99.0
    # No.3 19.0 70.0 70.0 70.0 100.0

    # df.cumprod(axis=0, skipna=True): cumulative product down each column
    ret = df.cumprod(axis=0)
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 50.0 98.0 1.60 60.0
    # No.2 288.0 4950.0 6664.0 2.40 4200.0
    # No.3 5472.0 346500.0 459816.0 4.08 420000.0
    ret = df.cumprod(axis=1)  # cumulative product across each row
    # Student Age Chinese English Height Math
    # ID
    # No.1 18.0 900.0 88200.0 141120.0 8467200.0
    # No.2 16.0 1584.0 107712.0 161568.0 11309760.0
    # No.3 19.0 1330.0 91770.0 156009.0 15600900.0

    # df.diff(axis=0): first difference down each column (each row minus the row above)
    ret = df.diff(axis=0)
    # Student Age Chinese English Height Math
    # ID
    # No.1 NaN NaN NaN NaN NaN
    # No.2 -2.0 49.0 -30.0 -0.1 10.0
    # No.3 3.0 -29.0 1.0 0.2 30.0
    ret = df.diff(axis=1)  # first difference across each row (each column minus the column to its left)
    # <class 'pandas.core.frame.DataFrame'>
    # Student Age Chinese English Height Math
    # ID
    # No.1 NaN 32.0 48.0 -96.4 58.4
    # No.2 NaN 83.0 -31.0 -66.5 68.5
    # No.3 NaN 51.0 -1.0 -67.3 98.3

    # df.pct_change(axis=0): percent change down each column
    ret = df.pct_change(axis=0)
    # Student Age Chinese English Height Math
    # ID
    # No.1 NaN NaN NaN NaN NaN
    # No.2 -0.111111 0.980000 -0.306122 -0.062500 0.166667
    # No.3 0.187500 -0.292929 0.014706 0.133333 0.428571
    ret = df.pct_change(axis=1)  # percent change across each row
    # Student Age Chinese English Height Math
    # ID
    # No.1 NaN 1.777778 0.960000 -0.983673 36.500000
    # No.2 NaN 5.187500 -0.313131 -0.977941 45.666667
    # No.3 NaN 2.684211 -0.014286 -0.975362 57.823529
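Two details behind the output above can be checked on a subset of the same scores. First, `quantile` accepts a list of probabilities and returns one row per probability. Second, `kurt` needs at least four values, which is why every column of the three-row frame came out NaN. The extra fourth score (80) is a hypothetical value added only for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'English': [98, 68, 69], 'Math': [60, 70, 100]},
                  index=['No.1', 'No.2', 'No.3'])

# One row per requested probability (linear interpolation by default)
q = df.quantile([0.25, 0.5, 0.75])

# Excess kurtosis needs at least four non-NaN values
k3 = df['Math'].kurt()                    # three values -> NaN
k4 = pd.Series([60, 70, 100, 80]).kurt()  # four values (80 is hypothetical) -> a finite number
```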

6. DataFrame arithmetic functions

# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of DataFrame arithmetic functions
df.add(df2, axis='columns', fill_value=None)  Element-wise addition; during alignment a value missing on one side is replaced by fill_value
df.sub(df2, axis='columns', fill_value=None)  Element-wise subtraction, same fill rule
df.div(df2, axis='columns', fill_value=None)  Element-wise division, same fill rule
df.mul(df2, axis='columns', fill_value=None)  Element-wise multiplication, same fill rule
df.apply(f, axis=0)             Apply f to each column (axis=0) or each row (axis=1) as a 1-D Series
df.applymap(f)                  Apply f to every element (DataFrame.map in pandas >= 2.1)
df.cumsum(axis=0, skipna=True)  Cumulative sum, returned as a DataFrame
"""
import numpy as np
import pandas as pd

if __name__ == '__main__':
    # Keys are listed alphabetically so the column order matches the printed output
    data = {'English': [4, 8, 12],
            'Math': [2, 4, 6]}
    ind = ['No.1', 'No.2', 'No.3']
    df1 = pd.DataFrame(data, index=ind)
    df1.index.name = 'ID'
    df1.columns.name = 'Student'
    # Student English Math
    # ID
    # No.1 4 2
    # No.2 8 4
    # No.3 12 6
    data = {'English': [2, 4, 6],
            'Math': [1, 2, 3]}
    df2 = pd.DataFrame(data, index=ind)
    df2.index.name = 'ID'
    df2.columns.name = 'Student'
    # Student English Math
    # ID
    # No.1 2 1
    # No.2 4 2
    # No.3 6 3

    # df.add(df2): element-wise addition of cells with matching labels
    ret = df1.add(df2)
    # Student English Math
    # ID
    # No.1 6 3
    # No.2 12 6
    # No.3 18 9
    # df.sub(df2): element-wise subtraction
    ret = df1.sub(df2)
    # Student English Math
    # ID
    # No.1 2 1
    # No.2 4 2
    # No.3 6 3
    # df.div(df2): element-wise division (the result is always float)
    ret = df1.div(df2)
    # Student English Math
    # ID
    # No.1 2.0 2.0
    # No.2 2.0 2.0
    # No.3 2.0 2.0
    # df.mul(df2): element-wise multiplication
    ret = df1.mul(df2)
    # Student English Math
    # ID
    # No.1 8 2
    # No.2 32 8
    # No.3 72 18

    # df.apply(f, axis=0): f receives each column as a Series;
    # np.square squares its input element-wise, so every value is squared
    ret = df1.apply(np.square)
    # Student English Math
    # ID
    # No.1 16 4
    # No.2 64 16
    # No.3 144 36
    # Element-wise application: df1.map(f) in pandas >= 2.1 (df1.applymap(f) in older versions)
    ret = df1.map(np.square)
    # Student English Math
    # ID
    # No.1 16 4
    # No.2 64 16
    # No.3 144 36

    # df.cumsum(axis=0): running total down each column
    ret = df1.cumsum(axis=0)
    # Student English Math
    # ID
    # No.1 4 2
    # No.2 12 6
    # No.3 24 12
    # df.cumsum(axis=1): running total across each row
    ret = df1.cumsum(axis=1)
    # Student English Math
    # ID
    # No.1 4 6
    # No.2 8 12
    # No.3 12 18
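`apply` is not limited to element-wise functions such as `np.square`. When `f` reduces a Series to a scalar, `apply` returns one value per column or row. Below is a sketch on the same `df1` values. The range function and the two-digit string format are illustrative choices, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'English': [4, 8, 12], 'Math': [2, 4, 6]},
                   index=['No.1', 'No.2', 'No.3'])

# f reduces each column (axis=0) or each row (axis=1) to a single value
col_range = df1.apply(lambda col: col.max() - col.min())
row_range = df1.apply(lambda row: row.max() - row.min(), axis=1)

# Element-wise: DataFrame.map exists from pandas 2.1; fall back to applymap before that
elementwise = df1.map if hasattr(pd.DataFrame, 'map') else df1.applymap
padded = elementwise(lambda v: f'{v:02d}')
print(padded.loc['No.1', 'English'])  # 04
```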

7. Example: common DataFrame indexing methods

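# -*- coding: utf-8 -*-
"""
@author: 蔚蓝的天空Tom
Aim: examples of DataFrame indexing: df[], df.iloc[] / df.loc[] (formerly df.ix[]), df.reindex(), df.xs(), etc.
Indexing             Description
df[val]              Select a single column or a list of columns by label
df.iloc[val]         Select a single row or a set of rows by position (formerly df.ix[val])
df.iloc[:, val]      Select a single column or a subset of columns by position (formerly df.ix[:, val])
df.iloc[val1, val2]  Select rows and columns at the same time (formerly df.ix[val1, val2])
reindex method       Conform one or more axes to a new index
xs method            Select a single row or column by label, returning a Series
iloc                 Select a single column or row by integer position (formerly icol / irow)
at, iat              Get or set a single value by labels / by positions (formerly get_value / set_value)
"""
import pandas as pd

if __name__ == '__main__':
    data = {'Name': ['Tom', 'Kim', 'Andy'],
            'Age': [18, 16, 19],
            'Math': [95, 98, 96]}
    ind = ['No.1', 'No.2', 'No.3']
    df = pd.DataFrame(data, index=ind, columns=['Name', 'Age', 'Math'])
    df.index.name = 'ID'
    df.columns.name = 'Student'
    # Student Name Age Math
    # ID
    # No.1 Tom 18 95
    # No.2 Kim 16 98
    # No.3 Andy 19 96

    # Columns by position. The old df[[0]] raises KeyError now that the column labels are strings,
    # so use iloc
    ret = df.iloc[:, [0]]  # first column
    # Student Name
    # ID
    # No.1 Tom
    # No.2 Kim
    # No.3 Andy
    ret = df.iloc[:, [-1]]  # last column
    # Student Math
    # ID
    # No.1 95
    # No.2 98
    # No.3 96
    ret = df.iloc[:, [-1, 0]]  # last column, then first column
    # Student Math Name
    # ID
    # No.1 95 Tom
    # No.2 98 Kim
    # No.3 96 Andy

    # Rows by position (formerly df.ix[[...]])
    ret = df.iloc[[0]]  # first row
    # Student Name Age Math
    # ID
    # No.1 Tom 18 95
    ret = df.iloc[[-1]]  # last row
    # Student Name Age Math
    # ID
    # No.3 Andy 19 96
    ret = df.iloc[[-1, 0]]  # last row, then first row
    # Student Name Age Math
    # ID
    # No.3 Andy 19 96
    # No.1 Tom 18 95

    # A subset of one column (formerly df.ix[:, val])
    ret = df.iloc[0:2, [0]]  # first column, rows at positions 0 and 1 (the slice end is excluded)
    # Student Name
    # ID
    # No.1 Tom
    # No.2 Kim
    ret = df.iloc[:-1, [0]]  # first column without its last row
    # Student Name
    # ID
    # No.1 Tom
    # No.2 Kim

    # Rows and columns together (formerly df.ix[val1, val2]); list indexers return a 1x1 DataFrame
    ret = df.iloc[[0], [0]]  # row 1, column 1
    # Student Name
    # ID
    # No.1 Tom
    ret = df.iloc[[0], [1]]  # row 1, column 2
    # Student Age
    # ID
    # No.1 18
    ret = df.iloc[[1], [0]]  # row 2, column 1
    # Student Name
    # ID
    # No.2 Kim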
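`ix` mixed label and position semantics, so it helps to see the two modern indexers side by side. The sketch below uses the same data and shows that a positional slice excludes its end while a label slice includes it.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Kim', 'Andy'],
                   'Age': [18, 16, 19],
                   'Math': [95, 98, 96]},
                  index=['No.1', 'No.2', 'No.3'])

by_pos = df.iloc[0:2, [0]]                  # positions 0 and 1; the end position 2 is excluded
by_label = df.loc['No.1':'No.2', ['Name']]  # labels No.1 through No.2; the end label is included
print(by_pos.equals(by_label))              # True

cell_frame = df.iloc[[1], [0]]  # list indexers keep a 1x1 DataFrame
cell_value = df.iloc[1, 0]      # scalar indexers return the value itself
```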

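df.reindex() + df.xs() + df.iloc[] + df.get_value() + df.get_values() + df.set_value() (get_value, get_values and set_value were removed in pandas 1.0; df.at / df.iat and df.to_numpy() replace them)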

import pandas as pd

if __name__ == '__main__':
    data = {'Name': ['Tom', 'Kim', 'Andy'],
            'Age': [18, 16, 19],
            'Height': [1.7, 1.5, 1.6]}
    ind = ['No.1', 'No.2', 'No.3']
    df = pd.DataFrame(data, index=ind, columns=['Name', 'Age', 'Height'])
    df.index.name = 'ID'
    df.columns.name = 'Student'
    # Student Name Age Height
    # ID
    # No.1 Tom 18 1.7
    # No.2 Kim 16 1.5
    # No.3 Andy 19 1.6

    # reindex conforms one or more axes to a new index
    ret = df.reindex(index=['No.3', 'No.2', 'No.1'])  # rows in the given order
    # Student Name Age Height
    # ID
    # No.3 Andy 19 1.6
    # No.2 Kim 16 1.5
    # No.1 Tom 18 1.7
    ret = df.reindex(index=['No.3', 'No.2', 'No.1'], columns=['Name', 'Age'])
    # Student Name Age
    # ID
    # No.3 Andy 19
    # No.2 Kim 16
    # No.1 Tom 18
    ret = df.reindex(index=['No.1'], columns=['Name', 'Age'])
    # Student Name Age
    # ID
    # No.1 Tom 18
    ret = df.reindex(index=['No.1'], columns=['Name'])
    # Student Name
    # ID
    # No.1 Tom

    # xs selects a single row or column by label and returns a Series
    ret = df.xs(key='No.1', axis=0)  # row No.1 (axis=0 is the default)
    # Student
    # Name Tom
    # Age 18
    # Height 1.7
    # Name: No.1, dtype: object
    ret = df.xs(key='Name', axis=1)  # column Name; axis=1 is required to select a column
    # ID
    # No.1 Tom
    # No.2 Kim
    # No.3 Andy
    # Name: Name, dtype: object
    ret = df.xs(key='Age', axis=1)  # column Age
    # ID
    # No.1 18
    # No.2 16
    # No.3 19
    # Name: Age, dtype: int64