import pandas as pd

Creating a pandas Series

t = pd.Series([1, 2, 31, 12, 3, 4])
t
0     1
1     2
2    31
3    12
4     3
5     4
dtype: int64
type(t)
pandas.core.series.Series

Specifying the index of a Series

t2 = pd.Series([1,23,3,2,3],index=list('abcde'))
t2
a     1
b    23
c     3
d     2
e     3
dtype: int64
import numpy as np
import string
t2 = pd.Series(np.arange(10),index=list(string.ascii_uppercase[:10]))
t2
A    0
B    1
C    2
D    3
E    4
F    5
G    6
H    7
I    8
J    9
dtype: int32

Creating a Series from a dictionary

temp_dict = {'name':'xiaohong','age':18,'tel':10086}
temp_dict
{'name': 'xiaohong', 'age': 18, 'tel': 10086}
t3 = pd.Series(temp_dict)
t3
name    xiaohong
age           18
tel        10086
dtype: object
t3.dtype
dtype('O')

Indexing and slicing a Series

t3['age']
18
t3.iloc[0]  # access by position; plain t3[0] on a label index is deprecated in recent pandas versions
'xiaohong'
t3.iloc[[1, 2]]  # select the second and third rows
age       18
tel    10086
dtype: object
t3[:3]  # select the first three rows
name    xiaohong
age           18
tel        10086
dtype: object
t3[['age','tel']]
age       18
tel    10086
dtype: object
t
0     1
1     2
2    31
3    12
4     3
5     4
dtype: int64
t[t > 4]  # select the values greater than 4
2    31
3    12
dtype: int64
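
The lookups above mix label-based and position-based access. A minimal sketch of the explicit accessors follows, using small example Series that mirror `t3` and `t` (the variable names here are illustrative, not from the original notes): `.loc` selects by label, `.iloc` selects by position, and a boolean mask filters by condition.

```python
import pandas as pd

# Example data mirroring t3 above
s = pd.Series({'name': 'xiaohong', 'age': 18, 'tel': 10086})

by_label = s.loc['age']           # label-based lookup
by_position = s.iloc[0]           # position-based lookup
subset = s.loc[['age', 'tel']]    # several labels at once

# Example data mirroring t above
nums = pd.Series([1, 2, 31, 12, 3, 4])
large = nums[nums > 4]            # boolean mask keeps only matching rows

print(by_label, by_position)
print(list(subset.index), list(large))
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index itself consists of integers.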

Getting the index and values of a Series

t3.index
Index(['name', 'age', 'tel'], dtype='object')
for i in t3.index:
    print(i)
name
age
tel
type(t3.index)
pandas.core.indexes.base.Index
list(t3.index)[:2]
['name', 'age']
t3.values
array(['xiaohong', 18, 10086], dtype=object)
type(t3.values)
numpy.ndarray

Reading files

df = pd.read_csv('./can.csv')
df
1 20 1.004 0.090 -0.125
0 1 20 1.004 -0.043 -0.125
1 1 20 0.969 0.090 -0.121
2 1 20 0.973 -0.012 -0.137
3 1 20 1.000 -0.016 -0.121
4 1 20 0.961 0.082 -0.121
... ... ... ... ... ...
152994 3 100 1.051 0.090 -0.262
152995 3 100 0.918 0.039 -0.129
152996 3 100 1.156 -0.094 -0.227
152997 3 100 0.934 0.203 -0.172
152998 3 100 1.199 -0.176 0.109

152999 rows × 5 columns

df.head(10)
1 20 1.004 0.090 -0.125
0 1 20 1.004 -0.043 -0.125
1 1 20 0.969 0.090 -0.121
2 1 20 0.973 -0.012 -0.137
3 1 20 1.000 -0.016 -0.121
4 1 20 0.961 0.082 -0.121
5 1 20 0.973 -0.055 -0.109
6 1 20 1.000 0.012 -0.133
7 1 20 0.969 -0.102 -0.141
8 1 20 0.973 -0.059 -0.125
9 1 20 1.012 0.043 -0.133
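
In the output above the column labels (`1`, `20`, `1.004`, ...) look like a data row, which suggests that `can.csv` has no header line and that `read_csv` consumed its first record as the header. A minimal sketch of the difference, assuming a headerless file; the inline text and the column names `c0`..`c4` are hypothetical stand-ins, not the real file or its real field names:

```python
import io
import pandas as pd

# Hypothetical stand-in for a headerless CSV file
raw = (
    "1,20,1.004,0.090,-0.125\n"
    "1,20,1.004,-0.043,-0.125\n"
    "1,20,0.969,0.090,-0.121\n"
)

# Default behaviour: the first row is used as the header
inferred = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data; names supplies the labels (assumed here)
cols = ['c0', 'c1', 'c2', 'c3', 'c4']
explicit = pd.read_csv(io.StringIO(raw), header=None, names=cols)

print(inferred.shape)   # one row fewer, because it became the header
print(explicit.shape)
```

The rest of these notes keep the inferred labels (for example `df["20"]`), so the later cells are left unchanged.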
import pandas as pd
import numpy as np
pd.DataFrame(np.arange(12).reshape(3,4))
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
pd.DataFrame(np.arange(12).reshape(3,4),index=list('abc'), columns=list('WXYZ'))
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
d1 = {'name':['xiaoming','xiaogang'],'age':[12,20]}
d1
{'name': ['xiaoming', 'xiaogang'], 'age': [12, 20]}
t1 = pd.DataFrame(d1)
t1
name age
0 xiaoming 12
1 xiaogang 20
d2 = [{'name':'xioahong','age':20,'tel':10020},{'name':'xioaming','tel':123231},{'name':'xiaowang','age':18}]
d2
[{'name': 'xioahong', 'age': 20, 'tel': 10020},{'name': 'xioaming', 'tel': 123231},{'name': 'xiaowang', 'age': 18}]
t2 = pd.DataFrame(d2)
t2
name age tel
0 xioahong 20.0 10020.0
1 xioaming NaN 123231.0
2 xiaowang 18.0 NaN
df = pd.read_csv('./jd.csv')
print(df.head())  # shows the first five rows by default
  乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物     乐高京东自营旗舰店  \
0                              林家铺子水果罐头 什锦罐头 200g*2罐             林家铺子官方旗舰店
1  羽生结弦:王者之路( 超人气花样滑冰冠军羽生结弦全新传记,全面展示羽生10年成长经历和心路历程!)                 中信出版社
2            【话费慢充】全国电信话费充值手机特惠慢充话费200元 72小时内到账 200元          易士捷通讯充值拼购专营店
3  豪皇 潮汕牛肉丸500g*2包 火锅食材牛丸 烧烤丸串生鲜潮汕年夜饭火锅丸子 汕头手打牛肉丸 年货             邻家小厨生鲜专营店
4  伊利奶粉【全新升级】 金领冠系列 幼儿配方奶粉 3段1200克特惠三联装(1-3岁幼儿适用)...           伊利母婴京东自营旗舰店

       764  1099.00    https://item.jd.com/100017067554.html
0     8274     6.90  https://item.jd.com/10029836000540.html
1    42023    49.00        https://item.jd.com/13598042.html
2      664   191.99    https://item.jd.com/200151598576.html
3       32    39.00     https://item.jd.com/65414277974.html
4  1480578   146.00         https://item.jd.com/1100526.html
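print(df.info())  # summary of columns, non-null counts and dtypes; info() prints directly and returns None, hence the trailing None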
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6054 entries, 0 to 6053
Data columns (total 5 columns):
 #   Column                                                      Non-Null Count  Dtype
---  ------                                                      --------------  -----
 0   乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物  6054 non-null   object
 1   乐高京东自营旗舰店                                                   5894 non-null   object
 2   764                                                         6054 non-null   int64
 3   1099.00                                                     6054 non-null   float64
 4   https://item.jd.com/100017067554.html                       6054 non-null   object
dtypes: float64(1), int64(1), object(3)
memory usage: 236.6+ KB
None
# df = df.sort_values(by='Count_AnimalName',ascending=False)  # sort in descending order so the most frequent come first
# df
print(df[:20])  # select the first 20 rows
   乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物  \
0                               林家铺子水果罐头 什锦罐头 200g*2罐
1   羽生结弦:王者之路( 超人气花样滑冰冠军羽生结弦全新传记,全面展示羽生10年成长经历和心路历程!)
2             【话费慢充】全国电信话费充值手机特惠慢充话费200元 72小时内到账 200元
3   豪皇 潮汕牛肉丸500g*2包 火锅食材牛丸 烧烤丸串生鲜潮汕年夜饭火锅丸子 汕头手打牛肉丸 年货
4   伊利奶粉【全新升级】 金领冠系列 幼儿配方奶粉 3段1200克特惠三联装(1-3岁幼儿适用)...
5      【欧洲进口】法国原瓶进口 Roux家族黑舰经典混酿干红葡萄酒红酒送礼佳品750ml*6瓶整箱
6             良品铺子 香酥脆灰枣 酥脆小枣即食无核脆枣红枣干蜜饯果干休闲零食量贩装400g
7      小黄鸭(B.Duck)小学生书包男童女童一三年级男孩儿童减负护脊双肩包 sbd80008黄色
8   稳健医用外科口罩一次性医用口罩成人儿童可选 稳健口罩 三层防护 透气薄款防细菌口罩医用 1盒...
9                                 嗨吃家 酸辣粉清真宽粉112g*12袋
10  日本进口 黛珂Cosme Decorte牛油果乳液150ml 补水保湿 软化肤质 改善粗糙 ...
11                               嗨吃家正宗铁棍山药粉皮200g*5袋速食
12                     小鹿蓝蓝_酸奶溶豆 宝宝零食益生菌享6个月食谱 4口味各1盒
13                                      嗨吃家热干面176g*6袋
14       【药房直售】康速达 痔立克痔疮膏冷敷凝胶内外混合痔疮肉球肛门瘙痒男女 (周期型)实发两盒
15                             蒙牛  酸酸乳 原味250ml×24 礼盒装
16  蒂佳婷Dr.Jart+ 绿丸面膜贴片 舒缓镇静 补水保湿 水动力舒缓补水绿丸面膜25g*5片...
17                法国原瓶进口  杰朗克西里尔 赤霞珠 干红 葡萄酒 750ml 双支装
18  土土优选丹麦风味曲奇饼干 皇冠品质早餐网红休闲办公室零食年货72g/盒 十盒*(丹麦风味曲奇...
19            善存维生素C咀嚼片香橙口味补充维C120片 1盒 1盒*(15+15+90)片

             乐高京东自营旗舰店      764  1099.00  \
0              林家铺子官方旗舰店     8274     6.90
1                  中信出版社    42023    49.00
2           易士捷通讯充值拼购专营店      664   191.99
3              邻家小厨生鲜专营店       32    39.00
4            伊利母婴京东自营旗舰店  1480578   146.00
5                玫嘉官方旗舰店      540   298.00
6            良品铺子京东自营旗舰店     5747    17.90
7               尚喜屋母婴旗舰店        9    88.00
8                稳健官方旗舰店    53390    16.90
9                 燕之北旗舰店      107    39.90
10         京东国际美妆自营跨境免税店    89640   289.00
11                燕之北旗舰店       10    39.90
12               小鹿蓝蓝旗舰店     1275    54.00
13                燕之北旗舰店       32    26.90
14             颐鹤堂大药房旗舰店     4553    69.00
15             蒙牛京东自营旗舰店   799422    44.90
16  蒂佳婷(Dr.Jart)海外京东自营专区   747604    98.00
17             禧家拾粮酒类旗舰店       27    39.90
18             土土优选官方旗舰店    14849    19.90
19                益尔益旗舰店     2040    49.00

      https://item.jd.com/100017067554.html
0   https://item.jd.com/10029836000540.html
1         https://item.jd.com/13598042.html
2     https://item.jd.com/200151598576.html
3      https://item.jd.com/65414277974.html
4          https://item.jd.com/1100526.html
5      https://item.jd.com/22453030555.html
6     https://item.jd.com/100027854140.html
7   https://item.jd.com/10030112475646.html
8   https://item.jd.com/10021189665333.html
9   https://item.jd.com/10035393346580.html
10         https://item.jd.com/4972612.html
11  https://item.jd.com/10035809618060.html
12  https://item.jd.com/10038718351041.html
13  https://item.jd.com/10035527980479.html
14  https://item.jd.com/10033781790177.html
15         https://item.jd.com/1411416.html
16         https://item.jd.com/4858894.html
17  https://item.jd.com/10028867267738.html
18  https://item.jd.com/10026591565614.html
19     https://item.jd.com/47384323647.html

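Notes on selecting data in pandas

A slice inside square brackets, such as df[:20], selects rows and operates on rows.

A column label inside square brackets, such as df['764'], selects by column index and operates on columns.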
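
The two rules above can be sketched on a small frame. This is a minimal illustration, assuming the same 3x4 example frame used elsewhere in these notes; the result variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list('abc'), columns=list('WXYZ'))

rows = df[:2]            # a slice selects rows (here 'a' and 'b')
col = df['Y']            # a single label selects one column as a Series
cols = df[['W', 'Z']]    # a list of labels selects several columns
cell = df[:2]['Y']       # chained: rows first, then a column

print(rows)
print(col)
print(cols)
print(cell)
```

For anything beyond these two simple cases, `.loc` and `.iloc` state the intent more explicitly.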

print(df['764'])  # select the values of one specific column
0          8274
1         42023
2           664
3            32
4       1480578
         ...
6049    1952366
6050        769
6051        137
6052      21686
6053        276
Name: 764, Length: 6054, dtype: int64
print(type(df['764']))
<class 'pandas.core.series.Series'>
t3 = pd.DataFrame(np.arange(12).reshape(3,4),index=list('abc'),columns=list('WXYZ'))
t3
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
t3.loc['a','Z']  # select the value at row label 'a' and column label 'Z'
3
type(t3.loc['a','Z'])
numpy.int32
t3.loc['a']
W    0
X    1
Y    2
Z    3
Name: a, dtype: int32
t3.loc[:,'Y']
a     2
b     6
c    10
Name: Y, dtype: int32
t3.loc[['a','c'],:]
W X Y Z
a 0 1 2 3
c 8 9 10 11
t3.loc[['a','c'],['W','Z']]
W Z
a 0 3
c 8 11
t3
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
t3.iloc[1]  # select the second row
W    4
X    5
Y    6
Z    7
Name: b, dtype: int32
t3.iloc[:,2]  # select the third column
a     2
b     6
c    10
Name: Y, dtype: int32
t3.iloc[:,[2,1]]  # select two non-adjacent columns, in the order given
Y X
a 2 1
b 6 5
c 10 9
t3.iloc[1:,:2] = np.nan
t3
W X Y Z
a 0.0 1.0 2 3
b NaN NaN 6 7
c NaN NaN 10 11
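
One difference between the two accessors used above is easy to miss: a `.loc` slice includes both end labels, while an `.iloc` slice excludes the end position, as ordinary Python slicing does. A minimal sketch on a fresh copy of the example frame (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list('abc'), columns=list('WXYZ'))

by_label = df.loc['a':'b', 'W':'Y']   # labels: both endpoints are included
by_pos = df.iloc[0:2, 0:3]            # positions: the end is excluded

same = by_label.equals(by_pos)        # both select rows a-b and columns W-Y
print(by_label)
print(same)
```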
df
乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物 乐高京东自营旗舰店 764 1099.00 https://item.jd.com/100017067554.html
0 林家铺子水果罐头 什锦罐头 200g*2罐 林家铺子官方旗舰店 8274 6.90 https://item.jd.com/10029836000540.html
1 羽生结弦:王者之路( 超人气花样滑冰冠军羽生结弦全新传记,全面展示羽生10年成长经历和心路历程!) 中信出版社 42023 49.00 https://item.jd.com/13598042.html
2 【话费慢充】全国电信话费充值手机特惠慢充话费200元 72小时内到账 200元 易士捷通讯充值拼购专营店 664 191.99 https://item.jd.com/200151598576.html
3 豪皇 潮汕牛肉丸500g*2包 火锅食材牛丸 烧烤丸串生鲜潮汕年夜饭火锅丸子 汕头手打牛肉丸 年货 邻家小厨生鲜专营店 32 39.00 https://item.jd.com/65414277974.html
4 伊利奶粉【全新升级】 金领冠系列 幼儿配方奶粉 3段1200克特惠三联装(1-3岁幼儿适用)... 伊利母婴京东自营旗舰店 1480578 146.00 https://item.jd.com/1100526.html
... ... ... ... ... ...
6049 贝亲(Pigeon)宽口径玻璃奶瓶奶嘴套装 婴儿奶瓶240ml+自然实感婴儿奶嘴(L码+LL... 贝亲(Pigeon)京东自营旗舰店 1952366 172.00 https://item.jd.com/7639987.html
6050 尤果(YOUGUO)衣架子带晾衣夹子折叠晾衣架晒袜子架内衣架神器32夹子 可折叠加厚【1个3... 尤果生活日用拼购旗舰店 769 15.90 https://item.jd.com/10031481561764.html
6051 匹克态极闪现3代篮球鞋男2022春季新款耐磨缓震篮球运动鞋男鞋 大白-气泡配色 42 匹克官方旗舰店 137 669.00 https://item.jd.com/10039423932347.html
6052 超能 洗衣凝珠 洗衣凝珠 100颗 防串色 浓缩 酵素 香水味 花香型 洗衣球 洗衣珠 超能京东自营官方旗舰店 21686 119.00 https://item.jd.com/100011740813.html
6053 8册专注力训练书找不同迷宫书3-6岁儿童注意力观察记忆力智力开发全脑开发思维训练书籍 凤凰新华书店旗舰店 276 15.80 https://item.jd.com/71219454726.html

6054 rows × 5 columns

df.index  # get the row index
RangeIndex(start=0, stop=6054, step=1)
df.columns  # get the column index
Index(['乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物','乐高京东自营旗舰店', '764', '1099.00', 'https://item.jd.com/100017067554.html'],dtype='object')
df.dtypes  # get the data type of each column
乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物     object
乐高京东自营旗舰店                                                      object
764                                                             int64
1099.00                                                       float64
https://item.jd.com/100017067554.html                          object
dtype: object
df.values  # get the values as a NumPy array
array([['林家铺子水果罐头 什锦罐头 200g*2罐', '林家铺子官方旗舰店', 8274, 6.9,'https://item.jd.com/10029836000540.html'],['羽生结弦:王者之路( 超人气花样滑冰冠军羽生结弦全新传记,全面展示羽生10年成长经历和心路历程!)', '中信出版社',42023, 49.0, 'https://item.jd.com/13598042.html'],['【话费慢充】全国电信话费充值手机特惠慢充话费200元 72小时内到账 200元', '易士捷通讯充值拼购专营店', 664,191.99, 'https://item.jd.com/200151598576.html'],...,['匹克态极闪现3代篮球鞋男2022春季新款耐磨缓震篮球运动鞋男鞋 大白-气泡配色 42', '匹克官方旗舰店', 137,669.0, 'https://item.jd.com/10039423932347.html'],['超能 洗衣凝珠 洗衣凝珠 100颗 防串色 浓缩 酵素 香水味 花香型 洗衣球 洗衣珠', '超能京东自营官方旗舰店',21686, 119.0, 'https://item.jd.com/100011740813.html'],['8册专注力训练书找不同迷宫书3-6岁儿童注意力观察记忆力智力开发全脑开发思维训练书籍', '凤凰新华书店旗舰店', 276,15.8, 'https://item.jd.com/71219454726.html']], dtype=object)
df
乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物 乐高京东自营旗舰店 764 1099.00 https://item.jd.com/100017067554.html
0 林家铺子水果罐头 什锦罐头 200g*2罐 林家铺子官方旗舰店 8274 6.90 https://item.jd.com/10029836000540.html
1 羽生结弦:王者之路( 超人气花样滑冰冠军羽生结弦全新传记,全面展示羽生10年成长经历和心路历程!) 中信出版社 42023 49.00 https://item.jd.com/13598042.html
2 【话费慢充】全国电信话费充值手机特惠慢充话费200元 72小时内到账 200元 易士捷通讯充值拼购专营店 664 191.99 https://item.jd.com/200151598576.html
3 豪皇 潮汕牛肉丸500g*2包 火锅食材牛丸 烧烤丸串生鲜潮汕年夜饭火锅丸子 汕头手打牛肉丸 年货 邻家小厨生鲜专营店 32 39.00 https://item.jd.com/65414277974.html
4 伊利奶粉【全新升级】 金领冠系列 幼儿配方奶粉 3段1200克特惠三联装(1-3岁幼儿适用)... 伊利母婴京东自营旗舰店 1480578 146.00 https://item.jd.com/1100526.html
... ... ... ... ... ...
6049 贝亲(Pigeon)宽口径玻璃奶瓶奶嘴套装 婴儿奶瓶240ml+自然实感婴儿奶嘴(L码+LL... 贝亲(Pigeon)京东自营旗舰店 1952366 172.00 https://item.jd.com/7639987.html
6050 尤果(YOUGUO)衣架子带晾衣夹子折叠晾衣架晒袜子架内衣架神器32夹子 可折叠加厚【1个3... 尤果生活日用拼购旗舰店 769 15.90 https://item.jd.com/10031481561764.html
6051 匹克态极闪现3代篮球鞋男2022春季新款耐磨缓震篮球运动鞋男鞋 大白-气泡配色 42 匹克官方旗舰店 137 669.00 https://item.jd.com/10039423932347.html
6052 超能 洗衣凝珠 洗衣凝珠 100颗 防串色 浓缩 酵素 香水味 花香型 洗衣球 洗衣珠 超能京东自营官方旗舰店 21686 119.00 https://item.jd.com/100011740813.html
6053 8册专注力训练书找不同迷宫书3-6岁儿童注意力观察记忆力智力开发全脑开发思维训练书籍 凤凰新华书店旗舰店 276 15.80 https://item.jd.com/71219454726.html

6054 rows × 5 columns

mean_data = df['1099.00']
mean_data
0         6.90
1        49.00
2       191.99
3        39.00
4       146.00
         ...
6049    172.00
6050     15.90
6051    669.00
6052    119.00
6053     15.80
Name: 1099.00, Length: 6054, dtype: float64
print('Average price:', mean_data.mean())
Average price: 332.4171737693964
df[mean_data==mean_data.min()]  # select the cheapest products (all rows tied for the minimum price)
乐高(LEGO)积木 艺术系列ART 31202 米奇米妮 18岁+ 儿童玩具 马赛克像素画 男孩女孩成人情人节礼物 乐高京东自营旗舰店 764 1099.00 https://item.jd.com/100017067554.html
286 补运费专拍链接 熊出没官方旗舰店 0 1.0 https://item.jd.com/10042346578090.html
1906 【京选99新】苹果iPhone 12 ProMax 256GB 石墨色5G全网通 S12 勇科手机 2 1.0 https://item.jd.com/10040790836846.html
2047 Yottoy 瑜伽入门学习教程 yottoy京东自营旗舰店 11 1.0 https://item.jd.com/100018075841.html
2791 运费补运费专用链接(请勿单独拍) 补运费专用链接 荷尔健康大药房旗舰店 7 1.0 https://item.jd.com/10023059152178.html
3854 【准新机】【在保280天以上】iPhone13ProMax 5G全网通256G远峰蓝S18 勇科手机 1 1.0 https://item.jd.com/10041957238447.html
4398 定金 别克昂科拉 试驾享原厂精美试驾礼 【新车汽车买车SUV】 具体车型请与线下经销商协定 上汽通用别克官方旗舰店 0 1.0 https://item.jd.com/68629491955.html
4491 贵州茅台镇酱香型白酒整箱53度粮食窖藏老酒年货送礼酒水饮品江左盟大曲酱香酒 单瓶装 遵巡酒类专营店 2427 1.0 https://item.jd.com/10028009269896.html
5888 【准新机】【在保280天以上】iPhone13ProMax 5G全网通256G金色 S11 勇科手机 0 1.0 https://item.jd.com/10041957046439.html
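
The boolean comparison above returns every row tied for the minimum. A minimal sketch of two related calls, using hypothetical prices rather than the real `jd.csv` data: `idxmin` gives only the label of the first minimum, and `nsmallest` returns the n lowest rows.

```python
import pandas as pd

# Hypothetical example data, not taken from jd.csv
items = pd.DataFrame({'title': ['a', 'b', 'c', 'd'],
                      'price': [6.9, 1.0, 49.0, 1.0]})

cheapest_all = items[items['price'] == items['price'].min()]  # every row tied for the minimum
first_label = items['price'].idxmin()                         # label of the first minimum only
two_lowest = items.nsmallest(2, 'price')                      # the two lowest-priced rows

print(cheapest_all)
print(first_label)
print(two_lowest)
```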
print('Total number of products: ' + str(df['764'].count()))
Total number of products: 6054
t3
W X Y Z
a 0.0 1.0 2 3
b NaN NaN 6 7
c NaN NaN 10 11
t3[pd.notnull(t3['W'])]  # keep only the rows where column W is not NaN
W X Y Z
a 0.0 1.0 2 3
t3.dropna(axis=0)  # drop every row that contains NaN; returns a new frame, t3 itself is unchanged
W X Y Z
a 0.0 1.0 2 3
t3.dropna(axis=0,how='any',inplace=True)
t3
W X Y Z
a 0.0 1.0 2 3
t2
name age tel
0 xioahong 20.0 10020.0
1 xioaming NaN 123231.0
2 xiaowang 18.0 NaN
t2.fillna(0)  # replace NaN with 0 (or any other value)
name age tel
0 xioahong 20.0 10020.0
1 xioaming 0.0 123231.0
2 xiaowang 18.0 0.0
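t2.fillna(t2.mean(numeric_only=True))  # replace NaN with each column's mean; numeric_only skips the text column, which pandas 2.x would otherwise reject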
name age tel
0 xioahong 20.0 10020.0
1 xioaming 19.0 123231.0
2 xiaowang 18.0 66625.5
t2['age'].fillna(t2['age'].mean())  # fill NaN in the age column only
0    20.0
1    19.0
2    18.0
Name: age, dtype: float64
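
The fill strategies above can be combined in one place. A minimal sketch, assuming a frame shaped like `t2` (rebuilt here as `people` so the block is self-contained): the column means are computed with `numeric_only=True`, and a dict gives a separate fill value per column.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({'name': ['xiaohong', 'xiaoming', 'xiaowang'],
                       'age': [20, np.nan, 18],
                       'tel': [10020, 123231, np.nan]})

col_means = people.mean(numeric_only=True)      # skips the text column
filled = people.fillna(col_means)               # each NaN gets its own column's mean
custom = people.fillna({'age': 0, 'tel': -1})   # one fill value per column

print(filled)
print(custom)
```

None of these calls modify `people`; each returns a new frame.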
t3 = pd.DataFrame(np.arange(12).reshape(3,4),index=list('abc'),columns=list('WXYZ'))
t3
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
t3[t3==0] = np.nan  # 0 takes part in calculations such as mean(); NaN is skipped
t3
W X Y Z
a NaN 1 2 3
b 4.0 5 6 7
c 8.0 9 10 11
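
The effect of replacing 0 with NaN shows up as soon as a statistic is computed. A minimal sketch on a three-value Series (example values chosen to match column W above; variable names are illustrative):

```python
import numpy as np
import pandas as pd

with_zero = pd.Series([0, 4, 8])
with_nan = with_zero.mask(with_zero == 0)    # 0 becomes NaN, dtype becomes float

mean_zero = with_zero.mean()                 # 0 counts as a value: (0 + 4 + 8) / 3
mean_nan = with_nan.mean()                   # NaN is skipped: (4 + 8) / 2
mean_strict = with_nan.mean(skipna=False)    # NaN propagates when skipna is off

print(mean_zero, mean_nan, mean_strict)
```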
from matplotlib import pyplot as plt
import pandas as pd
file_path = './can.csv'
df = pd.read_csv(file_path)
df
1 20 1.004 0.090 -0.125
0 1 20 1.004 -0.043 -0.125
1 1 20 0.969 0.090 -0.121
2 1 20 0.973 -0.012 -0.137
3 1 20 1.000 -0.016 -0.121
4 1 20 0.961 0.082 -0.121
... ... ... ... ... ...
152994 3 100 1.051 0.090 -0.262
152995 3 100 0.918 0.039 -0.129
152996 3 100 1.156 -0.094 -0.227
152997 3 100 0.934 0.203 -0.172
152998 3 100 1.199 -0.176 0.109

152999 rows × 5 columns

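print(df.info)  # without parentheses this prints the bound method (including the frame's repr), not the summary; call df.info() for the summary
# distribution of rating and runtime
# chart type: histogram
# prepare the data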
<bound method DataFrame.info of         1   20  1.004  0.090  -0.125
0       1   20  1.004 -0.043  -0.125
1       1   20  0.969  0.090  -0.121
2       1   20  0.973 -0.012  -0.137
3       1   20  1.000 -0.016  -0.121
4       1   20  0.961  0.082  -0.121
...    ..  ...    ...    ...     ...
152994  3  100  1.051  0.090  -0.262
152995  3  100  0.918  0.039  -0.129
152996  3  100  1.156 -0.094  -0.227
152997  3  100  0.934  0.203  -0.172
152998  3  100  1.199 -0.176   0.109

[152999 rows x 5 columns]>
# see where the data is concentrated
runtime_data = df["20"].values
print(runtime_data)
max_runtime = runtime_data.max()
min_runtime = runtime_data.min()
# compute the number of bins (bin width of 5)
num_bin = (max_runtime-min_runtime)//5
print(num_bin)
# set the figure size
plt.figure(figsize=(20,8),dpi=80)
plt.hist(runtime_data, num_bin)
plt.show()
[ 20  20  20 ... 100 100 100]
16

runtime_data = np.array([8.1,7.0,7.3,7.2,6.2,6.1,8.3,6.4,7.1,7.5,8.4,9.9,7.5,7.9,9.8,6.5,7.8,8.9,6.8,7.8,9.8,7.8,6.7,8.9,7.8,7.8,9.7,6.5,6.7,6.4,6.8,9.8,8.1,7.0,7.3,7.2,6.2,6.1,8.3,6.4,7.1,7.5,8.4,9.9,7.5,7.9,9.8,6.5,7.8,8.9,6.8,7.8,9.8,7.8,6.7,8.9,7.8,7.8,9.7,6.5,6.7,6.4,6.8,9.8,8.1,7.0,7.3,7.2,6.2,6.1,8.3,6.4,7.1,7.5,8.4,9.9,7.5,7.9,9.8,6.5,7.8,8.9,6.8,7.8,9.8,7.8,6.7,8.9,7.8,7.8,9.7,6.5,6.7,6.4,6.8,9.8])
print(runtime_data)
max_runtime = runtime_data.max()
min_runtime = runtime_data.min()
# compute the bin edges
# num_bin = (max_runtime-min_runtime)//5
# print(num_bin)
num_bin_list = [1.9,3.5]
i = 3.5
while i <= max_runtime:
    i += 0.5
    num_bin_list.append(i)
print(num_bin_list)
# set the figure size
plt.figure(figsize=(20,8),dpi=80)
plt.hist(runtime_data, num_bin_list)
plt.xticks(num_bin_list)
plt.show()
[8.1 7.  7.3 7.2 6.2 6.1 8.3 6.4 7.1 7.5 8.4 9.9 7.5 7.9 9.8 6.5 7.8 8.9
 6.8 7.8 9.8 7.8 6.7 8.9 7.8 7.8 9.7 6.5 6.7 6.4 6.8 9.8 8.1 7.  7.3 7.2
 6.2 6.1 8.3 6.4 7.1 7.5 8.4 9.9 7.5 7.9 9.8 6.5 7.8 8.9 6.8 7.8 9.8 7.8
 6.7 8.9 7.8 7.8 9.7 6.5 6.7 6.4 6.8 9.8 8.1 7.  7.3 7.2 6.2 6.1 8.3 6.4
 7.1 7.5 8.4 9.9 7.5 7.9 9.8 6.5 7.8 8.9 6.8 7.8 9.8 7.8 6.7 8.9 7.8 7.8
 9.7 6.5 6.7 6.4 6.8 9.8]
[1.9, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0]
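
The bin edges above are built with a manual loop. As an alternative sketch (not the approach used in these notes), the edges can be derived from the data range with `np.arange` and checked with `np.histogram`, which counts values per bin without drawing anything. The sample values and the 0.5 bin width are assumptions for illustration.

```python
import numpy as np

# A short sample of the ratings used above
data = np.array([8.1, 7.0, 7.3, 7.2, 6.2, 6.1, 8.3, 6.4, 7.1, 7.5, 8.4, 9.9])

width = 0.5
# Round the range outwards to multiples of the width so every value is covered
start = np.floor(data.min() / width) * width
stop = np.ceil(data.max() / width) * width
edges = np.arange(start, stop + width / 2, width)

counts, _ = np.histogram(data, bins=edges)

print(edges)
print(counts)
```

The same `edges` array can be passed to `plt.hist` and `plt.xticks` in place of `num_bin_list`.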

import pandas as pd
df = pd.DataFrame({'key':['A','B','C','A','B','C','A','B','C'],'data':[0,5,10,5,10,15,10,15,20]})
df
key data
0 A 0
1 B 5
2 C 10
3 A 5
4 B 10
5 C 15
6 A 10
7 B 15
8 C 20
for key in ['A','B','C']:print(key,df[df['key'] == key].sum())
A key     AAA
data     15
dtype: object
B key     BBB
data     30
dtype: object
C key     CCC
data     45
dtype: object
df
key data
0 A 0
1 B 5
2 C 10
3 A 5
4 B 10
5 C 15
6 A 10
7 B 15
8 C 20

The groupby method

df.groupby('key').sum()  # sum within each group
data
key
A 15
B 30
C 45
import numpy as np
df.groupby('key').aggregate('mean')  # aggregate each group to its mean; the string name avoids the warning recent pandas versions emit for np.mean
data
key
A 5
B 10
C 15
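
Several statistics per group can be computed in one call. A minimal sketch on the same `key`/`data` frame, rebuilt here so the block is self-contained; the output column names `total` and `average` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C'] * 3,
                   'data': [0, 5, 10, 5, 10, 15, 10, 15, 20]})

# A list of function names gives one output column per statistic
summary = df.groupby('key')['data'].agg(['sum', 'mean', 'max'])

# Named aggregation: output_name=(input_column, function)
named = df.groupby('key').agg(total=('data', 'sum'),
                              average=('data', 'mean'))

print(summary)
print(named)
```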
df = pd.read_csv('./can.csv')
df
1 20 1.004 0.090 -0.125
0 1 20 1.004 -0.043 -0.125
1 1 20 0.969 0.090 -0.121
2 1 20 0.973 -0.012 -0.137
3 1 20 1.000 -0.016 -0.121
4 1 20 0.961 0.082 -0.121
... ... ... ... ... ...
152994 3 100 1.051 0.090 -0.262
152995 3 100 0.918 0.039 -0.129
152996 3 100 1.156 -0.094 -0.227
152997 3 100 0.934 0.203 -0.172
152998 3 100 1.199 -0.176 0.109

152999 rows × 5 columns

df.groupby('1')['20'].mean()
1
1    60.000784
2    60.000000
3    60.000000
Name: 20, dtype: float64
df.groupby(by = '20').groups  # group by column '20'; maps each distinct value to the row labels in that group
{20: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], 25: [2999, 3000, 3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009, 3010, 3011, 3012, 3013, 3014, 3015, 3016, 3017, 3018, 3019, 3020, 3021, 3022, 3023, 3024, 3025, 3026, 3027, 3028, 3029, 3030, 3031, 3032, 3033, 3034, 3035, 3036, 3037, 3038, 3039, 3040, 3041, 3042, 3043, 3044, 3045, 3046, 3047, 3048, 3049, 3050, 3051, 3052, 3053, 3054, 3055, 3056, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071, 3072, 3073, 3074, 3075, 3076, 3077, 3078, 3079, 3080, 3081, 3082, 3083, 3084, 3085, 3086, 3087, 3088, 3089, 3090, 3091, 3092, 3093, 3094, 3095, 3096, 3097, 3098, ...], 30: [5999, 6000, 6001, 6002, 6003, 6004, 6005, 6006, 6007, 6008, 6009, 6010, 6011, 6012, 6013, 6014, 6015, 6016, 6017, 6018, 6019, 6020, 6021, 6022, 6023, 6024, 6025, 6026, 6027, 6028, 6029, 6030, 6031, 6032, 6033, 6034, 6035, 6036, 6037, 6038, 6039, 6040, 6041, 6042, 6043, 6044, 6045, 6046, 6047, 6048, 6049, 6050, 6051, 6052, 6053, 6054, 6055, 6056, 6057, 6058, 6059, 6060, 6061, 6062, 6063, 6064, 6065, 6066, 6067, 6068, 6069, 6070, 6071, 6072, 6073, 6074, 6075, 6076, 6077, 6078, 6079, 6080, 6081, 6082, 6083, 6084, 6085, 6086, 6087, 6088, 6089, 6090, 6091, 6092, 6093, 6094, 6095, 6096, 6097, 6098, ...], 35: [8999, 9000, 9001, 9002, 9003, 9004, 9005, 9006, 9007, 9008, 9009, 9010, 9011, 9012, 9013, 9014, 9015, 9016, 9017, 9018, 9019, 9020, 9021, 9022, 9023, 9024, 9025, 9026, 9027, 9028, 9029, 9030, 9031, 9032, 9033, 9034, 9035, 9036, 9037, 9038, 9039, 9040, 9041, 9042, 9043, 9044, 9045, 9046, 9047, 9048, 9049, 9050, 9051, 9052, 9053, 9054, 9055, 9056, 9057, 9058, 9059, 
9060, 9061, 9062, 9063, 9064, 9065, 9066, 9067, 9068, 9069, 9070, 9071, 9072, 9073, 9074, 9075, 9076, 9077, 9078, 9079, 9080, 9081, 9082, 9083, 9084, 9085, 9086, 9087, 9088, 9089, 9090, 9091, 9092, 9093, 9094, 9095, 9096, 9097, 9098, ...], 40: [11999, 12000, 12001, 12002, 12003, 12004, 12005, 12006, 12007, 12008, 12009, 12010, 12011, 12012, 12013, 12014, 12015, 12016, 12017, 12018, 12019, 12020, 12021, 12022, 12023, 12024, 12025, 12026, 12027, 12028, 12029, 12030, 12031, 12032, 12033, 12034, 12035, 12036, 12037, 12038, 12039, 12040, 12041, 12042, 12043, 12044, 12045, 12046, 12047, 12048, 12049, 12050, 12051, 12052, 12053, 12054, 12055, 12056, 12057, 12058, 12059, 12060, 12061, 12062, 12063, 12064, 12065, 12066, 12067, 12068, 12069, 12070, 12071, 12072, 12073, 12074, 12075, 12076, 12077, 12078, 12079, 12080, 12081, 12082, 12083, 12084, 12085, 12086, 12087, 12088, 12089, 12090, 12091, 12092, 12093, 12094, 12095, 12096, 12097, 12098, ...], 45: [14999, 15000, 15001, 15002, 15003, 15004, 15005, 15006, 15007, 15008, 15009, 15010, 15011, 15012, 15013, 15014, 15015, 15016, 15017, 15018, 15019, 15020, 15021, 15022, 15023, 15024, 15025, 15026, 15027, 15028, 15029, 15030, 15031, 15032, 15033, 15034, 15035, 15036, 15037, 15038, 15039, 15040, 15041, 15042, 15043, 15044, 15045, 15046, 15047, 15048, 15049, 15050, 15051, 15052, 15053, 15054, 15055, 15056, 15057, 15058, 15059, 15060, 15061, 15062, 15063, 15064, 15065, 15066, 15067, 15068, 15069, 15070, 15071, 15072, 15073, 15074, 15075, 15076, 15077, 15078, 15079, 15080, 15081, 15082, 15083, 15084, 15085, 15086, 15087, 15088, 15089, 15090, 15091, 15092, 15093, 15094, 15095, 15096, 15097, 15098, ...], 50: [17999, 18000, 18001, 18002, 18003, 18004, 18005, 18006, 18007, 18008, 18009, 18010, 18011, 18012, 18013, 18014, 18015, 18016, 18017, 18018, 18019, 18020, 18021, 18022, 18023, 18024, 18025, 18026, 18027, 18028, 18029, 18030, 18031, 18032, 18033, 18034, 18035, 18036, 18037, 18038, 18039, 18040, 18041, 18042, 18043, 18044, 18045, 
18046, 18047, 18048, 18049, 18050, 18051, 18052, 18053, 18054, 18055, 18056, 18057, 18058, 18059, 18060, 18061, 18062, 18063, 18064, 18065, 18066, 18067, 18068, 18069, 18070, 18071, 18072, 18073, 18074, 18075, 18076, 18077, 18078, 18079, 18080, 18081, 18082, 18083, 18084, 18085, 18086, 18087, 18088, 18089, 18090, 18091, 18092, 18093, 18094, 18095, 18096, 18097, 18098, ...], 55: [20999, 21000, 21001, 21002, 21003, 21004, 21005, 21006, 21007, 21008, 21009, 21010, 21011, 21012, 21013, 21014, 21015, 21016, 21017, 21018, 21019, 21020, 21021, 21022, 21023, 21024, 21025, 21026, 21027, 21028, 21029, 21030, 21031, 21032, 21033, 21034, 21035, 21036, 21037, 21038, 21039, 21040, 21041, 21042, 21043, 21044, 21045, 21046, 21047, 21048, 21049, 21050, 21051, 21052, 21053, 21054, 21055, 21056, 21057, 21058, 21059, 21060, 21061, 21062, 21063, 21064, 21065, 21066, 21067, 21068, 21069, 21070, 21071, 21072, 21073, 21074, 21075, 21076, 21077, 21078, 21079, 21080, 21081, 21082, 21083, 21084, 21085, 21086, 21087, 21088, 21089, 21090, 21091, 21092, 21093, 21094, 21095, 21096, 21097, 21098, ...], 60: [23999, 24000, 24001, 24002, 24003, 24004, 24005, 24006, 24007, 24008, 24009, 24010, 24011, 24012, 24013, 24014, 24015, 24016, 24017, 24018, 24019, 24020, 24021, 24022, 24023, 24024, 24025, 24026, 24027, 24028, 24029, 24030, 24031, 24032, 24033, 24034, 24035, 24036, 24037, 24038, 24039, 24040, 24041, 24042, 24043, 24044, 24045, 24046, 24047, 24048, 24049, 24050, 24051, 24052, 24053, 24054, 24055, 24056, 24057, 24058, 24059, 24060, 24061, 24062, 24063, 24064, 24065, 24066, 24067, 24068, 24069, 24070, 24071, 24072, 24073, 24074, 24075, 24076, 24077, 24078, 24079, 24080, 24081, 24082, 24083, 24084, 24085, 24086, 24087, 24088, 24089, 24090, 24091, 24092, 24093, 24094, 24095, 24096, 24097, 24098, ...], 65: [26999, 27000, 27001, 27002, 27003, 27004, 27005, 27006, 27007, 27008, 27009, 27010, 27011, 27012, 27013, 27014, 27015, 27016, 27017, 27018, 27019, 27020, 27021, 27022, 27023, 27024, 27025, 27026, 
27027, 27028, 27029, 27030, 27031, 27032, 27033, 27034, 27035, 27036, 27037, 27038, 27039, 27040, 27041, 27042, 27043, 27044, 27045, 27046, 27047, 27048, 27049, 27050, 27051, 27052, 27053, 27054, 27055, 27056, 27057, 27058, 27059, 27060, 27061, 27062, 27063, 27064, 27065, 27066, 27067, 27068, 27069, 27070, 27071, 27072, 27073, 27074, 27075, 27076, 27077, 27078, 27079, 27080, 27081, 27082, 27083, 27084, 27085, 27086, 27087, 27088, 27089, 27090, 27091, 27092, 27093, 27094, 27095, 27096, 27097, 27098, ...], 70: [29999, 30000, 30001, 30002, 30003, 30004, 30005, 30006, 30007, 30008, 30009, 30010, 30011, 30012, 30013, 30014, 30015, 30016, 30017, 30018, 30019, 30020, 30021, 30022, 30023, 30024, 30025, 30026, 30027, 30028, 30029, 30030, 30031, 30032, 30033, 30034, 30035, 30036, 30037, 30038, 30039, 30040, 30041, 30042, 30043, 30044, 30045, 30046, 30047, 30048, 30049, 30050, 30051, 30052, 30053, 30054, 30055, 30056, 30057, 30058, 30059, 30060, 30061, 30062, 30063, 30064, 30065, 30066, 30067, 30068, 30069, 30070, 30071, 30072, 30073, 30074, 30075, 30076, 30077, 30078, 30079, 30080, 30081, 30082, 30083, 30084, 30085, 30086, 30087, 30088, 30089, 30090, 30091, 30092, 30093, 30094, 30095, 30096, 30097, 30098, ...], 75: [32999, 33000, 33001, 33002, 33003, 33004, 33005, 33006, 33007, 33008, 33009, 33010, 33011, 33012, 33013, 33014, 33015, 33016, 33017, 33018, 33019, 33020, 33021, 33022, 33023, 33024, 33025, 33026, 33027, 33028, 33029, 33030, 33031, 33032, 33033, 33034, 33035, 33036, 33037, 33038, 33039, 33040, 33041, 33042, 33043, 33044, 33045, 33046, 33047, 33048, 33049, 33050, 33051, 33052, 33053, 33054, 33055, 33056, 33057, 33058, 33059, 33060, 33061, 33062, 33063, 33064, 33065, 33066, 33067, 33068, 33069, 33070, 33071, 33072, 33073, 33074, 33075, 33076, 33077, 33078, 33079, 33080, 33081, 33082, 33083, 33084, 33085, 33086, 33087, 33088, 33089, 33090, 33091, 33092, 33093, 33094, 33095, 33096, 33097, 33098, ...], 80: [35999, 36000, 36001, 36002, 36003, 36004, 36005, 36006, 36007, 
36008, 36009, 36010, 36011, 36012, 36013, 36014, 36015, 36016, 36017, 36018, 36019, 36020, 36021, 36022, 36023, 36024, 36025, 36026, 36027, 36028, 36029, 36030, 36031, 36032, 36033, 36034, 36035, 36036, 36037, 36038, 36039, 36040, 36041, 36042, 36043, 36044, 36045, 36046, 36047, 36048, 36049, 36050, 36051, 36052, 36053, 36054, 36055, 36056, 36057, 36058, 36059, 36060, 36061, 36062, 36063, 36064, 36065, 36066, 36067, 36068, 36069, 36070, 36071, 36072, 36073, 36074, 36075, 36076, 36077, 36078, 36079, 36080, 36081, 36082, 36083, 36084, 36085, 36086, 36087, 36088, 36089, 36090, 36091, 36092, 36093, 36094, 36095, 36096, 36097, 36098, ...], 85: [38999, 39000, 39001, 39002, 39003, 39004, 39005, 39006, 39007, 39008, 39009, 39010, 39011, 39012, 39013, 39014, 39015, 39016, 39017, 39018, 39019, 39020, 39021, 39022, 39023, 39024, 39025, 39026, 39027, 39028, 39029, 39030, 39031, 39032, 39033, 39034, 39035, 39036, 39037, 39038, 39039, 39040, 39041, 39042, 39043, 39044, 39045, 39046, 39047, 39048, 39049, 39050, 39051, 39052, 39053, 39054, 39055, 39056, 39057, 39058, 39059, 39060, 39061, 39062, 39063, 39064, 39065, 39066, 39067, 39068, 39069, 39070, 39071, 39072, 39073, 39074, 39075, 39076, 39077, 39078, 39079, 39080, 39081, 39082, 39083, 39084, 39085, 39086, 39087, 39088, 39089, 39090, 39091, 39092, 39093, 39094, 39095, 39096, 39097, 39098, ...], 90: [41999, 42000, 42001, 42002, 42003, 42004, 42005, 42006, 42007, 42008, 42009, 42010, 42011, 42012, 42013, 42014, 42015, 42016, 42017, 42018, 42019, 42020, 42021, 42022, 42023, 42024, 42025, 42026, 42027, 42028, 42029, 42030, 42031, 42032, 42033, 42034, 42035, 42036, 42037, 42038, 42039, 42040, 42041, 42042, 42043, 42044, 42045, 42046, 42047, 42048, 42049, 42050, 42051, 42052, 42053, 42054, 42055, 42056, 42057, 42058, 42059, 42060, 42061, 42062, 42063, 42064, 42065, 42066, 42067, 42068, 42069, 42070, 42071, 42072, 42073, 42074, 42075, 42076, 42077, 42078, 42079, 42080, 42081, 42082, 42083, 42084, 42085, 42086, 42087, 42088, 42089, 
42090, 42091, 42092, 42093, 42094, 42095, 42096, 42097, 42098, ...], 95: [44999, 45000, 45001, 45002, 45003, 45004, 45005, 45006, 45007, 45008, 45009, 45010, 45011, 45012, 45013, 45014, 45015, 45016, 45017, 45018, 45019, 45020, 45021, 45022, 45023, 45024, 45025, 45026, 45027, 45028, 45029, 45030, 45031, 45032, 45033, 45034, 45035, 45036, 45037, 45038, 45039, 45040, 45041, 45042, 45043, 45044, 45045, 45046, 45047, 45048, 45049, 45050, 45051, 45052, 45053, 45054, 45055, 45056, 45057, 45058, 45059, 45060, 45061, 45062, 45063, 45064, 45065, 45066, 45067, 45068, 45069, 45070, 45071, 45072, 45073, 45074, 45075, 45076, 45077, 45078, 45079, 45080, 45081, 45082, 45083, 45084, 45085, 45086, 45087, 45088, 45089, 45090, 45091, 45092, 45093, 45094, 45095, 45096, 45097, 45098, ...], 100: [47999, 48000, 48001, 48002, 48003, 48004, 48005, 48006, 48007, 48008, 48009, 48010, 48011, 48012, 48013, 48014, 48015, 48016, 48017, 48018, 48019, 48020, 48021, 48022, 48023, 48024, 48025, 48026, 48027, 48028, 48029, 48030, 48031, 48032, 48033, 48034, 48035, 48036, 48037, 48038, 48039, 48040, 48041, 48042, 48043, 48044, 48045, 48046, 48047, 48048, 48049, 48050, 48051, 48052, 48053, 48054, 48055, 48056, 48057, 48058, 48059, 48060, 48061, 48062, 48063, 48064, 48065, 48066, 48067, 48068, 48069, 48070, 48071, 48072, 48073, 48074, 48075, 48076, 48077, 48078, 48079, 48080, 48081, 48082, 48083, 48084, 48085, 48086, 48087, 48088, 48089, 48090, 48091, 48092, 48093, 48094, 48095, 48096, 48097, 48098, ...]}
df = pd.DataFrame([[1,2,3],[4,5,6]],index=['a','b'],columns=['A','B','C'])
df
   A  B  C
a  1  2  3
b  4  5  6
df.sum(axis=0)
A    5
B    7
C    9
dtype: int64
df.sum(axis=1)
a     6
b    15
dtype: int64
df.sum(axis='columns')
a     6
b    15
dtype: int64
df.max(axis=0)
A    4
B    5
C    6
dtype: int64
df.median(axis=0)  # median of each column
A    2.5
B    3.5
C    4.5
dtype: float64
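
The axis argument is easy to misread, so here is a minimal sketch that reuses the same 2x3 frame. It only illustrates that axis=0 (alias 'index') collapses the rows and returns one value per column, while axis=1 (alias 'columns') collapses the columns and returns one value per row. The variable names col_sums and row_sums are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['a', 'b'], columns=['A', 'B', 'C'])

# axis=0 / 'index': aggregate down the rows, one result per column.
col_sums = df.sum(axis='index')

# axis=1 / 'columns': aggregate across the columns, one result per row.
row_sums = df.sum(axis='columns')

print(col_sums.to_dict())  # {'A': 5, 'B': 7, 'C': 9}
print(row_sums.to_dict())  # {'a': 6, 'b': 15}

# The same rule applies to max(), median(), mean() and the other reductions.
print(df.median(axis=0).tolist())  # [2.5, 3.5, 4.5]
```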

Bivariate statistics

df = pd.read_csv('./can.csv')
df
          1   20  1.004   0.090  -0.125
0         1   20  1.004  -0.043  -0.125
1         1   20  0.969   0.090  -0.121
2         1   20  0.973  -0.012  -0.137
3         1   20  1.000  -0.016  -0.121
4         1   20  0.961   0.082  -0.121
...     ...  ...    ...     ...     ...
152994    3  100  1.051   0.090  -0.262
152995    3  100  0.918   0.039  -0.129
152996    3  100  1.156  -0.094  -0.227
152997    3  100  0.934   0.203  -0.172
152998    3  100  1.199  -0.176   0.109

152999 rows × 5 columns

df.head()
          1   20  1.004   0.090  -0.125
0         1   20  1.004  -0.043  -0.125
1         1   20  0.969   0.090  -0.121
2         1   20  0.973  -0.012  -0.137
3         1   20  1.000  -0.016  -0.121
4         1   20  0.961   0.082  -0.121
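
The column labels above (1, 20, 1.004, 0.090, -0.125) look like data values, which suggests that can.csv has no header row and that read_csv used the first record as the header. The sketch below reproduces that effect on a three-line stand-in for the file and shows header=None as one way to keep every row as data. The sample text and the column names in cols are hypothetical; the real meaning of the columns is not stated in these notes.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for can.csv (no header row).
raw = (
    "1,20,1.004,0.090,-0.125\n"
    "1,20,1.004,-0.043,-0.125\n"
    "1,20,0.969,0.090,-0.121\n"
)

# Default: the first data row is consumed as the column labels.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps all rows as data; names= supplies labels (made up here).
cols = ['label', 'level', 'x', 'y', 'z']
with_names = pd.read_csv(io.StringIO(raw), header=None, names=cols)

print(with_default.shape)  # (2, 5): one row was lost to the header
print(with_names.shape)    # (3, 5)
```

If the same applies to the real file, every statistic in this section is computed on one row fewer than the file contains.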
df.cov()  # pairwise covariance between columns
                 1          20       1.004       0.090      -0.125
1         0.666669   -0.000261   -0.003833    0.003257    0.000941
20       -0.000261  599.997386    0.040354    0.052441    0.113526
1.004    -0.003833    0.040354    0.599015    0.012148   -0.036479
0.090     0.003257    0.052441    0.012148    0.551461   -0.010641
-0.125    0.000941    0.113526   -0.036479   -0.010641    0.267299
df.corr()  # pairwise correlation coefficients (Pearson by default); values lie in [-1, 1]
                 1          20       1.004       0.090      -0.125
1         1.000000   -0.000013   -0.006065    0.005372    0.002228
20       -0.000013    1.000000    0.002129    0.002883    0.008964
1.004    -0.006065    0.002129    1.000000    0.021137   -0.091164
0.090     0.005372    0.002883    0.021137    1.000000   -0.027716
-0.125    0.002228    0.008964   -0.091164   -0.027716    1.000000
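
The two tables are linked: a Pearson correlation is the covariance divided by the product of the two standard deviations. For example, from the values printed above, -0.036479 / sqrt(0.599015 * 0.267299) is roughly -0.0912, which agrees with the corr() entry for the 1.004 and -0.125 columns. The sketch below checks the same relation on synthetic data; the frame demo and its column names are assumptions made for illustration, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Synthetic data: y depends on x, z is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    'x': x,
    'y': 2 * x + rng.normal(size=200),
    'z': rng.normal(size=200),
})

cov = demo.cov()

# Standard deviations are the square roots of the diagonal (the variances).
std = np.sqrt(np.diag(cov.to_numpy()))

# corr[i, j] = cov[i, j] / (std[i] * std[j])
corr_manual = cov.to_numpy() / np.outer(std, std)

print(np.allclose(corr_manual, demo.corr().to_numpy()))  # True
```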
df
          1   20  1.004   0.090  -0.125
0         1   20  1.004  -0.043  -0.125
1         1   20  0.969   0.090  -0.121
2         1   20  0.973  -0.012  -0.137
3         1   20  1.000  -0.016  -0.121
4         1   20  0.961   0.082  -0.121
...     ...  ...    ...     ...     ...
152994    3  100  1.051   0.090  -0.262
152995    3  100  0.918   0.039  -0.129
152996    3  100  1.156  -0.094  -0.227
152997    3  100  0.934   0.203  -0.172
152998    3  100  1.199  -0.176   0.109

152999 rows × 5 columns

df['1.004'].value_counts()  # count how often each distinct value occurs (most frequent first)
 0.980    4452
 0.977    4358
 0.996    4232
 1.000    4194
 0.984    4166
          ... 
-3.547       1
 5.844       1
 4.988       1
 6.816       1
-3.668       1
Name: 1.004, Length: 2733, dtype: int64
df['1.004'].value_counts(ascending=True)  # sort by count, least frequent first
-3.668       1
 6.816       1
 4.988       1
 5.844       1
-3.547       1
          ... 
 0.984    4166
 1.000    4194
 0.996    4232
 0.977    4358
 0.980    4452
Name: 1.004, Length: 2733, dtype: int64
df['1.004'].value_counts(ascending=True, bins=5)  # split the value range into 5 equal-width bins and count each bin
(-8.017, -4.801]       118
(4.797, 7.996]         635
(-4.801, -1.602]      1613
(1.598, 4.797]       10207
(-1.602, 1.598]     140426
Name: 1.004, dtype: int64
df['1'].value_counts(ascending=True)  # count how many rows hold each of the values 1, 2 and 3
1    50999
2    51000
3    51000
Name: 1, dtype: int64
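
A few more value_counts options are worth knowing alongside ascending and bins. The sketch below runs on a small made-up Series s (not the can.csv data) and shows normalize=True for proportions, sort_index() for ordering by value instead of by count, and bins with sort=False to keep the intervals in their natural order.

```python
import pandas as pd

# Small made-up sample; the values are illustrative only.
s = pd.Series([1, 1, 2, 3, 3, 3])

# Default: sorted by count, most frequent value first.
counts = s.value_counts()

# Proportions instead of raw counts; they sum to 1.
shares = s.value_counts(normalize=True)

# Order by the value itself rather than by how often it occurs.
by_value = s.value_counts().sort_index()

# Equal-width bins, kept in interval order instead of count order.
binned = s.value_counts(bins=2, sort=False)

print(counts.index[0], counts.iloc[0])  # 3 3
print(by_value.tolist())                # [2, 1, 3]
print(binned.sum() == len(s))           # True
```

Binned counts always add up to the number of non-missing values, which is a quick sanity check: the five bins printed earlier sum to 152999, the row count of df.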
