Pandas入门教程(三)
时间格式化和时间查询
import pandas as pd
date=pd.Timestamp('2020/09/27 13:30:00')
print(date)
2020-09-27 13:30:00
#年
print(date.year)
#月
print(date.month)
#日
print(date.day)
#时
print(date.hour)
#分
print(date.minute)
#秒
print(date.second)
2020
9
27
13
30
0
#加5天
date+pd.Timedelta('5 days')
Timestamp('2020-10-02 13:30:00')
#时间转换
res=pd.to_datetime('2020-09-10 13:20:00')
print(res.year)
2020
#生成一列数据
se=pd.Series(['2020-11-24 00:00:00','2020-11-25 00:00:00','2020-11-26 00:00:00'])
print(se)
0 2020-11-24 00:00:00
1 2020-11-25 00:00:00
2 2020-11-26 00:00:00
dtype: object
# 转成时间格式
print(pd.to_datetime(se))
0 2020-11-24
1 2020-11-25
2 2020-11-26
dtype: datetime64[ns]
# 生成一个等差的时间序列,从2020-09-17开始,长度为periods,间隔为12H
pd.Series(pd.date_range(start='2020-09-17',periods=10,freq='12H'))
0 2020-09-17 00:00:00
1 2020-09-17 12:00:00
2 2020-09-18 00:00:00
3 2020-09-18 12:00:00
4 2020-09-19 00:00:00
5 2020-09-19 12:00:00
6 2020-09-20 00:00:00
7 2020-09-20 12:00:00
8 2020-09-21 00:00:00
9 2020-09-21 12:00:00
dtype: datetime64[ns]
df=pd.read_csv('./pandas/data/flowdata.csv')
# 将Time列转换成时间格式
df['Time']=pd.to_datetime(df['Time'])
df=df.set_index(df['Time'])
print(df)
Time L06_347 LS06_347 LS06_348
Time
2009-01-01 00:00:00 2009-01-01 00:00:00 0.137417 0.097500 0.016833
2009-01-01 03:00:00 2009-01-01 03:00:00 0.131250 0.088833 0.016417
2009-01-01 06:00:00 2009-01-01 06:00:00 0.113500 0.091250 0.016750
2009-01-01 09:00:00 2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-01 12:00:00 2009-01-01 12:00:00 0.140917 0.096167 0.017000
... ... ... ... ...
2013-01-01 12:00:00 2013-01-01 12:00:00 1.710000 1.710000 0.129583
2013-01-01 15:00:00 2013-01-01 15:00:00 1.420000 1.420000 0.096333
2013-01-01 18:00:00 2013-01-01 18:00:00 1.178583 1.178583 0.083083
2013-01-01 21:00:00 2013-01-01 21:00:00 0.898250 0.898250 0.077167
2013-01-02 00:00:00 2013-01-02 00:00:00 0.860000 0.860000 0.075000[11697 rows x 4 columns]
# 打印索引
print(df.index)
DatetimeIndex(['2009-01-01 00:00:00', '2009-01-01 03:00:00','2009-01-01 06:00:00', '2009-01-01 09:00:00','2009-01-01 12:00:00', '2009-01-01 15:00:00','2009-01-01 18:00:00', '2009-01-01 21:00:00','2009-01-02 00:00:00', '2009-01-02 03:00:00',...'2012-12-31 21:00:00', '2013-01-01 00:00:00','2013-01-01 03:00:00', '2013-01-01 06:00:00','2013-01-01 09:00:00', '2013-01-01 12:00:00','2013-01-01 15:00:00', '2013-01-01 18:00:00','2013-01-01 21:00:00', '2013-01-02 00:00:00'],dtype='datetime64[ns]', name='Time', length=11697, freq=None)
# 读取时间的时候自动将时间转换成datetime
df=pd.read_csv('./pandas/data/flowdata.csv',index_col=0,parse_dates=True)
print(df)
L06_347 LS06_347 LS06_348
Time
2009-01-01 00:00:00 0.137417 0.097500 0.016833
2009-01-01 03:00:00 0.131250 0.088833 0.016417
2009-01-01 06:00:00 0.113500 0.091250 0.016750
2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-01 12:00:00 0.140917 0.096167 0.017000
... ... ... ...
2013-01-01 12:00:00 1.710000 1.710000 0.129583
2013-01-01 15:00:00 1.420000 1.420000 0.096333
2013-01-01 18:00:00 1.178583 1.178583 0.083083
2013-01-01 21:00:00 0.898250 0.898250 0.077167
2013-01-02 00:00:00 0.860000 0.860000 0.075000[11697 rows x 3 columns]
# 通过时间索引筛选数据
print(df[('2009-01-01 00:00:00'):('2009-01-01 12:00:00')])
L06_347 LS06_347 LS06_348
Time
2009-01-01 00:00:00 0.137417 0.097500 0.016833
2009-01-01 03:00:00 0.131250 0.088833 0.016417
2009-01-01 06:00:00 0.113500 0.091250 0.016750
2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-01 12:00:00 0.140917 0.096167 0.017000
# 查看数据的最后10条数据
print(df.tail(10))
L06_347 LS06_347 LS06_348
Time
2012-12-31 21:00:00 0.846500 0.846500 0.170167
2013-01-01 00:00:00 1.688333 1.688333 0.207333
2013-01-01 03:00:00 2.693333 2.693333 0.201500
2013-01-01 06:00:00 2.220833 2.220833 0.166917
2013-01-01 09:00:00 2.055000 2.055000 0.175667
2013-01-01 12:00:00 1.710000 1.710000 0.129583
2013-01-01 15:00:00 1.420000 1.420000 0.096333
2013-01-01 18:00:00 1.178583 1.178583 0.083083
2013-01-01 21:00:00 0.898250 0.898250 0.077167
2013-01-02 00:00:00 0.860000 0.860000 0.075000
# 通过年,月,日筛选数据
print(df['2012'])
print(df['2012-01'])
print(df['2012-01-31'])
L06_347 LS06_347 LS06_348
Time
2012-01-01 00:00:00 0.307167 0.273917 0.028000
2012-01-01 03:00:00 0.302917 0.270833 0.030583
2012-01-01 06:00:00 0.331500 0.284750 0.030917
2012-01-01 09:00:00 0.330750 0.293583 0.029750
2012-01-01 12:00:00 0.295000 0.285167 0.031750
... ... ... ...
2012-12-31 09:00:00 0.682750 0.682750 0.066583
2012-12-31 12:00:00 0.651250 0.651250 0.063833
2012-12-31 15:00:00 0.629000 0.629000 0.061833
2012-12-31 18:00:00 0.617333 0.617333 0.060583
2012-12-31 21:00:00 0.846500 0.846500 0.170167[2928 rows x 3 columns]L06_347 LS06_347 LS06_348
Time
2012-01-01 00:00:00 0.307167 0.273917 0.028000
2012-01-01 03:00:00 0.302917 0.270833 0.030583
2012-01-01 06:00:00 0.331500 0.284750 0.030917
2012-01-01 09:00:00 0.330750 0.293583 0.029750
2012-01-01 12:00:00 0.295000 0.285167 0.031750
... ... ... ...
2012-01-31 09:00:00 0.191000 0.231250 0.025583
2012-01-31 12:00:00 0.183333 0.227167 0.025917
2012-01-31 15:00:00 0.163417 0.221000 0.023750
2012-01-31 18:00:00 0.157083 0.220667 0.023167
2012-01-31 21:00:00 0.160083 0.214750 0.023333[248 rows x 3 columns]L06_347 LS06_347 LS06_348
Time
2012-01-31 00:00:00 0.191250 0.247417 0.025917
2012-01-31 03:00:00 0.181083 0.241583 0.025833
2012-01-31 06:00:00 0.188750 0.236750 0.026000
2012-01-31 09:00:00 0.191000 0.231250 0.025583
2012-01-31 12:00:00 0.183333 0.227167 0.025917
2012-01-31 15:00:00 0.163417 0.221000 0.023750
2012-01-31 18:00:00 0.157083 0.220667 0.023167
2012-01-31 21:00:00 0.160083 0.214750 0.023333
# 选择一段时间的数据
print(df['2009-01-01':'2012-01-01'])
L06_347 LS06_347 LS06_348
Time
2009-01-01 00:00:00 0.137417 0.097500 0.016833
2009-01-01 03:00:00 0.131250 0.088833 0.016417
2009-01-01 06:00:00 0.113500 0.091250 0.016750
2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-01 12:00:00 0.140917 0.096167 0.017000
... ... ... ...
2012-01-01 09:00:00 0.330750 0.293583 0.029750
2012-01-01 12:00:00 0.295000 0.285167 0.031750
2012-01-01 15:00:00 0.301417 0.287750 0.031417
2012-01-01 18:00:00 0.322083 0.304167 0.038083
2012-01-01 21:00:00 0.355417 0.346500 0.080917[8768 rows x 3 columns]
# 筛选所有1月份的数据
print(df[df.index.month==1])
L06_347 LS06_347 LS06_348
Time
2009-01-01 00:00:00 0.137417 0.097500 0.016833
2009-01-01 03:00:00 0.131250 0.088833 0.016417
2009-01-01 06:00:00 0.113500 0.091250 0.016750
2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-01 12:00:00 0.140917 0.096167 0.017000
... ... ... ...
2013-01-01 12:00:00 1.710000 1.710000 0.129583
2013-01-01 15:00:00 1.420000 1.420000 0.096333
2013-01-01 18:00:00 1.178583 1.178583 0.083083
2013-01-01 21:00:00 0.898250 0.898250 0.077167
2013-01-02 00:00:00 0.860000 0.860000 0.075000[1001 rows x 3 columns]
# 筛选所有8点到12点的数据
print(df[(df.index.hour>8) & (df.index.hour<12)])# 或者通过between_time筛选8点到12点的数据
print(df.between_time('8:00','12:00'))
L06_347 LS06_347 LS06_348
Time
2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-02 09:00:00 0.141917 0.097083 0.016417
2009-01-03 09:00:00 0.124583 0.084417 0.015833
2009-01-04 09:00:00 0.109000 0.105167 0.018000
2009-01-05 09:00:00 0.161500 0.114583 0.021583
... ... ... ...
2012-12-28 09:00:00 0.961500 0.961500 0.092417
2012-12-29 09:00:00 0.786833 0.786833 0.077000
2012-12-30 09:00:00 0.916000 0.916000 0.101583
2012-12-31 09:00:00 0.682750 0.682750 0.066583
2013-01-01 09:00:00 2.055000 2.055000 0.175667[1462 rows x 3 columns]L06_347 LS06_347 LS06_348
Time
2009-01-01 09:00:00 0.135750 0.091500 0.016250
2009-01-01 12:00:00 0.140917 0.096167 0.017000
2009-01-02 09:00:00 0.141917 0.097083 0.016417
2009-01-02 12:00:00 0.147833 0.101917 0.016417
2009-01-03 09:00:00 0.124583 0.084417 0.015833
... ... ... ...
2012-12-30 12:00:00 1.465000 1.465000 0.086833
2012-12-31 09:00:00 0.682750 0.682750 0.066583
2012-12-31 12:00:00 0.651250 0.651250 0.063833
2013-01-01 09:00:00 2.055000 2.055000 0.175667
2013-01-01 12:00:00 1.710000 1.710000 0.129583[2924 rows x 3 columns]
resample重采样
# 按天求平均值
df=df.resample('D').mean()
print(df)
L06_347 LS06_347 LS06_348
Time
2009-01-01 0.125010 0.092281 0.016635
2009-01-02 0.124146 0.095781 0.016406
2009-01-03 0.113562 0.085542 0.016094
2009-01-04 0.140198 0.102708 0.017323
2009-01-05 0.128812 0.104490 0.018167
... ... ... ...
2012-12-29 0.807604 0.807604 0.078031
2012-12-30 1.027240 1.027240 0.088000
2012-12-31 0.748365 0.748365 0.081417
2013-01-01 1.733042 1.733042 0.142198
2013-01-02 0.860000 0.860000 0.075000[1463 rows x 3 columns]
# 求3天的平均值
print(df.resample('3D').mean())
L06_347 LS06_347 LS06_348
Time
2009-01-01 0.120906 0.091201 0.016378
2009-01-04 0.121594 0.091708 0.016670
2009-01-07 0.097042 0.070740 0.014479
2009-01-10 0.115941 0.086340 0.014545
2009-01-13 0.346962 0.364549 0.034198
... ... ... ...
2012-12-20 0.996337 0.996337 0.114472
2012-12-23 2.769059 2.769059 0.225542
2012-12-26 1.451583 1.451583 0.140101
2012-12-29 0.861069 0.861069 0.082483
2013-01-01 1.296521 1.296521 0.108599[488 rows x 3 columns]
# 求1个月的平均值
print(df.resample('M').mean().head())
L06_347 LS06_347 LS06_348
Time
2009-01-31 0.517864 0.536660 0.045597
2009-02-28 0.516847 0.529987 0.047238
2009-03-31 0.372536 0.382359 0.037508
2009-04-30 0.163182 0.129354 0.021356
2009-05-31 0.178588 0.160616 0.020744
# 求某个时间单位的平均值,最大值,最小值
print(df.resample('M').min().head())
print(df.resample('M').max().head())
L06_347 LS06_347 LS06_348
Time
2009-01-31 0.078156 0.058438 0.013573
2009-02-28 0.182646 0.135667 0.019073
2009-03-31 0.131385 0.098875 0.016979
2009-04-30 0.078510 0.066375 0.013917
2009-05-31 0.060771 0.047969 0.013656L06_347 LS06_347 LS06_348
Time
2009-01-31 5.933531 6.199927 0.404708
2009-02-28 4.407604 4.724583 0.231750
2009-03-31 1.337896 1.586833 0.116969
2009-04-30 0.275698 0.247312 0.037375
2009-05-31 2.184250 2.433073 0.168792
列排序
df = pd.DataFrame({'group':['a','a','a','b','b','b','c','c','c'],'data':[4,3,2,1,12,3,4,5,7]})
print(df)
group data
0 a 4
1 a 3
2 a 2
3 b 1
4 b 12
5 b 3
6 c 4
7 c 5
8 c 7
# 默认(ascending=True)从小到大排列. 从大到小(ascending=False)
df.sort_values(by=['group','data'],ascending=[False,True],inplace=True)
print(df)
group data
6 c 4
7 c 5
8 c 7
3 b 1
5 b 3
4 b 12
2 a 2
1 a 3
0 a 4
df=pd.DataFrame({'k1':['one']*3+['tow']*3,'k2':[1,2,3,4,5,6]})
print(df)
k1 k2
0 one 1
1 one 2
2 one 3
3 tow 4
4 tow 5
5 tow 6
# 排序
print(df.sort_values(by='k2',ascending=False))
k1 k2
5 tow 6
4 tow 5
3 tow 4
2 one 3
1 one 2
0 one 1
# 删除重复数据
print(df.drop_duplicates())
k1 k2
0 one 1
1 one 2
2 one 3
3 tow 4
4 tow 5
5 tow 6
# 按某列去重删除数据
print(df.drop_duplicates(subset='k1'))
k1 k2
0 one 1
3 tow 4
df = pd.DataFrame({'food':['A1','A2','B1','B2','B3','C1','C2'],'data':[1,2,3,4,5,6,7]})
print(df)
food data
0 A1 1
1 A2 2
2 B1 3
3 B2 4
4 B3 5
5 C1 6
6 C2 7
# 将A1,A2归类到A, B1,B2,B3归类B,C1,C2归类到C
dict1 = {'A1':'A','A2':'A','B1':'B','B2':'B','B3':'B','C1':'C','C2':'C'
}
df['Upper']=df['food'].map(dict1)
print(df)
food data Upper
0 A1 1 A
1 A2 2 A
2 B1 3 B
3 B2 4 B
4 B3 5 B
5 C1 6 C
6 C2 7 C
import numpy as np
df=pd.DataFrame({'k1':np.random.randn(5),'k2':np.random.randn(5)})
print(df)
df2=df.assign(ration=df['k1']/df['k2'])
print(df2)
k1 k2
0 1.977668 -1.136251
1 0.550649 0.010131
2 0.723699 0.304536
3 -0.247529 0.030359
4 -0.351775 0.732785k1 k2 ration
0 1.977668 -1.136251 -1.740520
1 0.550649 0.010131 54.355131
2 0.723699 0.304536 2.376394
3 -0.247529 0.030359 -8.153389
4 -0.351775 0.732785 -0.480052
# 删除ration列
df2.drop('ration',axis='columns',inplace=True)
print(df2)
k1 k2
0 1.977668 -1.136251
1 0.550649 0.010131
2 0.723699 0.304536
3 -0.247529 0.030359
4 -0.351775 0.732785
数据替换
se=pd.Series([1,2,3,4,5,6,7,8])
print(se)
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
dtype: int64
se.replace(6,np.nan,inplace=True)
print(se)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 NaN
6 7.0
7 8.0
dtype: float64
Pandas.cut计算每个值在给定的哪个范围
ages = [15,18,20,21,22,34,41,52,63,79]
bins = [10,40,80]
bins_res = pd.cut(ages,bins)
print(bins_res)
[(10, 40], (10, 40], (10, 40], (10, 40], (10, 40], (10, 40], (40, 80], (40, 80], (40, 80], (40, 80]]
Categories (2, interval[int64]): [(10, 40] < (40, 80]]
# 统计各个范围的数量
print(pd.value_counts(bins_res))
(10, 40] 6
(40, 80] 4
dtype: int64
np.nan
df=pd.DataFrame([range(3),[2,np.nan,5],[np.nan,3,np.nan],range(3)])
print(df)
0 1 2
0 0.0 1.0 2.0
1 2.0 NaN 5.0
2 NaN 3.0 NaN
3 0.0 1.0 2.0
print(df.isnull())
0 1 2
0 False False False
1 False True False
2 True False True
3 False False False
# 将NaN填充成10
print(df.fillna(10))
0 1 2
0 0.0 1.0 2.0
1 2.0 10.0 5.0
2 10.0 3.0 10.0
3 0.0 1.0 2.0
# 查看某一行是否存在NaN
print(df[df.isnull().any(axis=1)])
0 1 2
1 2.0 NaN 5.0
2 NaN 3.0 NaN
Pandas入门教程(三)相关推荐
- python使用教程pandas-十分钟搞定pandas(入门教程)
本文是对pandas官方网站上<10Minutes to pandas>的一个简单的翻译,原文在这里.这篇文章是对pandas的一个简单的介绍,详细的介绍请参考:Cookbook .习惯上 ...
- python使用教程pandas-Python 数据处理库 pandas 入门教程基本操作
pandas是一个Python语言的软件包,在我们使用Python语言进行机器学习编程的时候,这是一个非常常用的基础编程库.本文是对它的一个入门教程. pandas提供了快速,灵活和富有表现力的数据结 ...
- qpython3可视图形界面_PySide——Python图形化界面入门教程(三)
PySide--Python图形化界面入门教程(三) --使用内建新号和槽 --Using Built-In Signals and Slots 上一个教程中,我们学习了如何创建和建立交互widget ...
- SpringCloud 入门教程(三): 配置自动刷新
Spring Cloud 入门教程(三): 配置自动刷新 之前讲的配置管理, 只有在应用启动时会读取到GIT的内容, 之后只要应用不重启,GIT中文件的修改,应用无法感知, 即使重启Config Se ...
- 【MATLAB Image Processing Toolbox 入门教程三】快速入门之“在多光谱图像中寻找植被”
[MATLAB Image Processing Toolbox 入门教程三] 本篇摘要 一.从多光谱图像文件导入彩色红外通道 二.构建近红外光谱散射图 三.计算植被系数并显示其定位 四.综合实例部分 ...
- python爬虫入门教程(三):淘女郎爬虫 ( 接口解析 | 图片下载 )
2019/10/28更新 网站已改版,代码已失效(其实早就失效了,但我懒得改...)此博文仅供做思路上的参考 代码使用python2编写,因已失效,就未改写成python3 爬虫入门系列教程: pyt ...
- R语言七天入门教程三:学习基本结构
R语言七天入门教程三:学习基本结构 一.编程的语言的基本结构 1.三种基本结构 绝大多数编程语言,都有三种最基本的程序结构:顺序结构.分支结构.循环结构.这三种结构的流程图如下所示(从左至右依次为:顺 ...
- Python自动化办公:pandas入门教程
在后台回复[阅读书籍] 即可获取python相关电子书~ Hi,我是山月. 今天给大家带来一个强大的朋友:pandas.如果你有许多数据分析任务的话,那你一定不能错过它. 由于它的内容比较多,因此会分 ...
- python做数据处理软件_程序员用于机器学习编程的Python 数据处理库 pandas 入门教程...
入门介绍 pandas适合于许多不同类型的数据,包括: · 具有异构类型列的表格数据,例如SQL表格或Excel数据 · 有序和无序(不一定是固定频率)时间序列数据. · 具有行列标签的任意矩阵数据( ...
最新文章
- PostgreSQL Oracle 兼容性之 - PL/SQL DETERMINISTIC 与PG函数稳定性(immutable, stable, volatile)...
- 新疆电信IBSS系统集中联机热备份--案例
- The request sent by the client was syntactically incorrect. 错误以及spring事物
- 数据库迁移之从oracle 到 MySQL
- FreeBSD 9.1安装KMS 这是一个伪命题###### ,9....
- Exception in thread main java.lang.NullPointerException一例解决
- STL(二)——向量vector
- 8类网线利弊_知识积累 | 千兆网线和百兆网线有何区别?
- ffmpeg 视频合并
- 【SBUS,串口DMA】用STM32F407的串口DMA读取SBUS接收机信号
- 什么是竞品分析?竞品分析全流程解析
- RGB和CMYK的区别
- 短视频搬运软件:抖音批量解析下载一个作者所有视频
- 智能传感器芯片行业下游市场应用前景分析预测及市场需求结构分析
- HTML - 调用腾讯 QQ 进行客服在线聊天(PC)
- Angular CLI简介
- 宇视摄像头默认密码是多少
- 黑马程序员——OC语言------类的声明实现、面向对象
- cv2.VideoCapture(0)
- 百度名词~杂篇--(对日常遇到事物的深入了解)
热门文章
- 让服务器自动从HG版本库中下载代码
- 想成为嵌入式程序员应知道的0x10个基本问题——转
- linux下用grep命令根据文件内容进行关键字搜索[linux ubuntu grep] -转
- IOC 容器中那些鲜为人知的细节(关于 FactoryBean 和 BeanFactory)
- 首次安装Linux,配置网络、换源一步到位
- 蓝桥杯 ADV-166 算法提高 聪明的美食家 java版
- java设置只有一行表格,为什么我的表格插入一行后 样式都变了?是因为没有设置css吗?如果在java函数中插入的td.innerHTML = input type='text'/,可以设置样...
- vue 中provide的用法_说一说VUE中的/deep/用法
- 在VMware上安装CentOS-6.5 minimal - 安装VMware Tools
- [转]SpringMVCfrom:form表单标签和input表单标签简介