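Time-series resampling with the pandas resample function

The examples in the resample docstring are a good way to understand how it works.

The docstring covers both upsampling and downsampling in detail.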

help(df.resample)
Help on method resample in module pandas.core.generic:

resample(rule, axis=0, closed: Union[str, NoneType] = None, label: Union[str, NoneType] = None, convention: str = 'start', kind: Union[str, NoneType] = None, loffset=None, base: int = 0, on=None, level=None) method of pandas.core.frame.DataFrame instance
    Resample time-series data.

    Convenience method for frequency conversion and resampling of time
    series. Object must have a datetime-like index (`DatetimeIndex`,
    `PeriodIndex`, or `TimedeltaIndex`), or pass datetime-like values
    to the `on` or `level` keyword.

    Parameters
    ----------
    rule : DateOffset, Timedelta or str
        The offset string or object representing target conversion.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Which axis to use for up- or down-sampling. For `Series` this
        will default to 0, i.e. along the rows. Must be
        `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`.
    closed : {'right', 'left'}, default None
        Which side of bin interval is closed. The default is 'left'
        for all frequency offsets except for 'M', 'A', 'Q', 'BM',
        'BA', 'BQ', and 'W' which all have a default of 'right'.
    label : {'right', 'left'}, default None
        Which bin edge label to label bucket with. The default is 'left'
        for all frequency offsets except for 'M', 'A', 'Q', 'BM',
        'BA', 'BQ', and 'W' which all have a default of 'right'.
    convention : {'start', 'end', 's', 'e'}, default 'start'
        For `PeriodIndex` only, controls whether to use the start or
        end of `rule`.
    kind : {'timestamp', 'period'}, optional, default None
        Pass 'timestamp' to convert the resulting index to a
        `DateTimeIndex` or 'period' to convert it to a `PeriodIndex`.
        By default the input representation is retained.
    loffset : timedelta, default None
        Adjust the resampled time labels.
    base : int, default 0
        For frequencies that evenly subdivide 1 day, the "origin" of the
        aggregated intervals. For example, for '5min' frequency, base could
        range from 0 through 4. Defaults to 0.
    on : str, optional
        For a DataFrame, column to use instead of index for resampling.
        Column must be datetime-like.
    level : str or int, optional
        For a MultiIndex, level (name or number) to use for
        resampling. `level` must be datetime-like.

    Returns
    -------
    Resampler object

    See Also
    --------
    groupby : Group by mapping, function, label, or list of labels.
    Series.resample : Resample a Series.
    DataFrame.resample: Resample a DataFrame.

    Notes
    -----
    See the `user guide
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling>`_
    for more.

    To learn more about the offset strings, please see `this link
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects>`__.

    Examples
    --------
    Start by creating a series with 9 one minute timestamps.

    >>> index = pd.date_range('1/1/2000', periods=9, freq='T')
    >>> series = pd.Series(range(9), index=index)
    >>> series
    2000-01-01 00:00:00    0
    2000-01-01 00:01:00    1
    2000-01-01 00:02:00    2
    2000-01-01 00:03:00    3
    2000-01-01 00:04:00    4
    2000-01-01 00:05:00    5
    2000-01-01 00:06:00    6
    2000-01-01 00:07:00    7
    2000-01-01 00:08:00    8
    Freq: T, dtype: int64

    Downsample the series into 3 minute bins and sum the values
    of the timestamps falling into a bin.

    >>> series.resample('3T').sum()
    2000-01-01 00:00:00     3
    2000-01-01 00:03:00    12
    2000-01-01 00:06:00    21
    Freq: 3T, dtype: int64

    Downsample the series into 3 minute bins as above, but label each
    bin using the right edge instead of the left. Please note that the
    value in the bucket used as the label is not included in the bucket,
    which it labels. For example, in the original series the
    bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
    value in the resampled bucket with the label ``2000-01-01 00:03:00``
    does not include 3 (if it did, the summed value would be 6, not 3).
    To include this value close the right side of the bin interval as
    illustrated in the example below this one.

    >>> series.resample('3T', label='right').sum()
    2000-01-01 00:03:00     3
    2000-01-01 00:06:00    12
    2000-01-01 00:09:00    21
    Freq: 3T, dtype: int64

    Downsample the series into 3 minute bins as above, but close the right
    side of the bin interval.

    >>> series.resample('3T', label='right', closed='right').sum()
    2000-01-01 00:00:00     0
    2000-01-01 00:03:00     6
    2000-01-01 00:06:00    15
    2000-01-01 00:09:00    15
    Freq: 3T, dtype: int64

    Upsample the series into 30 second bins.

    >>> series.resample('30S').asfreq()[0:5]   # Select first 5 rows
    2000-01-01 00:00:00   0.0
    2000-01-01 00:00:30   NaN
    2000-01-01 00:01:00   1.0
    2000-01-01 00:01:30   NaN
    2000-01-01 00:02:00   2.0
    Freq: 30S, dtype: float64

    Upsample the series into 30 second bins and fill the ``NaN``
    values using the ``pad`` method.

    >>> series.resample('30S').pad()[0:5]
    2000-01-01 00:00:00    0
    2000-01-01 00:00:30    0
    2000-01-01 00:01:00    1
    2000-01-01 00:01:30    1
    2000-01-01 00:02:00    2
    Freq: 30S, dtype: int64

    Upsample the series into 30 second bins and fill the
    ``NaN`` values using the ``bfill`` method.

    >>> series.resample('30S').bfill()[0:5]
    2000-01-01 00:00:00    0
    2000-01-01 00:00:30    1
    2000-01-01 00:01:00    1
    2000-01-01 00:01:30    2
    2000-01-01 00:02:00    2
    Freq: 30S, dtype: int64

    Pass a custom function via ``apply``

    >>> def custom_resampler(array_like):
    ...     return np.sum(array_like) + 5
    ...
    >>> series.resample('3T').apply(custom_resampler)
    2000-01-01 00:00:00     8
    2000-01-01 00:03:00    17
    2000-01-01 00:06:00    26
    Freq: 3T, dtype: int64

    For a Series with a PeriodIndex, the keyword `convention` can be
    used to control whether to use the start or end of `rule`.

    Resample a year by quarter using 'start' `convention`. Values are
    assigned to the first quarter of the period.

    >>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
    ...                                             freq='A',
    ...                                             periods=2))
    >>> s
    2012    1
    2013    2
    Freq: A-DEC, dtype: int64
    >>> s.resample('Q', convention='start').asfreq()
    2012Q1    1.0
    2012Q2    NaN
    2012Q3    NaN
    2012Q4    NaN
    2013Q1    2.0
    2013Q2    NaN
    2013Q3    NaN
    2013Q4    NaN
    Freq: Q-DEC, dtype: float64

    Resample quarters by month using 'end' `convention`. Values are
    assigned to the last month of the period.

    >>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
    ...                                                   freq='Q',
    ...                                                   periods=4))
    >>> q
    2018Q1    1
    2018Q2    2
    2018Q3    3
    2018Q4    4
    Freq: Q-DEC, dtype: int64
    >>> q.resample('M', convention='end').asfreq()
    2018-03    1.0
    2018-04    NaN
    2018-05    NaN
    2018-06    2.0
    2018-07    NaN
    2018-08    NaN
    2018-09    3.0
    2018-10    NaN
    2018-11    NaN
    2018-12    4.0
    Freq: M, dtype: float64

    For DataFrame objects, the keyword `on` can be used to specify the
    column instead of the index for resampling.

    >>> d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
    ...           'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
    >>> df = pd.DataFrame(d)
    >>> df['week_starting'] = pd.date_range('01/01/2018',
    ...                                     periods=8,
    ...                                     freq='W')
    >>> df
       price  volume week_starting
    0     10      50    2018-01-07
    1     11      60    2018-01-14
    2      9      40    2018-01-21
    3     13     100    2018-01-28
    4     14      50    2018-02-04
    5     18     100    2018-02-11
    6     17      40    2018-02-18
    7     19      50    2018-02-25
    >>> df.resample('M', on='week_starting').mean()
                   price  volume
    week_starting
    2018-01-31     10.75    62.5
    2018-02-28     17.00    60.0

    For a DataFrame with MultiIndex, the keyword `level` can be used to
    specify on which level the resampling needs to take place.

    >>> days = pd.date_range('1/1/2000', periods=4, freq='D')
    >>> d2 = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
    ...            'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
    >>> df2 = pd.DataFrame(d2,
    ...                    index=pd.MultiIndex.from_product([days,
    ...                                                     ['morning',
    ...                                                      'afternoon']]
    ...                                                     ))
    >>> df2
                          price  volume
    2000-01-01 morning       10      50
               afternoon     11      60
    2000-01-02 morning        9      40
               afternoon     13     100
    2000-01-03 morning       14      50
               afternoon     18     100
    2000-01-04 morning       17      40
               afternoon     19      50
    >>> df2.resample('D', level=0).sum()
                price  volume
    2000-01-01     21     110
    2000-01-02     22     140
    2000-01-03     32     150
    2000-01-04     36      90
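One way to read the `closed` and `label` examples in the docstring: for a DatetimeIndex whose bins line up with midnight, the default left-closed, left-labelled bins behave like flooring each timestamp to the rule, while `closed='right', label='right'` behaves like rounding each timestamp up. The sketch below checks that reading on the docstring's own nine-point series. It is only an illustration, not how pandas implements resampling; the names `left_manual` and `right_manual` are made up for this example, and the alias '3min' is used in place of '3T' because newer pandas releases deprecate the single-letter form.

```python
import pandas as pd

# Same data as the docstring example: 9 one-minute timestamps holding 0..8.
index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Default bins: closed on the left, labelled with the left edge.
left = series.resample('3min').sum()
# Hand-rolled equivalent (illustrative): floor every timestamp to 3 minutes.
left_manual = series.groupby(series.index.floor('3min')).sum()

# Right-closed bins labelled with the right edge.
right = series.resample('3min', closed='right', label='right').sum()
# Hand-rolled equivalent (illustrative): round every timestamp up to 3 minutes.
right_manual = series.groupby(series.index.ceil('3min')).sum()

print(left.tolist(), left_manual.tolist())
print(right.tolist(), right_manual.tolist())
```

The equivalence holds here because every bin contains at least one point; `resample` would also emit empty bins, which a plain `groupby` would not.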

Example

import pandas as pd
from datetime import datetime
pd.set_option('display.max_rows', None)
data = {'weight': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'height': [1.1, 1.2, 1.3, 1.4, 1.5, 1., 1.6, 1.5, 1.9]}
df = pd.DataFrame(data, index=pd.date_range(start='2020-7-26', periods=9, freq='6T'))
df = df.to_period(freq='T')
df
weight height
2020-07-26 00:00 1 1.1
2020-07-26 00:06 2 1.2
2020-07-26 00:12 3 1.3
2020-07-26 00:18 4 1.4
2020-07-26 00:24 5 1.5
2020-07-26 00:30 6 1.0
2020-07-26 00:36 7 1.6
2020-07-26 00:42 8 1.5
2020-07-26 00:48 9 1.9
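Aggregation functions include: sum, median, std, max, min, ohlc, mean
Fill functions include: pad, bfill, ffill. Each of them accepts a limit parameter; run help(df.pad) to read the documentation.
df.resample('3T').sum()  # one bin every 3 minutes; bins with no data sum to 0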
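As a rough sketch of how the aggregation functions listed above are applied, the block below downsamples a series shaped like the `weight` column into 12-minute bins. It assumes a DatetimeIndex rather than the PeriodIndex used in this example, and the names `bins`, `summary`, and `candles` are illustrative.

```python
import pandas as pd

# Nine readings taken every 6 minutes, like the weight column in this example.
idx = pd.date_range('2020-07-26', periods=9, freq='6min')
weight = pd.Series(range(1, 10), index=idx, name='weight')

# A Resampler object can be reused for several aggregations.
bins = weight.resample('12min')

# Several aggregations at once: one output column per function name.
summary = bins.agg(['sum', 'max', 'mean'])

# ohlc gives the first, highest, lowest, and last value of each bin.
candles = bins.ohlc()

print(summary)
print(candles)
```

Each 12-minute bin holds two readings, except the last one, which holds only the final value.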
weight height
2020-07-26 00:00 1 1.1
2020-07-26 00:03 0 0.0
2020-07-26 00:06 2 1.2
2020-07-26 00:09 0 0.0
2020-07-26 00:12 3 1.3
2020-07-26 00:15 0 0.0
2020-07-26 00:18 4 1.4
2020-07-26 00:21 0 0.0
2020-07-26 00:24 5 1.5
2020-07-26 00:27 0 0.0
2020-07-26 00:30 6 1.0
2020-07-26 00:33 0 0.0
2020-07-26 00:36 7 1.6
2020-07-26 00:39 0 0.0
2020-07-26 00:42 8 1.5
2020-07-26 00:45 0 0.0
2020-07-26 00:48 9 1.9
df.resample('3T', label='right', closed='right').max()
weight height
2020-07-26 00:00 1.0 1.1
2020-07-26 00:03 NaN NaN
2020-07-26 00:06 2.0 1.2
2020-07-26 00:09 NaN NaN
2020-07-26 00:12 3.0 1.3
2020-07-26 00:15 NaN NaN
2020-07-26 00:18 4.0 1.4
2020-07-26 00:21 NaN NaN
2020-07-26 00:24 5.0 1.5
2020-07-26 00:27 NaN NaN
2020-07-26 00:30 6.0 1.0
2020-07-26 00:33 NaN NaN
2020-07-26 00:36 7.0 1.6
2020-07-26 00:39 NaN NaN
2020-07-26 00:42 8.0 1.5
2020-07-26 00:45 NaN NaN
2020-07-26 00:48 9.0 1.9
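The two outputs above treat empty bins differently: `sum` reports 0 for a bin with no rows, while `max` reports NaN. The sketch below reproduces this on a smaller series and shows the `min_count` argument of `sum`, which turns empty bins into NaN as well. It assumes a DatetimeIndex instead of the PeriodIndex used in this example, and the variable names are illustrative.

```python
import pandas as pd

# Three readings, 6 minutes apart; 3-minute bins leave every other bin empty.
idx = pd.date_range('2020-07-26', periods=3, freq='6min')
s = pd.Series([1, 2, 3], index=idx)
r = s.resample('3min')

total = r.sum()                 # empty bins become 0
total_nan = r.sum(min_count=1)  # bins with fewer than 1 value become NaN
peak = r.max()                  # empty bins become NaN

print(total.tolist())
print(total_nan.tolist())
print(peak.tolist())
```

Using `min_count=1` is worth considering when a zero would be mistaken for a real measurement.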
df[['weight']].resample('T').pad()  # pad fills forward: each gap takes the previous known value (same as ffill)
weight
2020-07-26 00:00 1
2020-07-26 00:01 1
2020-07-26 00:02 1
2020-07-26 00:03 1
2020-07-26 00:04 1
2020-07-26 00:05 1
2020-07-26 00:06 2
2020-07-26 00:07 2
2020-07-26 00:08 2
2020-07-26 00:09 2
2020-07-26 00:10 2
2020-07-26 00:11 2
2020-07-26 00:12 3
2020-07-26 00:13 3
2020-07-26 00:14 3
2020-07-26 00:15 3
2020-07-26 00:16 3
2020-07-26 00:17 3
2020-07-26 00:18 4
2020-07-26 00:19 4
2020-07-26 00:20 4
2020-07-26 00:21 4
2020-07-26 00:22 4
2020-07-26 00:23 4
2020-07-26 00:24 5
2020-07-26 00:25 5
2020-07-26 00:26 5
2020-07-26 00:27 5
2020-07-26 00:28 5
2020-07-26 00:29 5
2020-07-26 00:30 6
2020-07-26 00:31 6
2020-07-26 00:32 6
2020-07-26 00:33 6
2020-07-26 00:34 6
2020-07-26 00:35 6
2020-07-26 00:36 7
2020-07-26 00:37 7
2020-07-26 00:38 7
2020-07-26 00:39 7
2020-07-26 00:40 7
2020-07-26 00:41 7
2020-07-26 00:42 8
2020-07-26 00:43 8
2020-07-26 00:44 8
2020-07-26 00:45 8
2020-07-26 00:46 8
2020-07-26 00:47 8
2020-07-26 00:48 9
df.resample('T').bfill()  # backward fill: each gap takes the next known value
weight height
2020-07-26 00:00 1 1.1
2020-07-26 00:01 2 1.2
2020-07-26 00:02 2 1.2
2020-07-26 00:03 2 1.2
2020-07-26 00:04 2 1.2
2020-07-26 00:05 2 1.2
2020-07-26 00:06 2 1.2
2020-07-26 00:07 3 1.3
2020-07-26 00:08 3 1.3
2020-07-26 00:09 3 1.3
2020-07-26 00:10 3 1.3
2020-07-26 00:11 3 1.3
2020-07-26 00:12 3 1.3
2020-07-26 00:13 4 1.4
2020-07-26 00:14 4 1.4
2020-07-26 00:15 4 1.4
2020-07-26 00:16 4 1.4
2020-07-26 00:17 4 1.4
2020-07-26 00:18 4 1.4
2020-07-26 00:19 5 1.5
2020-07-26 00:20 5 1.5
2020-07-26 00:21 5 1.5
2020-07-26 00:22 5 1.5
2020-07-26 00:23 5 1.5
2020-07-26 00:24 5 1.5
2020-07-26 00:25 6 1.0
2020-07-26 00:26 6 1.0
2020-07-26 00:27 6 1.0
2020-07-26 00:28 6 1.0
2020-07-26 00:29 6 1.0
2020-07-26 00:30 6 1.0
2020-07-26 00:31 7 1.6
2020-07-26 00:32 7 1.6
2020-07-26 00:33 7 1.6
2020-07-26 00:34 7 1.6
2020-07-26 00:35 7 1.6
2020-07-26 00:36 7 1.6
2020-07-26 00:37 8 1.5
2020-07-26 00:38 8 1.5
2020-07-26 00:39 8 1.5
2020-07-26 00:40 8 1.5
2020-07-26 00:41 8 1.5
2020-07-26 00:42 8 1.5
2020-07-26 00:43 9 1.9
2020-07-26 00:44 9 1.9
2020-07-26 00:45 9 1.9
2020-07-26 00:46 9 1.9
2020-07-26 00:47 9 1.9
2020-07-26 00:48 9 1.9
df.resample('T').ffill()  # forward fill; passing limit=2 would fill at most 2 rows after each known value
weight height
2020-07-26 00:00 1 1.1
2020-07-26 00:01 1 1.1
2020-07-26 00:02 1 1.1
2020-07-26 00:03 1 1.1
2020-07-26 00:04 1 1.1
2020-07-26 00:05 1 1.1
2020-07-26 00:06 2 1.2
2020-07-26 00:07 2 1.2
2020-07-26 00:08 2 1.2
2020-07-26 00:09 2 1.2
2020-07-26 00:10 2 1.2
2020-07-26 00:11 2 1.2
2020-07-26 00:12 3 1.3
2020-07-26 00:13 3 1.3
2020-07-26 00:14 3 1.3
2020-07-26 00:15 3 1.3
2020-07-26 00:16 3 1.3
2020-07-26 00:17 3 1.3
2020-07-26 00:18 4 1.4
2020-07-26 00:19 4 1.4
2020-07-26 00:20 4 1.4
2020-07-26 00:21 4 1.4
2020-07-26 00:22 4 1.4
2020-07-26 00:23 4 1.4
2020-07-26 00:24 5 1.5
2020-07-26 00:25 5 1.5
2020-07-26 00:26 5 1.5
2020-07-26 00:27 5 1.5
2020-07-26 00:28 5 1.5
2020-07-26 00:29 5 1.5
2020-07-26 00:30 6 1.0
2020-07-26 00:31 6 1.0
2020-07-26 00:32 6 1.0
2020-07-26 00:33 6 1.0
2020-07-26 00:34 6 1.0
2020-07-26 00:35 6 1.0
2020-07-26 00:36 7 1.6
2020-07-26 00:37 7 1.6
2020-07-26 00:38 7 1.6
2020-07-26 00:39 7 1.6
2020-07-26 00:40 7 1.6
2020-07-26 00:41 7 1.6
2020-07-26 00:42 8 1.5
2020-07-26 00:43 8 1.5
2020-07-26 00:44 8 1.5
2020-07-26 00:45 8 1.5
2020-07-26 00:46 8 1.5
2020-07-26 00:47 8 1.5
2020-07-26 00:48 9 1.9
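The `limit` parameter mentioned above is easiest to see on a tiny series. The following sketch upsamples two readings 6 minutes apart to 1-minute steps and caps the fill at two rows. It assumes a DatetimeIndex rather than the PeriodIndex used in this example, and the names `forward` and `backward` are illustrative.

```python
import pandas as pd

# Two readings 6 minutes apart leave five empty 1-minute slots between them.
idx = pd.date_range('2020-07-26', periods=2, freq='6min')
s = pd.Series([1, 2], index=idx)

# Forward fill, but only the 2 slots right after a known value.
forward = s.resample('1min').ffill(limit=2)

# Backward fill, but only the 2 slots right before a known value.
backward = s.resample('1min').bfill(limit=2)

print(forward.tolist())
print(backward.tolist())
```

Slots beyond the limit stay NaN, so the result becomes a float column even though the input held integers.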
df['weight'].resample('Q-DEC').mean()
2020Q3    5
Freq: Q-DEC, Name: weight, dtype: int64
