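Time resampling with the pandas resample function

The examples in the resample docstring are a good way to understand how it works.

The docstring covers both upsampling and downsampling in detail.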

help(df.resample)

Help on method resample in module pandas.core.generic:

resample(rule, axis=0, closed: Union[str, NoneType] = None, label: Union[str, NoneType] = None, convention: str = 'start', kind: Union[str, NoneType] = None, loffset=None, base: int = 0, on=None, level=None) method of pandas.core.frame.DataFrame instance
    Resample time-series data.

    Convenience method for frequency conversion and resampling of time
    series. Object must have a datetime-like index (`DatetimeIndex`,
    `PeriodIndex`, or `TimedeltaIndex`), or pass datetime-like values
    to the `on` or `level` keyword.

    Parameters
    ----------
    rule : DateOffset, Timedelta or str
        The offset string or object representing target conversion.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Which axis to use for up- or down-sampling. For `Series` this
        will default to 0, i.e. along the rows. Must be
        `DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`.
    closed : {'right', 'left'}, default None
        Which side of bin interval is closed. The default is 'left'
        for all frequency offsets except for 'M', 'A', 'Q', 'BM',
        'BA', 'BQ', and 'W' which all have a default of 'right'.
    label : {'right', 'left'}, default None
        Which bin edge label to label bucket with. The default is 'left'
        for all frequency offsets except for 'M', 'A', 'Q', 'BM',
        'BA', 'BQ', and 'W' which all have a default of 'right'.
    convention : {'start', 'end', 's', 'e'}, default 'start'
        For `PeriodIndex` only, controls whether to use the start or
        end of `rule`.
    kind : {'timestamp', 'period'}, optional, default None
        Pass 'timestamp' to convert the resulting index to a
        `DateTimeIndex` or 'period' to convert it to a `PeriodIndex`.
        By default the input representation is retained.
    loffset : timedelta, default None
        Adjust the resampled time labels.
    base : int, default 0
        For frequencies that evenly subdivide 1 day, the "origin" of the
        aggregated intervals. For example, for '5min' frequency, base could
        range from 0 through 4. Defaults to 0.
    on : str, optional
        For a DataFrame, column to use instead of index for resampling.
        Column must be datetime-like.
    level : str or int, optional
        For a MultiIndex, level (name or number) to use for
        resampling. `level` must be datetime-like.

    Returns
    -------
    Resampler object

    See Also
    --------
    groupby : Group by mapping, function, label, or list of labels.
    Series.resample : Resample a Series.
    DataFrame.resample: Resample a DataFrame.

    Notes
    -----
    See the `user guide
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling>`_
    for more.

    To learn more about the offset strings, please see `this link
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects>`__.

    Examples
    --------
    Start by creating a series with 9 one minute timestamps.

    >>> index = pd.date_range('1/1/2000', periods=9, freq='T')
    >>> series = pd.Series(range(9), index=index)
    >>> series
    2000-01-01 00:00:00    0
    2000-01-01 00:01:00    1
    2000-01-01 00:02:00    2
    2000-01-01 00:03:00    3
    2000-01-01 00:04:00    4
    2000-01-01 00:05:00    5
    2000-01-01 00:06:00    6
    2000-01-01 00:07:00    7
    2000-01-01 00:08:00    8
    Freq: T, dtype: int64

    Downsample the series into 3 minute bins and sum the values
    of the timestamps falling into a bin.

    >>> series.resample('3T').sum()
    2000-01-01 00:00:00     3
    2000-01-01 00:03:00    12
    2000-01-01 00:06:00    21
    Freq: 3T, dtype: int64

    Downsample the series into 3 minute bins as above, but label each
    bin using the right edge instead of the left. Please note that the
    value in the bucket used as the label is not included in the bucket,
    which it labels. For example, in the original series the
    bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
    value in the resampled bucket with the label ``2000-01-01 00:03:00``
    does not include 3 (if it did, the summed value would be 6, not 3).
    To include this value close the right side of the bin interval as
    illustrated in the example below this one.

    >>> series.resample('3T', label='right').sum()
    2000-01-01 00:03:00     3
    2000-01-01 00:06:00    12
    2000-01-01 00:09:00    21
    Freq: 3T, dtype: int64

    Downsample the series into 3 minute bins as above, but close the right
    side of the bin interval.

    >>> series.resample('3T', label='right', closed='right').sum()
    2000-01-01 00:00:00     0
    2000-01-01 00:03:00     6
    2000-01-01 00:06:00    15
    2000-01-01 00:09:00    15
    Freq: 3T, dtype: int64

    Upsample the series into 30 second bins.

    >>> series.resample('30S').asfreq()[0:5]   # Select first 5 rows
    2000-01-01 00:00:00   0.0
    2000-01-01 00:00:30   NaN
    2000-01-01 00:01:00   1.0
    2000-01-01 00:01:30   NaN
    2000-01-01 00:02:00   2.0
    Freq: 30S, dtype: float64

    Upsample the series into 30 second bins and fill the ``NaN``
    values using the ``pad`` method.

    >>> series.resample('30S').pad()[0:5]
    2000-01-01 00:00:00    0
    2000-01-01 00:00:30    0
    2000-01-01 00:01:00    1
    2000-01-01 00:01:30    1
    2000-01-01 00:02:00    2
    Freq: 30S, dtype: int64

    Upsample the series into 30 second bins and fill the
    ``NaN`` values using the ``bfill`` method.

    >>> series.resample('30S').bfill()[0:5]
    2000-01-01 00:00:00    0
    2000-01-01 00:00:30    1
    2000-01-01 00:01:00    1
    2000-01-01 00:01:30    2
    2000-01-01 00:02:00    2
    Freq: 30S, dtype: int64

    Pass a custom function via ``apply``

    >>> def custom_resampler(array_like):
    ...     return np.sum(array_like) + 5
    ...
    >>> series.resample('3T').apply(custom_resampler)
    2000-01-01 00:00:00     8
    2000-01-01 00:03:00    17
    2000-01-01 00:06:00    26
    Freq: 3T, dtype: int64

    For a Series with a PeriodIndex, the keyword `convention` can be
    used to control whether to use the start or end of `rule`.

    Resample a year by quarter using 'start' `convention`. Values are
    assigned to the first quarter of the period.

    >>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
    ...                                             freq='A',
    ...                                             periods=2))
    >>> s
    2012    1
    2013    2
    Freq: A-DEC, dtype: int64
    >>> s.resample('Q', convention='start').asfreq()
    2012Q1    1.0
    2012Q2    NaN
    2012Q3    NaN
    2012Q4    NaN
    2013Q1    2.0
    2013Q2    NaN
    2013Q3    NaN
    2013Q4    NaN
    Freq: Q-DEC, dtype: float64

    Resample quarters by month using 'end' `convention`. Values are
    assigned to the last month of the period.

    >>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
    ...                                                   freq='Q',
    ...                                                   periods=4))
    >>> q
    2018Q1    1
    2018Q2    2
    2018Q3    3
    2018Q4    4
    Freq: Q-DEC, dtype: int64
    >>> q.resample('M', convention='end').asfreq()
    2018-03    1.0
    2018-04    NaN
    2018-05    NaN
    2018-06    2.0
    2018-07    NaN
    2018-08    NaN
    2018-09    3.0
    2018-10    NaN
    2018-11    NaN
    2018-12    4.0
    Freq: M, dtype: float64

    For DataFrame objects, the keyword `on` can be used to specify the
    column instead of the index for resampling.

    >>> d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
    ...           'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
    >>> df = pd.DataFrame(d)
    >>> df['week_starting'] = pd.date_range('01/01/2018',
    ...                                     periods=8,
    ...                                     freq='W')
    >>> df
       price  volume week_starting
    0     10      50    2018-01-07
    1     11      60    2018-01-14
    2      9      40    2018-01-21
    3     13     100    2018-01-28
    4     14      50    2018-02-04
    5     18     100    2018-02-11
    6     17      40    2018-02-18
    7     19      50    2018-02-25
    >>> df.resample('M', on='week_starting').mean()
                   price  volume
    week_starting
    2018-01-31     10.75    62.5
    2018-02-28     17.00    60.0

    For a DataFrame with MultiIndex, the keyword `level` can be used to
    specify on which level the resampling needs to take place.

    >>> days = pd.date_range('1/1/2000', periods=4, freq='D')
    >>> d2 = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
    ...            'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
    >>> df2 = pd.DataFrame(d2,
    ...                    index=pd.MultiIndex.from_product([days,
    ...                                                     ['morning',
    ...                                                      'afternoon']]
    ...                                                     ))
    >>> df2
                          price  volume
    2000-01-01 morning       10      50
               afternoon     11      60
    2000-01-02 morning        9      40
               afternoon     13     100
    2000-01-03 morning       14      50
               afternoon     18     100
    2000-01-04 morning       17      40
               afternoon     19      50
    >>> df2.resample('D', level=0).sum()
                price  volume
    2000-01-01     21     110
    2000-01-02     22     140
    2000-01-03     32     150
    2000-01-04     36      90

Example

import pandas as pd
from datetime import datetime
pd.set_option('display.max_rows', None)
data = {
    'weight': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'height': [1.1, 1.2, 1.3, 1.4, 1.5, 1., 1.6, 1.5, 1.9]
}
df = pd.DataFrame(data, index=pd.date_range(start='2020-7-26', periods=9, freq='6T'))
df = df.to_period(freq='T')
df

	weight	height
2020-07-26 00:00	1	1.1
2020-07-26 00:06	2	1.2
2020-07-26 00:12	3	1.3
2020-07-26 00:18	4	1.4
2020-07-26 00:24	5	1.5
2020-07-26 00:30	6	1.0
2020-07-26 00:36	7	1.6
2020-07-26 00:42	8	1.5
2020-07-26 00:48	9	1.9

Aggregation functions include: sum, median, std, max, min, ohlc, mean
Fill functions include: pad, bfill, ffill. Each of them takes a limit parameter; use help(df.pad) to read the documentation.
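
To make the aggregation side concrete, here is a minimal sketch. It assumes a plain DatetimeIndex instead of the PeriodIndex used in this post, and it uses the 'min' alias rather than 'T', since 'T' is deprecated in newer pandas releases. The names idx, weight, total and bars are illustrative only.

```python
import pandas as pd

# Illustrative data shaped like the example above: nine rows, six minutes apart.
idx = pd.date_range(start="2020-07-26", periods=9, freq="6min")
weight = pd.Series(range(1, 10), index=idx, name="weight")

# Downsample to 12-minute bins; each bin holds at most two observations.
total = weight.resample("12min").sum()   # one value per bin
bars = weight.resample("12min").ohlc()   # open/high/low/close columns per bin

print(total)
print(bars)
```

Any of the other aggregations listed above (median, std, max, min, mean) can be swapped in for sum in the same position.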

df.resample('3T').sum()  # resample into 3-minute bins; the data is 6 minutes apart, so every other bin is empty and sums to 0

	weight	height
2020-07-26 00:00	1	1.1
2020-07-26 00:03	0	0.0
2020-07-26 00:06	2	1.2
2020-07-26 00:09	0	0.0
2020-07-26 00:12	3	1.3
2020-07-26 00:15	0	0.0
2020-07-26 00:18	4	1.4
2020-07-26 00:21	0	0.0
2020-07-26 00:24	5	1.5
2020-07-26 00:27	0	0.0
2020-07-26 00:30	6	1.0
2020-07-26 00:33	0	0.0
2020-07-26 00:36	7	1.6
2020-07-26 00:39	0	0.0
2020-07-26 00:42	8	1.5
2020-07-26 00:45	0	0.0
2020-07-26 00:48	9	1.9
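
The zeros in the table above come from empty bins: the sum of no rows is 0. If missing values are preferred for those bins, one option is the min_count argument of sum. This is a sketch under the same assumptions as before (DatetimeIndex, 'min' alias, illustrative names), not the approach taken in the original example.

```python
import pandas as pd

# Three observations, six minutes apart.
idx = pd.date_range(start="2020-07-26", periods=3, freq="6min")
weight = pd.Series([1, 2, 3], index=idx, name="weight")

# Default behaviour: an empty 3-minute bin sums to 0.
zeros = weight.resample("3min").sum()

# min_count=1 requires at least one value per bin, otherwise the result is NaN.
gaps = weight.resample("3min").sum(min_count=1)

print(zeros)
print(gaps)
```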

df.resample('3T', label='right', closed='right').max()

	weight	height
2020-07-26 00:00	1.0	1.1
2020-07-26 00:03	NaN	NaN
2020-07-26 00:06	2.0	1.2
2020-07-26 00:09	NaN	NaN
2020-07-26 00:12	3.0	1.3
2020-07-26 00:15	NaN	NaN
2020-07-26 00:18	4.0	1.4
2020-07-26 00:21	NaN	NaN
2020-07-26 00:24	5.0	1.5
2020-07-26 00:27	NaN	NaN
2020-07-26 00:30	6.0	1.0
2020-07-26 00:33	NaN	NaN
2020-07-26 00:36	7.0	1.6
2020-07-26 00:39	NaN	NaN
2020-07-26 00:42	8.0	1.5
2020-07-26 00:45	NaN	NaN
2020-07-26 00:48	9.0	1.9
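
With 3-minute bins on 6-minute data, closed and label make little visible difference because no bin ever holds two rows. The effect is easier to see with wider bins. The following sketch again assumes a DatetimeIndex and illustrative names; it compares the two settings on 12-minute bins.

```python
import pandas as pd

idx = pd.date_range(start="2020-07-26", periods=9, freq="6min")
weight = pd.Series(range(1, 10), index=idx, name="weight")

# Bins are [start, end): 00:00 and 00:06 fall in the bin labelled 00:00.
left = weight.resample("12min", closed="left", label="left").max()

# Bins are (start, end]: 00:06 and 00:12 fall in the bin labelled 00:12,
# and 00:00 sits alone at the right edge of the first bin.
right = weight.resample("12min", closed="right", label="right").max()

print(left)
print(right)
```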

df[['weight']].resample('T').pad()  # pad fills forward: each gap takes the previous observed value

	weight
2020-07-26 00:00	1
2020-07-26 00:01	1
2020-07-26 00:02	1
2020-07-26 00:03	1
2020-07-26 00:04	1
2020-07-26 00:05	1
2020-07-26 00:06	2
2020-07-26 00:07	2
2020-07-26 00:08	2
2020-07-26 00:09	2
2020-07-26 00:10	2
2020-07-26 00:11	2
2020-07-26 00:12	3
2020-07-26 00:13	3
2020-07-26 00:14	3
2020-07-26 00:15	3
2020-07-26 00:16	3
2020-07-26 00:17	3
2020-07-26 00:18	4
2020-07-26 00:19	4
2020-07-26 00:20	4
2020-07-26 00:21	4
2020-07-26 00:22	4
2020-07-26 00:23	4
2020-07-26 00:24	5
2020-07-26 00:25	5
2020-07-26 00:26	5
2020-07-26 00:27	5
2020-07-26 00:28	5
2020-07-26 00:29	5
2020-07-26 00:30	6
2020-07-26 00:31	6
2020-07-26 00:32	6
2020-07-26 00:33	6
2020-07-26 00:34	6
2020-07-26 00:35	6
2020-07-26 00:36	7
2020-07-26 00:37	7
2020-07-26 00:38	7
2020-07-26 00:39	7
2020-07-26 00:40	7
2020-07-26 00:41	7
2020-07-26 00:42	8
2020-07-26 00:43	8
2020-07-26 00:44	8
2020-07-26 00:45	8
2020-07-26 00:46	8
2020-07-26 00:47	8
2020-07-26 00:48	9

df.resample('T').bfill()  # backward fill: each gap takes the next observed value

	weight	height
2020-07-26 00:00	1	1.1
2020-07-26 00:01	2	1.2
2020-07-26 00:02	2	1.2
2020-07-26 00:03	2	1.2
2020-07-26 00:04	2	1.2
2020-07-26 00:05	2	1.2
2020-07-26 00:06	2	1.2
2020-07-26 00:07	3	1.3
2020-07-26 00:08	3	1.3
2020-07-26 00:09	3	1.3
2020-07-26 00:10	3	1.3
2020-07-26 00:11	3	1.3
2020-07-26 00:12	3	1.3
2020-07-26 00:13	4	1.4
2020-07-26 00:14	4	1.4
2020-07-26 00:15	4	1.4
2020-07-26 00:16	4	1.4
2020-07-26 00:17	4	1.4
2020-07-26 00:18	4	1.4
2020-07-26 00:19	5	1.5
2020-07-26 00:20	5	1.5
2020-07-26 00:21	5	1.5
2020-07-26 00:22	5	1.5
2020-07-26 00:23	5	1.5
2020-07-26 00:24	5	1.5
2020-07-26 00:25	6	1.0
2020-07-26 00:26	6	1.0
2020-07-26 00:27	6	1.0
2020-07-26 00:28	6	1.0
2020-07-26 00:29	6	1.0
2020-07-26 00:30	6	1.0
2020-07-26 00:31	7	1.6
2020-07-26 00:32	7	1.6
2020-07-26 00:33	7	1.6
2020-07-26 00:34	7	1.6
2020-07-26 00:35	7	1.6
2020-07-26 00:36	7	1.6
2020-07-26 00:37	8	1.5
2020-07-26 00:38	8	1.5
2020-07-26 00:39	8	1.5
2020-07-26 00:40	8	1.5
2020-07-26 00:41	8	1.5
2020-07-26 00:42	8	1.5
2020-07-26 00:43	9	1.9
2020-07-26 00:44	9	1.9
2020-07-26 00:45	9	1.9
2020-07-26 00:46	9	1.9
2020-07-26 00:47	9	1.9
2020-07-26 00:48	9	1.9

df.resample('T').ffill()  # forward fill; passing limit=2 fills at most 2 rows after each observation

	weight	height
2020-07-26 00:00	1	1.1
2020-07-26 00:01	1	1.1
2020-07-26 00:02	1	1.1
2020-07-26 00:03	1	1.1
2020-07-26 00:04	1	1.1
2020-07-26 00:05	1	1.1
2020-07-26 00:06	2	1.2
2020-07-26 00:07	2	1.2
2020-07-26 00:08	2	1.2
2020-07-26 00:09	2	1.2
2020-07-26 00:10	2	1.2
2020-07-26 00:11	2	1.2
2020-07-26 00:12	3	1.3
2020-07-26 00:13	3	1.3
2020-07-26 00:14	3	1.3
2020-07-26 00:15	3	1.3
2020-07-26 00:16	3	1.3
2020-07-26 00:17	3	1.3
2020-07-26 00:18	4	1.4
2020-07-26 00:19	4	1.4
2020-07-26 00:20	4	1.4
2020-07-26 00:21	4	1.4
2020-07-26 00:22	4	1.4
2020-07-26 00:23	4	1.4
2020-07-26 00:24	5	1.5
2020-07-26 00:25	5	1.5
2020-07-26 00:26	5	1.5
2020-07-26 00:27	5	1.5
2020-07-26 00:28	5	1.5
2020-07-26 00:29	5	1.5
2020-07-26 00:30	6	1.0
2020-07-26 00:31	6	1.0
2020-07-26 00:32	6	1.0
2020-07-26 00:33	6	1.0
2020-07-26 00:34	6	1.0
2020-07-26 00:35	6	1.0
2020-07-26 00:36	7	1.6
2020-07-26 00:37	7	1.6
2020-07-26 00:38	7	1.6
2020-07-26 00:39	7	1.6
2020-07-26 00:40	7	1.6
2020-07-26 00:41	7	1.6
2020-07-26 00:42	8	1.5
2020-07-26 00:43	8	1.5
2020-07-26 00:44	8	1.5
2020-07-26 00:45	8	1.5
2020-07-26 00:46	8	1.5
2020-07-26 00:47	8	1.5
2020-07-26 00:48	9	1.9
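
The limit parameter mentioned above is not exercised in the output. A small sketch of what it does, assuming a DatetimeIndex and illustrative names; it uses ffill rather than pad because, as far as I know, pad is no longer available on the resampler in recent pandas releases.

```python
import pandas as pd

# Three observations, six minutes apart.
idx = pd.date_range(start="2020-07-26", periods=3, freq="6min")
weight = pd.Series([1, 2, 3], index=idx, name="weight")

# Upsample to 1-minute steps; only the two rows after each observation
# are filled, the remaining three rows of every gap stay NaN.
filled = weight.resample("1min").ffill(limit=2)

print(filled)
```

Because some rows remain NaN, the result is float64 rather than int64.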

df['weight'].resample('Q-DEC').mean()

2020Q3    5
Freq: Q-DEC, Name: weight, dtype: int64
