【python】无规律时间步长时序数据转为固定步长

写在前面

日常可能会遇到时间步长无规律的数据，需要转化为固定时间步长，此时需要进行重采样或插值。

例子
在选择好时间间隔后，可以用pandas的resample来操作。

import pandas as pd
from pandas.tseries.offsets import Hour,Second
import numpy as np
import datetime
data = pd.read_csv('fuel.csv')
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d %H:%M:%S')
# frequency = 1
# time_range = pd.date_range(data['date'][0], data['date']
#                            [data.shape[0]-1]+frequency*Hour(), freq='%sH' % frequency)
data.set_index('date', inplace=True)  # 把时间列作为索引
#ticks = data.iloc[:]
bars = data.resample('h').mean().dropna().reset_index()  # 切分
print(bars)

参考：CSDN专家-HGJ 在问答“怎么统一多个时间序列时间间隔不同的问题”中的回答[2021-07-10 15:18] https://ask.csdn.net/questions/7471005

主要函数介绍
Pandas中的resample，重新采样，是对原样本重新处理的一个方法，是一个对常规时间序列数据重新采样和频率转换的便捷的方法。重新取样时间序列数据。对象必须具有类似datetime的索引(DatetimeIndex、PeriodIndex或TimedeltaIndex)，或将类似datetime的值传递给on或level关键字。

DataFrame.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)

参数详解：

参数	说明
rule	表示目标转换的偏移量字符串或对象
freq	表示重采样频率，例如‘M’、‘5min’，Second(15)
how=‘mean’	用于产生聚合值的函数名或数组函数，例如‘mean’、‘ohlc’、np.max等，默认是‘mean’，其他常用的值由：‘first’、‘last’、‘median’、‘max’、‘min’
axis=0	哪个轴用于向上采样或向下采样。对于序列，这将默认为0，即沿着行。必须是DatetimeIndex, TimedeltaIndex或PeriodIndex。默认是纵轴，横轴设置axis=1
fill_method = None	升采样时如何插值，比如‘ffill’、‘bfill’等
closed = ‘right’	在降采样时，各时间段的哪一段是闭合的，‘right’或‘left’，默认‘right’
label= ‘right’	在降采样时，如何设置聚合值的标签，例如，9：30-9：35会被标记成9：30还是9：35,默认9：35
convention = None	当重采样时期时，将低频率转换到高频率所采用的约定（start或end）。默认‘end’
kind = None	聚合到时期（‘period’）或时间戳（‘timestamp’），默认聚合到时间序列的索引类型
loffset = None	调整重新取样的时间标签
base	对于平均细分1天的频率，为累计间隔的“起源”。例如，对于“5min”频率，基数可以从0到4。默认值为0。
on	对于数据流，要使用列而不是索引进行重采样。列必须与日期时间类似。
level	对于多索引，用于重采样的级别(名称或数字)。级别必须与日期时间相似
origin	要调整分组的时间戳。起始时区必须与索引的时区匹配。如果没有使用时间戳，也支持以下值:epoch:原点是1970-01-01’；start ': origin是timeseries的第一个值；“start_day”:起源是timeseries午夜的第一天；
offset	加到原点的偏移时间增量

参考：[Pandas中resample方法详解——风雪云侠] https://blog.csdn.net/weixin_40426830/article/details/111512471

由于时间不均，在重采样的过程中可能会遇到重采样后某一时间步长的值为空，我们上面例子中dropna()为删掉NA值。Resample()提供了两种方法：‘ffill’和‘bfill’。

上采样和填充值
上采样是下采样的相反操作。它将时间序列数据重新采样到一个更小的时间框架。例如，从小时到分钟，从年到天。结果将增加行数，并且附加的行值默认为NaN。内置的方法ffill()和bfill()通常用于执行前向填充或后向填充来替代NaN。

df.resample('Q').ffill()

按季度重新采样一年并向后填充值。向后填充方法bfill()将使用下一个已知值来替换NaN。

df.resample('Q').bfill()

参考：[使用Pandas的resample函数处理时间序列数据的技巧——deephub] https://deephub.blog.csdn.net/article/details/109542622

另外一种可行方案是利用插值法填补NAN，例如：

bars = data.resample('h').mean().interpolate('linear').reset_index()  # 线性插值

函数介绍

Resampler.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', downcast=None, **kwargs)

Interpolate values according to different methods.

method	{‘linear’, ‘time’, ‘index’, ‘values’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘from_derivatives’, ‘pchip’, ‘akima’} ‘linear’: ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes. default ‘time’: interpolation works on daily and higher resolution data to interpolate given length of interval ‘index’, ‘values’: use the actual numerical values of the index ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’ is passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method=’polynomial’, order=4). These use the actual numerical values of the index. ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ are all wrappers around the scipy interpolation methods of similar names. These use the actual numerical values of the index. For more information on their behavior, see the scipy documentation and tutorial documentation ‘from_derivatives’ refers to BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18 New in version 0.18.1: Added support for the ‘akima’ method Added interpolate method ‘from_derivatives’ which replaces ‘piecewise_polynomial’ in scipy 0.18; backwards-compatible with scipy < 0.18
axis	{0, 1}, default 0 0: fill column-by-column 1: fill row-by-row
limit	int, default None. Maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction	{‘forward’, ‘backward’, ‘both’}, default ‘forward’ If limit is specified, consecutive NaNs will be filled in this direction. New in version 0.17.0.
inplace	bool, default False Update the NDFrame in place if possible.
downcast	optional, ‘infer’ or None, defaults to None Downcast dtypes if possible.
kwargs	keyword arguments to pass on to the interpolating function.

See also： reindex, replace, fillna

参考：
pandas.core.resample.Resampler.interpolate

【python】无规律时间步长时序数据转为固定步长相关推荐

时序分析 44 -- 时序数据转为空间数据 (三) 格拉姆角场 python 实践（上）
格拉姆角场 python实践时序预测问题是一个古老的问题了,在笔者关于时序分析的系列中已经介绍了多种时序预测分析技术和方法.本篇我们将使用一种新的思路来进行时序预测:对金融数据进行GAF(格拉姆角场 ...
时序分析 43 -- 时序数据转为空间数据 (二) 马尔可夫转换场
马尔可夫转换场(MRF,Markov Transition Fields) MRF 马尔可夫转换场(MRF, Markov Transition Fields)比GAF要简单一些,其数学模型对于从事数 ...
python mk趋势检验_时序数据常用趋势检测方法
背景在最近的项目中,需要自动检测某段时间内的某个指标是上升了还是下降了,因此需要研究下常用的时序数据趋势检测方法. 方法一斜率法原理斜率法的原理就是使用最小二乘等方法对时序数据进行拟合,然后根 ...
Python正则化匹配读取txt数据转为list列表
1. txt文本数据今有txt存放的文本数据格式为要求将其数据提取出来,形成坐标点形式 2. 实现代码 #!/usr/bin/python # -*-coding:utf-8 -*- __auth ...
python numpy.ndarray中的数据转为int型
首先了解内容与类型 >>>print(a)(array([[0.01124722],[0.21752586],[0.05586815],[0.03558792]]), array([ ...
Pandas时序数据Time Series
深入浅出Pandas读书笔记 C14 Pandas时序数据Time Series 14.1 固定时间 14.1.1 时间的表示时间戳Timestamp Pandas是在Python的datetime ...
cnn 一维时序数据_蚂蚁集团智能监控的时序异常检测：基于 CNN 神经网络的异常检测...
1 背景在蚂蚁集团智能监控领域,时序异常检测是极重要一环,异常检测落地中,业务方参考业界标准输出 Metrics 指标数据,监控不同业务.应用.接口.集群的各项指标,包含 Metrics 指标(总量. ...
python使用pandas通过聚合获取时序数据的最后一个指标数据（例如长度指标、时间指标)生成标签并与原表连接（join）进行不同标签特征的可视化分析
python使用pandas通过聚合获取时序数据的最后一个指标数据(例如长度指标.时间指标)生成标签并与原表连接(join)进行不同标签特征的可视化分析目录
python 特征工程_[译] 基于时序数据的特征工程 --- Python实现
基于时序数据的回归预测问题,在工作中经常遇到的.它与一般的监督学习的回归模型的区别在于数据本身是基于时序的.而常用的时序预测模型,比如arima等,添加其他特征时又不方便,不得不求助于经典的监督学习预 ...

【python】无规律时间步长时序数据转为固定步长

【python】无规律时间步长时序数据转为固定步长相关推荐

最新文章

热门文章