Time Series Analysis (1): The Moving Average Method

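Contents

1. Definition
2. Moving average, exponential smoothing, and seasonal models
- 1. Moving average method
- 2. Double moving average and trend moving average methods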

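1. Definition

A time series is a sequence of data points that are ordered in time, vary over time, and are correlated with one another. Time series analysis means observing and studying such a series, identifying the regularities in how it evolves, and forecasting its future behavior.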

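Time series can be classified in different ways, depending on the criterion used.

By the number of variables studied: univariate and multivariate time series.
By the continuity of time: discrete-time and continuous-time series.
By statistical properties: stationary and non-stationary time series. If the probability distribution of a series does not depend on time t, the series is called strictly (narrow-sense) stationary. If the first and second moments of the series exist and, for every time t, satisfy:
1. the mean is constant;
2. the covariance is a function of the time lag τ only,

then the series is called weakly stationary, also known as wide-sense stationary.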
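Written out as formulas, the two conditions look like this. The symbols μ and γ(τ) are notation introduced here for illustration and do not appear in the text above:

```latex
\begin{aligned}
E[y_t] &= \mu && \text{for all } t,\\
\operatorname{Cov}(y_t,\, y_{t+\tau}) &= \gamma(\tau) && \text{for all } t \text{ and every lag } \tau .
\end{aligned}
```

Setting τ = 0 in the second condition shows that the variance of a weakly stationary series is also constant over time.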

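For more detail, see the treatment of stochastic processes in probability theory, which covers closely related material.

By distribution: Gaussian and non-Gaussian time series.

This chapter deals mainly with the analysis of univariate time series.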

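2. Moving average, exponential smoothing, and seasonal models

1. Moving average method

The moving average method is a commonly used time series forecasting method; its simplicity gives it considerable practical value.

Let the observed series be $y_1, \cdots, y_T$ and choose the number of terms in the moving average $N < T$. The single (first-order) moving average is computed as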

$$M_t^{(1)}(N)=\frac{1}{N}(y_t+y_{t-1}+\cdots+y_{t-N+1})=\frac{1}{N}\sum_{i=0}^{N-1}y_{t-i}$$

It follows that:

$$M_t^{(1)}(N)=\frac{1}{N}(y_{t-1}+\cdots+y_{t-N})+\frac{1}{N}(y_t-y_{t-N})=M_{t-1}^{(1)}(N)+\frac{1}{N}(y_t-y_{t-N})$$
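The recursive form means each new average can be obtained from the previous one with a single addition and subtraction. The sketch below checks this against the direct definition; the function names and the sample series are made up for illustration and are not part of the original article.

```python
def moving_average_direct(y, n):
    """Return M_t for t = n..T (1-based), each computed from scratch."""
    return [sum(y[t - n:t]) / n for t in range(n, len(y) + 1)]


def moving_average_recursive(y, n):
    """Return the same values using M_t = M_{t-1} + (y_t - y_{t-N}) / N."""
    m = [sum(y[:n]) / n]  # the first average has to be computed directly
    for t in range(n, len(y)):  # 0-based index of the newest observation
        m.append(m[-1] + (y[t] - y[t - n]) / n)
    return m


# Hypothetical series, used only to exercise the two functions.
series = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
direct = moving_average_direct(series, 3)
recursive = moving_average_recursive(series, 3)
print(direct)
print(recursive)
```

Both lists should agree up to floating-point rounding. For long series the recursive version does a constant amount of work per step instead of re-summing N terms.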

The forecast for period t+1 is $\hat{y}_{t+1}=M_t^{(1)}(N)$.

Its forecast standard error is:

$$S=\sqrt{\frac{\sum_{t=N+1}^{T}(\hat{y}_t-y_t)^2}{T-N}}$$

If $\hat{y}_{t+1}$ is treated as the actual value for period t+1, the same formula can be applied again, $\hat{y}_{t+2}=M_{t+1}^{(1)}(N)$, to compute the forecast for period t+2, and forecasts for later periods can be obtained in the same way. However, the further ahead the forecast, the larger the error, so **the single moving average method is generally used only to forecast one period ahead (that is, period t+1)**.
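As a rough illustration of feeding forecasts back in as if they were observations, here is a minimal sketch. The helper name `iterate_forecast` and the sample data are hypothetical.

```python
def iterate_forecast(history, n, steps):
    """Forecast `steps` periods ahead, feeding each forecast back in as data."""
    extended = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = sum(extended[-n:]) / n  # mean of the N most recent values
        forecasts.append(next_value)
        extended.append(next_value)  # treat the forecast as if it were observed
    return forecasts


# Hypothetical observations, used only for illustration.
observed = [20.0, 22.0, 21.0, 25.0, 24.0]
multi_step = iterate_forecast(observed, 3, 3)
print(multi_step)
```

Only the first value is based purely on real observations. Every later value averages in earlier forecasts, which is one way to see why the error tends to grow with the horizon.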

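Example:

The monthly sales of carburetors, an auto part, for January through December of a certain year (unit: pieces) are given in the second row of the table below. Use the single moving average method to forecast sales for January of the following year.

Carburetor sales and single moving average forecasts: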

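| Month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Forecast |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $y_i$ | 423 | 358 | 434 | 445 | 527 | 429 | 426 | 502 | 480 | 384 | 427 | 446 | |
| N=3 | | | | 405 | 412 | 469 | 467 | 461 | 452 | 469 | 455 | 430 | 419 |
| N=5 | | | | | | 437 | 439 | 452 | 466 | 473 | 444 | 444 | 448 |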

Taking N=3 and N=5 in turn and applying the forecasting formulas

$$\hat{y}_{t+1}(3)=M_t^{(1)}(3)=\frac{y_t+y_{t-1}+y_{t-2}}{3},\quad t=3,4,\cdots,12$$

$$\hat{y}_{t+1}(5)=M_t^{(1)}(5)=\frac{y_t+y_{t-1}+y_{t-2}+y_{t-3}+y_{t-4}}{5},\quad t=5,6,\cdots,12$$

gives the 3-month and 5-month moving average forecasts, shown in the third and fourth rows of the table above. The forecast standard error is 56.5752 for N=3 and 39.8159 for N=5.

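The forecasts show that the raw data fluctuate considerably, while the moving averages fluctuate much less, and the larger N is, the smaller the fluctuation. At the same time, the forecast standard error of the single moving average method remains fairly large, so the method is seldom used to forecast series whose actual values fluctuate strongly.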

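Code implementation:

import numpy as np

# Monthly carburetor sales for months 1-12
y = np.array([423, 358, 434, 445, 527, 429, 426, 502, 480, 384, 427, 446])

def MoveAverage(y, N):
    # The first N periods have no forecast; mark them with '*'
    Mt = ['*'] * N
    # The forecast for period i is the mean of the N observations before it
    for i in range(N + 1, len(y) + 2):
        M = y[i - (N + 1):i - 1].mean()
        Mt.append(M)
    return Mt

yt3 = MoveAverage(y, 3)
# Compare forecasts for periods N+1..T with the actual values;
# the last entry is the forecast for the period after the data ends
s3 = np.sqrt(((y[3:] - yt3[3:-1]) ** 2).mean())
yt5 = MoveAverage(y, 5)
s5 = np.sqrt(((y[5:] - yt5[5:-1]) ** 2).mean())
print('N=3, forecasts:', yt3, ', forecast standard error:', s3)
print('N=5, forecasts:', yt5, ', forecast standard error:', s5)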

A simple moving average weights all observations equally, so it can be computed as a convolution. The corresponding code is:

def sma(arr, n):
    weights = np.ones(n) / n
    # mode='valid' keeps only the positions where the window fully overlaps the data
    return np.convolve(weights, arr, mode='valid')

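import numpy as np

y = np.array([423, 358, 434, 445, 527, 429, 426, 502, 480, 384, 427, 446])

# Full convolution sliced down to the fully overlapping part;
# np.convolve(np.ones(n1) / n1, y, mode='valid') gives the same result and is more general
n1 = 3
yt1 = np.convolve(np.ones(n1) / n1, y)[n1 - 1:-n1 + 1]
# yt1[:-1] are the forecasts for periods n1+1..T; yt1[-1] is the forecast for the next period
s1 = np.sqrt(((y[n1:] - yt1[:-1]) ** 2).mean())
n2 = 5
yt2 = np.convolve(np.ones(n2) / n2, y)[n2 - 1:-n2 + 1]
s2 = np.sqrt(((y[n2:] - yt2[:-1]) ** 2).mean())
print('N=3, forecasts:', yt1, ', forecast standard error:', s1)
print('N=5, forecasts:', yt2, ', forecast standard error:', s2)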

np.convolve(a, v, mode='full'):

Returns the discrete, linear convolution of two one-dimensional sequences.

The convolution operator is often seen in signal processing, where it models the effect of a linear time-invariant system on a signal [1]_. In probability theory, the sum of two independent random variables is distributed according to the convolution of their individual distributions.

If `v` is longer than `a`, the arrays are swapped before computation.

Parameters
----------
a : (N,) array_like
    First one-dimensional input array.
v : (M,) array_like
    Second one-dimensional input array.
mode : {'full', 'valid', 'same'}, optional
    'full':
        By default, mode is 'full'. This returns the convolution at each point of overlap, with an output shape of (N+M-1,). At the end-points of the convolution, the signals do not overlap completely, and boundary effects may be seen.
    'same':
        Mode 'same' returns output of length ``max(M, N)``. Boundary effects are still visible.
    'valid':
        Mode 'valid' returns output of length ``max(M, N) - min(M, N) + 1``. The convolution product is only given for points where the signals overlap completely. Values outside the signal boundary have no effect.

Returns
-------
out : ndarray
    Discrete, linear convolution of `a` and `v`.

See Also
--------
scipy.signal.fftconvolve : Convolve two arrays using the Fast Fourier Transform.
scipy.linalg.toeplitz : Used to construct the convolution operator.
polymul : Polynomial multiplication. Same output as convolve, but also accepts poly1d objects as input.

Usage examples:

Examples
--------
Note how the convolution operator flips the second array before "sliding" the two across one another:

    >>> np.convolve([1, 2, 3], [0, 1, 0.5])
    array([0. , 1. , 2.5, 4. , 1.5])

Only return the middle values of the convolution. Contains boundary effects, where zeros are taken into account:

    >>> np.convolve([1,2,3],[0,1,0.5], 'same')
    array([1. ,  2.5,  4. ])

The two arrays are of the same length, so there is only one position where they completely overlap:

    >>> np.convolve([1,2,3],[0,1,0.5], 'valid')
    array([2.5])

Understanding convolution:

References
----------
.. [1] Wikipedia, "Convolution", https://en.wikipedia.org/wiki/Convolution

The convolution formula can be described as a weighted average of the function f(τ) at time t, where the weighting is given by g(−τ) shifted by the amount t. As t changes, the weighting function emphasizes different parts of the input function.

Computing the convolution of discrete sequences
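To make the discrete case concrete, the following sketch computes a full convolution with two plain loops and then uses it for an equal-weight moving average. It is a hand-written illustration in pure Python, not how NumPy implements `np.convolve`; the function names are made up here.

```python
def convolve_full(a, v):
    """Full discrete convolution: out[k] = sum over i of a[i] * v[k - i]."""
    out = [0.0] * (len(a) + len(v) - 1)
    for i, a_i in enumerate(a):
        for j, v_j in enumerate(v):
            out[i + j] += a_i * v_j  # each pair contributes to position i + j
    return out


def sma_valid(y, n):
    """Equal-weight moving average: keep only fully overlapping positions."""
    full = convolve_full(y, [1.0 / n] * n)
    return full[n - 1:len(y)]


docstring_case = convolve_full([1, 2, 3], [0, 1, 0.5])  # inputs from the docstring example
averages = sma_valid([3.0, 6.0, 9.0, 12.0], 3)
print(docstring_case)
print(averages)
```

The first result has length N+M-1, matching the 'full' mode described above, and the slice in `sma_valid` plays the role of the 'valid' mode.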

Convolution keywords: weights, time, superposition, discrete versus continuous

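2. Double moving average and trend moving average methods

When the underlying trend of the forecast variable changes, the single moving average method cannot adapt to the change quickly.

When the series follows a linear trend, the single moving average lags behind it: for a rising trend the forecasts are systematically too low, so the method cannot extrapolate the trend sensibly.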
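The lag can be seen on a noise-free example. The sketch below uses a hypothetical series with slope 2 and N=3; for a series with slope b, the one-step forecast falls short by b(N+1)/2, which here is 4. The helper name is made up for illustration.

```python
def one_step_forecasts(y, n):
    """Pair each single-MA forecast with the actual value it was predicting."""
    pairs = []
    for t in range(n, len(y)):  # 0-based index of the period being forecast
        forecast = sum(y[t - n:t]) / n
        pairs.append((forecast, y[t]))
    return pairs


# Hypothetical noise-free upward trend: y_t = 2t for t = 1..10.
trend = [2.0 * t for t in range(1, 11)]
lag_pairs = one_step_forecasts(trend, 3)
for forecast, actual in lag_pairs:
    print(forecast, actual, actual - forecast)
```

Every row shows the same shortfall, which is the bias the double moving average is designed to correct.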

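The double moving average method applies a second moving average to the single moving averages, then builds a forecasting model from the single and double moving averages and uses it to compute forecasts.

The double moving average is computed as: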

$$M_t^{(2)}=\frac{1}{N}(M_t^{(1)}+\cdots+M_{t-N+1}^{(1)})=M_{t-1}^{(2)}+\frac{1}{N}(M_t^{(1)}-M_{t-N}^{(1)})$$

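When the underlying trend of the forecast target fluctuates around a fixed level, the single moving average method can be used to build the forecasting model. When the underlying trend matches a linear model, the double moving average method is commonly used. When the series shows both a linear trend and periodic fluctuation, the trend moving average method can be used to build the forecasting model:

$$\hat{y}_{T+m}=a_T+b_T m,\quad m=1,2,\cdots$$

where $a_T=2M_T^{(1)}-M_T^{(2)}$ and $b_T=\frac{2}{N-1}(M_T^{(1)}-M_T^{(2)})$.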
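A minimal sketch of the trend moving average forecast follows, assuming the series has at least 2N-1 observations so that the double moving average exists. The function names and the test series are hypothetical; the series is an exact straight line so the result is easy to check by hand.

```python
def moving_average(values, n):
    """Simple moving average over every full window of length n."""
    return [sum(values[i - n:i]) / n for i in range(n, len(values) + 1)]


def trend_forecast(y, n, m):
    """Forecast y_{T+m} = a_T + b_T * m from single and double moving averages."""
    m1 = moving_average(y, n)    # M^(1), defined from period N onward
    m2 = moving_average(m1, n)   # M^(2), defined from period 2N-1 onward
    a_t = 2 * m1[-1] - m2[-1]
    b_t = 2 * (m1[-1] - m2[-1]) / (n - 1)
    return a_t + b_t * m


# Hypothetical noise-free linear series y_t = 5 + 3t for t = 1..12.
linear = [5.0 + 3.0 * t for t in range(1, 13)]
one_ahead = trend_forecast(linear, 4, 1)
two_ahead = trend_forecast(linear, 4, 2)
print(one_ahead, two_ahead)
```

On an exact line the model recovers the level and slope, so the forecasts continue the line. On real data, $a_T$ and $b_T$ are only smoothed estimates of the current level and slope.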