时间序列分析基础

# !pip install jupyterthemes

https://github.com/dunovank/jupyter-themes

修改jupyter notebook 主题和界面宽度，字体

在cmd中，键入jt -t grade3 -cellw 90% -fs 14 -ofs 14

import pandas as pd
import numpy as np
np.random.seed(1206)import matplotlib.pyplot as plt
%matplotlib inline
# 解决坐标轴刻度负号乱码
plt.rcParams['axes.unicode_minus'] = False# 解决中文乱码问题
plt.rcParams['font.sans-serif'] = ['Simhei']from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 18, 6import scipy.stats as statsimport statsmodels.api as sm
import statsmodels.tsa.api as smt
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungboximport warnings
warnings.filterwarnings("ignore")

"""
在这儿：https://github.com/statsmodels/statsmodels/tree/master/statsmodels/tsa
下载holtwinters.py，tsamodel.py，并放在当前脚本所在路径下
"""import sys, os
sys.path.append(os.getcwd())from holtwinters import ExponentialSmoothing, SimpleExpSmoothing, Holt

第一部分：

白噪声及其检验，AR，MA，ARMA，差分

白噪声

# 正太分布，白噪声就是一列独立分布的正态序列
def norm(mu, sigma, size, seed):np.random.seed(seed)s = np.random.normal(mu,sigma,size)return s

import matplotlib.mlab as mlabx = norm(0, 1, 1000, 1206)fig,axs = plt.subplots(1,2,figsize=(20,4))
axs[0].plot(x)
axs[0].set_ylabel('Frequency')
n, bins, patches = axs[1].hist(x, 50, width=.6, normed=1)
y = mlab.normpdf(bins, 0, 1)
plt.plot(bins, y, '--');

[外链图片转存(img-sHu9wKbC-1562728996727)(output_8_0.png)]

白噪声检验

# stats.shapiro?

Perform the Shapiro-Wilk test for normality.The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.
Returns

W : float - The test statistic.

p-value : float - The p-value for the hypothesis test.

W检验全称Shapiro-Wilk检验，是一种基于相关性的算法。计算可得到一个相关系数，它越接近1就越表明数据和正态分布拟合得越好。

print('Shapiro-Wilk test:', stats.shapiro(x))

Shapiro-Wilk test: (0.998481035232544, 0.5423095226287842)

# stats.jarque_bera?

Perform the Jarque-Bera goodness of fit test on sample data.The Jarque-Bera test tests whether the sample data has the skewness and kurtosis matching a normal distribution.

Jarque-Bera 检验是一种拟合优度检验，检验是否样本和正态分布比较有偏度和峰度

print('Jarque-Bera test:', stats.jarque_bera(x))

Jarque-Bera test: (3.5855535668417966, 0.1664972005159684)

# qq 图
sm.qqplot(x,line='45')
plt.title('qq 图');

[外链图片转存(img-Ny2PjGmC-1562728996733)(output_16_0.png)]

制作ACF, PACF

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(x,lags=31)

[外链图片转存(img-YiZ2xXYV-1562728996734)(output_19_0.png)]

[外链图片转存(img-WPpcnZkF-1562728996734)(output_19_1.png)]

# 自相关和偏相关图，默认阶数为31阶
def draw_acf_pacf(ts, subtitle, lags=31):print("自相关图和偏相关图,maxlags={}".format(lags))f = plt.figure(facecolor='white', figsize=(18,4))ax1 = f.add_subplot(121)plot_acf(ts, lags=lags, ax=ax1, title='ACF\n{}'.format(subtitle))ax2 = f.add_subplot(122)plot_pacf(ts, lags=lags, ax=ax2, title='PACF\n{}'.format(subtitle))plt.show()

draw_acf_pacf(x,'白噪声')

自相关图和偏相关图,maxlags=31

[外链图片转存(img-E5yUULPV-1562728996734)(output_21_1.png)]

AR§

1 阶自回归模型

def simulate_ar1(p, size=1000, seed=1206):
#     生成随机数x = norm(0,1,size,seed)w = norm(0,1,size,seed)
#     生成指定p阶的AR序列for i in range(1,size):x[i] = p * x[i-1] + w[i]
#   绘制时序图plt.plot(x)title = " X[t] = {} * X[t-1] + E(t)".format(p)plt.title(title)plt.xlabel('Time')
#   绘制ACF，PACFdraw_acf_pacf(x,title)

X[t] = 0.6 * X[t-1] + E(t)

simulate_ar1(0.6)

自相关图和偏相关图,maxlags=31

[外链图片转存(img-TEaxsNo8-1562728996735)(output_26_1.png)]

[外链图片转存(img-63AZ8TDA-1562728996735)(output_26_2.png)]

X[t] = - 0.6 * X[t-1] + E(t)

simulate_ar1(-0.6)

自相关图和偏相关图,maxlags=31

[外链图片转存(img-hsg3XyTT-1562728996735)(output_28_1.png)]

[外链图片转存(img-gp4xmDnI-1562728996736)(output_28_2.png)]

2 阶自回归模型

def simulate_ar2(p1, p2, size=1000, seed=1206):
#     生成随机数x = norm(0,1,size,seed)w = norm(0,1,size,seed)
#     生成指定p阶的AR序列for i in range(2,size):x[i]=p1 * x[i-1] + p2 * x[i-2] + w[i]
#   绘制时序图plt.plot(x)title = " X[t] = {} * X[t-1] + {} * X[t-2] + E(t)".format(p1,p2)plt.title(title)plt.xlabel('Time')
#   绘制ACF，PACFdraw_acf_pacf(x,title)

X[t] = 0.3 * X[t-1] + 0.4 X[t-2 ]+ E(t)

simulate_ar2(.3,.4)

自相关图和偏相关图,maxlags=31

[外链图片转存(img-wEOVSv4j-1562728996736)(output_32_1.png)]

[外链图片转存(img-PU9iVTM5-1562728996736)(output_32_2.png)]

MA(q)

def simulate_ma(q, size=1000, seed=1206):
#     生成随机数x = norm(0,1,size,seed)w = norm(0,1,size,seed)
#     生成指定p阶的AR序列for i in range(len(q),size):for j,k in enumerate(q):y = k * w[i-j-1]x[i] = y + w[i]
#   绘制时序图plt.plot(x)e = ["{1} * E[t-{0}]".format(i+1,j) for i,j in enumerate(q)]title = " Y[t] = E(t) + " + " + ".join(e)plt.title(title)plt.xlabel('Time')
#   绘制ACF，PACFdraw_acf_pacf(x,title)

1 阶移动平均模型

Y[t] = E(t) + 0.8 * E(t-1)

simulate_ma([0.8])

自相关图和偏相关图,maxlags=31

[外链图片转存(img-OVWbgnEx-1562728996737)(output_37_1.png)]

[外链图片转存(img-zNK9DG7p-1562728996737)(output_37_2.png)]

Y[t] = E(t) - 0.6 * E(t-1)

simulate_ma([-0.6])

自相关图和偏相关图,maxlags=31

[外链图片转存(img-KsHpLanb-1562728996738)(output_39_1.png)]

[外链图片转存(img-OLzQO3Ge-1562728996738)(output_39_2.png)]

2 阶移动平均模型

Y[t] = E(t) + 0.8 * E(t-1) + 0.7 * E(t-2)

simulate_ma([0.8,0.7])

自相关图和偏相关图,maxlags=31

[外链图片转存(img-ayCMsVVf-1562728996738)(output_42_1.png)]

[外链图片转存(img-CmqpKRlK-1562728996738)(output_42_2.png)]

ARMA(p,q)

def simulate_arma(p, q, nsample=1000, seed=1206):np.random.seed(seed)p1 = np.array(p)q1 = np.array(q)ar = np.r_[1, -p1]ma = np.r_[1, q1]arma_sample = arma_generate_sample(ar=ar, ma=ma, nsample=1000)
#     时序图plt.plot(arma_sample)p2 = ["{1} * X[t-{0}]".format(i+1,j) for i,j in enumerate(p)]q2 = ["{1} * E[t-{0}]".format(i+1,j) for i,j in enumerate(q)]title = " X[t] = E(t) + " + " + ".join(p2) + " + " + " + ".join(q2)plt.title(title)plt.xlabel('Time')
#   绘制ACF，PACFdraw_acf_pacf(arma_sample,title)

ARMA(0.8, 0.7)

simulate_arma([0.8], [0.7])

自相关图和偏相关图,maxlags=31

[外链图片转存(img-cPoxhUPn-1562728996739)(output_46_2.png)]

ARMA([0.6, -0.8], [0.5])

simulate_arma([0.6, -0.8], [0.5])

自相关图和偏相关图,maxlags=31

[外链图片转存(img-yVIqV2nz-1562728996739)(output_48_1.png)]

[外链图片转存(img-nCXbaQFy-1562728996740)(output_48_2.png)]

ARMA([0.5, 0.4], [0.8, 0.6])

simulate_arma([0.5, 0.4], [0.8, 0.6])

自相关图和偏相关图,maxlags=31

[外链图片转存(img-cZn4UOgI-1562728996740)(output_50_1.png)]

[外链图片转存(img-wziXw395-1562728996740)(output_50_2.png)]

ARIMA model(p,d,q)

ARIMA([0.6],1,0)

df = pd.read_csv('ARIMA([0.6],1,0).csv', index_col=0)
df.head()

	x
1	0.075630
2	1.142068
3	2.672252
4	3.501414
5	3.621056

ts = df['x']
plt.plot(ts)
plt.xlabel('Time');

[外链图片转存(img-RMbaMZKA-1562728996741)(output_54_0.png)]

draw_acf_pacf(ts, 'ARIMA([0.6],1,0)', 50)

自相关图和偏相关图,maxlags=50

[外链图片转存(img-ZNgTVZSP-1562728996741)(output_55_1.png)]

from statsmodels.tsa.stattools import adfullerdef test_stationarity(timeseries):# Perform Dickey-Fuller test:print('Results of Dickey-Fuller Test:')dftest = adfuller(timeseries, autolag='AIC')dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])for key,value in dftest[4].items():dfoutput['Critical Value ({})'.format(key)] = valueprint(dfoutput)

test_stationarity(ts)

Results of Dickey-Fuller Test:
Test Statistic                  -1.643390
p-value                          0.460490
#Lags Used                       1.000000
Number of Observations Used    998.000000
Critical Value (1%)             -3.436919
Critical Value (5%)             -2.864440
Critical Value (10%)            -2.568314
dtype: float64

p-value = 0.460490 > 0.05，序列非平稳

# 差分
ts_diff = ts.diff(1)
ts_diff = ts_diff[1:]
plt.plot(ts_diff)
plt.xlabel('Time');

[外链图片转存(img-77opmkDS-1562728996741)(output_59_0.png)]

draw_acf_pacf(ts_diff, 'ARIMA([0.6],1,0) - 一阶差分后', 50)

自相关图和偏相关图,maxlags=50

[外链图片转存(img-ASNpskpR-1562728996741)(output_60_1.png)]

test_stationarity(ts_diff)

Results of Dickey-Fuller Test:
Test Statistic                -1.492171e+01
p-value                        1.408389e-27
#Lags Used                     0.000000e+00
Number of Observations Used    9.980000e+02
Critical Value (1%)           -3.436919e+00
Critical Value (5%)           -2.864440e+00
Critical Value (10%)          -2.568314e+00
dtype: float64

p-value = 1.408389e-27 < 0.05，可以在99%的置信度下认为平稳，而且从时序图和ACF图，PACF图上看出序列更加平稳

第二部分：Box - Jenkins建模流程

第一步：检验平稳性

第二步：若时间序列非平稳，进行差分，使其变的平稳

第三步：模型识别

draw_acf_pacf(ts_diff, 'ARIMA([0.6],1,0) - 一阶差分后', 50)

自相关图和偏相关图,maxlags=50

[外链图片转存(img-NBbVB2e6-1562728996742)(output_67_1.png)]

第四步：模型拟合

index = pd.date_range(start='2016-01-01', periods=1000, freq='D')
df.index = index
df.head()

	x
2016-01-01	0.075630
2016-01-02	1.142068
2016-01-03	2.672252
2016-01-04	3.501414
2016-01-05	3.621056

df.tail()

	x
2018-09-22	-5.654425
2018-09-23	-4.833006
2018-09-24	-2.760148
2018-09-25	-0.951620
2018-09-26	-0.088705

df.plot();

[外链图片转存(img-UqLK91QZ-1562728996742)(output_71_0.png)]

arima_1 = smt.ARIMA(df,order=(1,1,0)).fit()
print(arima_1.summary())

                             ARIMA Model Results
==============================================================================
Dep. Variable:                    D.x   No. Observations:                  999
Model:                 ARIMA(1, 1, 0)   Log Likelihood               -1443.650
Method:                       css-mle   S.D. of innovations              1.026
Date:                Mon, 03 Sep 2018   AIC                           2893.300
Time:                        16:14:44   BIC                           2908.020
Sample:                    01-02-2016   HQIC                          2898.895- 09-26-2018
==============================================================================coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0032      0.089      0.036      0.971      -0.171       0.177
ar.L1.D.x      0.6345      0.024     25.969      0.000       0.587       0.682Roots
=============================================================================Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.5761           +0.0000j            1.5761            0.0000
-----------------------------------------------------------------------------

第五步：模型检验

# 白噪声检验：Ljung-Box test
def randomness(ts, lags=31):rdtest = acorr_ljungbox(ts,lags=lags)# 对上述函数求得的值进行语义描述rddata = np.c_[range(1,lags+1),rdtest[1:][0]]rdoutput = pd.DataFrame(rddata,columns=['lags','p-value'])return rdoutput.set_index('lags')

plt.plot(arima_1.resid)
plt.ylabel('Residual')
plt.title('残差图');

[外链图片转存(img-OCkjbnLf-1562728996742)(output_75_0.png)]

draw_acf_pacf(arima_1.resid,'残差')

自相关图和偏相关图,maxlags=31

[外链图片转存(img-xqtLh13n-1562728996743)(output_76_1.png)]

randomness(arima_1.resid, 7)

	p-value
lags
1.0	0.904765
2.0	0.861410
3.0	0.950762
4.0	0.899331
5.0	0.953956
6.0	0.670435
7.0	0.707632

第六步：模型预测

arima_1.plot_predict(plot_insample=True)

[外链图片转存(img-DOfkAfhN-1562728996743)(output_79_0.png)]

[外链图片转存(img-gu9L8xgA-1562728996743)(output_79_1.png)]

# arima_1.plot_predict?
# arima_1.plot_predict(start=None, end=None, exog=None, dynamic=False, alpha=0.05, plot_insample=True, ax=None)

arima_1.plot_predict(plot_insample=True, dynamic=True)

[外链图片转存(img-DEJlzTZ9-1562728996744)(output_81_0.png)]

[外链图片转存(img-4wlSGuF1-1562728996744)(output_81_1.png)]

min(df.index), max(df.index)

(Timestamp('2016-01-01 00:00:00', freq='D'),Timestamp('2018-09-26 00:00:00', freq='D'))

fig, ax = plt.subplots(figsize=(20, 8))
arima_1.plot_predict('2016-01-02','2018-10-27',plot_insample=True,dynamic=False,ax=ax)

[外链图片转存(img-pAvdyiDd-1562728996744)(output_83_0.png)]

[外链图片转存(img-p01eVqq0-1562728996745)(output_83_1.png)]

# arima_1.forecast?
# arima_1.forecast(steps=1, exog=None, alpha=0.05)

pred = arima_1.forecast(5)
# pred

pred_diff = pd.concat([pd.DataFrame(pred[0], columns=['点估计']),pd.DataFrame(pred[1], columns=['标准差']),pd.DataFrame(pred[2], columns=['区间估计下限(95%)','区间估计下限(95%)'])],axis=1)
pred_diff

	点估计	标准差	区间估计下限(95%)	区间估计下限(95%)
0	0.459953	1.026237	-1.551435	2.471340
1	0.809223	1.966390	-3.044831	4.663276
2	1.031987	2.869979	-4.593069	6.657042
3	1.174486	3.710991	-6.098923	8.447895
4	1.266059	4.485117	-7.524609	10.056727

arima_1.predict('2018-09-27','2018-10-26')

2018-09-27    0.548658
2018-09-28    0.349270
2018-09-29    0.222764
2018-09-30    0.142499
2018-10-01    0.091573
2018-10-02    0.059262
2018-10-03    0.038761
2018-10-04    0.025754
2018-10-05    0.017502
2018-10-06    0.012266
2018-10-07    0.008943
2018-10-08    0.006836
2018-10-09    0.005498
2018-10-10    0.004650
2018-10-11    0.004111
2018-10-12    0.003770
2018-10-13    0.003553
2018-10-14    0.003415
2018-10-15    0.003328
2018-10-16    0.003273
2018-10-17    0.003238
2018-10-18    0.003215
2018-10-19    0.003201
2018-10-20    0.003192
2018-10-21    0.003187
2018-10-22    0.003183
2018-10-23    0.003181
2018-10-24    0.003179
2018-10-25    0.003178
2018-10-26    0.003178
Freq: D, dtype: float64

plt.plot(arima_1.predict('2016-01-02','2018-09-26'));

[外链图片转存(img-IaizfhSZ-1562728996745)(output_88_0.png)]

plt.plot(arima_1.fittedvalues);

[外链图片转存(img-vAceP0mR-1562728996745)(output_89_0.png)]

predict和fittedvalues是差分后的预测值，需要手动还原！！！

Python时间序列建模基础相关推荐

python数学建模基础教程_Python 3破冰人工智能从入门到实战大学生数学建模竞赛数学建模算法与应用教程机器学习深度学...
第 1章从数学建模到人工智能1 1.1 数学建模 1 1.1.1 数学建模与人工智能 1 1.1.2 数学建模中的常见问题 4 1.2 人工智能下的数学 12 1.2.1 统计量 ...
python数学建模基础教程_Python数学建模极简入门（二）差分方程
差分方程这名字大家可能不太熟悉,其实差分方程指的是下面这种: 差分方程其实就是我们数学中数列的递推公式,在以前学数学的时候,往往要通过递推公式来求通项公式才能快速地得到某一项的值,现在借助编程的话, ...
Python 时间序列建模：用指数平滑法预测股价走势
指数平滑方法适用于非平稳数据(即具有趋势和/或季节性的数据),其工作方式类似于指数移动平均线.预测是过去观察的加权平均值.这些模型更加强调最近的观察结果,因为权重随时间呈指数级变小.平滑方法很受欢迎, ...
python平稳性检验_时间序列预测基础教程系列(14)_如何判断时间序列数据是否是平稳的(Python)...
时间序列预测基础教程系列(14)_如何判断时间序列数据是否是平稳的(Python) 发布时间:2019-01-10 00:02, 浏览次数:620 , 标签: Python 导读: 本文介绍了数据平稳 ...
python画函数求交点_python3数学建模基础（四）多个函数图像求交点
python3数学建模基础(四)多个函数图像求交点,多个,交点,生姜,建模,函数 python3数学建模基础(四)多个函数图像求交点 python3数学建模基础(四)多个函数图像求交点本文以sin( ...
python时间序列预测不连续怎么办_手把手教你用Python处理非平稳时间序列（附代码）...
本文约3600字,建议阅读10分钟. 本文将重点介绍时间序列数据的平稳性检验方法. 简介预测一个家庭未来三个月的用电量,估计特定时期道路上的交通流量,预测一只股票在纽约证券交易所交易的价格--这些问 ...
python数据可视化基础
为了满足兄弟的需求,我强迫自己把数据可视化基础又从头到尾复习了一遍, 我总结了利用python实现可视化的三个步骤: 确定问题,选择图形转换数据,应用函数参数设置,一目了然 1 首先,要知道我们用 ...
python 财务报表建模_Python进行统计建模
前言大家好,在之前的文章中我们已经讲解了很多Python数据处理的方法比如读取数据.缺失值处理.数据降维等,也介绍了一些数据可视化的方法如Matplotlib.pyecharts等,那么在掌握了这些 ...
python与金融建模_【用Python金融建模】从二叉树谈起:衍生品Option期权定价模型的构建...
内容首发乐学偶得(http://lexueoude.com) 公众号: 乐学Fintech 用代码理解分析解决金融问题在金融里面很多地方都出现过一个理念就是"货币的时间价值", ...
《Python科学计算基础教程》 -- 读书笔记
文章目录 Python科学计算基础教程代码路径 http://www.github.com/sundaygeek/MasteringPythonScirntificComputing 第1章科学计 ...

Python时间序列建模基础

时间序列分析基础

修改jupyter notebook 主题和界面宽度，字体

第一部分：

白噪声及其检验，AR，MA，ARMA，差分

白噪声

白噪声检验

Perform the Shapiro-Wilk test for normality.The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.
Returns

制作ACF, PACF

AR§

1 阶自回归模型

X[t] = 0.6 * X[t-1] + E(t)

X[t] = - 0.6 * X[t-1] + E(t)

2 阶自回归模型

X[t] = 0.3 * X[t-1] + 0.4 X[t-2 ]+ E(t)

MA(q)

1 阶移动平均模型

Y[t] = E(t) + 0.8 * E(t-1)

Y[t] = E(t) - 0.6 * E(t-1)

2 阶移动平均模型

Y[t] = E(t) + 0.8 * E(t-1) + 0.7 * E(t-2)

ARMA(p,q)

ARMA(0.8, 0.7)

ARMA([0.6, -0.8], [0.5])

ARMA([0.5, 0.4], [0.8, 0.6])

ARIMA model(p,d,q)

ARIMA([0.6],1,0)

第二部分：Box - Jenkins建模流程

第一步：检验平稳性

第二步：若时间序列非平稳，进行差分，使其变的平稳

第三步：模型识别

第四步：模型拟合

第五步：模型检验

第六步：模型预测

predict和fittedvalues是差分后的预测值，需要手动还原！！！

Python时间序列建模基础相关推荐

最新文章

热门文章

Python时间序列建模基础

时间序列分析基础

修改jupyter notebook 主题和界面宽度，字体

第一部分：

白噪声及其检验，AR，MA，ARMA，差分

白噪声

白噪声检验

Perform the Shapiro-Wilk test for normality.The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution. Returns

制作ACF, PACF

AR§

1 阶自回归模型

X[t] = 0.6 * X[t-1] + E(t)

X[t] = - 0.6 * X[t-1] + E(t)

2 阶自回归模型

X[t] = 0.3 * X[t-1] + 0.4 X[t-2 ]+ E(t)

MA(q)

1 阶移动平均模型

Y[t] = E(t) + 0.8 * E(t-1)

Y[t] = E(t) - 0.6 * E(t-1)

2 阶移动平均模型

Y[t] = E(t) + 0.8 * E(t-1) + 0.7 * E(t-2)

ARMA(p,q)

ARMA(0.8, 0.7)

ARMA([0.6, -0.8], [0.5])

ARMA([0.5, 0.4], [0.8, 0.6])

ARIMA model(p,d,q)

ARIMA([0.6],1,0)

第二部分：Box - Jenkins建模流程

第一步：检验平稳性

第二步：若时间序列非平稳，进行差分，使其变的平稳

第三步：模型识别

第四步： 模型拟合

第五步： 模型检验

第六步： 模型预测

predict和fittedvalues是差分后的预测值，需要手动还原！！！

Python时间序列建模基础相关推荐

最新文章

热门文章

Perform the Shapiro-Wilk test for normality.The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution.
Returns

第四步：模型拟合

第五步：模型检验

第六步：模型预测