复习心烦，偶遇大作业，故摸鱼

作业题目

自由选取一组数据(可以是R 自带的数据集、或者其它来源，鼓励选取一些有趣的课题进行数据分析)，利用我们这学期所学知识建立恰当模型（ARIMA、GARCH 等），小组成员（不超过三人一组, 自由组队）中的一人安排 5 至 10 分钟左右的PPT 课堂展示（教学周第16 周、无需展示代码，仅汇报初步结果即可，我给出点评后再完善后提交）。内容基本要求如下:
• 说明数据来源背景, 需要分析解决的问题;
• 包含完整的建模过程(拟合、估计、诊断检验、预测)；
• 结论或者建议(依赖于你拟解决的问题)。
注意！！！数据量的选取需要适中, 留取部分原始数据作为评估模型预测能力使用。
此部分对最后成绩的贡献比例为20%。

Background of Data Set

A multivariate yearly time series from 1960 to 1979 with variables (Included in dataset TSA)

Test set and fit set

Select the first 60 data for fitting , while the last 12 (one year) as the test data to test the final model we struct (cut the data frame by using R function window( ))

Data preprocessing

Observe the image directly and there is an obvious logarithmic upward trend. Carry out differential processing on dataset——wages to eliminate the upward trend,

test the unit root(DK test), the result are as followings:

> adf.test(model)Augmented Dickey-Fuller Testdata:  model
Dickey-Fuller = -3.6536, Lag order = 3, p-value = 0.03639
alternative hypothesis: stationary

We are not satisfied with the ppp value. In order to avoid excessive difference, we try to use Box-Cox transformation. From the results, it can be seen that the value of λ\lambdaλ is 1. Although the value of ppp value for significance 0.01 is not good, it can only reduce the significance requirement (α\alphaα = 0.05), rather than searching complex alternatives.

We finally preprocess the dataset by difference : model = diff (log(waves)). From the figure below, we can see that the peaks and troughs are distributed regularly. Combined with the actual background and common sense of life of the data set, workers’ salary will increase at the end of the year. After all, everyone wants to run. Workers’ salary should be fluctuated in an annual cycle. We can see from the image of data after difference. The salary at the end of each year is relatively high, but there are three hidden dangers:
① the periodicity is not very obvious. Although it has practical significance, for the size of sample is so small.
② Although it is expected that the salary at the end of middle age should be high, the data in 1986 conflicts with this statement, which is likely to be related to the social background at that time, and this information is unknown to us. The prediction work we do later may not be so good
③ It can also be seen from the figure that the growth rate of wages has slowed down (because we have just done differential processing, in fact, wages have been rising all the time)

Choose Model

Calculate the ACF and PACF of the model. In fact, the diagrams of ACF and PACF can’t put forward feasible suggestions for us to choose ppp and qqq, hence it’s not necessary to calculate eacf. The results given must not be so good. We try to make a seasonal difference to the model.

After seasonal difference, it is also difficult to select appropriate models according to ACF and PACF images,

This model is a little troublesome, for model ARIMA(p,d,q)×(P,D,Q)s\rm ARIMA(p,d,q)\times(P,D,Q)_sARIMA(p,d,q)×(P,D,Q)s，now we juat know the value of ddd and DDD ARIMA(p,1,q)×(P,1,Q)11ARIMA(p,1,q)\times(P,1,Q)_{11}ARIMA(p,1,q)×(P,1,Q)11, Next we select the optimal subset based on the BIC criterion. The following two figures respectively select the optimal subset from fit_set and model2.

We consider the models：ARIMA(0,1,0)×(0,1,1)12ARIMA(0,1,0)\times(0,1,1)_{12}ARIMA(0,1,0)×(0,1,1)12,ARIMA(0,1,0)×(1,1,1)12ARIMA(0,1,0)\times(1,1,1)_{12}ARIMA(0,1,0)×(1,1,1)12,ARIMA(0,1,1)×(0,1,0)12ARIMA(0,1,1)\times(0,1,0)_{12}ARIMA(0,1,1)×(0,1,0)12,
ARIMA(0,1,0)×(1,1,0)12ARIMA(0,1,0)\times(1,1,0)_{12}ARIMA(0,1,0)×(1,1,0)12,ARIMA(1,1,0)×(0,1,0)12ARIMA(1,1,0)\times(0,1,0)_{12}ARIMA(1,1,0)×(0,1,0)12,ARIMA(1,1,0)×(1,1,1)12ARIMA(1,1,0)\times(1,1,1)_{12}ARIMA(1,1,0)×(1,1,1)12

Parameter estimation

a1=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(0,1,1),period = 12))
a2=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a3=arima(fit_set,order=c(0,1,1),seasonal = list(order=c(0,1,0),period = 12))
a4=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a5=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(0,1,0),period = 12))
a6=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(1,1,1),period = 12))

Model diagnosis

Diagnose the six models by R function tsdiag( ). During the diagnosing, it was found that the Ljung box test of model a1 was not good, one point was below the critical line and one point was very close to the critical line; The residual autocorrelation diagram of model a3 and model a5 is also not good, there’re some multiple outliers, so we exclude model a1, model a3 and model a5 first. There are still model a2, model a4 and model a6 models left. From the results of parameter estimation and model diagnosis, these models have no obvious advantages or disadvantages. However, based on the modeling principle of “simple but not simpler”, we exclude model a6 and leave model a2 and model a4 models for final prediction. The following are the results of testing a2 and a4 model.

Model prediction

a2=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a2_fore=forecast(a2,h=11,level = c(95))
plot(a2_fore)
lines(a2_fore$fitted,col="black")
lines(wages,col="red")

a4=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a4_fore=forecast(a4,h=11,level = c(95))
plot(a4_fore)
lines(a4_fore$fitted,col="black")
lines(wages,col="red")

Conclusion

We can see the prediction is not successful, for each predictive value is a little higher than the true value. It may be caused by the logarithmic upward trend of the data. Differential processing is suitable for linear trends, but not for logarithmic trend. A possible solution is that we can add a approxiate penalization on t, in order to slow down the upward trend.

Code

library(forecast)
library(TSA)
library(xts)
library(tseries)
data(wages)
help(wages)
plot(wages)
# 分组
fit_set=window(wages,start=c(1981,7),end=c(1986,6))
test_set=window(wages,start=c(1986,7),end=c(1987,6))
# # 确定趋势
# vec_wage=as.vector(wages)
# newwage=data.frame(log_wage=log(vec_wage),day=c(1:length(vec_wage)))
# fit=lm(newwage$log_wage~newwage$day,data=newwage)
# summary(fit)
# rm(fit)
# 一次差分
model=diff(fit_set)
plot(model,ylab="model")
abline(h=0)
abline(v=c(1982:1986),lty=2)
adf.test(model)
# 尝试Box-Cox变换
b=BoxCox.lambda(model)
lambda=which.max(b)
b=BoxCox.ar(model-min(model)*1.1)
# 模型识别
par(mfrow=c(1,2))
acf(as.vector(model),lag.max=60,main="model")
pacf(as.vector(model),lag.max = 60,main="model")
# 考虑季节模型
model2=diff(model,lag = 12)
acf(as.vector(model2),lag.max=60,main="model2")
pacf(as.vector(model2),lag.max = 60,main="model2")
par(mfrow=c(1,1))
# 模型识别
par(mfrow=c(1,2))
dis=armasubsets(y=fit_set,nar=13,nma=13,y.name="test",ar.method="ols")
plot(dis)
dis=armasubsets(y=model2,nar=4,nma=4,y.name="diff(1)",ar.method="ols")
plot(dis)
# 模型拟合
a1=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(0,1,1),period = 12))
a2=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a3=arima(fit_set,order=c(0,1,1),seasonal = list(order=c(0,1,0),period = 12))
a4=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a5=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(0,1,0),period = 12))
a6=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(1,1,1),period = 12))
# 模型诊断
tsdiag(a1)
tsdiag(a2)
tsdiag(a3)
tsdiag(a4)
tsdiag(a5)
tsdiag(a6)
# 模型预测
a2=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a2_fore=forecast(a2,h=11,level = c(95))
plot(a2_fore)
lines(a2_fore$fitted,col="black")
lines(wages,col="red")a4=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a4_fore=forecast(a4,h=11,level = c(95))
plot(a4_fore)
lines(a4_fore$fitted,col="black")
lines(wages,col="red")

R语言时间序列分析小例相关推荐

R语言时间序列分析之ARIMA模型预测
R语言时间序列分析之ARIMA模型预测今天学习ARIMA预测时间序列. 指数平滑法对于预测来说是非常有帮助的,而且它对时间序列上面连续的值之间相关性没有要求.但是,如果你想使用指数平滑法计算出预测区 ...
R语言时间序列分析-根据aic值选择arima模型
在上一篇中,探讨了R语言时间序列分析常用步骤,如何比对AIC值判断最优模型?代码和解释如下: #WWWusage是datasets包自带的每分钟通过服务器连接到因特网的用户数的长度为100的时间序列数 ...
基于R语言时间序列分析所有指令[2021]
文章主要是总结一学期所学,基本覆盖了所有常见的指令,足够完成arima模型的数据选择到模型预测. 时间序列应用广泛,不能仅仅局限于理论学习,代码实践更为重要. 往期文章链接: 基于 ARIM ...
r软件时间序列分析论文_高度比较的时间序列分析-一篇论文评论
r软件时间序列分析论文数据科学 , 机器学习 (Data Science, Machine Learning) In machine learning with time series, using ...
R语言-时间日期函数
R语言时间日期函数 1. 返回当前日期时间,有两种方式: Sys.time() date() 举例 format(Sys.time(), "%a %b %d %X %Y %Z")# ...
时间序列分析及应用r语言pdf_R语言时间序列分析（十一）：指数平滑法
作者:黄天元,复旦大学博士在读,热爱数据科学与开源工具(R),致力于利用数据科学迅速积累行业经验优势和科学知识发现,涉猎内容包括但不限于信息计量.机器学习.数据可视化.应用统计建模.知识图谱等,著有& ...
r语言结构方程模型可视化_R语言时间序列分析（二）：ts对象及其可视化
作者:黄天元,复旦大学博士在读,热爱数据科学与开源工具(R),致力于利用数据科学迅速积累行业经验优势和科学知识发现,涉猎内容包括但不限于信息计量.机器学习.数据可视化.应用统计建模.知识图谱等,著有& ...
arima 公式_R语言时间序列分析（十二）：ARIMA
作者:黄天元,复旦大学博士在读,热爱数据科学与开源工具(R),致力于利用数据科学迅速积累行业经验优势和科学知识发现,涉猎内容包括但不限于信息计量.机器学习.数据可视化.应用统计建模.知识图谱等,著有& ...
语言在msin函数验证_R语言时间序列分析（七）：模型准确度估计
作者:黄天元,复旦大学博士在读,热爱数据科学与开源工具(R),致力于利用数据科学迅速积累行业经验优势和科学知识发现,涉猎内容包括但不限于信息计量.机器学习.数据可视化.应用统计建模.知识图谱等,著有& ...

R语言时间序列分析小例