Time Series Analysis with Prophet: Air Passenger Data

Prophet is a forecasting model developed by Facebook. It forecasts time series by adjusting for factors such as seasonality, holiday periods, and changepoints.

Let’s investigate this further by building a Prophet model to forecast air passenger numbers.

Background

The dataset is sourced from the San Francisco International Airport Report on Monthly Passenger Traffic Statistics by Airline, which is available from data.world (Original Source: San Francisco Open Data) as indicated in the References section below.

Specifically, the adjusted passenger numbers for the airline KLM (enplaned) are extracted as the time series for analysis, covering the period from May 2005 to March 2016.

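The filtering step can be sketched in pandas as follows. This is a minimal illustration and not the author's code: the column names Operating Airline and Activity Type Code, the airline label and the tiny inline table are assumptions made for the example. Only Date and Adjusted Passenger Count appear later in this article.

```python
import pandas as pd

# Hypothetical stand-in for the raw airport data; values are made up for illustration.
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2005-05-01", "2005-05-01", "2005-06-01", "2005-06-01"]),
    "Operating Airline": ["KLM Royal Dutch Airlines", "KLM Royal Dutch Airlines",
                          "KLM Royal Dutch Airlines", "Other Airline"],
    "Activity Type Code": ["Enplaned", "Deplaned", "Enplaned", "Enplaned"],
    "Adjusted Passenger Count": [9000, 9100, 9500, 4000],
})

# Keep only enplaned KLM rows (column names and labels are assumptions).
mask = (
    raw["Operating Airline"].str.contains("KLM")
    & (raw["Activity Type Code"] == "Enplaned")
)

# One value per month, sorted by date, ready to be split into train and test sets.
klm = (
    raw.loc[mask, ["Date", "Adjusted Passenger Count"]]
    .groupby("Date", as_index=False)
    .sum()
    .sort_values("Date")
    .reset_index(drop=True)
)
print(klm)
```

The resulting frame has one row per month, which is the shape the rest of the analysis works with.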

Source: Jupyter Notebook Output

As we can see, the time series shows a fairly stationary pattern (one with a roughly constant mean, variance and autocorrelation).

We will not formally test for this condition here. It is also evident that the dataset appears to contain significant seasonality, i.e. significant shifts in the series that recur at certain time intervals.

From a visual inspection of the time series, it would appear that this shift happens approximately every eight months or so.

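One way to go beyond visual inspection is to compare the autocorrelation of the series at a few candidate lags, since a seasonal cycle of k months tends to show up as a peak near lag k. The sketch below uses a synthetic monthly series with a built-in 12-month cycle (an assumption for illustration, not the KLM data) together with pandas' Series.autocorr.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a 12-month cycle plus noise (illustrative only).
rng = np.random.default_rng(0)
months = pd.date_range("2005-05-01", periods=120, freq="MS")
t = np.arange(len(months))
y = pd.Series(
    8800 + 1500 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 200, len(t)),
    index=months,
)

# Autocorrelation at candidate seasonal lags, in months.
candidate_lags = [6, 8, 12]
acf = {lag: y.autocorr(lag=lag) for lag in candidate_lags}
for lag, value in acf.items():
    print(f"lag {lag:2d}: autocorrelation = {value:.2f}")

# The lag with the highest autocorrelation is the strongest candidate period.
best_lag = max(acf, key=acf.get)
print("strongest candidate period:", best_lag, "months")
```

Running the same comparison on the real series would show whether an eight-month or a twelve-month cycle fits the data better.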

Model Building

With that in mind, let’s get started on building a forecasting model.

The first step is to properly format the data in order to work with Prophet.

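import pandas as pd

# Prophet expects the date column to be named 'ds' and the value column 'y'.
train_dataset = pd.DataFrame()
train_dataset['ds'] = train_df['Date']
train_dataset['y'] = train_df['Adjusted Passenger Count']
train_dataset.head(115)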
Source: Jupyter Notebook Output
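
Prophet expects a dataframe with a datetime column named ds and a numeric column named y. A small helper along these lines can make that conversion explicit. The function name to_prophet_frame and the sample rows are hypothetical.

```python
import pandas as pd

def to_prophet_frame(df, date_col, value_col):
    """Return a two-column frame in the ds/y layout that Prophet expects."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(df[date_col]),   # parse strings into timestamps
        "y": pd.to_numeric(df[value_col]),    # make sure the target is numeric
    })
    return out.sort_values("ds").reset_index(drop=True)

# Made-up rows standing in for train_df.
train_df = pd.DataFrame({
    "Date": ["2005-06-01", "2005-05-01"],
    "Adjusted Passenger Count": ["9500", "9000"],
})
train_dataset = to_prophet_frame(train_df, "Date", "Adjusted Passenger Count")
print(train_dataset.dtypes)
```

Parsing the dates up front means any malformed value fails here rather than inside the model fit.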

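Here is the dataframe of dates to forecast over, with the time interval defined as monthly. By default, make_future_dataframe keeps the training dates and appends the requested number of future periods (14 here); those appended months form the test period we are trying to predict. Note that it is created from the fitted model prophet_basic, which is defined in the step that follows:

# prophet_basic must already be fitted before this call.
future = prophet_basic.make_future_dataframe(periods=14, freq='M')
future.tail(15)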
Source: Jupyter Notebook Output
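
To make the shape of this dataframe concrete, the sketch below rebuilds an equivalent frame with plain pandas: the historical dates followed by 14 further monthly dates. It assumes month-start timestamps (freq='MS') and an invented history, so it only approximates what make_future_dataframe returns and is not Prophet's own implementation.

```python
import pandas as pd

# Invented training dates: 115 consecutive month starts.
history = pd.date_range("2005-05-01", periods=115, freq="MS")

periods = 14
# Generate periods + 1 dates from the last observed date, then drop that first date.
extra = pd.date_range(history[-1], periods=periods + 1, freq="MS")[1:]

# Historical dates first, future dates appended at the end.
future = pd.DataFrame({"ds": history.append(extra)})
print(future.tail(15))
```

The last 14 rows are the months the model is asked to predict, and the rows before them are the dates it was trained on.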

We first define and fit a model as follows:

from prophet import Prophet  # the package was named fbprophet before version 1.0

prophet_basic = Prophet()
prophet_basic.fit(train_dataset)

Here is a plot of the forecast:

forecast = prophet_basic.predict(future)
fig1 = prophet_basic.plot(forecast)
Source: Jupyter Notebook Output

Here are the components of the forecast:

fig1 = prophet_basic.plot_components(forecast)

Some observations:

  • We can see significant growth in the trend from 2007 up until 2009, with passenger numbers levelling off after that.
  • We also observe that passenger numbers appear to be highest from approximately May to September, after which numbers dip for the rest of the year.

Note that we observed visually that seasonality appears to be present in the dataset. However, given that we are working with a monthly dataset, we will not configure seasonality explicitly in Prophet in this instance. The model is left on its default settings, under which Prophet decides automatically whether to fit a yearly component.

There are two reasons for this:

  1. Detection of seasonality would be more accurate if we were using daily data, which is not the case here.
  2. Making an assumption of yearly seasonality may not be particularly accurate in this case. Inspecting the dataset shows that while certain seasonal shifts occur every year, others occur every 6 to 8 months. Therefore, explicitly defining a seasonality parameter in the model may do more harm than good in this instance.

Changepoints

A changepoint represents a significant structural shift in a time series.

For instance, the big drop in air passenger numbers after the onset of COVID-19 would represent a significant structural shift in the data.

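In a piecewise-linear trend, each changepoint is a point in time at which the slope of the trend is allowed to change while the line itself stays continuous. The NumPy sketch below illustrates that idea with hand-picked values. The function name, the changepoint locations and the slope changes are all assumptions for illustration, and the code is a simplified picture rather than Prophet's internal implementation.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k and offset m; slope changes by deltas[j] at changepoints[j]."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # active[i, j] is 1 when time t[i] lies at or after changepoint s[j].
    active = (t[:, None] >= s[None, :]).astype(float)
    slope = k + active @ d
    # Offset corrections keep the trend continuous at every changepoint.
    offset = m + active @ (-s * d)
    return slope * t + offset

t = np.arange(0, 10)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[3, 6], deltas=[2.0, -3.0])
print(trend)
```

In this toy series the trend steepens at t = 3 and flattens out at t = 6, which is the kind of structural shift a changepoint is meant to capture.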

Here are the changepoints indicated on the model when the n_changepoints parameter is set to 4:

from prophet.plot import add_changepoints_to_plot

pro_change = Prophet(n_changepoints=4)
forecast = pro_change.fit(train_dataset).predict(future)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)
Source: Jupyter Notebook Output

We see that the significant changepoint indicated by the model lies between 2007 and 2009.

What is interesting is that while passenger numbers did see a significant decline in 2009, numbers were still higher on average for this period than for 2005 to 2007, indicating that the overall demand for air travel (for KLM flights from San Francisco at least) actually grew towards the end of the decade.

Model Validation

Now that the forecasting model has been built, the predicted passenger numbers are compared to the test set in order to determine model accuracy.

With n_changepoints set to 4, we obtain the following error metrics:

  • Root Mean Squared Error: 524
  • Mean Forecast Error: 71
Source: Jupyter Notebook Output
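
For reference, the two metrics can be computed as in the sketch below. The arrays are made-up numbers, not the article's results, and the mean forecast error is taken here as the average of actual minus forecast, which is one common sign convention and an assumption about how the figure above was calculated.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: the typical size of the forecast errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mean_forecast_error(actual, predicted):
    """Average signed error; positive values mean the model under-forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(actual - predicted))

# Made-up monthly passenger counts and forecasts.
actual = [9000, 8500, 9200, 8800]
predicted = [8900, 8700, 9000, 8800]

print("RMSE:", rmse(actual, predicted))
print("MFE:", mean_forecast_error(actual, predicted))
print("RMSE as a share of the mean:", rmse(actual, predicted) / np.mean(actual))
```

The RMSE penalises large misses regardless of direction, while the mean forecast error shows whether the model is biased high or low on average.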

With a mean of 8,799 passengers per month, these errors are low in comparison: the RMSE is roughly 6% of the mean and the mean forecast error is under 1%. This indicates that the model is performing well in forecasting monthly passenger numbers.

However, it is important to note that the model accuracy is significantly influenced by the changepoint parameter.

Let’s see what happens to the RMSE when the changepoints are modified.

Source: Author’s Calculations

We can see that the RMSE drops quite dramatically as more changepoints are introduced, reaching its minimum at 4 changepoints.

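A comparison like this can be produced by refitting the model for several changepoint counts and recording the test RMSE each time. The sketch below keeps the loop independent of any particular library by accepting a forecasting function. The names sweep_changepoints and dummy_forecast are hypothetical, and the dummy forecaster exists only so that the example runs on its own.

```python
import numpy as np

def sweep_changepoints(candidates, forecast_fn, y_test):
    """Return {n_changepoints: RMSE on the test set} for each candidate value."""
    y_test = np.asarray(y_test, dtype=float)
    results = {}
    for n in candidates:
        y_pred = np.asarray(forecast_fn(n), dtype=float)
        results[n] = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
    return results

# Stand-in forecaster (purely illustrative): its error is smallest when n == 4.
y_test = np.array([9000.0, 8500.0, 9200.0])

def dummy_forecast(n):
    return y_test + 100.0 * abs(n - 4) + 50.0

# With Prophet, forecast_fn(n) could instead do something like:
#   model = Prophet(n_changepoints=n).fit(train_dataset)
#   return model.predict(future)["yhat"].tail(len(y_test)).to_numpy()

scores = sweep_changepoints([1, 2, 3, 4, 5], dummy_forecast, y_test)
best_n = min(scores, key=scores.get)
print(scores, "best:", best_n)
```

Because the best value is chosen on the test set itself, a separate validation split would give a fairer estimate of how the chosen setting performs on unseen data.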

Conclusion

In this example, you have seen:

  • How Prophet can be used to make time series forecasts
  • How to analyse trends and seasonal fluctuations using Prophet
  • The importance of changepoints in determining model accuracy

I hope you found this useful, and I would be grateful for any thoughts or feedback!

You can access the code and datasets for this example at the MGCodesandStats repository as indicated below.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

Translated from: https://towardsdatascience.com/time-series-analysis-with-prophet-air-passenger-data-6f29c7989681
