ARIMA Models: Testing for White Noise

White noise is the variation in your data that cannot be explained by any regression model.

And yet, there happens to be a statistical model for white noise. For time series data, it goes like this:

Y_i = L_i + N_i
The additive white noise model

The observed value Y_i at time step i is the sum of the current level L_i and a random component N_i around the current level.

If the extent of random variation is proportional to the current level, then we have the following multiplicative version of the same model:

Y_i = L_i * N_i
The multiplicative white noise model

If the current level L_i is constant for all i, i.e. L_i = L for all i, then the noise will be seen to fluctuate around a fixed level.

It's easy to generate a white noise data set. Here's how to do it in Excel:

(Figure: How to generate an additive white noise data set in Excel)

And here is the output plot of noise fluctuating around a constant level of 100:

(Figure: Additive white noise around level = 100)
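The same kind of data set can also be generated in Python. The following is a minimal sketch, assuming Gaussian noise; the noise spread, series length, seed and variable names are illustrative choices and are not taken from the Excel sheet.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the article)
LEVEL = 100.0      # constant level L
NOISE_STD = 5.0    # spread of the additive noise
N_POINTS = 1000    # length of the series

rng = np.random.default_rng(seed=42)

# Additive model: Y_i = L + N_i, with N_i drawn independently from N(0, NOISE_STD^2)
additive_noise = rng.normal(loc=0.0, scale=NOISE_STD, size=N_POINTS)
y_additive = LEVEL + additive_noise

# Multiplicative model: Y_i = L * N_i, with N_i fluctuating around 1
multiplicative_noise = rng.normal(loc=1.0, scale=0.05, size=N_POINTS)
y_multiplicative = LEVEL * multiplicative_noise

print(y_additive[:5])
print(y_additive.mean(), y_multiplicative.mean())
```

Both series fluctuate around 100; in the multiplicative version the size of the fluctuation scales with the level.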

The current level L_i often changes in response to real-world factors. For example, if L_i changes linearly in response to a set of regression variables X, then we get the following linear regression model:

Y_i = β·X_i + N_i
Time series with regression variables plus noise

In the above equation, β is the vector of regression coefficients and X_i is a vector of regression variables.
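To make the regression-plus-noise model concrete, here is a small simulation sketch. The coefficient values, the two uniformly distributed regression variables and the Gaussian noise are all assumptions chosen for illustration; ordinary least squares via `np.linalg.lstsq` is used to recover β.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n = 500
beta = np.array([2.0, -0.5])              # assumed "true" coefficients
X = rng.uniform(0.0, 10.0, size=(n, 2))   # two hypothetical regression variables
noise = rng.normal(0.0, 1.0, size=n)      # white noise component N_i

# Y_i = beta . X_i + N_i
y = X @ beta + noise

# Estimate beta by ordinary least squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# What the fitted model cannot explain should look like the noise we added
residuals = y - X @ beta_hat

print("estimated beta:", beta_hat)
print("residual std:", residuals.std())
```

If the model is adequate, the residuals are what remains of N_i, which is why white noise tests are usually run on them.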

Why is it important to study the white noise model?

There are three reasons:

  1. If you discover, using techniques I will describe shortly, that your data is basically white noise around a fixed level, then the best you can do is fit a model around that fixed level. Trying to do anything better than that is a waste of time.
  2. Suppose you have already fitted a regression model to a data set. If you can show that the residual errors of the fitted model are white noise, your model has done a great job of explaining the variance in the dependent variable. There is no information left to extract; whatever is left is noise. You can pat yourself on the back for a job well done!
  3. The white noise model happens to be a stepping stone to another important and famous model in statistics, the Random Walk model, which I will explain in the next section.

The Random Walk Model

Let's look again at the white noise model's equation:

Y_i = L_i + N_i

If we make the level L_i at time step i be the output value of the model from the previous time step (i-1), we get the Random Walk model, made famous in the popular literature by Burton Malkiel's A Random Walk Down Wall Street.

Y_i = Y_(i-1) + N_i
The Random Walk Model

The Random Walk model is like the mirage of the Data Science desert. It has lured many profit-thirsty investors into betting (and losing) their shirts on illusory trends in stock price movements, movements that were in reality little more than a random walk.

Here's a plot of data that was generated using the Random Walk model:

(Figure: A Random Walk)

Just tell me you don't see any trends in this plot!

If you are not completely convinced that the above data can be generated by a purely random process, let's dispel any remaining illusions by showing how to generate this data in Excel:

(Figure: How to generate Random Walk data in Excel)
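A random walk is equally easy to generate in Python. This is a minimal sketch under assumed settings: Gaussian steps with unit variance, a starting level of 100 and 1000 points are illustrative choices, not values from the Excel sheet.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

N_POINTS = 1000
START = 100.0   # assumed starting level

# White noise steps N_i
steps = rng.normal(loc=0.0, scale=1.0, size=N_POINTS)

# Y_i = Y_(i-1) + N_i, which unrolls to START plus the running sum of the steps
random_walk = START + np.cumsum(steps)

print(random_walk[:5])
```

Plotting `random_walk` typically shows long runs that look like trends, even though every step is pure noise.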

Let's look at how we can use our knowledge of white noise and random walks to detect their presence in time series data.

How to detect white noise in a time series data set

We'll look at three tests to determine whether your time series is, in reality, just white noise:

  1. Auto-correlation plots
  2. The Box-Pierce test
  3. The Ljung-Box test

Testing for white noise using auto-correlation plots

When two variables move up or down in unison, they are said to be positively correlated; when one goes up as the other goes down, they are negatively correlated. The correlation coefficient measures the degree of linear correlation between two such variables:

r = E[(X - E(X)) (Y - E(Y))] / (σ_X σ_Y)
Linear correlation between X and Y

In the above formula, E(X) and E(Y) are the expected (i.e. mean) values of X and Y, and σ_X and σ_Y are the standard deviations of X and Y.

In time series data, correlations often exist between the current value and values that are one or more time steps older, i.e. between Y_i and Y_(i-1), between Y_i and Y_(i-2), and so on. Stock price changes often show such patterns of positive and negative correlations (and beware, so do data containing random walks!).

(Figure: Amgen stock price chart, courtesy of StockCharts.com under its terms of use)

Because the values are correlated with past versions of themselves, we call them auto-correlated, meaning self-correlated.

Here is the formula for calculating the auto-correlation coefficient between Y_i and Y_(i-k):

Auto-correlation coefficient at lag k
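A commonly used sample estimator of the lag-k auto-correlation coefficient (the form given, for example, in Hyndman and Athanasopoulos) is r_k = sum over i = k+1..n of (Y_i - Ybar)(Y_(i-k) - Ybar), divided by the sum over i = 1..n of (Y_i - Ybar)^2, where Ybar is the sample mean. The sketch below implements that estimator with NumPy. The function name `autocorr` is illustrative, and the estimator is assumed here to be the one the article has in mind.

```python
import numpy as np

def autocorr(y, k):
    """Sample auto-correlation coefficient r_k of the series y at lag k."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    d = y - y.mean()                    # deviations from the sample mean
    numerator = np.sum(d[k:] * d[:n - k])   # pairs (Y_i, Y_(i-k))
    denominator = np.sum(d * d)
    return float(numerator / denominator)

# A strictly alternating series is strongly negatively correlated at lag 1
alternating = [1, -1, 1, -1, 1, -1]
print(autocorr(alternating, 0))   # lag 0 is always 1.0
print(autocorr(alternating, 1))
```

With this definition r_0 is always exactly 1, and every r_k lies between -1 and 1.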

Before we can show how this auto-correlation coefficient r_k can be used to detect white noise, we need to take a short and pleasant side trip into the land of random variables. I'll explain why r_k is an approximately normally distributed random variable and how this property can be used to detect white noise.

Distribution of the lag-k auto-correlation coefficient r_k

For any lag k, r_k is (approximately, for large samples) a normally distributed random variable with some mean µ_k and variance σ²_k.

To understand why, consider this thought experiment:

  1. Take a time series data set containing 100,000 time points.
  2. Draw 5000 randomly selected samples from this data set. Suppose each sample consists of 100 contiguous time points.
  3. For each sample, calculate the lag-1 auto-correlation coefficient r_1 using the above formula for r_k.
  4. Each time, r_1 will come out to be some value between -1 and 1. So we end up with 5000 values of r_1, each a number between -1 and 1. Thus r_1 is a random variable for which we have measured 5000 values.
  5. By appealing to the limit theorems of statistics, it can be shown that r_1 is approximately normally distributed, centered at some population mean, which we'll call µ_1, with some variance, which we'll call σ²_1. In practice, the mean and variance of r_1 will be close to the sample mean and sample variance of the 5000 values of r_1 that we measured.
  6. By repeating the above experiment for all lags k, it can be shown that the auto-correlation coefficients for all lags are normally distributed random variables with mean µ_k and variance σ²_k.

Symbolically:

r_k ~ N(µ_k, σ²_k)
For all lags k, r_k is a normally distributed random variable

Implications for detecting white noise

If the time series is white noise, then in theory its current value Y_i ought not to be correlated at all with past values Y_(i-1), Y_(i-2), etc., and the corresponding auto-correlation coefficients r_1, r_2, etc. will be zero or close to zero.

i.e. when the time series is white noise, the expected value of r_k is 0 for all k = 1, 2, 3, ...

But we have just seen that r_k is an N(µ_k, σ²_k) random variable.

Putting the above two facts together, we arrive at the first important implication:

If the time series is white noise, then the auto-correlation coefficient r_k for every lag k has zero mean and some variance σ²_k.

Symbolically:

r_k ~ N(0, σ²_k)
For all lags k, r_k has zero mean under white noise conditions

But what about the variance σ²_k of the coefficients r_k?

Anderson, Bartlett and Quenouille have shown that under white noise conditions, the standard deviation σ_k is as follows:

σ_k = 1/sqrt(n)

where n is the sample size. Recall that in our thought experiment, n was 100.

Thus, we know that r_k under white noise conditions has the following distribution:

r_k ~ N(0, 1/n)
Distribution of auto-correlation coefficients when the data set is pure white noise
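The thought experiment can be run directly on synthetic white noise to check the 1/sqrt(n) result. This is a sketch under stated assumptions: the series is Gaussian white noise generated with NumPy, the samples are allowed to overlap, and the helper name `lag1_autocorr` is illustrative.

```python
import numpy as np

def lag1_autocorr(sample):
    """Lag-1 auto-correlation coefficient r_1 of one sample."""
    d = sample - sample.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)

rng = np.random.default_rng(seed=0)

series = rng.normal(0.0, 1.0, size=100_000)   # synthetic white noise
N_SAMPLES = 5000
N = 100                                       # length of each sample

# Random start positions of contiguous windows of length N
starts = rng.integers(0, len(series) - N + 1, size=N_SAMPLES)
r1 = np.array([lag1_autocorr(series[s:s + N]) for s in starts])

print("mean of r_1:", r1.mean())
print("std of r_1 :", r1.std())
print("1/sqrt(n)  :", 1.0 / np.sqrt(N))
```

The mean of the 5000 values should sit near zero and their standard deviation near 1/sqrt(100) = 0.1. A histogram of `r1` should look roughly bell-shaped.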

An important property of the normal distribution is that approximately 95% of it lies within 1.96 standard deviations of the mean. In our case, the mean is 0 and the standard deviation is 1/sqrt(n), so we get the following 95% confidence interval for the auto-correlation coefficients:

[-1.96/sqrt(n), +1.96/sqrt(n)]

These results yield the following procedure for conducting the white noise test using the auto-correlation coefficients r_k:

  1. Calculate the auto-correlation coefficients r_1, r_2, ..., r_k for the first k lags. k can be set to some high enough value depending on the length n of the time series data set.
  2. Calculate the 95% confidence interval [-1.96/sqrt(n), +1.96/sqrt(n)].
  3. If r_k lies within the above confidence interval for all k, conclude at a 95% confidence level that the time series is possibly just white noise. We say possibly because with a larger sample size, i.e. larger n, the confidence interval shrinks, and values of r_k that were previously inside the 95% bounds may then lie outside them.
  4. If any of the r_k lie outside the confidence interval, then the time series possibly contains information.
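The four steps above can be sketched as a small function. This is an illustrative implementation, not a library routine: the function name, its return values and the synthetic test series are assumptions, and the auto-correlation estimator is the usual sample estimator.

```python
import numpy as np

def acf_white_noise_check(y, max_lag=40, z=1.96):
    """Return (r, bound, outside) for lags 1..max_lag.

    r       : array of auto-correlation coefficients r_1..r_max_lag
    bound   : z / sqrt(n), the half-width of the confidence interval
    outside : list of lags whose |r_k| exceeds the bound
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    denom = np.sum(d * d)
    r = np.array([np.sum(d[k:] * d[:n - k]) / denom
                  for k in range(1, max_lag + 1)])
    bound = z / np.sqrt(n)
    outside = [k for k, r_k in enumerate(r, start=1) if abs(r_k) > bound]
    return r, bound, outside

rng = np.random.default_rng(seed=11)
noise = rng.normal(0.0, 1.0, size=2000)   # synthetic white noise
walk = np.cumsum(noise)                   # synthetic random walk

_, bound, outside_noise = acf_white_noise_check(noise)
_, _, outside_walk = acf_white_noise_check(walk)

print("bound:", bound)
print("lags outside the bounds (noise):", outside_noise)
print("number of lags outside the bounds (walk):", len(outside_walk))
```

One practical caveat: with 40 lags and a 95% interval, about two coefficients can be expected to fall outside the bounds by chance even for genuine white noise, so an occasional marginal excursion is weak evidence on its own.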

Example: White noise detection using Python

Let's illustrate the above procedure using a real-world time series of 5000 decibel level measurements taken at a restaurant using the Google Science Journal app.

The data set can be downloaded from here.

We'll use the pandas library to load the data set from the csv file and plot it:

    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt

    # Load the data set; the first column (TimeIndex) becomes the index
    df = pd.read_csv('restaurant_decibel_level.csv', header=0, index_col=[0])

Let's print the top 10 rows:

    df.head(10)

               Decibel
    TimeIndex
    0          55.931323
    40         57.779260
    80         62.956952
    140        65.158100
    180        60.325242
    220        45.411725
    262        55.958807
    300        62.021807
    340        62.222563
    380        56.156684

Let's plot all 5000 values in the series:

(Figure: Decibel level at a restaurant)

Let's fetch and plot the auto-correlation coefficients for the first 40 lags. We'll use the statsmodels library to do that.

    import statsmodels.graphics.tsaplots as tsa

    tsa.plot_acf(df['Decibel'], lags=40, alpha=0.05, title='Auto-correlation coefficients for lags 1 through 40')
    plt.show()

The alpha=0.05 tells statsmodels to also plot the 95% confidence interval region. We get the following plot:

(Figure: Auto-correlation plot for the decibel level time series)

As we can see, the time series contains significant auto-correlations up through lag 17. Incidentally, the auto-correlation at lag 0 is always 1.0, as a value is always perfectly correlated with itself.

There is a wave-like pattern in the auto-correlation plot, which indicates that the data could contain some seasonality. We can try to identify and isolate the seasonality by decomposing the time series into trend, seasonality and noise components.

Related read: What is time series decomposition and how does it work

For now, we'll focus on the noise portion. The bottom line is that this time series, in its current form, does not appear to be pure white noise.

Next, we'll run two more tests on the time series to confirm this.

The Chi-squared test for white noise detection

The Chi-squared test is based on this powerful result in statistics: the sum of squares of k independent standard normal random variables is a Chi-squared distributed random variable with k degrees of freedom.

(Figure source: Wikimedia, under CC BY 3.0)

The actual test is called the Box-Pierce test, and its test statistic is called the Q statistic. Its formula is as follows:

Box-Pierce test statistic
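For reference, the Box-Pierce statistic is usually written as follows, where n is the number of data points, k is the number of lags tested and r_i is the auto-correlation coefficient at lag i. Under white noise each r_i is approximately N(0, 1/n), so sqrt(n)·r_i is approximately standard normal, which is what connects Q to the Chi-squared result above.

```latex
Q = n \sum_{i=1}^{k} r_i^{2}
```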

It can be shown that if the underlying data set is white noise, the Q statistic is approximately Chi-squared distributed with k degrees of freedom, so its expected value is about k. Auto-correlated data inflates Q well beyond that.

For any given time series, one can check whether Q is larger than white noise would plausibly produce by looking up the p-value of the test statistic in the Chi-squared tables for k degrees of freedom. Usually, a p-value of less than 0.05 indicates a significant auto-correlation that cannot be attributed to chance.

The Ljung-Box test for white noise detection

The Ljung-Box test improves upon the Box-Pierce test by using a test statistic whose distribution is closer to the Chi-squared distribution than that of the Q statistic, particularly for small samples. The test statistic of the Ljung-Box test is calculated as follows, and it is also Chi-squared(k) distributed:

Ljung-Box test statistic
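For reference, the Ljung-Box statistic is usually written as follows, with the same symbols as in the Box-Pierce statistic. Each squared coefficient is weighted by (n+2)/(n-i), a factor slightly larger than 1 that matters most when n is small.

```latex
Q^{*} = n\,(n+2) \sum_{i=1}^{k} \frac{r_i^{2}}{n-i}
```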

Here, n is the number of data points in the time series and k is the number of time lags to be considered. As with the Box-Pierce test, if the underlying data set is white noise, this statistic is Chi-squared(k) distributed with an expected value of about k. Again, a p-value of less than 0.05 indicates a significant auto-correlation that cannot be attributed to chance.
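Both statistics are simple enough to compute by hand, which helps in seeing what the library routines do. The sketch below uses only NumPy; the function name, the synthetic series and the hard-coded 5% critical value of roughly 55.76 for Chi-squared with 40 degrees of freedom (a standard table value) are assumptions made for illustration.

```python
import numpy as np

def box_pierce_ljung_box(y, k):
    """Return (Box-Pierce Q, Ljung-Box Q*) over lags 1..k."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    denom = np.sum(d * d)
    lags = np.arange(1, k + 1)
    # Sample auto-correlation coefficients r_1..r_k
    r = np.array([np.sum(d[i:] * d[:n - i]) / denom for i in lags])
    q_bp = n * np.sum(r ** 2)
    q_lb = n * (n + 2) * np.sum(r ** 2 / (n - lags))
    return float(q_bp), float(q_lb)

# Approximate 5% critical value of Chi-squared with 40 degrees of freedom
CHI2_40_CRITICAL_5PCT = 55.76

rng = np.random.default_rng(seed=5)
noise = rng.normal(0.0, 1.0, size=5000)   # synthetic white noise
walk = np.cumsum(noise)                   # synthetic random walk

bp_noise, lb_noise = box_pierce_ljung_box(noise, 40)
bp_walk, lb_walk = box_pierce_ljung_box(walk, 40)

print("white noise :", bp_noise, lb_noise)
print("random walk :", bp_walk, lb_walk)
print("5% critical value:", CHI2_40_CRITICAL_5PCT)
```

For white noise both statistics should land in the neighbourhood of k = 40, while for the random walk they are larger by orders of magnitude. The Ljung-Box value is always slightly larger than the Box-Pierce value because every weight (n+2)/(n-i) exceeds 1.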

Example: Testing for white noise using the Ljung-Box test in Python

Let's run the Ljung-Box test on the restaurant decibel level data set. We will test up to 40 lags, and we'll ask the test to also run the Box-Pierce test.

    import statsmodels.stats.diagnostic as diag

    # lags=[40] reports only the cumulative statistic at lag 40;
    # boxpierce=True adds the Box-Pierce statistic and its p-value
    diag.acorr_ljungbox(df['Decibel'], lags=[40], boxpierce=True, model_df=0, period=None)

We get the following output (a tuple of arrays in the statsmodels version used here; newer releases return the same four numbers as a DataFrame with columns lb_stat, lb_pvalue, bp_stat and bp_pvalue):

    (array([13172.80554476]), array([0.]), array([13156.42074648]), array([0.]))

The value 13172.80554476 is the test statistic of the Ljung-Box test, and 0.0 is its p-value as per the Chi-squared(k=40) tables.

The value 13156.42074648 is the test statistic of the Box-Pierce test, and 0.0 is its p-value as per the Chi-squared(k=40) tables.

As we can see, both p-values are less than 0.01, so we can say with 99% confidence that the restaurant decibel level time series is not pure white noise.

Earlier on, we introduced random walks as a special case of the white noise model and pointed out how easy it is to mistake them for a pattern or trend that can be predicted.

We'll look at how to avoid making this mistake by applying a technique that brings out the true random nature of the random walk.

Detecting Random Walks

Random walks are often highly auto-correlated. In fact, a random walk is accumulated white noise: each value is the previous value plus a fresh random shock.

The white noise detection tests presented above will latch on to these auto-correlations and conclude that the time series is not white noise.

The remedy is to take the first difference of the time series that is suspected to be a random walk, and run the white noise tests on the differenced series.

If the original time series is a random walk, its first difference is pure white noise.
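The effect of differencing is easy to verify on synthetic data. In this sketch the random walk is built from Gaussian steps (an assumption), and `lag1_autocorr` is an illustrative helper, not a library function.

```python
import numpy as np

def lag1_autocorr(y):
    """Lag-1 auto-correlation coefficient r_1."""
    d = y - y.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

rng = np.random.default_rng(seed=3)

steps = rng.normal(0.0, 1.0, size=5000)   # white noise N_i
walk = np.cumsum(steps)                   # Y_i = Y_(i-1) + N_i

# First difference: Y_i - Y_(i-1), which recovers the steps N_i
diffed = np.diff(walk)

print("r_1 of the random walk    :", lag1_autocorr(walk))
print("r_1 of the differenced data:", lag1_autocorr(diffed))
```

The walk itself has a lag-1 auto-correlation close to 1, while the differenced series has one close to 0, because differencing hands back exactly the white noise steps the walk was built from.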

Let's illustrate this:

We'll start by loading a data set that is suspected to be a random walk. The data set can be downloaded from here.

    df = pd.read_csv('random_walk.csv', header=0, index_col=[0])

Let's plot it to see what it looks like:

    df.plot()
    plt.show()

Let's run the Ljung-Box white noise test on this data:

    diag.acorr_ljungbox(df['Y_i'], lags=[40], boxpierce=True)

We get the following result:

    (array([393833.91252517]), array([0.]), array([392952.07675659]), array([0.]))

The p-value of 0.0 indicates that we must strongly reject the null hypothesis that the data is white noise. Both the Ljung-Box and Box-Pierce tests conclude that this data set was not generated by a pure random process.

This is a misleading result: the series is not white noise, but it was generated by a purely random process.

Let's see if things change after we take the first difference of the data, i.e. we create a new data set with Y = Y_i - Y_(i-1):

    diff_Y_i = df['Y_i'].diff()

    # Drop the NaN in the first row
    diff_Y_i = diff_Y_i.dropna()

Let's plot the differenced data set:

    diff_Y_i.plot()
    plt.show()

We now see a very different picture:

(Figure: The differenced data set)

Here is the zoomed-in view:

(Figure: Zoomed-in view of the differenced data set)

Let's run the Ljung-Box test on the differenced data set:

    diag.acorr_ljungbox(diff_Y_i, lags=[40], boxpierce=True)

We get the following output:

    (array([32.93405364]), array([0.77822417]), array([32.85051846]), array([0.78137548]))

Notice that this time the test statistics, 32.934 reported by the Ljung-Box test and 32.850 reported by the Box-Pierce test, are much smaller. The corresponding p-values from the Chi-squared(k=40) tables are 0.778 and 0.781 respectively, well above 0.05. We therefore cannot reject the null hypothesis that the data (i.e. the differenced time series) is pure white noise.

The conclusion to be drawn from this exercise is that one should not fit anything except the white noise model to this differenced data.

Summary

  • The white noise model can be used to represent the nature of noise in a data set.
  • Testing for white noise is one of the first things a data scientist should do, so as to avoid spending time fitting models to data sets that offer no meaningfully extractable information.
  • If a data set is not white noise, then after fitting a model to the data, one should run a white noise test on the residual errors to get a sense of how much information the model has been able to extract from the data.
  • For time series data, auto-correlation plots and the Ljung-Box test offer two useful techniques for determining whether the time series is, in reality, just white noise.

References, Citations and Copyrights

Data set of restaurant decibel levels is Copyright Sachin Date under CC-BY-NC-SA.

Amgen stock price chart is from stockcharts.com under these terms of use.

Anderson, R. L., Distribution of the Serial Correlation Coefficient, The Annals of Mathematical Statistics, Vol. 13, No. 1 (1942), pp. 1–13.

Bartlett, M. S., On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series, Supplement to the Journal of the Royal Statistical Society, Vol. 8, No. 1 (1946), pp. 27–41.

Quenouille, M. H., The Joint Distribution of Serial Correlation Coefficients, The Annals of Mathematical Statistics, Vol. 20, No. 4 (Dec. 1949), pp. 561–571.

Hyndman, R. J., Athanasopoulos, G., Forecasting: Principles and Practice, OTexts.

All images in this article are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

Thanks for reading! If you liked this article, please follow me to receive tips, how-tos and programming advice on regression and time series analysis.

Translated from: https://towardsdatascience.com/the-white-noise-model-1388dbd0a7d
