Jensen's Inequality: Mind the Jensen Gap

Background

In Kaggle’s M5 Forecasting — Accuracy competition, the square root transformation ruined many of my team’s forecasts and led to a selective patching effort in the eleventh hour. Although it turned out well, we were reminded that “reconstitution bias” can plague predictions on the original scale, even with common transformations such as the square root.

The square root transformation

For Poisson data, the rationale for the square root is that it is a variance-stabilizing transformation: in theory, the square roots of the values are approximately normally distributed, with constant variance and a mean that is approximately the square root of the original mean. It is an approximation, and as Wikipedia puts it, one in which the “convergence to normality (as [the original mean] increases) is far faster than the untransformed variable.”

Imagine you decide to take square roots in a count data scenario, feeling reassured that the convergence to normality is “fast.” You model the mean of the square-root-transformed data and get predictions on the square root scale. At some point, especially in a forecasting scenario, you’ll have to get back to the original scale. That probably entails squaring the model-estimated means. The M5 competition served as a reminder that this approach can and will break down.

The Jensen Gap

Jensen’s Inequality states that for convex functions, the function evaluated at the expectation is less than or equal to the expectation of the function, i.e., g(E[Y]) ≤ E[g(Y)]. The inequality is flipped for concave functions.

Accordingly, the Jensen Gap is defined as the difference E[g(Y)] - g(E[Y]), which is non-negative for convex functions g. (As an aside, notice that when g(x) is the square function, the Jensen Gap is the variance of Y, which had better be non-negative!)

When considering g(x) as the square function and the square root of Y as the random variable, the Jensen Gap becomes E[Y] - (E[sqrt(Y)])². Squaring a mean estimated on the square root scale targets (E[sqrt(Y)])² rather than E[Y], and since the gap is positive, our reconstituted mean will be biased downward. To learn more about the magnitude of the gap, we turn to the Taylor expansion.
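
Before approximating anything, the gap can be computed directly for a Poisson variable by summing over its probability mass function. The following is a minimal sketch of that idea, not the demonstration code from the post; the function names and the truncation point of the sum are my own assumptions.

```python
import math


def expected_sqrt_poisson(lam, tail_sds=20):
    """E[sqrt(Y)] for Y ~ Poisson(lam), summing the pmf over a generous range."""
    # Truncation point of the infinite sum (an assumption; far into the tail).
    upper = int(lam + tail_sds * math.sqrt(lam) + 50)
    pmf = math.exp(-lam)  # P(Y = 0), which contributes sqrt(0) = 0 to the sum
    total = 0.0
    for k in range(1, upper + 1):
        pmf *= lam / k  # P(Y = k) obtained from P(Y = k - 1)
        total += math.sqrt(k) * pmf
    return total


def jensen_gap_sqrt(lam):
    """Jensen Gap E[Y] - (E[sqrt(Y)])^2, using E[Y] = lam for the Poisson."""
    return lam - expected_sqrt_poisson(lam) ** 2


for lam in (20.0, 0.2):
    gap = jensen_gap_sqrt(lam)
    print(f"mean={lam}: gap={gap:.4f}, relative gap={gap / lam:.1%}")
```

The gap here is simply the variance of sqrt(Y), so it is small relative to a mean of 20 but swallows most of a mean of 0.2.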

Taylor expansion to approximate bias

To the Mathematics StackExchange prompt “Expected Value of Square Root of Poisson Random Variable,” contributor Hernan Gonzalez explains the Taylor expansion of a random variable about its mean, as shown in the screenshot below.

Note that the expansion needs at least the first few central moments of the original distribution. For the Poisson, the mean, the variance, and the third central moment are all equal to the mean parameter.
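
For readers who want the algebra, here is a sketch of the standard expansion of E[g(X)] about the mean, truncated after the third central moment and then specialized to g(x) = sqrt(x) with Poisson mean λ. This is my own working of the textbook formula rather than a quotation of the StackExchange answer, and the symbols μ, μ₂, μ₃, and λ are notation I am assuming.

```latex
\begin{align*}
\mathbb{E}[g(X)] &\approx g(\mu) + \frac{g''(\mu)}{2}\,\mu_2 + \frac{g'''(\mu)}{6}\,\mu_3,
  \qquad \mu_k = \mathbb{E}\big[(X-\mu)^k\big] \\
g(x) = x^{1/2}:\quad
  g''(x) &= -\tfrac{1}{4}\,x^{-3/2}, \qquad g'''(x) = \tfrac{3}{8}\,x^{-5/2} \\
X \sim \mathrm{Poisson}(\lambda):\quad
  \mu &= \mu_2 = \mu_3 = \lambda \\
\mathbb{E}\big[\sqrt{X}\big] &\approx \sqrt{\lambda}
  - \frac{1}{8}\,\lambda^{-1/2} + \frac{1}{16}\,\lambda^{-3/2}
\end{align*}
```

The two correction terms after sqrt(λ) are the approximate bias on the square root scale. With λ = 20 they sum to about -0.027, and with λ = 0.2 to about 0.419.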

Ignoring the fact that the mean estimator is itself a random variable, we can run the expectation above through the inverse transformation (that is, square it) to get an idea of the bias on the original scale for any Poisson mean value (the algebra is omitted here, but it is computed on line 34 of the demonstration code). Similarly, given the properties of the square root of the random variable, it’s straightforward to analyze g(x) = x² in the same way. That opens up the possibility of bias correction, an interesting proposition, albeit one with assumptions and complexities of its own.
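
As a rough illustration of that squaring step, the sketch below plugs the truncated expansion E[sqrt(Y)] ≈ sqrt(λ) - 1/(8 sqrt(λ)) + 1/(16 λ^1.5) into the inverse transformation. It is not the demonstration code referred to above; the function names are hypothetical, and the coefficients assume the expansion is truncated after the third central moment.

```python
import math


def approx_mean_sqrt(lam):
    """Truncated Taylor approximation of E[sqrt(Y)] for Y ~ Poisson(lam)."""
    return math.sqrt(lam) - 1.0 / (8.0 * math.sqrt(lam)) + 1.0 / (16.0 * lam ** 1.5)


def approx_original_scale_bias(lam):
    """Approximate bias of the squared square-root-scale mean, relative to lam."""
    return approx_mean_sqrt(lam) ** 2 - lam


# Only moderately large means are shown, where the expansion is expected to behave.
for lam in (5.0, 20.0, 100.0):
    print(f"mean={lam}: approximate bias on original scale = "
          f"{approx_original_scale_bias(lam):.3f}")
```

For means in this range the approximate bias is negative, in line with Jensen's Inequality.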

Approximation breakdown

Near the end of his answer, Gonzalez mentions that the approximation “is only useful if” the mean of the original Poisson is quite a bit bigger than 1, clarifying in the comments that this is needed so that “the terms of the sum decrease quickly.” That follows from the mean being raised to increasingly negative powers in the terms after the leading one.
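
To see how quickly the terms shrink (or fail to), the sketch below prints the first two bias terms of the square root expansion, -1/(8 sqrt(λ)) and 1/(16 λ^1.5), for a few means. The function name is hypothetical, and the terms assume the expansion is truncated after the third central moment.

```python
import math


def taylor_bias_terms(lam):
    """First two bias terms of E[sqrt(Y)] - sqrt(lam) for Y ~ Poisson(lam)."""
    second_order = -1.0 / (8.0 * math.sqrt(lam))  # from the variance term
    third_order = 1.0 / (16.0 * lam ** 1.5)       # from the third central moment
    return second_order, third_order


for lam in (20.0, 2.0, 0.5, 0.2):
    second, third = taylor_bias_terms(lam)
    ratio = abs(third / second)  # algebraically equal to 1 / (2 * lam)
    print(f"mean={lam}: terms=({second:.4f}, {third:.4f}), "
          f"sum={second + third:.4f}, |third/second|={ratio:.3f}")
```

The ratio of the two terms is 1/(2λ), so the later term is a small correction when the mean is 20 and dominates when the mean is 0.2.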

In the M5 competition, mean sales for many items were substantially below one, and thus using the square root transformation was a recipe for poor performance. To get an idea of how this plays out in an actual sample, the next section will investigate this phenomenon via simulation.

Demonstration

In this section, we use the loess smoother to create models on both the original scale and the square root scale, and square the mean estimates of the latter. For simulated Poisson data with a mean of 20 and with a mean of 0.2, we plot the two sets of predictions and examine the bias. The code is under 50 lines and is available in Nousot’s public GitHub repository.
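
For readers without the repository at hand, here is a rough standard-library stand-in for that experiment. It is not the repository code: it swaps the loess smoother for a plain sample mean (reasonable only because the simulated mean is constant), and the sampler, function names, sample size, and seed are all my own assumptions.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def compare_scales(lam, n=20000, seed=0):
    """Return (mean on original scale, squared mean of the square roots)."""
    rng = random.Random(seed)
    ys = [sample_poisson(lam, rng) for _ in range(n)]
    raw_mean = sum(ys) / n
    sqrt_mean = sum(math.sqrt(y) for y in ys) / n
    return raw_mean, sqrt_mean ** 2


for lam in (20.0, 0.2):
    raw_mean, squared_sqrt_mean = compare_scales(lam)
    print(f"mean={lam}: original-scale estimate={raw_mean:.3f}, "
          f"squared sqrt-scale estimate={squared_sqrt_mean:.3f}")
```

With a constant mean, a smoother would be estimating a flat line anyway, so the sample mean should show the same downward bias after squaring.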

When the mean is 20

For the case where the mean of the Poisson random variable is 20, the retransformation bias is negative (as Jensen’s Inequality said it would be), but also relatively small. In the code, the first two bias terms of the Taylor expansion are computed and compared to the empirical bias on the square root scale. At -0.027 and -0.023, respectively, they are relatively close.

When the mean is 0.20

For the case where the mean of the Poisson random variable is 0.20, the picture is much different. While Jensen’s Inequality always holds, the Jensen Gap is now large in a relative sense. Furthermore, the Taylor approximation has completely broken down: the first two bias terms sum to 0.419, which has the wrong sign, while the empirical bias is -0.251 (still on the square root scale).

Discussion

David Warton’s 2018 paper “Why You Cannot Transform Your Way Out of Trouble for Small Counts” demonstrates the hopelessness of getting to the standard assumptions for small-mean count data. For the sparse time series in M5, there was nothing to gain and a lot to lose by taking the square root. At the very least, we should have treated those series differently. (Regarding our use of the Kalman Filter, Otto Seiskari’s advice to tune via cross-validation when the model is misspecified is especially compelling).

Warton’s paper has some harsh words for users of transformations in general. I still believe that if a transformation brings you closer to the standard assumptions, where your code runs faster and you enjoy nicer properties, then it’s worth considering. But there needs to be an honest exploration of properties of the transformation in the context of the data, and this does not come for free.

Typically, transformations (and their inverses) are either convex or concave, and thus Jensen’s Inequality will guarantee bias in the form of a Jensen Gap. If you’re wondering why you’ve never heard of it, it’s because it’s often written off as approximation error. According to Gao et al. (2018):

“Computing a hard-to-compute [expectation of a function] appears in theoretical estimates in a variety of scenarios from statistical mechanics to machine learning theory. A common approach to tackle this problem is to … show that the error, i.e., the Jensen gap, would be small enough for the application.”

When using transformations, the work to understand the properties of the inverse transformation (in the context of the data) is worth it. It’s dangerous out there. Watch your step, and mind the Jensen Gap!

Translated from: https://towardsdatascience.com/mind-the-jensen-gap-c54e0eb9e1b7
