Original article: https://en.wikipedia.org/wiki/Huber_loss

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.

Definition

Figure: Huber loss (green, $\delta = 1$) and squared error loss (blue) as a function of $y - f(x)$.

The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by[1]

$$
L_\delta(a) =
\begin{cases}
\frac{1}{2} a^2 & \text{for } |a| \le \delta, \\
\delta \left( |a| - \frac{1}{2}\delta \right) & \text{otherwise.}
\end{cases}
$$

This function is quadratic for small values of $a$ and linear for large values, with equal values and slopes of the two sections at the two points where $|a| = \delta$. The variable $a$ often refers to the residuals, that is, to the difference between the observed and predicted values, $a = y - f(x)$, so the former can be expanded to[2]

$$
L_\delta(y, f(x)) =
\begin{cases}
\frac{1}{2}(y - f(x))^2 & \text{for } |y - f(x)| \le \delta, \\
\delta\,|y - f(x)| - \frac{1}{2}\delta^2 & \text{otherwise.}
\end{cases}
$$
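The piecewise definition above translates directly into code. The following is a minimal sketch using NumPy; `huber_loss` and its parameter names are our own illustration, not a library API:

```python
import numpy as np

def huber_loss(a, delta=1.0):
    """Elementwise Huber loss: quadratic for |a| <= delta, linear beyond."""
    a = np.asarray(a, dtype=float)
    quadratic = 0.5 * a**2
    linear = delta * (np.abs(a) - 0.5 * delta)
    return np.where(np.abs(a) <= delta, quadratic, linear)

# At |a| = delta the two branches agree (both give 0.5 * delta**2),
# which is the "equal values and slopes" property noted above.
print(huber_loss([0.5, 1.0, 3.0], delta=1.0))  # elementwise: 0.125, 0.5, 2.5
```

Note that `np.where` evaluates both branches and selects elementwise, which is fine here since both branches are finite for all inputs.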

Motivation

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric-median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.

As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum $a = 0$; at the boundary of this neighborhood, it has a differentiable extension to an affine function at the points $a = -\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute-value function).
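This robustness can be seen in a small numerical experiment. The sketch below (the data, the brute-force grid search, and all names are our own illustration) estimates a location parameter by minimizing the summed loss over a grid of candidate values: the squared-loss minimizer (the sample mean) is dragged toward a single outlier, while the Huber-loss minimizer stays near the bulk of the data:

```python
import numpy as np

def huber(a, delta=1.0):
    return np.where(np.abs(a) <= delta,
                    0.5 * a**2,
                    delta * (np.abs(a) - 0.5 * delta))

data = np.array([0.9, 1.0, 1.1, 1.0, 0.95, 100.0])  # bulk near 1, one outlier
grid = np.linspace(-5, 105, 11001)  # candidate location estimates, step 0.01

# Location estimate under each loss: argmin over the grid of sum_i L(y_i - c)
sq_est = grid[np.argmin([np.sum((data - c)**2) for c in grid])]
hub_est = grid[np.argmin([np.sum(huber(data - c)) for c in grid])]

print(sq_est)   # the sample mean, ~17.49: dominated by the outlier
print(hub_est)  # ~1.19: stays with the bulk of the data
```

The grid search is only for illustration; in practice the Huber M-estimate is computed with iteratively reweighted least squares or gradient-based optimization.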

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function, and ensures that derivatives of all orders are continuous. It is defined as[3][4]

$$
L_\delta(a) = \delta^2 \left( \sqrt{1 + (a/\delta)^2} - 1 \right).
$$

As such, this function approximates $a^2/2$ for small values of $a$, and approximates a straight line with slope $\delta$ for large values of $a$.

While the above is the most common form, other smooth approximations of the Huber loss function also exist.[5]
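The two limiting behaviors are easy to verify numerically. A minimal sketch (`pseudo_huber` is our own name for the function defined above; SciPy also ships an equivalent as `scipy.special.pseudo_huber`):

```python
import numpy as np

def pseudo_huber(a, delta=1.0):
    """Smooth approximation of the Huber loss; infinitely differentiable."""
    a = np.asarray(a, dtype=float)
    return delta**2 * (np.sqrt(1.0 + (a / delta)**2) - 1.0)

# Small |a|: behaves like a**2 / 2
print(pseudo_huber(0.01))   # ~0.00005, i.e. 0.01**2 / 2

# Large |a|: approximately linear with slope delta
print(pseudo_huber(1000.0) - pseudo_huber(999.0))  # ~1.0 for delta = 1
```

Unlike the exact Huber loss, there is no branch point: the quadratic and linear regimes blend smoothly around $|a| \approx \delta$.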

Variant for classification

For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y \in \{+1, -1\}$, the modified Huber loss is defined as[6]

$$
L(y, f(x)) =
\begin{cases}
\max(0, 1 - y\,f(x))^2 & \text{for } y\,f(x) \ge -1, \\
-4\,y\,f(x) & \text{otherwise.}
\end{cases}
$$

The term $\max(0, 1 - y\,f(x))$ is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of $L$.[6]
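The classification variant can be sketched the same way; as before, `modified_huber` and its argument names are our own illustration (scikit-learn exposes this loss via `SGDClassifier(loss="modified_huber")`):

```python
import numpy as np

def modified_huber(y, score):
    """Modified Huber loss for labels y in {+1, -1} and real-valued scores."""
    margin = y * score  # the margin y * f(x)
    return np.where(margin >= -1.0,
                    np.maximum(0.0, 1.0 - margin)**2,
                    -4.0 * margin)

# Correct, confident prediction (margin >= 1): zero loss
print(modified_huber(1, 2.0))    # 0.0
# At the branch boundary y*f(x) = -1, both branches agree: 4.0
print(modified_huber(1, -1.0))   # 4.0
```

For margins below $-1$ the loss grows only linearly, which is what makes this variant less sensitive to badly misclassified points than the squared hinge loss alone.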

Applications

The Huber loss function is used in robust statistics, M-estimation and additive modelling.[7]

See also

  • Winsorizing
  • Robust regression
  • M-estimator
  • Visual comparison of different M-estimators

References

  1. Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". Annals of Mathematical Statistics 35 (1): 73–101. doi:10.1214/aoms/1177703732. JSTOR 2238020.
  2. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning. p. 349. Compared to Hastie et al., the loss is scaled by a factor of ½, to be consistent with Huber's original definition given earlier.
  3. Charbonnier, P.; Blanc-Feraud, L.; Aubert, G.; Barlaud, M. (1997). "Deterministic edge-preserving regularization in computed imaging". IEEE Trans. Image Processing 6 (2): 298–311. doi:10.1109/83.551699.
  4. Hartley, R.; Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. p. 619. ISBN 0-521-54051-8.
  5. Lange, K. (1990). "Convergence of Image Reconstruction Algorithms with Gibbs Smoothing". IEEE Trans. Medical Imaging 9 (4): 439–446. doi:10.1109/42.61759.
  6. Zhang, Tong (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.
  7. Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". Annals of Statistics 29 (5): 1189–1232. doi:10.1214/aos/1013203451. JSTOR 2699986.

Reposted from: https://www.cnblogs.com/davidwang456/articles/5586178.html
