First, a question that bothered me for a long time: the logistic regression lecture in Andrew Ng's machine learning course uses the cross-entropy loss function, but when it comes to the derivative derivation, every slide and tutorial I googled just uses log directly without stating the base. After much confusion I learned that in these (mostly English-language) materials, log means ln by default. So in machine learning, whenever you see log, mentally convert it and read it as ln.
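As a quick sanity check (assuming NumPy), the log used by ML libraries really is the natural logarithm:

```python
import numpy as np

# np.log is the natural logarithm (base e): the "log = ln" convention in action.
print(np.log(np.e))    # 1.0
print(np.log(100.0))   # 4.605..., i.e. ln(100), not log10(100) = 2
```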

How do we fit the parameters theta for logistic regression? In particular, I'd like to define the optimization objective, or the cost function, that we'll use to fit the parameters.

Here's the supervised learning problem of fitting a logistic regression model.

x is an (n+1)-dimensional feature vector, h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} is the hypothesis, and the parameters of the hypothesis are the vector \theta.

Because this is a classification problem, our training set has the property that every label y is either 0 or 1.

Back when we were developing the linear regression model, we used the following cost function:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Now, this cost function worked fine for linear regression, but here we're interested in logistic regression.

For logistic regression, the hypothesis is a more complex, nonlinear function, and with the squared-error cost this would be a non-convex function of the parameters theta. Here is what I mean by non-convex. We have some cost function J(\theta), and for logistic regression the function h here has a nonlinearity: it is the sigmoid function, so it's a pretty complicated nonlinear function. If you take the sigmoid function and plug it into the squared-error cost, J(\theta) looks like this:

J(\theta) can look like a function with many local optima, and the formal term for this is that it is a non-convex function. If you were to run gradient descent on this sort of function, it is not guaranteed to converge to the global minimum. In contrast, what we would like is a cost function J(\theta) that is convex, a single bowl-shaped function, so that if you run gradient descent, you would be guaranteed to converge to the global minimum. The problem with the squared cost function is that, because of the very nonlinear sigmoid function appearing in the middle, J(\theta) ends up being non-convex if you define it as the squared cost. So what we would like to do instead is come up with a different cost function that is convex, so that we can apply an algorithm like gradient descent and be guaranteed to find the global minimum.
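To make this concrete, here is a minimal sketch (my own illustration, not from the lecture) that evaluates both costs on a toy one-parameter problem; all names in it are made up for the demo:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data so that J is a function of a single parameter theta.
x = np.array([-5.0, -1.0, 1.0, 5.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def squared_cost(theta):
    h = sigmoid(theta * x)
    return np.mean(0.5 * (h - y) ** 2)

def cross_entropy_cost(theta):
    h = sigmoid(theta * x)
    eps = 1e-12                      # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

thetas = np.linspace(-10.0, 10.0, 201)
sq = np.array([squared_cost(t) for t in thetas])
ce = np.array([cross_entropy_cost(t) for t in thetas])
# Plot (or inspect) sq and ce over thetas: the squared cost saturates into
# flat plateaus (non-convex in theta), while the cross-entropy curve is convex.
```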

------------------------------ Logistic regression ------------------

Here is a cost function that we're going to use for logistic regression.

This is the cost, or the penalty, that the algorithm pays:

\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if } y = 1 \\
-\log(1 - h_\theta(x)) & \text{if } y = 0
\end{cases}

When y = 1, the curve of -\log(h_\theta(x)) looks like this: it falls from infinity at h_\theta(x) = 0 down to 0 at h_\theta(x) = 1.

Now, this cost function has a few interesting and desirable properties. First, you notice that:

If the prediction is right, the cost is 0; if the prediction is confidently wrong, the cost blows up.

First, notice that if y = 1 and h_\theta(x) = 1, in other words, if the hypothesis predicts y = 1 with complete confidence and y is indeed equal to 1, then the cost is 0, and that's what we'd like it to be: if we correctly predict the output y, the cost is 0.

But notice also that as h_\theta(x) approaches 0, that is, as the output of the hypothesis approaches 0 while y = 1, the cost blows up and goes to infinity. What this does is capture the intuition: if the hypothesis outputs 0, it is saying that the probability of y = 1 is 0. That is like telling a patient, "the probability that your tumor is malignant is zero; it cannot possibly be malignant." If the tumor then does turn out to be malignant, i.e. y = 1, then, having told the patient with absolute certainty that this was impossible and been wrong, we penalize the learning algorithm with a very, very large cost. That is what this curve expresses: the cost goes to infinity when y = 1 but h_\theta(x) = 0.

------------------ The above is the case y = 1

What does the cost function look like when y = 0? It is -\log(1 - h_\theta(x)), which is 0 at h_\theta(x) = 0 and blows up as h_\theta(x) approaches 1.

If y turns out to be equal to 0, but we predicted y = 1 almost certainly, with probability close to 1, then we end up paying a very large cost.

Conversely, if h_\theta(x) = 0 and y = 0, then the hypothesis nailed it: we predicted y = 0 and y indeed turned out to be 0, so at this point the cost is 0 (the origin in the plot above).
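A minimal sketch of this single-example cost in Python (the function name cost_single is my own):

```python
import math

def cost_single(h, y):
    """Cost(h_theta(x), y): -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

print(cost_single(1.0, 1))    # 0.0   -- confident and correct: no penalty
print(cost_single(1e-9, 1))   # ~20.7 -- confident and wrong: huge penalty
print(cost_single(1e-9, 0))   # ~0.0  -- predicting y = 0 and being right
```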

------------------------- The above defines the cost function for a single training example. The cost function we have chosen gives us a convex optimization problem: the overall cost function J(\theta) will be convex and free of local optima.

We now take the cost function for a single training example and develop it further to define the cost function for the entire training set:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]
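A vectorized sketch of this J(\theta), assuming NumPy (cost_J and the epsilon guard are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_J(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), h = sigmoid(X @ theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12                      # guard against log(0)
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)) / m
```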

Conjugate gradient, BFGS, and L-BFGS are examples of more sophisticated optimization algorithms that need a way to compute J(\theta) and a way to compute the derivatives, and can then use more sophisticated strategies than gradient descent to minimize the cost function.

The details of exactly what these three algorithms do are well beyond the scope of this course.
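As an illustration of handing J(\theta) and its gradient to such an optimizer (my own sketch: the toy data and names are invented, and the gradient formula is the one derived at the end of this post), SciPy's minimize can run L-BFGS for us:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_J(theta, X, y):
    h = sigmoid(X @ theta)
    eps = 1e-12
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)) / len(y)

def grad_J(theta, X, y):
    # Gradient (1/m) * X^T (h - y), derived at the end of this post.
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Toy data: a column of ones for the intercept, then one feature.
X = np.array([[1.0, -3.0], [1.0, -2.0], [1.0, -1.0],
              [1.0,  1.0], [1.0,  2.0], [1.0,  3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

result = minimize(cost_J, x0=np.zeros(2), args=(X, y), jac=grad_J, method="L-BFGS-B")
print(result.x)   # the fitted theta
```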

-------- How to get logistic regression to work for multi-class classification problems --- an algorithm called one-versus-all classification.

What is a multi-class classification problem? It is one where the label y can take on more than two values, say y in {0, 1, 2, ...}, instead of just 0 or 1.
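A minimal one-versus-all sketch (my own illustration of the idea: train one binary logistic classifier per class, then predict the class whose classifier is most confident):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y01, lr=0.1, steps=3000):
    """Plain gradient descent on the cross-entropy cost for one binary problem."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T @ (sigmoid(X @ theta) - y01) / len(y01)
    return theta

def one_vs_all(X, y, num_classes):
    # One classifier per class: class k versus all the others.
    return np.array([train_binary(X, (y == k).astype(float)) for k in range(num_classes)])

def predict(Theta, X):
    # Pick the class whose classifier outputs the highest probability.
    return np.argmax(sigmoid(X @ Theta.T), axis=1)

# Toy 3-class data: intercept column plus two features.
X = np.array([[1.0, -2.0, -2.0], [1.0, -1.5, -2.5],   # class 0
              [1.0,  2.0, -2.0], [1.0,  2.5, -1.5],   # class 1
              [1.0,  0.0,  2.0], [1.0,  0.5,  2.5]])  # class 2
y = np.array([0, 0, 1, 1, 2, 2])

Theta = one_vs_all(X, y, num_classes=3)
print(predict(Theta, X))   # should recover [0 0 1 1 2 2]
```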

The derivation of the logistic regression loss function, and of its derivative, goes as follows.

Remember these formulas:
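Presumably the formulas meant here are the standard identities used in this derivation:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr), \qquad
\bigl(\ln f(z)\bigr)' = \frac{f'(z)}{f(z)}
```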

With the formulas and properties above understood, you can carry out the differentiation.

As stated at the beginning, log means ln by default, so there is nothing ambiguous in the derivation below.
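A compact reconstruction of that derivation (this is the standard computation), writing log as ln and h = \sigma(\theta^T x):

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[ y^{(i)}\ln h_\theta(x^{(i)})
          + \bigl(1-y^{(i)}\bigr)\ln\bigl(1-h_\theta(x^{(i)})\bigr) \Bigr]

\text{For a single example (dropping superscripts), by the chain rule and } \sigma' = \sigma(1-\sigma):

\frac{\partial}{\partial \theta_j}\Bigl[ y\ln h + (1-y)\ln(1-h) \Bigr]
  = \Bigl( \frac{y}{h} - \frac{1-y}{1-h} \Bigr)\, h(1-h)\, x_j
  = \bigl( y(1-h) - (1-y)h \bigr)\, x_j
  = (y - h)\, x_j

\text{Summing over the training set (the minus sign flips } y - h \text{):}

\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m}\sum_{i=1}^{m}\bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}
```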
