All English passages below are taken from the course's companion notes.

Lesson 9: Cost Function II

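This lesson considers a cost function described by two parameters. In that case the cost function is bowl-shaped, and the bottom of the bowl is its minimum. When the bowl is drawn as a contour plot, the center of the contours is the minimum of the cost function. Points close to the center of the contours therefore correspond to values of $(\theta_0, \theta_1)$ that fit the original data fairly accurately.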

Cost Function - Intuition II
A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for $J(\theta_0, \theta_1)$ and, as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when $\theta_0 = 800$ and $\theta_1 = -0.15$.

Taking another h(x) and plotting its contour plot, one gets the following graphs:

When $\theta_0 = 360$ and $\theta_1 = 0$, the value of $J(\theta_0, \theta_1)$ in the contour plot gets closer to the center, thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible and, consequently, the resulting values of $\theta_1$ and $\theta_0$ are around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the innermost 'circle'.
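To make the contour idea concrete, here is a minimal sketch that evaluates a cost function on a grid of $(\theta_0, \theta_1)$ values. The training set is hypothetical (it is not the housing data from the course), and the squared-error form of `cost` is assumed, since this section does not restate the cost function.

```python
# Hypothetical training set (not from the course): the points lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def cost(theta0, theta1):
    """Assumed squared-error cost: J = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Evaluate J on a coarse grid from -2 to 4 in steps of 0.5 on both axes.
# A contour plot draws lines of equal J over a grid like this one.
grid = [(t0 / 2, t1 / 2) for t0 in range(-4, 9) for t1 in range(-4, 9)]
best = min(grid, key=lambda p: cost(*p))

print("grid point with the lowest cost:", best, "J =", cost(*best))

# Two different parameter pairs with the same cost lie on the same contour line.
print("J(0, 2) =", cost(0.0, 2.0), " J(2, 2) =", cost(2.0, 2.0))
```

The grid point with the lowest cost plays the role of the center of the innermost 'circle', and the two pairs printed last sit on one contour line on opposite sides of that center.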

Lesson 10: Gradient Descent

Gradient Descent

Gradient descent is used to find the parameters of the hypothesis function.

So we have our hypothesis function and we have a way of measuring how well it fits the data. Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

Imagine that we graph our hypothesis function based on its fields $\theta_0$ and $\theta_1$ (actually we are graphing the cost function as a function of the parameter estimates). We are not graphing x and y itself, but the parameter range of our hypothesis function and the cost resulting from selecting a particular set of parameters.

We put $\theta_0$ on the x axis and $\theta_1$ on the y axis, with the cost function on the vertical z axis. The points on our graph will be the result of the cost function using our hypothesis with those specific theta parameters. The graph below depicts such a setup.

We will know that we have succeeded when our cost function is at the very bottom of the pits in our graph, i.e. when its value is the minimum. The red arrows show the minimum points in the graph.

The way we do this is by taking the derivative (the slope of the tangent line to a function) of our cost function. The slope of the tangent is the derivative at that point and it gives us a direction to move in. We take steps down the cost function in the direction of steepest descent. The size of each step is determined by the parameter $\alpha$, which is called the learning rate.

For example, the distance between each 'star' in the graph above represents a step determined by our parameter $\alpha$. A smaller $\alpha$ results in a smaller step and a larger $\alpha$ results in a larger step. The direction in which the step is taken is determined by the partial derivative of $J(\theta_0, \theta_1)$. Depending on where one starts on the graph, one could end up at different points. The image above shows two different starting points that end up in two different places.
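The dependence on the starting point can be illustrated with a small sketch. The surface below is hypothetical and chosen only because it has two separate pits; it is not the surface from the course figure, and `gradient` and `descend` are illustrative names.

```python
def gradient(theta0, theta1):
    # Partial derivatives of the hypothetical surface
    # J(theta0, theta1) = (theta0^2 - 1)^2 + theta1^2,
    # which has two minima, at (-1, 0) and (1, 0).
    return 4 * theta0 * (theta0 ** 2 - 1), 2 * theta1

def descend(theta0, theta1, alpha=0.05, steps=500):
    # Repeatedly step against the gradient with a fixed learning rate alpha.
    for _ in range(steps):
        g0, g1 = gradient(theta0, theta1)
        theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1
    return theta0, theta1

# Two starting points that differ only in the sign of theta0.
left = descend(-0.5, 1.0)
right = descend(0.5, 1.0)
print("start (-0.5, 1.0) ends near", left)
print("start ( 0.5, 1.0) ends near", right)
```

Under these assumptions the first run settles in the pit around (-1, 0) and the second in the pit around (1, 0), even though both use the same $\alpha$ and the same number of steps.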
The gradient descent algorithm is:

repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

where $j = 0, 1$ represents the feature index number.

At each iteration, one should update the parameters $\theta_0$ and $\theta_1$ simultaneously (in general, all of $\theta_0, \theta_1, \ldots, \theta_n$). Updating one parameter before computing the update for another within the same iteration yields a wrong implementation.

Note: both parameters must be updated at the same time.
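The difference between a simultaneous and a non-simultaneous update can be seen in a single step. The sketch below uses a hypothetical cost whose two partial derivatives each depend on both parameters; the function names are illustrative only.

```python
def grad(theta0, theta1):
    # Partial derivatives of the hypothetical cost
    # J = theta0^2 + theta0*theta1 + theta1^2.
    return 2 * theta0 + theta1, theta0 + 2 * theta1

def step_simultaneous(theta0, theta1, alpha):
    # Correct: both gradients are computed from the old values first.
    g0, g1 = grad(theta0, theta1)
    return theta0 - alpha * g0, theta1 - alpha * g1

def step_sequential_wrong(theta0, theta1, alpha):
    # Incorrect: theta1's gradient is computed with the already-updated theta0.
    theta0 = theta0 - alpha * grad(theta0, theta1)[0]
    theta1 = theta1 - alpha * grad(theta0, theta1)[1]
    return theta0, theta1

correct = step_simultaneous(1.0, 1.0, alpha=0.1)
wrong = step_sequential_wrong(1.0, 1.0, alpha=0.1)
print("simultaneous:", correct)
print("sequential:  ", wrong)
```

Both versions agree on the new $\theta_0$, but the sequential version produces a different $\theta_1$ because it mixes old and new parameter values inside one iteration.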

Lesson 11: Gradient Descent Key Points

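Simplifying to a single parameter turns the partial derivative into an ordinary derivative. The lesson shows the math of how $\theta_1$ approaches the minimum from either side.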

Gradient Descent Intuition
In this video we explored the scenario where we used one parameter $\theta_1$ and plotted its cost function to implement gradient descent. Our formula for a single parameter was:

Repeat until convergence:

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$

Regardless of the sign of the slope $\frac{d}{d\theta_1} J(\theta_1)$, $\theta_1$ eventually converges to its minimum value. The following graph shows that when the slope is negative, the value of $\theta_1$ increases, and when it is positive, the value of $\theta_1$ decreases.
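A quick numeric check of the sign behaviour, as a sketch only: the cost $J(\theta_1) = (\theta_1 - 3)^2$ is a hypothetical example with its minimum at $\theta_1 = 3$, and `d_cost` and `update` are illustrative names.

```python
def d_cost(theta1):
    # Derivative of the hypothetical cost J(theta1) = (theta1 - 3)^2.
    return 2 * (theta1 - 3)

def update(theta1, alpha=0.1):
    # One gradient descent step for a single parameter.
    return theta1 - alpha * d_cost(theta1)

for start in (0.0, 5.0):
    print("theta1 =", start, "slope =", d_cost(start), "next theta1 =", update(start))
```

Starting left of the minimum the slope is negative and the update moves $\theta_1$ up; starting right of it the slope is positive and the update moves $\theta_1$ down.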

$\alpha$ controls the size of each descent "step".

On a side note, we should adjust our parameter $\alpha$ to ensure that the gradient descent algorithm converges in a reasonable time. Failure to converge, or taking too much time to reach the minimum value, implies that our step size is wrong.
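The effect of the step size can be sketched on a hypothetical one-parameter cost $J(\theta) = \theta^2$. The three values of $\alpha$ below are arbitrary choices for illustration, not recommendations from the course.

```python
def run(alpha, steps=50, theta=1.0):
    # Gradient descent on the hypothetical cost J(theta) = theta^2,
    # whose derivative is 2*theta and whose minimum is at theta = 0.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

too_small = run(0.001)   # barely moves in 50 steps
reasonable = run(0.1)    # gets very close to the minimum
too_large = run(1.1)     # overshoots more on every step and diverges

print("alpha = 0.001:", too_small)
print("alpha = 0.1:  ", reasonable)
print("alpha = 1.1:  ", too_large)
```

With the same number of steps, the smallest rate is still far from 0, the middle one has essentially converged, and the largest one has moved far away from the minimum.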

How does gradient descent converge with a fixed step size $\alpha$?

The intuition behind the convergence is that $\frac{d}{d\theta_1} J(\theta_1)$ approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative will always be 0 and thus we get:

$$\theta_1 := \theta_1 - \alpha \cdot 0$$

Once we are at the minimum, the value of $\theta_1$ no longer changes.

As we approach the minimum, the descent steps automatically become smaller, because the derivative gradually tends to 0.
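The shrinking steps can be observed directly by recording the size of every update while $\alpha$ stays fixed. This is a sketch on a hypothetical convex cost $J(\theta) = (\theta - 2)^2$; `step_sizes` is an illustrative helper.

```python
def step_sizes(alpha=0.1, theta=0.0, steps=10):
    # Hypothetical convex cost J(theta) = (theta - 2)^2, derivative 2*(theta - 2).
    sizes = []
    for _ in range(steps):
        step = alpha * 2 * (theta - 2)   # alpha never changes
        theta -= step
        sizes.append(abs(step))          # record how far this update moved theta
    return theta, sizes

theta, sizes = step_sizes()
print("theta after 10 steps:", theta)
print("step sizes:", sizes)
```

Each recorded step is smaller than the one before it, purely because the derivative shrinks as $\theta$ nears 2.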

Lesson 12: Gradient Descent for Linear Regression

Combining gradient descent with the cost function gives the gradient descent algorithm for linear regression.

Gradient Descent For Linear Regression

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived. We can substitute our actual cost function and our actual hypothesis function and modify the equation to:

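As I understand it, the subscript j on x in these notes covers two cases: for j = 1 it is the x-coordinate of each data point, and for j = 0 it is the constant 1.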
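Under that reading of $x_j$, one gradient expression can serve both parameters. The sketch below assumes the hypothesis $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$ and a squared-error cost, neither of which is restated in this section; the data set and the name `gradient_step` are hypothetical.

```python
# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Each example becomes a feature vector [x_0, x_1]: x_0 = 1 (constant), x_1 = x.
features = [[1.0, x] for x in xs]

def gradient_step(theta, alpha):
    m = len(features)
    # Prediction error h(x) - y for every training example.
    errors = [sum(t * f for t, f in zip(theta, row)) - y
              for row, y in zip(features, ys)]
    # Assumed gradient for parameter j: (1/m) * sum(error_i * x_j of example i).
    grads = [sum(e * row[j] for e, row in zip(errors, features)) / m
             for j in range(len(theta))]
    # All gradients were computed above, so this update is simultaneous.
    return [t - alpha * g for t, g in zip(theta, grads)]

theta = [0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, alpha=0.1)
print("theta0, theta1 =", theta)
```

Because $x_0$ is always 1, the j = 0 case reduces to the plain average of the errors, while the j = 1 case weights each error by its x-coordinate. On this data the parameters approach the intercept 1 and slope 2.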

The equation above can be derived.
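One way the derivation might go is sketched here. It assumes the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the squared-error cost over $m$ training examples $(x^{(i)}, y^{(i)})$; both are assumptions, since neither is restated in this section.

```latex
% Assumed cost function:
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Chain rule, with x_0^{(i)} = 1 and x_1^{(i)} = x^{(i)}:
\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right)
    \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Substituting into \theta_j := \theta_j - \alpha \, \partial J / \partial \theta_j:
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```

The factor 2 from differentiating the square cancels the $\frac{1}{2}$ in the cost, which is why only $\frac{1}{m}$ remains in the update rules.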
