Hung-yi Lee's Deep Learning Course

Predicting a Pokémon's Combat Power (CP)

Regression

Market forecast: predict tomorrow's stock price
Self-driving car: predict the steering wheel angle
Recommendation: predict the likelihood of a purchase (recommender systems)

$$f(x) = y, \quad x = \text{a Pokémon}, \quad y = \text{CP after evolution}$$

$x_{cp}$: CP before evolution, $x_s$: species, $x_{hp}$: hit points, $x_w$: weight, $x_h$: height

Step 1. Model

A set of functions ... → Model ($f_1, f_2, f_3, \dots$)

Linear model:
$$y = b + w \cdot x_{cp}$$
$w$ and $b$ are parameters (they can take any value). With several input features:
$$y = b + \sum_i w_i x_i$$
$x_i$: an attribute of the input $x$ (a feature)

$w_i$: weight

$b$: bias
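A minimal Python sketch of the two model forms above may make the notation concrete. The function names `predict_cp` and `predict_linear` and all numeric values are illustrative assumptions, not part of the course material.

```python
def predict_cp(b, w, x_cp):
    """Single-feature linear model: y = b + w * x_cp."""
    return b + w * x_cp


def predict_linear(b, weights, features):
    """General linear model: y = b + sum_i w_i * x_i."""
    return b + sum(w_i * x_i for w_i, x_i in zip(weights, features))


# Hypothetical parameters and inputs, chosen only to exercise the functions.
y_single = predict_cp(b=1.0, w=2.0, x_cp=3.0)
y_multi = predict_linear(b=1.0, weights=[2.0, 0.5], features=[3.0, 4.0])
```

Each choice of `b` and the weights picks one function out of the model's function set.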

Step 2. Goodness of function

Function input: $x^1, x^2, x^3, \dots$

Function output (scalar): $\widehat{y}^1, \widehat{y}^2, \widehat{y}^3, \dots$

Loss function L :

input : a function
output : how bad it is

$$L(f) = L(w, b) = \sum_{n=1}^{10} \left( \widehat{y}^n - (b + w \cdot x_{cp}^n) \right)^2$$

Estimation error: $\widehat{y}^n - (b + w \cdot x_{cp}^n)$, where $b + w \cdot x_{cp}^n$ is the estimated $y$ based on the input function.

The loss measures how good or bad a given pair of $w$ and $b$ is.
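As a rough illustration of the loss above, the sketch below sums squared estimation errors over a training set. The name `squared_error_loss` and the three training pairs are made up for the example; they are not the ten Pokémon from the lecture.

```python
def squared_error_loss(w, b, xs, ys):
    """L(w, b) = sum_n (y_hat^n - (b + w * x^n))^2."""
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in zip(xs, ys))


# Made-up training pairs lying exactly on y = 5 + 2x (not real Pokémon data).
xs = [10.0, 20.0, 30.0]
ys = [25.0, 45.0, 65.0]

# With the generating parameters every residual is zero.
loss_perfect = squared_error_loss(w=2.0, b=5.0, xs=xs, ys=ys)
# With w = b = 0 each residual equals the target itself.
loss_zero_model = squared_error_loss(w=0.0, b=0.0, xs=xs, ys=ys)
```

A lower value means the pair `(w, b)` describes the training data better.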

Step 3. Gradient Descent

Best function:

(pick the "best" function)
$$f^* = \arg\min_{f} L(f)$$
$$w^*, b^* = \arg\min_{w,b} L(w,b) = \arg\min_{w,b} \sum_{n=1}^{10} \left( \widehat{y}^n - (b + w \cdot x_{cp}^n) \right)^2$$
Consider a loss function $L(w)$ with one parameter $w$:
$$w^* = \arg\min_w L(w)$$
As long as $L$ is differentiable, its derivative can be computed and gradient descent can be applied.

(Randomly) pick an initial value $w^0$.
Compute the derivative at $w^0$, scale it by $-\eta$ to get the step, and update:

$$\left.\frac{dL}{dw}\right|_{w=w^0}$$
$$-\eta \left.\frac{dL}{dw}\right|_{w=w^0}$$
$$w^1 = w^0 - \eta \left.\frac{dL}{dw}\right|_{w=w^0}$$

$\eta$ is called the "learning rate".

Repeat for many iterations.

In general, gradient descent may stop at a local optimum rather than the global optimum.
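The update rule and the local-optimum caveat can be sketched with a small experiment. The loss `toy_loss` below is a hypothetical non-convex function with two valleys, chosen only for illustration; the learning rate and step count are likewise assumptions.

```python
def gradient_descent_1d(grad, w0, eta, steps):
    """Repeat w <- w - eta * dL/dw, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w


def toy_loss(w):
    """Hypothetical non-convex loss with two valleys."""
    return w ** 4 - 3 * w ** 2 + w


def toy_grad(w):
    """Derivative of toy_loss."""
    return 4 * w ** 3 - 6 * w + 1


# Same rule, same learning rate, different starting points.
w_from_right = gradient_descent_1d(toy_grad, w0=2.0, eta=0.01, steps=1000)
w_from_left = gradient_descent_1d(toy_grad, w0=-2.0, eta=0.01, steps=1000)
```

Both runs end where the derivative is essentially zero, but the run started at `w0=2.0` settles in the shallower valley, so its final loss is higher than that of the run started at `w0=-2.0`.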

How about two parameters?
$$w^*, b^* = \arg\min_{w,b} L(w,b)$$

(Randomly) pick initial values $w^0$, $b^0$.
Compute both partial derivatives at $(w^0, b^0)$ and update:

$$\left.\frac{\partial L}{\partial w}\right|_{w=w^0,\,b=b^0}, \qquad \left.\frac{\partial L}{\partial b}\right|_{w=w^0,\,b=b^0}$$
$$w^1 = w^0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w^0,\,b=b^0}$$
$$b^1 = b^0 - \eta \left.\frac{\partial L}{\partial b}\right|_{w=w^0,\,b=b^0}$$

$$\nabla L = \begin{bmatrix} \frac{\partial L}{\partial w} \\ \frac{\partial L}{\partial b} \end{bmatrix}_{\text{gradient}}$$

In linear regression, the loss function $L$ is convex (no local optima).

Formulation of $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$:
$$L(w,b) = \sum_{n=1}^{10} \left( \widehat{y}^n - (b + w \cdot x_{cp}^n) \right)^2$$
$$\frac{\partial L}{\partial w} = \sum_{n=1}^{10} 2 \left( \widehat{y}^n - (b + w \cdot x_{cp}^n) \right) (-x_{cp}^n)$$
$$\frac{\partial L}{\partial b} = \sum_{n=1}^{10} 2 \left( \widehat{y}^n - (b + w \cdot x_{cp}^n) \right) (-1)$$
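The two partial derivatives translate directly into a training loop. The following is a sketch under stated assumptions: the four data points are made up (they lie on y = 1 + 2x), and the learning rate and step count were picked so that plain gradient descent converges on this small, unscaled toy set.

```python
def gradients(w, b, xs, ys):
    """Partial derivatives of L(w, b) = sum_n (y_hat^n - (b + w * x^n))^2."""
    grad_w = sum(2 * (y_hat - (b + w * x)) * (-x) for x, y_hat in zip(xs, ys))
    grad_b = sum(2 * (y_hat - (b + w * x)) * (-1) for x, y_hat in zip(xs, ys))
    return grad_w, grad_b


def fit_linear_gd(xs, ys, w0=0.0, b0=0.0, eta=0.01, steps=5000):
    """Update w and b simultaneously with a fixed learning rate eta."""
    w, b = w0, b0
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b, xs, ys)
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b


# Made-up data generated from y = 1 + 2x (not real Pokémon CP values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w_fit, b_fit = fit_linear_gd(xs, ys)
```

Because the loss is convex, the loop recovers the generating parameters regardless of the starting point. Inputs with much larger magnitude (raw CP values in the hundreds, for instance) would need a far smaller `eta` or rescaled features for this fixed-step loop to remain stable.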

Model Selection

$$\begin{aligned}
\text{Model 1: } & y = w_1 x + b \\
\text{Model 2: } & y = w_1 x + w_2 x^2 + b \\
\text{Model 3: } & y = w_1 x + w_2 x^2 + w_3 x^3 + b \\
& \dots
\end{aligned}$$

A more complex model does not always lead to better performance on testing data. This is overfitting.
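To see why a more complex model always looks at least as good on training data, the sketch below fits polynomials of increasing degree by least squares and records both errors. Everything here is an assumption for illustration: the helper names, the hand-written train and test points, and the use of the normal equations with Gaussian elimination in place of gradient descent.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = c0 + c1*x + ... + cd*x^d via the normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], where A[i][j] = xs[i] ** j.
    aug = [[sum(x ** (r + c) for x in xs) for c in range(n)]
           + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - tail) / aug[r][r]
    return coeffs


def total_squared_error(coeffs, xs, ys):
    """Sum of squared residuals of the polynomial on (xs, ys)."""
    return sum((y - sum(c * x ** j for j, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


# Hand-written, roughly linear toy data (hypothetical, not from the lecture).
train_x = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
train_y = [-2.1, -1.0, -0.6, 0.5, 1.0, 2.2]
test_x = [-0.8, -0.4, 0.0, 0.4, 0.8]
test_y = [-1.5, -0.9, 0.1, 0.7, 1.7]

train_errors, test_errors = [], []
for degree in range(1, 5):
    coeffs = fit_polynomial(train_x, train_y, degree)
    train_errors.append(total_squared_error(coeffs, train_x, train_y))
    test_errors.append(total_squared_error(coeffs, test_x, test_y))
```

Each higher-degree model contains the lower-degree ones, so `train_errors` can only go down as the degree grows. `test_errors` carries no such guarantee, and comparing the two lists is the basis for model selection.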

Let's collect more data. There are hidden factors that the previous models did not account for, such as the species of the Pokémon.

Back to Step 1: Redesign the Model

$x_s$ = species of $x$

$$x \rightarrow \begin{cases}
y = b_1 + w_1 \cdot x_{cp} & \text{if } x_s = \text{Pidgey} \\
y = b_2 + w_2 \cdot x_{cp} & \text{if } x_s = \text{Weedle} \\
y = b_3 + w_3 \cdot x_{cp} & \text{if } x_s = \text{Caterpie} \\
y = b_4 + w_4 \cdot x_{cp} & \text{if } x_s = \text{Eevee}
\end{cases} \rightarrow y$$

Written as a single linear model:

$$\begin{aligned}
y ={} & b_1 \delta(x_s = \text{Pidgey}) + w_1 \cdot \delta(x_s = \text{Pidgey})\, x_{cp} \\
& + \dots \\
& + b_4 \delta(x_s = \text{Eevee}) + w_4 \cdot \delta(x_s = \text{Eevee})\, x_{cp}
\end{aligned}$$

$$\delta(x_s = \text{Pidgey}) = \begin{cases} 1, & \text{if } x_s = \text{Pidgey} \\ 0, & \text{otherwise} \end{cases}$$
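The indicator trick can be sketched in a few lines of Python: each species contributes a pair of features, and only the pair for the matching species is non-zero. The function names and the parameter values are hypothetical.

```python
SPECIES = ["Pidgey", "Weedle", "Caterpie", "Eevee"]


def delta(x_s, species):
    """Indicator: 1 if x_s is the given species, 0 otherwise."""
    return 1.0 if x_s == species else 0.0


def species_features(x_s, x_cp):
    """Feature vector [d_1, d_1 * x_cp, ..., d_4, d_4 * x_cp]."""
    feats = []
    for species in SPECIES:
        d = delta(x_s, species)
        feats.extend([d, d * x_cp])
    return feats


def predict_by_species(params, x_s, x_cp):
    """Linear model over indicator features; params = [b1, w1, ..., b4, w4]."""
    return sum(p * f for p, f in zip(params, species_features(x_s, x_cp)))


# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
params = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_weedle = predict_by_species(params, "Weedle", 10.0)  # b2 + w2 * 10
y_eevee = predict_by_species(params, "Eevee", 10.0)    # b4 + w4 * 10
```

The model is still linear in its parameters, so the same loss and gradient descent procedure apply unchanged.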

Are there any other hidden factors?

Back to Step 2: Regularization

$$y = b + \sum_i w_i x_i$$

$$L = \sum_n \left( \widehat{y}^n - \left( b + \sum_i w_i x_i^n \right) \right)^2 + \lambda \sum_i (w_i)^2$$

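training error + regularization term

$b$ has no effect on how smooth the function is, so the bias is left out of the regularization term.

Functions with smaller $w_i$ are preferred: the smaller the $w_i$, the smoother the function.
A larger $\lambda$ means the training error is given less weight.

The larger $\lambda$ is, the smoother the function, but it should not be too smooth.

Why are smooth functions preferred?

A smooth function is less sensitive to noise in its input: if noise corrupts the input $x_i$ at test time, the output of a smooth function is affected less.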
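The effect of $\lambda$ on the weight can be checked on the single-feature case, where the regularized loss has a closed-form minimizer. This is a sketch under assumptions: `ridge_fit_1d` is a hypothetical helper, the data are made up, and the closed form $w = S_{xy} / (S_{xx} + \lambda)$ follows from setting the derivatives of $\sum_n (\widehat{y}^n - (b + w x^n))^2 + \lambda w^2$ to zero with $b$ left unregularized.

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize sum_n (y^n - (b + w * x^n))^2 + lam * w^2 (b is not penalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / (s_xx + lam)   # a larger lam shrinks w toward 0
    b = y_mean - w * x_mean   # the optimal bias for that w
    return w, b


# Made-up data lying on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

fits = {lam: ridge_fit_1d(xs, ys, lam) for lam in (0.0, 5.0, 95.0)}
```

With `lam=0.0` the fit is the ordinary least-squares line (`w = 2`, `b = 1`). As `lam` grows the slope shrinks (`w = 1` at `lam=5.0`, `w = 0.1` at `lam=95.0`), so the function becomes flatter and a perturbation of the input changes the output less.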

Where do the errors come from?

bias
variance

A simpler model is less influenced by the sampled data.

simple model → small variance, large bias (underfitting)
complex model → large variance, small bias (overfitting)

The function set of a complex model contains that of a simpler model.
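The trade-off can be observed in a small simulation: draw many training sets from the same ground truth, fit a simple and a more complex model to each, and compare the average prediction and its spread at one test point. All specifics below are assumptions for illustration: the ground truth `true_f`, unit Gaussian noise, the fixed inputs, the seed, and the choice of a constant model versus a linear model.

```python
import random


def true_f(x):
    """Hypothetical ground truth used to generate the data."""
    return 1.0 + 2.0 * x


def fit_constant(xs, ys):
    """Simple model: predict the mean of the training targets everywhere."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """More complex model: least-squares line y = b + w * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / s_xx
    b = y_mean - w * x_mean
    return lambda x: b + w * x


def bias_and_variance(fit, x0, trials=2000, seed=0):
    """Estimate bias and variance of the prediction at x0 over resampled data."""
    rng = random.Random(seed)
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    preds = []
    for _ in range(trials):
        ys = [true_f(x) + rng.gauss(0.0, 1.0) for x in xs]
        preds.append(fit(xs, ys)(x0))
    mean_pred = sum(preds) / trials
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return mean_pred - true_f(x0), variance


bias_simple, var_simple = bias_and_variance(fit_constant, x0=4.0)
bias_complex, var_complex = bias_and_variance(fit_line, x0=4.0)
```

In this setup the constant model's predictions cluster tightly but far from the true value (small variance, large bias), while the line's predictions are centred on the true value but spread out more (small bias, larger variance).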

For large bias, redesign your model:

add more features as input
use a more complex model

What to do with large variance?

more data (collect real data or generate synthetic data): very effective, but not always practical
regularization


Deep Learning: Hung-yi Lee, Lesson 1 (2020)