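Machine Learning - Coursera Andrew Ng Machine Learning Course, Week 5 Study Notes
Neural Network Cost Function
Definitions
L = total number of layers in the network
s_l = number of units in layer l (not counting the bias unit)
K = number of output units/classes
Cost function for ordinary (regularized) logistic regression: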
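For reference, the regularized logistic regression cost from earlier in the course can be written as below. The symbols m (number of training examples), n (number of features) and λ (regularization parameter) are not defined in these notes and are assumed from the course's usual notation.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
```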
Cost function for a neural network:
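Using L, s_l and K as defined above, the neural network cost as given in the course sums the logistic cost over all K output units and regularizes every non-bias weight. Here m and λ are again assumed from the course's notation.

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log \left(h_\Theta(x^{(i)})\right)_k + (1 - y_k^{(i)}) \log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta_{j,i}^{(l)}\right)^2
```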
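In the regularization term at the end, each Θ matrix has:
- number of columns = number of nodes in the current layer (including the bias unit)
- number of rows = number of nodes in the next layer (not including the bias unit)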
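Backpropagation
First run forward propagation:
Then work backward to compute the error (delta) terms:
D: the delta matrix; it is exactly the partial derivative of J(Θ)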
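The backward pass can be summarized with the equations below, following the course's formulation for sigmoid activations. This is a sketch for one training example (x, y); ⊙ denotes element-wise multiplication, Δ is the accumulator summed over all m examples, and j = 0 refers to the bias column.

```latex
\begin{aligned}
\delta^{(L)} &= a^{(L)} - y \\
\delta^{(l)} &= \left( (\Theta^{(l)})^{T} \delta^{(l+1)} \right) \odot a^{(l)} \odot (1 - a^{(l)}), \quad l = L-1, \dots, 2 \\
\Delta^{(l)} &:= \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^{T} \\
D_{ij}^{(l)} &= \frac{1}{m} \Delta_{ij}^{(l)} + \frac{\lambda}{m} \Theta_{ij}^{(l)} \quad (j \neq 0) \\
D_{ij}^{(l)} &= \frac{1}{m} \Delta_{ij}^{(l)} \quad (j = 0) \\
\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) &= D_{ij}^{(l)}
\end{aligned}
```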
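This is somewhat involved. I am not digging into the details for now; the first goal is to be able to use it.
Backpropagation is very similar to forward propagation, only running in the opposite direction: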
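Backpropagation in Practice
Unrolling Parameters
Unroll the parameter matrices into a single vector so they can be passed as one argument: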
thetaVector = [ Theta1(:); Theta2(:); Theta3(:); ]
deltaVector = [ D1(:); D2(:); D3(:) ]
Inside the function, reshape the vector back into the individual matrices:
Theta1 = reshape(thetaVector(1:110),10,11)
Theta2 = reshape(thetaVector(111:220),10,11)
Theta3 = reshape(thetaVector(221:231),1,11)
Applied to backpropagation, this looks like:
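A minimal Octave sketch of the unroll/reshape round trip, using the same 10x11, 10x11 and 1x11 shapes as above. The random matrices only stand in for real weights, and the names T1, T2 and T3 are illustrative.

```octave
% Illustrative weights with the shapes used above (values are arbitrary).
Theta1 = rand(10, 11);
Theta2 = rand(10, 11);
Theta3 = rand(1, 11);

% Unroll column by column into one long vector: 110 + 110 + 11 = 231 entries.
thetaVector = [Theta1(:); Theta2(:); Theta3(:)];

% Recover the matrices; reshape fills column by column, mirroring (:).
T1 = reshape(thetaVector(1:110), 10, 11);
T2 = reshape(thetaVector(111:220), 10, 11);
T3 = reshape(thetaVector(221:231), 1, 11);

disp(numel(thetaVector));   % number of unrolled parameters
disp(isequal(T1, Theta1) && isequal(T2, Theta2) && isequal(T3, Theta3));
```

Optimizers such as fminunc work on a single parameter vector, which is why the cost function takes the unrolled vector as input and returns an unrolled gradient.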
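Gradient Checking
Approximate the derivative numerically:
With multiple θ parameters, approximate each partial derivative in turn:
When ε is small enough (for example ε = 10^-4), this gives a close approximation of the derivative.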
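The two-sided difference approximation for the j-th parameter, with every other parameter held fixed, can be written as:

```latex
\frac{\partial}{\partial \theta_j} J(\theta) \approx \frac{J(\theta_1, \dots, \theta_j + \epsilon, \dots, \theta_n) - J(\theta_1, \dots, \theta_j - \epsilon, \dots, \theta_n)}{2\epsilon}
```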
Pseudocode for computing gradApprox:
epsilon = 1e-4;
for i = 1:n,
  thetaPlus = theta;
  thetaPlus(i) += epsilon;
  thetaMinus = theta;
  thetaMinus(i) -= epsilon;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus))/(2*epsilon);
end;
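Compare gradApprox against DVec (the unrolled gradient from backpropagation) to check that the backpropagation implementation is correct.
Gradient checking done this way is slow, so remember to turn it off before actually training.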
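A small runnable sketch of the comparison. The cost J here is a toy function with a known gradient, standing in for the network cost, and analyticGrad plays the role of the backpropagation gradient; both are assumptions made for illustration, not part of the assignment.

```octave
% Toy cost with a known gradient; a stand-in for the network cost (assumption).
J = @(theta) sum(theta .^ 3);
analyticGrad = @(theta) 3 * theta .^ 2;   % plays the role of DVec

theta = [1; -2; 0.5];
n = numel(theta);
epsilon = 1e-4;
gradApprox = zeros(n, 1);

for i = 1:n
  thetaPlus = theta;
  thetaPlus(i) = thetaPlus(i) + epsilon;
  thetaMinus = theta;
  thetaMinus(i) = thetaMinus(i) - epsilon;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * epsilon);
end

% Relative difference between the two gradients; a tiny value means they agree.
DVec = analyticGrad(theta);
relDiff = norm(gradApprox - DVec) / norm(gradApprox + DVec);
disp(relDiff);
```

Each iteration evaluates the cost twice, so the check costs 2n cost evaluations per gradient, which is why it should only be used for verification.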
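Random Initialization of θ
If all the θij feeding a layer have the same value (for example 0), every node in the next layer computes the same output, so all activations a in that layer are identical. The partial derivatives computed during backpropagation are then identical too, so after every update the activations in that layer are still the same. In effect the layer ends up with only one feature:
To break this symmetry, initialize the elements of each Θ matrix with random values:
Pseudocode:
If the dimensions of Theta1 are 10x11, Theta2 10x11 and Theta3 1x11:
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;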
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
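The symmetry problem can be seen on a tiny example. This is a sketch with made-up inputs; the 3x3 layer size and the input vector x are illustrative, and INIT_EPSILON = 0.12 is the value used in the assignment code further down.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
x = [1; 0.3; -0.7];   % bias unit plus two illustrative features

% Identical weights: every hidden unit computes the same activation.
ThetaSame = zeros(3, 3);
aSame = sigmoid(ThetaSame * x);

% Random weights in [-INIT_EPSILON, INIT_EPSILON] break the symmetry.
INIT_EPSILON = 0.12;
ThetaRand = rand(3, 3) * (2 * INIT_EPSILON) - INIT_EPSILON;
aRand = sigmoid(ThetaRand * x);

disp(aSame');   % three identical values (0.5 each)
disp(aRand');   % values generally differ from unit to unit
```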
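Putting It Together
Choosing the Architecture
- Number of input units: the dimension of the features x
- Number of output units: for multi-class classification, the number of classes
- Number of units per hidden layer: more units generally work better but cost more computation, so it is a trade-off
- Default: 1 hidden layer; with more than 1 hidden layer, use the same number of units in every hidden layer
Training Steps
- Randomly initialize the weights (θ)
- Run forward propagation to compute h(x)
- Compute the cost function
- Run backpropagation to compute the partial derivatives
- Use gradient checking to confirm that backpropagation is correct, then turn gradient checking off.
- Use gradient descent or an advanced optimization algorithm to minimize the cost function and find the best θ.
Loop over every training example, running forward and backward propagation:
for i = 1:m,
  Perform forward propagation and backpropagation using example (x(i),y(i))
  (Get activations a(l) and delta terms d(l) for l = 2,...,L)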
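The per-example loop can be sketched in Octave as follows. The network size (2 inputs, 3 hidden units, 2 classes), the data X and Y, and the names Delta1, Delta2, D1 and D2 are all illustrative assumptions; regularization is left out to keep the sketch short.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Tiny illustrative problem: 2 inputs, 3 hidden units, 2 classes, m = 4.
X = [0 0; 0 1; 1 0; 1 1];
Y = [1 0; 0 1; 0 1; 1 0];            % one-hot labels, one row per example
m = size(X, 1);
Theta1 = rand(3, 3) * 0.24 - 0.12;   % 3 hidden units x (2 inputs + bias)
Theta2 = rand(2, 4) * 0.24 - 0.12;   % 2 outputs x (3 hidden units + bias)

Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));

for i = 1:m
  % Forward propagation for example i.
  a1 = [1; X(i, :)'];
  z2 = Theta1 * a1;
  a2 = [1; sigmoid(z2)];
  a3 = sigmoid(Theta2 * a2);

  % Backpropagation: output error, then hidden-layer error (bias column dropped).
  d3 = a3 - Y(i, :)';
  d2 = (Theta2(:, 2:end)' * d3) .* sigmoid(z2) .* (1 - sigmoid(z2));

  % Accumulate the gradient contributions.
  Delta2 = Delta2 + d3 * a2';
  Delta1 = Delta1 + d2 * a1';
end

% Unregularized partial derivatives of J with respect to Theta1 and Theta2.
D1 = Delta1 / m;
D2 = Delta2 / m;
```

The vectorized nnCostFunction.m in the assignment section computes the same quantities for all examples at once with matrix products instead of a loop.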
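The neural network roughly works as follows:
Assignment
The dimensions of θ are (number of new features) x (number of old features, plus 1 for the bias unit), because θ's job is to compute the next layer's features from the current layer's.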
% Theta1 has size 25 x 401
% Theta2 has size 10 x 26
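The two sizes above follow directly from the layer sizes. A short sketch, assuming 400 input features (implied by the 401 columns), 25 hidden units and 10 labels:

```octave
input_layer_size  = 400;   % implied by the 401 columns of Theta1
hidden_layer_size = 25;
num_labels        = 10;

% rows = units in the next layer, columns = units in the current layer + 1 bias
Theta1 = zeros(hidden_layer_size, input_layer_size + 1);
Theta2 = zeros(num_labels, hidden_layer_size + 1);

disp(size(Theta1));   % 25 401
disp(size(Theta2));   % 10 26
```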
The assignment is fairly hard; the backpropagation code below draws on https://github.com/everpeace/ml-class-assignments
nnCostFunction.m
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
% [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
% X, y, lambda) computes the cost and gradient of the neural network. The
% parameters for the neural network are "unrolled" into the vector
% nn_params and need to be converted back into the weight matrices.
%
% The returned parameter grad should be a "unrolled" vector of the
% partial derivatives of the neural network.
%
% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
% Setup some useful variables
m = size(X, 1);
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
% following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
% variable J. After implementing Part 1, you can verify that your
% cost function computation is correct by verifying the cost
% computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
% Theta1_grad and Theta2_grad. You should return the partial derivatives of
% the cost function with respect to Theta1 and Theta2 in Theta1_grad and
% Theta2_grad, respectively. After implementing Part 2, you can check
% that your implementation is correct by running checkNNGradients
%
% Note: The vector y passed into the function is a vector of labels
% containing values from 1..K. You need to map this vector into a
% binary vector of 1's and 0's to be used with the neural network
% cost function.
%
% Hint: We recommend implementing backpropagation using a for-loop
% over the training examples if you are implementing it for the
% first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
% Hint: You can implement this around the code for
% backpropagation. That is, you can compute the gradients for
% the regularization separately and then add them to Theta1_grad
% and Theta2_grad from Part 2.
%
% Y = zeros(m, num_labels); % m x num_labels == 5000 x 10
% for i = 1:m,
%   Y(i, y(i)) = 1;
% end
Y = (1:num_labels)==y; % m x num_labels == 5000 x 10
a1 = [ones(m, 1) X]; % 5000 x 401
z2 = a1 * Theta1'; % m x hidden_layer_size == 5000 x 25
a2 = sigmoid(z2); % m x hidden_layer_size == 5000 x 25
a2 = [ones(m,1), a2]; % 5000 x 26
z3 = a2 * Theta2'; % m x num_labels == 5000 x 10
a3 = sigmoid(z3); % m x num_labels == 5000 x 10
h = a3; % m x num_labels == 5000 x 10
% calculate penalty (bias columns excluded)
p = sum(sum(Theta1(:, 2:end).^2, 2))+sum(sum(Theta2(:, 2:end).^2, 2));
% calculate J
J = sum(sum((-Y).*log(h) - (1-Y).*log(1-h), 2))/m + lambda*p/(2*m); % scalar
% calculate sigmas
sigma3 = a3 - Y; % 5000 x 10
sigma2 = (sigma3*Theta2).*sigmoidGradient([ones(size(z2, 1), 1) z2]); % 5000 x 26
sigma2 = sigma2(:, 2:end); % 5000 x 25
% accumulate gradients
delta_1 = (sigma2'*a1); % 25 x 401
delta_2 = (sigma3'*a2); % 10 x 26
% calculate regularized gradient
p1 = (lambda/m)*[zeros(size(Theta1, 1), 1) Theta1(:, 2:end)];
p2 = (lambda/m)*[zeros(size(Theta2, 1), 1) Theta2(:, 2:end)];
Theta1_grad = delta_1./m + p1; % 25 x 401
Theta2_grad = delta_2./m + p2; % 10 x 26
% -------------------------------------------------------------
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
sigmoidGradient.m
function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
% g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
% evaluated at z. This should work regardless if z is a matrix or a
% vector. In particular, if z is a vector or matrix, you should return
% the gradient for each element.
g = zeros(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
% each value of z (z can be a matrix, vector or scalar).
g = sigmoid(z).*(1-sigmoid(z));
% =============================================================
end
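A quick sanity check of the sigmoid gradient, written with anonymous functions so it runs on its own. The inline sigmoid definition here is an assumption standing in for the assignment's sigmoid.m.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

g0 = sigmoidGradient(0);              % maximum of the derivative: 0.25
gLarge = sigmoidGradient([-10 10]);   % close to 0 where the sigmoid saturates
disp(g0);
disp(gLarge);
```

The gradient peaks at z = 0 and shrinks toward 0 for large |z|, and the function works element-wise on vectors and matrices.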
randInitializeWeights.m
function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
% W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
% of a layer with L_in incoming connections and L_out outgoing
% connections.
%
% Note that W should be set to a matrix of size(L_out, 1 + L_in) as
% the first column of W handles the "bias" terms
%
% You need to return the following variables correctly
W = zeros(L_out, 1 + L_in);
% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
% training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias unit
%
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
% =========================================================================
end