Neural Network Basics: Loss Functions and Gradient Descent

This post takes the simplest case, a one-layer neural network with multiple inputs and a single output, and uses logistic regression to explain forward and backward propagation, the loss and cost functions, gradient descent, and vectorization.

1. Binary classification

In a binary classification problem, the result is a discrete value output (1 or 0).

Example: Cat vs Non-Cat

The goal is to train a classifier that takes an image, represented by a feature vector x, as input and predicts whether the corresponding label y is 1 or 0: in this case, whether the image is a cat image (1) or a non-cat image (0).

An image is stored in the computer as three separate matrices corresponding to the red, green, and blue color channels. Each matrix has the same size as the image; for example, if the cat image has a resolution of 64 × 64 pixels, the three RGB matrices are 64 × 64 each.

The value in each cell is a pixel intensity, and these values are used to build a feature vector of dimension n. In pattern recognition and machine learning, a feature vector represents an object, in this case a cat or a non-cat image.

To create the feature vector x, the pixel intensity values of each color channel are “unrolled” (reshaped) into a single column. The dimension of the input feature vector is nx = 64 × 64 × 3 = 12288.
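The unrolling step can be sketched with numpy as follows. The image here is random data standing in for a real picture, and the channel-by-channel ordering is only one possible convention; any fixed ordering works as long as it is applied consistently to every image.

```python
import numpy as np

# Hypothetical 64x64 RGB image with random pixel intensities in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Move the channel axis first so that all red values come first,
# then all green values, then all blue values, and unroll into a column.
x = image.transpose(2, 0, 1).reshape(-1, 1)

nx = x.shape[0]
print(x.shape)  # (12288, 1)
```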

Notation used in this post:
Single example: (x, y)
Number of training examples: m
Number of features per example: nx
All examples: one example per column. X.shape = (nx, m); Y.shape = (1, m)
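As a small illustration of this layout, the sketch below stacks a few made-up examples column by column. The data and the variable name `examples` are hypothetical; only the resulting shapes matter.

```python
import numpy as np

m, nx = 5, 12288  # hypothetical small training set

rng = np.random.default_rng(0)
# Each example is a column vector x of shape (nx, 1) with a label y in {0, 1}.
examples = [(rng.random((nx, 1)), int(rng.integers(0, 2))) for _ in range(m)]

X = np.hstack([x for x, _ in examples])    # one example per column, shape (nx, m)
Y = np.array([[y for _, y in examples]])   # one label per column, shape (1, m)

print(X.shape, Y.shape)  # (12288, 5) (1, 5)
```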

2. Logistic regression

Logistic regression is a learning algorithm for supervised learning problems in which every output label y is either 0 or 1. Its goal is to minimize the error between its predictions and the training data.
Given an image represented by a feature vector x, the algorithm estimates the probability that the image contains a cat.
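Assuming the usual form of the model, ŷ = σ(wᵀx + b) with σ the sigmoid function (which matches the line `A = sigmoid(Z)` in the code later in this post), the probability estimate can be sketched like this. The helper name `predict_proba` is illustrative, not something the post defines.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that y = 1 for one feature vector x of shape (nx, 1)."""
    return sigmoid(np.dot(w.T, x) + b).item()

nx = 4
w = np.zeros((nx, 1))   # untrained weights
b = 0.0
x = np.ones((nx, 1))

print(predict_proba(w, b, x))  # 0.5, since sigmoid(0) = 0.5
```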

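3. Logistic regression cost function

Convex function: has only one local optimal solution, so the optimal solution found is the global optimal solution.
Non-convex function: has many local optimal solutions, so the optimal solution found is not necessarily the global optimal solution.
For logistic regression, the squared error L = (ŷ − y)² is non-convex in the parameters w and b; the loss L used in this post is convex.

i: denotes the i-th example.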
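Assuming the loss referred to here is the standard cross-entropy loss for logistic regression, L(ŷ, y) = −(y log ŷ + (1 − y) log(1 − ŷ)), and that the cost J(w, b) is its average over the m examples, a minimal sketch looks like this. The function names and the sample numbers are illustrative.

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for predictions y_hat in (0, 1) and labels y in {0, 1}."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(A, Y):
    """Average loss over all m examples; A and Y both have shape (1, m)."""
    m = Y.shape[1]
    return float(np.sum(loss(A, Y)) / m)

A = np.array([[0.9, 0.2, 0.5]])   # hypothetical predictions
Y = np.array([[1, 0, 1]])         # hypothetical labels

print(cost(A, Y))
```

The loss is small when a confident prediction is right and grows without bound as a confident prediction becomes wrong.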

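4. Gradient descent

Gradient descent is used to find the minimum of the cost function; after many iterations it yields the values of w and b. The accompanying figure illustrates how the iterations descend to the global optimal solution.

Each iteration updates w and b as w := w − α·∂J/∂w and b := b − α·∂J/∂b, where α is the learning rate.

In code, these partial derivatives are usually written as dw and db.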
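The update rule is easiest to see on a one-parameter convex function. The sketch below uses a made-up cost J(w) = (w − 3)², chosen only because its minimum at w = 3 is obvious; it is not the logistic regression cost.

```python
def J(w):
    """Toy convex cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def dJ_dw(w):
    """Derivative of the toy cost."""
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate
w = 0.0       # starting point

for _ in range(100):
    dw = dJ_dw(w)        # "dw" follows the naming convention above
    w = w - alpha * dw   # step against the gradient

print(w, J(w))  # w ends up very close to 3
```

With this cost, every step shrinks the distance to the minimum by a factor of 1 − 2α, so a larger α converges faster until it becomes large enough to overshoot.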

5. Derivatives

Anyone who has studied a little calculus knows what a derivative is. It can simply be viewed as a slope.
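The slope view can be checked numerically: nudge the input a little and see how much the output moves. The helper `slope` below is an illustrative central-difference approximation, not part of the original notes.

```python
def slope(f, a, h=1e-5):
    """Approximate the derivative of f at a with a central difference."""
    return (f(a + h) - f(a - h)) / (2 * h)

# A straight line has the same slope everywhere.
print(slope(lambda a: 3 * a, 2.0))   # close to 3

# For f(a) = a^2 the slope depends on a: it is 2a.
print(slope(lambda a: a ** 2, 2.0))  # close to 4
print(slope(lambda a: a ** 2, 5.0))  # close to 10
```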

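6. Computation graph: a simple example of forward and backward propagation

The forward pass is as in the figure above: u = bc, v = a + u, J = 3v, with b = 3 and c = 2. The backward pass is computed as follows:
Because: dJ/dv = 3, dv/da = 1, dv/du = 1, du/db = c = 2, du/dc = b = 3
So: da = dJ/dv · dv/da = 3 × 1 = 3
db = dJ/dv · dv/du · du/db = 3 × 1 × 2 = 6
dc = dJ/dv · dv/du · du/dc = 3 × 1 × 3 = 9
Partial derivatives are computed with the chain rule.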
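The same forward and backward passes can be written out step by step. The values b = 3 and c = 2 follow from the derivatives above; a = 5 is an assumed input chosen for illustration, and it does not affect any of the gradients.

```python
# Forward pass: J = 3 * (a + b * c)
a, b, c = 5.0, 3.0, 2.0   # a = 5 is an assumption; b and c come from the text
u = b * c                 # u = 6
v = a + u                 # v = 11
J = 3 * v                 # J = 33

# Backward pass: apply the chain rule from the output back to the inputs.
dv = 3.0                  # dJ/dv
da = dv * 1.0             # dJ/da = dJ/dv * dv/da
du = dv * 1.0             # dJ/du = dJ/dv * dv/du
db = du * c               # dJ/db = dJ/du * du/db
dc = du * b               # dJ/dc = dJ/du * du/dc

print(J, da, db, dc)      # 33.0 3.0 6.0 9.0
```

Each intermediate gradient (here `dv` and `du`) is computed once and reused, which is what makes the backward pass efficient.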

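7. Logistic regression gradient descent

Single example, with two features x1 and x2, z = w1·x1 + w2·x2 + b, a = σ(z), and the cross-entropy loss L(a, y) = −(y log a + (1 − y) log(1 − a)):

Compute da and dz: da = dL/da = −y/a + (1 − y)/(1 − a); dz = dL/dz = a − y

Compute dw1, dw2, db: dw1 = x1·dz; dw2 = x2·dz; db = dz

Update w and b: w1 := w1 − α·dw1; w2 := w2 − α·dw2; b := b − α·db

m examples: differentiate the cost function instead, i.e. average the per-example gradients over the m examples.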
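For m examples, a direct translation of these steps uses two nested loops, one over examples and one over features. The sketch below is deliberately unvectorized; the function name `gradient_step_loop` and the toy data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step_loop(w, b, X, Y, alpha):
    """One gradient descent step using explicit loops (no vectorization)."""
    nx, m = X.shape
    dw = np.zeros((nx, 1))
    db = 0.0
    for i in range(m):                        # loop over examples
        x_i = X[:, i].reshape(nx, 1)
        z_i = np.dot(w.T, x_i).item() + b
        a_i = sigmoid(z_i)
        dz_i = a_i - Y[0, i]
        for j in range(nx):                   # loop over features
            dw[j, 0] += x_i[j, 0] * dz_i
        db += dz_i
    dw /= m                                   # average over the m examples
    db /= m
    return w - alpha * dw, b - alpha * db

X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.5, 2.0]])              # shape (nx=2, m=3)
Y = np.array([[1, 0, 1]])
w = np.zeros((2, 1))
b = 0.0

w_new, b_new = gradient_step_loop(w, b, X, Y, alpha=0.1)
print(w_new.ravel(), b_new)
```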

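8. Vectorization

An explicit for loop can take hundreds of times longer to run than the vectorized equivalent. Training usually involves a large amount of data, so for loops should be avoided as far as possible. Python's numpy makes it possible to vectorize the computation, i.e. to express it as matrix operations, which speeds the program up.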
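A rough way to see the difference is to time a dot product computed both ways. The array size below is arbitrary, and the measured ratio depends on the machine and numpy build, so the printed times are only indicative.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop.
t0 = time.perf_counter()
loop_result = 0.0
for i in range(n):
    loop_result += a[i] * b[i]
t_loop = time.perf_counter() - t0

# Vectorized version of the same computation.
t0 = time.perf_counter()
vec_result = np.dot(a, b)
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop * 1e3:.2f} ms, vectorized: {t_vec * 1e3:.2f} ms")
```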

X: (nx,m) Y: (1,m) w: (nx,1) b: scalar

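Python code (one gradient descent step; X, Y, w, b, m and alpha are assumed to be defined with the shapes above):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Z = np.dot(w.T, X) + b      # linear part, shape (1, m)
A = sigmoid(Z)              # predictions, shape (1, m)
dZ = A - Y                  # gradient of the loss with respect to Z, shape (1, m)
db = np.sum(dZ) / m         # scalar
dw = np.dot(X, dZ.T) / m    # shape (nx, 1)
w = w - alpha * dw
b = b - alpha * db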
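Putting the vectorized step inside a loop over iterations gives a complete, if minimal, training routine. This is a sketch on synthetic data: the function name `train`, the learning rate, the iteration count, the cross-entropy cost, and the toy labeling rule are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.1, num_iterations=500):
    """Vectorized logistic regression training; returns w, b and the cost history."""
    nx, m = X.shape
    w = np.zeros((nx, 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):   # the loop over iterations remains
        A = sigmoid(np.dot(w.T, X) + b)
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        costs.append(float(cost))
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m
        db = np.sum(dZ) / m
        w = w - alpha * dw
        b = b - alpha * db
    return w, b, costs

# Toy data: 200 points with 2 features, labeled 1 when x1 + x2 > 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)

w, b, costs = train(X, Y)
predictions = sigmoid(np.dot(w.T, X) + b) > 0.5
accuracy = float(np.mean(predictions == (Y == 1)))
print(costs[0], costs[-1], accuracy)
```

Only the loop over iterations is left; the loops over examples and features are replaced by matrix operations.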

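9. Python broadcasting and programming notes

Broadcasting:
Matrix plus, minus, times, or divided by a vector or a number: the vector or number is automatically expanded into a matrix of the same size as the matrix.
Vector plus, minus, times, or divided by a number: the number is automatically expanded into a vector of the same size as the vector.

cal = A.sum(axis=0)   # axis=0 sums vertically (one total per column); axis=1 sums horizontally (one total per row)
A /= cal              # cal is broadcast automatically; A must have a float dtype for in-place division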
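A small worked example makes the shapes involved explicit. The matrix values are made up; `keepdims=True` is used for the row sums so that the result stays a (2, 1) column and broadcasts across each row.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])              # shape (2, 3)

col_sums = A.sum(axis=0)                     # shape (3,), one total per column
row_sums = A.sum(axis=1, keepdims=True)      # shape (2, 1), one total per row

col_normalized = A / col_sums                # (2, 3) / (3,)   -> each column sums to 1
row_normalized = A / row_sums                # (2, 3) / (2, 1) -> each row sums to 1
shifted = A + 100                            # the scalar is broadcast to every entry

print(col_sums.shape, row_sums.shape)        # (3,) (2, 1)
print(col_normalized)
```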

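Programming notes:

# Generate 5 Gaussian random numbers
a = np.random.randn(5)      # rank-1 array, shape (5,)
# To define a (5,1) or (1,5) vector, use the standard form below:
a = np.random.randn(5,1)
b = np.random.randn(1,5)
# Use assert to check the dimensions of a vector or array. If the condition fails, the program stops here, which helps catch mistakes early.
assert a.shape == (5,1)
# Use reshape to give an array the required dimensions; it returns a new array, so assign the result
a = a.reshape((5,1))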
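The reason for avoiding rank-1 arrays shows up as soon as a transpose or a product is involved. The following sketch contrasts the two cases.

```python
import numpy as np

a = np.random.randn(5)                  # rank-1 array
print(a.shape, a.T.shape)               # (5,) (5,): transposing changes nothing
inner = np.dot(a, a.T)                  # inner product, a single number
print(np.ndim(inner))                   # 0

col = a.reshape(5, 1)                   # proper column vector
print(col.shape, col.T.shape)           # (5, 1) (1, 5)
outer = np.dot(col, col.T)              # outer product, a 5x5 matrix
print(outer.shape)                      # (5, 5)
```

With explicit (5, 1) and (1, 5) shapes, the result of each operation is predictable, which is why the notes recommend them.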

10. Understanding the logistic loss and cost functions
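One standard way to motivate the logistic loss and cost is maximum likelihood. The sketch below assumes that ŷ is interpreted as P(y = 1 | x), that L is the cross-entropy loss, and that the training examples are independent.

```latex
\begin{aligned}
P(y \mid x) &= \hat{y}^{\,y}\,(1-\hat{y})^{\,1-y}, \qquad y \in \{0, 1\} \\
\log P(y \mid x) &= y\log\hat{y} + (1-y)\log(1-\hat{y}) = -L(\hat{y}, y) \\
\log \prod_{i=1}^{m} P\big(y^{(i)} \mid x^{(i)}\big) &= -\sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big) \\
J(w, b) &= \frac{1}{m}\sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big)
\end{aligned}
```

Under these assumptions, maximizing the log-likelihood of the training set is equivalent to minimizing the cost J; the factor 1/m only rescales it.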

Andrew Ng Deep Learning Notes 2, Course1-Week2: Neural Network Basics: Loss Functions and Gradient Descent