Building your Deep Neural Network: Step by Step

In the previous post we trained a two-layer network (one hidden layer). This time we will build a deep neural network with multiple hidden layers.

  • First, we write some helper functions that the model builds on.
  • In the next post, these helper functions will be reused to implement a deep neural network for image classification.

After reading this you will be able to:

  • Use non-linear units such as ReLU (rectified linear unit) to improve your model
  • Build a deeper neural network (with more than one hidden layer)
  • Implement an easy-to-use neural network class

Notation:

  • Superscript $[l]$ denotes a quantity associated with the $l^{th}$ layer.
    • Example: $a^{[L]}$ is the activation of the $L^{th}$ layer of the network.
    • Example: $W^{[L]}$ and $b^{[L]}$ are the parameters of the $L^{th}$ layer.
  • Superscript $(i)$ denotes a quantity associated with the $i^{th}$ example.
    • Example: $x^{(i)}$ is the $i^{th}$ training example.
  • Subscript $i$ denotes the $i^{th}$ entry of a vector.
    • Example: $a^{[l]}_i$ is the $i^{th}$ entry of the $l^{th}$ layer's activation vector.

Let’s get started!

1 - Packages

Let’s first import all the packages that you will need during this assignment.
- numpy is the main package for scientific computing with Python.
- matplotlib is a library to plot graphs in Python.
- np.random.seed(1) is used to keep all the random function calls consistent. It will help us grade your work. Please don’t change the seed.

import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

* Helper functions

  • Compute the sigmoid() and relu() activations, plus the corresponding derivatives used during backpropagation (sigmoid_backward() and relu_backward()).
def sigmoid(Z):
    """
    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(Z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache

def relu(Z):
    """
    Arguments:
    Z -- output of the linear layer, of any shape

    Returns:
    A -- post-activation parameter, of the same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = np.maximum(0, Z)
    assert(A.shape == Z.shape)
    cache = Z
    return A, cache

def relu_backward(dA, cache):
    """
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dZ to a correct object
    # When Z <= 0, set dZ to 0 as well.
    dZ[Z <= 0] = 0
    assert (dZ.shape == Z.shape)
    return dZ

def sigmoid_backward(dA, cache):
    """
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    assert (dZ.shape == Z.shape)
    return dZ
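As a quick sanity check (not part of the graded assignment), the helpers can be exercised on a tiny array; this sketch assumes the four functions above are defined:

import numpy as np

Z = np.array([[-1.0, 0.0, 2.5]])
A_sig, cache_sig = sigmoid(Z)           # approx. [[0.269, 0.5, 0.924]]
A_rel, cache_rel = relu(Z)              # [[0.0, 0.0, 2.5]]

dA = np.ones_like(Z)                    # pretend the upstream gradient is all ones
print(sigmoid_backward(dA, cache_sig))  # s*(1-s) evaluated at each Z
print(relu_backward(dA, cache_rel))     # [[0. 0. 1.]] -- gradient is 0 where Z <= 0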

2 - Outline of the Assignment

  • Initialize the parameters of an $L$-layer DNN.
  • Implement forward propagation:
    • The LINEAR part of a layer's forward step, $WX + b = Z$ (resulting in $Z^{[l]}$).
    • The non-linear ACTIVATION function (relu/sigmoid).
    • Combine the two previous steps into a [LINEAR->ACTIVATION] forward function.
    • Stack [LINEAR->RELU] for the first (L-1) layers of the forward pass and add a [LINEAR->SIGMOID] at the end (for the output layer).
  • Compute the loss (cost).
  • Implement backward propagation:
    • Compute the partial derivatives of the LINEAR part.
    • Compute the derivative of the activation (relu_backward/sigmoid_backward).
    • Use the chain rule to obtain the gradients of each layer's parameters.
  • Finally, update the parameters.


Note
During forward propagation some intermediate results have to be stored, because backward propagation needs them later to compute the gradients of the parameters. So in each forward computation we store those intermediate values in a "cache" (a Python tuple; the caches of all layers are collected in a list).

3 - Initialization

Parameter initialization:

  • First, write a function that initializes the parameters of a 2-layer network, as a warm-up.
  • Then generalize it to a function that initializes the parameters of a deeper, L-layer network.

      3.1 - 2-layer Neural Network

Exercise: write initialize_parameters(), which creates and initializes the parameters of a two-layer network.

Instructions:
- The model's structure is: LINEAR -> RELU -> LINEAR -> SIGMOID.
- Use random initialization for the weight matrices: np.random.randn(shape) * 0.01.
- Use zero initialization for the biases: np.zeros(shape).

# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
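A quick shape check (illustrative only), assuming the function above:

params = initialize_parameters(n_x=3, n_h=2, n_y=1)
for name in ("W1", "b1", "W2", "b2"):
    print(name, params[name].shape)
# W1 (2, 3)  b1 (2, 1)  W2 (1, 2)  b2 (1, 1)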

      3.2 - L-layer Neural Network

Initializing the parameters of an $L$-layer network is more involved than the shallow case, because there are many more weight matrices and bias vectors. Still, it follows a simple pattern: once you complete initialize_parameters_deep() below, you can initialize a fully connected network of any depth.
$n^{[l]}$ is the number of units in layer $l$.
For example, if the input $X$ has shape $(12288, 209)$ (with $m = 209$ examples), then:

|               | Shape of W               | Shape of b       | Activation                                    | Shape of Activation |
|---------------|--------------------------|------------------|-----------------------------------------------|---------------------|
| **Layer 1**   | $(n^{[1]}, 12288)$       | $(n^{[1]}, 1)$   | $Z^{[1]} = W^{[1]} X + b^{[1]}$               | $(n^{[1]}, 209)$    |
| **Layer 2**   | $(n^{[2]}, n^{[1]})$     | $(n^{[2]}, 1)$   | $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$         | $(n^{[2]}, 209)$    |
| $\vdots$      | $\vdots$                 | $\vdots$         | $\vdots$                                      | $\vdots$            |
| **Layer L-1** | $(n^{[L-1]}, n^{[L-2]})$ | $(n^{[L-1]}, 1)$ | $Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$ | $(n^{[L-1]}, 209)$  |
| **Layer L**   | $(n^{[L]}, n^{[L-1]})$   | $(n^{[L]}, 1)$   | $Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$       | $(n^{[L]}, 209)$    |

Remember that when you compute $WX + b$ in Python, broadcasting is carried out. For example, if:

$$W = \begin{bmatrix} j & k & l \\ m & n & o \\ p & q & r \end{bmatrix} \;\;\; X = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \;\;\; b = \begin{bmatrix} s \\ t \\ u \end{bmatrix} \tag{2}$$

Then $WX + b$ will be:

$$WX + b = \begin{bmatrix} (ja + kd + lg) + s & (jb + ke + lh) + s & (jc + kf + li) + s \\ (ma + nd + og) + t & (mb + ne + oh) + t & (mc + nf + oi) + t \\ (pa + qd + rg) + u & (pb + qe + rh) + u & (pc + qf + ri) + u \end{bmatrix} \tag{3}$$
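The same broadcasting behavior can be seen with a tiny numpy example (illustrative, not part of the assignment):

import numpy as np

W = np.array([[1., 2.],
              [3., 4.]])                 # (2, 2) weight matrix
X = np.array([[1., 0., 2.],
              [0., 1., 1.]])             # (2, 3): 3 examples stored as columns
b = np.array([[10.],
              [20.]])                    # (2, 1) bias, broadcast across the columns

Z = np.dot(W, X) + b                     # b is added to every column of W @ X
print(Z)                                 # [[11. 12. 14.] [23. 24. 30.]]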

Exercise: implement the initialization of an L-layer network.

Instructions:
- The model's structure is [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID. That is, the first L-1 layers use the ReLU activation function, and the output layer uses the sigmoid activation function.
- Use random initialization for the weight matrices: np.random.randn(shape) * 0.01.
- Use zero initialization for the biases: np.zeros(shape).
- $n^{[l]}$, the number of units in each layer, is stored in the variable layer_dims, a Python list. For example, layer_dims = [2, 4, 1] describes a two-layer model with 2 input units, 4 hidden units, and 1 output unit.

# GRADED FUNCTION: initialize_parameters_deep

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the number of units in each layer

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)            # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

    return parameters
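A small shape check (illustrative only), assuming the function above:

params = initialize_parameters_deep([5, 4, 3])
print(params["W1"].shape, params["b1"].shape)   # (4, 5) (4, 1)
print(params["W2"].shape, params["b2"].shape)   # (3, 4) (3, 1)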

      4 - Forward propagation module

      4.1 - Linear Forward

The linear part of a layer's forward propagation computes the following equation:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \tag{4}$$

where $A^{[0]} = X$.

Exercise: implement linear_forward(), the linear part of forward propagation.

# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Arguments:
    A -- activations from the previous layer (or input data): (size of previous layer, number of examples)
    W -- weight matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called the pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b"; stored for the backward pass
    """
    Z = np.dot(W, A) + b
    cache = (A, W, b)

    return Z, cache
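A minimal usage sketch (values are arbitrary, assuming the function above):

np.random.seed(2)
A_prev = np.random.randn(3, 2)        # 3 units in the previous layer, 2 examples
W = np.random.randn(1, 3)
b = np.zeros((1, 1))
Z, linear_cache = linear_forward(A_prev, W, b)
print(Z.shape)                        # (1, 2): one pre-activation value per example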

      4.2 - Linear-Activation Forward

In this DNN you will use two activation functions:

  • Sigmoid: $\sigma(Z) = \sigma(WA + b) = \frac{1}{1 + e^{-(WA + b)}}$.
    This function was implemented in the helper-function section above. It returns two values: the activation value "A" and a "cache" that contains "Z" (the input of the function):
  A, activation_cache = sigmoid(Z)
  • ReLU: $A = RELU(Z) = \max(0, Z)$.
    Also provided in the helper-function section; it likewise returns the activation value "A" and a "cache" containing "Z":
  A, activation_cache = relu(Z)

Next, merge the linear and activation steps above into a single function:
(LINEAR -> ACTIVATION) = linear_activation_forward().

Mathematically:
$A^{[l]} = g(Z^{[l]}) = g(W^{[l]} A^{[l-1]} + b^{[l]})$
where the activation "g" can be sigmoid() or relu().

# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Arguments:
    A_prev -- activations from the previous layer (or input data): (size of previous layer, number of examples)
    W -- weight matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of the current layer, 1)
    activation -- the activation used in this layer: "sigmoid" or "relu"

    Returns:
    A -- the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache"
    """
    # Inputs: "A_prev, W, b". Outputs: "A, cache".
    Z, linear_cache = linear_forward(A_prev, W, b)

    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)

    cache = (linear_cache, activation_cache)

    return A, cache
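A minimal usage sketch (arbitrary values, assuming the functions above):

np.random.seed(2)
A_prev = np.random.randn(3, 2)
W, b = np.random.randn(1, 3), np.zeros((1, 1))
A_sig, _ = linear_activation_forward(A_prev, W, b, activation="sigmoid")
A_rel, _ = linear_activation_forward(A_prev, W, b, activation="relu")
print(A_sig.shape, A_rel.shape)       # both (1, 2)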

      d) L-Layer Model

With all that in place, implementing the forward pass of an $L$-layer neural network amounts to calling linear_activation_forward with ReLU $L-1$ times, then linear_activation_forward with sigmoid once.

Exercise: implement L_model_forward(X, parameters).

Instruction: in the code below, the variable AL denotes $A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$.
(This is sometimes also written Yhat, i.e. $\hat{Y}$.)

Tips:
- Run linear_activation_forward in a loop (L-1) times.
- Don't forget to keep track of the intermediate caches in the "caches" list.

# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_activation_forward() with "sigmoid" (there is one, indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        W = parameters['W' + str(l)]
        b = parameters['b' + str(l)]
        A, cache = linear_activation_forward(A_prev, W, b, 'relu')
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], 'sigmoid')
    caches.append(cache)

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
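A quick end-to-end forward check (arbitrary values, assuming the functions above):

np.random.seed(6)
X = np.random.randn(5, 4)                         # 5 input features, 4 examples
params = initialize_parameters_deep([5, 4, 3, 1])
AL, caches = L_model_forward(X, params)
print(AL.shape, len(caches))                      # (1, 4) 3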

Great! We now have a complete forward pass that takes the input X and outputs the final-layer activations $A^{[L]}$, which contain the model's predictions. Next we use $A^{[L]}$ to compute the cost.

      5 - Cost function

We now move toward backward propagation; the first step is to compute the cost.

Exercise: compute the cross-entropy cost $J$:

$$-\frac{1}{m} \sum\limits_{i=1}^{m} \left( y^{(i)} \log\left(a^{[L](i)}\right) + (1-y^{(i)}) \log\left(1 - a^{[L](i)}\right) \right) \tag{7}$$

# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to the label predictions, shape (1, number of examples)
    Y -- true "label" vector, shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute loss from AL and Y.
    cost = -(np.dot(Y, np.log(AL).T) + np.dot(np.log(1 - AL), (1 - Y).T)) / m

    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
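A small worked example (hand-picked numbers, assuming the function above):

Y  = np.array([[1, 0, 1]])
AL = np.array([[0.8, 0.1, 0.6]])
print(compute_cost(AL, Y))   # -(log(0.8) + log(0.9) + log(0.6)) / 3, approx. 0.28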

      6 - Backward propagation module

The backward propagation module computes the gradients of the loss function with respect to the parameters.

Reminder: for a two-layer network, the derivative of the loss $\mathcal{L}(a^{[2]}, y)$ with respect to $z^{[1]}$ is obtained with the chain rule:

$$\frac{d\mathcal{L}(a^{[2]}, y)}{dz^{[1]}} = \frac{d\mathcal{L}(a^{[2]}, y)}{da^{[2]}} \frac{da^{[2]}}{dz^{[2]}} \frac{dz^{[2]}}{da^{[1]}} \frac{da^{[1]}}{dz^{[1]}} \tag{8}$$

In the end the parameters we update are W and b, so one more chain-rule step is needed:

  • $dW^{[1]} = \frac{\partial \mathcal{L}}{\partial W^{[1]}} = dz^{[1]} \times \frac{\partial z^{[1]}}{\partial W^{[1]}}$
  • $db^{[1]} = \frac{\partial \mathcal{L}}{\partial b^{[1]}} = dz^{[1]} \times \frac{\partial z^{[1]}}{\partial b^{[1]}}$

This is all of backpropagation. Like the forward pass, it is built in 3 steps:
- LINEAR backward
- LINEAR -> ACTIVATION backward, where ACTIVATION computes the derivative of the relu or sigmoid activation
- [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID backward (whole model)

      6.1 - Linear backward

For layer $l$, the linear part is $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).

Suppose we have already computed $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$. We now want $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$, which are given by the three formulas below:

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T} \tag{8}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \tag{9}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]} \tag{10}$$

      Exercise: Use the 3 formulas above to implement linear_backward().

# GRADED FUNCTION: linear_backward

def linear_backward(dZ, cache):
    """
    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db
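A minimal shape check (arbitrary values, assuming the function above):

np.random.seed(1)
dZ = np.random.randn(1, 2)
A_prev, W, b = np.random.randn(3, 2), np.random.randn(1, 3), np.zeros((1, 1))
dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))
print(dA_prev.shape, dW.shape, db.shape)   # (3, 2) (1, 3) (1, 1)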

      6.2 - Linear-Activation backward

Next, implement linear_activation_backward.

This step uses the two helper functions written earlier:

  • sigmoid_backward: implements the backward propagation for a SIGMOID unit.
  dZ = sigmoid_backward(dA, activation_cache)
  • relu_backward: implements the backward propagation for a RELU unit.
  dZ = relu_backward(dA, activation_cache)

If $g(.)$ is the activation function,
sigmoid_backward and relu_backward compute

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]}) \tag{11}$$

      Exercise: Implement the backpropagation for the LINEAR->ACTIVATION layer.

# GRADED FUNCTION: linear_activation_backward

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db
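A round-trip check through forward then backward (arbitrary values, assuming the functions above):

np.random.seed(2)
A_prev = np.random.randn(3, 2)
W, b = np.random.randn(1, 3), np.zeros((1, 1))
A, cache = linear_activation_forward(A_prev, W, b, activation="sigmoid")
dA = np.random.randn(*A.shape)                 # pretend gradient flowing back from the cost
dA_prev, dW, db = linear_activation_backward(dA, cache, activation="sigmoid")
print(dA_prev.shape, dW.shape, db.shape)       # (3, 2) (1, 3) (1, 1)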

      6.3 - L-Model Backward

Now we implement the backward pass for the whole network. Recall that in L_model_forward, at every layer we stored a cache containing (X or A, W, b, and Z). In the backward module we use those caches to compute the gradients of the parameters, iterating backwards through the layers starting from the last layer $L$; the parameters are then updated once per training iteration.

To start backpropagation we need $dA^{[L]}$:

  • $A^{[L]} = \sigma(Z^{[L]})$.
  • $dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}}$, which for the cross-entropy cost (7) is:
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL

The gradients are stored in a dictionary grads:

$$grads["dW" + str(l)] = dW^{[l]} \tag{15}$$

For example, for $l = 3$ this would store $dW^{[3]}$ in grads["dW3"].

# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1), i.e. l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)           # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)   # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer (SIGMOID -> LINEAR) gradients.
    # Inputs: "dAL, current_cache". Outputs: "grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)]"
    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, 'sigmoid')

    for l in reversed(range(L - 1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]"
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads['dA' + str(l + 1)], current_cache, 'relu')
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads
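A full forward-then-backward check (arbitrary values, assuming the functions above):

np.random.seed(3)
X = np.random.randn(5, 4)
Y = np.array([[1, 0, 1, 0]])
params = initialize_parameters_deep([5, 4, 1])
AL, caches = L_model_forward(X, params)
grads = L_model_backward(AL, Y, caches)
print(sorted(grads.keys()))   # ['dA0', 'dA1', 'dW1', 'dW2', 'db1', 'db2']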

      6.4 - Update Parameters

      In this section you will update the parameters of the model, using gradient descent:

$$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]} \tag{16}$$

$$b^{[l]} = b^{[l]} - \alpha \, db^{[l]} \tag{17}$$

where $\alpha$ is the learning rate. After computing the updated parameters, store them in the parameters dictionary.

      Exercise: Implement update_parameters() to update your parameters using gradient descent.

Instructions:
Update parameters using gradient descent on every $W^{[l]}$ and $b^{[l]}$ for $l = 1, 2, ..., L$.

# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    L = len(parameters) // 2   # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads['dW' + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads['db' + str(l+1)]

    return parameters
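To see how all the pieces fit together, here is a minimal training-loop sketch. It is not part of the graded assignment, and train is a hypothetical name; the next post builds a similar model function for image classification.

def train(X, Y, layer_dims, learning_rate=0.0075, num_iterations=1000):
    # Illustrative sketch: plain batch gradient descent built from the functions defined in this post.
    parameters = initialize_parameters_deep(layer_dims)
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)                      # forward pass
        cost = compute_cost(AL, Y)                                       # cross-entropy cost
        grads = L_model_backward(AL, Y, caches)                          # backward pass
        parameters = update_parameters(parameters, grads, learning_rate) # gradient-descent step
        if i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters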

      7 - Conclusion

      Congrats on implementing all the functions required for building a deep neural network!

      We know it was a long assignment but going forward it will only get better. The next part of the assignment is easier.

      In the next assignment you will put all these together to build two models:
      - A two-layer neural network
      - An L-layer neural network

      You will in fact use these models to classify cat vs non-cat images!
