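  • The superscript [l] denotes a quantity associated with layer l. For example, a^[L] is the activation of layer L, and W^[L] and b^[L] are the parameters of layer L.
  • The superscript (i) denotes a quantity associated with the i-th example. For example, x^(i) is the i-th training example.
  • The subscript i denotes the i-th entry of a vector. For example, a^[l]_i is the i-th entry of the activation vector of layer l.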

1 - Packages


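  • numpy is the fundamental package for scientific computing with Python.
  • Matplotlib is a library for plotting graphs in Python.
  • dnn_utils provides some necessary functions.
  • testCases provides test cases for checking that the functions are correct.
  • np.random.seed(1) keeps all calls to random functions consistent between runs.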
import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

2 - Outline of the Assignment




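  • Complete the LINEAR part of one forward propagation step (resulting in Z^[l]).
  • The ACTIVATION functions (relu/sigmoid) are already provided.
  • Combine the previous two steps into a new [LINEAR→ACTIVATION] forward function.
  • Stack the [LINEAR→RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR→SIGMOID] at the end (for the final layer L). This gives the new L_model_forward function.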



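  • Complete the LINEAR part of one backward propagation step.
  • The gradients of the ACTIVATION functions (relu_backward/sigmoid_backward) are already provided.
  • Combine the previous two steps into a new [LINEAR→ACTIVATION] backward function.
  • Stack the [LINEAR→RELU] backward function L-1 times and add the [LINEAR→SIGMOID] backward step. This gives the new L_model_backward function.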



3 - Initialization


3.1 - 2-layer Neural Network



  • Initialize the weight matrices randomly, using np.random.randn(shape)*0.01 with the correct shape.
  • Initialize the biases to zero, using np.zeros(shape).
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros(shape=(n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros(shape=(n_y, 1))
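As a minimal sketch, the four lines above can be wrapped in a function and checked for shapes. The function name initialize_parameters matches the helper referenced later in these notes; the fixed seed and the sizes used in the call (3, 2, 1) are arbitrary values chosen for illustration.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """Return small random weights and zero biases for a 2-layer network."""
    np.random.seed(1)  # assumption: fixed seed so the example is repeatable
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

# Illustration sizes: 3 inputs, 2 hidden units, 1 output.
params = initialize_parameters(3, 2, 1)
for name, value in params.items():
    print(name, value.shape)
```

Each weight matrix has one row per unit in its own layer and one column per unit in the previous layer, which is what makes np.dot(W, A) well defined in the forward pass.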

3.2 - L-layer Neural Network




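  • The model structure is [LINEAR→RELU] X (L-1) → LINEAR → SIGMOID: L-1 layers use the RELU activation, followed by an output layer that uses the sigmoid activation.
  • Initialize the weight matrices randomly, using np.random.randn(shape)*0.01.
  • Initialize the biases to zero, using np.zeros(shape).
  • The number of units in each layer is stored in the variable layer_dims. For example, in the "Planar Data classification model", layer_dims is [2,4,1]: two input units, one hidden layer with four hidden units, and an output layer with one output unit. This means W1 has shape (4,2), b1 has shape (4,1), W2 has shape (1,4), and b2 has shape (1,1).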
parameters = {}
L = len(layer_dims)            # number of layers in the network

for l in range(1, L):
    parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
    parameters['b' + str(l)] = np.zeros(shape=(layer_dims[l], 1))
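The shapes stated for layer_dims = [2,4,1] can be checked with a small sketch. The function name initialize_parameters_deep is the one used later in these notes; the body below is one way to write it under that assumption, not necessarily the reference solution.

```python
import numpy as np

def initialize_parameters_deep(layer_dims):
    """Build W1..W(L-1) and b1..b(L-1) for the layer sizes in layer_dims."""
    parameters = {}
    L = len(layer_dims)  # this count includes the input layer
    for l in range(1, L):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters_deep([2, 4, 1])
shapes = {name: value.shape for name, value in params.items()}
print(shapes)
```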

4 - Forward propagation module

4.1 - Linear Forward


  • [LINEAR → RELU] X (L-1) → LINEAR → SIGMOID (the whole model)




Z = np.dot(W, A) + b
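A small worked example may make the linear step concrete. The sketch below wraps the line above in a function called linear_forward (the name used elsewhere in these notes); the layout of the cache tuple and the input values are assumptions made for illustration.

```python
import numpy as np

def linear_forward(A, W, b):
    """Compute Z = W A + b and keep the inputs for the backward pass."""
    Z = np.dot(W, A) + b   # b has shape (n_l, 1) and broadcasts across the m columns
    cache = (A, W, b)      # assumed cache layout
    return Z, cache

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 features, 2 examples
W = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])    # 2 units, 3 inputs each
b = np.array([[1.0], [-1.0]])
Z, cache = linear_forward(A, W, b)
print(Z)
```

Each column of A is one example, so Z has one row per unit and one column per example.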

4.2 - Linear-Activation Forward



if activation == "sigmoid":
    # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
    Z, linear_cache = linear_forward(A_prev, W, b)
    A, activation_cache = sigmoid(Z)
elif activation == "relu":
    # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
    Z, linear_cache = linear_forward(A_prev, W, b)
    A, activation_cache = relu(Z)
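The branch above relies on sigmoid and relu from dnn_utils_v2, which are not shown in these notes. The sketch below uses stand-in versions that return the activation together with Z as the cache; that cache content, and the way the two caches are paired in the return value, are assumptions.

```python
import numpy as np

def sigmoid(Z):
    # Stand-in for the dnn_utils_v2 helper: returns (A, activation_cache).
    return 1.0 / (1.0 + np.exp(-Z)), Z

def relu(Z):
    # Stand-in for the dnn_utils_v2 helper: returns (A, activation_cache).
    return np.maximum(0, Z), Z

def linear_forward(A, W, b):
    return np.dot(W, A) + b, (A, W, b)

def linear_activation_forward(A_prev, W, b, activation):
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)
    else:
        raise ValueError("unknown activation: " + activation)
    return A, (linear_cache, activation_cache)

A_prev = np.array([[1.0, -2.0]])      # 1 feature, 2 examples
W = np.array([[1.0], [-1.0]])         # 2 units
b = np.zeros((2, 1))
A_relu, _ = linear_activation_forward(A_prev, W, b, "relu")
A_sig, _ = linear_activation_forward(A_prev, W, b, "sigmoid")
print(A_relu)
print(A_sig)
```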

4.3 - L-Layer Model




caches = []
A = X
L = len(parameters) // 2                  # number of layers in the neural network

# Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
for l in range(1, L):
    A_prev = A
    A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], "relu")
    caches.append(cache)

# Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], "sigmoid")
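To see the whole forward pass run end to end, here is a self-contained sketch on random data. The helper forward_layer is a hypothetical compact replacement for linear_activation_forward, and the final caches.append and the return statement, which the snippet above does not show, are filled in as assumptions.

```python
import numpy as np

def forward_layer(A_prev, W, b, activation):
    """Hypothetical compact LINEAR -> ACTIVATION step."""
    Z = np.dot(W, A_prev) + b
    A = np.maximum(0, Z) if activation == "relu" else 1.0 / (1.0 + np.exp(-Z))
    return A, ((A_prev, W, b), Z)

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2  # each layer contributes one W and one b
    for l in range(1, L):
        A, cache = forward_layer(A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
    AL, cache = forward_layer(A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches

rng = np.random.default_rng(0)
dims = [5, 4, 3, 1]  # illustration sizes: 5 inputs, two hidden layers, 1 output
parameters = {}
for l in range(1, len(dims)):
    parameters["W" + str(l)] = rng.standard_normal((dims[l], dims[l - 1])) * 0.01
    parameters["b" + str(l)] = np.zeros((dims[l], 1))

X = rng.standard_normal((5, 7))  # 7 examples
AL, caches = L_model_forward(X, parameters)
print(AL.shape, len(caches))
```

There is one cache per layer, which is what the backward pass later walks through in reverse order.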

5 - Cost function



cost = -np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
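The line above is the cross-entropy cost averaged over the m examples. A minimal sketch with three hand-picked predictions (illustration values, not data from the assignment) compares it against the same sum written out by hand:

```python
import numpy as np

def compute_cost(AL, Y):
    """Cross-entropy cost averaged over the examples (columns)."""
    cost = -np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    return float(np.squeeze(cost))

AL = np.array([[0.8, 0.9, 0.4]])   # predicted probabilities
Y = np.array([[1, 1, 0]])          # true labels
cost = compute_cost(AL, Y)

# The same quantity term by term: -log(p) when y = 1, -log(1 - p) when y = 0.
by_hand = -(np.log(0.8) + np.log(0.9) + np.log(0.6)) / 3
print(cost, by_hand)
```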

6 - Backward propagation module


  • LINEAR backward
  • [LINEAR→RELU] X (L-1) → LINEAR → SIGMOID backward (the whole model)

6.1 - Linear backward





dW = np.dot(dZ, np.transpose(A_prev)) / m
db = np.sum(dZ, axis=1, keepdims=True) / m   # average over examples only, one value per unit
dA_prev = np.dot(np.transpose(W), dZ)
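One way to gain confidence in these three formulas is a numerical check. The sketch below assumes the cache holds (A_prev, W, b) and computes the bias gradient as a per-unit sum over the example axis divided by m, so that db has the same shape as b. It then compares one entry of db with a central-difference estimate on a helper scalar J, which is introduced only for this check.

```python
import numpy as np

def linear_backward(dZ, cache):
    A_prev, W, b = cache   # assumed cache layout
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m   # one value per unit, shape of b
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 5))
W = rng.standard_normal((2, 3))
b = rng.standard_normal((2, 1))
dZ = rng.standard_normal((2, 5))
dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))

# Helper scalar J = sum(dZ * Z) / m with Z = W A_prev + b.
# Its gradients with respect to W and b are exactly dW and db above.
def J(W_, b_):
    return np.sum(dZ * (np.dot(W_, A_prev) + b_)) / A_prev.shape[1]

eps = 1e-6
b_plus = b.copy()
b_plus[0, 0] += eps
b_minus = b.copy()
b_minus[0, 0] -= eps
db_numeric = (J(W, b_plus) - J(W, b_minus)) / (2 * eps)
print(db_numeric, db[0, 0])
```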

6.2 - Linear-Activation backward



if activation == "relu":
    dZ = relu_backward(dA, activation_cache)
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
elif activation == "sigmoid":
    dZ = sigmoid_backward(dA, activation_cache)
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
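relu_backward and sigmoid_backward come from dnn_utils_v2 and are not shown in these notes. The stand-ins below assume the activation cache is Z and apply the chain rule dZ = dA * g'(Z); they are a sketch of what such helpers typically compute, not the library's actual source.

```python
import numpy as np

def relu_backward(dA, activation_cache):
    Z = activation_cache          # assumed cache content
    dZ = dA.copy()
    dZ[Z <= 0] = 0                # the ReLU derivative is 0 where Z <= 0, 1 elsewhere
    return dZ

def sigmoid_backward(dA, activation_cache):
    Z = activation_cache          # assumed cache content
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1 - s)       # the sigmoid derivative is s * (1 - s)

Z = np.array([[-1.0, 0.0, 2.0]])
dA = np.array([[10.0, 10.0, 10.0]])
dZ_relu = relu_backward(dA, Z)
dZ_sig = sigmoid_backward(dA, Z)
print(dZ_relu)
print(dZ_sig)
```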

6.3 - L-Model Backward



dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
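The expression for dAL follows from differentiating the per-example cross-entropy loss with respect to the output activation. A short derivation, writing a for one entry of AL and y for the matching label:

```latex
\mathcal{L}(a, y) = -\bigl( y \log a + (1 - y) \log (1 - a) \bigr)
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial a} = -\left( \frac{y}{a} - \frac{1 - y}{1 - a} \right)
```

The 1/m factor of the averaged cost does not appear here; in the code of this section it is applied in the linear backward step, where dW and db are divided by m.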


Exercise: Implement backpropagation for the [LINEAR→RELU] X (L-1) → LINEAR → SIGMOID model.

grads = {}
L = len(caches) # the number of layers
m = AL.shape[1]
Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL

# Initializing the backpropagation
dAL = -(np.divide(Y, AL) - np.divide(1-Y, 1-AL))

# Lth layer (SIGMOID -> LINEAR) gradients.
# Inputs: "AL, Y, caches".
# Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
current_cache = caches[L-1]
grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = \
    linear_activation_backward(dAL, current_cache, "sigmoid")

for l in reversed(range(L - 1)):
    # lth layer: (RELU -> LINEAR) gradients.
    # Inputs: "grads["dA" + str(l + 2)], caches".
    # Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)]
    current_cache = caches[l]
    grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)] = \
        linear_activation_backward(grads["dA" + str(l + 2)], current_cache, "relu")
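A gradient check is a common way to validate a backward pass like this one. The sketch below is a compact, self-contained reimplementation under stated assumptions, not the assignment's code: it keeps dA in a local variable instead of storing it in grads, uses its own cache layout (A_prev, W, Z), and compares one analytic gradient entry with a central-difference estimate on a tiny random network.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, parameters):
    """[LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID; caches hold (A_prev, W, Z)."""
    caches, A, L = [], X, len(parameters) // 2
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = np.dot(W, A) + b
        caches.append((A, W, Z))
        A = sigmoid(Z) if l == L else np.maximum(0, Z)
    return A, caches

def cost(AL, Y):
    return -np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))

def backward(AL, Y, caches):
    grads, L, m = {}, len(caches), AL.shape[1]
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    for l in reversed(range(1, L + 1)):
        A_prev, W, Z = caches[l - 1]
        if l == L:
            s = sigmoid(Z)
            dZ = dA * s * (1 - s)      # sigmoid output layer
        else:
            dZ = dA * (Z > 0)          # ReLU hidden layers
        grads["dW" + str(l)] = np.dot(dZ, A_prev.T) / m
        grads["db" + str(l)] = np.sum(dZ, axis=1, keepdims=True) / m
        dA = np.dot(W.T, dZ)           # gradient passed to the previous layer
    return grads

rng = np.random.default_rng(1)
dims = [4, 3, 2, 1]                    # illustration sizes
parameters = {}
for l in range(1, len(dims)):
    parameters["W" + str(l)] = rng.standard_normal((dims[l], dims[l - 1]))
    parameters["b" + str(l)] = rng.standard_normal((dims[l], 1))
X = rng.standard_normal((4, 6))
Y = (rng.random((1, 6)) > 0.5).astype(float)

AL, caches = forward(X, parameters)
grads = backward(AL, Y, caches)

# Central-difference estimate for one weight of the first layer.
eps = 1e-6
W1 = parameters["W1"]
original = W1[0, 0]
W1[0, 0] = original + eps
cost_plus = cost(forward(X, parameters)[0], Y)
W1[0, 0] = original - eps
cost_minus = cost(forward(X, parameters)[0], Y)
W1[0, 0] = original
numeric = (cost_plus - cost_minus) / (2 * eps)
print(numeric, grads["dW1"][0, 0])
```

If the two printed numbers agree to several decimal places, the chain of per-layer gradients is consistent with the cost function.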

6.4 - Update Parameters




L = len(parameters) // 2 # number of layers in the neural network

# Update rule for each parameter. Use a for loop.
for l in range(L):
    parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - \
        learning_rate * grads["dW" + str(l+1)]
    parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - \
        learning_rate * grads["db" + str(l+1)]
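The update is plain gradient descent: each parameter moves against its gradient, scaled by the learning rate. A tiny sketch with made-up numbers (a single layer, learning rate 0.1) shows one step:

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return parameters

# Illustration values only.
parameters = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[10.0, -10.0]]), "db1": np.array([[1.0]])}
parameters = update_parameters(parameters, grads, learning_rate=0.1)
print(parameters["W1"], parameters["b1"])
```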

4.2 Deep Neural Network for Image Classification: Application

1 - Packages

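  • numpy is the fundamental package for scientific computing with Python.
  • Matplotlib is a library for plotting graphs in Python.
  • h5py is a common package for interacting with datasets stored in H5 files.
  • PIL and scipy are used at the end to test the model with your own picture.
  • dnn_app_utils provides the functions implemented in assignment 4.1.
  • np.random.seed(1) keeps all calls to random functions consistent between runs.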
import time
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
from dnn_app_utils_v2 import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

2 - Dataset



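  • A training set of m_train images labelled as cat (1) or non-cat (0).
  • A test set of m_test images labelled as cat or non-cat.
  • Each image has shape (num_px, num_px, 3), where 3 is the number of RGB channels.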
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

# Explore your dataset
m_train = train_x_orig.shape[0]
num_px = train_x_orig.shape[1]
m_test = test_x_orig.shape[0]

print ("Number of training examples: " + str(m_train))
print ("Number of testing examples: " + str(m_test))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_x_orig shape: " + str(train_x_orig.shape))
print ("train_y shape: " + str(train_y.shape))
print ("test_x_orig shape: " + str(test_x_orig.shape))
print ("test_y shape: " + str(test_y.shape))
Number of training examples: 209
Number of testing examples: 50
Each image is of size: (64, 64, 3)
train_x_orig shape: (209, 64, 64, 3)
train_y shape: (1, 209)
test_x_orig shape: (50, 64, 64, 3)
test_y shape: (1, 50)


# Example of a picture
index = 50
print ("y = " + str(train_y[0,index]) + ". It's a " + classes[train_y[0,index]].decode("utf-8") +  " picture.")
y = 1. It's a cat picture.


# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T   # The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

# Standardize data to have feature values between 0 and 1.
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.

print ("train_x's shape: " + str(train_x.shape))
print ("test_x's shape: " + str(test_x.shape))
train_x's shape: (12288, 209)
test_x's shape: (12288, 50)
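The reshape-then-transpose idiom is easy to get subtly wrong, so a small sketch on a toy array may help. The sizes (4 images of 2 by 2 pixels) are chosen for illustration. It shows that reshape(m, -1).T puts each image in its own column, while reshaping straight to (-1, m) yields the same shape but mixes pixels from different images.

```python
import numpy as np

m, num_px = 4, 2
images = np.arange(m * num_px * num_px * 3).reshape(m, num_px, num_px, 3)

flat = images.reshape(images.shape[0], -1).T   # shape (num_px * num_px * 3, m)
wrong = images.reshape(-1, m)                  # same shape, but columns are not images

print(flat.shape, wrong.shape)
print(np.array_equal(flat[:, 1], images[1].ravel()))    # column 1 is image 1
print(np.array_equal(wrong[:, 1], images[1].ravel()))   # column 1 is a mix of images
```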

3 - Architecture of your model


3.1 - 2-layer neural network

3.2 - L-layer deep neural network

3.3 - General methodology




  • Forward propagation
  • Compute the cost function
  • Backward propagation
  • Update the parameters


4 - Two-layer neural network


def initialize_parameters(n_x, n_h, n_y):
    ...
    return parameters

def linear_activation_forward(A_prev, W, b, activation):
    ...
    return A, cache

def compute_cost(AL, Y):
    ...
    return cost

def linear_activation_backward(dA, cache, activation):
    ...
    return dA_prev, dW, db

def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters


n_x = 12288     # num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = (n_x, n_h, n_y)


# GRADED FUNCTION: two_layer_model

def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """

    np.random.seed(1)
    grads = {}
    costs = []                              # to keep track of the cost
    m = X.shape[1]                           # number of examples
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
    parameters = initialize_parameters(n_x, n_h, n_y)

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".
        A1, cache1 = linear_activation_forward(X, W1, b1, "relu")
        A2, cache2 = linear_activation_forward(A1, W2, b2, "sigmoid")

        # Compute cost
        cost = compute_cost(A2, Y)

        # Initializing backward propagation
        dA2 = -(np.divide(Y, A2) - np.divide(1-Y, 1-A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, "sigmoid")
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, "relu")

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters


parameters = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2500, print_cost=True)


predictions_train = predict(train_x, train_y, parameters)
Accuracy: 1.0


predictions_test = predict(test_x, test_y, parameters)
Accuracy: 0.72
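The predict function comes from dnn_app_utils_v2 and is not shown in these notes. A plausible reading of its output is that it runs a forward pass, thresholds the sigmoid outputs at 0.5, and reports the fraction of matching labels. The helper below is hypothetical and sketches only the thresholding and accuracy step, using made-up probabilities.

```python
import numpy as np

def predictions_from_probabilities(AL, Y):
    """Hypothetical helper: threshold sigmoid outputs at 0.5 and compute accuracy."""
    p = (AL > 0.5).astype(float)
    accuracy = float(np.mean(p == Y))
    return p, accuracy

AL = np.array([[0.9, 0.2, 0.6, 0.4]])   # made-up model outputs
Y = np.array([[1, 0, 0, 0]])            # made-up labels
p, accuracy = predictions_from_probabilities(AL, Y)
print("Accuracy: " + str(accuracy))
```

The gap between the training accuracy (1.0) and the test accuracy (0.72) reported above suggests that the two-layer model overfits the 209 training images.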

5 - L-layer Neural Network

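Question: Use the helper functions implemented in the previous section to build an L-layer neural network with the following structure: [LINEAR→RELU] X (L-1) → LINEAR → SIGMOID.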

def initialize_parameters_deep(layer_dims):
    ...
    return parameters

def L_model_forward(X, parameters):
    ...
    return AL, caches

def compute_cost(AL, Y):
    ...
    return cost

def L_model_backward(AL, Y, caches):
    ...
    return grads

def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters


layers_dims = [12288, 20, 7, 5, 1] #  4-layer model (the input layer is not counted)


# GRADED FUNCTION: L_layer_model

def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False): #lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    np.random.seed(1)
    costs = []                         # keep track of cost

    # Parameters initialization.
    parameters = initialize_parameters_deep(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        AL, caches = L_model_forward(X, parameters)

        # Compute cost.
        cost = compute_cost(AL, Y)

        # Backward propagation.
        grads = L_model_backward(AL, Y, caches)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters


parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True)


pred_train = predict(train_x, train_y, parameters)
Accuracy: 0.9856459330143541


pred_test = predict(test_x, test_y, parameters)
Accuracy: 0.8

6 - Results Analysis

print_mislabeled_images(classes, test_x, test_y, pred_test)


7 - Test with your own image (optional/ungraded exercise)

%matplotlib inline
my_image = "my_image.jpg" # change this to the name of your image file
my_label_y = [1] # the true class of your image (1 -> cat, 0 -> non-cat)

# Note: ndimage.imread and scipy.misc.imresize have been removed from recent SciPy releases,
# so this cell only runs with the older SciPy version the assignment was written for.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(num_px,num_px)).reshape((num_px*num_px*3,1))
my_predicted_image = predict(my_image, my_label_y, parameters)

plt.imshow(image)
print ("y = " + str(np.squeeze(my_predicted_image)) + ", your L-layer model predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") +  "\" picture.")
Accuracy: 1.0
y = 1.0, your L-layer model predicts a "cat" picture.
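The cell above depends on ndimage.imread and scipy.misc.imresize, which are no longer available in recent SciPy releases. As a hedged alternative, the same load, resize and flatten steps can be sketched with PIL, which these notes already import. The helper name image_to_column is hypothetical, and a synthetic image stands in for the file on disk so that the example runs without one.

```python
import numpy as np
from PIL import Image

def image_to_column(img, num_px):
    """Hypothetical helper: resize a PIL image to (num_px, num_px) RGB and flatten it to a column."""
    resized = img.convert("RGB").resize((num_px, num_px))
    return np.array(resized).reshape((num_px * num_px * 3, 1))

# Stand-in for Image.open("images/my_image.jpg"): a synthetic 100x80 image.
img = Image.new("RGB", (100, 80), color=(255, 0, 0))
column = image_to_column(img, num_px=64)
print(column.shape)
```

The training inputs in this walkthrough were divided by 255, whereas the original cell passes raw pixel values to predict. Dividing the column by 255 first would keep the input scale consistent with training.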


