Planar data classification with one hidden layer


1 - Packages

2 - Dataset

3 - Simple Logistic Regression

4 - Neural Network model

4.1 - Defining the neural network structure

4.2 - Initialize the model's parameters

4.3 - The Loop

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()

4.5 - Predictions

4.6 - Tuning hidden layer size (optional/ungraded exercise)


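  • Implement a binary classification neural network with a single hidden layer
  • Use a non-linear activation function, such as tanh
  • Compute the cross-entropy loss
  • Implement forward and backward propagation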

1 - Packages


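  • numpy is the fundamental package for scientific computing with Python.
  • sklearn provides simple and efficient tools for data mining and data analysis.
  • Matplotlib is a library for plotting graphs in Python.
  • testCases provides some test cases to assess the correctness of your functions.
  • planar_utils provides various useful functions used in this assignment.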
# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets
from functools import reduce
import operator

%matplotlib inline

np.random.seed(1) # set a seed so that the results are consistent

2 - Dataset


X, Y = load_planar_dataset() 


# Visualize the data:
plt.scatter(X[0, :], X[1, :], c=Y.reshape(-1,), s=40, cmap=plt.cm.Spectral);


  • a numpy array (matrix) X that contains the features (x1, x2)
  • a numpy array (vector) Y that contains the labels (red: 0, blue: 1)


### START CODE HERE ### (≈ 3 lines of code)
shape_X = X.shape
shape_Y = Y.shape
m = shape_X[1]  # training set size
### END CODE HERE ###

print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have m = %d training examples!' % (m))
The shape of X is: (2, 400)
The shape of Y is: (1, 400)
I have m = 400 training examples!

3 - Simple Logistic Regression


# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV();
clf.fit(X.T, Y.T.reshape(-1,));


# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y.reshape(-1,))
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y,LR_predictions) + np.dot(1-Y,1-LR_predictions))/float(Y.size)*100) + '% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)
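
The accuracy expression above counts correct predictions with two dot products: np.dot(Y, p) counts the examples where label and prediction are both 1, and np.dot(1-Y, 1-p) counts those where both are 0. A minimal sketch of that idea follows; the helper name accuracy_percent and the toy arrays are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def accuracy_percent(Y, predictions):
    """Percentage of matching 0/1 labels, computed with two dot products.

    Hypothetical helper for illustration; Y and predictions may have any
    shape as long as they hold the same number of 0/1 entries.
    """
    y = np.asarray(Y).reshape(-1)
    p = np.asarray(predictions).reshape(-1)
    both_one = np.dot(y, p)            # label 1 and prediction 1
    both_zero = np.dot(1 - y, 1 - p)   # label 0 and prediction 0
    return float(both_one + both_zero) / y.size * 100

# Toy data: three of the five predictions agree with the labels.
Y_demo = np.array([[1, 0, 1, 1, 0]])
pred_demo = np.array([1, 0, 0, 1, 1])
acc = accuracy_percent(Y_demo, pred_demo)
print("Accuracy: %.1f %%" % acc)
```

For 0/1 labels this gives the same number as np.mean(Y == predictions) * 100, which may be easier to read.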


4 - Neural Network model




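  1. Define the neural network structure (number of input units, number of hidden units, etc.)
  2. Initialize the model's parameters
  3. Loop:
      • Implement forward propagation
      • Compute the loss
      • Implement backward propagation to get the gradients
      • Update the parameters (gradient descent)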


4.1 - Defining the neural network structure


  • n_x: the size of the input layer
  • n_h: the size of the hidden layer (set to 4)
  • n_y: the size of the output layer
n_x = X.shape[0] # size of input layer
n_h = 4
n_y = Y.shape[0] # size of output layer
The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2
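
The three assignments above can be wrapped in a small function. This is a sketch under the assumption that X has shape (n_x, m) and Y has shape (n_y, m); the zero-filled demo arrays are placeholders. The printed sizes above (n_x = 5, n_y = 2) presumably come from a test input with a different shape than the planar dataset, which the second call below imitates.

```python
import numpy as np

def layer_sizes(X, Y, n_h=4):
    """Return (n_x, n_h, n_y) for inputs X of shape (n_x, m) and labels Y of shape (n_y, m)."""
    n_x = X.shape[0]  # size of input layer
    n_y = Y.shape[0]  # size of output layer
    return n_x, n_h, n_y

# Same shapes as the planar dataset: 2 features, 1 label row, 400 examples.
planar_sizes = layer_sizes(np.zeros((2, 400)), np.zeros((1, 400)))
print(planar_sizes)

# Assumed test-style input with 5 features and 2 output rows.
test_sizes = layer_sizes(np.zeros((5, 3)), np.zeros((2, 3)))
print(test_sizes)
```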

4.2 - Initialize the model's parameters



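  • Make sure the parameters have the correct sizes
  • Initialize the weight matrices with random values: use np.random.randn(a, b) * 0.01 to randomly initialize a matrix of shape (a, b)
  • Initialize the bias vectors with zeros: use np.zeros((a, b)) to initialize a matrix of shape (a, b)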
W1 = np.random.randn(n_h,n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y,n_h) * 0.01
b2 = np.zeros((n_y, 1))
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]
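
As a rough sketch, the four initialization lines can be collected into a function that returns the "parameters" dictionary used in the later steps. The seed argument is an assumption added here for reproducibility; the notebook's own function may differ.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=2):
    """Small random weights and zero biases for a one-hidden-layer network.

    The seed parameter is an illustrative assumption, not taken from the source.
    """
    np.random.seed(seed)
    W1 = np.random.randn(n_h, n_x) * 0.01  # (n_h, n_x)
    b1 = np.zeros((n_h, 1))                # (n_h, 1)
    W2 = np.random.randn(n_y, n_h) * 0.01  # (n_y, n_h)
    b2 = np.zeros((n_y, 1))                # (n_y, 1)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = initialize_parameters(n_x=2, n_h=4, n_y=1)
for name, value in params.items():
    print(name, value.shape)
```

Random weights break the symmetry between hidden units, and the 0.01 scaling keeps tanh away from its flat regions at the start of training.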

4.3 - The Loop



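  • Review the mathematical representation of the classifier
  • You can use the sigmoid() function, which is already available in the notebook
  • You can use the np.tanh() function
  • The steps to implement are:
      1. Retrieve each parameter from the dictionary "parameters" (the output of initialize_parameters())
      2. Implement forward propagation: compute Z1, A1, Z2 and A2 (the vector of predictions for all examples in the training set)
  • The values needed for backpropagation are stored in "cache", which is passed as an input to the backpropagation function.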
# Retrieve each parameter from the dictionary "parameters"
W1 = parameters["W1"]  # (n_h, n_x)
b1 = parameters["b1"]  # (n_h, 1)
W2 = parameters["W2"]  # (n_y, n_h)
b2 = parameters["b2"]  # (n_y, 1)

# Implement Forward Propagation to calculate A2 (probabilities)
Z1 = np.dot(W1, X) + b1  # (n_h, m)
A1 = np.tanh(Z1)  # (n_h, m)
Z2 = np.dot(W2, A1) + b2  # (n_y, m)
A2 = sigmoid(Z2)  # (n_y, m)
-0.0004997557777419902 -0.000496963353231779 0.00043818745095914653 0.500109546852431
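
The forward pass above can be sketched as a self-contained function. The sigmoid defined here stands in for the one imported from planar_utils, and the demo parameters and inputs are random placeholders rather than the notebook's test case.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; a stand-in for the helper imported from planar_utils."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, parameters):
    """Compute A2 and cache the intermediate values needed for backpropagation."""
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    Z1 = np.dot(W1, X) + b1   # (n_h, m)
    A1 = np.tanh(Z1)          # (n_h, m)
    Z2 = np.dot(W2, A1) + b2  # (n_y, m)
    A2 = sigmoid(Z2)          # (n_y, m)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

# Placeholder data: 2 features, 5 examples, 4 hidden units.
rng = np.random.default_rng(0)
demo_params = {
    "W1": rng.standard_normal((4, 2)) * 0.01,
    "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((1, 4)) * 0.01,
    "b2": np.zeros((1, 1)),
}
X_demo = rng.standard_normal((2, 5))
A2_demo, cache_demo = forward_propagation(X_demo, demo_params)
print(A2_demo.shape, float(np.mean(A2_demo)))
```

With weights this small, every output stays close to 0.5, which matches the mean of A2 printed above.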


# Compute the cross-entropy cost
logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
cost = -np.sum(logprobs) / m
cost = 0.6929198937761266
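
The cost computed above is the average cross-entropy, J = -(1/m) * sum(y * log(a) + (1 - y) * log(1 - a)). The sketch below wraps it in a function and evaluates it on two hand-made prediction vectors; the function signature and toy values are assumptions for illustration. When every prediction is 0.5, the cost equals ln 2 ≈ 0.6931, which is why a freshly initialized network reports a cost near 0.693.

```python
import numpy as np

def compute_cost(A2, Y):
    """Average cross-entropy between predictions A2 and labels Y, both of shape (1, m)."""
    m = Y.shape[1]
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    return float(-np.sum(logprobs) / m)

Y_demo = np.array([[1, 0, 1, 0]])

# An uninformed model predicts 0.5 everywhere.
cost_uninformed = compute_cost(np.full((1, 4), 0.5), Y_demo)

# A model that is confident and correct on every example.
cost_confident = compute_cost(np.array([[0.9, 0.1, 0.9, 0.1]]), Y_demo)

print("uninformed:", cost_uninformed)
print("confident: ", cost_confident)
```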




# First, retrieve W1 and W2 from the dictionary "parameters".
W1 = parameters["W1"]  # (n_h, n_x)
W2 = parameters["W2"]  # (n_y, n_h)

# Retrieve also A1 and A2 from dictionary "cache".
A1 = cache["A1"]  # (n_h, m)
A2 = cache["A2"]  # (n_y, m)

# Backward propagation: calculate dW1, db1, dW2, db2.
dZ2 = A2 - Y  # (n_y, m)
dW2 = np.dot(dZ2, np.transpose(A1)) / m  # (n_y, n_h)
db2 = np.sum(dZ2, axis=1, keepdims=True) / m  # (n_y, 1)
dZ1 = np.dot(np.transpose(W2), dZ2) * (1 - np.power(A1, 2))  # (n_h, m)
dW1 = np.dot(dZ1, np.transpose(X)) / m  # (n_h, n_x)
db1 = np.sum(dZ1, axis=1, keepdims=True) / m  # (n_h, 1)
dW1 = [[ 0.01018708 -0.00708701]
 [ 0.00873447 -0.0060768 ]
 [-0.00530847  0.00369379]
 [-0.02206365  0.01535126]]
db1 = [[-0.00069728]
 [-0.00060606]
 [ 0.000364  ]
 [ 0.00151207]]
dW2 = [[ 0.00363613  0.03153604  0.01162914 -0.01318316]]
db2 = [[0.06589489]]
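
One way to gain confidence in the backpropagation formulas above is to compare them with finite-difference gradients of the cost. The sketch below does that on a small random problem. It is not part of the original notebook: the helper names, the simplified function signatures, and the gradient dictionary keyed by parameter name are all assumptions made to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    """Forward pass returning the hidden and output activations."""
    A1 = np.tanh(np.dot(p["W1"], X) + p["b1"])
    A2 = sigmoid(np.dot(p["W2"], A1) + p["b2"])
    return A1, A2

def cost(X, Y, p):
    """Average cross-entropy of the network's predictions."""
    _, A2 = forward(X, p)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward_propagation(p, A1, A2, X, Y):
    """Analytic gradients, using the same formulas as the text above."""
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(p["W2"].T, dZ2) * (1 - np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # Keyed by parameter name (not "dW1" etc.) to simplify the comparison loop.
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def numerical_gradient(X, Y, p, name, eps=1e-6):
    """Central-difference estimate of d(cost)/d(p[name]), one entry at a time."""
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(*p[name].shape):
        original = p[name][idx]
        p[name][idx] = original + eps
        cost_plus = cost(X, Y, p)
        p[name][idx] = original - eps
        cost_minus = cost(X, Y, p)
        p[name][idx] = original  # restore the entry
        grad[idx] = (cost_plus - cost_minus) / (2 * eps)
    return grad

# Small random problem: 2 features, 6 examples, 4 hidden units.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((2, 6))
Y_demo = (rng.random((1, 6)) > 0.5).astype(float)
params = {
    "W1": rng.standard_normal((4, 2)) * 0.5,
    "b1": rng.standard_normal((4, 1)) * 0.1,
    "W2": rng.standard_normal((1, 4)) * 0.5,
    "b2": np.zeros((1, 1)),
}

A1, A2 = forward(X_demo, params)
grads = backward_propagation(params, A1, A2, X_demo, Y_demo)
max_error = max(
    float(np.max(np.abs(grads[name] - numerical_gradient(X_demo, Y_demo, params, name))))
    for name in params
)
print("largest analytic vs numerical difference:", max_error)
```

A tiny difference suggests the analytic gradients agree with the cost function; a large one usually points to a transposed matrix or a missing 1/m factor.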


# Retrieve each parameter from the dictionary "parameters"
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]

# Retrieve each gradient from the dictionary "grads"
dW1 = grads["dW1"]
db1 = grads["db1"]
dW2 = grads["dW2"]
db2 = grads["db2"]

# Update rule for each parameter
W1 -= learning_rate * dW1
b1 -= learning_rate * db1
W2 -= learning_rate * dW2
b2 -= learning_rate * db2
W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]
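
The update rule above is plain gradient descent: each parameter moves against its gradient by a step of size learning_rate. A minimal sketch as a function follows. The default learning_rate of 1.2 is an assumption, since the text does not state the value used, and unlike the in-place `-=` updates above this version returns new arrays and leaves its inputs untouched.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate=1.2):
    """One gradient descent step: theta := theta - learning_rate * d(theta).

    The default learning_rate is an illustrative assumption.
    """
    return {
        name: parameters[name] - learning_rate * grads["d" + name]
        for name in ("W1", "b1", "W2", "b2")
    }

# Toy values chosen so the result is easy to verify by hand.
old_params = {
    "W1": np.ones((4, 2)), "b1": np.ones((4, 1)),
    "W2": np.ones((1, 4)), "b2": np.ones((1, 1)),
}
demo_grads = {
    "dW1": np.full((4, 2), 0.5), "db1": np.full((4, 1), 0.5),
    "dW2": np.full((1, 4), 0.5), "db2": np.full((1, 1), 0.5),
}
new_params = update_parameters(old_params, demo_grads, learning_rate=0.1)
print(new_params["W2"])  # each entry is 1 - 0.1 * 0.5
```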

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()


# Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
parameters = initialize_parameters(n_x, n_h, n_y)
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]

# Loop (gradient descent)
for i in range(0, num_iterations):
    # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
    A2, cache = forward_propagation(X, parameters)

    # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
    cost = compute_cost(A2, Y, parameters)

    # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
    grads = backward_propagation(parameters, cache, X, Y)

    # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
    parameters = update_parameters(parameters, grads)
W1 = [[-4.18493607  5.33220954]
 [-7.52989398  1.24306172]
 [-4.19294692  5.32632315]
 [ 7.52983649 -1.24309466]]
b1 = [[ 2.32926741]
 [ 3.7945897 ]
 [ 2.33002464]
 [-3.79468985]]
W2 = [[-6033.83672369 -6008.12980524 -6033.10095541  6008.06638241]]
b2 = [[-52.66607531]]

4.5 - Predictions



# Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
A2, cache = forward_propagation(X, parameters)
predictions = (A2 > 0.5)
predictions mean = 0.6666666666666666
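
To make the thresholding step concrete, here is a small self-contained sketch of predict with hand-picked parameters. The single hidden unit and the weight values are illustrative assumptions, chosen so that the network outputs a probability above 0.5 exactly when the first feature is positive.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(parameters, X):
    """Run a forward pass and classify each column of X as 0 or 1 using a 0.5 threshold."""
    A1 = np.tanh(np.dot(parameters["W1"], X) + parameters["b1"])
    A2 = sigmoid(np.dot(parameters["W2"], A1) + parameters["b2"])
    return (A2 > 0.5).astype(int)

# Hand-picked parameters: one hidden unit that only looks at the first feature.
toy_params = {
    "W1": np.array([[1.0, 0.0]]),
    "b1": np.zeros((1, 1)),
    "W2": np.array([[5.0]]),
    "b2": np.zeros((1, 1)),
}
X_toy = np.array([[-2.0, -0.5, 0.5, 2.0],
                  [0.0, 0.0, 0.0, 0.0]])
toy_predictions = predict(toy_params, X_toy)
print(toy_predictions)  # [[0 0 1 1]]
```

Comparing A2 with 0.5 is equivalent to checking the sign of Z2, since sigmoid(0) = 0.5.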


# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y.reshape(-1,))
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219449
Cost after iteration 9000: 0.218556


# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Accuracy: 90%


4.6 - Tuning hidden layer size (optional/ungraded exercise)


# This may take about 2 minutes to run
plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y.reshape(-1,))
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))
Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 10 hidden units: 90.25 %
Accuracy for 20 hidden units: 90.0 %


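  • Larger models (with more hidden units) fit the training set better, until eventually the largest models overfit the data
  • The best hidden layer size seems to be around n_h = 5; a value around here fits the data well without noticeable overfitting
  • You will later learn about regularization, which lets you use very large models (such as n_h = 50) without much overfitting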


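  • Built a complete neural network with a single hidden layer
  • Made good use of a non-linear unit
  • Implemented forward propagation and backpropagation, and trained a neural network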
