Planar data classification with one hidden layer

1 - Packages

2 - Dataset

3 - Simple Logistic Regression

4 - Neural Network model

4.1 - Defining the neural network structure

4.2 - Initialize the model's parameters

4.3 - The Loop

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()

4.5 - Predictions

4.6 - Tuning hidden layer size (optional/ungraded exercise)

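In this assignment you will learn to:

Implement a binary-classification neural network with a single hidden layer
Use a non-linear activation function, such as tanh
Compute the cross-entropy loss
Implement forward and backward propagation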

1 - Packages

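First, run the cell below to import all the packages needed in this assignment:

numpy is the fundamental package for scientific computing with Python.
sklearn provides simple and efficient tools for data mining and data analysis.
Matplotlib is a library for plotting graphs in Python.
testCases provides some test examples to assess the correctness of your functions.
planar_utils provides various useful functions used in this assignment.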

# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets
from functools import reduce
import operator

%matplotlib inline

np.random.seed(1)  # set a seed so that the results are consistent

2 - Dataset

First, get the dataset you will work with. The following code loads a two-class "flower" dataset into the variables X and Y.

X, Y = load_planar_dataset()

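Visualize the data using matplotlib. With the points labeled 0 colored red and the points labeled 1 colored blue, the data looks like a flower. The goal of this assignment is to build a model that fits this data.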

# Visualize the data:
plt.scatter(X[0, :], X[1, :], c=Y.reshape(-1,), s=40, cmap=plt.cm.Spectral);

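You have:

A numpy array (matrix) X that contains the features (x1, x2)
A numpy array (vector) Y that contains the labels (red: 0, blue: 1)

Exercise: How many training examples do you have? What are the shapes of X and Y?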

### START CODE HERE ### (≈ 3 lines of code)
shape_X = X.shape
shape_Y = Y.shape
m = shape_X[1]  # training set size
### END CODE HERE ###

print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have m = %d training examples!' % (m))

The shape of X is: (2, 400)
The shape of Y is: (1, 400)
I have m = 400 training examples!

3 - Simple Logistic Regression

Before building a full neural network, let's first see how logistic regression performs on this problem. You can use sklearn's built-in functions to do that.

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV();
clf.fit(X.T, Y.T.reshape(-1,));

You can now plot the decision boundary of this model:

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y.reshape(-1,))
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y,LR_predictions) + np.dot(1-Y,1-LR_predictions))/float(Y.size)*100) +'% ' + "(percentage of correctly labelled datapoints)")

Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

Interpretation: The dataset is not linearly separable, so logistic regression performs poorly.
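The accuracy expression used above counts agreements with two dot products: np.dot(Y, predictions) counts the points where label and prediction are both 1, and np.dot(1-Y, 1-predictions) counts the points where both are 0. A minimal sketch showing that this equals the mean of element-wise matches; the helper names accuracy_dot and accuracy_mean and the toy arrays are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def accuracy_dot(Y, predictions):
    """Accuracy in percent using the dot-product form from the text.

    Y has shape (1, m) and predictions has shape (m,); both hold 0/1 labels.
    """
    matches = np.dot(Y, predictions) + np.dot(1 - Y, 1 - predictions)
    return matches.item() / Y.size * 100

def accuracy_mean(Y, predictions):
    """Same quantity, computed as the fraction of element-wise matches."""
    return float(np.mean(Y.reshape(-1) == predictions) * 100)

# Toy labels and predictions (hypothetical values for illustration only)
Y_toy = np.array([[1, 0, 1, 1, 0]])
pred_toy = np.array([1, 0, 0, 1, 1])

print(accuracy_dot(Y_toy, pred_toy), accuracy_mean(Y_toy, pred_toy))
```

Both forms give the same number; the second is usually easier to read.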

4 - Neural Network model

Model:

Mathematical formulation:
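A sketch of the formulation for one example $x^{(i)}$, reconstructed from the forward-propagation code in section 4.3. The notation (bracketed layer superscripts, $\hat{y}$ for the output) is an assumption chosen to match that code and the predict() threshold in section 4.5.

```latex
\begin{aligned}
z^{[1](i)} &= W^{[1]} x^{(i)} + b^{[1]} \\
a^{[1](i)} &= \tanh\left(z^{[1](i)}\right) \\
z^{[2](i)} &= W^{[2]} a^{[1](i)} + b^{[2]} \\
\hat{y}^{(i)} = a^{[2](i)} &= \sigma\left(z^{[2](i)}\right) \\
y^{(i)}_{\text{prediction}} &=
  \begin{cases}
    1 & \text{if } a^{[2](i)} > 0.5 \\
    0 & \text{otherwise}
  \end{cases}
\end{aligned}
```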

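Reminder: The general methodology for building a neural network is to:

1. Define the neural network structure (number of input units, number of hidden units, etc.)
2. Initialize the model's parameters
3. Loop:

    Implement forward propagation
    Compute the loss
    Implement backward propagation to get the gradients
    Update the parameters (gradient descent)

Build helper functions for steps 1-3 and then merge them into a single function called nn_model(). Once nn_model() is built and has learned the right parameters, you can make predictions on new data.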

4.1 - Defining the neural network structure

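Exercise: Define three variables:

n_x: the size of the input layer
n_h: the size of the hidden layer (set this to 4)
n_y: the size of the output layer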

n_x = X.shape[0] # size of input layer
n_h = 4
n_y = Y.shape[0] # size of output layer

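The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2

Note: these sizes evidently come from a separate test input (presumably from testCases) with 5 input rows and 2 output rows. For the flower dataset above, where X has shape (2, 400) and Y has shape (1, 400), the same code gives n_x = 2 and n_y = 1.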
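A minimal sketch wrapping the three assignments above into a function. The name layer_sizes and the zero-filled demo arrays are assumptions; the demo shapes were picked only so that the printed sizes 5, 4 and 2 are reproduced.

```python
import numpy as np

def layer_sizes(X, Y, n_h=4):
    """Return (n_x, n_h, n_y) for a one-hidden-layer network.

    X has shape (number of features, number of examples),
    Y has shape (number of outputs, number of examples).
    """
    n_x = X.shape[0]  # size of input layer
    n_y = Y.shape[0]  # size of output layer
    return n_x, n_h, n_y

# Hypothetical arrays with 5 features, 2 outputs and 3 examples
X_demo = np.zeros((5, 3))
Y_demo = np.zeros((2, 3))
n_x, n_h, n_y = layer_sizes(X_demo, Y_demo)
print("n_x = %d, n_h = %d, n_y = %d" % (n_x, n_h, n_y))
```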

4.2 - Initialize the model's parameters

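Exercise: Implement the function initialize_parameters().

Instructions:

Make sure your parameters' sizes are right.
Initialize the weight matrices with random values: use np.random.randn(a, b) * 0.01 to randomly initialize a matrix of shape (a, b).
Initialize the bias vectors as zeros: use np.zeros((a, b)) to initialize a matrix of shape (a, b) with zeros.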

W1 = np.random.randn(n_h,n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y,n_h) * 0.01
b2 = np.zeros((n_y, 1))

W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]
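The four initialization lines can be packaged as a function that returns a "parameters" dictionary, which is the form the later steps read from. This is a sketch under assumptions: the seed argument and the use of np.random.RandomState are additions for reproducibility, and the text does not say which seed produced the values printed above. Small random weights break the symmetry between hidden units while keeping tanh away from its flat regions.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=2):
    """Small random weights, zero biases, returned as a dictionary."""
    rng = np.random.RandomState(seed)  # hypothetical seed argument
    W1 = rng.randn(n_h, n_x) * 0.01    # (n_h, n_x)
    b1 = np.zeros((n_h, 1))            # (n_h, 1)
    W2 = rng.randn(n_y, n_h) * 0.01    # (n_y, n_h)
    b2 = np.zeros((n_y, 1))            # (n_y, 1)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = initialize_parameters(n_x=2, n_h=4, n_y=1)
for name, value in params.items():
    print(name, value.shape)
```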

4.3 - The Loop

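Task: Implement forward_propagation().

Instructions:

Look at the mathematical representation of your classifier.
You can use the function sigmoid(), which is available in the notebook (imported from planar_utils).
You can use the function np.tanh().
The steps to implement are:

Retrieve each parameter from the dictionary "parameters" (the output of initialize_parameters()).
Implement forward propagation: compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of predictions for all examples in the training set).

The values needed for backpropagation are stored in "cache", which is later given as an input to the backpropagation function.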

# Retrieve each parameter from the dictionary "parameters"
W1 = parameters["W1"]  # (n_h, n_x)
b1 = parameters["b1"]  # (n_h, 1)
W2 = parameters["W2"]  # (n_y, n_h)
b2 = parameters["b2"]  # (n_y, 1)

# Implement Forward Propagation to calculate A2 (probabilities)
Z1 = np.dot(W1, X) + b1   # (n_h, m)
A1 = np.tanh(Z1)          # (n_h, m)
Z2 = np.dot(W2, A1) + b2  # (n_y, m)
A2 = sigmoid(Z2)          # (n_y, m)

Output (the four values are consistent with the means of Z1, A1, Z2 and A2, in that order):

-0.0004997557777419902 -0.000496963353231779 0.00043818745095914653 0.500109546852431
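A self-contained sketch of the complete function, including the "cache" dictionary that the instructions mention but the snippet above does not show. The local sigmoid definition stands in for the one from planar_utils, and the demo inputs are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; stand-in for the planar_utils version."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, parameters):
    """Return the output activations A2 and a cache of intermediate values."""
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = np.dot(W1, X) + b1   # (n_h, m)
    A1 = np.tanh(Z1)          # (n_h, m)
    Z2 = np.dot(W2, A1) + b2  # (n_y, m)
    A2 = sigmoid(Z2)          # (n_y, m)

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

# Hypothetical inputs: 2 features, 3 examples, 4 hidden units
rng = np.random.RandomState(0)
X_demo = rng.randn(2, 3)
params_demo = {
    "W1": rng.randn(4, 2) * 0.01, "b1": np.zeros((4, 1)),
    "W2": rng.randn(1, 4) * 0.01, "b2": np.zeros((1, 1)),
}
A2, cache = forward_propagation(X_demo, params_demo)
print(A2.shape, sorted(cache))
```

With weights this small, every entry of A2 stays close to 0.5, which matches the last mean printed above.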

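Exercise: Implement compute_cost() to compute the value of the cost $J$: $J=-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)}+\left(1-y^{(i)}\right)\log\left(1-a^{[2](i)}\right)\right)$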

# Compute the cross-entropy cost
logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
cost = -np.sum(logprobs) / m

cost = 0.6929198937761266
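A sketch of why the cost starts near 0.693: when every prediction is 0.5, the cross-entropy equals ln 2 ≈ 0.6931 regardless of the labels. The function below takes only (A2, Y), which is a simplification; elsewhere the text calls compute_cost(A2, Y, parameters). The demo arrays are hypothetical.

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost averaged over the m examples (columns)."""
    m = Y.shape[1]
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    cost = -np.sum(logprobs) / m
    return float(np.squeeze(cost))

Y_demo = np.array([[1, 0, 1, 0]])
A2_uninformed = np.full((1, 4), 0.5)             # every prediction is 0.5
A2_confident = np.array([[0.9, 0.1, 0.9, 0.1]])  # mostly right predictions

print(compute_cost(A2_uninformed, Y_demo), np.log(2))
print(compute_cost(A2_confident, Y_demo))
```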

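Task: Implement backward_propagation().

Instructions: Backpropagation is usually the hardest (most mathematical) part of deep learning. The backpropagation formulas are given in the original figure; since the implementation is vectorized, use the six formulas on the right-hand side of that figure.

Hint: for $a=\tanh(z)$, the derivative is $\tanh'(z)=1-a^{2}$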
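The six vectorized formulas can be reconstructed from the code that follows. This is a sketch in assumed notation: $\odot$ denotes the element-wise product and the sums run over the $m$ examples (columns), matching np.sum(..., axis=1, keepdims=True).

```latex
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} A^{[1]T} \\
db^{[2]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \\
dZ^{[1]} &= W^{[2]T} dZ^{[2]} \odot \left(1 - \left(A^{[1]}\right)^{2}\right) \\
dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{T} \\
db^{[1]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
```

The factor $1 - (A^{[1]})^{2}$ in the fourth line is the tanh derivative from the hint.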

# First, retrieve W1 and W2 from the dictionary "parameters".
W1 = parameters["W1"]  # (n_h, n_x)
W2 = parameters["W2"]  # (n_y, n_h)

# Retrieve also A1 and A2 from dictionary "cache".
A1 = cache["A1"]  # (n_h, m)
A2 = cache["A2"]  # (n_y, m)

# Backward propagation: calculate dW1, db1, dW2, db2.
dZ2 = A2 - Y                                                 # (n_y, m)
dW2 = np.dot(dZ2, np.transpose(A1)) / m                      # (n_y, n_h)
db2 = np.sum(dZ2, axis=1, keepdims=True) / m                 # (n_y, 1)
dZ1 = np.dot(np.transpose(W2), dZ2) * (1 - np.power(A1, 2))  # (n_h, m)
dW1 = np.dot(dZ1, np.transpose(X)) / m                       # (n_h, n_x)
db1 = np.sum(dZ1, axis=1, keepdims=True) / m                 # (n_h, 1)

dW1 = [[ 0.01018708 -0.00708701]
 [ 0.00873447 -0.0060768 ]
 [-0.00530847  0.00369379]
 [-0.02206365  0.01535126]]
db1 = [[-0.00069728]
 [-0.00060606]
 [ 0.000364  ]
 [ 0.00151207]]
dW2 = [[ 0.00363613  0.03153604  0.01162914 -0.01318316]]
db2 = [[0.06589489]]
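One way to gain confidence in a backpropagation implementation is a numerical gradient check: perturb each parameter slightly, measure the change in cost, and compare with the analytic gradient. The sketch below is not part of the assignment. The function names, the random demo data, and the choice to key the gradients by parameter name (rather than "dW1" and so on) are all assumptions made to keep the check short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fn(params, X, Y):
    """Forward pass followed by the cross-entropy cost."""
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def backward(params, X, Y):
    """Analytic gradients, using the same formulas as the code above."""
    m = X.shape[1]
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {
        "W1": dZ1 @ X.T / m,
        "b1": np.sum(dZ1, axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m,
        "b2": np.sum(dZ2, axis=1, keepdims=True) / m,
    }

def max_gradient_error(params, X, Y, eps=1e-6):
    """Largest gap between analytic and central-difference gradients."""
    grads = backward(params, X, Y)
    worst = 0.0
    for name, value in params.items():
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus = cost_fn(params, X, Y)
            value[idx] = original - eps
            cost_minus = cost_fn(params, X, Y)
            value[idx] = original  # restore the parameter
            numeric = (cost_plus - cost_minus) / (2 * eps)
            worst = max(worst, abs(numeric - grads[name][idx]))
    return worst

# Hypothetical data: 2 features, 5 examples, 4 hidden units
rng = np.random.RandomState(0)
X_demo = rng.randn(2, 5)
Y_demo = (rng.rand(1, 5) > 0.5).astype(float)
params_demo = {
    "W1": rng.randn(4, 2), "b1": rng.randn(4, 1),
    "W2": rng.randn(1, 4), "b2": rng.randn(1, 1),
}
print(max_gradient_error(params_demo, X_demo, Y_demo))
```

A very small reported gap suggests the six formulas and the code agree; a large gap usually points to a sign, transpose or averaging mistake.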

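Task: Implement the parameter update using gradient descent: $\theta =\theta -\alpha \frac{\partial J}{\partial \theta }$, where $\alpha$ is the learning rate and $\theta$ stands for any parameter.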

# Retrieve each parameter from the dictionary "parameters"
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]

# Retrieve each gradient from the dictionary "grads"
dW1 = grads["dW1"]
db1 = grads["db1"]
dW2 = grads["dW2"]
db2 = grads["db2"]

# Update rule for each parameter
W1 -= learning_rate * dW1
b1 -= learning_rate * db1
W2 -= learning_rate * dW2
b2 -= learning_rate * db2

W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]
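Note that `W1 -= learning_rate * dW1` modifies the NumPy array in place, so the array stored in the caller's "parameters" dictionary changes too. The sketch below is a variant that works on a copy instead; whether that matters depends on how the surrounding code reuses the dictionary. The default learning_rate of 1.2 is an assumption, since the text does not state the value it used.

```python
import copy
import numpy as np

def update_parameters(parameters, grads, learning_rate=1.2):
    """One gradient-descent step that leaves the input dictionary untouched."""
    updated = copy.deepcopy(parameters)
    for name in ("W1", "b1", "W2", "b2"):
        updated[name] -= learning_rate * grads["d" + name]
    return updated

# Hypothetical parameters and gradients, all ones, to make the step visible
shapes = {"W1": (4, 2), "b1": (4, 1), "W2": (1, 4), "b2": (1, 1)}
params_demo = {name: np.ones(shape) for name, shape in shapes.items()}
grads_demo = {"d" + name: np.ones(shape) for name, shape in shapes.items()}

new_params = update_parameters(params_demo, grads_demo)
print(new_params["W2"], params_demo["W2"])
```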

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()

Task: Build your neural network model in nn_model().

# Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
parameters = initialize_parameters(n_x, n_h, n_y)
W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]

# Loop (gradient descent)
for i in range(0, num_iterations):
    # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
    A2, cache = forward_propagation(X, parameters)
    # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
    cost = compute_cost(A2, Y, parameters)
    # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
    grads = backward_propagation(parameters, cache, X, Y)
    # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
    parameters = update_parameters(parameters, grads)

W1 = [[-4.18493607  5.33220954]
 [-7.52989398  1.24306172]
 [-4.19294692  5.32632315]
 [ 7.52983649 -1.24309466]]
b1 = [[ 2.32926741]
 [ 3.7945897 ]
 [ 2.33002464]
 [-3.79468985]]
W2 = [[-6033.83672369 -6008.12980524 -6033.10095541  6008.06638241]]
b2 = [[-52.66607531]]
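For readers who want to run the whole loop without the course helper files, here is a compact, self-contained sketch of the same four steps. It is not the nn_model() from the assignment: the function name, the toy dataset (points above or below a parabola), the learning rate and the iteration count are all assumptions. The cost clips A2 away from 0 and 1 because very large weights, such as the W2 entries near ±6000 printed above, can saturate the sigmoid and make log(0) appear.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_hidden_layer(X, Y, n_h=4, num_iterations=1000,
                           learning_rate=0.5, seed=2):
    """Gradient descent on a tanh hidden layer with a sigmoid output."""
    rng = np.random.RandomState(seed)
    n_x, m = X.shape
    n_y = Y.shape[0]
    W1 = rng.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    costs = []
    for _ in range(num_iterations):
        # Forward propagation
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Cross-entropy cost, with A2 clipped to avoid log(0)
        A2c = np.clip(A2, 1e-12, 1 - 1e-12)
        costs.append(float(-np.sum(Y * np.log(A2c)
                                   + (1 - Y) * np.log(1 - A2c)) / m))
        # Backward propagation
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = np.sum(dZ1, axis=1, keepdims=True) / m
        # Gradient descent update
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, costs

# Hypothetical toy data: label 1 if the point lies above a parabola
rng = np.random.RandomState(1)
X_toy = rng.uniform(-1, 1, size=(2, 200))
Y_toy = (X_toy[1] > X_toy[0] ** 2 - 0.3).astype(float).reshape(1, -1)

params_toy, costs = train_one_hidden_layer(X_toy, Y_toy)
print("first cost: %f, last cost: %f" % (costs[0], costs[-1]))
```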

4.5 - Predictions

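Task: Build predict() to make predictions with the model above.

Reminder: $predictions = \begin{cases} 1 & \text{if } activation > 0.5 \\ 0 & \text{otherwise} \end{cases}$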

# Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
A2, cache = forward_propagation(X, parameters)
predictions = (A2 > 0.5)

predictions mean = 0.6666666666666666
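A self-contained sketch of predict() with the forward pass inlined. The hand-picked parameters are hypothetical: a single hidden unit that looks only at the first feature, so the expected labels are easy to verify by eye. The cast to int is optional and simply turns the boolean result of `A2 > 0.5` into 0/1 values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(parameters, X):
    """Return 0/1 predictions of shape (1, m), thresholding A2 at 0.5."""
    A1 = np.tanh(parameters["W1"] @ X + parameters["b1"])
    A2 = sigmoid(parameters["W2"] @ A1 + parameters["b2"])
    return (A2 > 0.5).astype(int)

# Hypothetical parameters: the output is positive exactly when x1 > 0
params_demo = {
    "W1": np.array([[1.0, 0.0]]), "b1": np.zeros((1, 1)),
    "W2": np.array([[5.0]]),      "b2": np.zeros((1, 1)),
}
X_demo = np.array([[-2.0, -0.5, 0.5, 2.0],
                   [ 0.0,  0.0, 0.0, 0.0]])

predictions = predict(params_demo, X_demo)
print(predictions, np.mean(predictions))
```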

How well does the model do on the planar "flower" data?

# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y.reshape(-1,))
plt.title("Decision Boundary for hidden layer size " + str(4))

Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219449
Cost after iteration 9000: 0.218556

Print the accuracy:

# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')

Accuracy: 90%

Accuracy is much higher than with logistic regression. Next, try several hidden layer sizes.

4.6 - Tuning hidden layer size (optional/ungraded exercise)

Observe how the model behaves for different hidden layer sizes:

# This may take about 2 minutes to run
plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y.reshape(-1,))
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 10 hidden units: 90.25 %
Accuracy for 20 hidden units: 90.0 %

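Interpretation:

Larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
The best hidden layer size seems to be around n_h = 5. A value around here fits the data well without noticeable overfitting.
You will learn about regularization later, which lets you use very large models (such as n_h = 50) without much overfitting.

What you have learned:

Build a complete neural network with one hidden layer
Make good use of a non-linear unit
Implement forward propagation and backpropagation, and train a neural network
See the impact of varying the hidden layer size, including overfitting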
