Implementing a Neural Network in Python 3 (source code available for download)

This post adapts Denny Britz's Python 2 neural network project into a Python 3 implementation (the code in this post is complete). The emphasis is on understanding the principles and the implementation; some of the translation may be imprecise, so refer to the original Python 2 article if anything is unclear. Original English article (based on Python 2)

Setting up the development environment

Install Python 3, Jupyter Notebook, and the rest of the scientific stack, such as NumPy:

pip install jupyter notebook
pip install numpy
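
To confirm the environment is ready, a quick version check like the following can help (a minimal sketch, not part of the original post; the versions printed will depend on what pip installed):

import sys
import numpy
import sklearn
import matplotlib

# Print the Python and library versions actually installed; any Python 3.x with
# reasonably recent numpy / scikit-learn / matplotlib should work for this post.
print(sys.version)
print("numpy:", numpy.__version__)
print("sklearn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)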

Generating a test dataset

# Import the packages we need
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
import matplotlib

# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

Generating the dataset

The make_moons dataset generator

# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.20)
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral)
(Output: a scatter plot of the two interleaved "moon"-shaped classes.)

逻辑回归

To demonstrate this point (about learning features), let's train a logistic regression classifier. Its input is the x- and y-coordinate values and its output is the predicted class (0 or 1). For simplicity, we use the logistic regression classifier built into scikit-learn.

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X, y)
LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
                     fit_intercept=True, intercept_scaling=1.0, max_iter=100,
                     multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
                     refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)
# Helper function to plot a decision boundary.
# If you don't fully understand this function don't worry, it just generates the contour plot below.
def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
# Plot the decision boundary
plot_decision_boundary(lambda x: clf.predict(x))
plt.title("Logistic Regression")

The graph shows the decision boundary learned by our logistic regression classifier. It separates the data as well as it can using a straight line, but it is unable to capture the "moon shape" of our data.

Training a neural network

Now let's build a three-layer neural network with one input layer, one hidden layer, and one output layer. The number of nodes in the input layer is determined by the dimensionality of our data, which is 2. Likewise, the number of nodes in the output layer is determined by the number of classes, which is also 2. (Because we have only two classes we could get away with a single output node predicting 0 or 1, but two output nodes make it easier to extend the network to more classes later.) The network takes the x and y coordinates as input and outputs two probabilities: one for class 0 ("female") and one for class 1 ("male").

How the network makes predictions

The network makes predictions using forward propagation, which is just a bunch of matrix multiplications combined with the activation function we defined earlier. If the input x to the network is 2-dimensional, we calculate its prediction $\hat{y}$ as follows:

$$
\begin{aligned}
z_1 &= x W_1 + b_1 \\
a_1 &= \tanh(z_1) \\
z_2 &= a_1 W_2 + b_2 \\
a_2 &= \hat{y} = \mathrm{softmax}(z_2)
\end{aligned}
$$

$z_i$ is the input of layer $i$ and $a_i$ is the output of layer $i$ after applying the activation function. $W_1, b_1, W_2, b_2$ are parameters of our network, which we need to learn from our training data. You can think of them as matrices transforming data between layers of the network. Looking at the matrix multiplications above we can figure out the dimensionality of these matrices. If we use 500 nodes for our hidden layer then $W_1 \in \mathbb{R}^{2 \times 500}$, $b_1 \in \mathbb{R}^{500}$, $W_2 \in \mathbb{R}^{500 \times 2}$, $b_2 \in \mathbb{R}^{2}$. Now you see why we have more parameters if we increase the size of the hidden layer.
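
As a quick sanity check on these shapes, here is a minimal, self-contained sketch of the forward pass for a hidden layer of 500 nodes (the names mirror the formulas above; the random values are only for illustration and are not the initialization used later):

import numpy as np

np.random.seed(0)
x = np.random.randn(200, 2)        # 200 examples, 2 input features

W1 = np.random.randn(2, 500)       # W1 in R^(2x500)
b1 = np.zeros((1, 500))            # b1 in R^500
W2 = np.random.randn(500, 2)       # W2 in R^(500x2)
b2 = np.zeros((1, 2))              # b2 in R^2

z1 = x.dot(W1) + b1                # shape (200, 500)
a1 = np.tanh(z1)                   # shape (200, 500)
z2 = a1.dot(W2) + b2               # shape (200, 2)
# Row-wise softmax turns the scores into per-class probabilities (y_hat)
y_hat = np.exp(z2) / np.sum(np.exp(z2), axis=1, keepdims=True)
print(y_hat.shape)                 # (200, 2); each row sums to 1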

Learning the parameters

Learning the parameters for our network means finding parameters ($W_1, b_1, W_2, b_2$) that minimize the error on our training data. But how do we define the error? We call the function that measures our error the loss function. A common choice with the softmax output is the cross-entropy loss. If we have $N$ training examples and $C$ classes then the loss for our prediction $\hat{y}$ with respect to the true labels $y$ is given by:

$$
L(y, \hat{y}) = -\frac{1}{N} \sum_{n \in N} \sum_{i \in C} y_{n,i} \log \hat{y}_{n,i}
$$

The formula looks complicated, but all it really does is sum over our training examples and add to the loss if we predicted the incorrect class. So, the further away $y$ (the correct labels) and $\hat{y}$ (our predictions) are, the greater our loss will be.
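
In code, this is just the average negative log-probability assigned to the correct class. A minimal sketch (a hypothetical helper, not from the original post), assuming probs holds the softmax outputs with shape N×C and y holds the integer class labels:

import numpy as np

def cross_entropy_loss(probs, y):
    # probs: (N, C) predicted class probabilities; y: (N,) integer labels in [0, C)
    N = probs.shape[0]
    # Select the probability assigned to the true class of each example
    correct_logprobs = -np.log(probs[np.arange(N), y])
    return np.sum(correct_logprobs) / N

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
y = np.array([0, 1])
print(cross_entropy_loss(probs, y))   # ~0.164: both predictions are confident and correct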

Remember that our goal is to find the parameters that minimize our loss function. We can use gradient descent to find its minimum. I will implement the most vanilla version of gradient descent, also called batch gradient descent with a fixed learning rate. Variations such as SGD (stochastic gradient descent) or minibatch gradient descent typically perform better in practice. So if you are serious you’ll want to use one of these, and ideally you would also decay the learning rate over time.
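
For reference, the variants differ only in how much data each parameter update sees. Here is a minimal sketch of minibatch gradient descent with a decaying learning rate (not the method used in this post; compute_gradients is a hypothetical callback standing in for the backpropagation step implemented later in build_model):

import numpy as np

def minibatch_sgd(params, X, y, compute_gradients,
                  epsilon0=0.01, decay=1e-4, batch_size=32, num_epochs=10):
    # params: dict of parameter arrays
    # compute_gradients(params, X_batch, y_batch) -> dict of gradients with the same keys
    num_examples = len(X)
    step = 0
    for epoch in range(num_epochs):
        # Shuffle once per epoch, then sweep over the data in small batches
        perm = np.random.permutation(num_examples)
        for start in range(0, num_examples, batch_size):
            idx = perm[start:start + batch_size]
            grads = compute_gradients(params, X[idx], y[idx])
            epsilon = epsilon0 / (1.0 + decay * step)   # decay the learning rate over time
            for name in params:
                params[name] -= epsilon * grads[name]
            step += 1
    return params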

As an input, gradient descent needs the gradients (vector of derivatives) of the loss function with respect to our parameters: $\frac{\partial L}{\partial W_1}$, $\frac{\partial L}{\partial b_1}$, $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$. To calculate these gradients we use the famous backpropagation algorithm, which is a way to efficiently calculate the gradients starting from the output. I won't go into detail how backpropagation works, but there are many excellent explanations (here or here) floating around the web.

Applying the backpropagation formula we find the following (trust me on this):

$$
\begin{aligned}
\delta_3 &= \hat{y} - y \\
\delta_2 &= (1 - \tanh^2 z_1) \circ \delta_3 W_2^T \\
\frac{\partial L}{\partial W_2} &= a_1^T \delta_3 \\
\frac{\partial L}{\partial b_2} &= \delta_3 \\
\frac{\partial L}{\partial W_1} &= x^T \delta_2 \\
\frac{\partial L}{\partial b_1} &= \delta_2
\end{aligned}
$$
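
Read literally, these formulas translate into numpy almost one-to-one. A sketch (assuming probs, a1, W2, and X from the forward pass and a one-hot label matrix y_onehot); the training code below uses the same expressions, except that it forms ŷ − y by subtracting 1 at the true-class positions instead of building y_onehot explicitly:

delta3 = probs - y_onehot                       # delta3 = y_hat - y
delta2 = delta3.dot(W2.T) * (1 - a1 ** 2)       # (1 - tanh^2 z1) ∘ delta3 W2^T, since a1 = tanh(z1)
dW2 = a1.T.dot(delta3)                          # dL/dW2
db2 = np.sum(delta3, axis=0, keepdims=True)     # dL/db2, summed over the examples
dW1 = X.T.dot(delta2)                           # dL/dW1
db1 = np.sum(delta2, axis=0)                    # dL/db1, summed over the examples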

Implementation

Now we are ready for our implementation. We start by defining some useful variables and parameters for gradient descent:

num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality

# Gradient descent parameters (I picked these by hand)
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

First let’s implement the loss function we defined above. We use this to evaluate how well our model is doing:

# Helper function to evaluate the total loss on the dataset
def calculate_loss(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1./num_examples * data_loss

We also implement a helper function to calculate the output of the network. It does forward propagation as defined above and returns the class with the highest probability.

# Helper function to predict an output (0 or 1)
def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

Finally, here comes the function to train our neural network. It implements batch gradient descent using the backpropagation derivatives we found above.

# This function learns parameters for the neural network and returns the model.
# - nn_hdim: Number of nodes in the hidden layer
# - num_passes: Number of passes through the training data for gradient descent
# - print_loss: If True, print the loss every 1000 iterations
def build_model(nn_hdim, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):

        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model)))

    return model

A network with a hidden layer of size 3

Let’s see what happens if we train a network with a hidden layer size of 3.

# Build a model with a 3-dimensional hidden layer
model = build_model(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")
Loss after iteration 0: 0.432387
Loss after iteration 1000: 0.068947
Loss after iteration 2000: 0.069541
Loss after iteration 3000: 0.071218
Loss after iteration 4000: 0.071253
Loss after iteration 5000: 0.071278
Loss after iteration 6000: 0.071293
Loss after iteration 7000: 0.071303
Loss after iteration 8000: 0.071308
Loss after iteration 9000: 0.071312
Loss after iteration 10000: 0.071314
Loss after iteration 11000: 0.071315
Loss after iteration 12000: 0.071315
Loss after iteration 13000: 0.071316
Loss after iteration 14000: 0.071316
Loss after iteration 15000: 0.071316
Loss after iteration 16000: 0.071316
Loss after iteration 17000: 0.071316
Loss after iteration 18000: 0.071316
Loss after iteration 19000: 0.071316

Yay! This looks pretty good. Our neural network was able to find a decision boundary that successfully separates the classes.

Varying the hidden layer size

In the example above we picked a hidden layer size of 3. Let’s now get a sense of how varying the hidden layer size affects the result.

plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    model = build_model(nn_hdim)
    plot_decision_boundary(lambda x: predict(model, x))
plt.show()

We can see that a hidden layer of low dimensionality nicely captures the general trend of our data, while higher dimensionalities are prone to overfitting. They are "memorizing" the data as opposed to fitting its general shape. If we were to evaluate our model on a separate test set (and you should!), the model with a smaller hidden layer would likely perform better because it generalizes better. We could counteract overfitting with stronger regularization, but picking the right size for the hidden layer is a much more "economical" solution.
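
One way to check this directly is to hold out part of the data before training. A minimal sketch (not part of the original post) that reuses the global-variable convention of build_model by pointing X, y, and num_examples at the training split, and uses scikit-learn's train_test_split:

from sklearn.model_selection import train_test_split

# Regenerate the data and hold out 30% of it as a test set
X_all, y_all = sklearn.datasets.make_moons(200, noise=0.20)
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size=0.3, random_state=0)

# build_model and calculate_loss read the globals X, y, and num_examples,
# so point them at the training split before training
X, y = X_train, y_train
num_examples = len(X)

for nn_hdim in [1, 3, 5, 20, 50]:
    model = build_model(nn_hdim)
    train_acc = np.mean(predict(model, X_train) == y_train)
    test_acc = np.mean(predict(model, X_test) == y_test)
    print("Hidden layer size %2d: train accuracy %.3f, test accuracy %.3f"
          % (nn_hdim, train_acc, test_acc))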

Reposted from: https://www.cnblogs.com/lanzhi/p/6467693.html
