From http://deeplearning.net/tutorial/mlp.html#mlp

Multilayer Perceptron

Note: this section assumes the reader has already gone through the previous exercise, Classifying MNIST digits using Logistic Regression (http://blog.csdn.net/shouhuxianjian/article/details/46375461). Additionally, it uses new Theano functions and concepts: T.tanh, shared variables, basic arithmetic ops, T.grad, L1 and L2 regularization, floatX. If you intend to run the code on a GPU, also read GPU.

Note: the code for this section can be downloaded here.

The next architecture we are going to present using Theano is the single-hidden-layer Multi-Layer Perceptron (MLP). An MLP can be viewed as a logistic regression classifier in which the input is first transformed using a learnt non-linear transformation Phi. This transformation projects the input data into a space where it becomes linearly separable. The intermediate layer is referred to as the hidden layer. A single hidden layer is already sufficient to make MLPs universal approximators; however, we will see later that there are substantial benefits to using many such hidden layers, which is the very premise of deep learning (i.e., more than one hidden layer). See these course notes for an introduction to MLPs, the back-propagation algorithm, and how to train MLPs.

As before, this tutorial is presented on the task of MNIST digit classification.

1. The Model

An MLP (or Artificial Neural Network, ANN) with a single hidden layer can be represented graphically as follows (see the figure in the original tutorial):

Formally, a one-hidden-layer MLP is a function f: R^D -> R^L, where D is the size of the input vector x and L is the size of the output vector f(x), such that, in matrix notation:

    f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x)))

with bias vectors b^{(1)}, b^{(2)}; weight matrices W^{(1)}, W^{(2)}; and activation functions G and s.

The vector h(x) = Phi(x) = s(b^{(1)} + W^{(1)} x) constitutes the hidden layer. W^{(1)} (of size D x D_h) is the weight matrix connecting the input vector to the hidden layer. Each column W^{(1)}_{.i} represents the weights from the input units to the i-th hidden unit. Typical choices for s are tanh, with tanh(a) = (e^a - e^{-a}) / (e^a + e^{-a}), or the logistic sigmoid function, with sigmoid(a) = 1 / (1 + e^{-a}). We will use tanh in this tutorial because it typically yields faster training (and sometimes better local minima). Both tanh and sigmoid are scalar-to-scalar functions, but their natural extension to vectors and tensors consists in applying them elementwise (e.g., separately on each element of the vector, yielding a same-size vector).

The output vector is then obtained as o(x) = G(b^{(2)} + W^{(2)} h(x)). The reader should recognize this form from the previous exercise (Classifying MNIST digits using Logistic Regression). As before, class-membership probabilities can be obtained by choosing G as the softmax function (in the case of multi-class classification).

To train an MLP, we learn all the parameters of the model, and here we use Stochastic Gradient Descent with minibatches. The set of parameters to learn is theta = {W^{(2)}, b^{(2)}, W^{(1)}, b^{(1)}}. The gradients d(loss)/d(theta) can be obtained through the backpropagation algorithm (a special case of the chain rule of derivation). Thankfully, since Theano performs automatic differentiation, we will not need to cover this in the tutorial.
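Since the tutorial relies entirely on Theano's automatic differentiation, here is a minimal sketch (not part of the original tutorial) showing what T.grad does on a toy scalar expression; the variable names are purely illustrative:

import theano
import theano.tensor as T

# Hypothetical toy expression: Theano builds the symbolic gradient for us,
# which is what lets the MLP code below simply call T.grad(cost, param)
# for every parameter.
x = T.dscalar('x')
cost = x ** 2 + 3 * x
grad = T.grad(cost, x)          # symbolic derivative: 2*x + 3
f = theano.function([x], grad)
print(f(2.0))                   # prints 7.0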

2. Going from Logistic Regression to MLP

This tutorial focuses on a single-hidden-layer MLP, so we start by implementing a class that represents a hidden layer. To construct the MLP we will then only need to put a logistic regression layer on top:

class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
        and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input,W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden
                           layer
        """
        self.input = input

The initial values for the weights of a hidden layer i should be uniformly sampled from a symmetric interval that depends on the activation function. For the tanh activation function, results obtained in [Xavier10] show that the interval should be [-sqrt(6. / (fan_in + fan_out)), sqrt(6. / (fan_in + fan_out))], where fan_in is the number of units in the (i-1)-th layer and fan_out is the number of units in the i-th layer. For the sigmoid function, the interval is [-4 * sqrt(6. / (fan_in + fan_out)), 4 * sqrt(6. / (fan_in + fan_out))]. This initialization ensures that, early in training, each neuron operates in the region of its activation function where the variation is largest, so that information can easily be propagated both upward (activations flowing from inputs to outputs) and backward (gradients flowing from outputs to inputs):

        # `W` is initialized with `W_values` which is uniformely sampled
        # from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden))
        # for tanh activation function
        # the output of uniform if converted using asarray to dtype
        # theano.config.floatX so that the code is runable on GPU
        # Note : optimal initialization of weights is dependent on the
        #        activation function used (among other things).
        #        For example, results presented in [Xavier10] suggest that you
        #        should use 4 times larger initial weights for sigmoid
        #        compared to tanh
        #        We have no info for other function, so we use the same as
        #        tanh.
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

Note that we use a given nonlinear function as the activation function of the hidden layer. By default this is tanh, but in many cases we may want to use something else:

        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )

Digging into the theory, this class implements the graph that computes the hidden layer value h(x) = s(b^{(1)} + W^{(1)} x). If you give this graph's output as the input to the LogisticRegression class, introduced in the previous tutorial, you obtain the output of the MLP. A short implementation of the MLP class follows:

class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class)  while the
    top layer is a softmax layer (defined here by a ``LogisticRegression``
    class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie

        """

        # Since we are dealing with a one hidden layer MLP, this will translate
        # into a HiddenLayer with a tanh activation function connected to the
        # LogisticRegression layer; the activation function can be replaced by
        # sigmoid or any other nonlinear function
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )

In this tutorial we will also use L1 and L2 regularization (see L1 and L2 regularization). For this, we need to compute the L1 norm and the squared L2 norm of the weights W^{(1)}, W^{(2)}:

        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layer it is
        # made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params

As before, we train this model using minibatch stochastic gradient descent (MSGD). The difference is that we modify the cost function to include the regularization terms. L1_reg and L2_reg are the hyperparameters controlling the weight of these regularization terms in the total cost function. The code that computes the new cost is:

    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = (
        classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr
    )

We then update the parameters of the model using the gradients. This code is almost identical to the one for logistic regression; only the number of parameters differs. To get around this (so the code can work for any number of parameters), we use the list of parameters exposed by the model (params) and iterate over it, computing one gradient and one update per parameter:

    # compute the gradient of cost with respect to theta (sotred in params)
    # the resulting gradients will be stored in a list gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs

    # given two list the zip A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4] of
    # same length, zip generates a list C of same size, where each element
    # is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    updates = [
        (param, param - learning_rate * gparam)
        for param, gparam in zip(classifier.params, gparams)
    ]

    # compiling a Theano function `train_model` that returns the cost, but
    # in the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

3. Putting It All Together

Now that we have covered the basic concepts, writing an MLP class becomes quite easy. The code below shows how this can be done, analogously to our previous logistic regression implementation:

"""
This tutorial introduces the multilayer perceptron using Theano.A multilayer perceptron is a logistic regressor where
instead of feeding the input to the logistic regression you insert a
intermediate layer, called the hidden layer, that has a nonlinear
activation function (usually tanh or sigmoid) . One can use many such
hidden layers making the architecture deep. The tutorial will also tackle
the problem of MNIST digit classification... math::f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),References:- textbooks: "Pattern Recognition and Machine Learning" -Christopher M. Bishop, section 5"""
__docformat__ = 'restructedtext en'import os
import sys
import timeimport numpyimport theano
import theano.tensor as Tfrom logistic_sgd import LogisticRegression, load_data# start-snippet-1
class HiddenLayer(object):def __init__(self, rng, input, n_in, n_out, W=None, b=None,activation=T.tanh):"""Typical hidden layer of a MLP: units are fully-connected and havesigmoidal activation function. Weight matrix W is of shape (n_in,n_out)and the bias vector b is of shape (n_out,).NOTE : The nonlinearity used here is tanhHidden unit activation is given by: tanh(dot(input,W) + b):type rng: numpy.random.RandomState:param rng: a random number generator used to initialize weights:type input: theano.tensor.dmatrix:param input: a symbolic tensor of shape (n_examples, n_in):type n_in: int:param n_in: dimensionality of input:type n_out: int:param n_out: number of hidden units:type activation: theano.Op or function:param activation: Non linearity to be applied in the hiddenlayer"""self.input = input# end-snippet-1# `W` is initialized with `W_values` which is uniformely sampled# from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden))# for tanh activation function# the output of uniform if converted using asarray to dtype# theano.config.floatX so that the code is runable on GPU# Note : optimal initialization of weights is dependent on the#        activation function used (among other things).#        For example, results presented in [Xavier10] suggest that you#        should use 4 times larger initial weights for sigmoid#        compared to tanh#        We have no info for other function, so we use the same as#        tanh.if W is None:W_values = numpy.asarray(rng.uniform(low=-numpy.sqrt(6. / (n_in + n_out)),high=numpy.sqrt(6. / (n_in + n_out)),size=(n_in, n_out)),dtype=theano.config.floatX)if activation == theano.tensor.nnet.sigmoid:W_values *= 4W = theano.shared(value=W_values, name='W', borrow=True)if b is None:b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)b = theano.shared(value=b_values, name='b', borrow=True)self.W = Wself.b = blin_output = T.dot(input, self.W) + self.bself.output = (lin_output if activation is Noneelse activation(lin_output))# parameters of the modelself.params = [self.W, self.b]# start-snippet-2
class MLP(object):"""Multi-Layer Perceptron ClassA multilayer perceptron is a feedforward artificial neural network modelthat has one layer or more of hidden units and nonlinear activations.Intermediate layers usually have as activation function tanh or thesigmoid function (defined here by a ``HiddenLayer`` class)  while thetop layer is a softmax layer (defined here by a ``LogisticRegression``class)."""def __init__(self, rng, input, n_in, n_hidden, n_out):"""Initialize the parameters for the multilayer perceptron:type rng: numpy.random.RandomState:param rng: a random number generator used to initialize weights:type input: theano.tensor.TensorType:param input: symbolic variable that describes the input of thearchitecture (one minibatch):type n_in: int:param n_in: number of input units, the dimension of the space inwhich the datapoints lie:type n_hidden: int:param n_hidden: number of hidden units:type n_out: int:param n_out: number of output units, the dimension of the space inwhich the labels lie"""# Since we are dealing with a one hidden layer MLP, this will translate# into a HiddenLayer with a tanh activation function connected to the# LogisticRegression layer; the activation function can be replaced by# sigmoid or any other nonlinear functionself.hiddenLayer = HiddenLayer(rng=rng,input=input,n_in=n_in,n_out=n_hidden,activation=T.tanh)# The logistic regression layer gets as input the hidden units# of the hidden layerself.logRegressionLayer = LogisticRegression(input=self.hiddenLayer.output,n_in=n_hidden,n_out=n_out)# end-snippet-2 start-snippet-3# L1 norm ; one regularization option is to enforce L1 norm to# be smallself.L1 = (abs(self.hiddenLayer.W).sum()+ abs(self.logRegressionLayer.W).sum())# square of L2 norm ; one regularization option is to enforce# square of L2 norm to be smallself.L2_sqr = ((self.hiddenLayer.W ** 2).sum()+ (self.logRegressionLayer.W ** 2).sum())# negative log likelihood of the MLP is given by the negative# log likelihood of the output of the model, computed in the# logistic regression layerself.negative_log_likelihood = (self.logRegressionLayer.negative_log_likelihood)# same holds for the function computing the number of errorsself.errors = self.logRegressionLayer.errors# the parameters of the model are the parameters of the two layer it is# made out ofself.params = self.hiddenLayer.params + self.logRegressionLayer.params# end-snippet-3def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,dataset='mnist.pkl.gz', batch_size=20, n_hidden=500):"""Demonstrate stochastic gradient descent optimization for a multilayerperceptronThis is demonstrated on MNIST.:type learning_rate: float:param learning_rate: learning rate used (factor for the stochasticgradient:type L1_reg: float:param L1_reg: L1-norm's weight when added to the cost (seeregularization):type L2_reg: float:param L2_reg: L2-norm's weight when added to the cost (seeregularization):type n_epochs: int:param n_epochs: maximal number of epochs to run the optimizer:type dataset: string:param dataset: the path of the MNIST dataset file fromhttp://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz"""datasets = load_data(dataset)train_set_x, train_set_y = datasets[0]valid_set_x, valid_set_y = datasets[1]test_set_x, test_set_y = datasets[2]# compute number of minibatches for training, validation and testingn_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_sizen_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_sizen_test_batches = 
test_set_x.get_value(borrow=True).shape[0] / batch_size####################### BUILD ACTUAL MODEL #######################print '... building the model'# allocate symbolic variables for the dataindex = T.lscalar()  # index to a [mini]batchx = T.matrix('x')  # the data is presented as rasterized imagesy = T.ivector('y')  # the labels are presented as 1D vector of# [int] labelsrng = numpy.random.RandomState(1234)# construct the MLP classclassifier = MLP(rng=rng,input=x,n_in=28 * 28,n_hidden=n_hidden,n_out=10)# start-snippet-4# the cost we minimize during training is the negative log likelihood of# the model plus the regularization terms (L1 and L2); cost is expressed# here symbolicallycost = (classifier.negative_log_likelihood(y)+ L1_reg * classifier.L1+ L2_reg * classifier.L2_sqr)# end-snippet-4# compiling a Theano function that computes the mistakes that are made# by the model on a minibatchtest_model = theano.function(inputs=[index],outputs=classifier.errors(y),givens={x: test_set_x[index * batch_size:(index + 1) * batch_size],y: test_set_y[index * batch_size:(index + 1) * batch_size]})validate_model = theano.function(inputs=[index],outputs=classifier.errors(y),givens={x: valid_set_x[index * batch_size:(index + 1) * batch_size],y: valid_set_y[index * batch_size:(index + 1) * batch_size]})# start-snippet-5# compute the gradient of cost with respect to theta (sotred in params)# the resulting gradients will be stored in a list gparamsgparams = [T.grad(cost, param) for param in classifier.params]# specify how to update the parameters of the model as a list of# (variable, update expression) pairs# given two list the zip A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4] of# same length, zip generates a list C of same size, where each element# is a pair formed from the two lists :#    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]updates = [(param, param - learning_rate * gparam)for param, gparam in zip(classifier.params, gparams)]# compiling a Theano function `train_model` that returns the cost, but# in the same time updates the parameter of the model based on the rules# defined in `updates`train_model = theano.function(inputs=[index],outputs=cost,updates=updates,givens={x: train_set_x[index * batch_size: (index + 1) * batch_size],y: train_set_y[index * batch_size: (index + 1) * batch_size]})# end-snippet-5################ TRAIN MODEL ################print '... 
training'# early-stopping parameterspatience = 10000  # look as this many examples regardlesspatience_increase = 2  # wait this much longer when a new best is# foundimprovement_threshold = 0.995  # a relative improvement of this much is# considered significantvalidation_frequency = min(n_train_batches, patience / 2)# go through this many# minibatche before checking the network# on the validation set; in this case we# check every epochbest_validation_loss = numpy.infbest_iter = 0test_score = 0.start_time = time.clock()epoch = 0done_looping = Falsewhile (epoch < n_epochs) and (not done_looping):epoch = epoch + 1for minibatch_index in xrange(n_train_batches):minibatch_avg_cost = train_model(minibatch_index)# iteration numberiter = (epoch - 1) * n_train_batches + minibatch_indexif (iter + 1) % validation_frequency == 0:# compute zero-one loss on validation setvalidation_losses = [validate_model(i) for iin xrange(n_valid_batches)]this_validation_loss = numpy.mean(validation_losses)print('epoch %i, minibatch %i/%i, validation error %f %%' %(epoch,minibatch_index + 1,n_train_batches,this_validation_loss * 100.))# if we got the best validation score until nowif this_validation_loss < best_validation_loss:#improve patience if loss improvement is good enoughif (this_validation_loss < best_validation_loss *improvement_threshold):patience = max(patience, iter * patience_increase)best_validation_loss = this_validation_lossbest_iter = iter# test it on the test settest_losses = [test_model(i) for iin xrange(n_test_batches)]test_score = numpy.mean(test_losses)print(('     epoch %i, minibatch %i/%i, test error of ''best model %f %%') %(epoch, minibatch_index + 1, n_train_batches,test_score * 100.))if patience <= iter:done_looping = Truebreakend_time = time.clock()print(('Optimization complete. Best validation score of %f %% ''obtained at iteration %i, with test performance %f %%') %(best_validation_loss * 100., best_iter + 1, test_score * 100.))print >> sys.stderr, ('The code for file ' +os.path.split(__file__)[1] +' ran for %.2fm' % ((end_time - start_time) / 60.))if __name__ == '__main__':test_mlp()

The user can run this code as follows:

python code/mlp.py

The output should be of the form:

Optimization complete. Best validation score of 1.690000 % obtained at iteration 2070000, with test performance 1.650000 %
The code for file mlp.py ran for 97.34m

On an Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz the code runs at roughly 10.3 epochs/minute and reaches a test error of 1.65% after 828 epochs. To put these results in perspective, we refer the reader to this page comparing the results of different algorithms on MNIST.

4. Tips and Tricks for Training MLPs

There are several hyperparameters in the above code which are not (and, generally speaking, cannot be) optimized by gradient descent. Strictly speaking, finding an optimal set of values for these hyperparameters is not a tractable problem. First, we cannot simply optimize each of them independently. Second, we cannot readily apply the gradient techniques described earlier (partly because some parameters take discrete values while others are real-valued). Third, the optimization problem is not convex, and finding a (local) minimum would involve a non-trivial amount of work. The good news is that over the last 25 years, researchers have devised various rules of thumb for choosing the hyperparameters of a neural network. A very good overview of these tricks is Efficient BackProp by Yann LeCun, Leon Bottou, Genevieve Orr, and Klaus-Robert Mueller. Here we summarize the same issues, with an emphasis on the parameters and techniques actually used in our code.

Nonlinearity

Two of the most common activation functions are the tanh and sigmoid functions. For reasons explained in Section 4.4 (of Efficient BackProp), nonlinearities that are symmetric around the origin are preferred because they tend to produce zero-mean inputs to the next layer (a desirable property). Empirically, we have observed that tanh has better convergence properties. (Translator's note: as of 2015 there are also other activation functions such as ReLU and PReLU that interested readers may want to look into.)
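As a small numerical illustration (an addition to the tutorial, not part of it) of why zero-centred nonlinearities are preferred: for zero-mean inputs, tanh produces roughly zero-mean outputs, whereas the sigmoid's outputs cluster around 0.5, shifting the mean of what the next layer sees.

import numpy

# Compare output means of tanh and sigmoid on zero-mean random inputs.
a = numpy.random.RandomState(0).uniform(-1, 1, size=10000)
print(numpy.tanh(a).mean())                # close to 0
print((1. / (1. + numpy.exp(-a))).mean())  # close to 0.5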

Weight Initialization

At initialization we want the weights to be small enough around the origin (i.e., zero) that the activation function operates in its near-linear regime (this is clear from the sigmoid's curve, which is approximately linear around zero), where gradients are the largest. Another desirable property, especially for deep networks, is to conserve the variance of the activations as well as the variance of the back-propagated gradients from layer to layer. This allows information to flow well both upward and downward in the network and reduces discrepancies between layers. Under certain assumptions, a compromise between these two constraints leads to the following initializations:

Initialization for tanh: uniform over [-sqrt(6. / (fan_in + fan_out)), sqrt(6. / (fan_in + fan_out))]

Initialization for sigmoid: uniform over [-4 * sqrt(6. / (fan_in + fan_out)), 4 * sqrt(6. / (fan_in + fan_out))]

where fan_in is the number of inputs and fan_out the number of hidden units. For the mathematical considerations, please refer to [Xavier10].
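For reference, here is a standalone NumPy sketch of this initialization rule (the HiddenLayer class above already does the same thing with Theano shared variables); the fan_in/fan_out values are arbitrary examples.

import numpy

# Sample weights uniformly from the [Xavier10] interval for tanh,
# then scale by 4 to obtain the corresponding sigmoid initialization.
rng = numpy.random.RandomState(1234)
fan_in, fan_out = 784, 500
bound = numpy.sqrt(6. / (fan_in + fan_out))
W_tanh = rng.uniform(low=-bound, high=bound, size=(fan_in, fan_out))
W_sigmoid = 4 * W_tanh   # [Xavier10]: 4x larger interval for sigmoid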

Learning Rate

There is a great deal of literature on choosing a good learning rate. The simplest solution is to use a constant rate. Rule of thumb: try several values on a logarithmic scale (10^{-1}, 10^{-2}, ...), and narrow the (logarithmic) grid search to the region where you obtain the lowest validation error.

Decreasing the learning rate over time is sometimes a good idea. One simple rule for doing so is mu_0 / (1 + d * t), where mu_0 is the initial learning rate (typically chosen with the grid search technique described above), d is the so-called "decrease constant" which controls the rate at which the learning rate decreases (usually a small positive number, 10^{-3} or smaller), and t is the epoch / stage.
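A minimal sketch of this decay schedule (the values of mu_0 and d below are illustrative, not from the tutorial code):

# 1/(1 + d*t) learning-rate decay as described above.
mu_0 = 0.1   # initial learning rate (e.g. picked by grid search)
d = 1e-3     # decrease constant

def learning_rate(t):
    """Learning rate at epoch/stage t."""
    return mu_0 / (1. + d * t)

print([round(learning_rate(t), 4) for t in (0, 100, 1000)])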

Section 4.7 (of Efficient BackProp) details procedures for choosing a learning rate for each parameter (weight) in the network, and for choosing them adaptively based on the error of the classifier.

Number of Hidden Units

This hyperparameter is very dataset-dependent. Vaguely speaking, the more complicated the input distribution, the more capacity the network needs in order to model it, and therefore the larger the number of hidden units required (note that the number of weights in a layer, D * D_h, where D is the number of input units and D_h the number of hidden units, is probably a more direct measure of capacity).

Unless we employ some regularization scheme (early stopping or L1/L2 penalties), a plot of the number of hidden units vs. generalization performance will typically be U-shaped (i.e., there is an optimal trade-off somewhere in the middle, with performance degrading on both sides).

Regularization Parameter

Typical values to try for the L1/L2 regularization parameter lambda are 10^{-2}, 10^{-3}, .... In the framework we have described so far, optimizing this parameter will not lead to significantly better solutions, but it is nonetheless worth exploring.
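As a rough sketch (not part of the original code), one could re-run the test_mlp function defined above with a few candidate values of the L2 weight and compare the reported validation errors; the small n_epochs here is only to keep the illustration cheap:

# Try the rule-of-thumb candidate values for the L2 regularization weight.
for l2 in (1e-2, 1e-3, 1e-4):
    print('training with L2_reg = %g' % l2)
    test_mlp(L2_reg=l2, n_epochs=10)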

References:

[1] Official tutorial: http://deeplearning.net/tutorial/mlp.html#mlp

[2] Deep learning with Theano, official tutorial Chinese translation (Part 3), Multilayer Perceptron (MLP): http://www.cnblogs.com/charleshuang/p/3648804.html
