Deep Learning: Deep Belief Networks

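[Hinton06] showed that RBMs can be stacked and trained in a greedy manner to form Deep Belief Networks (DBNs). DBNs are graphical models that learn to extract a deep hierarchical representation of the training data. They model the joint distribution between the observed vector $x$ and the $\ell$ hidden layers $h^k$ as follows:

$$P(x, h^1, \ldots, h^\ell) = \left( \prod_{k=0}^{\ell-2} P(h^k \mid h^{k+1}) \right) P(h^{\ell-1}, h^\ell)$$

where $x = h^0$, $P(h^{k-1} \mid h^k)$ is the conditional distribution of the visible units given the hidden units of the RBM at level $k$, and $P(h^{\ell-1}, h^\ell)$ is the visible-hidden joint distribution of the top-level RBM.

[Figure: a DBN drawn as a stack of RBMs]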

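The principle of greedy layer-wise unsupervised training can be applied to DBNs, with RBMs as the building blocks for each layer. The process is as follows:

1. Train the first layer as an RBM that models the raw input as its visible layer.

2. Use the first layer to obtain a representation of the input that will serve as data for the second layer. Two common choices exist: the mean activations or samples drawn from them.

3. Train the second layer as an RBM, taking the transformed data (samples or mean activations) as training examples (for the visible layer of that RBM).

4. Repeat steps 2 and 3 for the desired number of layers, each time propagating either samples or mean values upward.

5. Fine-tune all the parameters of this deep architecture with respect to a proxy for the DBN log-likelihood, or with respect to a supervised training criterion (after adding extra learning machinery that converts the learned representation into supervised predictions, e.g. a linear classifier).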
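Steps 1 to 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tutorial's Theano implementation: `TinyRBM` and `pretrain_stack` are hypothetical names, training uses CD-1 on a toy dataset, and mean activations (not samples) are passed upward. Step 5, fine-tuning, is left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyRBM:
    """Hypothetical minimal binary RBM trained with CD-1 (illustration only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.hbias = [0.0] * n_hidden
        self.vbias = [0.0] * n_visible

    def hidden_probs(self, v):
        # Mean activations P(h_j = 1 | v).
        return [sigmoid(self.hbias[j]
                        + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.hbias))]

    def visible_probs(self, h):
        # Mean activations P(v_i = 1 | h).
        return [sigmoid(self.vbias[i]
                        + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.vbias))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr):
        # Positive phase uses hidden probabilities; the reconstruction is
        # driven by a binary sample of the hidden units.
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for j in range(len(h0)):
            self.hbias[j] += lr * (h0[j] - h1[j])
        for i in range(len(v0)):
            self.vbias[i] += lr * (v0[i] - v1[i])


def pretrain_stack(data, layer_sizes, epochs=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        # Steps 1 and 3: train this layer as an RBM on the current data.
        rbm = TinyRBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        # Step 2: mean activations become the data for the next layer.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms, layer_input


data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 4
rbms, top = pretrain_stack(data, layer_sizes=[3, 2])
print(len(rbms), len(top), len(top[0]))  # 2 8 2
```

Passing `rbm.sample(rbm.hidden_probs(v))` upward instead of the probabilities would give the sampling variant mentioned in step 2.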

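Here we focus only on fine-tuning via supervised gradient descent. Specifically, we use a logistic regression classifier to classify the input x based on the output of the last hidden layer of the DBN. Fine-tuning is then performed via supervised gradient descent on the negative log-likelihood cost function. Since the supervised gradient is non-null only for the weights and hidden-layer biases of each layer (it is null for the visible biases of each RBM), this procedure is equivalent to initializing the parameters of a deep MLP with the weights and hidden-layer biases obtained with the unsupervised training strategy.

Justifying Greedy Layer-Wise Pre-Training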
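The supervised cost can be sketched as follows. This is an illustrative stand-in, not the tutorial's `LogisticRegression` class: the function names and the toy zero-valued parameters are assumptions. Note that only the weights and hidden biases of each layer enter the forward pass; RBM visible biases play no role.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """layers: list of (W, hbias) pairs, e.g. taken from pre-trained RBMs."""
    h = x
    for W, b in layers:
        h = [sigmoid(b[j] + sum(h[i] * W[i][j] for i in range(len(h))))
             for j in range(len(b))]
    return h


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def negative_log_likelihood(x, y, layers, V, c):
    """NLL of label y under a softmax classifier on the last hidden layer."""
    h = forward(x, layers)
    logits = [c[k] + sum(h[j] * V[j][k] for j in range(len(h)))
              for k in range(len(c))]
    return -math.log(softmax(logits)[y])


# Toy parameters: one hidden layer (2 -> 2) and a 3-class output layer.
layers = [([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])]
V = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
c = [0.0, 0.0, 0.0]

nll = negative_log_likelihood([1.0, 0.0], 2, layers, V, c)
print(abs(nll - math.log(3)) < 1e-12)  # True: uniform prediction over 3 classes
```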

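Why does such an algorithm work? Taking as an example a 2-layer DBN with hidden layers $h^1$ and $h^2$ (with respective weight parameters $W^1$ and $W^2$), [Hinton06] established that $\log p(x)$ can be rewritten as

$$\log p(x) = KL\left(Q(h^1 \mid x) \,\|\, p(h^1 \mid x)\right) + H_{Q(h^1 \mid x)} + \sum_{h} Q(h^1 \mid x) \left( \log p(h^1) + \log p(x \mid h^1) \right)$$

$KL\left(Q(h^1 \mid x) \,\|\, p(h^1 \mid x)\right)$ is the KL divergence between the posterior $Q(h^1 \mid x)$ of the first RBM if it were standalone and the probability $p(h^1 \mid x)$ for the same layer as defined by the entire DBN (that is, taking into account the prior $p(h^1, h^2)$ defined by the top-level RBM).

$H_{Q(h^1 \mid x)}$ is the entropy of the distribution $Q(h^1 \mid x)$.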

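It can be shown that if we initialize both hidden layers such that $W^2 = (W^1)^T$, then $Q(h^1 \mid x) = p(h^1 \mid x)$ and the KL divergence term is null. If we learn the first-level RBM and then keep its parameters $W^1$ fixed, optimizing the decomposition of $\log p(x)$ with respect to $W^2$ can therefore only increase the likelihood $p(x)$.

Also notice that if we isolate the terms that depend only on $W^2$, we get

$$\sum_{h} Q(h^1 \mid x) \log p(h^1)$$

Optimizing this with respect to $W^2$ amounts to training a second-stage RBM, using the output of $Q(h^1 \mid x)$ as the training distribution, where $x$ is sampled from the training distribution of the first RBM.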
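The decomposition of log p(x) into a KL term, an entropy term, and an expectation under Q holds for any distribution Q over the hidden layer. It can be checked numerically on a toy model. The probabilities below are made-up values for a single binary hidden unit and a binary observation, chosen only for illustration.

```python
import math

p_h = [0.3, 0.7]                        # prior p(h), made-up values
p_x_given_h = [[0.9, 0.1], [0.2, 0.8]]  # p_x_given_h[h][x], made-up values
x = 1                                   # observed value
q = [0.6, 0.4]                          # an arbitrary Q(h | x)

# Exact marginal and posterior under the model.
p_x = sum(p_h[h] * p_x_given_h[h][x] for h in range(2))
posterior = [p_h[h] * p_x_given_h[h][x] / p_x for h in range(2)]

# The three terms of the decomposition.
kl = sum(q[h] * math.log(q[h] / posterior[h]) for h in range(2))
entropy = -sum(q[h] * math.log(q[h]) for h in range(2))
expected = sum(q[h] * (math.log(p_h[h]) + math.log(p_x_given_h[h][x]))
               for h in range(2))

lhs = math.log(p_x)
rhs = kl + entropy + expected
print(abs(lhs - rhs) < 1e-12)        # True: the identity holds
print(entropy + expected <= lhs)     # True: dropping the KL term gives a lower bound
```

Because the KL term is non-negative, the remaining two terms form a lower bound on log p(x), and the bound is tight exactly when Q equals the model posterior.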

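Implementation

To implement DBNs in Theano, we use the class defined in the Restricted Boltzmann Machines tutorial. The DBN code is very similar to the SdA code, because both involve unsupervised layer-wise pre-training followed by supervised fine-tuning as a deep MLP. The main difference is that we use the RBM class instead of the dA class.

We start by defining the DBN class, which stores the layers of the MLP along with their associated RBMs. Since we use the RBMs to initialize an MLP, the code keeps the RBMs used to initialize the network as separate as possible from the MLP used for classification.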

import numpy
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

# Companion modules from the same tutorial series
from logistic_sgd import LogisticRegression
from mlp import HiddenLayer
from rbm import RBM


class DBN(object):
    """Deep Belief Network

    A deep belief network is obtained by stacking several RBMs on top of each
    other. The hidden layer of the RBM at layer `i` becomes the input of the
    RBM at layer `i+1`. The first layer RBM gets as input the input of the
    network, and the hidden layer of the last RBM represents the output. When
    used for classification, the DBN is treated as a MLP, by adding a logistic
    regression layer on top.
    """

    def __init__(self, numpy_rng, theano_rng=None, n_ins=784,
                 hidden_layers_sizes=[500, 500], n_outs=10):
        """This class is made to support a variable number of layers.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random number generator used to draw initial
                          weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `numpy_rng`

        :type n_ins: int
        :param n_ins: dimension of the input to the DBN

        :type hidden_layers_sizes: list of ints
        :param hidden_layers_sizes: intermediate layers size, must contain
                                    at least one value

        :type n_outs: int
        :param n_outs: dimension of the output of the network
        """

        self.sigmoid_layers = []
        self.rbm_layers = []
        self.params = []
        self.n_layers = len(hidden_layers_sizes)

        assert self.n_layers > 0

        if not theano_rng:
            theano_rng = MRG_RandomStreams(numpy_rng.randint(2 ** 30))

        # allocate symbolic variables for the data
        # the data is presented as rasterized images
        self.x = T.matrix('x')
        # the labels are presented as 1D vector of [int] labels
        self.y = T.ivector('y')

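self.sigmoid_layers stores the feed-forward graphs that together form the MLP, while self.rbm_layers stores the RBMs used to pre-train each layer of the MLP.

Next we construct n_layers sigmoid layers (we use the HiddenLayer class introduced in the Multilayer Perceptron tutorial, with the only modification that the non-linearity is changed from tanh to the logistic function) and n_layers RBMs, where n_layers is the depth of the model. We link the sigmoid layers so that they form an MLP, and we construct each RBM so that it shares its weight matrix and hidden bias with the corresponding sigmoid layer.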

        for i in range(self.n_layers):
            # construct the sigmoidal layer

            # the size of the input is either the number of hidden
            # units of the layer below or the input size if we are on
            # the first layer
            if i == 0:
                input_size = n_ins
            else:
                input_size = hidden_layers_sizes[i - 1]

            # the input to this layer is either the activation of the
            # hidden layer below or the input of the DBN if you are on
            # the first layer
            if i == 0:
                layer_input = self.x
            else:
                layer_input = self.sigmoid_layers[-1].output

            sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                        input=layer_input,
                                        n_in=input_size,
                                        n_out=hidden_layers_sizes[i],
                                        activation=T.nnet.sigmoid)

            # add the layer to our list of layers
            self.sigmoid_layers.append(sigmoid_layer)

            # it's arguably a philosophical question... but we are
            # going to only declare that the parameters of the
            # sigmoid_layers are parameters of the DBN. The visible
            # biases in the RBM are parameters of those RBMs, but not
            # of the DBN.
            self.params.extend(sigmoid_layer.params)

            # Construct an RBM that shares weights with this layer
            rbm_layer = RBM(numpy_rng=numpy_rng,
                            theano_rng=theano_rng,
                            input=layer_input,
                            n_visible=input_size,
                            n_hidden=hidden_layers_sizes[i],
                            W=sigmoid_layer.W,
                            hbias=sigmoid_layer.b)
            self.rbm_layers.append(rbm_layer)
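The effect of sharing the weight matrix and hidden bias can be illustrated without Theano. In the sketch below, `SigmoidLayer` and `SharedRBM` are hypothetical stand-ins for the tutorial's HiddenLayer and RBM classes, and plain Python lists play the role of Theano shared variables; the point is only that both objects refer to the same parameter storage.

```python
class SigmoidLayer:
    """Hypothetical stand-in for the tutorial's HiddenLayer."""

    def __init__(self, n_in, n_out):
        self.W = [[0.0] * n_out for _ in range(n_in)]
        self.b = [0.0] * n_out
        self.params = [self.W, self.b]


class SharedRBM:
    """Hypothetical stand-in for the RBM class; reuses W and hbias if given."""

    def __init__(self, n_visible, n_hidden, W=None, hbias=None):
        if W is None:
            W = [[0.0] * n_hidden for _ in range(n_visible)]
        if hbias is None:
            hbias = [0.0] * n_hidden
        self.W = W
        self.hbias = hbias
        self.vbias = [0.0] * n_visible  # owned by the RBM only


layer = SigmoidLayer(n_in=4, n_out=3)
rbm = SharedRBM(4, 3, W=layer.W, hbias=layer.b)

# An in-place "pre-training" update made through the RBM...
rbm.W[0][0] += 0.5
rbm.hbias[1] -= 0.25

# ...is visible through the MLP layer, because the storage is shared.
print(layer.W[0][0], layer.b[1])  # 0.5 -0.25
```

The visible bias stays outside `layer.params`, mirroring the choice in the code above to treat it as a parameter of the RBM but not of the DBN.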

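All that is left is to stack one final logistic regression layer on top. We use the LogisticRegression class introduced in the Logistic Regression tutorial.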

        self.logLayer = LogisticRegression(
            input=self.sigmoid_layers[-1].output,
            n_in=hidden_layers_sizes[-1],
            n_out=n_outs)
        self.params.extend(self.logLayer.params)

        # compute the cost for second phase of training, defined as the
        # negative log likelihood of the logistic regression (output) layer
        self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)

        # symbolic variable that points to the number of errors made on the
        # minibatch given by self.x and self.y
        self.errors = self.logLayer.errors(self.y)

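The class also provides a method that generates training functions for each of the RBMs. They are returned as a list, where element i is a function that performs one training step for the RBM at layer i.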

    def pretraining_functions(self, train_set_x, batch_size, k):
        '''Generates a list of functions, for performing one step of
        gradient descent at a given layer. The function will require
        as input the minibatch index, and to train an RBM you just
        need to iterate, calling the corresponding function on all
        minibatch indexes.

        :type train_set_x: theano.tensor.TensorType
        :param train_set_x: Shared var. that contains all datapoints used
                            for training the RBM
        :type batch_size: int
        :param batch_size: size of a [mini]batch
        :param k: number of Gibbs steps to do in CD-k / PCD-k
        '''

        index = T.lscalar('index')  # index to a minibatch

To be able to change the learning rate during training, we associate it with a Theano variable that has a default value.

        learning_rate = T.scalar('lr')  # learning rate to use

        # beginning of a batch, given `index`
        batch_begin = index * batch_size
        # ending of a batch given `index`
        batch_end = batch_begin + batch_size

        pretrain_fns = []
        for rbm in self.rbm_layers:

            # get the cost and the updates list
            # using CD-k here (persistent=None) for training each RBM.
            # TODO: change cost function to reconstruction error
            cost, updates = rbm.get_cost_updates(learning_rate,
                                                 persistent=None, k=k)

            # compile the theano function
            fn = theano.function(
                inputs=[index, theano.In(learning_rate, value=0.1)],
                outputs=cost,
                updates=updates,
                givens={
                    self.x: train_set_x[batch_begin:batch_end]
                }
            )
            # append `fn` to the list of functions
            pretrain_fns.append(fn)

        return pretrain_fns

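Now any function pretrain_fns[i] takes as arguments index and, optionally, lr (the learning rate). Note that the keyword is the name given to the Theano variable when it was created (lr), not the name of the Python variable (learning_rate). In the same fashion, the DBN class includes a method that builds the functions required for fine-tuning.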

    def build_finetune_functions(self, datasets, batch_size, learning_rate):
        '''Generates a function `train` that implements one step of
        finetuning, a function `validate` that computes the error on a
        batch from the validation set, and a function `test` that
        computes the error on a batch from the testing set

        :type datasets: list of pairs of theano.tensor.TensorType
        :param datasets: It is a list that contains all the datasets;
                         it has to contain three pairs, `train`,
                         `valid`, `test` in this order, where each pair
                         is formed of two Theano variables, one for the
                         datapoints, the other for the labels
        :type batch_size: int
        :param batch_size: size of a minibatch
        :type learning_rate: float
        :param learning_rate: learning rate used during finetune stage
        '''

        (train_set_x, train_set_y) = datasets[0]
        (valid_set_x, valid_set_y) = datasets[1]
        (test_set_x, test_set_y) = datasets[2]

        # compute number of minibatches for validation and testing
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
        n_valid_batches //= batch_size
        n_test_batches = test_set_x.get_value(borrow=True).shape[0]
        n_test_batches //= batch_size

        index = T.lscalar('index')  # index to a [mini]batch

        # compute the gradients with respect to the model parameters
        gparams = T.grad(self.finetune_cost, self.params)

        # compute list of fine-tuning updates
        updates = []
        for param, gparam in zip(self.params, gparams):
            updates.append((param, param - gparam * learning_rate))

        train_fn = theano.function(
            inputs=[index],
            outputs=self.finetune_cost,
            updates=updates,
            givens={
                self.x: train_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: train_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            }
        )

        test_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: test_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: test_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            }
        )

        valid_score_i = theano.function(
            [index],
            self.errors,
            givens={
                self.x: valid_set_x[
                    index * batch_size: (index + 1) * batch_size
                ],
                self.y: valid_set_y[
                    index * batch_size: (index + 1) * batch_size
                ]
            }
        )

        # Create a function that scans the entire validation set
        def valid_score():
            return [valid_score_i(i) for i in range(n_valid_batches)]

        # Create a function that scans the entire test set
        def test_score():
            return [test_score_i(i) for i in range(n_test_batches)]

        return train_fn, valid_score, test_score

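Note that the returned valid_score and test_score are not Theano functions but plain Python functions. Each one loops over the entire validation or test set and returns the list of per-batch errors.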
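The pattern of a per-batch function indexed by mini-batch number, wrapped by a Python function that scans the whole set, can be sketched without Theano. Everything here is illustrative: `make_batch_scorer`, the toy data, and the constant classifier are assumptions, not part of the tutorial code.

```python
def make_batch_scorer(xs, ys, predict, batch_size):
    # As in the code above, integer division drops a trailing partial batch.
    n_batches = len(xs) // batch_size

    def score_batch(index):
        # Same slicing convention: [index * batch_size, (index + 1) * batch_size)
        begin = index * batch_size
        end = begin + batch_size
        pairs = zip(xs[begin:end], ys[begin:end])
        errors = sum(1 for x, y in pairs if predict(x) != y)
        return errors / batch_size

    def score_all():
        # Plain Python loop over every mini-batch, like valid_score/test_score.
        return [score_batch(i) for i in range(n_batches)]

    return score_batch, score_all


xs = list(range(10))
ys = [x % 2 for x in xs]


def predict(x):
    return 0  # hypothetical classifier that always predicts class 0


score_batch, score_all = make_batch_scorer(xs, ys, predict, batch_size=4)
print(score_all())  # [0.5, 0.5]; the last two examples are not scored
```

Averaging the returned list gives the overall error rate over the scored batches.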

Putting It All Together

The following code constructs the deep belief network:

    numpy_rng = numpy.random.RandomState(123)
    print('... building the model')
    # construct the Deep Belief Network
    dbn = DBN(numpy_rng=numpy_rng, n_ins=28 * 28,
              hidden_layers_sizes=[1000, 1000, 1000],
              n_outs=10)

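Training this network has two stages: (1) layer-wise pre-training and (2) fine-tuning.

In the pre-training stage we loop over all the layers of the network. For each layer, we use the compiled Theano function that determines the input to the i-th level RBM and performs one step of CD-k within that RBM. This function is applied to the training set for a fixed number of epochs given by pretraining_epochs.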

    #########################
    # PRETRAINING THE MODEL #
    #########################
    print('... getting the pretraining functions')
    pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x,
                                                batch_size=batch_size,
                                                k=k)

    print('... pre-training the model')
    start_time = timeit.default_timer()
    # Pre-train layer-wise
    for i in range(dbn.n_layers):
        # go through pretraining epochs
        for epoch in range(pretraining_epochs):
            # go through the training set
            c = []
            for batch_index in range(n_train_batches):
                c.append(pretraining_fns[i](index=batch_index,
                                            lr=pretrain_lr))
            print('Pre-training layer %i, epoch %d, cost ' % (i, epoch), end=' ')
            print(numpy.mean(c, dtype='float64'))

    end_time = timeit.default_timer()

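The fine-tuning loop is very similar to the one in the Multilayer Perceptron tutorial. The only difference is that it uses the functions returned by build_finetune_functions.

Running the Code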

python code/DBN.py

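With the default parameters, the code runs 100 pre-training epochs with mini-batches of size 10. This corresponds to 500,000 unsupervised parameter updates. We use an unsupervised learning rate of 0.01 and a supervised learning rate of 0.1. The DBN itself consists of three hidden layers with 1000 units each. With early stopping, this configuration reached a minimum validation error of 1.27%, with a corresponding test error of 1.34%, after 46 supervised epochs.

On an Intel(R) Xeon(R) CPU X5560 at 2.80GHz, using a multi-threaded MKL library (running on 4 cores), pre-training took 615 minutes, an average of 2.05 minutes per (layer × epoch). Fine-tuning took only 101 minutes, or about 2.20 minutes per epoch.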
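The reported figures can be cross-checked with a few lines of arithmetic. The training-set size of 50,000 examples is an assumption (it is not stated in the text); it is the value that makes the update count come out to 500,000 for each pre-trained layer.

```python
n_train = 50000           # assumption: number of training examples
batch_size = 10
pretraining_epochs = 100
n_layers = 3

# Mini-batch updates performed while pre-training one layer.
updates_per_layer = (n_train // batch_size) * pretraining_epochs
print(updates_per_layer)  # 500000

pretrain_minutes = 615
finetune_minutes = 101
finetune_epochs = 46

# Average pre-training time per (layer x epoch) and fine-tuning time per epoch.
print(round(pretrain_minutes / (n_layers * pretraining_epochs), 2))  # 2.05
print(round(finetune_minutes / finetune_epochs, 2))                  # 2.2
```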

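Hyper-parameters were selected by optimizing the validation error. We tested several unsupervised learning rates and several supervised learning rates. We did not use any form of regularization besides early stopping, nor did we optimize over the number of pre-training updates.

Tips and Tricks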

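One way to improve the running time of the code (provided enough memory is available) is to compute the representation of the entire dataset at layer i in a single pass, once the weights of layer i-1 have been fixed. Start by training the first-layer RBM. Once it is trained, compute the hidden unit values for every example in the dataset and store them as a new dataset, which is then used to train the second-layer RBM. Once the second-layer RBM is trained, compute the dataset for layer 3 in the same way, and so on. This avoids recomputing the intermediate (hidden layer) representations on every pre-training epoch, at the expense of increased memory usage.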
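A rough cost model makes the saving concrete. The two functions below are hypothetical helpers that count how many times an example is pushed through an already-trained layer. The dataset size of 50,000 is an assumption; the layer and epoch counts are the defaults quoted earlier.

```python
def recompute_cost(n_examples, n_layers, epochs):
    """Forward passes through lower layers when inputs are recomputed."""
    # Training layer i (0-based) pushes each example through the i layers
    # below it, once per example per epoch.
    return sum(i * n_examples * epochs for i in range(n_layers))


def cached_cost(n_examples, n_layers):
    """Forward passes when each layer's output is computed once and stored."""
    # After a layer is trained, the stored dataset is pushed through it once;
    # the top layer's output is not needed for further pre-training.
    return (n_layers - 1) * n_examples


n_examples, n_layers, epochs = 50000, 3, 100
print(recompute_cost(n_examples, n_layers, epochs))  # 15000000
print(cached_cost(n_examples, n_layers))             # 100000
```

The price is memory: the cached dataset for a 1000-unit layer holds 1000 values per example.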
