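Introduction

A restricted Boltzmann machine (RBM) is a simple, unsupervised neural network that learns to reconstruct its input. It first maps the inputs to a set of outputs (hidden activations) that represent them; those outputs can then be used to reconstruct the inputs in the reverse direction. Through repeated forward and backward passes, the trained network learns to extract the most important features of the input.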

Why are RBMs important?

Because they can automatically extract important features from the input.

What are RBMs used for?

• Collaborative filtering
• Dimensionality reduction
• Classification
• Feature learning
• Topic modeling / LDA
• Building deep belief networks (DBN)

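Are RBMs generative models?

The difference between generative and discriminative models

Discriminative model: consider a classification problem, such as telling a sedan from an SUV based on some features of the car. Given a training set, an algorithm such as logistic regression tries to find a straight line that serves as the decision boundary separating sedans from SUVs.
Generative model: looking at the sedans, we build a model of what a sedan looks like; then, looking at the SUVs, we build a separate model of what an SUV looks like. Finally, to classify a new car, we compare it against both models and see which one it matches better.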

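DBN: principles and practice

Recognizing handwritten digits with a DBN

One problem with traditional multilayer perceptrons and other neural networks is that backpropagation can get stuck in a local minimum. When the error surface contains several valleys, gradient descent may settle in one that is not the deepest. Below you will see how a DBN addresses this problem.

Deep belief networks

A deep belief network mitigates the local-minimum problem with an additional pre-training step. Pre-training is done before backpropagation and leaves the network with an error rate that is not far from the optimum, i.e. in the neighborhood of a good solution. Backpropagation then slowly reduces the error from there. A DBN has two main parts. The first is a stack of restricted Boltzmann machines, used to pre-train the network. The second is a feed-forward backpropagation network that fine-tunes the result of the RBM stack.

1. Load the libraries needed for the deep belief network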

In [4]:

# urllib is used to download the utils file from deeplearning.net
import urllib.request
response = urllib.request.urlopen('http://deeplearning.net/tutorial/code/utils.py')
content = response.read().decode('utf-8')
target = open('utils.py', 'w')
target.write(content)
target.close()
# Import the math function for calculations
import math
# Tensorflow library. Used to implement machine learning models
import tensorflow as tf
# Numpy contains helpful functions for efficient mathematical calculations
import numpy as np
# Image library for image manipulation
from PIL import Image
# import Image
# Utils file
from utils import tile_raster_images

2. Build the RBM layers

https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
http://deeplearning.net/tutorial/rbm.html
http://deeplearning4j.org/restrictedboltzmannmachine.html
http://imonad.com/rbm/restricted-boltzmann-machine/
Deep Learning Methods: Restricted Boltzmann Machines (RBM), Part 2: Network Model

In [5]:

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
#!pip install pillow
from PIL import Image
# import Image
from utils import tile_raster_images
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

WARNING:tensorflow:From <ipython-input-6-e2010a986a64>:1: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From C:\Users\ljt\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From C:\Users\ljt\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
WARNING:tensorflow:From C:\Users\ljt\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
WARNING:tensorflow:From C:\Users\ljt\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
WARNING:tensorflow:From C:\Users\ljt\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
WARNING:tensorflow:From C:\Users\ljt\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.

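Layers of an RBM

An RBM has two layers. The first is the visible layer (or input layer); the second is the hidden layer. Each image in the MNIST dataset has 784 pixels, so the visible layer must have 784 input nodes. The hidden layer here has i neurons. Each neuron has a binary state, denoted s_i. Given the j input units, a logistic function produces a probability that decides whether each hidden unit is on (s_i = 1) or off (s_i = 0). Here we take i = 500.
Each node in the first layer has a bias, denoted vb;
each node in the second layer also has a bias, denoted hb.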

In [7]:

print("trX:",trX.shape)
print("trY:",trY.shape)
print("teX:",teX.shape)
print("teY:",teY.shape)

trX: (55000, 784)
trY: (55000, 10)
teX: (10000, 784)
teY: (10000, 10)

In [8]:

vb = tf.placeholder("float", [784])
hb = tf.placeholder("float", [500])
vb

Out[8]:

<tf.Tensor 'Placeholder:0' shape=(784,) dtype=float32>

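Define the weights between the visible and hidden layers. Rows correspond to input nodes and columns to output (hidden) nodes, so the weight matrix W has shape 784x500.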

In [10]:

W = tf.placeholder("float", [784, 500])
W

Out[10]:

<tf.Tensor 'Placeholder_3:0' shape=(784, 500) dtype=float32>
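To make the shapes concrete, here is a minimal NumPy sketch of the same parameters and of the hidden-unit probabilities they produce. The small random initialization of W and the random stand-in batch X are assumptions for illustration only; the tutorial itself feeds zero-initialized arrays into TensorFlow placeholders.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 784, 500           # sizes used in the text
rng = np.random.default_rng(0)

# Assumption: small random weights, only so the probabilities are not all 0.5
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
vb = np.zeros(n_visible)                 # one bias per visible node
hb = np.zeros(n_hidden)                  # one bias per hidden node

X = rng.random((3, n_visible))           # stand-in for 3 flattened MNIST images
p_h = sigmoid(X @ W + hb)                # probability that each hidden unit is on

print(W.shape, vb.shape, hb.shape, p_h.shape)
```

Each row of p_h holds 500 probabilities, one per hidden unit, for the corresponding input row.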

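What can a trained RBM do?

Once trained, an RBM can compute the probability of an event (such as a slippery road) given some hidden values (such as rain). In other words, an RBM can be viewed as a generative model that assigns a probability to every possible binary state vector. How many such vectors are there? Each visible unit can be in one of two states (0 or 1), so the visible layer has many possible configurations. For example, with only 7 input units there are 2^7 = 128 configurations, each with its own probability (here we assume there are no biases):

(0,0,0,0,0,0,0) --> p(config1)=p(v1)=p(s1=0,s2=0, …, s7=0)
(0,0,0,0,0,0,1) --> p(config2)=p(v2)=p(s1=0,s2=0, …, s7=1)
(0,0,0,0,0,1,0) --> p(config3)=p(v3)=p(s1=0, …, s6=1,s7=0)
(0,0,0,0,0,1,1) --> p(config4)=p(v4)=p(s1=0, …, s6=1,s7=1)
etc.
So with 784 units, the model defines a probability distribution P(v) over all 2^784 possible inputs.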
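For a visible layer this small, the distribution can be enumerated exactly. The sketch below assumes the standard bias-free RBM energy E(v, h) = -vᵀWh, which the text does not spell out, together with a hypothetical 3-unit hidden layer and random weights. Summing exp(-E) over all binary hidden vectors gives an unnormalized probability of ∏_j (1 + exp(v·W[:, j])) for each visible configuration.

```python
import itertools
import numpy as np

n_visible, n_hidden = 7, 3               # 7 visible units as in the text; 3 hidden is an assumption
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(n_visible, n_hidden))   # hypothetical weights, no biases

# All 2^7 binary visible configurations, in the same order as the listing above
configs = np.array(list(itertools.product([0, 1], repeat=n_visible)))

# Unnormalized P(v): sum over all hidden states of exp(v^T W h)
unnormalized = np.prod(1.0 + np.exp(configs @ W), axis=1)
P_v = unnormalized / unnormalized.sum()  # divide by the partition function

print(len(configs))                      # number of configurations
print(P_v.sum())                         # total probability mass
```

The same enumeration is hopeless for 784 units (2^784 terms), which is why training relies on sampling instead.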

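How to train an RBM

Training has two phases:
1) the forward pass
2) the backward pass, or reconstruction

Phase 1

Forward pass: this changes the values of the hidden layer. The input data is passed through all nodes of the input layer to the hidden layer. The computation begins by making stochastic decisions about whether to transmit each input or not (i.e. to determine the state of each hidden unit). At the hidden nodes, X is multiplied by W and h_bias is added. The result goes through the sigmoid function to produce the node's output, or state.
Each hidden node therefore has a probability output. For each row of the training set we get a tensor of probabilities of size [1x500], for 55000 vectors in total [h0 = 55000x500]. We then sample from these probabilities: that is, we draw an activation vector from the hidden layer's probability distribution. These samples are used to estimate the negative phase gradient.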

In [13]:

X = tf.placeholder("float", [None, 784])
#probabilities of the hidden units
_h0= tf.nn.sigmoid(tf.matmul(X, W) + hb)
#sample_h_given_X
h0 = tf.nn.relu(tf.sign(_h0 - tf.random_uniform(tf.shape(_h0))))

In [19]:

## The snippet below helps explain the code above:
with tf.Session() as sess:
    a = tf.constant([0.7, 0.1, 0.8, 0.2])
    print(sess.run(a))
    b = sess.run(tf.random_uniform(tf.shape(a)))
    print(b)
    print(sess.run(a - b))
    print(sess.run(tf.sign(a - b)))  # y = sign(x) = -1 if x < 0; 0 if x == 0; 1 if x > 0.
    print(sess.run(tf.nn.relu(tf.sign(a - b))))

[0.7 0.1 0.8 0.2]
[0.691231   0.5898435  0.68188643 0.44978285]
[ 0.00876898 -0.48984352  0.11811358 -0.24978285]
[ 1. -1.  1. -1.]
[1. 0. 1. 0.]
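The relu(sign(p - u)) construction is a Bernoulli draw: a unit becomes 1 exactly when a uniform random number u falls below its probability p. A small NumPy sketch (not part of the original notebook) that checks this equivalence and the resulting sampling frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.7, 0.1, 0.8, 0.2])       # same probabilities as in the cell above
u = rng.random(p.shape)                  # uniform noise in [0, 1)

# The construction used in the tutorial: relu(sign(p - u))
sample_a = np.maximum(np.sign(p - u), 0.0)
# The direct formulation: 1 where u < p, else 0
sample_b = (u < p).astype(float)

# Over many draws, the fraction of ones approaches p
many = (rng.random((20000, p.size)) < p).mean(axis=0)
print(sample_a, sample_b, many)
```

Both formulations give the same sample for the same noise, so the choice between them is purely a matter of how the framework expresses it.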

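Phase 2

Backward pass (reconstruction): an RBM reconstructs the data through repeated forward and backward passes between the visible and hidden layers. In this phase, the activation vector sampled from the hidden layer (h0) is the input. The same weight matrix, together with the visible-layer bias, is applied and the result goes through the sigmoid function. The output is a reconstruction that approximates the original input.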

In [20]:

_v1 = tf.nn.sigmoid(tf.matmul(h0, tf.transpose(W)) + vb)
v1 = tf.nn.relu(tf.sign(_v1 - tf.random_uniform(tf.shape(_v1)))) #sample_v_given_h
h1 = tf.nn.sigmoid(tf.matmul(v1, W) + hb)
h1

Out[20]:

<tf.Tensor 'Sigmoid_2:0' shape=(?, 500) dtype=float32>

Reconstruction steps: take a data point from the dataset, say x, and pass it through the network.
Pass 0: (x) -> (x:-:_h0) -> (h0:-:v1) (v1 is reconstruction of the first pass)
Pass 1: (v1) -> (v1:-:h1) -> (_h0:-:v2) (v2 is reconstruction of the second pass)
Pass 2: (v2) -> (v2:-:h2) -> (_h1:-:v3) (v3 is reconstruction of the third pass)
Pass n: (vn) -> (vn:-:hn+1) -> (_hn:-:vn+1) (vn is reconstruction of the nth pass)
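The repeated passes can be sketched as a short loop in NumPy. The function name gibbs_chain, the random weights, and the random binary input are hypothetical and serve only to illustrate the visible -> hidden -> visible alternation; this is not code from the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_prob(probs, rng):
    # Set each unit to 1 with the given probability, else 0
    return (rng.random(probs.shape) < probs).astype(np.float64)

def gibbs_chain(x, W, vb, hb, n_passes, rng):
    """Alternate forward and backward passes; return every reconstruction."""
    v = x
    reconstructions = []
    for _ in range(n_passes):
        h = sample_prob(sigmoid(v @ W + hb), rng)     # forward pass: sample hidden states
        v = sample_prob(sigmoid(h @ W.T + vb), rng)   # backward pass: sample a reconstruction
        reconstructions.append(v)
    return reconstructions

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(784, 500))            # assumed untrained weights
vb, hb = np.zeros(784), np.zeros(500)
x = (rng.random((1, 784)) < 0.2).astype(np.float64)   # stand-in for one binarized image

recs = gibbs_chain(x, W, vb, hb, n_passes=3, rng=rng)
print(len(recs), recs[0].shape)
```

Here recs[0] plays the role of v1, recs[1] of v2, and so on.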

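How to compute the gradient

To train an RBM, we maximize the product of the probabilities assigned to the training set V, where each row of V is treated as a visible vector v:

arg max_W ∏_{v ∈ V} P(v)

or, equivalently, we maximize the log-probability of the training set:

arg max_W Σ_{v ∈ V} log P(v)

We can also define an objective function and try to minimize it. To do that, we need the partial derivatives of the function with respect to its parameters. From the expressions above, these are all functions of the weights and biases, so minimizing the objective means optimizing the weights. We can therefore use stochastic gradient descent (SGD) to find the weights that minimize the objective. The derivation involves two terms, the positive gradient and the negative gradient, which reflect their effect on the probability density defined by the model. The positive gradient depends on the observations (X); the negative gradient depends only on the model.
The positive phase increases the probability of the training data;
the negative phase decreases the probability of samples generated by the model.
The negative phase is hard to compute, so we approximate it with contrastive divergence (CD). It is designed so that the direction of the gradient estimate is at least roughly accurate. In practice, more accurate methods such as CD-k or PCD are used to train RBMs. Computing contrastive divergence requires Gibbs sampling from the model's distribution.
In this tutorial, the contrastive divergence is a matrix used to compute and adjust the weight matrix. Changing W step by step is what trains the weights. At each step (epoch), W is updated to a new value W′ by the formula below.
W′ = W + α∗CD
α is a small step size, better known as the learning rate.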

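How to compute contrastive divergence?

The steps of single-step contrastive divergence (CD-1) are:

1. Take a training sample from the training set X, compute the probability of each hidden unit, and sample a hidden activation vector h0 from that distribution:
_h0 = sigmoid( X⊗W + hb )
h0 = sampleProb( _h0 )
2. Compute the outer product of X and h0. This is the positive gradient:
w_pos_grad = X ⊗ h0
3. Reconstruct v1 from h0 (the reconstruction in the first pass), sample the visible units, and then resample the hidden activation vector h1 from those samples. This is one step of Gibbs sampling:
_v1 = sigmoid( h0 ⊗ transpose(W) + vb )
v1 = sampleProb( _v1 )  (sample v given h)
h1 = sigmoid( v1 ⊗ W + hb )
4. Compute the outer product of v1 and h1. This is the negative gradient:
w_neg_grad = v1 ⊗ h1  (reconstruction 1)

The contrastive divergence is the positive gradient minus the negative gradient; the CD matrix has size 784x500.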
```
CD=(w_pos_grad−w_neg_grad)/datapoints
```
The weights are then updated as
```
W′=W+α∗CD
```
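At the end of the algorithm, the visible nodes hold the sampled values.

What is sampling (sampleProb)?

In the forward pass, we randomly set each h_i to 1 with probability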
```
sigmoid(v⊗W + hb)
```
In the reconstruction, we randomly set each v_i to 1 with probability
```
sigmoid(h⊗transpose(W)+vb)
```
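The four CD-1 steps and the update rule can be put together in a few lines of NumPy. This is a sketch under assumptions, not the tutorial's TensorFlow code: the function name cd1_step, the toy layer sizes, and the random binary data are hypothetical. The bias updates follow the same pattern as the TensorFlow cell in this section.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_prob(probs, rng):
    # Set each unit to 1 with the given probability, else 0
    return (rng.random(probs.shape) < probs).astype(probs.dtype)

def cd1_step(X, W, vb, hb, alpha, rng):
    """One CD-1 update on a batch X of shape (n_samples, n_visible)."""
    h0 = sample_prob(sigmoid(X @ W + hb), rng)      # step 1: sample hidden states
    w_pos_grad = X.T @ h0                           # step 2: positive gradient
    v1 = sample_prob(sigmoid(h0 @ W.T + vb), rng)   # step 3: reconstruct and sample
    h1 = sigmoid(v1 @ W + hb)                       #         hidden probabilities again
    w_neg_grad = v1.T @ h1                          # step 4: negative gradient
    CD = (w_pos_grad - w_neg_grad) / X.shape[0]     # average over the data points
    new_W = W + alpha * CD
    new_vb = vb + alpha * np.mean(X - v1, axis=0)
    new_hb = hb + alpha * np.mean(h0 - h1, axis=0)
    return new_W, new_vb, new_hb

rng = np.random.default_rng(0)
X = (rng.random((20, 12)) < 0.3).astype(np.float64)   # toy batch: 20 samples, 12 visible units
W, vb, hb = np.zeros((12, 5)), np.zeros(12), np.zeros(5)
W, vb, hb = cd1_step(X, W, vb, hb, alpha=0.1, rng=rng)
print(W.shape, vb.shape, hb.shape)
```

Calling cd1_step repeatedly over batches, feeding each result back in, gives the training loop described later in the text.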

In [22]:

alpha = 1.0
w_pos_grad = tf.matmul(tf.transpose(X), h0)
w_neg_grad = tf.matmul(tf.transpose(v1), h1)
CD = (w_pos_grad - w_neg_grad) / tf.to_float(tf.shape(X)[0])
update_w = W + alpha * CD
update_vb = vb + alpha * tf.reduce_mean(X - v1, 0)
update_hb = hb + alpha * tf.reduce_mean(h0 - h1, 0)
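The text speaks of an outer product per data point, while the code uses one matrix product over the whole batch. The two agree: multiplying the transposed batch by the hidden samples sums the per-sample outer products. A quick NumPy check on hypothetical toy shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 6))                               # 5 samples, 6 visible units (toy sizes)
h0 = (rng.random((5, 4)) < 0.5).astype(np.float64)   # 5 sampled hidden vectors, 4 units

# Batched form, as in tf.matmul(tf.transpose(X), h0)
batched = X.T @ h0

# Per-sample form: one outer product for each (x, h) pair, then summed
summed = sum(np.outer(x, h) for x, h in zip(X, h0))

print(batched.shape, np.allclose(batched, summed))
```

Dividing by the number of rows in X, as the CD line does, turns this sum into an average over the batch.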

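What is the objective function?

Goal: maximize the likelihood of the data being drawn from the model's distribution.
Computing the error: in each epoch we measure the squared error between the data and its reconstruction. The code below uses the mean of the squared differences, tf.reduce_mean(tf.square(X - v1)).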

In [23]:

err = tf.reduce_mean(tf.square(X - v1))
#  tf.reduce_mean computes the mean of elements across dimensions of a tensor.
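As a small worked example of this error measure in NumPy, with hypothetical 4-pixel data: when both arrays are binary, the mean squared error is simply the fraction of pixels that differ.

```python
import numpy as np

X = np.array([[1.0, 0.0, 1.0, 0.0]])     # original (toy, binary)
v1 = np.array([[1.0, 1.0, 1.0, 0.0]])    # reconstruction with one wrong pixel

err = np.mean(np.square(X - v1))         # mean over every entry of the array
mismatch = np.mean(X != v1)              # fraction of differing pixels

print(err, mismatch)                     # 0.25 0.25
```

With real-valued MNIST pixels the two numbers no longer coincide, but the mean squared error is still averaged over all pixels of all images in the batch.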

In [24]:

# Create a session and initialize the variables:

In [25]:

cur_w = np.zeros([784, 500], np.float32)
cur_vb = np.zeros([784], np.float32)
cur_hb = np.zeros([500], np.float32)
prv_w = np.zeros([784, 500], np.float32)
prv_vb = np.zeros([784], np.float32)
prv_hb = np.zeros([500], np.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

In [26]:

# Check the error of the first run:
sess.run(err, feed_dict={X: trX, W: prv_w, vb: prv_vb, hb: prv_hb})

Out[26]:

0.4815958

In [ ]:

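The overall flow of the algorithm:
For each epoch:
    For each batch:
        Compute the contrastive divergence:
            w_pos_grad = 0, w_neg_grad = 0 (matrices)
            For each data point in the batch:
                Propagate the data forward, computing v (the reconstruction) and h
                Update w_pos_grad = w_pos_grad + X ⊗ h0
                Update w_neg_grad = w_neg_grad + v1 ⊗ h1
            CD = (w_pos_grad - w_neg_grad) / number of data points
        Update the weights and biases: W' = W + alpha * CD
        Compute the error
    Repeat for the next epoch until the error is small enough or stops changing over several epochs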

#Parameters
epochs = 5
batchsize = 100
weights = []
errors = []
for epoch in range(epochs):
    for start, end in zip(range(0, len(trX), batchsize), range(batchsize, len(trX), batchsize)):
        batch = trX[start:end]
        cur_w = sess.run(update_w, feed_dict={X: batch, W: prv_w, vb: prv_vb, hb: prv_hb})
        cur_vb = sess.run(update_vb, feed_dict={X: batch, W: prv_w, vb: prv_vb, hb: prv_hb})
        cur_hb = sess.run(update_hb, feed_dict={X: batch, W: prv_w, vb: prv_vb, hb: prv_hb})
        prv_w = cur_w
        prv_vb = cur_vb
        prv_hb = cur_hb
        if start % 10000 == 0:
            errors.append(sess.run(err, feed_dict={X: trX, W: cur_w, vb: cur_vb, hb: cur_hb}))
            weights.append(cur_w)
    print('Epoch: %d' % epoch, 'reconstruction error: %f' % errors[-1])
plt.plot(errors)
plt.xlabel("Batch Number")
plt.ylabel("Error")
plt.show()

Epoch: 0 reconstruction error: 0.062447
Epoch: 1 reconstruction error: 0.053183
Epoch: 2 reconstruction error: 0.049973
Epoch: 3 reconstruction error: 0.047907
Epoch: 4 reconstruction error: 0.046699
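One detail of the batching idiom is worth checking. Because zip stops at the shorter of its two ranges, pairing range(0, n, batchsize) with range(batchsize, n, batchsize) never yields the final batch. The sketch below compares it with a variant that covers every row; the helper names are hypothetical.

```python
def batch_bounds_zip(n, batchsize):
    # The (start, end) pairs produced by the idiom used in the training loop
    return list(zip(range(0, n, batchsize), range(batchsize, n, batchsize)))

def batch_bounds_full(n, batchsize):
    # A variant that also includes the last (possibly shorter) batch
    return [(start, min(start + batchsize, n)) for start in range(0, n, batchsize)]

zip_batches = batch_bounds_zip(55000, 100)
full_batches = batch_bounds_full(55000, 100)

print(len(zip_batches), zip_batches[-1])     # 549 (54800, 54900)
print(len(full_batches), full_batches[-1])   # 550 (54900, 55000)
```

With 55000 training rows and a batch size of 100, the idiom skips the last 100 rows in every epoch. That is harmless for this demonstration but easy to overlook.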

# The final weights:
uw = weights[-1].T
print(uw) # a weight matrix of shape (500,784)

[[-0.29501566 -0.22004183 -0.25009114 ... -0.28014314 -0.27503508 -0.23501706]
 [-0.34024656 -0.26107827 -0.31549636 ... -0.35570458 -0.2984727  -0.28973186]
 [-0.29500222 -0.22000895 -0.25000292 ... -0.28000605 -0.27500865 -0.23500128]
 ...
 [-0.2950535  -0.22889465 -0.2504805  ... -0.28023303 -0.2825892  -0.23511098]
 [-0.2950745  -0.22204857 -0.25011712 ... -0.28219843 -0.27715343 -0.23509327]
 [-0.2950738  -0.22013725 -0.25011984 ... -0.28010118 -0.27503595 -0.23557302]]

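We can take each hidden unit and visualize the connections between that unit and the input. tile_raster_images helps generate easy-to-read images from weights or samples. It turns the 784-element rows into an array (e.g. 25x20 tiles) in which each row is reshaped into an image and the images are laid out like floor tiles.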

In [29]:

tile_raster_images(X=cur_w.T, img_shape=(28, 28), tile_shape=(25, 20), tile_spacing=(1, 1))
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline
image = Image.fromarray(tile_raster_images(X=cur_w.T, img_shape=(28, 28) ,tile_shape=(25, 20), tile_spacing=(1, 1)))
### Plot image
plt.rcParams['figure.figsize'] = (18.0, 18.0)
imgplot = plt.imshow(image)
imgplot.set_cmap('gray')

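Each image shows the vector of connections between one hidden unit and the visible units.

Now look at the weights of one trained hidden unit. Gray means a weight of 0; the whiter a pixel, the larger the weight (closer to 1). Conversely, the darker a pixel, the more negative the weight. Pixels with positive weights increase the probability that the hidden unit is activated, and pixels with negative weights decrease it. So each tile (hidden unit) shows which input feature that unit responds to.

Next, let's reconstruct an image

1) First, plot an original image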
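The gray-scale convention described above can be sketched directly in NumPy. This is a hypothetical helper, not the tile_raster_images implementation (whose exact scaling may differ): it scales one hidden unit's 784 weights symmetrically, so that zero maps to mid-gray, the largest positive weight to white, and the most negative to black.

```python
import numpy as np

def weights_to_gray(w_col, img_shape=(28, 28)):
    """Map one hidden unit's weights to an 8-bit gray image (hypothetical helper)."""
    m = np.abs(w_col).max()
    if m == 0:
        m = 1.0                                  # all-zero weights: avoid dividing by zero
    scaled = (w_col / m + 1.0) / 2.0             # -m -> 0.0, 0 -> 0.5, +m -> 1.0
    return (scaled * 255).astype(np.uint8).reshape(img_shape)

# Toy weights: one strongly positive pixel, one strongly negative, the rest zero
w = np.zeros(784)
w[0], w[1] = 1.0, -1.0

img = weights_to_gray(w)
print(img.shape, img[0, 0], img[0, 1], img[0, 2])   # (28, 28) 255 0 127
```

Applied to a column of the trained weight matrix, for example cur_w[:, 10], this would give one tile of the grid shown above.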

In [30]:

sample_case = trX[1:2]
img = Image.fromarray(tile_raster_images(X=sample_case, img_shape=(28, 28),tile_shape=(1, 1), tile_spacing=(1, 1)))
plt.rcParams['figure.figsize'] = (2.0, 2.0)
imgplot = plt.imshow(img)
imgplot.set_cmap('gray')  #you can experiment different colormaps (Greys,winter,autumn)

# 2) Propagate the original image to the next layer, then reconstruct it backward
hh0 = tf.nn.sigmoid(tf.matmul(X, W) + hb)
vv1 = tf.nn.sigmoid(tf.matmul(hh0, tf.transpose(W)) + vb)
feed = sess.run(hh0, feed_dict={ X: sample_case, W: prv_w, hb: prv_hb})
rec = sess.run(vv1, feed_dict={ hh0: feed, W: prv_w, vb: prv_vb})

In [33]:

# 3) Plot the reconstructed image
img = Image.fromarray(tile_raster_images(X=rec, img_shape=(28, 28),tile_shape=(1, 1), tile_spacing=(1, 1)))
plt.rcParams['figure.figsize'] = (2.0, 2.0)
imgplot = plt.imshow(img)
imgplot.set_cmap('gray')

References:
https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
http://deeplearning.net/tutorial/rbm.html
http://deeplearning4j.org/restrictedboltzmannmachine.html
http://imonad.com/rbm/restricted-boltzmann-machine/
https://blog.csdn.net/qq_23869697/article/details/80683163#comments

In [ ]:

# Import the math function for calculations
import math
# Tensorflow library. Used to implement machine learning models
import tensorflow as tf
# Numpy contains helpful functions for efficient mathematical calculations
import numpy as np
# Image library for image manipulation
# import Image
# Utils file
# Getting the MNIST data provided by Tensorflow
from tensorflow.examples.tutorials.mnist import input_data


# Class that defines the behavior of the RBM
class RBM(object):

    def __init__(self, input_size, output_size):
        # Defining the hyperparameters
        self._input_size = input_size  # Size of input
        self._output_size = output_size  # Size of output
        self.epochs = 5  # Amount of training iterations
        self.learning_rate = 1.0  # The step used in gradient descent
        self.batchsize = 100  # The size of how much data will be used for training per sub iteration

        # Initializing weights and biases as matrices full of zeroes
        self.w = np.zeros([input_size, output_size], np.float32)  # Creates and initializes the weights with 0
        self.hb = np.zeros([output_size], np.float32)  # Creates and initializes the hidden biases with 0
        self.vb = np.zeros([input_size], np.float32)  # Creates and initializes the visible biases with 0

    # Fits the result from the weighted visible layer plus the bias into a sigmoid curve
    def prob_h_given_v(self, visible, w, hb):
        # Sigmoid
        return tf.nn.sigmoid(tf.matmul(visible, w) + hb)

    # Fits the result from the weighted hidden layer plus the bias into a sigmoid curve
    def prob_v_given_h(self, hidden, w, vb):
        return tf.nn.sigmoid(tf.matmul(hidden, tf.transpose(w)) + vb)

    # Generate the sample probability
    def sample_prob(self, probs):
        return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

    # Training method for the model
    def train(self, X):
        # Create the placeholders for our parameters
        _w = tf.placeholder("float", [self._input_size, self._output_size])
        _hb = tf.placeholder("float", [self._output_size])
        _vb = tf.placeholder("float", [self._input_size])

        prv_w = np.zeros([self._input_size, self._output_size],
                         np.float32)  # Creates and initializes the weights with 0
        prv_hb = np.zeros([self._output_size], np.float32)  # Creates and initializes the hidden biases with 0
        prv_vb = np.zeros([self._input_size], np.float32)  # Creates and initializes the visible biases with 0

        cur_w = np.zeros([self._input_size, self._output_size], np.float32)
        cur_hb = np.zeros([self._output_size], np.float32)
        cur_vb = np.zeros([self._input_size], np.float32)
        v0 = tf.placeholder("float", [None, self._input_size])

        # Initialize with sample probabilities
        h0 = self.sample_prob(self.prob_h_given_v(v0, _w, _hb))
        v1 = self.sample_prob(self.prob_v_given_h(h0, _w, _vb))
        h1 = self.prob_h_given_v(v1, _w, _hb)

        # Create the Gradients
        positive_grad = tf.matmul(tf.transpose(v0), h0)
        negative_grad = tf.matmul(tf.transpose(v1), h1)

        # Update learning rates for the layers
        update_w = _w + self.learning_rate * (positive_grad - negative_grad) / tf.to_float(tf.shape(v0)[0])
        update_vb = _vb + self.learning_rate * tf.reduce_mean(v0 - v1, 0)
        update_hb = _hb + self.learning_rate * tf.reduce_mean(h0 - h1, 0)

        # Find the error rate
        err = tf.reduce_mean(tf.square(v0 - v1))

        # Training loop
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            # For each epoch
            for epoch in range(self.epochs):
                # For each step/batch
                for start, end in zip(range(0, len(X), self.batchsize), range(self.batchsize, len(X), self.batchsize)):
                    batch = X[start:end]
                    # Update the rates
                    cur_w = sess.run(update_w, feed_dict={v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    cur_hb = sess.run(update_hb, feed_dict={v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    cur_vb = sess.run(update_vb, feed_dict={v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    prv_w = cur_w
                    prv_hb = cur_hb
                    prv_vb = cur_vb
                error = sess.run(err, feed_dict={v0: X, _w: cur_w, _vb: cur_vb, _hb: cur_hb})
                print('Epoch: %d' % epoch, 'reconstruction error: %f' % error)
            self.w = prv_w
            self.hb = prv_hb
            self.vb = prv_vb

    # Create expected output for our DBN
    def rbm_outpt(self, X):
        input_X = tf.constant(X)
        _w = tf.constant(self.w)
        _hb = tf.constant(self.hb)
        out = tf.nn.sigmoid(tf.matmul(input_X, _w) + _hb)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            return sess.run(out)


class NN(object):

    def __init__(self, sizes, X, Y):
        # Initialize hyperparameters
        self._sizes = sizes
        self._X = X
        self._Y = Y
        self.w_list = []
        self.b_list = []
        self._learning_rate = 1.0
        self._momentum = 0.0
        self._epoches = 10
        self._batchsize = 100
        input_size = X.shape[1]

        # initialization loop
        for size in self._sizes + [Y.shape[1]]:
            # Define upper limit for the uniform distribution range
            max_range = 4 * math.sqrt(6. / (input_size + size))

            # Initialize weights through a random uniform distribution
            self.w_list.append(
                np.random.uniform(-max_range, max_range, [input_size, size]).astype(np.float32))

            # Initialize bias as zeroes
            self.b_list.append(np.zeros([size], np.float32))
            input_size = size

    # load data from rbm
    def load_from_rbms(self, dbn_sizes, rbm_list):
        # Check if expected sizes are correct
        assert len(dbn_sizes) == len(self._sizes)

        for i in range(len(self._sizes)):
            # Check if for each RBM the expected sizes are correct
            assert dbn_sizes[i] == self._sizes[i]

        # If everything is correct, bring over the weights and biases
        for i in range(len(self._sizes)):
            self.w_list[i] = rbm_list[i].w
            self.b_list[i] = rbm_list[i].hb

    # Training method
    def train(self):
        # Create placeholders for input, weights, biases, output
        _a = [None] * (len(self._sizes) + 2)
        _w = [None] * (len(self._sizes) + 1)
        _b = [None] * (len(self._sizes) + 1)
        _a[0] = tf.placeholder("float", [None, self._X.shape[1]])
        y = tf.placeholder("float", [None, self._Y.shape[1]])

        # Define variables and activation function
        for i in range(len(self._sizes) + 1):
            _w[i] = tf.Variable(self.w_list[i])
            _b[i] = tf.Variable(self.b_list[i])
        for i in range(1, len(self._sizes) + 2):
            _a[i] = tf.nn.sigmoid(tf.matmul(_a[i - 1], _w[i - 1]) + _b[i - 1])

        # Define the cost function
        cost = tf.reduce_mean(tf.square(_a[-1] - y))

        # Define the training operation (Momentum Optimizer minimizing the Cost function)
        train_op = tf.train.MomentumOptimizer(self._learning_rate, self._momentum).minimize(cost)

        # Prediction operation
        predict_op = tf.argmax(_a[-1], 1)

        # Training Loop
        with tf.Session() as sess:
            # Initialize Variables
            sess.run(tf.global_variables_initializer())

            # For each epoch
            for i in range(self._epoches):

                # For each step
                for start, end in zip(range(0, len(self._X), self._batchsize),
                                      range(self._batchsize, len(self._X), self._batchsize)):
                    # Run the training operation on the input data
                    sess.run(train_op, feed_dict={_a[0]: self._X[start:end], y: self._Y[start:end]})

                for j in range(len(self._sizes) + 1):
                    # Retrieve weights and biases
                    self.w_list[j] = sess.run(_w[j])
                    self.b_list[j] = sess.run(_b[j])

                print("Accuracy rating for epoch " + str(i) + ": " + str(np.mean(np.argmax(self._Y, axis=1) == \
                      sess.run(predict_op, feed_dict={_a[0]: self._X, y: self._Y}))))

In [ ]:

if __name__ == '__main__':
    # Loading in the mnist data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images,\
        mnist.test.labels

    RBM_hidden_sizes = [500, 200, 50]  # create 3 RBMs with layer sizes 784-500-200-50

    # Since we are training, set input as training data
    inpX = trX

    # Create list to hold our RBMs
    rbm_list = []

    # Size of inputs is the number of inputs in the training set
    input_size = inpX.shape[1]

    # For each RBM we want to generate
    for i, size in enumerate(RBM_hidden_sizes):
        print('RBM: ', i, ' ', input_size, '->', size)
        rbm_list.append(RBM(input_size, size))
        input_size = size

    # For each RBM in our list
    for rbm in rbm_list:
        print('New RBM:')
        # Train a new one
        rbm.train(inpX)
        # Return the output layer
        inpX = rbm.rbm_outpt(inpX)

    nNet = NN(RBM_hidden_sizes, trX, trY)
    nNet.load_from_rbms(RBM_hidden_sizes, rbm_list)
    nNet.train()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
RBM:  0   784 -> 500
RBM:  1   500 -> 200
RBM:  2   200 -> 50
New RBM:
Epoch: 0 reconstruction error: 0.060938
Epoch: 1 reconstruction error: 0.052212
Epoch: 2 reconstruction error: 0.048820
Epoch: 3 reconstruction error: 0.046884
Epoch: 4 reconstruction error: 0.045594
New RBM:
Epoch: 0 reconstruction error: 0.033911
Epoch: 1 reconstruction error: 0.030028
Epoch: 2 reconstruction error: 0.028021
Epoch: 3 reconstruction error: 0.026995
Epoch: 4 reconstruction error: 0.026132
New RBM:
Epoch: 0 reconstruction error: 0.058358
Epoch: 1 reconstruction error: 0.056799
Epoch: 2 reconstruction error: 0.055280
Epoch: 3 reconstruction error: 0.054365
Epoch: 4 reconstruction error: 0.053400
Accuracy rating for epoch 0: 0.4086
Accuracy rating for epoch 1: 0.5560181818181819
Accuracy rating for epoch 2: 0.6254727272727273
Accuracy rating for epoch 3: 0.7149818181818182
Accuracy rating for epoch 4: 0.7492545454545455
Accuracy rating for epoch 5: 0.8105272727272728

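With the RBM class defined and the data loaded, we can build the DBN. In this example we use 3 RBMs: the first has 500 hidden units, the second 200, and the last 50. The goal is to produce a deep representation of the training data.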
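To see how data flows through the stacked layers, here is a NumPy sketch of the feed-forward pass of a 784-500-200-50-10 network. It uses the same uniform initialization range as the NN class above, 4 * sqrt(6 / (fan_in + fan_out)); the random input batch is an assumption, and in the tutorial the first three weight matrices would instead come from the pre-trained RBMs.

```python
import math
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

sizes = [784, 500, 200, 50, 10]          # input, three RBM layers, 10 output classes
rng = np.random.default_rng(0)

weights, biases = [], []
for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
    max_range = 4 * math.sqrt(6.0 / (fan_in + fan_out))   # same range as in NN.__init__
    weights.append(rng.uniform(-max_range, max_range, size=(fan_in, fan_out)))
    biases.append(np.zeros(fan_out))

a = rng.random((2, 784))                 # stand-in for 2 flattened images
shapes = []
for W, b in zip(weights, biases):
    a = sigmoid(a @ W + b)               # each layer feeds the next
    shapes.append(a.shape)

pred = a.argmax(axis=1)                  # predicted class per image
print(shapes, pred.shape)
```

Pre-training only changes where the first three layers start. The forward pass and the subsequent backpropagation fine-tuning are those of an ordinary sigmoid network.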

References:
• http://deeplearning.net/tutorial/DBN.html
• https://github.com/myme5261314/dbn_tf
