Author: G. E. Hinton et al.
Date: 2006
Type: article
Source: Neural Computation
Verdict: Deep Learning eve (the eve of deep learning)
Paper link: http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf
The paper is quite "hardcore": dense explanations of algorithmic principles, mathematical formulas, and terminology. The authors are also fond of describing the model with biological vocabulary, such as synapse strength and mind, which left me scratching my head while reading. It requires background on RBMs (restricted Boltzmann machines) and the wake-sleep algorithm, which I happen to lack, so reading was difficult; these notes are only a brief outline with excerpts.

1 Purpose

  1. To design a generative model that surpasses discriminative models.
  2. To train a deep, densely connected belief network efficiently.
    Explaining-away effects make inference difficult in densely connected belief nets that have many hidden layers (a small example follows below).
  • challenges
  1. It is difficult to infer the conditional distribution of the hidden activities given a data vector.
  2. Variational methods use simple approximations to the true conditional distribution, but the approximation may be poor, especially at the deepest hidden layer, where the prior assumes independence.
  3. Variational learning still requires all of the parameters to be learned together, which makes the learning time scale poorly (extremely time-consuming?) as the number of parameters increases.
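
To make explaining-away concrete, here is a tiny example of my own (not from the paper): two independent binary causes that share one observed effect become dependent once the effect is observed, so the posterior over the hidden causes no longer factorizes.

```python
from itertools import product

# Two independent binary causes A and B with prior P = 0.1;
# the observed effect is E = A OR B (a deterministic OR).
p_a = p_b = 0.1

def joint(a, b, e):
    """P(A=a, B=b, E=e)."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    return pa * pb * (1.0 if e == (1 if (a or b) else 0) else 0.0)

# P(A=1 | E=1): marginalize over B.
p_e1 = sum(joint(a, b, 1) for a, b in product([0, 1], repeat=2))
print(sum(joint(1, b, 1) for b in [0, 1]) / p_e1)            # ~0.526
# P(A=1 | E=1, B=1): B=1 already explains E=1, so A falls back to its prior.
print(joint(1, 1, 1) / sum(joint(a, 1, 1) for a in [0, 1]))  # 0.1
```

Observing E=1 raises the belief in A to about 0.526, but additionally observing B=1 "explains it away" and pushes A back down to its prior of 0.1. A densely connected belief net is full of such shared effects, which is why exact posterior inference over the hidden units becomes intractable.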

2 Previous work

  1. Backpropagation nets
  2. Support vector machines

3 The proposed method

  1. The authors design a hybrid model in which the top two hidden layers form an undirected associative memory, and the remaining hidden layers form a directed acyclic graph that converts the representations in the associative memory into observable variables such as the pixels of an image.

    my understanding:

    • associative memory: the top two hidden layers. It really confused me when *associative memory* kept jumping out in parts of the paper; actually it is just the top two hidden layers.
    • directed graph vs. undirected associative memory: supervised layers and unsupervised layers??
    • It is a generative model; what is its relation to the currently hot GAN (generative adversarial network)?

    • pixels of an image: the original text says observable variables, and simply says that the remaining layers form a directed acyclic graph.
The authors derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory(??). A generation sketch follows below.
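
To make the hybrid structure concrete, here is a minimal generation sketch of my own (numpy, made-up layer sizes, biases omitted; not the paper's code): run alternating Gibbs sampling between the top two layers until the associative memory settles, then do a single top-down directed pass to produce the observable variables.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Draw stochastic binary states from Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def generate(W_top, W_down_list, n_gibbs=200):
    # 1. Alternating Gibbs sampling between the top two layers
    #    (the "associative memory") to draw from their joint distribution.
    h_top = sample(0.5 * np.ones(W_top.shape[1]))
    for _ in range(n_gibbs):
        h_pen = sample(sigmoid(W_top @ h_top))    # top layer -> penultimate
        h_top = sample(sigmoid(W_top.T @ h_pen))  # penultimate -> top layer
    # 2. A single top-down pass through the directed layers,
    #    ending at the visible units (e.g., image pixels).
    state = h_pen
    for W in W_down_list:
        state = sample(sigmoid(W @ state))
    return state

# Hypothetical shapes: a 2000-unit top layer, 500-unit hidden layers, 784 pixels.
W_top = rng.normal(scale=0.1, size=(500, 2000))
W_down = [rng.normal(scale=0.1, size=(500, 500)),
          rng.normal(scale=0.1, size=(784, 500))]
pixels = generate(W_top, W_down)
```

With trained weights, the top-down pass is what turns a state of the associative memory into an image; with random weights, as here, it only produces noise, but the control flow is the point.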

3.1 Hypotheses

In Section 7, the authors use the term mind to describe the internal state of the model, and they emphasize that they do not intend it as a metaphor D:).

3.2 Methodology

  1. To deal with the explaining-away phenomenon, the model introduces the idea of a "complementary" prior.
  2. Introduces a fast, greedy learning algorithm for constructing multilayer directed networks one layer at a time (a sketch follows this list). Using a variational bound, the paper shows that the overall generative model improves as each layer is added.
  3. Uses an up-down algorithm to fine-tune the weights produced by the fast, greedy algorithm.
    The up-down algorithm is a contrastive version of the wake-sleep algorithm.
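
A minimal sketch of the greedy, layer-wise idea, in my own numpy paraphrase (binary units, no biases, CD-1; not the paper's reference implementation): each new layer is trained as an RBM on the activities of the layer below, then frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Stochastic binary states from Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, lr=0.1, epochs=5):
    """Train one RBM with CD-1 on binary data of shape (n_samples, n_visible)."""
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:
            h0_prob = sigmoid(v0 @ W)        # up: infer hidden given data
            h0 = sample(h0_prob)
            v1_prob = sigmoid(W @ h0)        # down: one-step reconstruction
            h1_prob = sigmoid(v1_prob @ W)   # up again, on the reconstruction
            # Local CD-1 update: difference of pairwise correlations.
            W += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
    return W

def greedy_layerwise(data, layer_sizes):
    """Stack RBMs: each layer's hidden activities become the next layer's data."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)  # propagate up; earlier layers stay frozen
    return weights

# Hypothetical usage on random binary "data" with made-up layer sizes.
data = (rng.random((100, 784)) < 0.5).astype(float)
stack = greedy_layerwise(data, layer_sizes=[500, 500, 2000])
```

The up-down fine-tuning stage is omitted here; the point is that each layer is learned with a purely local update while all earlier layers stay fixed.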

3.3 Experiment

  1. In Section 6, a network with three hidden layers and 1.7 million weights was tested on the MNIST set of handwritten digits. It achieved an error rate of 1.25%, outperforming the best backpropagation nets and support vector machines (as reported in 2002).

3.4 Data sets

MNIST handwritten digits set.

Is the data set sufficient?

In 2006, yes. Even though it looks small from a modern perspective, it took days to train the model in 2006.

3.5 Features

  1. The proposed model comes with a fast, greedy learning algorithm that can find a fairly good set of parameters quickly, even in deep networks with millions of parameters and many layers.
  2. The learning algorithm is unsupervised but can be applied to labeled data by learning a model that generates both the label and the data. (Does this mean learning the joint distribution of labels and data??)
  3. The proposed model includes a fine-tuning algorithm that learns an excellent generative model that outperforms discriminative methods on the MNIST database of handwritten digits. (What does learns a … model mean? Does the algorithm create a model, or was the model just trained well?)
  4. The generative model makes it easy to interpret the distributed representations in the deep hidden layers. (The authors did this by generating images through the model, to look into the mind of the neural network.)
  5. The inference required for forming a percept is both fast and accurate. (Does it mean that the model can form percepts quickly?)
  6. The learning algorithm is local: adjustments to a synapse strength depend only on the states of the presynaptic and postsynaptic neurons (see the update rule after this list). (What does synapse mean here? A weight of the network?)
  7. The communication is simple. Neurons need only communicate their stochastic binary states.
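
Regarding feature 6: as far as I understand, the "local" rule is the contrastive-divergence-style update used for each RBM, where the change to the weight $w_{ij}$ depends only on the states of the two units it connects:

$$
\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \right)
$$

Here $\varepsilon$ is a learning rate, $v_i$ and $h_j$ are the binary states of the "presynaptic" and "postsynaptic" units, and the two averages are taken with hidden states driven by the data and by a one-step reconstruction, respectively. So a "synapse" here seems to be just a weight of the network.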

3.6 Advantages

It has some major advantages compared to discriminative models.

  1. Generative models can learn low-level features without requiring feedback from the label, and they can learn many more parameters than discriminative models without overfitting.
  2. It is easy to see what the network has learned by generating from its model.
  3. It is possible to interpret the nonlinear, distributed hidden representations in the deep hidden layers by generating images from them.
  4. The superior classification performance of discriminative learning methods holds only for domains in which it is not possible to learn a good generative model, and this set of domains is being eroded by Moore's law. (So, as computational power grows, those domains will shrink?)

3.7 Weakness

The authors list the limitations of their model:

  1. It is designed for images in which nonbinary values can be treated as probabilities (which is not the case for natural images);
  2. Its use of top-down feedback during perception is limited to the associative memory in the top two layers (again, the associative memory is just the top two layers);
  3. It does not have a systematic way to deal with perceptual invariances. (Perspective distortion? Uneven illumination?)
  4. It assumes that segmentation has already been performed.
  5. It does not learn to sequentially attend to the most informative parts of objects when discrimination is difficult.

3.8 Application

Not mentioned.

4 What is the author’s next step?

Not mentioned; perhaps breaking through the limitations? Actually, limitations 3 and 5 have since been largely resolved by deep convolutional neural networks and attention mechanisms.

4.1 Do you agree with the author about the next step?

I agree with breaking through limitations 3 and 5.

5 What do other researchers say about his work?

  • [George E. Dahl et al. | Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition | IEEE | 2012]

    • The pretraining algorithm we use is the deep belief network (DBN) pre-training algorithm of [24].
    • we abandon the deep belief network once pre-training is complete and only retain and continue training the recognition weights
    • It often outperforms random initialization for the deeper architectures we are interested in training and provides results very robust to the initial random seed. The generative model learned during pre-training helps prevent overfitting, even when using models with very high capacity and can aid in the subsequent optimization of the recognition weights
  • [Daniela M. Witten et al. | Covariance-regularized regression and classification for high dimensional problems | Journal of the Royal Statistical Society | 2009]
    Indeed, many methods in the deep learning literature involve processing the features without using the outcome. Principal components regression is a classical example of this; a more recent example with much more extensive preprocessing is in Hinton et al. (2006).

  • [Ruslan Salakhutdinov et al. | Semantic Hashing | International Journal of Approximate Reasoning | 2009]
    The model can be trained efficiently by using a Restricted Boltzmann Machine (RBM) to learn one layer of hidden variables at a time [8]. (Hinton is the second author of this paper.)

Reference blogs

  • A Fast Learning Algorithm for Deep Belief Nets. I did not fully understand some concepts, such as explaining away; this blog's translation is slightly better.
  • Deep Belief Network简介 (Introduction to Deep Belief Networks). Explains DBNs fairly clearly and helps with understanding.
