Author: G. E. Hinton et al.
Date: 2006
Type: article
Source: Neural Computation
Verdict: Deep Learning eve (the eve of deep learning)
Paper link: http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf
The paper is quite "hardcore": dense explanations of algorithmic principles, mathematical formulas, and terminology. The authors are also fond of describing the model with biological vocabulary, such as synapse strength and mind, which left me scratching my head while reading. It requires background on RBMs (restricted Boltzmann machines) and the wake-sleep algorithm, which I happen to lack, so reading was difficult; these notes are only a brief outline with excerpts.

1 Purpose

  1. To design a generative model that surpasses discriminative models.
  2. To train a deep, densely connected belief network efficiently.
    Explaining-away effects make inference difficult in densely connected belief nets that have many hidden layers (a small example follows below).
  • challenges
  1. It is difficult to infer the conditional distribution of the hidden activities given a data vector.
  2. Variational methods use simple approximations to the true conditional distribution, but the approximation may be poor, especially at the deepest hidden layer, where the prior assumes independence.
  3. Variational learning still requires all of the parameters to be learned together, which makes the learning time scale poorly (extremely time-consuming?) as the number of parameters increases.
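
To make explaining-away concrete, here is a tiny example of my own (not from the paper): two independent binary causes that share one observed effect become dependent once the effect is observed, so the posterior over the hidden causes no longer factorizes.

```python
from itertools import product

# Two independent binary causes A and B with prior P = 0.1;
# the observed effect is E = A OR B (a deterministic OR).
p_a = p_b = 0.1

def joint(a, b, e):
    """P(A=a, B=b, E=e)."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    return pa * pb * (1.0 if e == (1 if (a or b) else 0) else 0.0)

# P(A=1 | E=1): marginalize over B.
p_e1 = sum(joint(a, b, 1) for a, b in product([0, 1], repeat=2))
print(sum(joint(1, b, 1) for b in [0, 1]) / p_e1)            # ~0.526
# P(A=1 | E=1, B=1): B=1 already explains E=1, so A falls back to its prior.
print(joint(1, 1, 1) / sum(joint(a, 1, 1) for a in [0, 1]))  # 0.1
```

Observing E=1 raises the belief in A to about 0.526, but additionally observing B=1 "explains it away" and pushes A back down to its prior of 0.1. A densely connected belief net is full of such shared effects, which is why exact posterior inference over the hidden units becomes intractable.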

2 Previous work

  1. Backpropagation nets
  2. Support vector machines

3 The proposed method

  1. The authors design a hybrid model in which the top two hidden layers form an undirected associative memory, and the remaining hidden layers form a directed acyclic graph that converts the representations in the associative memory into observable variables such as the pixels of an image.

    my understanding:

    • associative memory: the top two hidden layers. It really confused me when *associative memory* kept jumping out in parts of the paper; actually it is just the top two hidden layers.
    • directed graph vs. undirected associative memory: supervised layers and unsupervised layers??
    • It is a generative model; what is its relation to the currently hot GAN (generative adversarial network)?

    • pixels of an image: the original text says observable variables, and simply says that the remaining layers form a directed acyclic graph.
The authors derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory(??). A generation sketch follows below.
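
To make the hybrid structure concrete, here is a minimal generation sketch of my own (numpy, made-up layer sizes, biases omitted; not the paper's code): run alternating Gibbs sampling between the top two layers until the associative memory settles, then do a single top-down directed pass to produce the observable variables.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Draw stochastic binary states from Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def generate(W_top, W_down_list, n_gibbs=200):
    # 1. Alternating Gibbs sampling between the top two layers
    #    (the "associative memory") to draw from their joint distribution.
    h_top = sample(0.5 * np.ones(W_top.shape[1]))
    for _ in range(n_gibbs):
        h_pen = sample(sigmoid(W_top @ h_top))    # top layer -> penultimate
        h_top = sample(sigmoid(W_top.T @ h_pen))  # penultimate -> top layer
    # 2. A single top-down pass through the directed layers,
    #    ending at the visible units (e.g., image pixels).
    state = h_pen
    for W in W_down_list:
        state = sample(sigmoid(W @ state))
    return state

# Hypothetical shapes: a 2000-unit top layer, 500-unit hidden layers, 784 pixels.
W_top = rng.normal(scale=0.1, size=(500, 2000))
W_down = [rng.normal(scale=0.1, size=(500, 500)),
          rng.normal(scale=0.1, size=(784, 500))]
pixels = generate(W_top, W_down)
```

With trained weights, the top-down pass is what turns a state of the associative memory into an image; with random weights, as here, it only produces noise, but the control flow is the point.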

3.1 Hypotheses

In Section 7, the authors use the term mind to describe the internal state of the model, and they emphasize that they do not intend it as a metaphor D:).

3.2 Methodology

  1. To deal with the explaining-away phenomenon, the model introduces the idea of a "complementary" prior.
  2. Introduces a fast, greedy learning algorithm for constructing multilayer directed networks one layer at a time (a sketch follows this list). Using a variational bound, the paper shows that the overall generative model improves as each layer is added.
  3. Uses an up-down algorithm to fine-tune the weights produced by the fast, greedy algorithm.
    The up-down algorithm is a contrastive version of the wake-sleep algorithm.
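
A minimal sketch of the greedy, layer-wise idea, in my own numpy paraphrase (binary units, no biases, CD-1; not the paper's reference implementation): each new layer is trained as an RBM on the activities of the layer below, then frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Stochastic binary states from Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, lr=0.1, epochs=5):
    """Train one RBM with CD-1 on binary data of shape (n_samples, n_visible)."""
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:
            h0_prob = sigmoid(v0 @ W)        # up: infer hidden given data
            h0 = sample(h0_prob)
            v1_prob = sigmoid(W @ h0)        # down: one-step reconstruction
            h1_prob = sigmoid(v1_prob @ W)   # up again, on the reconstruction
            # Local CD-1 update: difference of pairwise correlations.
            W += lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
    return W

def greedy_layerwise(data, layer_sizes):
    """Stack RBMs: each layer's hidden activities become the next layer's data."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)  # propagate up; earlier layers stay frozen
    return weights

# Hypothetical usage on random binary "data" with made-up layer sizes.
data = (rng.random((100, 784)) < 0.5).astype(float)
stack = greedy_layerwise(data, layer_sizes=[500, 500, 2000])
```

The up-down fine-tuning stage is omitted here; the point is that each layer is learned with a purely local update while all earlier layers stay fixed.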

3.3 Experiment

  1. In Section 6, a network with three hidden layers and 1.7 million weights was tested on the MNIST set of handwritten digits. It achieved an error rate of 1.25%, outperforming the best backpropagation nets and support vector machines (as reported in 2002).

3.4 Data sets

MNIST handwritten digits set.

Is the data set sufficient?

In 2006, yes. Even though it looks small from a modern perspective, it took days to train the model in 2006.

3.5 Features

  1. The proposed model comes with a fast, greedy learning algorithm that can find a fairly good set of parameters quickly, even in deep networks with millions of parameters and many layers.
  2. The learning algorithm is unsupervised but can be applied to labeled data by learning a model that generates both the label and the data. (Does this mean learning the joint distribution of labels and data??)
  3. The proposed model includes a fine-tuning algorithm that learns an excellent generative model that outperforms discriminative methods on the MNIST database of handwritten digits. (What does learns a … model mean? Does the algorithm create a model, or was the model just trained well?)
  4. The generative model makes it easy to interpret the distributed representations in the deep hidden layers. (The authors did this by generating images through the model, to look into the mind of the neural network.)
  5. The inference required for forming a percept is both fast and accurate. (Does it mean that the model can form percepts quickly?)
  6. The learning algorithm is local: adjustments to a synapse strength depend only on the states of the presynaptic and postsynaptic neurons (see the update rule after this list). (What does synapse mean here? A weight of the network?)
  7. The communication is simple. Neurons need only communicate their stochastic binary states.
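
Regarding feature 6: as far as I understand, the "local" rule is the contrastive-divergence-style update used for each RBM, where the change to the weight $w_{ij}$ depends only on the states of the two units it connects:

$$
\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \right)
$$

Here $\varepsilon$ is a learning rate, $v_i$ and $h_j$ are the binary states of the "presynaptic" and "postsynaptic" units, and the two averages are taken with hidden states driven by the data and by a one-step reconstruction, respectively. So a "synapse" here seems to be just a weight of the network.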

3.6 Advantages

It has some major advantages compared to discriminative models.

  1. Generative models can learn low-level features without requiring feedback from the label, and they can learn many more parameters than discriminative models without overfitting.
  2. It is easy to see what the network has learned by generating from its model.
  3. It is possible to interpret the nonlinear, distributed hidden representations in the deep hidden layers by generating images from them.
  4. The superior classification performance of discriminative learning methods holds only for domains in which it is not possible to learn a good generative model, and this set of domains is being eroded by Moore's law. (So, as computational power grows, those domains will shrink?)

3.7 Weakness

The authors list the limitations of their model:

  1. It is designed for images in which nonbinary values can be treated as probabilities (which is not the case for natural images);
  2. Its use of top-down feedback during perception is limited to the associative memory in the top two layers (again, the associative memory is just the top two layers);
  3. It does not have a systematic way to deal with perceptual invariances. (Perspective distortion? Uneven illumination?)
  4. It assumes that segmentation has already been performed.
  5. It does not learn to sequentially attend to the most informative parts of objects when discrimination is difficult.

3.8 Application

Not mentioned.

4 What is the author’s next step?

Not mentioned; perhaps breaking through the limitations? Actually, limitations 3 and 5 have since been largely resolved by deep convolutional neural networks and attention mechanisms.

4.1 Do you agree with the author about the next step?

I agree with breaking through limitations 3 and 5.

5 What do other researchers say about his work?

  • [George E. Dahl et al. | Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition | IEEE | 2012]

    • The pretraining algorithm we use is the deep belief network (DBN) pre-training algorithm of [24].
    • we abandon the deep belief network once pre-training is complete and only retain and continue training the recognition weights
    • It often outperforms random initialization for the deeper architectures we are interested in training and provides results very robust to the initial random seed. The generative model learned during pre-training helps prevent overfitting, even when using models with very high capacity and can aid in the subsequent optimization of the recognition weights
  • [Daniela M. Witten et al. | Covariance-regularized regression and classification for high dimensional problems | Journal of the Royal Statistical Society | 2009]
    Indeed, many methods in the deep learning literature involve processing the features without using the outcome. Principal components regression is a classical example of this; a more recent example with much more extensive preprocessing is in Hinton et al. (2006).

  • [Ruslan Salakhutdinov et al. | Semantic Hashing | International Journal of Approximate Reasoning | 2009]
    The model can be trained efficiently by using a Restricted Boltzmann Machine (RBM) to learn one layer of hidden variables at a time [8]. (Hinton is the second author of this paper.)

Reference blogs

  • A Fast Learning Algorithm for Deep Belief Nets. I did not fully understand some concepts, such as explaining away; this blog's translation is slightly better.
  • Deep Belief Network简介 (Introduction to Deep Belief Networks). Explains DBNs fairly clearly and helps with understanding.
