A Chinese translation of this post is available at: http://www.csdn.net/article/2015-06-05/2824880

This article provides a basic introduction to Long Short-Term Memory (LSTM) neural networks. For a more thorough review of RNNs, see the full 33-page review hosted on arXiv.


Given its wide applicability to real-world tasks, deep learning has attracted the attention of a wide audience of interested technologists, investors, and spectators. While the most celebrated results use feedforward convolutional neural networks (convnets) to solve problems in computer vision, less public attention has been paid to developments using recurrent neural networks to model relationships in time.

(Note: To help you begin experimenting with LSTM recurrent nets, I've attached a snapshot ("snap") of a simple micro instance preloaded with numpy, theano, and a git clone of Jonathan Raiman's LSTM example.)

In a recent post, "Learning to Read with Recurrent Neural Networks," I explained why, despite their incredible successes, feedforward networks are limited by their inability to explicitly model relationships in time and by their assumption that all data points consist of vectors of fixed length. At the post's conclusion, I promised a forthcoming post explaining the basics of recurrent nets and introducing the Long Short Term Memory (LSTM) model.

First, the basics of neural networks. A neural network can be represented as a graph of artificial neurons (also called nodes) and directed edges, which model synapses. Each neuron is a processing unit that takes as input the outputs of the nodes connected to it. Before emitting output, each neuron applies a nonlinear activation function. It is this activation function that gives neural networks the ability to model nonlinear relationships.
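To make this concrete, here is a minimal sketch (mine, not from the original post) of a single artificial neuron in numpy, using a logistic sigmoid as the nonlinear activation; the weights and inputs are hypothetical values chosen purely for illustration:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of its inputs followed
    by a nonlinear activation (here the logistic sigmoid)."""
    pre_activation = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-pre_activation))

# Hypothetical example: a neuron with three incoming connections.
x = np.array([0.5, -1.0, 2.0])   # outputs of the connected nodes
w = np.array([0.1, 0.4, -0.3])   # synapse weights
b = 0.05
print(neuron(x, w, b))           # a single activation between 0 and 1
```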

Now, consider the recent and famous paper, "Playing Atari with Deep Reinforcement Learning," which combines convnets with reinforcement learning to train a computer to play video games. The system achieves superhuman performance on games like Breakout!, where the proper strategy at any point can be deduced by looking at the screen. However, the system falls far short of human performance when optimal strategies require planning over long spans of time, as in Space Invaders.

With this motivation we introduce recurrent neural networks, an approach which endows neural networks with the ability to explicitly model time by adding a self-connected hidden layer which spans time points. In other words, the hidden layer feeds not only into the output, but also into the hidden layer at the next time step. Throughout this post I'll use some illustrations of recurrent networks pilfered from my forthcoming review of the literature on the subject.
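Concretely, a vanilla recurrent hidden layer computes something like the following at each time step. This is a sketch of my own with hypothetical weight names, not code from the post:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step of a vanilla recurrent net: the hidden layer sees both
    the current input and its own activation from the previous time step."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)   # self-connected hidden layer
    y_t = W_hy @ h_t + b_y                            # output at this time step
    return h_t, y_t
```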

We can now unfold this network across two time steps to visualize the connections in an acyclic way. Note that the weights (from input to hidden and hidden to output) are identical at each time step. A recurrent net is sometimes described as a deep network where the depth occurs not between input and output, but across time steps, where each time step can be thought of as a layer.

Once unfolded, these networks can be trained end to end using backpropagation. This extension of backpropagation to span time steps is called backpropagation through time.
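To see what unfolding looks like operationally, here is a minimal numpy sketch of my own (not the post's code): the same weight matrices appear at every iteration of the loop, which is exactly why the unrolled network can be viewed as a deep network with one layer per time step, and why backpropagation through time sums each weight's gradient contributions across all of those steps.

```python
import numpy as np

def rnn_forward(inputs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Unrolled forward pass over a whole sequence. The same weight matrices
    are reused at every time step, so backpropagation through time accumulates
    each weight's gradient from every step of the unrolled graph."""
    h, outputs = h0, []
    for x_t in inputs:                           # one "layer" per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return outputs, h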

One problem, however, is the vanishing gradient, as described by Yoshua Bengio and colleagues in the frequently cited paper "Learning Long-Term Dependencies with Gradient Descent is Difficult." In other words, the error signal from later time steps often doesn't make it far enough back in time to influence the network at much earlier time steps. This makes it difficult to learn long-range effects, such as the fact that taking that pawn now will come back to bite you twelve moves later.
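A quick numerical illustration of my own (under the simplifying assumption that the hidden activations are held fixed): backpropagating the error one step multiplies it by the local Jacobian, and with saturating tanh units and modest recurrent weights that product shrinks rapidly as it travels back in time.

```python
import numpy as np

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.3, size=(50, 50))   # hypothetical recurrent weights
h = np.tanh(rng.normal(size=50))              # hidden activations, held fixed for illustration

grad = np.ones(50)                            # error signal at the final time step
for t in range(30):
    # One step of backpropagation through time multiplies the error by the
    # local Jacobian W_hh^T diag(1 - h^2); with tanh units and modest weights
    # this tends to shrink the signal at every step.
    grad = W_hh.T @ ((1.0 - h ** 2) * grad)
    if (t + 1) % 10 == 0:
        print(f"{t + 1:2d} steps back, gradient norm: {np.linalg.norm(grad):.2e}")
```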

A remedy to this problem is the Long Short Term Memory (LSTM) model, first described in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. In this model, ordinary neurons, i.e. units which apply a sigmoidal activation to a linear combination of their inputs, are replaced by memory cells. Each memory cell is associated with an input gate, an output gate, and an internal state that feeds into itself unperturbed across time steps.

In this model, for each memory cell, three sets of weights are learned from the input as well as the entire hidden state at the previous time step. One feeds into the input node, pictured at bottom. One feeds into the input gate, shown on the far right side of the cell at bottom. Another feeds into the output gate, shown on the far right side of the cell at top. Each blue node is associated with an activation function, typically sigmoidal, and the Pi nodes represent multiplication. The centermost node in the cell is called the internal state and feeds into itself with a fixed weight of 1 across time steps. The self-connected edge attached to the internal state is referred to as the constant error carousel or CEC.
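Putting the pieces together, here is a minimal sketch of the memory cell's forward pass. The notation is my own, not the post's, and it follows the original no-forget-gate formulation described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, s_prev, W_g, W_i, W_o, b_g, b_i, b_o):
    """One time step of an LSTM memory cell (original 1997 form, no forget
    gate). Each of the three weight sets sees the current input together
    with the entire hidden state from the previous time step."""
    z = np.concatenate([x_t, h_prev])
    g = np.tanh(W_g @ z + b_g)      # input node
    i = sigmoid(W_i @ z + b_i)      # input gate: when to let activation into the cell
    o = sigmoid(W_o @ z + b_o)      # output gate: when to let activation out of the cell
    s = s_prev + i * g              # internal state: self-connection with fixed weight 1 (the CEC)
    h = o * np.tanh(s)              # the cell's contribution to the hidden state
    return h, s
```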

Thinking in terms of the forward pass, the input gate learns when to let activation pass into the memory cell and the output gate learns when to let activation pass out of it. Alternatively, in terms of the backward pass, the output gate learns when to let error flow into the memory cell, and the input gate learns when to let it flow out of the memory cell and through the rest of the network. These models have proven remarkably successful on tasks as varied as handwriting recognition and image captioning. Perhaps with some love they can be made to win at Space Invaders.

For a more thorough review of RNNs, see my full 33-page review hosted on arXiv.

Source: http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
