A Chinese translation of this post is available at: http://www.csdn.net/article/2015-06-05/2824880

This article provides a basic introduction to Long Short-Term Memory (LSTM) neural networks. For a more thorough review of RNNs, see the full 33-page review hosted on arXiv.


Given its wide applicability to real-world tasks, deep learning has attracted the attention of a wide audience of interested technologists, investors, and spectators. While the most celebrated results use feedforward convolutional neural networks (convnets) to solve problems in computer vision, less public attention has been paid to developments using recurrent neural networks to model relationships in time.

(Note: To help you begin experimenting with LSTM recurrent nets, I've attached a snapshot ("snap") of a simple micro instance preloaded with numpy, theano, and a git clone of Jonathan Raiman's LSTM example.)

In a recent post, "Learning to Read with Recurrent Neural Networks," I explained why, despite their incredible successes, feedforward networks are limited by their inability to explicitly model relationships in time and by their assumption that all data points consist of vectors of fixed length. At the post's conclusion, I promised a forthcoming post explaining the basics of recurrent nets and introducing the Long Short Term Memory (LSTM) model.

First, the basics of neural networks. A neural network can be represented as a graph of artificial neurons (also called nodes) and directed edges, which model synapses. Each neuron is a processing unit that takes as input the outputs of the nodes connected to it. Before emitting output, each neuron applies a nonlinear activation function. It is this activation function that gives neural networks the ability to model nonlinear relationships.
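To make this concrete, here is a minimal sketch (mine, not from the original post) of a single artificial neuron in numpy, using a logistic sigmoid as the nonlinear activation; the weights and inputs are hypothetical values chosen purely for illustration:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of its inputs followed
    by a nonlinear activation (here the logistic sigmoid)."""
    pre_activation = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-pre_activation))

# Hypothetical example: a neuron with three incoming connections.
x = np.array([0.5, -1.0, 2.0])   # outputs of the connected nodes
w = np.array([0.1, 0.4, -0.3])   # synapse weights
b = 0.05
print(neuron(x, w, b))           # a single activation between 0 and 1
```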

Now, consider the recent and famous paper, "Playing Atari with Deep Reinforcement Learning," which combines convnets with reinforcement learning to train a computer to play video games. The system achieves superhuman performance on games like Breakout!, where the proper strategy at any point can be deduced by looking at the screen. However, the system falls far short of human performance when optimal strategies require planning over long spans of time, as in Space Invaders.

With this motivation we introduce recurrent neural networks, an approach which endows neural networks with the ability to explicitly model time by adding a self-connected hidden layer which spans time points. In other words, the hidden layer feeds not only into the output, but also into the hidden layer at the next time step. Throughout this post I'll use some illustrations of recurrent networks pilfered from my forthcoming review of the literature on the subject.
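Concretely, a vanilla recurrent hidden layer computes something like the following at each time step. This is a sketch of my own with hypothetical weight names, not code from the post:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step of a vanilla recurrent net: the hidden layer sees both
    the current input and its own activation from the previous time step."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)   # self-connected hidden layer
    y_t = W_hy @ h_t + b_y                            # output at this time step
    return h_t, y_t
```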

We can now unfold this network across two time steps to visualize the connections in an acyclic way. Note that the weights (from input to hidden and hidden to output) are identical at each time step. A recurrent net is sometimes described as a deep network where the depth occurs not between input and output, but across time steps, where each time step can be thought of as a layer.

Once unfolded, these networks can be trained end to end using backpropagation. This extension of backpropagation to span time steps is called backpropagation through time.
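To see what unfolding looks like operationally, here is a minimal numpy sketch of my own (not the post's code): the same weight matrices appear at every iteration of the loop, which is exactly why the unrolled network can be viewed as a deep network with one layer per time step, and why backpropagation through time sums each weight's gradient contributions across all of those steps.

```python
import numpy as np

def rnn_forward(inputs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Unrolled forward pass over a whole sequence. The same weight matrices
    are reused at every time step, so backpropagation through time accumulates
    each weight's gradient from every step of the unrolled graph."""
    h, outputs = h0, []
    for x_t in inputs:                           # one "layer" per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return outputs, h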

One problem, however, is the vanishing gradient, as described by Yoshua Bengio and colleagues in the frequently cited paper "Learning Long-Term Dependencies with Gradient Descent is Difficult." In other words, the error signal from later time steps often doesn't make it far enough back in time to influence the network at much earlier time steps. This makes it difficult to learn long-range effects, such as the fact that taking that pawn now will come back to bite you twelve moves later.
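A quick numerical illustration of my own (under the simplifying assumption that the hidden activations are held fixed): backpropagating the error one step multiplies it by the local Jacobian, and with saturating tanh units and modest recurrent weights that product shrinks rapidly as it travels back in time.

```python
import numpy as np

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.3, size=(50, 50))   # hypothetical recurrent weights
h = np.tanh(rng.normal(size=50))              # hidden activations, held fixed for illustration

grad = np.ones(50)                            # error signal at the final time step
for t in range(30):
    # One step of backpropagation through time multiplies the error by the
    # local Jacobian W_hh^T diag(1 - h^2); with tanh units and modest weights
    # this tends to shrink the signal at every step.
    grad = W_hh.T @ ((1.0 - h ** 2) * grad)
    if (t + 1) % 10 == 0:
        print(f"{t + 1:2d} steps back, gradient norm: {np.linalg.norm(grad):.2e}")
```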

A remedy to this problem is the Long Short Term Memory (LSTM) model, first described in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. In this model, ordinary neurons, i.e. units which apply a sigmoidal activation to a linear combination of their inputs, are replaced by memory cells. Each memory cell is associated with an input gate, an output gate, and an internal state that feeds into itself unperturbed across time steps.

In this model, for each memory cell, three sets of weights are learned from the input as well as the entire hidden state at the previous time step. One feeds into the input node, pictured at bottom. One feeds into the input gate, shown on the far right side of the cell at bottom. Another feeds into the output gate, shown on the far right side of the cell at top. Each blue node is associated with an activation function, typically sigmoidal, and the Pi nodes represent multiplication. The centermost node in the cell is called the internal state and feeds into itself with a fixed weight of 1 across time steps. The self-connected edge attached to the internal state is referred to as the constant error carousel or CEC.
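Putting the pieces together, here is a minimal sketch of the memory cell's forward pass. The notation is my own, not the post's, and it follows the original no-forget-gate formulation described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, s_prev, W_g, W_i, W_o, b_g, b_i, b_o):
    """One time step of an LSTM memory cell (original 1997 form, no forget
    gate). Each of the three weight sets sees the current input together
    with the entire hidden state from the previous time step."""
    z = np.concatenate([x_t, h_prev])
    g = np.tanh(W_g @ z + b_g)      # input node
    i = sigmoid(W_i @ z + b_i)      # input gate: when to let activation into the cell
    o = sigmoid(W_o @ z + b_o)      # output gate: when to let activation out of the cell
    s = s_prev + i * g              # internal state: self-connection with fixed weight 1 (the CEC)
    h = o * np.tanh(s)              # the cell's contribution to the hidden state
    return h, s
```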

Thinking in terms of the forward pass, the input gate learns when to let activation pass into the memory cell and the output gate learns when to let activation pass out of it. Alternatively, in terms of the backward pass, the output gate learns when to let error flow into the memory cell, and the input gate learns when to let it flow out of the memory cell and through the rest of the network. These models have proven remarkably successful on tasks as varied as handwriting recognition and image captioning. Perhaps with some love they can be made to win at Space Invaders.

For a more thorough review of RNNs, see my full 33-page review hosted on arXiv.

Source: http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
