Recurrent Neural Networks - Part 3

FAU Lecture Notes on Deep Learning

These are the lecture notes for FAU's YouTube Lecture "Deep Learning". This is a full transcript of the lecture video and the matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was created largely automatically with deep learning techniques, and only minor manual modifications were performed. If you spot mistakes, please let us know!


Welcome back to deep learning! Today I want to show you one alternative solution to the vanishing gradient problem in recurrent neural networks.

You already noticed that long temporal contexts are a problem. Therefore, we will talk about long short-term memory units (LSTMs). They were introduced by Hochreiter and Schmidhuber and published in 1997.

Image under CC BY 4.0 from the Deep Learning Lecture.

They were designed to solve the vanishing gradient problem for long-term dependencies. The main idea is that you introduce gates that control writing to and reading from a memory that is kept in an additional state.

So, let's have a look into the LSTM unit. You see here, one main feature is that we now have essentially two things that could be considered a hidden state: we have the cell state C and we have the hidden state h. Again, we have some input x. Then we have quite a few activation functions. We combine them and, in the end, we produce some output y. This unit is much more complex than what you've seen previously in the simple RNNs.

Okay, so what are the main features of the LSTM? Given some input x, it produces a hidden state h, from which the output y is produced. It also has a cell state C that we will look into in a little more detail on the next couple of slides. Now, we have several gates, and the gates are essentially used to control the flow of information. There's a forget gate, which is used to forget old information in the cell state. Then, we have the input gate, which essentially decides what new input is written into the cell state. From this, we then compute the updated cell state and the updated hidden state.

So let's look into the workflow. We have the cell state after each time point t, and the cell state undergoes only linear changes. So there is no activation function on this path. You see there is only one multiplication and one addition on the path of the cell state. So, the cell state can flow through the unit, and it can stay constant for multiple time steps. Now, we want to operate on the cell state. We do that with several gates, and the first one is going to be the forget gate. The key idea here is that we want to forget information from the cell state. In another step, we then want to think about how to actually put new information into the cell state, which is going to be used to memorize things.
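As a minimal sketch of why a path with only one multiplication and one addition can hold a value over many time steps, consider a scalar toy version of the cell state path. The function name `run_cell_path` and the constant gate values are illustrative assumptions, not part of the lecture; in a real LSTM the gate values change at every step.

```python
def run_cell_path(c0, forget, write, steps):
    """Apply C_t = forget * C_{t-1} + write repeatedly (scalar toy case).

    `forget` plays the role of a constant forget gate value and `write`
    the role of the gated new information; both are assumptions made
    only for this illustration.
    """
    c = c0
    for _ in range(steps):
        # One multiplication and one addition, no activation function.
        c = forget * c + write
    return c


# With forget = 1 and nothing written, the stored value is carried unchanged.
kept = run_cell_path(c0=0.8, forget=1.0, write=0.0, steps=50)

# With forget = 0.9, the stored value shrinks by a factor of 0.9 per step.
decayed = run_cell_path(c0=0.8, forget=0.9, write=0.0, steps=50)

print(kept)
print(decayed)
```

The first call leaves the value untouched after 50 steps, while the second one has almost erased it. This is the behaviour the gates introduced next are meant to control.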

So, the forget gate f controls how much of the previous cell state is forgotten. You can see it is computed by a sigmoid function, so each of its entries is somewhere between 0 and 1. It's essentially computed with a matrix multiplication applied to the concatenation of the hidden state and x, plus some bias. This is then multiplied point-wise with the cell state. So, we decide which parts of the state vector to forget and which ones to keep.
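The forget gate described above can be sketched in a few lines of dependency-free Python. The helper names (`sigmoid`, `affine`, `forget_gate`), the symbols `W_f` and `b_f`, and all numeric values are assumptions chosen for illustration; a practical implementation would use an array library instead of lists.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def forget_gate(W_f, b_f, h_prev, x):
    """f = sigmoid(W_f [h_prev, x] + b_f), one value in (0, 1) per cell entry."""
    concat = h_prev + x  # list concatenation of hidden state and input
    return [sigmoid(z) for z in affine(W_f, concat, b_f)]


# Hypothetical toy setup: 2 hidden units, 1 input feature.
h_prev = [0.5, -0.5]
x = [1.0]
W_f = [[0.0, 0.0, 0.0],   # first entry: gate value 0.5, half is kept
       [1.0, 1.0, 4.0]]   # second entry: gate value close to 1, mostly kept
b_f = [0.0, 0.0]
c_prev = [2.0, -3.0]

f = forget_gate(W_f, b_f, h_prev, x)
# Point-wise multiplication decides how much of each cell entry survives.
c_kept = [f_k * c_k for f_k, c_k in zip(f, c_prev)]

print(f)
print(c_kept)
```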

Now, we also need to put in new information. For the new information, we have to somehow decide what to write into the cell state. So here, we need two activation functions. The first one, which we call i, is also produced by a sigmoid activation function: again, a matrix multiplication applied to the hidden state concatenated with the input, plus some bias, with the sigmoid function as non-linearity. Remember, this value is going to be between 0 and 1, so you could argue that it is kind of selecting something. Then, we have some C tilde, which is a kind of update state that is produced by the hyperbolic tangent. It takes as input some weight matrix W subscript c that is multiplied with the concatenation of the hidden and input vectors, plus some bias. So essentially, i acts as a kind of index that is multiplied with the intermediate cell state C tilde. We could say that the hyperbolic tangent is producing some new cell state, and then we select via i which of its entries should be added to the current cell state. So, we multiply the newly produced C tilde with i and add it to the cell state C.
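A minimal sketch of the two quantities described above, again in plain Python: the input gate i (sigmoid) and the candidate state C tilde (hyperbolic tangent). The names `W_i`, `b_i`, `b_c`, the helper functions, and the toy numbers are assumptions for illustration; only `W_c` is named in the lecture.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def input_gate_and_candidate(W_i, b_i, W_c, b_c, h_prev, x):
    """Return (i, c_tilde) computed from the concatenation [h_prev, x]."""
    concat = h_prev + x
    i = [sigmoid(z) for z in affine(W_i, concat, b_i)]          # in (0, 1)
    c_tilde = [math.tanh(z) for z in affine(W_c, concat, b_c)]  # in (-1, 1)
    return i, c_tilde


# Hypothetical toy setup: 2 hidden units, 1 input feature.
h_prev = [0.0, 0.0]
x = [1.0]
W_i = [[0.0, 0.0, 10.0],    # first entry: gate almost fully open
       [0.0, 0.0, -10.0]]   # second entry: gate almost closed
b_i = [0.0, 0.0]
W_c = [[0.0, 0.0, 1.0],
       [0.0, 0.0, -1.0]]
b_c = [0.0, 0.0]

i, c_tilde = input_gate_and_candidate(W_i, b_i, W_c, b_c, h_prev, x)
# Only entries selected by i contribute noticeably to the cell state.
write = [i_k * ct_k for i_k, ct_k in zip(i, c_tilde)]

print(i)
print(c_tilde)
print(write)
```

In this toy setup the first candidate entry is written almost completely, while the second one is suppressed by its nearly closed gate.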

Now, as we've just seen, we update the complete cell state. The previous cell state is multiplied point-wise with the forget gate. Then, we add the elements of the update state C tilde that have been selected by i, again using a point-wise multiplication. So, you see the update of the cell state is completely linear in the cell state, using only multiplications and additions.
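Written out, the update just described reads as follows. The notation (time index t, the symbol \odot for point-wise multiplication) is an assumption based on the common formulation of the LSTM; the lecture slides may use different symbols. The second line is a short worked step: since the gates are computed from the hidden state and the input, the direct dependence of the new cell state on the old one is just a scaling by the forget gate.

```latex
\begin{align}
  C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
  \left.\frac{\partial C_t}{\partial C_{t-1}}\right|_{\text{direct path}}
      &= \operatorname{diag}(f_t)
\end{align}
```

When the entries of f_t are close to 1, this factor is close to the identity, which is one way to see why gradients along the cell state are less prone to vanishing than along a path that passes through a squashing non-linearity at every step.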

Now, we still have to produce the hidden state and the output. As we have seen in the Elman cell, the output of our network only depends on the hidden state. So, we first update the hidden state using another non-linearity that is then multiplied with a transformation of the cell state. This gives us the new hidden state, and from the new hidden state, we produce the output with another non-linearity.

So, you see these are the update equations. We produce some o, which is essentially a proposal for the new hidden state, by a sigmoid function. Then, we multiply it with the hyperbolic tangent of the cell state in order to select which elements are actually produced. This gives us the new hidden state. The new hidden state can then be passed through another non-linearity in order to produce the output. You can see here, by the way, that for the update of the hidden state and the production of the new output, we omitted the transformation matrices that are, of course, required. You could interpret each of these non-linearities in the network essentially as a universal function approximator. So, of course, we still need the linear part inside here to reduce vanishing gradients.
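Putting the pieces together, one time step of the unit could be sketched as below. This is a sketch under stated assumptions, not the lecture's reference implementation: the parameter names (`W_f`, `W_i`, `W_c`, `W_o`, `W_y` and the matching biases), the `params` dictionary, and the choice of a sigmoid as the output non-linearity are all assumptions. The transformation matrices that the slides omit are written out explicitly here.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def zeros(rows, cols):
    """Matrix of zeros as a list of rows (used for the toy parameters)."""
    return [[0.0] * cols for _ in range(rows)]


def lstm_step(params, h_prev, c_prev, x):
    """One LSTM time step; returns the new (h, c) and the output y."""
    concat = h_prev + x
    f = [sigmoid(z) for z in affine(params["W_f"], concat, params["b_f"])]
    i = [sigmoid(z) for z in affine(params["W_i"], concat, params["b_i"])]
    c_tilde = [math.tanh(z) for z in affine(params["W_c"], concat, params["b_c"])]
    o = [sigmoid(z) for z in affine(params["W_o"], concat, params["b_o"])]
    # Cell state: forget part of the old state, add the selected update.
    c = [f_k * c_k + i_k * ct_k
         for f_k, c_k, i_k, ct_k in zip(f, c_prev, i, c_tilde)]
    # Hidden state: o selects which entries of tanh(c) are exposed.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    # Output: assumed here to be a sigmoid of a linear map of h.
    y = [sigmoid(z) for z in affine(params["W_y"], h, params["b_y"])]
    return h, c, y


# Hypothetical toy parameters: 2 hidden units, 1 input feature, 1 output.
# With all weights zero, every gate evaluates to 0.5 and c_tilde to 0.
n_hidden, n_in, n_out = 2, 1, 1
params = {
    "W_f": zeros(n_hidden, n_hidden + n_in), "b_f": [0.0] * n_hidden,
    "W_i": zeros(n_hidden, n_hidden + n_in), "b_i": [0.0] * n_hidden,
    "W_c": zeros(n_hidden, n_hidden + n_in), "b_c": [0.0] * n_hidden,
    "W_o": zeros(n_hidden, n_hidden + n_in), "b_o": [0.0] * n_hidden,
    "W_y": zeros(n_out, n_hidden), "b_y": [0.0] * n_out,
}

h, c, y = lstm_step(params, h_prev=[0.0, 0.0], c_prev=[1.0, -2.0], x=[1.0])
print(h, c, y)
```

With these all-zero toy weights the step simply halves the stored cell state, which makes the result easy to check by hand. With learned weights, the gates would depend on the hidden state and the input at every step.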

If you want to train all of this, you can go back and use a recipe very similar to the one we've already seen for the Elman cell. So, you use backpropagation through time in order to update all of the different weight matrices.

Okay. This already brings us to the end of this video. So you've seen the long short-term memory cell, its different parts, and its different gates, and, of course, this is a very important part of this lecture. So, if you're preparing for the exam, then I would definitely recommend having a look at how to sketch such a long short-term memory unit. You can see that the LSTM has a lot of advantages. In particular, we can alleviate the problem of vanishing gradients through the linear transformations in the cell state. By the way, it's also worth pointing out that we somehow include in our long short-term memory cell some ideas that we know from computer design. We essentially learn how to manipulate memory cells. We could argue that in the hidden state, we now have a kind of program, a kind of finite state machine, that operates on some memory and learns which information to store, which information to delete, and which information to load. So, it is very interesting how these network designs gradually seem to be approaching computer architectures. Of course, there's much more to say about this. In the next video, we will look into the gated recurrent neural networks, which are a kind of simplification of the LSTM cell. You will see that with a slightly slimmer design, we can still get many of the benefits of the LSTM, but with far fewer parameters. Ok, so I hope you enjoyed this video, and see you next time when we talk about gated recurrent neural networks. Bye-bye!

If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced.

RNN Folk Music

FolkRNN.org
MachineFolkSession.com
The Glass Herry Comment 14128

Links

Character RNNs
CNNs for Machine Translation
Composing Music with RNNs

Translated from: https://towardsdatascience.com/recurrent-neural-networks-part-3-1032d4a67757
