Reinforcing the Science Behind Reinforcement Learning

Machine Learning, Reinforcement Learning

You’re getting bored stuck in lockdown, so you decide to play computer games to pass the time.

You launch a chess game, choose to play against the computer, and you lose!

But how did that happen? How can you lose to a machine that came into existence barely 50 years ago?

Photo by Piotr Makowski on Unsplash

This is the magic of Reinforcement learning.


Reinforcement Learning lies under the umbrella of Machine Learning. It aims at developing intelligent behavior in complex, dynamic environments. Nowadays, as the reach of AI expands enormously, we can easily see its importance all around us: from autonomous driving, recommender systems, and search engines to computer games and robot skills, AI is playing a vital role.

Pavlov’s Conditioning

When we think about AI, we tend to think about the future, but the idea takes us back to the late 19th century, when Ivan Pavlov, a Russian physiologist, was studying salivation in dogs. He wanted to know how much dogs salivate when they see food, but while conducting the experiment he noticed that the dogs were salivating even before seeing any food. Following up on that observation, Pavlov would ring a bell before feeding them, and, as expected, they started salivating at the bell alone. The reason behind this behavior is their ability to learn: they had learned that after the bell, they would be fed. Another thing to ponder: the dog doesn’t salivate because the bell is ringing, but because, given past experiences, it has learned that food will follow the bell.

Photo by engin akyurt on Unsplash

What is Reinforcement Learning?

Reinforcement Learning is a family of Machine Learning techniques that enables an AI agent to interact with an environment and learn from its own sequence of actions and experiences.


For the sake of illustration, imagine you’re stranded on an isolated island. You can expect to freak out at first, but with no options left you’ll start fighting for your survival. You’ll look for a place to hunt, a place to sleep, and you’ll work out what to eat and what to avoid. If staying in a particular place keeps you safe, you’ll note it as a correct action to repeat; likewise, if you eat some animal that gives you diarrhea, you’ll avoid eating it in the future. Your actions will get better over time, and you’ll adjust to the new environment by learning. Reinforcement learning follows the same recipe: the agent experiences a new environment, tracks its actions and their consequences through errors and rewards, and learns to do better, aiming to maximize the reward.

Photo by Aleks Dahlberg on Unsplash

But, how does it compare against supervised learning?

It is possible to use a supervised learning method instead of reinforcement learning techniques. But for that we would need a really large dataset covering every action and its consequence. A second drawback would be limited learning: suppose we track the actions of the best player. He still isn’t perfect, and a machine imitating his actions might become as great as he is, but it would never be able to exceed his scores.

And, how does it stand against Unsupervised Learning?

In unsupervised learning there is no direct connection between input and output; it aims at recognizing patterns. Reinforcement learning, on the contrary, is all about learning from the feedback produced by past actions.

Then, is it Deep Learning?

Deep learning, too, comes under the umbrella of Machine Learning, and it is capable of tackling complex problems that require human-like intelligence.

The Venn diagram shows the relation between all these Machine Learning techniques. According to the Universal Approximation Theorem (UAT), a sufficiently large neural network can approximate virtually any function, so in principle we can attack any problem with neural nets. But they are not necessarily the optimal solution to every problem, as they require a lot of data and are often challenging to interpret.

The figure also shows that we are not required to use Deep Learning for every Reinforcement Learning problem, which dispels the myth that Reinforcement Learning depends solely on Deep Learning.

How does Reinforcement Learning work?

In Reinforcement Learning, we focus on the interaction between an Agent and an Environment.

  • An Agent can be regarded as the “solution”: a computer program that makes the decisions needed to solve a decision-making problem.

  • An Environment can be regarded as the “problem”: the place where the decisions taken by the Agent are played out.

For example, in the case of the chess game, we can consider the Agent to be one of the players, while the Environment constitutes the board and the opponent.

The two components are interdependent: the Agent adjusts its actions based on the Environment’s feedback, and the Environment reacts to the Agent’s actions.

The Environment is described by a set of variables associated with the decision-making problem. The set of all possible values these variables can take is called the state space; a state is one point in that state space, i.e., a particular value the variables take.

In each state, the Environment makes a set of actions available to the Agent, from which it must choose one. The Agent tries to influence the Environment through these actions, and the Environment may change state in response. The transition function is what tracks these associations, mapping a state and an action to the next state.
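To make these pieces concrete, here is a minimal sketch in Python of a hypothetical one-dimensional gridworld (all names are illustrative, not from any particular library): the state space is five cells, two actions are offered in every state, and the transition function maps a state and an action to the next state.

```python
# A toy 1-D gridworld: five states in a row, the goal sits at state 4.
STATES = [0, 1, 2, 3, 4]           # the state space
ACTIONS = ["left", "right"]        # actions offered in every state

def transition(state: int, action: str) -> int:
    """Transition function: maps (state, action) to the next state."""
    if action == "right":
        return min(state + 1, 4)   # move right, capped at the goal cell
    return max(state - 1, 0)       # move left, capped at the leftmost cell
```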

The Environment either rewards or penalizes the Agent based on its actions. A reward is positive feedback, given when the Agent’s last action contributes to reaching the desired goal. A penalty is negative feedback, given when the last action moves the Agent away from the goal. The Agent’s objective is to maximize the overall reward, improving its actions over time in order to achieve the desired final result.
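Putting agent, environment, reward, and penalty together, a bare-bones interaction loop might look like the sketch below. It reuses `ACTIONS` and `transition` from the gridworld above, and a random agent stands in for a real decision-making policy:

```python
import random

def reward(state: int) -> float:
    """+1 for reaching the goal, a small penalty for every other step."""
    return 1.0 if state == 4 else -0.1

state, total_reward = 0, 0.0
for step in range(20):                 # one episode, capped at 20 steps
    action = random.choice(ACTIONS)    # the agent chooses an action
    state = transition(state, action)  # the environment changes state
    total_reward += reward(state)      # ...and feeds back a reward or penalty
    if state == 4:                     # the episode ends at the goal
        break
print(f"Episode ended after {step + 1} steps, return = {total_reward:.1f}")
```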

Another thing Reinforcement Learning requires is a lot of training time, since rewards are often not disclosed to the Agent until the end of an episode (a game). For example, if our computer plays chess against us and wins, it gets rewarded (our desired outcome was to win), but it still needs to figure out which of its actions earned that reward, and that can only be worked out given a tonne of training time and data.

How does Reinforcement Learning learn? (Q-learning)

Goal: To maximize the total reward


We would like the rewards to come early, to make training faster and reach the desired outcomes quickly.

Ideal Case
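In this ideal case, the total reward $R$ is just the plain sum of the rewards collected at each step:

$$R = r_1 + r_2 + r_3 + \dots + r_n$$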

But in the real case we encounter late rewards, and to penalize late rewards we introduce a discount factor (γ).

Real Case
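With the discount factor, each later reward is weighted by a further power of $\gamma$ (with $0 \le \gamma \le 1$), so rewards that arrive late count for less:

$$R = r_1 + \gamma r_2 + \gamma^2 r_3 + \dots + \gamma^{n-1} r_n$$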

In a real-case scenario, the further a reward lies in the future (the further right in this sum), the greater the uncertainty about it.

Q-learning
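In its tabular form, Q-learning keeps an estimate $Q(s, a)$ for every state-action pair and nudges it after each step using the standard update rule ($\alpha$ is the learning rate):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$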

Bellman Equation

Our goal was to maximize the reward, or, equivalently, to minimize the error (loss).
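The Bellman equation captures this recursively: the value of an action is the immediate reward plus the discounted value of the best action available in the next state. In its standard optimality form:

$$Q^*(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^*(s', a') \right]$$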

To minimize the loss, we can run gradient descent on a mean-squared-error loss between the current Q-value and its Bellman target.
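A minimal sketch of that idea in Python, assuming a tabular Q-table stored as a NumPy array (the hyperparameter values here are illustrative). The tabular update is exactly one gradient-descent step on the squared TD error:

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # Q-table, initialized to zero
alpha, gamma = 0.1, 0.9               # learning rate and discount factor

def q_update(s: int, a: int, r: float, s_next: int) -> float:
    """One gradient step on the squared TD error (target - Q(s, a))^2."""
    target = r + gamma * np.max(Q[s_next])  # Bellman target for this transition
    td_error = target - Q[s, a]             # the error the MSE loss penalizes
    Q[s, a] += alpha * td_error             # descent step on 0.5 * td_error^2
    return td_error
```

In deep Q-learning, the table is replaced by a neural network and the same squared TD error becomes the network’s training loss.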

Exploration vs. Exploitation trade-off

Two other interesting components of Reinforcement Learning are Exploration and Exploitation. To obtain quick rewards, an Agent should follow its past experience. But to discover which actions are good in the first place, it has to try them out.


In a nutshell, to obtain quick rewards an Agent has to exploit, but it is also expected to explore in order to improve its actions, which might lead it to an even better reward.

Let’s get back to the island. You have three spots for fishing, each home to a different type of fish: spot 1 is the habitat of black fish, which are poisonous; spot 2 is home to orange fish, which are delicious as well as nutritious; and spot 3 holds grey fish, the best in terms of both nutrition and taste. The goal is to avoid the black fish and, ideally, to catch the grey ones.

Spot 1 vs Spot 2 vs Spot 3

Let’s assume that on Day 1 you choose spot 1, end up eating a black fish, and get diarrhea. On Day 2 you reach spot 2 and end up having a delicious meal. From then on, your instincts will exploit the path you’ve already tried, i.e., the road to spot 2, because as per your past experience spot 2 seems to be the better policy. Your mind gets stuck in a policy that settles for a moderate reward: without ever exploring spot 3, you never discover the best fish.
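The three fishing spots behave like a three-armed bandit, and a classic fix for getting stuck is an ε-greedy policy: exploit the best-known spot most of the time, but with probability ε explore a random one. A sketch with made-up average rewards for the three spots (poisonous black fish, decent orange fish, best grey fish):

```python
import random

TRUE_REWARD = {1: -1.0, 2: 0.5, 3: 1.0}  # hypothetical meal quality per spot

estimates = {spot: 0.0 for spot in TRUE_REWARD}  # running value estimates
counts = {spot: 0 for spot in TRUE_REWARD}
epsilon = 0.1                                    # exploration probability

for day in range(1000):
    if random.random() < epsilon:                # explore: try a random spot
        spot = random.choice(list(TRUE_REWARD))
    else:                                        # exploit: best-known spot
        spot = max(estimates, key=estimates.get)
    r = TRUE_REWARD[spot] + random.gauss(0, 0.3)  # noisy reward for the day
    counts[spot] += 1
    estimates[spot] += (r - estimates[spot]) / counts[spot]  # incremental mean

print(max(estimates, key=estimates.get))         # almost always prints 3
```

With ε = 0, the agent in this sketch would lock onto spot 2 after Day 2, exactly the trap described above.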

Exploration: helps you try out various actions; most valuable in the beginning.

Exploitation: reuses good experiences from the past; needs memory space; most valuable towards the end.

Conclusion

Hopefully, this article helps you understand Reinforcement Learning in the best possible way and assists you in putting it to practical use.

As always, thank you so much for reading, and please share this article if you found it useful!


Feel free to connect:


LinkedIn ~ https://www.linkedin.com/in/dakshtrehan/

Instagram ~ https://www.instagram.com/_daksh_trehan_/

Github ~ https://github.com/dakshtrehan

Follow for further Machine Learning/ Deep Learning blogs.

Medium ~ https://medium.com/@dakshtrehan

Want to learn more?

Detecting COVID-19 Using Deep Learning

The Inescapable AI Algorithm: TikTok

An insider’s guide to Cartoonization using Machine Learning

Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?

Decoding science behind Generative Adversarial Networks

Understanding LSTM’s and GRU’s

Recurrent Neural Network for Dummies

Convolution Neural Network for Dummies

Diving Deep into Deep Learning

Why Choose Random Forest and Not Decision Trees

Clustering: What it is? When to use it?

Start off your ML Journey with k-Nearest Neighbors

Naive Bayes Explained

Activation Functions Explained

Parameter Optimization Explained

Gradient Descent Explained

Logistic Regression Explained

Linear Regression Explained

Determining Perfect Fit for your ML Model

Cheers

Translated from: https://medium.com/towards-artificial-intelligence/reinforcing-the-science-behind-reinforcement-learning-d2643ca39b51
