I strongly believe that Reinforcement Learning (RL) is the future of game-playing AI agents. However, before RL can be used in real-life games, there are many sub-problems that need to be solved first. One of them is that it is very difficult to build a generalized AI that plays well in games it was not trained on, even when the game genre is exactly the same.

For example, a bot trained to drive excellently in Gran Turismo will not perform very well in other car racing games such as Forza Horizon. This means that for every game we need to train the RL agent from scratch, which is not ideal because current RL algorithms are not very sample-efficient and take a lot of resources to train.

This is why today I want to cover a review paper from the robotics world whose findings could also be applied to our problem in game development. It is titled “Crossing The Gap: A Deep Dive into Zero-Shot Sim-to-Real Transfer for Dynamics” and was published by the Robotics Lab at Imperial College London.

This paper focuses on techniques used in robotics research to transfer a model trained in simulation to real life. The problem is that robot and environment dynamics such as surface friction, wind resistance, and the speed of the robotic arm are ever so slightly different between simulation and real life, making it difficult to successfully transfer a model between the two. A robot arm that performs well in simulation may therefore not perform well in real life, which is very similar to the game AI problem described earlier.

It seems that one of the better approaches to training robust RL policies is to introduce variation and randomness at training time, instead of adapting the learned policies at deployment time. When small random variations are added to the environment dynamics, the RL policy learns to generalize across them and works much better when transferred to real life. Similarly, applying random forces in the environment seems to work just as well as, if not better than, tweaking the simulation dynamics.
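
To make the two ideas concrete, here is a minimal sketch of both kinds of randomization on a toy 1-D point mass. Everything in it is an assumption for illustration: the `PointMassEnv` class, the parameter ranges, and the noise scale are hypothetical and are not taken from the paper or from any particular simulator.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PointMassEnv:
    """Toy 1-D point mass; a hypothetical stand-in for a real simulator."""
    friction: float = 0.1   # viscous friction coefficient
    mass: float = 1.0       # mass of the body
    dt: float = 0.05        # simulation step in seconds
    position: float = 0.0
    velocity: float = 0.0

    def step(self, action_force: float, external_force: float = 0.0) -> Tuple[float, float]:
        # Net force = agent's force + random perturbation - friction drag.
        net = action_force + external_force - self.friction * self.velocity
        self.velocity += (net / self.mass) * self.dt
        self.position += self.velocity * self.dt
        return self.position, self.velocity


def make_randomized_env(rng, friction_range=(0.05, 0.3), mass_range=(0.8, 1.2)):
    # Randomized dynamics: draw new physical parameters for every episode.
    # The ranges are illustrative assumptions.
    return PointMassEnv(friction=rng.uniform(*friction_range),
                        mass=rng.uniform(*mass_range))


def run_episode(policy, rng, steps=100, force_std=0.5, randomize_dynamics=True):
    env = make_randomized_env(rng) if randomize_dynamics else PointMassEnv()
    obs = (env.position, env.velocity)
    trajectory = []
    for _ in range(steps):
        action = policy(obs)
        # Random forces: a Gaussian push applied at every step.
        push = rng.gauss(0.0, force_std)
        obs = env.step(action, external_force=push)
        trajectory.append(obs)
    return env, trajectory


def pd_policy(obs):
    # Simple hand-written controller that steers toward position 1.0.
    position, velocity = obs
    return 2.0 * (1.0 - position) - 1.0 * velocity


if __name__ == "__main__":
    env, trajectory = run_episode(pd_policy, random.Random(0))
    print(f"friction={env.friction:.3f} mass={env.mass:.3f} final={trajectory[-1]}")
```

In a real training loop the hand-written `pd_policy` would be replaced by the RL policy being trained, and each episode would see a different friction, mass, and sequence of pushes.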

Something similar should be done when training game bots on a particular game. By varying parameters such as the agents' reaction time during training, we could build a more generalized game AI that can also be transferred to other games of a similar genre.
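
One way to vary reaction time is to delay every action by a number of frames that is re-sampled for each episode. The sketch below assumes a frame-based game loop; the `ReactionDelay` class, the no-op action value, and the 0 to 6 frame range are hypothetical choices for illustration, not something prescribed by the paper.

```python
import random
from collections import deque


class ReactionDelay:
    """Delays actions by a fixed number of frames (hypothetical helper)."""

    def __init__(self, delay_frames: int, noop_action=0):
        self.delay_frames = delay_frames
        self.noop_action = noop_action
        self._pending = deque()

    def __call__(self, action):
        # Queue the newest action and release the one chosen
        # `delay_frames` frames ago, or a no-op while the queue fills up.
        self._pending.append(action)
        if len(self._pending) > self.delay_frames:
            return self._pending.popleft()
        return self.noop_action


def sample_reaction_delay(rng, min_frames=0, max_frames=6):
    # Draw a new delay at the start of every training episode.
    # The frame range is an illustrative assumption.
    return ReactionDelay(rng.randint(min_frames, max_frames))


if __name__ == "__main__":
    delay = ReactionDelay(delay_frames=2)
    print([delay(a) for a in [1, 2, 3, 4, 5]])  # prints [0, 0, 1, 2, 3]
```

Wrapping the agent's chosen action in such a delay before it is passed to the game means the policy cannot rely on one exact response latency, which is the same reasoning as randomizing dynamics in the robotics setting.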

Useful Links

Paper Full-Text (PDF)

Authors’ Video

Thank you for reading. If you liked this article, you may follow more of my work on Medium, GitHub, or subscribe to my YouTube channel.

Translated from: https://medium.com/deepgamingai/towards-generalized-game-playing-rl-agents-b76998afa3d8
