Heuristic Algorithms vs. Machine Learning: Training a Spot-Inspired Quadruped Robot Using Reinforcement Learning

It’s been a while since I started exploring Reinforcement Learning and OpenAI Gym, inspired by the amazing Boston Dynamics Spot. I spent last year studying the foundations of Machine Learning and how they apply to robotics, and discovered a very interesting world.

To experiment with what I’ve learned, I created an open-source project called rex-gym. The aim is to let an open-source quadruped robot learn domestic and generic tasks in simulation and then successfully transfer that knowledge (Control Policies) to the real robot without any further manual tuning.

Rex is a 12-joint robot with 3 motors (Shoulder, Leg and Foot) on each leg. The robot's base model is imported into pyBullet from a URDF file, and the servo motors are modelled in the motor class.
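
The motor class itself is not reproduced here. As a minimal sketch of what modelling a position-controlled servo can involve, the snippet below assumes a PD control law with torque saturation. The joint names, gains and torque limit are hypothetical illustration values, not taken from rex-gym.

```python
from dataclasses import dataclass

# Hypothetical naming for Rex's 12 joints: 4 legs x (shoulder, leg, foot).
LEGS = ("front_left", "front_right", "rear_left", "rear_right")
MOTORS_PER_LEG = ("shoulder", "leg", "foot")
JOINT_NAMES = [f"{leg}_{motor}" for leg in LEGS for motor in MOTORS_PER_LEG]


@dataclass
class PDServo:
    """Toy position-controlled servo: PD law plus torque saturation.

    kp, kd and max_torque are illustrative values, not measured from Rex's servos.
    """

    kp: float = 1.2
    kd: float = 0.02
    max_torque: float = 2.0  # assumed limit in N*m

    def torque(self, q_desired: float, q: float, q_dot: float) -> float:
        # Proportional term pulls toward the target angle; derivative term damps motion.
        tau = self.kp * (q_desired - q) - self.kd * q_dot
        # Clamp to what the (assumed) motor can actually deliver.
        return max(-self.max_torque, min(self.max_torque, tau))


servo = PDServo()
print(len(JOINT_NAMES), servo.torque(0.5, 0.0, 0.0), servo.torque(10.0, 0.0, 0.0))
```

In a simulator, a torque computed this way would be applied at every physics step instead of snapping the joint to its target, which is one way to make simulated servos respond more like real ones.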

There is also an enhanced version with a 6DOF robotic arm mounted on top of the rack.

Learning approach

This library uses the Proximal Policy Optimization (PPO) algorithm with a hybrid policy defined as:

a(t, o) = a(t) + π(o)

Here a(t) is an open-loop signal that depends only on time, and π(o) is a feedback component computed from the observation o. The policy can be varied continuously from fully user-specified to entirely learned from scratch. To use a purely user-specified policy, we set both the lower and the upper bounds of π(o) to zero. To learn a policy from scratch, we set a(t) = 0 and give the feedback component π(o) a wide output range.
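
To make these regimes concrete, here is a minimal pure-Python sketch of the hybrid policy. The helper names (`hybrid_action`, `sine_reference`, `dummy_policy`) are hypothetical, and `dummy_policy` merely stands in for the trained network, which is not shown.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def hybrid_action(
    t: float,
    obs: Sequence[float],
    open_loop: Callable[[float], Vector],
    feedback: Callable[[Sequence[float]], Vector],
    lower: float,
    upper: float,
) -> Vector:
    """Compute a(t, o) = a(t) + pi(o), with pi(o) clipped to [lower, upper]."""
    a_t = open_loop(t)
    pi_o = [min(upper, max(lower, x)) for x in feedback(obs)]
    return [a + p for a, p in zip(a_t, pi_o)]


def sine_reference(t: float) -> Vector:
    # Illustrative open-loop signal: a(t) = sin(t) on all 12 motors.
    return [math.sin(t)] * 12


def dummy_policy(obs: Sequence[float]) -> Vector:
    # Placeholder for the learned feedback network pi(o).
    return [0.3] * 12


# Fully user-specified: both bounds of pi(o) are zero, so only a(t) remains.
user_specified = hybrid_action(1.0, [], sine_reference, dummy_policy, 0.0, 0.0)
# Learned from scratch: a(t) = 0 and pi(o) gets a wide output range.
from_scratch = hybrid_action(1.0, [], lambda t: [0.0] * 12, dummy_policy, -1.0, 1.0)
```

Anything in between, for example a(t) = sin(t) with bounds of ±0.2, mixes the user's reference with a bounded learned correction.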

By varying the open loop signal and the output bound of the feedback component, we can decide how much user control is applied to the system.
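
The feedback component π(o) is the part that PPO trains. For reference, this sketch computes the standard clipped surrogate objective from the PPO paper over a batch of samples. It is not rex-gym's training code, and ε = 0.2 is simply a commonly used default.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), to be maximized.

    r = pi_new(a|o) / pi_old(a|o) is computed from log-probabilities.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum makes the objective pessimistic: large policy
        # changes cannot increase it beyond the clipped value.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, doubling an action's probability earns only 1.2 × A instead of 2 × A, which is what keeps each PPO update close to the previous policy.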

The Rex Gym environments are implemented with a twofold approach: a Bezier controller and an Open Loop mode.

The Bezier controller implements a fully user-specified policy. The controller uses the Inverse Kinematics model (see model/kinematics.py) to generate the gait.
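
The actual implementation lives in model/kinematics.py and is not reproduced here. As a simplified sketch under stated assumptions: a cubic Bezier curve describes the foot's swing in the leg's sagittal plane, and planar two-link inverse kinematics turns each foot position into joint angles for the Leg and Foot motors. The Shoulder motor is ignored, and the link lengths, control points and helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical link lengths in metres (upper leg, lower leg).
UPPER, LOWER = 0.11, 0.13


def bezier_point(control_points: Sequence[Point], s: float) -> Point:
    """Evaluate a Bezier curve at s in [0, 1] with De Casteljau's algorithm."""
    pts: List[Point] = list(control_points)
    while len(pts) > 1:
        # Replace each pair of neighbours by their linear interpolation.
        pts = [
            ((1 - s) * x0 + s * x1, (1 - s) * z0 + s * z1)
            for (x0, z0), (x1, z1) in zip(pts, pts[1:])
        ]
    return pts[0]


def leg_ik(x: float, z: float) -> Tuple[float, float]:
    """Planar two-link IK: foot at (x, z) relative to the hip, z < 0 below it.

    Returns (leg, foot) angles in radians; the leg angle is measured from the
    downward vertical, the foot angle relative to the upper link.
    """
    cos_foot = (x * x + z * z - UPPER**2 - LOWER**2) / (2 * UPPER * LOWER)
    if not -1.0 <= cos_foot <= 1.0:
        raise ValueError("foot target is out of reach")
    foot = math.acos(cos_foot)
    leg = math.atan2(x, -z) - math.atan2(
        LOWER * math.sin(foot), UPPER + LOWER * math.cos(foot)
    )
    return leg, foot


# Hypothetical swing phase: lift the foot 6 cm while moving it 10 cm forward.
SWING = [(-0.05, -0.20), (-0.05, -0.14), (0.05, -0.14), (0.05, -0.20)]
joint_targets = [leg_ik(*bezier_point(SWING, i / 10)) for i in range(11)]
```

Sampling the curve at evenly spaced s values and feeding each point through the IK yields a sequence of joint targets for one swing; a stance phase would be a second curve (often a straight line) closing the loop.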

In the Open Loop mode, the system in some cases learns from scratch (the open-loop component is set to a(t) = 0), while in others it is given a simple trajectory reference (e.g. a(t) = sin(t)).
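
A minimal sketch of the two kinds of open-loop component, assuming a 12-element action ordered (shoulder, leg, foot) per leg with angles in radians. The trot phase offsets, amplitude, frequency and function name are hypothetical, chosen only to show how a simple sinusoidal reference can be laid out across the legs.

```python
import math
from typing import List

# Hypothetical trot timing: diagonal legs share a phase (in cycles).
LEG_ORDER = ("front_left", "front_right", "rear_left", "rear_right")
TROT_PHASE = {"front_left": 0.0, "rear_right": 0.0, "front_right": 0.5, "rear_left": 0.5}


def open_loop_reference(
    t: float, amplitude: float = 0.3, frequency: float = 1.5, from_scratch: bool = False
) -> List[float]:
    """Return a 12-element open-loop signal a(t), ordered (shoulder, leg, foot) per leg.

    from_scratch=True gives a(t) = 0, leaving the whole action to the learned pi(o).
    Otherwise each leg motor follows a sine wave, the foot motor lags it by a
    quarter cycle, and the shoulder motor is held at 0.
    """
    if from_scratch:
        return [0.0] * 12
    action: List[float] = []
    for leg in LEG_ORDER:
        phase = 2 * math.pi * (frequency * t + TROT_PHASE[leg])
        # sin(phase - pi/2) == -cos(phase): a quarter-cycle lag for the foot.
        action += [0.0, amplitude * math.sin(phase), amplitude * math.sin(phase - math.pi / 2)]
    return action
```

Giving diagonal legs the same phase encodes a trot-like rhythm directly in a(t), so the learned component only has to correct it rather than discover the timing.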

The purpose is to compare the learned policies, and the scores they achieve, across these two approaches.

Gaits

My first focus was on the basic gaits: walk, gallop, turn and stand up.

I created a gym environment for each of them (tuning the action space and the reward function) and trained the robot using both the Bezier controller and the Open Loop mode. The results are different but efficient gaits.
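
The article does not list its reward functions. Purely to illustrate the kind of trade-off such a function encodes, here is a hypothetical walking reward that pays for forward progress of the robot base and penalises motor energy; the function name, inputs and weight are assumptions.

```python
from typing import Sequence


def walk_reward(
    prev_base_x: float,
    base_x: float,
    dt: float,
    torques: Sequence[float],
    joint_velocities: Sequence[float],
    energy_weight: float = 0.005,
) -> float:
    """Hypothetical per-step reward: forward speed minus a weighted energy cost."""
    forward_speed = (base_x - prev_base_x) / dt
    # Approximate energy spent this step: sum over motors of |torque * velocity| * dt.
    energy = sum(abs(tau * qd) for tau, qd in zip(torques, joint_velocities)) * dt
    return forward_speed - energy_weight * energy
```

Raising `energy_weight` pushes the optimiser toward more economical motion at the expense of speed. Environments for other gaits could swap the progress term, for example rewarding yaw rate for turning or base height for standing up.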

Terrains

To test the policies’ robustness, I integrated several uneven terrains in addition to the classic flat plane.
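
The article does not say how its terrains are generated. One simple way to produce uneven ground is a random heightfield; the sketch below, with hypothetical names and parameters, generates `rows * cols` heights as a flat list and smooths isolated spikes into gentler bumps. A flat list of heights is the kind of input pybullet's `createCollisionShape` accepts as `heightfieldData` (alongside `numHeightfieldRows` and `numHeightfieldColumns`) when called with `shapeType=p.GEOM_HEIGHTFIELD`.

```python
import random
from typing import List


def bumpy_heightfield(rows: int, cols: int, max_height: float = 0.05, seed: int = 0) -> List[float]:
    """Return rows * cols terrain heights in metres, drawn uniformly from [0, max_height].

    max_height = 0 reproduces the flat plane; larger values give rougher ground.
    A fixed seed makes the terrain reproducible across training runs.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_height) for _ in range(rows * cols)]


def smooth(heights: List[float], rows: int, cols: int) -> List[float]:
    """3x3 box blur over a row-major grid, turning isolated spikes into bumps."""
    out: List[float] = []
    for i in range(rows):
        for j in range(cols):
            # Average over the in-bounds 3x3 neighbourhood of cell (i, j).
            neighbours = [
                heights[r * cols + c]
                for r in range(max(0, i - 1), min(rows, i + 2))
                for c in range(max(0, j - 1), min(cols, j + 2))
            ]
            out.append(sum(neighbours) / len(neighbours))
    return out


terrain = smooth(bumpy_heightfield(32, 32, max_height=0.04, seed=7), 32, 32)
```

Varying `max_height` (and the seed) across episodes is a cheap way to expose a policy to a range of surfaces rather than a single fixed one.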

The next step will be training the robot to interact with its environment: grasping objects and opening doors.

Stay tuned!

Original article: https://medium.com/swlh/training-a-spot-inspired-quadruped-robot-using-reinforcement-learning-678b9e5df164
