1. Environment Setup

Test environment: Ubuntu 16.04, ROS Kinetic.

Download openai_ros:

git clone https://bitbucket.org/theconstructcore/openai_ros.git
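openai_ros is a catkin package, so after cloning it into the src folder of your catkin workspace you build and source the workspace before using it. A quick sketch, assuming the workspace lives at ~/catkin_ws (the original does not spell out the path):

cd ~/catkin_ws
catkin_make
source devel/setup.bash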

Dependencies required by openai_ros:

message_runtime rospy gazebo_msgs std_msgs geometry_msgs controller_manager_msgs

If any of these dependencies is missing, you will get an error when building or launching.

For example, to install controller_manager_msgs:

sudo apt-get install ros-kinetic-controller-manager-msgs
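If several dependencies are missing, rosdep can resolve and install all of them at once for the packages in your workspace (again assuming the workspace is ~/catkin_ws):

cd ~/catkin_ws
rosdep install --from-paths src --ignore-src -r -y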


2. Creating a TurtleBot Training Script with openai_ros

Create a my_turtlebot2_training package under catkin_ws/src:
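One way to create it, shown here as a sketch using catkin_create_pkg and declaring only the dependencies this tutorial actually needs (adjust the list to your setup):

cd ~/catkin_ws/src
catkin_create_pkg my_turtlebot2_training rospy openai_ros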

Then create a Python file named start_training.py and make it executable with chmod +x:

#!/usr/bin/env python

import gym
import numpy
import time
import qlearn
from gym import wrappers
# ROS packages required
import rospy
import rospkg
from openai_ros.openai_ros_common import StartOpenAI_ROS_Environment


if __name__ == '__main__':

    rospy.init_node('example_turtlebot2_maze_qlearn',
                    anonymous=True, log_level=rospy.WARN)

    # Init OpenAI_ROS ENV
    task_and_robot_environment_name = rospy.get_param(
        '/turtlebot2/task_and_robot_environment_name')
    env = StartOpenAI_ROS_Environment(task_and_robot_environment_name)

    # Create the Gym environment
    rospy.loginfo("Gym environment done")
    rospy.loginfo("Starting Learning")

    # Set the logging system
    rospack = rospkg.RosPack()
    pkg_path = rospack.get_path('my_turtlebot2_training')
    outdir = pkg_path + '/training_results'
    env = wrappers.Monitor(env, outdir, force=True)
    rospy.loginfo("Monitor Wrapper started")

    last_time_steps = numpy.ndarray(0)

    # Loads parameters from the ROS param server
    # Parameters are stored in a yaml file inside the config directory
    # They are loaded at runtime by the launch file
    Alpha = rospy.get_param("/turtlebot2/alpha")
    Epsilon = rospy.get_param("/turtlebot2/epsilon")
    Gamma = rospy.get_param("/turtlebot2/gamma")
    epsilon_discount = rospy.get_param("/turtlebot2/epsilon_discount")
    nepisodes = rospy.get_param("/turtlebot2/nepisodes")
    nsteps = rospy.get_param("/turtlebot2/nsteps")
    running_step = rospy.get_param("/turtlebot2/running_step")

    # Initialises the algorithm that we are going to use for learning
    qlearn = qlearn.QLearn(actions=range(env.action_space.n),
                           alpha=Alpha, gamma=Gamma, epsilon=Epsilon)
    initial_epsilon = qlearn.epsilon

    start_time = time.time()
    highest_reward = 0

    # Starts the main training loop: the one about the episodes to do
    for x in range(nepisodes):
        rospy.logdebug("############### WALL START EPISODE=>" + str(x))

        cumulated_reward = 0
        done = False
        if qlearn.epsilon > 0.05:
            qlearn.epsilon *= epsilon_discount

        # Initialize the environment and get first state of the robot
        observation = env.reset()
        state = ''.join(map(str, observation))

        # Show on screen the actual situation of the robot
        # env.render()

        # for each episode, we test the robot for nsteps
        for i in range(nsteps):
            rospy.logwarn("############### Start Step=>" + str(i))
            # Pick an action based on the current state
            action = qlearn.chooseAction(state)
            rospy.logwarn("Next action is:%d", action)
            # Execute the action in the environment and get feedback
            observation, reward, done, info = env.step(action)

            rospy.logwarn(str(observation) + " " + str(reward))
            cumulated_reward += reward
            if highest_reward < cumulated_reward:
                highest_reward = cumulated_reward

            nextState = ''.join(map(str, observation))

            # Make the algorithm learn based on the results
            rospy.logwarn("# state we were=>" + str(state))
            rospy.logwarn("# action that we took=>" + str(action))
            rospy.logwarn("# reward that action gave=>" + str(reward))
            rospy.logwarn("# episode cumulated_reward=>" + str(cumulated_reward))
            rospy.logwarn("# State in which we will start next step=>" + str(nextState))
            qlearn.learn(state, action, reward, nextState)

            if not (done):
                rospy.logwarn("NOT DONE")
                state = nextState
            else:
                rospy.logwarn("DONE")
                last_time_steps = numpy.append(last_time_steps, [int(i + 1)])
                break
            rospy.logwarn("############### END Step=>" + str(i))
            # raw_input("Next Step...PRESS KEY")
            # rospy.sleep(2.0)

        m, s = divmod(int(time.time() - start_time), 60)
        h, m = divmod(m, 60)
        rospy.logerr(("EP: " + str(x + 1) + " - [alpha: " + str(round(qlearn.alpha, 2)) +
                      " - gamma: " + str(round(qlearn.gamma, 2)) +
                      " - epsilon: " + str(round(qlearn.epsilon, 2)) +
                      "] - Reward: " + str(cumulated_reward) +
                      "     Time: %d:%02d:%02d" % (h, m, s)))

    rospy.loginfo(("\n|" + str(nepisodes) + "|" + str(qlearn.alpha) + "|" +
                   str(qlearn.gamma) + "|" + str(initial_epsilon) + "*" +
                   str(epsilon_discount) + "|" + str(highest_reward) + "| PICTURE |"))

    l = last_time_steps.tolist()
    l.sort()

    # print("Parameters: a="+str)
    rospy.loginfo("Overall score: {:0.2f}".format(last_time_steps.mean()))
    rospy.loginfo("Best 100 score: {:0.2f}".format(
        reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:])))

    env.close()

This script sets up a reinforcement learning training loop and trains the robot with the Q-learning algorithm.

Every training run has an associated configuration file holding the parameters needed for that task. Inside the package, create a new folder named config, and in it a new file named my_turtlebot2_maze_params.yaml:

turtlebot2: # namespace
  task_and_robot_environment_name: 'MyTurtleBot2Wall-v0'
  ros_ws_abspath: "/home/user/simulation_ws"
  running_step: 0.04 # amount of time the control will be executed
  pos_step: 0.016    # increment in position for each command

  # qlearn parameters
  alpha: 0.1
  gamma: 0.7
  epsilon: 0.9
  epsilon_discount: 0.999
  nepisodes: 500
  nsteps: 10000
  running_step: 0.06 # Time for each step

The parameters fall into two groups (a short sketch after this list shows how they end up on the parameter server):

Environment-related parameters: these depend on the RobotEnvironment and TaskEnvironment you have chosen.

RL algorithm parameters: these depend on the RL algorithm you have chosen.
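As an illustration (not part of the original tutorial): the launch file below loads this YAML under the turtlebot2 namespace, so every key ends up on the ROS parameter server as /turtlebot2/&lt;key&gt;, which is exactly how the training script reads them back:

import rospy

# Values in the comments are the ones defined in my_turtlebot2_maze_params.yaml
env_name = rospy.get_param("/turtlebot2/task_and_robot_environment_name")  # 'MyTurtleBot2Wall-v0'
alpha = rospy.get_param("/turtlebot2/alpha")          # 0.1
nepisodes = rospy.get_param("/turtlebot2/nepisodes")  # 500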

Create a launch folder. Inside it, create a new file named start_training.launch:

<?xml version="1.0" encoding="UTF-8"?>
<launch>
    <!-- This version uses the openai_ros environments -->
    <rosparam command="load" file="$(find my_turtlebot2_training)/config/my_turtlebot2_maze_params.yaml" />
    <!-- Launch the training system -->
    <node pkg="my_turtlebot2_training" name="turtlebot2_maze" type="start_training.py" output="screen"/>
</launch>

Start the training from a terminal:

roslaunch my_turtlebot2_training start_training.launch

Problems you may run into:

The gym module is not installed (ImportError: No module named gym). Install gym from source:


git clone https://github.com/openai/gym
cd gym
pip install -e .
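Note that ROS Kinetic runs Python 2.7, while recent gym releases support Python 3 only, so installing the latest gym may fail or be unusable from the training script. If that happens, installing an older release is a common workaround, for example pip install "gym<0.16" (the exact version bound here is an assumption, not something stated by the original tutorial).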

The qlearn module is missing (ImportError: No module named qlearn). The training script imports a local helper module, qlearn.py, that implements tabular Q-learning; it must sit next to start_training.py (or be on the PYTHONPATH).
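If you do not have the qlearn.py file from the original tutorial, the following is a minimal, compatible sketch written against the interface the script actually uses (QLearn(actions, alpha, gamma, epsilon), chooseAction(state), learn(state, action, reward, nextState)). It is an assumption-based stand-in, not the exact file from the tutorial. Save it as qlearn.py next to start_training.py:

import random


class QLearn(object):
    """Minimal tabular Q-learning, compatible with the calls in start_training.py."""

    def __init__(self, actions, epsilon=0.1, alpha=0.2, gamma=0.9):
        self.q = {}                # Q-table: (state, action) -> value
        self.epsilon = epsilon     # exploration rate
        self.alpha = alpha         # learning rate
        self.gamma = gamma         # discount factor
        self.actions = list(actions)

    def getQ(self, state, action):
        # Unvisited state-action pairs default to 0.0
        return self.q.get((state, action), 0.0)

    def chooseAction(self, state):
        # Epsilon-greedy: explore with probability epsilon,
        # otherwise pick a best-known action (ties broken randomly).
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        q_values = [self.getQ(state, a) for a in self.actions]
        max_q = max(q_values)
        best = [a for a, q in zip(self.actions, q_values) if q == max_q]
        return random.choice(best)

    def learn(self, state1, action1, reward, state2):
        # Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
        max_q_next = max([self.getQ(state2, a) for a in self.actions])
        old_q = self.getQ(state1, action1)
        self.q[(state1, action1)] = old_q + self.alpha * (
            reward + self.gamma * max_q_next - old_q)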
