openai_ros Tutorial (Deep Reinforcement Learning with ROS and Gazebo)
1. Environment Setup
Test environment: Ubuntu 16.04, ROS Kinetic.
Download openai_ros:
git clone https://bitbucket.org/theconstructcore/openai_ros.git
openai_ros depends on the following ROS packages:
message_runtime rospy gazebo_msgs std_msgs geometry_msgs controller_manager_msgs
If any of these is missing, launching will fail with an error. For example, to install controller_manager_msgs:
sudo apt-get install ros-kinetic-controller-manager-msgs
A worked example follows.
2. Creating a TurtleBot Training Script with openai_ros
Create a my_turtlebot2_training package under catkin_ws/src.
Inside it, create a Python file named start_training.py and make it executable (chmod +x start_training.py):
#!/usr/bin/env python
import gym
import numpy
import time
import qlearn
from gym import wrappers
# ROS packages required
import rospy
import rospkg
from openai_ros.openai_ros_common import StartOpenAI_ROS_Environment

if __name__ == '__main__':

    rospy.init_node('example_turtlebot2_maze_qlearn',
                    anonymous=True, log_level=rospy.WARN)

    # Init OpenAI_ROS ENV
    task_and_robot_environment_name = rospy.get_param(
        '/turtlebot2/task_and_robot_environment_name')
    env = StartOpenAI_ROS_Environment(task_and_robot_environment_name)
    # Create the Gym environment
    rospy.loginfo("Gym environment done")
    rospy.loginfo("Starting Learning")

    # Set the logging system
    rospack = rospkg.RosPack()
    pkg_path = rospack.get_path('my_turtlebot2_training')
    outdir = pkg_path + '/training_results'
    env = wrappers.Monitor(env, outdir, force=True)
    rospy.loginfo("Monitor Wrapper started")

    last_time_steps = numpy.ndarray(0)

    # Loads parameters from the ROS param server
    # Parameters are stored in a yaml file inside the config directory
    # They are loaded at runtime by the launch file
    Alpha = rospy.get_param("/turtlebot2/alpha")
    Epsilon = rospy.get_param("/turtlebot2/epsilon")
    Gamma = rospy.get_param("/turtlebot2/gamma")
    epsilon_discount = rospy.get_param("/turtlebot2/epsilon_discount")
    nepisodes = rospy.get_param("/turtlebot2/nepisodes")
    nsteps = rospy.get_param("/turtlebot2/nsteps")
    running_step = rospy.get_param("/turtlebot2/running_step")

    # Initialises the algorithm that we are going to use for learning
    qlearn = qlearn.QLearn(actions=range(env.action_space.n),
                           alpha=Alpha, gamma=Gamma, epsilon=Epsilon)
    initial_epsilon = qlearn.epsilon

    start_time = time.time()
    highest_reward = 0

    # Starts the main training loop: the one about the episodes to do
    for x in range(nepisodes):
        rospy.logdebug("############### WALL START EPISODE=>" + str(x))

        cumulated_reward = 0
        done = False
        if qlearn.epsilon > 0.05:
            qlearn.epsilon *= epsilon_discount

        # Initialize the environment and get first state of the robot
        observation = env.reset()
        state = ''.join(map(str, observation))

        # Show on screen the actual situation of the robot
        # env.render()
        # for each episode, we test the robot for nsteps
        for i in range(nsteps):
            rospy.logwarn("############### Start Step=>" + str(i))
            # Pick an action based on the current state
            action = qlearn.chooseAction(state)
            rospy.logwarn("Next action is:%d", action)
            # Execute the action in the environment and get feedback
            observation, reward, done, info = env.step(action)

            rospy.logwarn(str(observation) + " " + str(reward))
            cumulated_reward += reward
            if highest_reward < cumulated_reward:
                highest_reward = cumulated_reward

            nextState = ''.join(map(str, observation))

            # Make the algorithm learn based on the results
            rospy.logwarn("# state we were=>" + str(state))
            rospy.logwarn("# action that we took=>" + str(action))
            rospy.logwarn("# reward that action gave=>" + str(reward))
            rospy.logwarn("# episode cumulated_reward=>" + str(cumulated_reward))
            rospy.logwarn("# State in which we will start next step=>" + str(nextState))
            qlearn.learn(state, action, reward, nextState)

            if not (done):
                rospy.logwarn("NOT DONE")
                state = nextState
            else:
                rospy.logwarn("DONE")
                last_time_steps = numpy.append(last_time_steps, [int(i + 1)])
                break
            rospy.logwarn("############### END Step=>" + str(i))
            # raw_input("Next Step...PRESS KEY")
            # rospy.sleep(2.0)

        m, s = divmod(int(time.time() - start_time), 60)
        h, m = divmod(m, 60)
        rospy.logerr(("EP: " + str(x + 1) +
                      " - [alpha: " + str(round(qlearn.alpha, 2)) +
                      " - gamma: " + str(round(qlearn.gamma, 2)) +
                      " - epsilon: " + str(round(qlearn.epsilon, 2)) +
                      "] - Reward: " + str(cumulated_reward) +
                      " Time: %d:%02d:%02d" % (h, m, s)))

    rospy.loginfo(("\n|" + str(nepisodes) + "|" + str(qlearn.alpha) + "|" +
                   str(qlearn.gamma) + "|" + str(initial_epsilon) + "*" +
                   str(epsilon_discount) + "|" + str(highest_reward) + "| PICTURE |"))

    l = last_time_steps.tolist()
    l.sort()

    rospy.loginfo("Overall score: {:0.2f}".format(last_time_steps.mean()))
    rospy.loginfo("Best 100 score: {:0.2f}".format(
        reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:])))

    env.close()
This script implements a reinforcement-learning training loop and trains the robot with tabular Q-learning.
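Note how the script builds its Q-table keys: the observation returned by the task environment is already discretized, and joining it into a string gives one table entry per distinct sensor pattern. A quick illustration (the observation values here are made up for the example):

```python
# A discretized laser scan, as the TurtleBot2 task environment might return
observation = [1, 0, 2, 1, 1]

# Same conversion as in start_training.py
state = ''.join(map(str, observation))
print(state)  # prints 10211
```

Because the state is just a string key, the Q-table only stays small if the task environment discretizes observations coarsely enough.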
Each training run has an associated configuration file holding the parameters for that task. Inside the package, create a new folder named config, and in it a new file named my_turtlebot2_maze_params.yaml:
turtlebot2: # namespace
    task_and_robot_environment_name: 'MyTurtleBot2Wall-v0'
    ros_ws_abspath: "/home/user/simulation_ws"
    running_step: 0.04 # amount of time the control will be executed
    pos_step: 0.016 # increment in position for each command

    # qlearn parameters
    alpha: 0.1
    gamma: 0.7
    epsilon: 0.9
    epsilon_discount: 0.999
    nepisodes: 500
    nsteps: 10000
    running_step: 0.06 # Time for each step (duplicates the key above; the last value wins when loaded)
The parameters fall into two groups:
Environment parameters: depend on the RobotEnvironment and TaskEnvironment you selected.
RL algorithm parameters: depend on the RL algorithm you selected.
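As a sanity check on the exploration schedule, the values above (epsilon: 0.9, epsilon_discount: 0.999, nepisodes: 500, together with the 0.05 floor applied in start_training.py) can be simulated outside ROS:

```python
epsilon = 0.9             # initial exploration rate (from the yaml)
epsilon_discount = 0.999  # multiplicative decay per episode
nepisodes = 500

for _ in range(nepisodes):
    if epsilon > 0.05:    # same floor as in start_training.py
        epsilon *= epsilon_discount

print("epsilon after %d episodes: %.3f" % (nepisodes, epsilon))
# prints: epsilon after 500 episodes: 0.546
```

So with these values the agent still explores more than half the time when training ends; a smaller epsilon_discount (or more episodes) is needed if you want it to converge toward mostly greedy behaviour.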
Create a launch folder in the package. Inside it, create a new file named start_training.launch:
<?xml version="1.0" encoding="UTF-8"?>
<launch>
    <!-- This version uses the openai_ros environments -->
    <rosparam command="load" file="$(find my_turtlebot2_training)/config/my_turtlebot2_maze_params.yaml" />
    <!-- Launch the training system -->
    <node pkg="my_turtlebot2_training" name="turtlebot2_maze" type="start_training.py" output="screen"/>
</launch>
Start the training from a terminal:
roslaunch my_turtlebot2_training start_training.launch
Problems you may run into:
The gym module is not installed (ImportError: No module named gym). Install it from source:
git clone https://github.com/openai/gym
cd gym
pip install -e .
The qlearn module is missing: start_training.py imports a local qlearn.py (the tutorial's tabular Q-learning helper), which must be placed next to the training script.
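The tutorial's original qlearn.py is not reproduced here; the following is a minimal sketch that is compatible with the calls the training script makes (QLearn(actions, alpha, gamma, epsilon), chooseAction(state), learn(state, action, reward, nextState)). Save it as qlearn.py next to start_training.py:

```python
import random


class QLearn:
    """Minimal tabular Q-learning agent, API-compatible with start_training.py."""

    def __init__(self, actions, epsilon=0.1, alpha=0.2, gamma=0.9):
        self.q = {}              # Q-table: (state, action) -> value
        self.epsilon = epsilon   # exploration rate
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount factor
        self.actions = actions

    def getQ(self, state, action):
        # Unseen (state, action) pairs default to 0.0
        return self.q.get((state, action), 0.0)

    def chooseAction(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise
        # pick a best-known action (ties broken at random)
        if random.random() < self.epsilon:
            return random.choice(list(self.actions))
        q_values = [self.getQ(state, a) for a in self.actions]
        max_q = max(q_values)
        best = [a for a, q in zip(self.actions, q_values) if q == max_q]
        return random.choice(best)

    def learn(self, state1, action1, reward, state2):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        max_q_next = max(self.getQ(state2, a) for a in self.actions)
        old_q = self.getQ(state1, action1)
        self.q[(state1, action1)] = old_q + self.alpha * (
            reward + self.gamma * max_q_next - old_q)
```

This is a plain textbook implementation; the version shipped with the original tutorial may differ in details (e.g. tie-breaking or optional reward traces), but any class exposing these three methods will work with the script above.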