Gym-Fetch-Robotics-obs-reward Analysis

Table of contents

Gym-Fetch-Robotics-obs-reward Analysis
Preface
- Brief introduction:
- Flowchart:
- The original author's blog:

Preface

Brief introduction:

There are eight environments in total, but their dense rewards are all absurdly simple.
They just compute:

-np.linalg.norm(object_pos - goal_pos, axis=-1)

That is absurd, because completing the task depends on three quantities working together:

gripper_pos: moved by the step action.
object_pos: moved by the gripper.
goal_pos: sampled from self._sample_goal().

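Yet the built-in reward shaping only considers the relative position of the last two.
During early exploration, the reward says nothing about whether the gripper should touch the object, so from the gripper's point of view the task is still fairly sparse.
As a result, the pick-and-place task is especially hard to train.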
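
To make the point concrete, here is a minimal sketch of the dense reward from the formula above, evaluated while the gripper approaches the object without touching it. The positions are made-up numbers for illustration only, not values taken from the Fetch environments.

```python
import numpy as np

def dense_reward(object_pos, goal_pos):
    """Negative Euclidean distance between object and goal (the formula above)."""
    return -np.linalg.norm(object_pos - goal_pos, axis=-1)

# Hypothetical positions in metres, chosen only for illustration.
object_pos = np.array([1.30, 0.75, 0.42])
goal_pos = np.array([1.40, 0.85, 0.60])

# The gripper moves in a straight line to a point 5 cm above the object.
gripper_start = np.array([1.00, 0.50, 0.80])
gripper_end = object_pos + np.array([0.0, 0.0, 0.05])
gripper_path = np.linspace(gripper_start, gripper_end, num=5)

rewards = []
for gripper_pos in gripper_path:
    # gripper_pos is never an input to the reward, and the object has not
    # been touched yet, so object_pos and the reward stay unchanged.
    rewards.append(float(dense_reward(object_pos, goal_pos)))

# Distance from gripper to object along the path, for comparison.
gripper_to_object = np.linalg.norm(gripper_path - object_pos, axis=-1)

print(rewards)            # identical value at every step
print(gripper_to_object)  # shrinks as the gripper approaches
```

The gripper gets much closer to the object, but the reward does not move at all, which is exactly the sparse-feeling exploration problem described above.
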
For comparison, look at the pick-and-place task in Robosuite, whose reward function is designed as follows:

Un-normalized components if using reward shaping, where the maximum is returned if not solved:
- Reaching: in [0, 0.1], proportional to the distance between the gripper and the closest object
- Grasping: in {0, 0.35}, nonzero if the gripper is grasping an object
- Lifting: in {0, [0.35, 0.5]}, nonzero only if object is grasped; proportional to lifting height
- Hovering: in {0, [0.5, 0.7]}, nonzero only if object is lifted; proportional to distance from object to bin
- Completed: in {1}.
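
The staged scheme above could be adapted to a Fetch-style task roughly as follows. This is only a sketch under my own assumptions, not Robosuite's actual code: the function name `staged_reward`, the `grasped` flag, the table height, the lift target, the success threshold and the tanh scaling are all hypothetical choices made to reproduce the quoted ranges.

```python
import numpy as np

# Upper bounds of each stage, taken from the ranges quoted above.
REACH_MAX, GRASP, LIFT_MAX, HOVER_MAX = 0.1, 0.35, 0.5, 0.7

def staged_reward(gripper_pos, object_pos, goal_pos, grasped,
                  table_height=0.42, lift_target=0.10, success_threshold=0.05):
    """Return 1.0 when solved, otherwise the maximum of the staged components.

    table_height, lift_target and success_threshold are assumed values.
    """
    goal_dist = np.linalg.norm(object_pos - goal_pos)
    if goal_dist < success_threshold:
        return 1.0  # Completed

    # Reaching: in [0, 0.1], larger when the gripper is closer to the object.
    reach_dist = np.linalg.norm(gripper_pos - object_pos)
    reach = REACH_MAX * (1.0 - np.tanh(10.0 * reach_dist))

    # Grasping: either 0 or 0.35.
    grasp = GRASP if grasped else 0.0

    # Lifting: 0 unless grasped, then in [0.35, 0.5] depending on height.
    lift, lifted = 0.0, False
    if grasped:
        height = max(object_pos[2] - table_height, 0.0)
        frac = min(height / lift_target, 1.0)
        lift = GRASP + (LIFT_MAX - GRASP) * frac
        lifted = height >= lift_target

    # Hovering: 0 unless lifted, then in [0.5, 0.7], larger near the goal.
    hover = 0.0
    if lifted:
        hover = LIFT_MAX + (HOVER_MAX - LIFT_MAX) * (1.0 - np.tanh(10.0 * goal_dist))

    return float(max(reach, grasp, lift, hover))
```

Unlike the plain distance reward, this signal already changes while the gripper is still approaching the object, so early exploration gets feedback before any contact happens.
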

Flowchart:

The diagram turned out to be too hard to draw, so I gave up on it.

The original author's blog:

https://medium.com/@Amritpal001/intro-to-robotics-in-openai-fetch-reach-env-automating-robotics-with-reinforcement-learning-part-2b7452f3a5e9

Some important excerpts:
Object position: The object is placed randomly on the table in the 30cm x 30cm (20cm x 20cm for sliding) square with the center directly under the gripper (both objects are 5cm wide).

Goal state:
For pushing, the goal state is sampled uniformly from the same square as the box position.
In the pick-and-place task the target is located in the air in order to force the robot to grasp (and not just push). The x and y coordinates of the goal position are sampled uniformly from the mentioned square and the height is sampled uniformly between 10cm and 45cm.
For sliding the goal position is sampled from a 60cm x 60cm square centered 40cm away from the initial gripper position.
For all tasks we discard initial state-goal pairs in which the goal is already satisfied.
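
The sampling rules quoted above can be sketched as a small rejection sampler. This is not the code behind `self._sample_goal()`. The function name, the choice of z = 0 for the table surface, and the 5cm satisfaction threshold are assumptions for illustration. The sliding task uses different squares and is left out.

```python
import numpy as np

def sample_state_goal(rng, gripper_xy, goal_in_air=True, square=0.30,
                      z_range=(0.10, 0.45), threshold=0.05, max_tries=1000):
    """Sample (object_pos, goal_pos) with heights measured from the table (z = 0).

    goal_in_air=True follows the pick-and-place rule, False the pushing rule.
    threshold is an assumed distance below which the goal counts as satisfied.
    """
    half = square / 2.0
    for _ in range(max_tries):
        # Object: uniform in the square centred under the gripper, on the table.
        object_xy = gripper_xy + rng.uniform(-half, half, size=2)
        object_pos = np.append(object_xy, 0.0)

        # Goal: x and y from the same square; height only for pick-and-place.
        goal_xy = gripper_xy + rng.uniform(-half, half, size=2)
        goal_z = rng.uniform(*z_range) if goal_in_air else 0.0
        goal_pos = np.append(goal_xy, goal_z)

        # Discard state-goal pairs in which the goal is already satisfied.
        if np.linalg.norm(object_pos - goal_pos) > threshold:
            return object_pos, goal_pos
    raise RuntimeError("no valid state-goal pair found")

rng = np.random.default_rng(0)
object_pos, goal_pos = sample_state_goal(rng, gripper_xy=np.zeros(2))
print(object_pos, goal_pos)
```

With these assumed numbers the rejection step only ever triggers for pushing, since a pick-and-place goal at least 10cm above the table can never be within 5cm of an object resting on it.
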

Step:
A step is a single action by the agent that changes the state of the environment.
Each step returns 4 parts: observations, reward, done, info
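
To show the four return values in use without needing MuJoCo, here is a toy stand-in environment. `ToyReachEnv` is hypothetical and is not the real Fetch environment. It only mimics the 4-tuple interface and the goal-style observation dictionary (`observation`, `achieved_goal`, `desired_goal`) that the Fetch environments return. The dynamics, step size and threshold are invented.

```python
import numpy as np

class ToyReachEnv:
    """Hypothetical stand-in mimicking the 4-tuple step interface."""

    def __init__(self, seed=0, threshold=0.05, step_size=0.05):
        self.rng = np.random.default_rng(seed)
        self.threshold = threshold  # assumed success distance
        self.step_size = step_size  # assumed max move per axis per step

    def _obs(self):
        return {
            "observation": self.gripper_pos.copy(),
            "achieved_goal": self.gripper_pos.copy(),
            "desired_goal": self.goal_pos.copy(),
        }

    def reset(self):
        self.gripper_pos = np.zeros(3)
        self.goal_pos = self.rng.uniform(-0.15, 0.15, size=3)
        return self._obs()

    def step(self, action):
        # The action is a displacement command, clipped to [-1, 1] per axis.
        move = self.step_size * np.clip(action, -1.0, 1.0)
        self.gripper_pos = self.gripper_pos + move
        dist = np.linalg.norm(self.gripper_pos - self.goal_pos)
        reward = -dist  # dense reward, as in the formula above
        info = {"is_success": bool(dist < self.threshold)}
        done = info["is_success"]  # toy choice: end the episode on success
        return self._obs(), reward, done, info

env = ToyReachEnv()
obs = env.reset()
for t in range(50):
    # Simple proportional controller: head straight for the goal.
    action = (obs["desired_goal"] - obs["achieved_goal"]) / env.step_size
    obs, reward, done, info = env.step(action)
    if done:
        break
print(t, reward, done, info)
```
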