

  • Gym-Fetch-Robotics-obs-reward Analysis
  • Preface
    • Brief introduction:
    • Flowchart:
-np.linalg.norm(object_pos - goal_pos, axis=-1)


gripper_pos: moved by the action passed to step().
object_pos: moved by the gripper.
goal_pos: sampled by self._sample_goal().

The built-in reward shaping, however, only considers the relative position of the latter two values, object_pos and goal_pos.
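
To make the point concrete, here is a minimal sketch of a distance-based reward that depends only on object_pos and goal_pos. It uses `math.dist` from the standard library, which for a single pair of points gives the same value as `np.linalg.norm(object_pos - goal_pos, axis=-1)`. The function name `compute_reward`, the `reward_type` switch, and the 0.05 m `distance_threshold` are assumptions made for illustration and are not quoted from the environment source.

```python
import math


def compute_reward(object_pos, goal_pos, reward_type="dense", distance_threshold=0.05):
    """Reward based only on the object-to-goal distance (illustrative sketch)."""
    # Euclidean distance between the object and the goal.
    d = math.dist(object_pos, goal_pos)
    if reward_type == "sparse":
        # Assumed sparse variant: -1 until the object is within the threshold, then 0.
        return -1.0 if d > distance_threshold else 0.0
    # Dense variant: the negative distance, as in the expression above.
    return -d


# The gripper position never appears in the computation.
print(compute_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
```

Because gripper_pos is absent, the dense reward stays constant until the gripper actually moves the object.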

Un-normalized components if using reward shaping, where the maximum is returned if not solved:
- Reaching: in [0, 0.1], proportional to the distance between the gripper and the closest object
- Grasping: in {0, 0.35}, nonzero if the gripper is grasping an object
- Lifting: in {0, [0.35, 0.5]}, nonzero only if object is grasped; proportional to lifting height
- Hovering: in {0, [0.5, 0.7]}, nonzero only if object is lifted; proportional to distance from object to bin
- Completed: in {1}, returned when the task is solved.
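
The staged components above can be combined as in the following sketch. It only encodes the ranges and conditions from the list. The function name `staged_reward` and its `*_progress` arguments (each assumed to be already scaled to [0, 1], with 1 meaning "best") are hypothetical, and how progress is derived from raw distances is left out.

```python
def staged_reward(reach_progress, grasped, lift_progress, lifted, hover_progress, solved):
    """Return the maximum of the staged components, or 1.0 once solved (sketch)."""
    if solved:
        return 1.0
    # Reaching: in [0, 0.1].
    r_reach = 0.1 * reach_progress
    # Grasping: 0 or 0.35.
    r_grasp = 0.35 if grasped else 0.0
    # Lifting: 0, or in [0.35, 0.5] when the object is grasped.
    r_lift = (0.35 + 0.15 * lift_progress) if grasped else 0.0
    # Hovering: 0, or in [0.5, 0.7] when the object is lifted.
    r_hover = (0.5 + 0.2 * hover_progress) if lifted else 0.0
    return max(r_reach, r_grasp, r_lift, r_hover)
```

Taking the maximum means a later stage always dominates an earlier one, since the ranges are ordered and do not overlap except at their end points.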




Object position: The object is placed randomly on the table in a 30cm x 30cm (20cm x 20cm for sliding) square whose center is directly under the gripper (both objects are 5cm wide).
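
As a rough illustration of this placement rule, the sketch below samples the object's x and y uniformly from a square centered under the gripper. The helper name `sample_object_xy`, the task labels, and the use of meters are assumptions for the example.

```python
import random


def sample_object_xy(gripper_xy, task="push", rng=random):
    """Sample the object's (x, y) in a square centered under the gripper (sketch)."""
    # 20cm square for sliding, 30cm square otherwise; units are meters.
    side = 0.20 if task == "slide" else 0.30
    half = side / 2
    gx, gy = gripper_xy
    return (gx + rng.uniform(-half, half), gy + rng.uniform(-half, half))


print(sample_object_xy((1.0, 0.5), rng=random.Random(0)))
```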

Goal state:
For pushing, the goal state is sampled uniformly from the same square as the box position.
In the pick-and-place task, the target is located in the air in order to force the robot to grasp (and not just push). The x and y coordinates of the goal position are sampled uniformly from the mentioned square, and the height is sampled uniformly between 10cm and 45cm.
For sliding, the goal position is sampled from a 60cm x 60cm square centered 40cm away from the initial gripper position.
For all tasks, we discard initial state-goal pairs in which the goal is already satisfied.
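
The goal rules above amount to uniform sampling followed by rejection of goals that are already satisfied. The sketch below is one way to write that. Several details are assumptions, not taken from the text: the helper names, the 0.05 m threshold used to decide "already satisfied", measuring the pick-and-place height from the object's resting height, and placing the sliding square 40cm away along +x.

```python
import math
import random

DISTANCE_THRESHOLD = 0.05  # assumed success threshold, in meters


def sample_goal(task, gripper_xy, table_height, rng):
    """Sample one candidate goal (x, y, z) for the given task (sketch)."""
    gx, gy = gripper_xy
    if task == "slide":
        # 60cm square centered 40cm from the gripper (direction +x is assumed).
        cx, cy, half = gx + 0.40, gy, 0.30
    else:
        # Same 30cm square as the object position.
        cx, cy, half = gx, gy, 0.15
    x = cx + rng.uniform(-half, half)
    y = cy + rng.uniform(-half, half)
    z = table_height
    if task == "pick_and_place":
        # Target in the air: height uniform between 10cm and 45cm.
        z += rng.uniform(0.10, 0.45)
    return (x, y, z)


def sample_valid_goal(task, gripper_xy, object_pos, rng, max_tries=1000):
    """Resample until the goal is not already satisfied by the object."""
    for _ in range(max_tries):
        goal = sample_goal(task, gripper_xy, object_pos[2], rng)
        if math.dist(object_pos, goal) > DISTANCE_THRESHOLD:
            return goal
    raise RuntimeError("could not sample an unsatisfied goal")


rng = random.Random(0)
print(sample_valid_goal("pick_and_place", (1.0, 0.5), (1.05, 0.45, 0.4), rng))
```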

Step:
A step is a single action taken by the agent that changes the state of the environment.
Each call to step returns 4 values: observation, reward, done, info.
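
To show the shape of this 4-tuple without requiring a simulator, here is a toy stand-in. `ToyReachEnv` is a hypothetical class written for this example and is not part of Gym. Its dictionary observation with the keys `observation`, `achieved_goal`, and `desired_goal` follows the layout of Gym's goal-based environments, while the dynamics, the threshold, and the rule that the episode ends on success are simplifying assumptions.

```python
import math


class ToyReachEnv:
    """Hypothetical point-mass environment that mimics the step() return values."""

    def __init__(self, goal=(0.3, 0.0, 0.0), distance_threshold=0.05, max_steps=50):
        self.goal = goal
        self.distance_threshold = distance_threshold
        self.max_steps = max_steps
        self.reset()

    def _obs(self):
        pos = tuple(self.pos)
        return {"observation": pos, "achieved_goal": pos, "desired_goal": self.goal}

    def reset(self):
        self.pos = [0.0, 0.0, 0.0]
        self.t = 0
        return self._obs()

    def step(self, action):
        # The action is a displacement added to the current position.
        self.pos = [p + a for p, a in zip(self.pos, action)]
        self.t += 1
        d = math.dist(self.pos, self.goal)
        reward = -d  # dense reward: negative distance to the goal
        info = {"is_success": d < self.distance_threshold}
        done = info["is_success"] or self.t >= self.max_steps
        return self._obs(), reward, done, info


env = ToyReachEnv()
obs = env.reset()
done = False
steps = 0
while not done:
    # Move 4cm along x on every step.
    obs, reward, done, info = env.step((0.04, 0.0, 0.0))
    steps += 1
print(steps, reward, info)
```

The loop reads the four return values on every iteration and stops when `done` becomes true, which is the usual pattern for rolling out an episode.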


