Customizing RL Training
Users can customize the core MDP components — observations, commands, actions, rewards, events, and terminations — when training RL agents using GRID. By configuring these elements, you can tailor environment feedback and control interfaces to fit specific robotics tasks and training objectives.
Observations
Observations list each measurable quantity the environment can return, the configuration options for specifying which asset or sensor to observe, and a description of what the measurement represents.
Observation Function | Configurable Options | Description |
---|---|---|
base_pos_z | asset_cfg.name | Root height in the simulation world frame. |
base_lin_vel | asset_cfg.name | Root linear velocity in the asset’s root frame. |
base_ang_vel | asset_cfg.name | Root angular velocity in the asset’s root frame. |
projected_gravity | asset_cfg.name | Gravity projection on the asset’s root frame. |
root_pos_w | asset_cfg.name | Asset root position in the environment frame. |
root_quat_w | make_quat_unique (bool ), asset_cfg.name | Asset root orientation (w, x, y, z) in the environment frame. |
root_lin_vel_w | asset_cfg.name | Asset root linear velocity in the environment frame. |
root_ang_vel_w | asset_cfg.name | Asset root angular velocity in the environment frame. |
body_pose_w | asset_cfg.name, body_ids (list[int] ) | Flattened body poses of the asset w.r.t. environment origin (x, y, z, qw, qx, qy, qz) for each configured body. |
body_projected_gravity_b | asset_cfg.name, body_ids (list[int] ) | Direction of gravity projected onto each body frame (x, y, z) for configured bodies. |
joint_pos | asset_cfg.name | Joint positions of the asset. |
joint_pos_rel | asset_cfg.name | Joint positions relative to default joint positions. |
joint_pos_limit_normalized | asset_cfg.name | Joint positions normalized by the asset’s soft position limits. |
joint_vel | asset_cfg.name | Joint velocities of the asset. |
joint_vel_rel | asset_cfg.name | Joint velocities relative to default joint velocities. |
joint_effort | asset_cfg.name | Joint applied effort (torque) for configured joints. |
height_scan | sensor_cfg.name, offset (float) | Height scan from the sensor’s frame minus the given offset. |
body_incoming_wrench | asset_cfg.name, body_ids (list[int] ) | 6-D spatial wrench (force, torque) applied to each configured body link by joint forces. |
imu_orientation | asset_cfg.name | IMU sensor orientation in world frame as quaternion (w, x, y, z). |
imu_ang_vel | asset_cfg.name | IMU sensor angular velocity (rad/s) in sensor frame. |
imu_lin_acc | asset_cfg.name | IMU sensor linear acceleration (m/s²) in sensor frame. |
image | sensor_cfg.name, data_type (str ), convert_perspective_to_orthogonal (bool ), normalize (bool ) | Images from a camera or ray-caster camera (e.g., “rgb”, “depth”). Optionally normalize and orthogonalize depth. |
image_features | sensor_cfg.name, data_type (str ), model_zoo_cfg (dict ), model_name (str ), model_device (str ) | Extracted image features via a pretrained encoder (e.g., ResNet, Theia). |
last_action | action_name (str or None) | The last input action term (or full action tensor if None). |
generated_commands | command_name (str ) | The generated command tensor for the given command term. |
Commands
Commands define the functions available to sample or set asset poses or velocities, along with the configurable parameters that control how those commands are generated or applied.
Command Function | Configurable Options | Description |
---|---|---|
UniformPose2dCommand | asset_name, cfg.ranges.pos_x (tuple ), cfg.ranges.pos_y (tuple ), cfg.ranges.heading (tuple ), cfg.simple_heading (bool ), cfg.resampling_time_range (tuple ) | Sample a random 2D position around the environment origin and a heading (either toward target or uniform). |
TerrainBasedPose2dCommand | asset_name, cfg.ranges.heading (tuple ), cfg.simple_heading (bool ), cfg.resampling_time_range (tuple ) | Sample a random 2D position from valid terrain patches and a heading (either toward target or uniform). |
UniformPoseCommand | asset_name, body_name, cfg.ranges.pos_x (tuple ), cfg.ranges.pos_y (tuple ), cfg.ranges.pos_z (tuple ), cfg.ranges.roll (tuple ), cfg.ranges.pitch (tuple ), cfg.ranges.yaw (tuple ), cfg.make_quat_unique (bool ), cfg.resampling_time_range (tuple ) | Sample a random 3D pose (position + quaternion) in the robot’s base frame uniformly within specified ranges. |
UniformVelocityCommand | asset_name, cfg.ranges.lin_vel_x (tuple ), cfg.ranges.lin_vel_y (tuple ), cfg.ranges.ang_vel_z (tuple ), cfg.heading_command (bool ), cfg.ranges.heading (tuple ), cfg.rel_heading_envs (float ), cfg.rel_standing_envs (float ), cfg.resampling_time_range (tuple ) | Sample a random base-frame linear (x,y) and yaw velocity (either uniform or based on heading error), with optional standing/no-motion environments. |
NormalVelocityCommand | asset_name, cfg.ranges.mean_vel (tuple of 3), cfg.ranges.std_vel (tuple of 3), cfg.ranges.zero_prob (tuple of 3), cfg.rel_standing_envs (float ), cfg.resampling_time_range (tuple ) | Sample a base-frame velocity command from a normal distribution (with element-wise zeroing and optional standing). |
Actions
Actions enumerate the terms that transform high-level input vectors into specific joint or base control targets — such as positions, velocities, or torques—along with their associated configuration options.
Action Term | Configurable Options | One-line Meaning |
---|---|---|
BinaryJointPositionAction | cfg.joint_names (list of joint names), cfg.open_command_expr (string ), cfg.close_command_expr (string ) | Map a binary (open/close) action to joint position targets for specified joints. |
BinaryJointVelocityAction | cfg.joint_names (list of joint names), cfg.open_command_expr (string ), cfg.close_command_expr (string ) | Map a binary (open/close) action to joint velocity targets for specified joints. |
JointPositionToLimitsAction | cfg.joint_names (list ), cfg.scale (float or dict mapping joint→scale), cfg.rescale_to_limits (bool ) | Scale raw inputs and (optionally) rescale to joint position limits before applying as position targets. |
EMAJointPositionToLimitsAction | cfg.joint_names (list ), cfg.scale (float or dict ), cfg.rescale_to_limits (bool ), cfg.alpha (float or dict mapping joint→EMA weight) | Like JointPositionToLimitsAction, but applies exponential moving average (α) over processed position targets. |
JointPositionAction | cfg.joint_names (list ), cfg.scale (float or dict ), cfg.offset (float or dict ), cfg.use_default_offset (bool ) | Affinely transform input actions (scale + offset) and send as joint position targets. |
RelativeJointPositionAction | cfg.joint_names (list ), cfg.scale (float or dict ), cfg.offset (float or dict ), cfg.use_zero_offset (bool ) | Affinely transform input and add to current joint positions before sending as position commands. |
JointVelocityAction | cfg.joint_names (list ), cfg.scale (float or dict ), cfg.offset (float or dict ), cfg.use_default_offset (bool ) | Affinely transform input and send as joint velocity targets. |
JointEffortAction | cfg.joint_names (list ), cfg.scale (float or dict ), cfg.offset (float or dict ) | Affinely transform input and send as joint effort targets. |
NonHolonomicAction | cfg.x_joint_name (str ), cfg.y_joint_name (str ), cfg.yaw_joint_name (str ), cfg.body_name (str ), cfg.scale (tuple[2]), cfg.offset (tuple[2] ) | Map a 2D action (forward speed, yaw rate) to dummy joint velocities (x, y, yaw) for a skid-steer base. |
DifferentialInverseKinematicsAction | cfg.joint_names (list ), cfg.body_name (str ), cfg.scale (list of floats ), cfg.controller (DifferentialIK config), cfg.body_offset (optional pos/rot tuples) | Apply scaled end-effector pose commands through a differential IK controller to produce joint position targets. |
Rewards
Rewards describe the available functions for assigning scalar feedback to agent behaviors, detail any options required to configure them, and explain the physical or task-related criterion they encourage or penalize.
Reward Function | Configurable Options | Description |
---|---|---|
is_alive | none | +1 if the episode is still running (not terminated). |
is_terminated | none | +1 if the episode just terminated (penalizes any termination). |
is_terminated_term | term_keys (string , list of strings or regex) | Sum of specified non-timeout termination terms. |
lin_vel_z_l2 | asset_cfg.name | Penalize squared z-axis base linear velocity. |
ang_vel_xy_l2 | asset_cfg.name | Penalize squared x/y components of base angular velocity. |
flat_orientation_l2 | asset_cfg.name | Penalize deviation from a flat base orientation (projected gravity). |
base_height_l2 | target_height (float ), asset_cfg.name | Penalize squared deviation of base height from target. |
body_lin_acc_l2 | asset_cfg.name | Penalize summed L2 norm of all body linear accelerations. |
joint_torques_l2 | asset_cfg.name | Penalize squared torques on configured joints. |
joint_vel_l1 | asset_cfg.name | Penalize absolute joint velocities (L1). |
joint_vel_l2 | asset_cfg.name | Penalize squared joint velocities. |
joint_acc_l2 | asset_cfg.name | Penalize squared joint accelerations. |
joint_deviation_l1 | asset_cfg.name | Penalize absolute deviation from default joint positions. |
joint_pos_limits | asset_cfg.name | Penalize violations of soft joint-position limits. |
joint_vel_limits | soft_ratio (float ), asset_cfg.name | Penalize violations of soft joint-velocity limits (clipped). |
applied_torque_limits | asset_cfg.name | Penalize mismatch between applied and computed torques. |
action_rate_l2 | none | Penalize squared change in actions from one step to the next. |
action_l2 | none | Penalize squared magnitude of the actions themselves. |
undesired_contacts | threshold (float ), sensor_cfg.name | Penalize count of contact events whose force exceeds threshold. |
contact_forces | threshold (float ), sensor_cfg.name | Penalize amount by which contact forces exceed threshold. |
track_lin_vel_xy_exp | std (float ), command_name (str ), asset_cfg.name | Reward (exp-kernel) how well base xy-velocity matches command. |
track_ang_vel_z_exp | std (float ), command_name (str ), asset_cfg.name | Reward (exp-kernel) how well yaw velocity matches command. |
Events
Events list environment-level operations that introduce randomized or deterministic changes — such as randomizing physical properties or applying external forces—along with the parameters needed to specify how those changes are applied.
Event Function | Configurable Options | Description |
---|---|---|
randomize_rigid_body_scale | env_ids, scale_range (tuple[float, float] or dict{x, y, z → tuple[float, float]} ), asset_cfg.name, relative_child_path (optional) | Randomize the USD-scale of a rigid‐body asset (or a specified child prim) before simulation starts. |
randomize_rigid_body_material | env_ids, static_friction_range (tuple[float, float] ), dynamic_friction_range (tuple[float, float] ), restitution_range (tuple[float, float] ), num_buckets (int), asset_cfg.name, make_consistent (bool ) | Sample a set of friction/restitution “buckets” and assign them to each geometry of the specified RigidObject or Articulation. |
randomize_rigid_body_mass | env_ids, asset_cfg.name, mass_distribution_params (tuple[float, float] ), operation (“add”/“scale”/“abs”), distribution (“uniform”/“log_uniform”/“gaussian”), recompute_inertia (bool ) | Randomize per‐body mass values (add/scale/overwrite) and optionally recompute inertia tensors from default mass. |
randomize_rigid_body_com | env_ids, com_range (dict{x, y, z → tuple[float, float]} ), asset_cfg.name | Add a random offset to the center of mass of each body in the specified Articulation. |
randomize_rigid_body_collider_offsets | env_ids, rest_offset_distribution_params (tuple[float, float] ), contact_offset_distribution_params (tuple[float, float] ), asset_cfg.name, distribution (“uniform”/“log_uniform”/“gaussian”) | Randomize physics collider “rest” and/or “contact” offsets for each body shape of the specified asset. |
randomize_physics_scene_gravity | env_ids, gravity_distribution_params (tuple[list[float], list[float]]), operation (“add”/“scale”/“abs”), distribution (“uniform”/“log_uniform”/“gaussian”) | Randomize the global gravity vector (x, y, z) for the entire physics scene (same gravity for all envs). |
randomize_actuator_gains | env_ids, asset_cfg.name, stiffness_distribution_params (tuple[float, float] ), damping_distribution_params (tuple[float, float] ), operation, distribution | Randomize stiffness and/or damping gains of actuators (implicit or explicit) on specified joint IDs. |
randomize_joint_parameters | env_ids, asset_cfg.name, friction_distribution_params (tuple[float, float] ), armature_distribution_params (tuple[float, float] ), lower_limit_distribution_params (tuple[float, float] ), upper_limit_distribution_params (tuple[float, float] ), operation, distribution | Randomize joint physics properties: friction, armature, and position limits (lower/upper) by adding/scaling/overwriting. |
randomize_fixed_tendon_parameters | env_ids, asset_cfg.name, stiffness_distribution_params (tuple[float, float] ), damping_distribution_params (tuple[float, float] ), limit_stiffness_distribution_params (tuple[float, float] ), lower_limit_distribution_params (tuple[float, float] ), upper_limit_distribution_params (tuple[float, float] ), rest_length_distribution_params (tuple[float, float] ), offset_distribution_params (tuple[float, float] ), operation, distribution | Randomize fixed‐tendon stiffness, damping, limit stiffness, lower/upper position limits, rest length, and/or offset. |
apply_external_force_torque | env_ids, force_range (tuple[float, float] ), torque_range (tuple[float, float] ), asset_cfg.name | Sample and buffer random forces and torques to apply to each body of the specified asset (effective on next write). |
push_by_setting_velocity | env_ids, velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]} ), asset_cfg.name | Sample a random root‐velocity (linear + angular) and set it directly on the specified asset to simulate a push. |
reset_root_state_uniform | env_ids, pose_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]} ), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]} ), asset_cfg.name | Randomize root pose (translation + rotation) uniformly around default and root velocity, then write to simulation. |
reset_root_state_with_random_orientation | env_ids, pose_range (dict{x, y, z → tuple[float, float]} ), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]} ), asset_cfg.name | Randomize root position uniformly, sample orientation uniformly from SO(3), and set velocity within given ranges. |
reset_root_state_from_terrain | env_ids, pose_range (dict{roll, pitch, yaw → tuple[float, float]} ), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]} ), asset_cfg.name | Sample a valid terrain patch (“init_pos”), set root pose on terrain + random yaw/pitch/roll, then set velocity. |
reset_joints_by_scale | env_ids, position_range (tuple[float, float] ), velocity_range (tuple[float, float] ), asset_cfg.name | Scale default joint positions and velocities by random factors (within ranges) and clamp to soft limits before writing. |
reset_joints_by_offset | env_ids, position_range (tuple[float, float] ), velocity_range (tuple[float, float] ), asset_cfg.name | Offset default joint positions and velocities by random values (within ranges) and clamp to soft limits before writing. |
reset_nodal_state_uniform | env_ids, position_range (dict{x, y, z → tuple[float, float]} ), velocity_range (dict{x, y, z → tuple[float, float]} ), asset_cfg.name | Randomize default nodal (vertex) positions and velocities of a DeformableObject uniformly within given ranges. |
reset_scene_to_default | env_ids | Restore all rigid objects, articulations (root pose + joint state), and deformables (nodal state) to their default values. |
randomize_visual_texture_material | env_ids, event_name (str ), asset_cfg.name, texture_paths (list[str] ), texture_rotation (tuple[float, float] ) | Trigger a Replicator‐based event that randomizes textures on each body’s “/visuals” prim under the specified asset. |
randomize_visual_color | env_ids, event_name (str ), asset_cfg.name, colors (list[tuple[float, float, float]] or dict{r, g, b → tuple[float, float]} ), mesh_name (str ) | Trigger a Replicator‐based event that randomizes color on the specified mesh prims under the asset. |
Terminations
Terminations specify the conditions under which an episode ends, the configuration options that define threshold values or monitored signals, and a brief explanation of what triggers each termination.
Termination Function | Configurable Options | Description |
---|---|---|
time_out | none | Terminate when episode length ≥ max_episode_length. |
command_resample | command_name (str ), num_resamples (int ) | Terminate when a command has been resampled the specified number of times. |
bad_orientation | limit_angle (float), asset_cfg.name | Terminate if the asset’s orientation deviates from vertical by > limit_angle. |
root_height_below_minimum | minimum_height (float ), asset_cfg.name | Terminate if the asset’s root height drops below the minimum height. |
joint_pos_out_of_limit | asset_cfg.name | Terminate if any joint position exceeds the soft position limits. |
joint_pos_out_of_manual_limit | bounds (tuple[min, max] ), asset_cfg.name | Terminate if any joint position falls outside the user-specified bounds. |
joint_vel_out_of_limit | asset_cfg.name | Terminate if any joint velocity exceeds the soft velocity limits. |
joint_vel_out_of_manual_limit | max_velocity (float ), asset_cfg.name | Terminate if any joint velocity exceeds the given max_velocity. |
joint_effort_out_of_limit | asset_cfg.name | Terminate if any joint’s computed torque equals its applied torque (indicating effort clipping). |
illegal_contact | threshold (float ), sensor_cfg.name | Terminate if any contact force at the sensor exceeds the threshold. |