Users can customize the core MDP components — observations, commands, actions, rewards, events, and terminations — when training RL agents using GRID. By configuring these elements, you can tailor environment feedback and control interfaces to fit specific robotics tasks and training objectives.Documentation Index
Fetch the complete documentation index at: https://docs.generalrobotics.dev/llms.txt
Use this file to discover all available pages before exploring further.
Observations
Observations list each measurable quantity the environment can return, the configuration options for specifying which asset or sensor to observe, and a description of what the measurement represents.| Observation Function | Configurable Options | Description |
|---|---|---|
| base_pos_z | asset_cfg.name | Root height in the simulation world frame. |
| base_lin_vel | asset_cfg.name | Root linear velocity in the asset’s root frame. |
| base_ang_vel | asset_cfg.name | Root angular velocity in the asset’s root frame. |
| projected_gravity | asset_cfg.name | Gravity projection on the asset’s root frame. |
| root_pos_w | asset_cfg.name | Asset root position in the environment frame. |
| root_quat_w | make_quat_unique (bool), asset_cfg.name | Asset root orientation (w, x, y, z) in the environment frame. |
| root_lin_vel_w | asset_cfg.name | Asset root linear velocity in the environment frame. |
| root_ang_vel_w | asset_cfg.name | Asset root angular velocity in the environment frame. |
| body_pose_w | asset_cfg.name, body_ids (list[int]) | Flattened body poses of the asset w.r.t. environment origin (x, y, z, qw, qx, qy, qz) for each configured body. |
| projected_gravity | asset_cfg.name, body_ids (list[int]) | Direction of gravity projected onto each body frame (x, y, z) for configured bodies. |
| joint_pos | asset_cfg.name | Joint positions of the asset. |
| joint_pos_rel | asset_cfg.name | Joint positions relative to default joint positions. |
| joint_pos_limit_normalized | asset_cfg.name | Joint positions normalized by the asset’s soft position limits. |
| joint_vel | asset_cfg.name | Joint velocities of the asset. |
| joint_vel_rel | asset_cfg.name | Joint velocities relative to default joint velocities. |
| joint_effort | asset_cfg.name | Joint applied effort (torque) for configured joints. |
| height_scan | sensor_cfg.name, offset (float) | Height scan from the sensor’s frame minus the given offset. |
| body_incoming_wrench | asset_cfg.name, body_ids (list[int]) | 6-D spatial wrench (force, torque) applied to each configured body link by joint forces. |
| imu_orientation | asset_cfg.name | IMU sensor orientation in world frame as quaternion (w, x, y, z). |
| imu_ang_vel | asset_cfg.name | IMU sensor angular velocity (rad/s) in sensor frame. |
| imu_lin_acc | asset_cfg.name | IMU sensor linear acceleration (m/s²) in sensor frame. |
| image | sensor_cfg.name, data_type (str), convert_perspective_to_orthogonal (bool), normalize (bool) | Images from a camera or ray-caster camera (e.g., “rgb”, “depth”). Optionally normalize and orthogonalize depth. |
| image_features | sensor_cfg.name, data_type (str), model_zoo_cfg (dict), model_name (str), model_device (str) | Extracted image features via a pretrained encoder (e.g., ResNet, Theia). |
| last_action | action_name (str or None) | The last input action term (or full action tensor if None). |
| velocity_commands | params.command_name (“base_velocity”) | Generated command tensor for base‐velocity commands. |
| ee_pose_commands | params.command_name (“ee_pose”) | Generated command tensor for end‐effector pose commands. |
| base_pose_commands | params.command_name (“base_pose”) | Generated command tensor for base‐pose commands. |
| target_object_position_in_robot_root_frame | params.command_name (“object_pose”) | Generated command tensor for the target object’s pose in the robot’s root frame. |
Robot-Specific Observation Terms
| Observation Term | Robot | Configurable Options | Description |
|---|---|---|---|
| joint_pos_rel_digit | Agility Digit | noise.n_min, noise.n_max, params.asset_cfg.name, params.asset_cfg.joint_names | Digit-group joint positions relative to defaults (regex match). |
| joint_vel_rel_digit | Agility Digit | noise.n_min, noise.n_max, params.asset_cfg.name, params.asset_cfg.joint_names | Digit-group joint velocities relative to defaults. |
| joint_pos_rel_left_arm | Unitree G1 (27 DOF) | noise.n_min, noise.n_max, params.asset_cfg.name, params.asset_cfg.joint_names | Left-arm joint positions relative to defaults (explicit list). |
| joint_vel_rel_left_arm | Unitree G1 (27 DOF) | noise.n_min, noise.n_max, params.asset_cfg.name, params.asset_cfg.joint_names | Left-arm joint velocities relative to defaults. |
Commands
Commands define the functions available to sample or set asset poses or velocities, along with the configurable parameters that control how those commands are generated or applied.| Command Function | Configurable Options | Description |
|---|---|---|
| UniformPose2dCommand | asset_name, cfg.ranges.pos_x (tuple), cfg.ranges.pos_y (tuple), cfg.ranges.heading (tuple), cfg.simple_heading (bool), cfg.resampling_time_range (tuple) | Sample a random 2D position around the environment origin and a heading (either toward target or uniform). |
| TerrainBasedPose2dCommand | asset_name, cfg.ranges.heading (tuple), cfg.simple_heading (bool), cfg.resampling_time_range (tuple) | Sample a random 2D position from valid terrain patches and a heading (either toward target or uniform). |
| UniformPoseCommand | asset_name, body_name, cfg.ranges.pos_x (tuple), cfg.ranges.pos_y (tuple), cfg.ranges.pos_z (tuple), cfg.ranges.roll (tuple), cfg.ranges.pitch (tuple), cfg.ranges.yaw (tuple), cfg.make_quat_unique (bool), cfg.resampling_time_range (tuple) | Sample a random 3D pose (position + quaternion) in the robot’s base frame uniformly within specified ranges. |
| UniformVelocityCommand | asset_name, cfg.ranges.lin_vel_x (tuple), cfg.ranges.lin_vel_y (tuple), cfg.ranges.ang_vel_z (tuple), cfg.heading_command (bool), cfg.ranges.heading (tuple), cfg.rel_heading_envs (float), cfg.rel_standing_envs (float), cfg.resampling_time_range (tuple) | Sample a random base-frame linear (x,y) and yaw velocity (either uniform or based on heading error), with optional standing/no-motion environments. |
| NormalVelocityCommand | asset_name, cfg.ranges.mean_vel (tuple of 3), cfg.ranges.std_vel (tuple of 3), cfg.ranges.zero_prob (tuple of 3), cfg.rel_standing_envs (float), cfg.resampling_time_range (tuple) | Sample a base-frame velocity command from a normal distribution (with element-wise zeroing and optional standing). |
Actions
Actions enumerate the terms that transform high-level input vectors into specific joint or base control targets — such as positions, velocities, or torques—along with their associated configuration options.| Action Term | Configurable Options | One-line Meaning |
|---|---|---|
| BinaryJointPositionAction | cfg.joint_names (list of joint names), cfg.open_command_expr (string), cfg.close_command_expr (string) | Map a binary (open/close) action to joint position targets for specified joints. |
| BinaryJointVelocityAction | cfg.joint_names (list of joint names), cfg.open_command_expr (string), cfg.close_command_expr (string) | Map a binary (open/close) action to joint velocity targets for specified joints. |
| JointPositionToLimitsAction | cfg.joint_names (list), cfg.scale (float or dict mapping joint→scale), cfg.rescale_to_limits (bool) | Scale raw inputs and (optionally) rescale to joint position limits before applying as position targets. |
| EMAJointPositionToLimitsAction | cfg.joint_names (list), cfg.scale (float or dict), cfg.rescale_to_limits (bool), cfg.alpha (float or dict mapping joint→EMA weight) | Like JointPositionToLimitsAction, but applies exponential moving average (α) over processed position targets. |
| JointPositionAction | cfg.joint_names (list), cfg.scale (float or dict), cfg.offset (float or dict), cfg.use_default_offset (bool) | Affinely transform input actions (scale + offset) and send as joint position targets. |
| RelativeJointPositionAction | cfg.joint_names (list), cfg.scale (float or dict), cfg.offset (float or dict), cfg.use_zero_offset (bool) | Affinely transform input and add to current joint positions before sending as position commands. |
| JointVelocityAction | cfg.joint_names (list), cfg.scale (float or dict), cfg.offset (float or dict), cfg.use_default_offset (bool) | Affinely transform input and send as joint velocity targets. |
| JointEffortAction | cfg.joint_names (list), cfg.scale (float or dict), cfg.offset (float or dict) | Affinely transform input and send as joint effort targets. |
| NonHolonomicAction | cfg.x_joint_name (str), cfg.y_joint_name (str), cfg.yaw_joint_name (str), cfg.body_name (str), cfg.scale (tuple[2]), cfg.offset (tuple[2]) | Map a 2D action (forward speed, yaw rate) to dummy joint velocities (x, y, yaw) for a skid-steer base. |
| DifferentialInverseKinematicsAction | cfg.joint_names (list), cfg.body_name (str), cfg.scale (list of floats), cfg.controller (DifferentialIK config), cfg.body_offset (optional pos/rot tuples) | Apply scaled end-effector pose commands through a differential IK controller to produce joint position targets. |
| PretrainedPolicyAction | cfg.asset_name (str), cfg.policy_path (str), cfg.low_level_decimation (int), cfg.low_level_actions (ActionCfg), cfg.low_level_observations (ObservationCfg) | Use a pretrained low-level policy (e.g. locomotion) to decode actions in a hierarchical setup (e.g. navigation on top). |
Rewards
Rewards describe the available functions for assigning scalar feedback to agent behaviors, detail any options required to configure them, and explain the physical or task-related criterion they encourage or penalize.| Reward Function | Configurable Options | Description |
|---|---|---|
| is_alive | none | +1 if the episode is still running (not terminated). |
| is_terminated | none | +1 if the episode just terminated (penalizes any termination). |
| is_terminated_term | term_keys (string, list of strings or regex) | Sum of specified non-timeout termination terms. |
| lin_vel_z_l2 | asset_cfg.name | Penalize squared z-axis base linear velocity. |
| ang_vel_xy_l2 | asset_cfg.name | Penalize squared x/y components of base angular velocity. |
| flat_orientation_l2 | asset_cfg.name | Penalize deviation from a flat base orientation (projected gravity). |
| base_height_l2 | target_height (float), asset_cfg.name | Penalize squared deviation of base height from target. |
| body_lin_acc_l2 | asset_cfg.name | Penalize summed L2 norm of all body linear accelerations. |
| joint_torques_l2 | asset_cfg.name | Penalize squared torques on configured joints. |
| joint_vel_l1 | asset_cfg.name | Penalize absolute joint velocities (L1). |
| joint_vel_l2 | asset_cfg.name | Penalize squared joint velocities. |
| joint_acc_l2 | asset_cfg.name | Penalize squared joint accelerations. |
| joint_deviation_l1 | asset_cfg.name | Penalize absolute deviation from default joint positions. |
| joint_pos_limits | asset_cfg.name | Penalize violations of soft joint-position limits. |
| joint_vel_limits | soft_ratio (float), asset_cfg.name | Penalize violations of soft joint-velocity limits (clipped). |
| applied_torque_limits | asset_cfg.name | Penalize mismatch between applied and computed torques. |
| action_rate_l2 | none | Penalize squared change in actions from one step to the next. |
| action_l2 | none | Penalize squared magnitude of the actions themselves. |
| undesired_contacts | threshold (float), sensor_cfg.name | Penalize count of contact events whose force exceeds threshold. |
| contact_forces | threshold (float), sensor_cfg.name | Penalize amount by which contact forces exceed threshold. |
| track_lin_vel_xy_exp | std (float), command_name (str), asset_cfg.name | Reward (exp-kernel) how well base xy-velocity matches command. |
| track_ang_vel_z_exp | std (float), command_name (str), asset_cfg.name | Reward (exp-kernel) how well yaw velocity matches command. |
| air_time | mode_time (float), velocity_threshold (float), asset_cfg.name (str), sensor_cfg.name (str), sensor_cfg.body_names (regex) | Reward for limbs airborne above the velocity threshold. |
| air_time_variance_penalty | sensor_cfg.name (str), sensor_cfg.body_names (regex) | Penalize uneven airborne time across limbs. |
| base_angular_velocity | std (float), asset_cfg.name (str) | Reward (Gaussian) on the magnitude of base angular velocity. |
| base_linear_velocity | std (float), ramp_rate (float), ramp_at_vel (float), asset_cfg.name (str) | Reward (Gaussian with ramp) on base linear velocity. |
| foot_clearance | std (float), tanh_mult (float), target_height (float), asset_cfg.name (str) | Reward for foot clearance above a target height. |
| base_motion | asset_cfg.name (str) | Penalize any base movement (encourages stationarity). |
| base_orientation | asset_cfg.name (str) | Penalize deviation from upright base orientation. |
| foot_slip | asset_cfg.name (str), sensor_cfg.name (str), sensor_cfg.body_names (regex), threshold (float) | Penalize foot-slip events when force exceeds the threshold. |
| joint_pos | asset_cfg.name (str), stand_still_scale (float), velocity_threshold (float) | Penalize deviation from reference joint positions when stationary. |
| joint_torques | asset_cfg.name (str) | Penalize high joint-torque usage on configured joints. |
| termination_penalty | none | Large penalty applied when the episode terminates. |
Task-Specific Reward Terms
| Reward Function | Configurable Options | Description | Task Name |
|---|---|---|---|
| position_tracking_error | std (float), command_name (str) | Reward (tanh) on base position tracking accuracy. | Navigation |
| orientation_tracking_error | command_name (str) | Penalize heading error relative to the commanded orientation. | Navigation |
| object_ee_distance | std (float) | Reward (Gaussian) on the distance between the end-effector and the object. | Manipulation Lift Object |
| object_is_lifted | minimal_height (float) | Reward (binary) for the object being lifted above the minimal height threshold. | Manipulation Lift Object |
| object_goal_distance | std (float), minimal_height (float), command_name (str) | Reward (Gaussian) on the object’s distance to the goal pose after lifting. | Manipulation Lift Object |
| track_lin_vel_xy_exp | std (float), command_name (str), asset_cfg.name | Reward (exp-kernel) how well base xy-velocity matches command. | Velocity Tracking |
| track_ang_vel_z_exp | std (float), command_name (str), asset_cfg.name | Reward (exp-kernel) how well yaw velocity matches command. | Velocity Tracking |
| track_orientation_inv_l2 | object_cfg (SceneEntityCfg("object")), rot_eps (float), command_name (str) | Reward (inverse-L2) encouraging alignment of object orientation with the commanded pose. | Manipulation In-Hand |
| success_bonus_inhand | object_cfg (SceneEntityCfg("object")), command_name (str) | Bonus reward upon successful completion of the in-hand manipulation task. | Manipulation In-Hand |
Robot-Specific Reward Terms
| Reward Function | Robot | Configurable Options | Description |
|---|---|---|---|
| left_wrist_end_effector_position_tracking | Unitree G1 (27 DOF) | asset_cfg.name, body_names, command_name, shift | Penalize deviation of the left-wrist end-effector position from the commanded target. |
| left_wrist_end_effector_position_tracking_fine_grained | Unitree G1 (27 DOF) | asset_cfg.name, body_names, std, command_name, shift | Reward (tanh) on fine-grained left-wrist position tracking accuracy. |
| left_wrist_end_effector_orientation_tracking | Unitree G1 (27 DOF) | asset_cfg.name, body_names, command_name, tol | Penalize orientation error of the left-wrist end-effector relative to the command. |
| left_hand_joint_range_violation | Unitree G1 (27 DOF) | asset_cfg.name, joint_names, limits_min, limits_max | Penalize any left-hand joints outside the specified minimum/maximum limits. |
Events
Events list environment-level operations that introduce randomized or deterministic changes — such as randomizing physical properties or applying external forces—along with the parameters needed to specify how those changes are applied.| Event Function | Configurable Options | Description |
|---|---|---|
| randomize_rigid_body_scale | env_ids, scale_range (tuple[float, float] or dict{x, y, z → tuple[float, float]}), asset_cfg.name, relative_child_path (optional) | Randomize the USD-scale of a rigid‐body asset (or a specified child prim) before simulation starts. |
| randomize_robot_body_material | env_ids, static_friction_range (tuple[float, float]), dynamic_friction_range (tuple[float, float]), restitution_range (tuple[float, float]), num_buckets (int), asset_cfg.name, make_consistent (bool) | Sample a set of friction/restitution “buckets” and assign them to each geometry of the specified robot Articulation. |
| randomize_object_body_material | env_ids, static_friction_range (tuple[float, float]), dynamic_friction_range (tuple[float, float]), restitution_range (tuple[float, float]), num_buckets (int), asset_cfg.name, make_consistent (bool) | Sample a set of friction/restitution “buckets” and assign them to each geometry of the specified RigidObject. |
| randomize_rigid_body_mass | env_ids, asset_cfg.name, mass_distribution_params (tuple[float, float]), operation (“add”/“scale”/“abs”), distribution (“uniform”/“log_uniform”/“gaussian”), recompute_inertia (bool) | Randomize per‐body mass values (add/scale/overwrite) and optionally recompute inertia tensors from default mass. |
| randomize_rigid_body_com | env_ids, com_range (dict{x, y, z → tuple[float, float]}), asset_cfg.name | Add a random offset to the center of mass of each body in the specified Articulation. |
| randomize_rigid_body_collider_offsets | env_ids, rest_offset_distribution_params (tuple[float, float]), contact_offset_distribution_params (tuple[float, float]), asset_cfg.name, distribution (“uniform”/“log_uniform”/“gaussian”) | Randomize physics collider “rest” and/or “contact” offsets for each body shape of the specified asset. |
| randomize_physics_scene_gravity | env_ids, gravity_distribution_params (tuple[list[float], list[float]]), operation (“add”/“scale”/“abs”), distribution (“uniform”/“log_uniform”/“gaussian”) | Randomize the global gravity vector (x, y, z) for the entire physics scene (same gravity for all envs). |
| randomize_actuator_gains | env_ids, asset_cfg.name, stiffness_distribution_params (tuple[float, float]), damping_distribution_params (tuple[float, float]), operation, distribution | Randomize stiffness and/or damping gains of actuators (implicit or explicit) on specified joint IDs. |
| randomize_joint_parameters | env_ids, asset_cfg.name, friction_distribution_params (tuple[float, float]), armature_distribution_params (tuple[float, float]), lower_limit_distribution_params (tuple[float, float]), upper_limit_distribution_params (tuple[float, float]), operation, distribution | Randomize joint physics properties: friction, armature, and position limits (lower/upper) by adding/scaling/overwriting. |
| randomize_fixed_tendon_parameters | env_ids, asset_cfg.name, stiffness_distribution_params (tuple[float, float]), damping_distribution_params (tuple[float, float]), limit_stiffness_distribution_params (tuple[float, float]), lower_limit_distribution_params (tuple[float, float]), upper_limit_distribution_params (tuple[float, float]), rest_length_distribution_params (tuple[float, float]), offset_distribution_params (tuple[float, float]), operation, distribution | Randomize fixed‐tendon stiffness, damping, limit stiffness, lower/upper position limits, rest length, and/or offset. |
| apply_external_force_torque | env_ids, force_range (tuple[float, float]), torque_range (tuple[float, float]), asset_cfg.name | Sample and buffer random forces and torques to apply to each body of the specified asset (effective on next write). |
| push_by_setting_velocity | env_ids, velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), asset_cfg.name | Sample a random root‐velocity (linear + angular) and set it directly on the specified asset to simulate a push. |
| reset_base | env_ids, pose_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), asset_cfg.name | Randomize root pose of the robot (translation + rotation) uniformly around default and root velocity, then write to simulation. |
| reset_root_state | env_ids, pose_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), asset_cfg.name | Randomize root pose (translation + rotation) of an object uniformly around default and root velocity, then write to simulation. |
| reset_root_state_with_random_orientation | env_ids, pose_range (dict{x, y, z → tuple[float, float]}), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), asset_cfg.name | Randomize root position uniformly, sample orientation uniformly from SO(3), and set velocity within given ranges. |
| reset_root_state_from_terrain | env_ids, pose_range (dict{roll, pitch, yaw → tuple[float, float]}), velocity_range (dict{x, y, z, roll, pitch, yaw → tuple[float, float]}), asset_cfg.name | Sample a valid terrain patch (“init_pos”), set root pose on terrain + random yaw/pitch/roll, then set velocity. |
| reset_joints_by_scale | env_ids, position_range (tuple[float, float]), velocity_range (tuple[float, float]), asset_cfg.name | Scale default joint positions and velocities by random factors (within ranges) and clamp to soft limits before writing. |
| reset_joints_by_offset | env_ids, position_range (tuple[float, float]), velocity_range (tuple[float, float]), asset_cfg.name | Offset default joint positions and velocities by random values (within ranges) and clamp to soft limits before writing. |
| reset_nodal_state_uniform | env_ids, position_range (dict{x, y, z → tuple[float, float]}), velocity_range (dict{x, y, z → tuple[float, float]}), asset_cfg.name | Randomize default nodal (vertex) positions and velocities of a DeformableObject uniformly within given ranges. |
| reset_scene_to_default | env_ids | Restore all rigid objects, articulations (root pose + joint state), and deformables (nodal state) to their default values. |
| randomize_visual_texture_material | env_ids, event_name (str), asset_cfg.name, texture_paths (list[str]), texture_rotation (tuple[float, float]) | Trigger a Replicator‐based event that randomizes textures on each body’s “/visuals” prim under the specified asset. |
| randomize_visual_color | env_ids, event_name (str), asset_cfg.name, colors (list[tuple[float, float, float]] or dict{r, g, b → tuple[float, float]}), mesh_name (str) | Trigger a Replicator‐based event that randomizes color on the specified mesh prims under the asset. |
Terminations
Terminations specify the conditions under which an episode ends, the configuration options that define threshold values or monitored signals, and a brief explanation of what triggers each termination.| Termination Function | Configurable Options | Description |
|---|---|---|
| time_out | none | Terminate when episode length ≥ max_episode_length. |
| command_resample | command_name (str), num_resamples (int) | Terminate when a command has been resampled the specified number of times. |
| bad_orientation | limit_angle (float), asset_cfg.name | Terminate if the asset’s orientation deviates from vertical by > limit_angle. |
| root_height_below_minimum | minimum_height (float), asset_cfg.name | Terminate if the asset’s root height drops below the minimum height. |
| joint_pos_out_of_limit | asset_cfg.name | Terminate if any joint position exceeds the soft position limits. |
| joint_pos_out_of_manual_limit | bounds (tuple[min, max]), asset_cfg.name | Terminate if any joint position falls outside the user-specified bounds. |
| joint_vel_out_of_limit | asset_cfg.name | Terminate if any joint velocity exceeds the soft velocity limits. |
| joint_vel_out_of_manual_limit | max_velocity (float), asset_cfg.name | Terminate if any joint velocity exceeds the given max_velocity. |
| joint_effort_out_of_limit | asset_cfg.name | Terminate if any joint’s computed torque equals its applied torque (indicating effort clipping). |
| base_contact | threshold (float), sensor_cfg.name | Terminate if any contact force at the sensor exceeds the threshold. |
Robot-Specific Terrmination Terms
| Termination Function | Robot | Configurable Options | Description |
|---|---|---|---|
| base_contact_g1 | Unitree G1 (27 DOF) | sensor_cfg.name (str), sensor_cfg.body_names (list[str]), threshold (float) | Terminate episode if any specified non-foot link (e.g. pelvis, torso, waist, shoulders) registers contact force above the threshold. |
Task-Specific Terrmination Terms
| Termination Function | Configurable Options | Description | Task Name |
|---|---|---|---|
| object_out_of_reach | asset_cfg (SceneEntityCfg("object")), threshold (float) | Terminate when the object moves farther than the specified threshold distance from the robot. | Manipulation In-Hand |
| max_consecutive_success | num_success (int), command_name (str) | Terminate once the in-hand manipulation has succeeded the given number of times in a row. | Manipulation In-Hand |