Run PI05 policy inference on robot observations. Predicts action trajectories from camera images and robot state.
Parameters
base_rgb
str | PIL.Image | np.ndarray
required
Base camera RGB image.
wrist_rgb
str | PIL.Image | np.ndarray
required
Wrist camera RGB image.
joints
np.ndarray | list
required
Joint positions.
gripper
np.ndarray | list
required
Gripper position.
state
np.ndarray | list
required
Full robot state (concatenation of joints and gripper).
prompt
str
optional
Task prompt describing the desired behavior (e.g. "pick up the cup").
Returns
dict containing:
- actions — numpy array of shape (horizon, action_dim) with the predicted action trajectory
- additional fields from policy inference (e.g. state, policy_timing)
Example
from grid_cortex_client import CortexClient
import numpy as np
client = CortexClient()
base_img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
wrist_img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
joints = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
gripper = np.array([0.5])
state = np.concatenate([joints, gripper])
result = client.run(
    model_id="pi05",
    base_rgb=base_img,
    wrist_rgb=wrist_img,
    joints=joints,
    gripper=gripper,
    state=state,
    prompt="pick up the cup",
)
print(result["actions"].shape)  # (horizon, action_dim)
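The returned trajectory is typically executed one step at a time. A minimal sketch of consuming it, assuming the same 6-joint-plus-gripper layout used to build the state above, with a dummy array standing in for the policy output (the controller call is hypothetical and depends on your robot stack):

import numpy as np

# Dummy trajectory standing in for the "actions" field of the result:
# horizon=10 steps, action_dim=7 (6 joint targets + 1 gripper command),
# mirroring the joints/gripper concatenation used for `state` above.
actions = np.linspace(0.0, 1.0, 10 * 7).reshape(10, 7)

for step in actions:
    joint_targets, gripper_cmd = step[:6], step[6]
    # Send joint_targets and gripper_cmd to your robot controller here,
    # e.g. a position-control command at your control frequency.
    print(joint_targets.shape, float(gripper_cmd))

In practice, policies like PI05 are often re-queried before the full horizon is executed (receding-horizon execution), so only the first few steps of each predicted trajectory may be applied.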