Run PI05 policy inference on robot observations. Predicts action trajectories from camera images and robot state.

Parameters

base_rgb
str | PIL.Image | np.ndarray
required
Base camera RGB image.
wrist_rgb
str | PIL.Image | np.ndarray
required
Wrist camera RGB image.
joints
np.ndarray | list
required
Joint positions.
gripper
np.ndarray | list
required
Gripper position.
state
np.ndarray | list
required
Full robot state (concatenation of joints and gripper).
prompt
str
Optional task prompt (e.g. "pick up the cup").
timeout
float | None
Optional HTTP request timeout.
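
The `state` parameter is the concatenation of `joints` and `gripper`, in that order. A minimal sketch of assembling it with NumPy (the sizes here are illustrative, assuming a 6-DoF arm with a single gripper value):

```python
import numpy as np

# Illustrative 6-DoF arm: six joint angles plus one gripper value
joints = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
gripper = np.array([0.5])

# state is simply joints followed by gripper, per the parameter description
state = np.concatenate([joints, gripper])
print(state.shape)  # (7,)
```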

Returns

dict containing:
  • actions — NumPy array of shape (horizon, action_dim) with predicted actions
  • Additional fields from policy inference (e.g. state, policy_timing)
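
A sketch of consuming the returned dictionary. The `actions` array here is mocked with random values, and the horizon of 10 steps and 8-dimensional action space are illustrative assumptions, not guarantees of the API:

```python
import numpy as np

# Mocked inference result; a real call would get this dict from client.run(...)
horizon, action_dim = 10, 8  # illustrative sizes only
result = {"actions": np.random.randn(horizon, action_dim)}

# Step through the predicted trajectory one timestep at a time
for t, action in enumerate(result["actions"]):
    # each `action` is a 1-D vector of length action_dim
    assert action.shape == (action_dim,)
```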

Example

from grid_cortex_client import CortexClient
import numpy as np

client = CortexClient()
base_img = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
wrist_img = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
joints = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
gripper = np.array([0.5])
state = np.concatenate([joints, gripper])

# Returns a dict; predicted actions are under the "actions" key
result = client.run(
    model_id="pi05",
    base_rgb=base_img,
    wrist_rgb=wrist_img,
    joints=joints,
    gripper=gripper,
    state=state,
    prompt="pick up the cup",
)
print(result["actions"].shape)  # (horizon, action_dim)