## Installation
Install the GRID Cortex client package using pip (Python 3.10+ is recommended):
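A minimal install command — note that the package name below is an assumption; use the exact name from your onboarding materials:

```shell
# Package name "grid-cortex" is a placeholder -- confirm it with your onboarding docs.
python -m pip install grid-cortex
```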
## Authentication & Endpoint
Set up your API key (and endpoint, if needed):
- During onboarding, General Robotics will give you a personal Cortex API key.
- Export it so the client can pick it up automatically:
- If you run Cortex on-prem or on a managed cloud deployment, point the client at your instance:
- You can also pass the key directly when constructing the client:
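The resolution order described above (explicit argument first, then environment variable) can be sketched as follows. This is an illustrative stand-in, not the real SDK: the `CortexClient` class name, constructor signature, and default URL are assumptions; only the `GRID_CORTEX_API_KEY` and `GRID_CORTEX_BASE_URL` variable names come from this guide.

```python
import os

# Illustrative stand-in (NOT the real SDK): shows how the client resolves its
# key and endpoint -- explicit constructor argument first, environment second.
class CortexClient:
    def __init__(self, api_key=None, base_url=None, timeout=30.0):
        # Explicit arguments win; otherwise fall back to the exported variables.
        self.api_key = api_key or os.environ.get("GRID_CORTEX_API_KEY")
        self.base_url = base_url or os.environ.get(
            "GRID_CORTEX_BASE_URL", "https://cortex.example.com"  # placeholder default
        )
        self.timeout = timeout
        if not self.api_key:
            raise RuntimeError("Set GRID_CORTEX_API_KEY or pass api_key=...")

# Option 1: rely on the exported variable (value here is a demo placeholder).
os.environ.setdefault("GRID_CORTEX_API_KEY", "demo-key")
client = CortexClient()

# Option 2: pass the key (and an on-prem endpoint) explicitly.
client = CortexClient(api_key="demo-key", base_url="https://cortex.internal.example")
```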
## Model Reference
The client exposes one unified function:

- Use the exact `model_id` shown in the tables below.
- `**kwargs` are model-specific inputs such as `image_input`, `prompt`, `left_image`, etc.
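The calling convention can be sketched like this. The function name `run` is an assumption (the real entry point may be named differently); what matters is the pattern: every model is invoked the same way, differing only in `model_id` and its keyword arguments.

```python
# Sketch of the unified-call pattern; "run" is an assumed name, and this stub
# just echoes the request to show the calling convention.
def run(model_id: str, **kwargs) -> dict:
    return {"model_id": model_id, "inputs": kwargs}

# Same entry point, different models:
depth = run("zoedepth", image_input="frame.png")
dets = run("owlv2", image_input="frame.png", prompt="a red mug", box_threshold=0.3)
```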
### Depth & Stereo
| Model ID | What it does | Key inputs | Returns |
|---|---|---|---|
| `zoedepth` | Monocular depth | `image_input` (path/URL/PIL/`np.ndarray`) | `np.ndarray` depth map (H, W) float32 |
| `foundationstereo` | Stereo depth (FoundationStereo) | `left_image`, `right_image`; optional `aux_args = {K, baseline, hiera, valid_iters}` | `np.ndarray` depth map (H, W) float32 |
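Building the `aux_args` calibration payload for `foundationstereo` looks like this. The key names come from the table above; every numeric value is a made-up placeholder that must come from your camera calibration. The calibration matters because stereo depth follows `depth = fx * baseline / disparity`.

```python
# Placeholder calibration for foundationstereo's aux_args (values are made up).
K = [
    [700.0,   0.0, 320.0],   # fx, 0,  cx
    [  0.0, 700.0, 240.0],   # 0,  fy, cy
    [  0.0,   0.0,   1.0],
]
baseline_m = 0.12            # distance between the two cameras, in meters

aux_args = {"K": K, "baseline": baseline_m, "valid_iters": 32}
```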
### Object Detection
| Model ID | What it does | Key inputs | Returns |
|---|---|---|---|
| `owlv2` | Text-prompted object detection | `image_input`, `prompt`; optional `box_threshold`, `timeout` | dict with `boxes`, `scores`, `labels` |
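Consuming an `owlv2`-style result might look like this. The dict keys (`boxes`, `scores`, `labels`) come from the table above, but the values here are fabricated for the example, and the box coordinate convention is an assumption:

```python
# Illustrative owlv2-style result (keys from the table; values fabricated).
result = {
    "boxes": [[10, 20, 110, 220], [300, 40, 380, 150]],  # pixel coords (assumed xyxy)
    "scores": [0.87, 0.42],
    "labels": ["a red mug", "a red mug"],
}

# Keep only confident detections:
keep = [
    (box, score, label)
    for box, score, label in zip(result["boxes"], result["scores"], result["labels"])
    if score >= 0.5
]
```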
### Image Segmentation
| Model ID | What it does | Key inputs | Returns |
|---|---|---|---|
| `gsam2` | Text-prompted segmentation | `image_input`, `prompt`; optional `box_threshold`, `text_threshold`, `nms_threshold` | `np.ndarray` mask (H, W) uint8 (255 fg, 0 bg) |
| `sam2` | Point/box-prompted segmentation | `image_input`, `prompts` (`[[x, y], ...]`), `labels`; optional `multimask_output`, `mode`, `timeout` | backend dict with masks/scores |
| `sam3` | Single-prompt-type segmentation (text OR points OR boxes) | `image_input`; exactly one of `text`, `points`, or `boxes`; `labels` required for points/boxes | `np.ndarray` mask (H, W) uint8 |
| `oneformer` | Universal segmentation | `image_input`, `mode` (panoptic/semantic/instance) | dict with `output`, `label_map`, `latency_ms` |
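The `sam3` constraint (exactly one prompt type per call, with `labels` required for points/boxes) is easy to get wrong. A small client-side check like the one below mirrors that rule; the function itself is a hypothetical helper, not part of the SDK:

```python
# Hypothetical helper mirroring sam3's input rule: exactly ONE of text,
# points, or boxes, and labels are mandatory with points or boxes.
def validate_sam3_inputs(text=None, points=None, boxes=None, labels=None):
    provided = {"text": text, "points": points, "boxes": boxes}
    given = [name for name, value in provided.items() if value is not None]
    if len(given) != 1:
        raise ValueError("sam3 takes exactly one of text, points, or boxes")
    if given[0] in ("points", "boxes") and labels is None:
        raise ValueError("labels are required with points or boxes")
    return given[0]
```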
### Grasp Prediction
| Model ID | What it does | Key inputs | Returns |
|---|---|---|---|
| `graspgen` | 6-DoF grasp generation | `depth_image`, `seg_image`, `camera_intrinsics`; optional `aux_args` (`num_grasps`, `gripper_config`, `camera_extrinsics`), or provide `point_cloud` directly | dict with `grasps` (N, 4, 4), `confidence`, optional `latency_ms` |
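A `graspgen` payload might be assembled as follows. The key names come from the table above; the file paths and intrinsics values are placeholders:

```python
# Placeholder graspgen payload (key names from the table; values made up).
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0   # example pinhole parameters
camera_intrinsics = [
    [fx, 0.0, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
]

graspgen_inputs = {
    "depth_image": "depth.png",        # depth frame aligned with the mask
    "seg_image": "mask.png",           # segmentation mask of the target object
    "camera_intrinsics": camera_intrinsics,
    "aux_args": {"num_grasps": 100},   # optional extras per the table
}
# The response's "grasps" entry is an (N, 4, 4) stack of homogeneous
# transforms: each 4x4 matrix is one candidate gripper pose, paired with
# a per-grasp confidence score.
```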
### Vision-Language
| Model ID | What it does | Key inputs | Returns |
|---|---|---|---|
| `moondream` | VQA, captioning, detection, pointing | `image_input`; `task` = vqa/caption/detect/point; `prompt` for vqa/detect/point; `length` (short/normal) for caption | dict with `"output"` (text or structured data) |
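One model, four tasks: the `task` field selects the behavior, and `prompt`/`length` apply per task as the table describes. The image path and prompts below are placeholders:

```python
# Input payloads for moondream's four tasks (field names from the table above;
# the image path and prompt strings are placeholders).
moondream_calls = [
    {"image_input": "scene.jpg", "task": "vqa", "prompt": "How many cups are on the table?"},
    {"image_input": "scene.jpg", "task": "caption", "length": "short"},
    {"image_input": "scene.jpg", "task": "detect", "prompt": "cup"},
    {"image_input": "scene.jpg", "task": "point", "prompt": "the leftmost cup"},
]
# Each call returns a dict whose "output" is text (vqa/caption) or
# structured data (detect/point).
```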
## Troubleshooting
**401 Unauthorized** – Check that your shell actually has `GRID_CORTEX_API_KEY` exported and that the key is correct.

**Timeout / connection errors** – If you are on-prem or on a managed cloud deployment, confirm that `GRID_CORTEX_BASE_URL` points to your instance. You can also adjust the default 30 s timeout:
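A sketch of raising the timeout. The `run` name and its `timeout` parameter placement are assumptions; the tables above do show a per-call `timeout` kwarg on some models (e.g. `owlv2`, `sam2`), which is the pattern mirrored here:

```python
# Sketch only: "run" is an assumed entry-point name. The stub echoes the
# request to show where a per-call timeout override would go.
def run(model_id: str, timeout: float = 30.0, **kwargs) -> dict:
    return {"model_id": model_id, "timeout": timeout, "inputs": kwargs}

# Give a slow detection request two minutes instead of the default 30 s:
slow = run("owlv2", image_input="warehouse.png", prompt="forklift", timeout=120.0)
```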