This page explains how to install the grid-cortex-client Python package and access the AI models hosted by GRID-Cortex.

Installation

Install the GRID Cortex client package using pip:
pip install grid-cortex-client

Authentication

Set up your API key for authentication:
  1. During onboarding, General Robotics will give you a personal CORTEX API key.
  2. Export it so the client can pick it up automatically:
    export GRID_CORTEX_API_KEY="<YOUR_KEY>"
    
  3. You can also pass the key directly when constructing the client:
    from grid_cortex_client import CortexClient
    client = CortexClient(api_key="<YOUR_KEY>")
    

Quick-Start (2 Lines)

Get started with just two lines of code:
from grid_cortex_client import CortexClient
result = CortexClient().run(model_id="midas", image_input="demo.jpg")
That is genuinely it—one line to import, one line to run. The result type depends on the model (see tables below).

Model Reference & Examples

The client exposes one unified function:
CortexClient.run(model_id: str, **kwargs) → Any
  • model_id is the model name exactly as listed in the deployment (see tables).
  • **kwargs are model-specific inputs such as image_input, prompt, left_image, etc.
Below we group public models by task with minimal runnable code. All snippets assume:
from grid_cortex_client import CortexClient
client = CortexClient()  # uses env creds

Depth & Stereo

Model ID         | Input kwargs                | Returns
-----------------|-----------------------------|-----------------------------------
midas            | image_input (PIL/np.array)  | numpy.ndarray depth map
zoedepth         | image_input (PIL/np.array)  | numpy.ndarray depth map
metric3d         | image_input (PIL/np.array)  | numpy.ndarray depth map
foundationstereo | left_image, right_image     | numpy.ndarray depth
vggt-depth       | image_input (PIL/np.array)  | numpy.ndarray depth + uncertainty
# Monocular depth estimation (works with midas, zoedepth, metric3d)
from PIL import Image
image = Image.open("path/to/scene.jpg")
depth = client.run(model_id="midas", image_input=image)
print(depth.shape, depth.dtype)  # (H, W) float32
# To save as an 8-bit image, normalize the depth map to [0, 255] first:
# Image.fromarray(((depth - depth.min()) / (depth.max() - depth.min()) * 255).astype("uint8")).save("depth.png")

# FoundationStereo
left_img = Image.open("path/to/left_image.png")
right_img = Image.open("path/to/right_image.png")
depth_np = client.run(
    model_id="foundationstereo",
    left_image=left_img,
    right_image=right_img,
    K=[[293.2,0,128],[0,293.2,128],[0,0,1]],  # intrinsics (optional)
    baseline=0.06,                             # meters (optional)
)
print(depth_np.shape, depth_np.dtype)
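
When intrinsics and a baseline are supplied, the returned depth can be back-projected into a 3-D point cloud with plain NumPy. This is a standard pinhole-camera sketch using the K passed above, not part of the client API, and it assumes the depth values are in metres:
import numpy as np

fx, fy = K[0][0], K[1][1]
cx, cy = K[0][2], K[1][2]
h, w = depth_np.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))                 # pixel grid
x = (u - cx) * depth_np / fx                                    # X in metres
y = (v - cy) * depth_np / fy                                    # Y in metres
points = np.stack([x, y, depth_np], axis=-1).reshape(-1, 3)     # (H*W, 3) point cloud
print(points.shape)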

Object Detection

Model ID      | Input kwargs                                  | Returns
--------------|-----------------------------------------------|---------------------------------
owlv2         | image_input (PIL/np.array), prompt (string)   | dict with boxes, scores, labels
groundingdino | image_input (PIL/np.array), prompt (string)   | dict with boxes, scores, labels
# Detect objects with text prompts
from PIL import Image
image = Image.open("path/to/street.jpg")
result = client.run(
    model_id="owlv2",
    image_input=image,
    prompt="car, person, traffic light",  # Comma-separated text
    box_threshold=0.25,
)
print(result["boxes"][0])  # [x1,y1,x2,y2]
print(result["scores"][0])  # 0.91
print(result["labels"][0])  # "car"

Image Segmentation

Model ID  | Input kwargs                                                               | Returns
----------|----------------------------------------------------------------------------|---------------
oneformer | image_input (PIL/np.array)                                                 | PIL.Image mask
sam2      | image_input (PIL/np.array), prompts (list of [x,y] points)                 | PIL.Image mask
gsam2     | image_input (PIL/np.array), prompts (list of [x,y] points), labels (list)  | PIL.Image mask
# Segment with point prompts
from PIL import Image
image = Image.open("path/to/dog.jpg")
mask = client.run(
    model_id="sam2",
    image_input=image,
    prompts=[[830,420]],   # List of [x,y] pixel coordinates
)
mask.show()
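
Because the returned mask is an ordinary PIL image, you can use it to cut the segmented object out of the original photo. A minimal sketch, assuming the mask is single-channel, matches the input resolution, and marks the object with nonzero pixels:
from PIL import Image

background = Image.new("RGB", image.size)                                   # black backdrop
cutout = Image.composite(image.convert("RGB"), background, mask.convert("L"))
cutout.save("dog_cutout.png")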

Feature Matching

Model ID  | Input kwargs                                               | Returns
----------|------------------------------------------------------------|--------------------------------------------------
lightglue | image0_input (PIL/np.array), image1_input (PIL/np.array)  | dict with points0, points1, matches, latency_ms
# Feature matching between two images
from PIL import Image
img0 = Image.open("path/to/frame0.jpg")
img1 = Image.open("path/to/frame1.jpg")
out = client.run(
    model_id="lightglue",
    image0_input=img0,
    image1_input=img1,
)
print(out.keys())  # dict of numpy arrays
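
The keypoints and match indices can then be paired up for downstream pose or homography estimation. A sketch assuming the common LightGlue layout, where points0/points1 are (N, 2) pixel coordinates and matches is a (K, 2) array of index pairs; verify against the actual array shapes:
pts0, pts1, matches = out["points0"], out["points1"], out["matches"]
print(f"{len(matches)} matches in {out['latency_ms']} ms")
matched0 = pts0[matches[:, 0]]   # (K, 2) coords in img0
matched1 = pts1[matches[:, 1]]   # (K, 2) corresponding coords in img1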

Grasp Prediction

Model ID | Input kwargs                                  | Returns
---------|-----------------------------------------------|-----------------------
graspgen | depth_image, seg_image, camera_intrinsics, …  | dict of 6-DoF grasps
import numpy as np
rgb   = np.load("rgb.npy")      # H×W×3 uint8 (not required by graspgen)
depth = np.load("depth.npy")    # H×W float32 (metres)
seg   = np.load("seg.npy")      # H×W int32
K     = [[293.2,0,128],[0,293.2,128],[0,0,1]]

# Generate 6-DoF grasp poses
res = client.run(
    model_id="graspgen",
    depth_image=depth,
    seg_image=seg,
    camera_intrinsics=K,
    num_grasps=25,
)
print(res["poses"].shape)  # (25, 4, 4)

Vision-Language

Model ID  | Capability                            | Returns
----------|---------------------------------------|------------------------------------
moondream | VQA / caption / detection / pointing  | dict with output (varies by task)
# VQA (Visual Question Answering)
from PIL import Image
image = Image.open("path/to/kitchen.jpg")
result = client.run(
    model_id="moondream",
    image_input=image,
    task="vqa",
    prompt="How many cups are on the table?",
)
print(result["output"])  # Text answer

# Image Captioning
result = client.run(
    model_id="moondream",
    image_input=image,
    task="caption",
    length="short",  # or "normal"
)
print(result["output"])  # Text caption

# Object Detection
result = client.run(
    model_id="moondream",
    image_input=image,
    task="detect",
    prompt="cup, plate, bowl",
)
print(result["output"])  # Dict with boxes, scores, labels

# Pointing (clickable points)
result = client.run(
    model_id="moondream",
    image_input=image,
    task="point",
    prompt="the red cup",
)
print(result["output"])  # Numpy array of (x,y) points

Troubleshooting

  • 401 Unauthorized – Check that your shell actually has GRID_CORTEX_API_KEY exported and that the key is correct.
  • Timeout / connection errors – You can adjust the default 30 s timeout:
client = CortexClient(timeout=60)
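
For flaky network conditions you can also wrap calls in a small retry loop. A generic sketch; the client's specific exception classes aren't listed here, so it catches broadly and backs off between attempts:
import time

def run_with_retry(client, retries=3, **kwargs):
    for attempt in range(retries):
        try:
            return client.run(**kwargs)
        except Exception:
            if attempt == retries - 1:
                raise                      # give up after the last attempt
            time.sleep(2 ** attempt)       # back off: 1 s, 2 s, ...

depth = run_with_retry(client, model_id="midas", image_input="demo.jpg")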