GRID Docs | General Robotics

Segment objects in an image from a text prompt. The model combines Grounding DINO (text-prompted object detection) with SAM 2 (mask prediction).

Parameters

image_input

str | PIL.Image | np.ndarray

required

RGB image as file path, URL, PIL Image, or numpy array.

prompt

str

required

Text description of objects to segment.

box_threshold

float

Minimum box confidence (0.0–1.0). Detections scoring below this threshold are discarded.

text_threshold

float

Minimum text-match confidence (0.0–1.0) for associating words from the prompt with a detection.

nms_threshold

float

Non-Maximum Suppression threshold (0.0–1.0), used to discard overlapping duplicate detections.
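
To make the roles of box_threshold and nms_threshold concrete, here is a minimal sketch of generic score filtering followed by greedy IoU-based NMS. It is not the service's implementation: the helper names (iou, filter_and_nms), the box format [x1, y1, x2, y2], the sample values, and the exact comparison rules (>= for scores, <= for overlap) are all assumptions made for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Clip at zero so non-overlapping boxes get an intersection of 0.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def filter_and_nms(boxes, scores, box_threshold, nms_threshold):
    """Return indices kept after score filtering and greedy NMS (hypothetical helper)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Step 1: drop detections whose confidence is below box_threshold.
    candidates = np.flatnonzero(scores >= box_threshold)
    # Step 2: visit the survivors from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Step 3: suppress boxes that overlap the current best box too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= nms_threshold]
    return keep

# Illustrative values only: two near-duplicate boxes, one distinct box,
# and one low-confidence box.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.2]
kept = filter_and_nms(boxes, scores, box_threshold=0.35, nms_threshold=0.5)
print(kept)  # [0, 2]
```

In this sketch, raising box_threshold removes weak detections before NMS runs, while lowering nms_threshold makes suppression of overlapping boxes more aggressive.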

timeout

float | None

Optional HTTP timeout.

Returns

np.ndarray: Binary segmentation mask of shape (H, W) with dtype uint8. Foreground pixels are 255; background pixels are 0.
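
A mask in this format can be used directly with NumPy. The sketch below uses a small synthetic image and mask as stand-ins for real data (both are made up for illustration) and shows one way to turn the 0/255 mask into a boolean array and black out the background.

```python
import numpy as np

# Synthetic stand-ins: a 4x6 RGB image and a matching (H, W) uint8 mask.
image = np.full((4, 6, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 255  # a 2x3 foreground region

# Convert the 0/255 mask to a boolean (H, W) array.
fg = mask > 0

# Broadcast the mask over the channel axis; background pixels become black.
cutout = np.where(fg[..., None], image, 0).astype(np.uint8)

# Fraction of the image covered by the foreground.
coverage = fg.mean()
print(cutout.shape, coverage)  # (4, 6, 3) 0.25
```

Comparing with `> 0` rather than `== 255` keeps the code working if a mask is later rescaled or stored with a different foreground value.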

Example

from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image

client = CortexClient()
img = Image.open("scene.jpg")  # 640x480 RGB
mask = client.run(model_id="gsam2", image_input=img, prompt="street")

print(mask.shape, mask.dtype)
# (480, 640) uint8

foreground = np.count_nonzero(mask)
print(f"foreground pixels: {foreground} ({foreground / mask.size * 100:.1f}%)")
# foreground pixels: 138737 (45.2%)
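
As a possible follow-up step, the tight bounding box of the segmented region can be read off the mask. This is a sketch on a synthetic mask with the same shape and dtype as the one above; mask_bbox is a hypothetical helper, not part of grid_cortex_client.

```python
import numpy as np

def mask_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing was segmented
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic (480, 640) uint8 mask with a rectangular foreground region.
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:200, 50:300] = 255

bbox = mask_bbox(mask)
print(bbox)  # (50, 100, 299, 199)
```

The coordinates are inclusive pixel indices, so the box width is x_max - x_min + 1.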
