Segment objects in an image using a text prompt. Combines Grounding DINO (text-prompted object detection) with SAM 2 (mask prediction).

Parameters

image_input
str | PIL.Image | np.ndarray
required
RGB image as file path, URL, PIL Image, or numpy array.
prompt
str
required
Text description of objects to segment.
box_threshold
float
Confidence threshold (0.0–1.0) for filtering detections.
text_threshold
float
Confidence threshold (0.0–1.0) for matching prompt text to detections.
nms_threshold
float
Non-maximum suppression (NMS) threshold (0.0–1.0) used to remove overlapping detections.
timeout
float | None
Optional HTTP timeout.
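
To give a feel for what nms_threshold controls, the sketch below implements generic greedy non-maximum suppression in NumPy. It is an illustration only, not the service's implementation: the function name nms, the [x1, y1, x2, y2] box layout, and the reading of the threshold as an IoU cutoff are all assumptions.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float) -> list:
    """Greedy NMS sketch (hypothetical helper, not part of grid_cortex_client).

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the kept boxes, highest score first.
    """
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection rectangle between the best box and each remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        # Drop boxes that overlap the best box more than the threshold allows
        order = rest[iou <= iou_threshold]
    return keep
```

Under this reading, a lower threshold suppresses more aggressively (fewer, less-overlapping detections), while a higher one lets near-duplicate boxes through.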

Returns

np.ndarray — Binary segmentation mask of shape (H, W) with dtype uint8. Foreground pixels are 255, background is 0.
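
Because the mask uses 0 and 255 rather than 0 and 1, it usually needs converting to a boolean array before it can index an image. The sketch below shows one way to black out the background; apply_mask is a hypothetical helper, and it assumes the image is an (H, W, 3) NumPy array with the same height and width as the mask.

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a copy of `image` with background pixels set to 0.

    image: (H, W, 3) array; mask: (H, W) uint8 array with 255 = foreground.
    Hypothetical helper for illustration, not part of grid_cortex_client.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError(f"shape mismatch: image {image.shape[:2]} vs mask {mask.shape}")
    foreground = mask > 0          # boolean (H, W) array
    out = image.copy()
    out[~foreground] = 0           # zero every channel of background pixels
    return out
```

Note that PIL reports image size as (width, height), while the mask shape is (height, width), so convert with np.asarray(img) before comparing shapes.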

Example Output

Figure: GSAM2 text-prompted segmentation output.

Example

from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image

client = CortexClient()
img = Image.open("scene.jpg")  # 640x480 RGB
mask = client.run(model_id="gsam2", image_input=img, prompt="street")

print(mask.shape, mask.dtype)
# (480, 640) uint8

foreground = np.count_nonzero(mask)
print(f"foreground pixels: {foreground} ({foreground / mask.size * 100:.1f}%)")
# foreground pixels: 138737 (45.2%)
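
Beyond counting foreground pixels, a common follow-up is locating the segmented region. The sketch below derives a tight bounding box from a mask of the documented shape and dtype; mask_bbox is a hypothetical helper, and the (x_min, y_min, x_max, y_max) return convention with inclusive coordinates is an assumption chosen for illustration.

```python
import numpy as np

def mask_bbox(mask: np.ndarray):
    """Tight bounding box of the foreground in an (H, W) mask.

    Returns (x_min, y_min, x_max, y_max) with inclusive pixel coordinates,
    or None if the mask has no foreground. Hypothetical helper.
    """
    ys, xs = np.nonzero(mask)      # row and column indices of foreground pixels
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Since a single mask can cover several disjoint objects matching the prompt, this box spans all of them together rather than one box per object.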