Segments objects in an image from a text prompt by combining Grounding DINO for open-vocabulary detection with SAM 2 for mask generation.
Parameters
image_input (str | PIL.Image | np.ndarray, required)
    RGB image as a file path, URL, PIL Image, or numpy array.
prompt (str, required)
    Text description of the objects to segment.
box threshold (float)
    Confidence threshold (0.0–1.0) for filtering detections.
text threshold (float)
    Text confidence threshold (0.0–1.0).
NMS threshold (float)
    Non-Maximum Suppression threshold (0.0–1.0) applied to overlapping detections.
Returns
np.ndarray — Binary segmentation mask of shape (H, W) with dtype uint8. Foreground pixels are 255, background is 0.
Example
from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image
client = CortexClient()
img = Image.open("scene.jpg")  # 640x480 RGB
mask = client.run(model_id="gsam2", image_input=img, prompt="street")
print(mask.shape, mask.dtype)
# (480, 640) uint8
foreground = np.count_nonzero(mask)
print(f"foreground pixels: {foreground} ({foreground / mask.size * 100:.1f}%)")
# foreground pixels: 138737 (45.2%)
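The returned mask indexes directly into the source image with NumPy. The sketch below shows common post-processing steps — converting the uint8 mask to boolean, cutting out the foreground, and computing a tight bounding box. It uses a synthetic mask and image in place of the client call so it runs standalone; the shapes and region are illustrative, not model output.

```python
import numpy as np
from PIL import Image

# Synthetic stand-ins for the client call (hypothetical values).
h, w = 480, 640
mask = np.zeros((h, w), dtype=np.uint8)
mask[100:300, 200:500] = 255  # pretend the model segmented this region
img = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)  # stand-in RGB image

# uint8 mask (0/255) -> boolean mask for indexing.
fg = mask.astype(bool)

# Keep only the segmented pixels; background goes black.
cutout = np.where(fg[..., None], img, 0)

# Tight bounding box around the foreground (y0, y1, x0, x1).
ys, xs = np.nonzero(fg)
y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
print((y0, y1, x0, x1))  # (100, 300, 200, 500)

# The (H, W) uint8 mask converts directly to a grayscale PIL image.
Image.fromarray(mask).save("mask.png")
```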