Segment objects using exactly one prompt type per call: text, points, or boxes. SAM3 offers two client methods:
  • run() — Returns a single union mask. Use when you only need the combined segmentation.
  • run_with_detections() — Returns the union mask plus per-instance masks, bounding boxes, and confidence scores. Use when you need individual object information (e.g. counting objects, filtering by confidence, or processing instances separately). Only available for text and box prompts — point prompts have no detection semantics.

Parameters

image_input
str | PIL.Image | np.ndarray
required
RGB image as file path, URL, PIL Image, or numpy array.
text
str
Text prompt describing objects to segment. Exclusive with points/boxes. Also accepted as text_prompt.
points
List[List[float]]
List of [x, y] point coordinates. Exclusive with text/boxes. Requires labels. Also accepted as prompts.
boxes
List[List[float]]
List of [x0, y0, x1, y1] box coordinates. Exclusive with text/points. Requires labels.
labels
List[int]
Required when using points or boxes; provide one label per point/box. 1 = foreground, 0 = background.
timeout
float | None
Optional HTTP timeout.
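The prompt-exclusivity and label rules above can be sketched as a small validation helper. This is an illustrative, hypothetical function (not part of grid_cortex_client) that just encodes the documented constraints:

```python
def validate_prompts(text=None, points=None, boxes=None, labels=None):
    """Check the documented prompt rules: exactly one prompt type,
    and one label per point/box when points or boxes are used."""
    given = [p is not None for p in (text, points, boxes)]
    if sum(given) != 1:
        raise ValueError("provide exactly one of text, points, or boxes")
    if points is not None or boxes is not None:
        entries = points if points is not None else boxes
        if labels is None or len(labels) != len(entries):
            raise ValueError("labels must provide one entry per point/box")

validate_prompts(text="cat")                       # ok
validate_prompts(points=[[320, 240]], labels=[1])  # ok
```

Mixing prompt types (e.g. both text and points) or omitting labels for points/boxes raises a ValueError under these rules.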

Returns — run()

np.ndarray — Binary union mask of shape (H, W) with dtype uint8. All matched instances are OR’d into a single mask. Foreground 255, background 0.
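Because the mask uses the 255/0 uint8 convention, it can be applied directly to an image array with a boolean comparison. A minimal sketch, using a synthetic mask and image in place of real client output:

```python
import numpy as np

# Synthetic stand-ins: in practice `mask` comes from client.run(...)
# and `img` is your input image as an (H, W, 3) array.
img = np.full((480, 640, 3), 200, dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 255  # pretend this region was segmented

# Broadcast the (H, W) mask over the channel axis and zero out background
cutout = np.where(mask[..., None] == 255, img, 0)

print(cutout.shape)  # (480, 640, 3)
```

Pixels where the mask is 0 become black; foreground pixels keep their original values.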

Returns — run_with_detections()

dict with keys:
  • union_mask — Combined binary mask (H, W), dtype uint8 (same as run() output)
  • masks — List of per-instance binary masks, each (H, W) dtype uint8
  • boxes — List of [x0, y0, x1, y1] bounding boxes (text/box prompts only)
  • scores — List of confidence scores (sorted descending)
For point prompts, masks, boxes, and scores will be empty lists.
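The float box coordinates can be turned into integer crops of the input image. A sketch under the assumption that boxes follow the [x0, y0, x1, y1] pixel convention above, using a synthetic image and a hand-written result dict in place of a real call:

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)
# Hypothetical detections, mirroring the run_with_detections() box format
result = {"boxes": [[14.58, 55.12, 315.22, 473.21],
                    [347.68, 26.83, 638.91, 368.76]]}

crops = []
for x0, y0, x1, y1 in result["boxes"]:
    # Round outward and clamp to image bounds before slicing
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1 = min(int(np.ceil(x1)), img.shape[1])
    y1 = min(int(np.ceil(y1)), img.shape[0])
    crops.append(img[y0:y1, x0:x1])

print([c.shape[:2] for c in crops])
# [(419, 302), (343, 292)]
```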

Example Output — run()

SAM3 run() output — union mask overlay on input image

Example — run()

Use run() when you only need the combined mask — e.g. masking a region, computing area, or passing to a downstream model.
from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image

client = CortexClient()
img = Image.open("cats.jpg")  # 640x480 RGB

# Text prompt — returns a single union mask
mask = client.run(model_id="sam3", image_input=img, text="cat")
print(mask.shape, mask.dtype)
# (480, 640) uint8
fg = np.count_nonzero(mask)
print(f"foreground: {fg} pixels ({fg / mask.size * 100:.1f}%)")
# foreground: 107198 pixels (34.9%)

# Points prompt — click on center of image
points_mask = client.run(
    model_id="sam3",
    image_input=img,
    points=[[320, 240]],
    labels=[1],
)
fg = np.count_nonzero(points_mask)
print(f"foreground: {fg} pixels ({fg / points_mask.size * 100:.1f}%)")
# foreground: 137017 pixels (44.6%)

# Boxes prompt — box around the right cat
boxes_mask = client.run(
    model_id="sam3",
    image_input=img,
    boxes=[[347, 26, 639, 369]],
    labels=[1],
)
fg = np.count_nonzero(boxes_mask)
print(f"foreground: {fg} pixels ({fg / boxes_mask.size * 100:.1f}%)")
# foreground: 107299 pixels (34.9%)
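Since the returned mask is already 0/255 uint8, it maps directly to an 8-bit grayscale PNG for inspection or caching. A sketch with a synthetic mask standing in for client.run(...) output (the filename is arbitrary):

```python
import numpy as np
from PIL import Image

# Hypothetical mask standing in for client.run(...) output
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 255

# 0/255 uint8 maps directly to an 8-bit grayscale ("L") PNG
Image.fromarray(mask, mode="L").save("cat_mask.png")

reloaded = np.asarray(Image.open("cat_mask.png"))
print(reloaded.shape, reloaded.dtype)  # (480, 640) uint8
```

PNG is lossless, so the reloaded array is identical to the original mask.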

Example Output — run_with_detections()

SAM3 run_with_detections() output — per-instance masks, bounding boxes, and confidence scores

Example — run_with_detections()

Use run_with_detections() when you need per-instance information — e.g. counting objects, filtering by confidence, or processing each instance separately. Works with text and box prompts only.
from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image

client = CortexClient()
img = Image.open("cats.jpg")  # 640x480 RGB

# Text prompt with per-instance detections
result = client.run_with_detections(model_id="sam3", image_input=img, text="cat")

print(result.keys())
# dict_keys(['union_mask', 'boxes', 'scores', 'masks'])

print(result["union_mask"].shape, result["union_mask"].dtype)
# (480, 640) uint8

print(f"instances found: {len(result['masks'])}")
# instances found: 2

for i, (mask, box, score) in enumerate(zip(result["masks"], result["boxes"], result["scores"])):
    fg = np.count_nonzero(mask)
    print(f"  instance {i}: score={score:.4f}, box={box}, pixels={fg}")
# instance 0: score=0.9327, box=[14.58, 55.12, 315.22, 473.21], pixels=50090
# instance 1: score=0.9252, box=[347.68, 26.83, 638.91, 368.76], pixels=57108

# Filter by confidence
high_conf = [s for s in result["scores"] if s > 0.9]
print(f"high-confidence instances: {len(high_conf)}")
# high-confidence instances: 2

# Boxes prompt with per-instance detections
result = client.run_with_detections(
    model_id="sam3",
    image_input=img,
    boxes=[[14, 55, 315, 473], [347, 26, 639, 369]],
    labels=[1, 1],
)
print(f"instances: {len(result['masks'])}")
# instances: 2

for i, (mask, box, score) in enumerate(zip(result["masks"], result["boxes"], result["scores"])):
    fg = np.count_nonzero(mask)
    print(f"  instance {i}: score={score:.4f}, box={box}, pixels={fg}")
# instance 0: score=0.9905, box=[347.76, 26.36, 638.77, 370.08], pixels=57268
# instance 1: score=0.9903, box=[15.29, 55.16, 314.74, 472.38], pixels=49950
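The per-instance masks can also be flattened into a single labeled instance map, which is convenient for visualization or connected-component style processing. A sketch using synthetic masks in place of result["masks"]:

```python
import numpy as np

# Synthetic per-instance masks standing in for result["masks"]
m0 = np.zeros((480, 640), dtype=np.uint8)
m0[55:473, 14:315] = 255
m1 = np.zeros((480, 640), dtype=np.uint8)
m1[26:369, 347:639] = 255
masks = [m0, m1]

# Paint each instance with id i+1; 0 stays background.
# Later instances overwrite earlier ones where masks overlap.
instance_map = np.zeros(masks[0].shape, dtype=np.uint8)
for i, m in enumerate(masks):
    instance_map[m == 255] = i + 1

print(np.unique(instance_map))  # [0 1 2]
```

If overlap handling matters, iterate in ascending score order so higher-confidence instances win.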