Segment objects using exactly one prompt type per call: text, points, or boxes. SAM3 offers two client methods:
  • run() — Returns a single union mask. Use when you only need the combined segmentation.
  • run_with_detections() — Returns the union mask plus per-instance masks, bounding boxes, and confidence scores. Use when you need individual object information (e.g. counting objects, filtering by confidence, or processing instances separately). Only available for text and box prompts — point prompts have no detection semantics.

Parameters

image_input
str | PIL.Image | np.ndarray
required
RGB image as file path, URL, PIL Image, or numpy array.
text
str
Text prompt describing objects to segment. Exclusive with points/boxes. Also accepted as text_prompt.
points
List[List[float]]
List of [x, y] point coordinates. Exclusive with text/boxes. Requires labels. Also accepted as prompts.
boxes
List[List[float]]
List of [x0, y0, x1, y1] box coordinates. Exclusive with text/points. Requires labels.
labels
List[int]
Required when using points or boxes; provide one label per point/box. 1 = foreground, 0 = background.
timeout
float | None
Optional HTTP timeout.
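The prompt-exclusivity and label rules above can be sketched as a small validation helper. This is an illustrative, hypothetical function (not part of grid_cortex_client) that just encodes the documented constraints:

```python
def validate_prompts(text=None, points=None, boxes=None, labels=None):
    """Check the documented prompt rules: exactly one prompt type,
    and one label per point/box when points or boxes are used."""
    given = [p is not None for p in (text, points, boxes)]
    if sum(given) != 1:
        raise ValueError("provide exactly one of text, points, or boxes")
    if points is not None or boxes is not None:
        entries = points if points is not None else boxes
        if labels is None or len(labels) != len(entries):
            raise ValueError("labels must provide one entry per point/box")

validate_prompts(text="cat")                       # ok
validate_prompts(points=[[320, 240]], labels=[1])  # ok
```

Mixing prompt types (e.g. both text and points) or omitting labels for points/boxes raises a ValueError under these rules.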

Returns — run()

np.ndarray — Binary union mask of shape (H, W) with dtype uint8. All matched instances are OR’d into a single mask. Foreground 255, background 0.
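Because the mask uses the 255/0 uint8 convention, it can be applied directly to an image array with a boolean comparison. A minimal sketch, using a synthetic mask and image in place of real client output:

```python
import numpy as np

# Synthetic stand-ins: in practice `mask` comes from client.run(...)
# and `img` is your input image as an (H, W, 3) array.
img = np.full((480, 640, 3), 200, dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 255  # pretend this region was segmented

# Broadcast the (H, W) mask over the channel axis and zero out background
cutout = np.where(mask[..., None] == 255, img, 0)

print(cutout.shape)  # (480, 640, 3)
```

Pixels where the mask is 0 become black; foreground pixels keep their original values.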

Returns — run_with_detections()

dict with keys:
  • union_mask — Combined binary mask (H, W), dtype uint8 (same as run() output)
  • masks — List of per-instance binary masks, each (H, W) dtype uint8
  • boxes — List of [x0, y0, x1, y1] bounding boxes (text/box prompts only)
  • scores — List of confidence scores (sorted descending)
For point prompts, masks, boxes, and scores will be empty lists.
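The float box coordinates can be turned into integer crops of the input image. A sketch under the assumption that boxes follow the [x0, y0, x1, y1] pixel convention above, using a synthetic image and a hand-written result dict in place of a real call:

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)
# Hypothetical detections, mirroring the run_with_detections() box format
result = {"boxes": [[14.58, 55.12, 315.22, 473.21],
                    [347.68, 26.83, 638.91, 368.76]]}

crops = []
for x0, y0, x1, y1 in result["boxes"]:
    # Round outward and clamp to image bounds before slicing
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1 = min(int(np.ceil(x1)), img.shape[1])
    y1 = min(int(np.ceil(y1)), img.shape[0])
    crops.append(img[y0:y1, x0:x1])

print([c.shape[:2] for c in crops])
# [(419, 302), (343, 292)]
```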

Example Output — run()

SAM3 run() output — union mask overlay on input image

Example — run()

Use run() when you only need the combined mask — e.g. masking a region, computing area, or passing to a downstream model.
from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image

client = CortexClient()
img = Image.open("cats.jpg")  # 640x480 RGB

# Text prompt — returns a single union mask
mask = client.run(model_id="sam3", image_input=img, text="cat")
print(mask.shape, mask.dtype)
# (480, 640) uint8
fg = np.count_nonzero(mask)
print(f"foreground: {fg} pixels ({fg / mask.size * 100:.1f}%)")
# foreground: 107198 pixels (34.9%)

# Points prompt — click on center of image
points_mask = client.run(
    model_id="sam3",
    image_input=img,
    points=[[320, 240]],
    labels=[1],
)
fg = np.count_nonzero(points_mask)
print(f"foreground: {fg} pixels ({fg / points_mask.size * 100:.1f}%)")
# foreground: 137017 pixels (44.6%)

# Boxes prompt — box around the right cat
boxes_mask = client.run(
    model_id="sam3",
    image_input=img,
    boxes=[[347, 26, 639, 369]],
    labels=[1],
)
fg = np.count_nonzero(boxes_mask)
print(f"foreground: {fg} pixels ({fg / boxes_mask.size * 100:.1f}%)")
# foreground: 107299 pixels (34.9%)
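Since the returned mask is already 0/255 uint8, it maps directly to an 8-bit grayscale PNG for inspection or caching. A sketch with a synthetic mask standing in for client.run(...) output (the filename is arbitrary):

```python
import numpy as np
from PIL import Image

# Hypothetical mask standing in for client.run(...) output
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 255

# 0/255 uint8 maps directly to an 8-bit grayscale ("L") PNG
Image.fromarray(mask, mode="L").save("cat_mask.png")

reloaded = np.asarray(Image.open("cat_mask.png"))
print(reloaded.shape, reloaded.dtype)  # (480, 640) uint8
```

PNG is lossless, so the reloaded array is identical to the original mask.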

Example Output — run_with_detections()

SAM3 run_with_detections() output — per-instance masks, bounding boxes, and confidence scores

Example — run_with_detections()

Use run_with_detections() when you need per-instance information — e.g. counting objects, filtering by confidence, or processing each instance separately. Works with text and box prompts only.
from grid_cortex_client import CortexClient
import numpy as np
from PIL import Image

client = CortexClient()
img = Image.open("cats.jpg")  # 640x480 RGB

# Text prompt with per-instance detections
result = client.run_with_detections(model_id="sam3", image_input=img, text="cat")

print(result.keys())
# dict_keys(['union_mask', 'boxes', 'scores', 'masks'])

print(result["union_mask"].shape, result["union_mask"].dtype)
# (480, 640) uint8

print(f"instances found: {len(result['masks'])}")
# instances found: 2

for i, (mask, box, score) in enumerate(zip(result["masks"], result["boxes"], result["scores"])):
    fg = np.count_nonzero(mask)
    print(f"  instance {i}: score={score:.4f}, box={box}, pixels={fg}")
# instance 0: score=0.9327, box=[14.58, 55.12, 315.22, 473.21], pixels=50090
# instance 1: score=0.9252, box=[347.68, 26.83, 638.91, 368.76], pixels=57108

# Filter by confidence
high_conf = [s for s in result["scores"] if s > 0.9]
print(f"high-confidence instances: {len(high_conf)}")
# high-confidence instances: 2

# Boxes prompt with per-instance detections
result = client.run_with_detections(
    model_id="sam3",
    image_input=img,
    boxes=[[14, 55, 315, 473], [347, 26, 639, 369]],
    labels=[1, 1],
)
print(f"instances: {len(result['masks'])}")
# instances: 2

for i, (mask, box, score) in enumerate(zip(result["masks"], result["boxes"], result["scores"])):
    fg = np.count_nonzero(mask)
    print(f"  instance {i}: score={score:.4f}, box={box}, pixels={fg}")
# instance 0: score=0.9905, box=[347.76, 26.36, 638.77, 370.08], pixels=57268
# instance 1: score=0.9903, box=[15.29, 55.16, 314.74, 472.38], pixels=49950
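The per-instance masks can also be flattened into a single labeled instance map, which is convenient for visualization or connected-component style processing. A sketch using synthetic masks in place of result["masks"]:

```python
import numpy as np

# Synthetic per-instance masks standing in for result["masks"]
m0 = np.zeros((480, 640), dtype=np.uint8)
m0[55:473, 14:315] = 255
m1 = np.zeros((480, 640), dtype=np.uint8)
m1[26:369, 347:639] = 255
masks = [m0, m1]

# Paint each instance with id i+1; 0 stays background.
# Later instances overwrite earlier ones where masks overlap.
instance_map = np.zeros(masks[0].shape, dtype=np.uint8)
for i, m in enumerate(masks):
    instance_map[m == 255] = i + 1

print(np.unique(instance_map))  # [0 1 2]
```

If overlap handling matters, iterate in ascending score order so higher-confidence instances win.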