Detect objects in an image using a text prompt with Google’s OWL-ViT v2.
Parameters

image_input
str | PIL.Image | np.ndarray
required
RGB image as file path, URL, PIL Image, or numpy array.

prompt
str
required
Comma-separated text description of objects to detect (e.g. "car, person, traffic light").

box_threshold
float
Confidence threshold (0.0–1.0); detections scoring below it are filtered out.
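The prompt is parsed server-side, but conceptually a comma-separated prompt corresponds to a list of independent label queries. A minimal sketch of that split (the `split_prompt` helper is illustrative, not part of the client API):

```python
def split_prompt(prompt: str) -> list[str]:
    # Split a comma-separated prompt into individual label queries,
    # trimming whitespace and dropping empty entries.
    return [label.strip() for label in prompt.split(",") if label.strip()]

print(split_prompt("car, person, traffic light"))
# ['car', 'person', 'traffic light']
```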
Returns
dict with keys:
boxes — List of bounding boxes as [x1, y1, x2, y2]
scores — List of confidence scores (0.0–1.0)
labels — List of detected label strings
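The three lists are index-aligned, so detections can be zipped together and, for example, re-ranked by confidence after the call. A sketch using an invented result dict of the documented shape:

```python
# Hypothetical result dict in the documented shape (values invented for illustration).
dets = {
    "boxes": [[122.0, 5.7, 187.9, 227.1], [167.7, 26.6, 334.8, 242.4]],
    "scores": [0.152, 0.271],
    "labels": ["building", "building"],
}

# Zip the index-aligned lists and sort detections by descending confidence.
ranked = sorted(
    zip(dets["scores"], dets["boxes"], dets["labels"]),
    key=lambda d: d[0],
    reverse=True,
)
print(ranked[0][0])
# 0.271
```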
Example
```python
from grid_cortex_client import CortexClient
from PIL import Image

client = CortexClient()
image = Image.open("scene.jpg")  # 640x480 RGB

dets = client.run(
    model_id="owlv2",
    image_input=image,
    prompt="building",
    box_threshold=0.1,
)

print(len(dets["boxes"]))
# 10

# First three detections (in returned order, not sorted by score):
print(dets["scores"][0], dets["boxes"][0])
# 0.152 [122.0, 5.7, 187.9, 227.1]
print(dets["scores"][1], dets["boxes"][1])
# 0.271 [167.7, 26.6, 334.8, 242.4]
print(dets["scores"][2], dets["boxes"][2])
# 0.165 [-0.5, -0.9, 179.5, 242.3]
```
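Note that returned boxes can extend slightly outside the image, as in the third detection above (x1 = -0.5, y1 = -0.9). If downstream code expects in-bounds coordinates, clamp each box to the image size. A small sketch (the `clip_box` helper is mine, not part of the client API):

```python
def clip_box(box, width, height):
    # Clamp [x1, y1, x2, y2] to the image bounds [0, width] x [0, height].
    x1, y1, x2, y2 = box
    return [
        min(max(x1, 0.0), width),
        min(max(y1, 0.0), height),
        min(max(x2, 0.0), width),
        min(max(y2, 0.0), height),
    ]

print(clip_box([-0.5, -0.9, 179.5, 242.3], 640, 480))
# [0.0, 0.0, 179.5, 242.3]
```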