Detect objects in an image using a text prompt with Google’s OWL-ViT v2.
Parameters
RGB image as file path, URL, PIL Image, or NumPy array.
Period-separated text description of objects to detect (e.g. "car. person. traffic light").
Confidence threshold (0.0–1.0) for filtering detections (default: 0.2).
Optional HTTP timeout.
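The text prompt is a single period-separated string, so a list of class names has to be joined before it is passed in. A minimal sketch of that step follows; `build_prompt` is a hypothetical helper written for illustration and is not part of the documented API.

```python
def build_prompt(class_names):
    """Join class names into a period-separated prompt string.

    Hypothetical helper: strips surrounding whitespace and any trailing
    period from each name, and skips empty entries.
    """
    cleaned = [name.strip().rstrip(".") for name in class_names if name.strip()]
    return ". ".join(cleaned)


# Matches the prompt format shown in the parameter description.
prompt = build_prompt(["car", "person", "traffic light"])
print(prompt)
```

Building the prompt from a list keeps the class names in one place, which helps if the same names are later compared against the returned labels.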
Returns
dict with keys:
boxes: List of bounding boxes as [x1, y1, x2, y2]
scores: List of confidence scores (0.0–1.0)
labels: List of detected label strings
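Because the three lists are parallel (entry i of each describes the same detection), they can be zipped together for post-processing. The sketch below applies a stricter score cutoff on the client side. It assumes only the dict layout described above; `filter_detections` is a hypothetical helper, and the `sample` values are hand-written placeholders, not real model output.

```python
def filter_detections(result, min_score=0.5):
    """Return a new result dict keeping detections with score >= min_score.

    Hypothetical helper: expects the documented keys "boxes", "scores",
    and "labels", each a list of the same length.
    """
    kept = [
        (box, score, label)
        for box, score, label in zip(
            result["boxes"], result["scores"], result["labels"]
        )
        if score >= min_score
    ]
    return {
        "boxes": [box for box, _, _ in kept],
        "scores": [score for _, score, _ in kept],
        "labels": [label for _, _, label in kept],
    }


# Placeholder data in the documented shape (NOT actual detector output).
sample = {
    "boxes": [[10, 20, 110, 220], [50, 60, 90, 100]],
    "scores": [0.9, 0.3],
    "labels": ["car", "person"],
}

strong = filter_detections(sample, min_score=0.5)
for box, score, label in zip(strong["boxes"], strong["scores"], strong["labels"]):
    x1, y1, x2, y2 = box
    print(f"{label}: score={score:.2f}, box=({x1}, {y1}, {x2}, {y2})")
```

Filtering after the call makes it possible to run detection once with a low threshold and then try several cutoffs without repeating the request.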
Example Output
