The Molmo class wraps the Molmo vision-language model (VLM), which answers natural-language questions about an input image.
use_local: If True, the inference call runs on the local VM; otherwise it is offloaded to GRID-Cortex. Defaults to False.
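If you want one script to switch between local and GRID-Cortex inference without editing code, you could derive the flag from an environment variable. The sketch below is a hypothetical helper, not part of GRID, and the variable name MOLMO_USE_LOCAL is an assumption chosen for illustration. When the variable is unset it falls back to False, matching the documented default.

```python
import os

# Values treated as "true"; any other value that is set means False.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_use_local(env_var: str = "MOLMO_USE_LOCAL", default: bool = False) -> bool:
    """Return a use_local flag read from a hypothetical environment variable.

    If the variable is unset, `default` is returned (False mirrors Molmo's
    documented default of offloading inference to GRID-Cortex).
    """
    raw = os.environ.get(env_var)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY
```

Usage would then be `model = Molmo(use_local=resolve_use_local())`, so the same code runs on the VM when `MOLMO_USE_LOCAL=1` and offloads otherwise.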
image: The input RGB image, an array of shape (M, N, 3).
prompt: The question to answer about the image.
Returns: The model's response to the prompt.
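Because run expects an RGB image of shape (M, N, 3), a cheap shape check before the call can surface a wrong camera mode or a grayscale frame early. This is a minimal sketch assuming the image is NumPy-compatible; the helper name check_rgb_image is hypothetical, and it checks only the shape, not the dtype or channel order, since neither is specified here.

```python
import numpy as np


def check_rgb_image(image) -> np.ndarray:
    """Return `image` as an array, raising ValueError unless its shape is (M, N, 3)."""
    arr = np.asarray(image)
    # Require exactly three dimensions with three channels in the last one.
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected an RGB image of shape (M, N, 3), got {arr.shape}")
    # Reject degenerate images with zero height or width.
    if arr.shape[0] == 0 or arr.shape[1] == 0:
        raise ValueError(f"image has an empty dimension: {arr.shape}")
    return arr
```

A call would look like `model.run(image=check_rgb_image(img), prompt=...)`.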
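from grid.model.perception.vlm.molmo import Molmo
from grid.robot.wheeled.airgen_car import AirGenCar  # import path assumed; check your GRID version

car = AirGenCar()

# Capture an image from the AirGen simulator's front-center RGB camera
# and run model inference on it.
img = car.getImage("front_center", "rgb").data

model = Molmo(use_local=False)  # offload inference to GRID-Cortex
result = model.run(image=img, prompt="What is in front of the car?")  # example prompt; use your own question
print(result)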
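When several questions concern the same frame, you can capture the image once and loop over the prompts. The sketch below relies only on the run(image=..., prompt=...) call shown in the example; the helper ask_many and the SupportsRun protocol are hypothetical names introduced for illustration. Calls are made sequentially, one run per prompt, and a repeated prompt overwrites its earlier entry in the result.

```python
from typing import Any, Dict, Protocol, Sequence


class SupportsRun(Protocol):
    """Anything exposing run(image=..., prompt=...), such as a Molmo instance."""

    def run(self, image: Any, prompt: str) -> Any: ...


def ask_many(model: SupportsRun, image: Any, prompts: Sequence[str]) -> Dict[str, Any]:
    """Run each prompt against the same image and map prompt -> response.

    Prompts are sent one at a time; duplicate prompts keep only the last response.
    """
    return {prompt: model.run(image=image, prompt=prompt) for prompt in prompts}
```

For example, `ask_many(model, img, ["How many cars are visible?", "Is the road wet?"])` returns a dictionary keyed by each question.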
This code is licensed under the Apache 2.0 License.