The LLaVA class wraps the LLaVA (Large Language and Vision Assistant) model, a vision-language model that answers natural-language questions about images.
Parameters:
    use_local (bool): If True, the inference call runs on the local VM; otherwise it is offloaded to GRID-Cortex. Defaults to False.

run(rgbimage, prompt):
    rgbimage: The input RGB image, of shape (M, N, 3).
    prompt: The question to answer about the image.
    Returns: The response to the prompt.
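run expects an image of shape (M, N, 3), so it can help to check a frame before sending it, especially when inference is offloaded to GRID-Cortex. The helper below is a minimal sketch and is hypothetical (it is not part of the GRID API). It assumes the image is an array-like object, such as a NumPy array, that exposes a `.shape` tuple. It checks only the shape, not pixel values or channel order.

```python
def is_rgb_image(img) -> bool:
    """Return True if `img` has shape (M, N, 3) with M, N > 0.

    Hypothetical helper, not part of the GRID API. Duck-typed: accepts any
    object exposing a `.shape` tuple, such as a NumPy array. Only the shape
    is checked; pixel values and channel order are not inspected.
    """
    shape = getattr(img, "shape", None)
    # Objects without a shape, or with the wrong number of dimensions, fail.
    if shape is None or len(shape) != 3:
        return False
    height, width, channels = shape
    return height > 0 and width > 0 and channels == 3
```

For example, `is_rgb_image(numpy.zeros((480, 640, 3)))` returns True, while a grayscale array of shape (480, 640) returns False.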
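from grid.model.perception.vlm.llava import LLaVA

# AirGenCar must already be available in your GRID session;
# its import is not shown here.
car = AirGenCar()

# Capture an image from the AirGen simulator and run model inference on it.
img = car.getImage("front_center", "rgb").data

# use_local=False offloads inference to GRID-Cortex.
model = LLaVA(use_local=False)
# Example question; replace it with your own prompt.
result = model.run(rgbimage=img, prompt="What objects are in front of the car?")
print(result)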
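The example above asks a single question. One way to ask several questions about the same captured frame is to loop over the prompts and call run once per prompt. The sketch below relies only on the run(rgbimage=..., prompt=...) call shown in the example. The names ask_many and EchoModel are hypothetical, and EchoModel is a stand-in that lets the loop be exercised without GRID.

```python
def ask_many(model, image, prompts):
    """Ask each prompt about the same image and collect the answers.

    `model` is any object with a run(rgbimage=..., prompt=...) method, such
    as the LLaVA wrapper. `ask_many` is a hypothetical helper, not GRID API.
    """
    answers = {}
    for prompt in prompts:
        # One run call, and therefore one inference request, per prompt.
        answers[prompt] = model.run(rgbimage=image, prompt=prompt)
    return answers


class EchoModel:
    """Hypothetical stand-in for LLaVA that echoes the prompt back."""

    def run(self, rgbimage, prompt):
        return f"echo: {prompt}"


if __name__ == "__main__":
    questions = ["What color is the car ahead?", "Is the road clear?"]
    for question, answer in ask_many(EchoModel(), None, questions).items():
        print(question, "->", answer)
```

With the real wrapper, you would call `ask_many(model, img, questions)`. Each prompt results in a separate inference call, so with use_local=False every question is sent to GRID-Cortex individually.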
This code is licensed under the Apache 2.0 License.