The LLaVANeXT class provides a wrapper for the LLaVA-NeXT model, which answers questions about visual media (images and videos).
use_local (bool): If True, the inference call runs on the local VM; otherwise it is offloaded onto GRID-Cortex. Defaults to True.
This model is currently not available via Cortex.
rgbimage: The input RGB image of shape (M, N, 3). For video inputs, the path to the input video file is given instead.
prompt: The question to answer about the media.
Returns: The response to the prompt.
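As a minimal sketch of the expected image input (assuming the image is a NumPy array, as produced by AirGen's getImage(...).data in the example below), an (M, N, 3) uint8 array can be constructed directly:

```python
import numpy as np

# Build a dummy RGB image of shape (M, N, 3); real frames captured
# from the simulator are expected in this same layout.
M, N = 480, 640
img = np.zeros((M, N, 3), dtype=np.uint8)
img[:, :, 0] = 255  # fill the red channel

print(img.shape)  # (480, 640, 3)
```

Any array in this shape and dtype can be passed as the rgbimage argument.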
from grid.model.perception.vlm.llava_next import LLaVANeXT

# AirGenCar is provided by the GRID AirGen bindings
# (its import is omitted in this snippet).
car = AirGenCar()

# Capture an image from the AirGen simulator and run model inference on it.
img = car.getImage("front_center", "rgb").data

model = LLaVANeXT(use_local=True)
result = model.run(rgbimage=img, prompt="<prompt>")
print(result)
This code is licensed under the Apache 2.0 License.