The VideoLLaVA class provides a wrapper for the Video-LLaVA model, which answers questions about visual media (videos).
Parameters:
    use_local (bool): If True, the inference call runs on the local VM;
        otherwise it is offloaded to GRID-Cortex. Defaults to True.
        Note: this model is currently not available via Cortex.

run() arguments:
    rgbimage (np.ndarray): The input RGB image of shape (M, N, 3).
    videopath (str): The path to the input video.
    prompt (str): The question to answer about the media.

Returns:
    The response to the prompt.
from grid.model.perception.vlm.video_llava import VideoLLaVA
from grid.robot.wheeled.airgen_car import AirGenCar  # import path assumed from the GRID SDK layout

car = AirGenCar()

# Capture an image from the AirGen simulator and run model inference on it.
img = car.getImage("front_center", "rgb").data

model = VideoLLaVA(use_local=True)
result = model.run(rgbimage=img, prompt="Describe what is happening in the scene.")  # example question
print(result)
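Since Video-LLaVA is primarily a video question-answering model, a video file can be passed in place of a single image. The sketch below shows this, assuming the video argument is named videopath as in the argument list above; the file path and prompt are illustrative.

from grid.model.perception.vlm.video_llava import VideoLLaVA

# Run inference on a video file instead of a single image.
# The videopath parameter name is assumed from the argument
# description above; adjust it to match the installed SDK.
model = VideoLLaVA(use_local=True)
result = model.run(
    videopath="path/to/video.mp4",        # path to the input video
    prompt="What happens in this video?",  # question about the media
)
print(result)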
This code is licensed under the Apache 2.0 License.