DepthAnything_V2
class implements a wrapper for the DepthAnything_V2 model, which estimates a depth
map from an RGB image. The wrapper supports two modes, ‘metric’ and ‘relative’, and
loads a different set of pre-trained weights for each. We use the ViT-Large encoder.
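
The mode-dependent weight selection described above might be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the checkpoint paths, the `vitl` encoder identifier, and the names `DepthConfig` and `make_config` are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical checkpoint paths, one per mode. The real file names depend on
# where the pre-trained weights were downloaded from and how they were stored.
CHECKPOINTS = {
    "relative": "checkpoints/relative_vitl.pth",
    "metric": "checkpoints/metric_vitl.pth",
}

# Assumed short identifier for the ViT-Large encoder.
ENCODER = "vitl"


@dataclass(frozen=True)
class DepthConfig:
    """Settings a wrapper would need before building the model."""
    mode: str
    encoder: str
    checkpoint: str


def make_config(mode: str = "relative") -> DepthConfig:
    """Pick the checkpoint that matches the requested mode."""
    if mode not in CHECKPOINTS:
        raise ValueError(
            f"mode must be one of {sorted(CHECKPOINTS)}, got {mode!r}"
        )
    return DepthConfig(mode=mode, encoder=ENCODER, checkpoint=CHECKPOINTS[mode])
```

Calling `make_config("metric")` returns a config that points at the metric weights, while an unrecognised mode raises `ValueError` before any model is loaded.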