Two image types are requested:

- `airgen.ImageType.Scene`: an RGB image showing what the camera sees
- `airgen.ImageType.DepthPerspective`: a depth image containing distance information for each pixel

The `getImages` method returns a list of image data, which we organize into a dictionary for easier access.
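A minimal sketch of this step, under a few assumptions: `client` is an already-connected AirGen client, the camera name `"front_center"` matches your vehicle configuration, and `getImages` returns one entry per requested type in request order. The exact signature may differ in your AirGen version.

```python
import airgen

# Assumption: `client` is an already-connected AirGen client, and the
# camera name "front_center" is a placeholder for your setup.
image_types = [airgen.ImageType.Scene, airgen.ImageType.DepthPerspective]
responses = client.getImages("front_center", image_types)

# getImages returns a list in request order; zip it with the requested
# types to build a dictionary keyed by image type.
images = dict(zip(image_types, responses))

# Per the tutorial's indexing, each entry pairs the image array with its
# camera parameters: camera_data[0] is the image, camera_data[1] the
# parameters.
camera_data = images[airgen.ImageType.DepthPerspective]
```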
The `np.where` function creates a binary mask: 1 for pixels whose depth is less than 1000, and 0 for pixels whose depth is 1000 or greater. This helps us focus on relevant objects in the scene and ignore very distant points.
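For example, with NumPy (the 1000-unit threshold matches the text; the stand-in depth array is hypothetical):

```python
import numpy as np

# Stand-in depth map; in the tutorial this is the DepthPerspective image
# (camera_data[0]).
depth = np.random.uniform(0.0, 2000.0, size=(256, 256)).astype(np.float32)

# Binary mask: 1 where depth < 1000, 0 where depth >= 1000.
mask = np.where(depth < 1000, 1, 0)
```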
The `depth2pointcloud` function converts the 2D depth map into a 3D point cloud. It takes three parameters:

- `depth`: the depth map (`camera_data[0]`)
- `camera_param`: camera parameters such as focal length and principal point (`camera_data[1]`)
- `mask`: our binary mask to filter out distant points

The `rr.log` function sends data to the Rerun visualizer: the first parameter is a path in the visualization tree, and the second is the data to visualize.
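A sketch of the conversion and logging step under the assumptions above. The `depth2pointcloud` import path is a guess (adjust it to your AirGen version), and `camera_data` and `mask` come from the earlier steps:

```python
import rerun as rr
from airgen.utils import depth2pointcloud  # assumed import path

rr.init("airgen_pointcloud", spawn=True)  # start/attach a Rerun viewer

# Convert the masked depth map into an (N, 3) array of 3D points.
points = depth2pointcloud(
    camera_data[0],  # depth map
    camera_data[1],  # camera parameters (focal length, principal point, ...)
    mask,            # binary mask filtering out distant points
)

# First argument: a path in the visualization tree; second: the data.
rr.log("world/pointcloud", rr.Points3D(points))
```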
The `run` method takes an RGB image as input and returns a predicted depth map. This is the key step where AI-based depth estimation occurs.
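In code this step is a single call. Here `model` stands for the depth-estimation wrapper constructed earlier in the tutorial; its interface beyond `run` is not shown, and `rgb_image` is assumed to be the Scene camera's output:

```python
# `model` is the tutorial's depth-estimation wrapper; `rgb_image` is an
# H x W x 3 array from the Scene camera. Both come from earlier steps.
predicted_depth = model.run(rgb_image)

# The output is a 2D array of per-pixel depth predictions.
print(predicted_depth.shape)  # e.g. (256, 256)
```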
The `depth2pointcloud` function expects a depth map of shape (256, 256, 1), where the last dimension is the channel dimension. The reshape operation adds that extra dimension.
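A sketch of the reshape, using a zero array as a stand-in for the model output:

```python
import numpy as np

predicted_depth = np.zeros((256, 256), dtype=np.float32)  # stand-in output

# Add the trailing channel axis that depth2pointcloud expects.
predicted_depth = predicted_depth.reshape(256, 256, 1)

# Equivalent without hard-coded sizes:
# predicted_depth = np.expand_dims(predicted_depth, axis=-1)

assert predicted_depth.shape == (256, 256, 1)
```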
We then use the `depth2pointcloud` function to convert the predicted depth map to a 3D point cloud. Note that we use the camera parameters from the original RGB image (`camera_data[1]`), as these parameters are the same for both the RGB and depth cameras in the simulation.
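The final conversion then mirrors the ground-truth case, reusing the RGB camera's parameters. The all-ones mask keeps every predicted point and is an assumption about the mask format, as is the import path:

```python
import numpy as np
import rerun as rr
from airgen.utils import depth2pointcloud  # assumed import path

# Reuse the RGB image's camera parameters: in the simulation the RGB and
# depth cameras share the same intrinsics.
predicted_points = depth2pointcloud(
    predicted_depth,                # (256, 256, 1) predicted depth
    camera_data[1],                 # camera parameters from the RGB image
    np.ones_like(predicted_depth),  # keep all points (assumed mask format)
)

rr.log("world/predicted_pointcloud", rr.Points3D(predicted_points))
```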