Scalable data collection is crucial in robotics because it enables rapid prototyping, robust model training, and extensive testing across various real-world scenarios. Traditional data collection can be time-consuming, resource-intensive, and limited by physical constraints. GRID overcomes these challenges by providing a virtual environment where users can quickly simulate and customize complex scenarios, configure sensors, and automate data generation at scale. This means faster iterations, broader coverage of edge cases, and robust datasets that accelerate the development of reliable, intelligent, and safe robotics systems.

In this tutorial, we will guide you through a comprehensive end-to-end workflow for generating diverse multimodal sensor data at scale on the GRID platform.

Let’s begin by setting up the environment, configuring the robot, and defining simulation parameters. We’ll generate a trajectory, set up sensors, and enable autonomous data collection. Finally, we’ll see how GRID Enterprise scales this process for efficient, large-scale data generation.

The notebook for this example can be found here.

Tutorial Outline

  1. Scene Selection and Customization: Initialize environment, configure robot and sensors, customize weather, wind, and time settings.
  2. Trajectory Generation and Sensor Selection: Generate random path, extract waypoints, and select RGB, LiDAR, and IMU modalities.
  3. Autonomous Data Generation: Gather data along the trajectory, log, and visualize.
  4. Scaling with GRID Enterprise: Parallelize data generation for large-scale projects.

Scene Selection and Customization

GRID offers multiple customizable environments, with options to choose different robots and sensor configurations.

For this demo, we will set up a neighborhood scenario using the Clearpath Husky Robot equipped with an RGB camera, LiDAR, and IMU sensors.

Once the session has started, let's import the standard modules and initialize the robot.

# initialize the robot
import airgen
from airgen.utils.collect import data_collector
from airgen import WeatherParameter, Vector3r
from typing import List, Tuple, Dict, Any, Optional, Callable
import rerun as rr
import random, h5py, numpy as np

from grid.robot.wheeled.airgen_car import AirGenCar
airgen_car_0 = AirGenCar()

Once the session is initialized, we can further customize the environment’s physical characteristics, such as weather conditions, time of day, and wind. For example, let’s add fog to the simulation and set the time to around sunset. (Wind speed can also be set, e.g. to 5 m/s, but this is currently supported only for drones.) Learn more about the configuration parameters in our docs.

# set the weather, wind, and timeofday parameters

client = airgen_car_0.client

client.simEnableWeather(True)
# adds fog to the scene
client.simSetWeatherParameter(WeatherParameter.Fog, 1.0)

# sets a 5 m/s wind in X direction, only supported for drones for now
# client.simSetWind(airgen.Vector3r(5, 0, 0))

# sets the time of day to be around sunset
client.simSetTimeOfDay(True, "2024-07-22 17:00:00")

Trajectory Generation and Sensor Selection

We will begin by generating a random path for the robot. The simPlanPathToRandomFreePoint function plans a path from the robot’s current position to a randomly chosen free point within a specified radius.

search_radius = 100  # distance in meters

# generates the trajectory of points
path = client.simPlanPathToRandomFreePoint(
    search_radius, 
    smooth_path=True, 
    draw_path=True
)

points = []
for point in path:
    vector_point = airgen.Vector3r(
        point['x_val'], 
        point['y_val'], 
        point['z_val']
    )
    points.append(vector_point)
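As a quick sanity check on the generated trajectory, you can compute its total length from the extracted waypoints. A small numpy sketch (the sample coordinates below are made up; in practice you would feed in the x/y/z values from `path`):

```python
import numpy as np

def path_length(points_xyz):
    """Total Euclidean length of a polyline given as an (N, 3) sequence."""
    pts = np.asarray(points_xyz, dtype=float)
    # Sum the distances between consecutive waypoints.
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

# Example with made-up waypoints:
waypoints = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(path_length(waypoints))  # -> 17.0 (segments of 5.0 and 12.0)
```

This is useful for filtering out degenerate trajectories (e.g. nearly zero-length paths) before spending simulation time collecting data along them.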

Next, we will define the modalities of data to be collected. In this tutorial, we will gather RGB, LiDAR, and IMU data.

def readSensors(client: airgen.VehicleClient) -> dict:
    sensor_data = {}
    sensor_data["imu"] = client.getImuData()
    
    # Get RGB camera data
    images = client.getImages(
        "front_center",
        [airgen.ImageType.Scene]
    )
    sensor_data["rgb"] = images[0]  # (image array, metadata) tuple
    
    sensor_data["lidar"] = client.getLidarData()
    return sensor_data

Autonomous Data Generation

With the sensor configurations, trajectory, and environment setup in place, we can now enable the robot to collect data autonomously.

@data_collector(readSensors, time_delta=0.1)
def move_task(
    client: airgen.VehicleClient,
    position: Tuple[float, float, float],
    **kwargs
) -> None | Tuple[None, List[dict]]:
    # Drive along the precomputed waypoints; the data_collector
    # decorator samples the sensors every 0.1 s along the way.
    client.moveOnPath(points, velocity=5.0)


_, sensor_data = move_task(
    client, 
    (0, 0, -10), 
    _collect_data=True
)

for data in sensor_data:
    lidar = data["lidar"]
    rgb, _ = data["rgb"]

    # Log imagery data
    rr.log("grid/imagery", rr.Image(rgb))

    # Log point cloud data
    point_cloud = np.array(lidar.point_cloud).reshape(-1, 3)
    rr.log("pointcloud", rr.Points3D(point_cloud))

print(f"Collected {len(sensor_data)} measurements during the moving task")

The data collected by the robot can be visualized on the rerun panel.

Scaling Up Data Generation

To effectively scale up data generation, it’s crucial to simulate diverse real-world conditions and scenarios. Here, we randomize environmental parameters such as weather, wind, and time of day to introduce variability, creating a richer dataset that enhances model robustness. Additionally, GRID allows generating multiple trajectories, enabling the robot to navigate different paths under varied conditions. This combination of dynamic settings and paths ensures scalable, consistent data generation across multiple sessions, supporting efficient large-scale projects.

import random, h5py, numpy as np
from airgen import WeatherParameter, Vector3r
from grid import GRID_USER_SESSION_BLOB_DIR

save_path = os.path.join(
    GRID_USER_SESSION_BLOB_DIR, 
    "sensor_data.h5"
)

client.simEnableWeather(True)

# Generate data for multiple trajectories
num_trajectories = 5
weather_options = [
    WeatherParameter.Rain, 
    WeatherParameter.Roadwetness, 
    WeatherParameter.Snow,
    WeatherParameter.RoadSnow, 
    WeatherParameter.MapleLeaf, 
    WeatherParameter.RoadLeaf,
    WeatherParameter.Dust, 
    WeatherParameter.Fog
]

# Open the file in append mode once
with h5py.File(save_path, 'a') as hdf5_file:
    for traj_idx in range(num_trajectories):
        # Set random weather
        weather_param = random.choice(weather_options)
        weather_intensity = random.uniform(0, 1)
        client.simSetWeatherParameter(weather_param, weather_intensity)
        
        # Set random time of day
        hour = random.randint(0, 23)
        minute = random.randint(0, 59)
        time_str = f"2024-07-22 {hour:02}:{minute:02}:00"
        client.simSetTimeOfDay(True, time_str)

        # Generate random path
        path = client.simPlanPathToRandomFreePoint(
            100, 
            smooth_path=True, 
            draw_path=True
        )
        points = [
            Vector3r(p['x_val'], p['y_val'], p['z_val']) 
            for p in path
        ]

        # Collect sensor data
        _, sensor_data = move_task(
            client, 
            (0, 0, -10), 
            _collect_data=True
        )

        # Create a group for each trajectory
        traj_group = hdf5_file.create_group(f"trajectory_{traj_idx}")

        for i, data in enumerate(sensor_data):
            # Create subgroup for each frame
            frame_group = traj_group.create_group(f"frame_{i}")
            frame_group.create_dataset("rgb", data=data["rgb"][0])
            
            # Process LiDAR data
            lidar_points = np.array(data["lidar"].point_cloud)
            lidar_reshaped = lidar_points.reshape(-1, 3)
            frame_group.create_dataset("lidar", data=lidar_reshaped)

            # Logging for visualization if required
            rr.log("grid/imagery", rr.Image(data["rgb"][0]))
            rr.log("pointcloud", rr.Points3D(lidar_reshaped))

        trajectory_num = traj_idx + 1
        data_count = len(sensor_data)
        print(f"Collected {data_count} measurements for trajectory {trajectory_num}")

Once the robot has explored all the trajectories, the full sensor dataset is stored in the sensor_data.h5 file, which you can download to your own system and integrate with your pipelines.
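To integrate the file with a downstream pipeline, you can walk the trajectory_*/frame_* hierarchy written by the generation loop using h5py. A minimal reading sketch (assuming only the group layout shown above):

```python
import h5py

def load_frames(h5_path):
    """Yield (trajectory, frame, rgb, lidar) tuples from the generated file."""
    with h5py.File(h5_path, "r") as f:
        # Note: lexicographic sort; fine for small trajectory/frame counts.
        for traj_name in sorted(f.keys()):
            for frame_name in sorted(f[traj_name].keys()):
                frame = f[traj_name][frame_name]
                yield traj_name, frame_name, frame["rgb"][()], frame["lidar"][()]

# Example usage:
# for traj, frame, rgb, lidar in load_frames("sensor_data.h5"):
#     print(traj, frame, rgb.shape, lidar.shape)
```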

Downloading Your Generated Data

After generating the data from the simulation, follow these steps to download it from the GRID platform:

  1. Navigate to the GRID platform’s Storage tab, which is located next to the Terminal tab.
  2. Press the Download button to directly save the data to your local machine.

For extremely large files, your browser may not show a download notification pop-up. In that case, check the downloads folder on your local machine to verify that the download has started. We are actively working on fixing this issue.

GRID Enterprise - Parallelization and Optimization

To optimize and scale data collection processes, GRID Enterprise enables parallelization across multiple sessions. This feature allows for efficient generation of large datasets by running multiple instances simultaneously, significantly reducing the time and computational resources needed for large-scale projects.
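While GRID Enterprise manages the session orchestration for you, the pattern is conceptually a fan-out of independent per-session collection jobs over a worker pool. A rough standard-library sketch (`generate_session_data` is a hypothetical stand-in for the collection loop above, not a GRID API):

```python
from concurrent.futures import ThreadPoolExecutor

def generate_session_data(session_id, num_trajectories=5):
    """Hypothetical stand-in for one session's collection loop.

    In a real setup each worker would connect to its own GRID session,
    randomize the environment, and write a separate HDF5 file.
    """
    return {"session": session_id, "trajectories": num_trajectories}

# Sessions are I/O-bound (network calls to the simulator), so a thread
# pool is a reasonable model of running several of them concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(generate_session_data, range(4)))

print(f"Completed {len(results)} parallel sessions")
```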