Scalable data collection is crucial in robotics because it enables rapid prototyping, robust model training, and extensive testing across various real-world scenarios. Traditional data collection can be time-consuming, resource-intensive, and limited by physical constraints. GRID overcomes these challenges by providing a virtual environment where users can quickly simulate and customize complex scenarios, configure sensors, and automate data generation at scale. This means faster iterations, broader coverage of edge cases, and robust datasets that accelerate the development of reliable, intelligent, and safe robotics systems.

In this tutorial, we will guide you through an end-to-end workflow for generating diverse multimodal sensor data at scale on the GRID platform.

Let’s begin by setting up the environment, configuring the robot, and defining simulation parameters. We’ll then generate a trajectory, set up sensors, and enable autonomous data collection. Finally, we’ll see how GRID Enterprise scales this process for efficient, large-scale data generation.
GRID offers multiple customizable environments, with options to choose different robots and sensor configurations. For this demo, we will set up a neighborhood scenario using the Clearpath Husky robot equipped with an RGB camera, LiDAR, and IMU sensors.

Once the session has started, let us go ahead and import our standard modules and initialize our robot.
Once the session is initialized, we can further customize the environment’s physical characteristics, such as wind speed, time of day, and weather conditions. For example, let’s add fog to the simulation and set the time to around sunset. Wind can be configured as well (for example, 5 m/s along the X axis), but it is currently supported only for drones, so that call is left commented out in the code below. Learn more about the configuration parameters in our docs.
import airgen
from airgen import WeatherParameter

# set the weather, wind, and time-of-day parameters
# (airgen_car_0 is the robot object created when the session was initialized)
client = airgen_car_0.client
client.simEnableWeather(True)

# adds fog to the scene
client.simSetWeatherParameter(WeatherParameter.Fog, 1.0)

# sets a 5 m/s wind in the X direction; only supported for drones for now
# client.simSetWind(airgen.Vector3r(5, 0, 0))

# sets the time of day to be around sunset
client.simSetTimeOfDay(True, "2024-07-22 17:00:00")
We will begin by initializing random source and destination points for the robot’s path. The simPlanPathToRandomFreePoint function searches for random start and end points within a specified radius.
search_radius = 100  # distance in meters

# generates the trajectory of points
path = client.simPlanPathToRandomFreePoint(
    search_radius, smooth_path=True, draw_path=True
)

points = []
for point in path:
    vector_point = airgen.Vector3r(
        point['x_val'], point['y_val'], point['z_val']
    )
    points.append(vector_point)
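Before driving a planned path, it can be useful to check how long it is, for instance to discard trajectories that are too short to yield much data. The following is a minimal sketch that runs without the simulator. It assumes only that the path is a list of dicts with 'x_val', 'y_val', and 'z_val' keys, as the indexing in the code above suggests; path_length is a hypothetical helper name, not a GRID or AirGen function.

```python
import math


def path_length(path):
    """Sum of straight-line distances between consecutive waypoints.

    `path` is assumed to be a list of dicts with 'x_val', 'y_val', 'z_val'
    keys (the same keys the tutorial code reads from the planner output).
    """
    coords = [(p["x_val"], p["y_val"], p["z_val"]) for p in path]
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


# Stand-in path; in a live session this would be the planner's return value.
example_path = [
    {"x_val": 0.0, "y_val": 0.0, "z_val": 0.0},
    {"x_val": 3.0, "y_val": 4.0, "z_val": 0.0},
    {"x_val": 3.0, "y_val": 4.0, "z_val": 12.0},
]

print(path_length(example_path))  # 5.0 + 12.0 = 17.0
```

A path with fewer than two waypoints has length zero, so the same check also catches a planner call that returned nothing usable.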
Next, we will define the modalities of data to be collected. In this tutorial, we will gather RGB, LiDAR, and IMU data.
def readSensors(client: airgen.VehicleClient) -> dict:
    sensor_data = {}
    sensor_data["imu"] = client.getImuData()

    # Get RGB camera data
    images = client.getImages(
        "front_center", [airgen.ImageType.Scene]
    )
    sensor_data["rgb"] = images[0]

    sensor_data["lidar"] = client.getLidarData()
    return sensor_data
To effectively scale up data generation, it’s crucial to simulate diverse real-world conditions and scenarios. Here, we randomize environmental parameters such as weather, wind, and time of day to introduce variability, creating a richer dataset that enhances model robustness. Additionally, GRID allows generating multiple trajectories, enabling the robot to navigate different paths under varied conditions. This combination of dynamic settings and paths ensures scalable, consistent data generation across multiple sessions, supporting efficient large-scale projects.
import os
import random

import h5py
import numpy as np
import rerun as rr  # assumption: `rr` refers to the Rerun SDK used for visualization
from airgen import WeatherParameter, Vector3r
from grid import GRID_USER_SESSION_BLOB_DIR

# NOTE: move_task is not defined in this section; it is assumed to be a
# data-collection helper that was defined earlier in the session.

save_path = os.path.join(GRID_USER_SESSION_BLOB_DIR, "sensor_data.h5")
client.simEnableWeather(True)

# Generate data for multiple trajectories
num_trajectories = 5
weather_options = [
    WeatherParameter.Rain,
    WeatherParameter.Roadwetness,
    WeatherParameter.Snow,
    WeatherParameter.RoadSnow,
    WeatherParameter.MapleLeaf,
    WeatherParameter.RoadLeaf,
    WeatherParameter.Dust,
    WeatherParameter.Fog,
]

# Open the file in append mode once
with h5py.File(save_path, 'a') as hdf5_file:
    for traj_idx in range(num_trajectories):
        # Set random weather
        weather_param = random.choice(weather_options)
        weather_intensity = random.uniform(0, 1)
        client.simSetWeatherParameter(weather_param, weather_intensity)

        # Set random time of day
        hour = random.randint(0, 23)
        minute = random.randint(0, 59)
        time_str = f"2024-07-22 {hour:02}:{minute:02}:00"
        client.simSetTimeOfDay(True, time_str)

        # Generate random path (the waypoints are not passed to move_task below)
        path = client.simPlanPathToRandomFreePoint(
            100, smooth_path=True, draw_path=True
        )
        points = [
            Vector3r(p['x_val'], p['y_val'], p['z_val']) for p in path
        ]

        # Collect sensor data
        _, sensor_data = move_task(
            client, (0, 0, -10), _collect_data=True
        )

        # Create a group for each trajectory
        traj_group = hdf5_file.create_group(f"trajectory_{traj_idx}")

        for i, data in enumerate(sensor_data):
            # Create subgroup for each frame
            frame_group = traj_group.create_group(f"frame_{i}")
            frame_group.create_dataset("rgb", data=data["rgb"][0])

            # Process LiDAR data: flat point cloud -> (N, 3) array of XYZ points
            lidar_points = np.array(data["lidar"].point_cloud)
            lidar_reshaped = lidar_points.reshape(-1, 3)
            frame_group.create_dataset("lidar", data=lidar_reshaped)

            # Logging for visualization if required
            rr.log("grid/imagery", rr.Image(data["rgb"][0]))
            rr.log("pointcloud", rr.Points3D(lidar_reshaped))

        trajectory_num = traj_idx + 1
        data_count = len(sensor_data)
        print(f"Collected {data_count} measurements for trajectory {trajectory_num}")
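The loop above draws from the global random module without a seed, so every run produces a different mix of conditions. If reproducibility matters, one option is to sample the scenario plan up front from a seeded generator and keep it alongside the data. The sketch below does only the sampling and makes no simulator calls. Scenario and sample_scenarios are hypothetical names introduced for illustration, and the weather names are plain strings copied from the WeatherParameter members listed above.

```python
import random
from dataclasses import dataclass

# Strings mirroring the WeatherParameter members used in the tutorial code.
WEATHER_NAMES = [
    "Rain", "Roadwetness", "Snow", "RoadSnow",
    "MapleLeaf", "RoadLeaf", "Dust", "Fog",
]


@dataclass(frozen=True)
class Scenario:
    weather: str       # name of the weather parameter to enable
    intensity: float   # value in [0, 1]
    time_of_day: str   # "YYYY-MM-DD HH:MM:SS"


def sample_scenarios(num_trajectories, seed, date="2024-07-22"):
    """Draw one scenario per trajectory from a private, seeded generator."""
    rng = random.Random(seed)  # leaves the global random state untouched
    scenarios = []
    for _ in range(num_trajectories):
        weather = rng.choice(WEATHER_NAMES)
        intensity = round(rng.uniform(0, 1), 3)
        hour, minute = rng.randint(0, 23), rng.randint(0, 59)
        time_of_day = f"{date} {hour:02}:{minute:02}:00"
        scenarios.append(Scenario(weather, intensity, time_of_day))
    return scenarios


plan = sample_scenarios(5, seed=42)
for idx, scenario in enumerate(plan):
    print(idx, scenario)
```

Inside the collection loop, each entry could then be applied with the same calls the tutorial already uses, for example client.simSetWeatherParameter(getattr(WeatherParameter, scenario.weather), scenario.intensity) followed by client.simSetTimeOfDay(True, scenario.time_of_day), assuming the enum member names match these strings.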
Once the robot has completed all the trajectories, the RGB and LiDAR data for every frame is stored in the sensor_data.h5 file, which you can download to your own system and integrate with your pipelines. Note that the loop above writes only the RGB and LiDAR datasets to the file.
After generating the data from the simulation, follow these steps to download it from the GRID platform:
1. Navigate to the GRID platform’s Storage tab, which is located next to the Terminal tab.
2. Press the Download button to save the data directly to your local machine.
For extremely large files, there might not be a notification pop-up about the download in your browser. In such cases, check the download folder on your local machine to verify if the download has started.
We are actively working on fixing this issue.
GRID Enterprise - Parallelisation and Optimisation
To scale data collection further, GRID Enterprise supports parallelization across multiple sessions. Running several simulation instances simultaneously allows large datasets to be generated efficiently and significantly reduces the wall-clock time needed for large-scale projects.
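This tutorial does not show the GRID Enterprise API, so the following is only a generic sketch of how work might be divided before launching parallel sessions: each session gets its own range of trajectory indices, its own random seed, and its own output file, so that no two sessions write to the same HDF5 file. plan_shards and the output file naming are hypothetical and not part of GRID.

```python
def plan_shards(num_trajectories, num_sessions, base_seed=0):
    """Split trajectory indices into near-equal contiguous shards."""
    if num_sessions < 1:
        raise ValueError("num_sessions must be at least 1")
    base, extra = divmod(num_trajectories, num_sessions)
    shards = []
    start = 0
    for session in range(num_sessions):
        # The first `extra` sessions take one additional trajectory each.
        count = base + (1 if session < extra else 0)
        shards.append({
            "session": session,
            "trajectories": list(range(start, start + count)),
            "seed": base_seed + session,               # distinct seed per session
            "output": f"sensor_data_{session:03}.h5",  # distinct file per session
        })
        start += count
    return shards


for shard in plan_shards(num_trajectories=10, num_sessions=3):
    print(shard)
```

Keeping one output file per session avoids concurrent writes to a single HDF5 file; the per-session files can be merged or read side by side afterwards.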