Scalable Data Generation
Scalable data collection is crucial in robotics because it enables rapid prototyping, robust model training, and extensive testing across a wide range of real-world scenarios. Traditional data collection can be time-consuming, resource-intensive, and limited by physical constraints. GRID addresses these challenges by providing a virtual environment where users can quickly simulate and customize complex scenarios, configure sensors, and automate data generation at scale. This means faster iterations, broader coverage of edge cases, and richer datasets that accelerate the development of reliable, intelligent, and safe robotics systems.
In this tutorial, we will guide you through a comprehensive end-to-end workflow for generating diverse multimodal sensor data at scale on the GRID platform.
Let’s begin by setting up the environment, configuring the robot, and defining simulation parameters. We’ll generate a trajectory, set up sensors, and enable autonomous data collection. Finally, we’ll see how GRID Enterprise scales this process for efficient, large-scale data generation.
The notebook for this example can be found here.
Tutorial Outline
- Scene Selection and Customisation: Initialize environment, configure robot and sensors, customize weather, wind, and time settings.
- Trajectory Generation and Sensor Selection: Generate random path, extract waypoints, and select RGB, LiDAR, and IMU modalities.
- Autonomous Data Generation: Gather data along the trajectory, log, and visualize.
- Scaling with GRID Enterprise: Parallelize data generation for large-scale projects.
Scene Selection and Customisation
GRID offers multiple customizable environments, with options to choose different robots and sensor configurations.
For this demo, we will set up a neighborhood scenario using the Clearpath Husky Robot equipped with an RGB camera, LiDAR, and IMU sensors.
Once the session has started, let us go ahead and import our standard modules and initialise our robot.
Once the session is initialized, we can further customize the environment’s physical characteristics, such as wind speed, time of day, and weather conditions. For example, let’s add fog to the simulation, set the time to around sunset, and adjust the wind speed to 5 m/s. Learn more about the configuration parameters in our docs.
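One way to keep these settings explicit and reusable is to hold them in a small configuration object before applying them to the session. The sketch below is plain Python and does not call the GRID API: `EnvironmentConfig` and its field names are hypothetical, and 18:30 is only an assumed stand-in for "around sunset".

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical container for scene settings (not a GRID class)."""
    weather: str = "clear"
    time_of_day: str = "12:00"    # 24-hour HH:MM
    wind_speed_mps: float = 0.0   # metres per second

    def __post_init__(self):
        # Reject values that cannot describe a real scene.
        hours, minutes = map(int, self.time_of_day.split(":"))
        if not (0 <= hours < 24 and 0 <= minutes < 60):
            raise ValueError(f"invalid time of day: {self.time_of_day}")
        if self.wind_speed_mps < 0:
            raise ValueError("wind speed must be non-negative")


# Fog, around sunset (assumed 18:30), and a 5 m/s wind, as described above.
foggy_sunset = EnvironmentConfig(weather="fog", time_of_day="18:30", wind_speed_mps=5.0)
print(asdict(foggy_sunset))
```

Validating the values up front means a typo in a setting fails immediately rather than partway through a long data-generation run.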
Trajectory Generation and Sensor Selection
We will begin by initializing random source and destination points for the robot’s path. The simPlanPathToRandomFreePoint function searches for random start and end points within a specified radius.
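To illustrate the idea of sampling endpoints within a radius and turning them into waypoints, here is a minimal standalone sketch. It does not call or reimplement simPlanPathToRandomFreePoint: the helper names are hypothetical, the path is a straight line, and obstacles (the "free point" check) are ignored.

```python
import math
import random


def random_point_in_radius(center, radius, rng):
    """Sample a point uniformly from the disc of the given radius around center."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root keeps samples uniform over the disc's area instead of
    # clustering them near the center.
    distance = radius * math.sqrt(rng.random())
    return (center[0] + distance * math.cos(angle),
            center[1] + distance * math.sin(angle))


def straight_line_waypoints(start, goal, spacing):
    """Evenly spaced waypoints from start to goal (inclusive), ignoring obstacles."""
    length = math.dist(start, goal)
    steps = max(1, math.ceil(length / spacing))
    return [(start[0] + (goal[0] - start[0]) * i / steps,
             start[1] + (goal[1] - start[1]) * i / steps)
            for i in range(steps + 1)]


rng = random.Random(0)  # seeded so the sketch is repeatable
start = random_point_in_radius((0.0, 0.0), 50.0, rng)
goal = random_point_in_radius((0.0, 0.0), 50.0, rng)
waypoints = straight_line_waypoints(start, goal, spacing=2.0)
print(f"{len(waypoints)} waypoints from {start} to {goal}")
```

A real planner would also reject occupied points and route around obstacles; the sketch only shows the sampling and waypoint-extraction steps.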
Next, we will define the modalities of data to be collected. In this tutorial, we will gather RGB, LiDAR, and IMU data.
Autonomous Data Generation
With the sensor configurations, trajectory, and environment setup in place, we can now enable the robot to collect data autonomously.
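The collection step can be pictured as a loop that visits each waypoint and records one reading per selected modality. The sketch below is an assumption-laden outline rather than the GRID implementation: `collect_along_trajectory` is a hypothetical helper, and the three sensor readers are placeholders that return dummy values instead of real RGB, LiDAR, or IMU data.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Waypoint = Tuple[float, float]


def collect_along_trajectory(
    waypoints: Sequence[Waypoint],
    sensors: Dict[str, Callable[[Waypoint], object]],
) -> List[dict]:
    """Visit each waypoint and record one reading per selected sensor."""
    log = []
    for index, waypoint in enumerate(waypoints):
        # A real robot would be commanded to drive to `waypoint` here.
        record = {"index": index, "position": waypoint, "timestamp": time.time()}
        for name, read in sensors.items():
            record[name] = read(waypoint)
        log.append(record)
    return log


# Placeholder readers standing in for the RGB, LiDAR, and IMU modalities.
stub_sensors = {
    "rgb": lambda wp: [[0, 0, 0]],                # stand-in for an image array
    "lidar": lambda wp: [(wp[0], wp[1], 0.0)],    # stand-in for a point cloud
    "imu": lambda wp: {"accel": (0.0, 0.0, 9.81), "gyro": (0.0, 0.0, 0.0)},
}

log = collect_along_trajectory([(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)], stub_sensors)
print(len(log), sorted(log[0]))
```

Passing the sensors in as a dictionary makes modality selection a matter of adding or removing entries, without touching the loop itself.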
The data collected by the robot can be visualized on the rerun panel.
Scaling up the generation
To effectively scale up data generation, it’s crucial to simulate diverse real-world conditions and scenarios. Here, we randomize environmental parameters such as weather, wind, and time of day to introduce variability, creating a richer dataset that enhances model robustness. Additionally, GRID allows generating multiple trajectories, enabling the robot to navigate different paths under varied conditions. This combination of dynamic settings and paths ensures scalable, consistent data generation across multiple sessions, supporting efficient large-scale projects.
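The randomization described above can be sketched as a seeded plan that assigns each run its own weather, wind, time of day, and trajectory seed. Everything in this sketch is an assumption for illustration: the weather labels, the 0 to 10 m/s wind range, and the helper names are hypothetical and are not GRID parameters.

```python
import random

WEATHER_OPTIONS = ["clear", "fog", "rain", "snow"]  # assumed labels


def sample_conditions(rng):
    """Draw one random combination of weather, wind, and time of day."""
    return {
        "weather": rng.choice(WEATHER_OPTIONS),
        "wind_speed_mps": round(rng.uniform(0.0, 10.0), 1),
        # Hour in 0..23 and minute in quarter-hour steps, formatted as HH:MM.
        "time_of_day": f"{rng.randrange(24):02d}:{rng.randrange(0, 60, 15):02d}",
    }


def build_run_plan(num_runs, seed):
    """Return a reproducible list of per-run settings."""
    rng = random.Random(seed)
    return [
        {"run_id": i, "trajectory_seed": rng.randrange(2**32), **sample_conditions(rng)}
        for i in range(num_runs)
    ]


for run in build_run_plan(num_runs=3, seed=42):
    print(run)
```

Seeding the plan keeps the dataset varied across runs while still letting any individual run be regenerated later from its recorded settings.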
Once the robot has autonomously explored all the trajectories, all of the sensor data is stored in the sensor_data.h5 file, which you can download to your own system and integrate with your pipelines.
Downloading Your Generated Data
After generating the data from the simulation, follow these steps to download it from the GRID platform:
- Navigate to the GRID platform’s Storage tab, which is located next to the Terminal tab.
- Press the Download button to directly save the data to your local machine.
For extremely large files, your browser might not show a notification pop-up for the download. In such cases, check the downloads folder on your local machine to verify whether the download has started.
We are actively working on fixing this issue.
GRID Enterprise - Parallelisation and Optimisation
To optimize and scale data collection, GRID Enterprise enables parallelization across multiple sessions. Running multiple instances simultaneously allows large datasets to be generated efficiently and significantly reduces the wall-clock time needed for large-scale projects.
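As a local analogy for running several sessions at once, the sketch below fans placeholder sessions out over a thread pool using Python's standard library. It does not use any GRID Enterprise API: `run_session`, the per-session file naming, and the figure of 100 samples per trajectory are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_session(session_id, num_trajectories):
    """Placeholder for one data-generation session; returns a summary."""
    # A real session would start a simulation and write its own output file.
    samples = num_trajectories * 100  # pretend each trajectory yields 100 samples
    return {
        "session_id": session_id,
        "output": f"sensor_data_{session_id}.h5",  # hypothetical naming scheme
        "samples": samples,
    }


def run_sessions_in_parallel(num_sessions, num_trajectories, max_workers=4):
    """Run sessions concurrently and return their summaries in session order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda sid: run_session(sid, num_trajectories),
                             range(num_sessions)))


summaries = run_sessions_in_parallel(num_sessions=8, num_trajectories=5)
print(sum(s["samples"] for s in summaries), "samples across", len(summaries), "sessions")
```

Threads are enough here because the placeholder does no real work; simulation sessions that are compute-heavy would normally run as separate processes or on separate machines, each writing to its own output file to avoid contention.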