AI Models
This tutorial demonstrates how to integrate AI models with AirGen simulations for ground vehicles and aerial drones within the GRID platform. You’ll learn how to use perception models for object detection and segmentation while navigating vehicles through customized environments.
Part 1: Using AI Models with Cars
The notebook for this part can be found here.
Initialize the Car and Import Dependencies
This code creates a simulated car instance and imports essential libraries. The `AirGenCar` class provides an interface to control the vehicle, while the `data_collector` decorator will help us capture sensor data during movement. `WeatherParameter` and `Vector3r` allow us to customize the environment and represent 3D positions.
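Below is a minimal sketch of what this setup cell might look like. The import paths and the `client` attribute are assumptions and may differ in your GRID/AirGen version.

```python
# Illustrative setup cell; module paths below are assumptions.
from grid.robot.car import AirGenCar                  # assumed import path
from grid.utils.data_collector import data_collector  # assumed import path
from airgen import WeatherParameter, Vector3r         # assumed import path

# Create the simulated car and grab a handle to the underlying AirGen client.
car = AirGenCar()
client = car.client   # assumed attribute exposing the low-level client
```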
Import AI Models and Configure the Environment
Here we prepare our AI models and customize the environment:
GroundingDINO is a text-guided object detection model that combines the DINO detection-transformer architecture with grounding capabilities. It can identify objects based on text prompts, making it versatile for autonomous driving perception.
GSAM2 (Grounded Segment Anything Model 2) creates pixel-level masks for objects described by text prompts. We’ll use it to identify road surfaces, helping our vehicle understand drivable areas.
We also configure challenging environmental conditions (fog and sunset lighting) to test how our perception systems perform under difficult visual circumstances.
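A hedged sketch of this step, assuming GRID exposes GroundingDINO and GSAM2 as importable classes and an AirSim-style weather API on the client; the exact import paths, constructor arguments, and call signatures are assumptions.

```python
# Illustrative model and environment setup; paths and signatures are assumptions.
from grid.model.perception.detection import GroundingDINO   # assumed path
from grid.model.perception.segmentation import GSAM2        # assumed path

detector = GroundingDINO()   # text-prompted object detection
segmenter = GSAM2()          # text-prompted segmentation

# Degrade visibility: enable weather, add fog, and move the sun toward sunset.
client.simEnableWeather(True)
client.simSetWeatherParameter(WeatherParameter.Fog, 0.3)
client.simSetTimeOfDay(True, start_datetime="2024-06-01 18:30:00", move_sun=False)
```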
Generate a Path for the Car to Follow
The `simPlanPathToRandomFreePoint` function generates a collision-free path to a random destination within 50 meters. It creates waypoints forming a smooth trajectory and visualizes it in the simulation. We convert these points to the `Vector3r` format required by AirGen's movement functions, preparing the path for our car to follow.
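A sketch of this step; the argument names and the return format of `simPlanPathToRandomFreePoint` (treated here as points with `x_val`/`y_val`/`z_val` fields) are assumptions.

```python
# Plan a collision-free path to a random free point within 50 m and draw it
# in the simulator (argument names are assumptions).
path = client.simPlanPathToRandomFreePoint(50.0, smooth_path=True, draw_path=True)

# Convert the planned points into Vector3r waypoints for AirGen's movement API.
waypoints = [Vector3r(p["x_val"], p["y_val"], p["z_val"]) for p in path]
```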
Define a Function to Run the AI Models
This function creates our perception pipeline. It captures an image from the car’s front camera, then uses GroundingDINO to detect other cars (the period after “car” helps with text-prompt parsing). It also uses GSAM2 to create a mask identifying road surfaces. The function returns both raw image data and processed detections, providing the essential environmental understanding that autonomous vehicles need.
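A sketch of the perception function described above; the image-capture call and the detection/segmentation method names are assumptions.

```python
def runAIModels():
    # Capture an RGB frame from the car's front-facing camera (assumed API).
    rgb_image = car.getImage("front_center", "rgb")

    # Text-prompted detection of other cars; the trailing period aids parsing.
    car_detections = detector.detect_object(rgb_image, "car.")

    # Text-prompted segmentation of the drivable road surface.
    road_mask = segmenter.segment_object(rgb_image, "road.")

    return {"image": rgb_image, "detections": car_detections, "road_mask": road_mask}
```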
Collect Data While the Car Follows the Path
The `@data_collector` decorator transforms our movement function into a data collection pipeline that calls `runAIModels` every 0.1 seconds. The car follows our generated path at 5 m/s while continuously collecting perception data. This approach demonstrates how to integrate movement control with perception in autonomous systems.
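A sketch of the collection cell, assuming `data_collector` accepts the perception callback plus a sampling interval and returns the collected samples; the decorator's real signature may differ.

```python
# Drive the planned path while sampling perception every 0.1 s.
@data_collector(runAIModels, interval=0.1)
def follow_path(waypoints):
    # AirSim-style path-following call at 5 m/s (signature is an assumption).
    client.moveOnPathAsync(waypoints, velocity=5.0).join()

collected = follow_path(waypoints)   # assumed: decorator returns collected samples
print(f"Collected {len(collected)} perception samples")
```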
Part 2: Using AI Models with Drones
The notebook for this part can be found here.
Initialize the Drone
This creates a simulated drone that can move in three-dimensional space. Unlike ground vehicles, drones offer unique aerial perspectives valuable for surveillance and monitoring applications.
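A minimal sketch of the drone setup; the class name, import path, and `client` attribute are assumptions.

```python
# Illustrative drone initialization; import path and class name are assumptions.
from grid.robot.drone import AirGenDrone   # assumed import path

drone = AirGenDrone()
client = drone.client                      # assumed handle to the AirGen client
```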
Take Off and Position the Drone
The drone takes off and ascends to an altitude of 25 meters (negative Z values represent higher altitudes in AirGen) at 2 m/s. This positions the drone at an ideal height for surveillance, providing a wide field of view while maintaining sufficient image detail for object detection.
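A sketch of the takeoff and ascent, assuming AirSim-style async calls are available on the client; note that the NED convention makes a 25-meter altitude a Z of -25.

```python
# Take off, then climb to 25 m altitude at 2 m/s (NED: higher means more negative Z).
client.takeoffAsync().join()
client.moveToZAsync(z=-25.0, velocity=2.0).join()
```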
Import AI Models for Object Detection
We import GroundingDINO for aerial object detection (a minimal loading sketch follows the list below). This model excels at detecting objects from aerial perspectives because of its:
- Text-guided detection capabilities that adapt to different target objects
- Zero-shot capabilities that work even for objects not seen during training
- Transformer architecture that maintains accuracy with small objects in wide views
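The loading sketch for this step; the import path is an assumption.

```python
# Illustrative model import for aerial detection; path is an assumption.
from grid.model.perception.detection import GroundingDINO   # assumed path

detector = GroundingDINO()
```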
Define a Function for Fire Detection
This function creates a fire detection pipeline. It captures an image from the drone’s camera and uses GroundingDINO with “fire.” as the text prompt. The model returns bounding boxes, confidence scores, and labels for detected fires. Fire detection is a critical application for drone monitoring in disaster management and forest protection.
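A sketch of the fire-detection function, assuming the same image-capture and detection signatures as in Part 1; the three-value return of the detector is also an assumption.

```python
def detectFire():
    # Capture an RGB frame from the drone's camera (assumed API).
    rgb_image = drone.getImage("front_center", "rgb")

    # Text-prompted detection with "fire." as the prompt, per the tutorial.
    # Assumed to return bounding boxes, confidence scores, and labels.
    boxes, scores, labels = detector.detect_object(rgb_image, "fire.")
    return rgb_image, boxes, scores, labels
```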
Search for Fire by Rotating the Drone
This code implements a 360-degree scanning pattern to search for fires. We create 30 evenly spaced viewing angles and systematically rotate the drone to each one, running fire detection at each position. When a fire is detected, the drone stops scanning and reports the finding. This methodical scanning approach is similar to real-world drone search patterns used in wildfire monitoring.
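A sketch of the scanning loop; `rotateToYawAsync` mirrors the AirSim call, and its availability here is an assumption.

```python
import numpy as np

# Scan 30 evenly spaced yaw angles; stop as soon as a fire is detected.
for yaw in np.linspace(0, 360, num=30, endpoint=False):
    client.rotateToYawAsync(float(yaw)).join()
    _, boxes, scores, labels = detectFire()
    if len(boxes) > 0:
        print(f"Fire detected at yaw {yaw:.1f} deg (confidence {max(scores):.2f})")
        break
```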
Add Smoke Segmentation
CLIPSeg uses features from CLIP (Contrastive Language-Image Pretraining) to perform semantic segmentation based on text prompts. Our function captures an image and generates a mask identifying smoke. This complements fire detection, as smoke is often visible before flames and at greater distances. Combining object detection with segmentation creates a more comprehensive fire monitoring system.
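A sketch of the smoke-segmentation step; the CLIPSeg import path and method name are assumptions.

```python
# Illustrative smoke segmentation with CLIPSeg; path and method are assumptions.
from grid.model.perception.segmentation import CLIPSeg   # assumed path

clipseg = CLIPSeg()

def segmentSmoke():
    rgb_image = drone.getImage("front_center", "rgb")        # assumed API
    smoke_mask = clipseg.segment_object(rgb_image, "smoke")  # pixel-level mask
    return rgb_image, smoke_mask
```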
Evaluate Vision Models Under Various Weather Conditions
This test evaluates how our perception models perform as visibility deteriorates. The code incrementally increases rain and fog from 0% to 90%, running both detection and segmentation at each step. This systematic approach helps identify how environmental conditions affect perception performance, allowing developers to establish confidence thresholds and adaptive algorithms for real-world deployment.
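A sketch of the robustness sweep, assuming the AirSim-style weather API used earlier; the summary printout is only illustrative and assumes the smoke mask is a binary array.

```python
from airgen import WeatherParameter   # assumed import path

# Step rain and fog together from 0% to 90% and re-run both perception models.
client.simEnableWeather(True)

for intensity in [i / 10 for i in range(10)]:   # 0.0, 0.1, ..., 0.9
    client.simSetWeatherParameter(WeatherParameter.Rain, intensity)
    client.simSetWeatherParameter(WeatherParameter.Fog, intensity)

    _, boxes, scores, _ = detectFire()
    _, smoke_mask = segmentSmoke()

    print(f"intensity={intensity:.0%}: {len(boxes)} fire detections, "
          f"{int(smoke_mask.sum())} smoke-mask pixels")
```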