This tutorial demonstrates how to integrate AI models with AirGen simulations for ground vehicles and aerial drones within the GRID platform. You’ll learn how to use perception models for object detection and segmentation while navigating vehicles through customized environments.

Part 1: Using AI Models with Cars

The notebook for this part can be found here.

Initialize the Car and Import Dependencies

import airgen
from airgen.utils.collect import data_collector
from airgen import WeatherParameter, Vector3r
from typing import List, Tuple

from grid.robot.wheeled.airgen_car import AirGenCar

# Initialize the robot
airgen_car_0 = AirGenCar()

This code creates a simulated car instance and imports essential libraries. The AirGenCar class provides an interface to control the vehicle, while the data_collector decorator will help us capture sensor data during movement. WeatherParameter and Vector3r allow us to customize the environment and represent 3D positions.

Import AI Models and Configure the Environment

# Import AI models
from grid.model.perception.detection.gdino import GroundingDINO
from grid.model.perception.segmentation.gsam2 import GSAM2

# Initialize AI models
detection = GroundingDINO()
segmentation = GSAM2()

# Get client reference
client = airgen_car_0.client

# Configure environment conditions
client.simEnableWeather(True)
client.simSetWeatherParameter(WeatherParameter.Fog, 1.0)  
client.simSetTimeOfDay(True, "2024-07-22 17:00:00") 

Here we prepare our AI models and customize the environment:

GroundingDINO is a text-guided object detection model that pairs the DINO transformer-based detector (a DETR variant) with grounded pre-training. It can identify objects from free-form text prompts, making it versatile for autonomous driving perception.

GSAM2 (Grounded Segment Anything Model 2) creates pixel-level masks for objects described by text prompts. We’ll use it to identify road surfaces, helping our vehicle understand drivable areas.

We also configure challenging environmental conditions (fog and sunset lighting) to test how our perception systems perform under difficult visual circumstances.
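
If you want to stress the perception stack further, other weather effects can be layered in the same way. The sketch below is illustrative only: it reuses the simSetWeatherParameter call shown above, with Rain (which appears later in this tutorial) and Snow, the latter assumed to be available alongside Rain and Fog.

# Illustrative only: layering additional weather effects on the same client.
# Intensities range from 0.0 (off) to 1.0 (maximum).
client.simSetWeatherParameter(WeatherParameter.Rain, 0.5)   # Rain is also used in Part 2
client.simSetWeatherParameter(WeatherParameter.Snow, 0.3)   # assumed member, alongside Rain/Fog
# Turn all weather effects back off once you are done experimenting
client.simEnableWeather(False)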

Generate a Path for the Car to Follow

search_radius = 50  # Distance in meters
# Generate trajectory points
path = client.simPlanPathToRandomFreePoint(search_radius,
                            smooth_path=True, draw_path=True)

# Convert to Vector3r format for AirGen
points = []
for point in path:
    points.append(airgen.Vector3r(
        point['x_val'], point['y_val'], point['z_val']))

The simPlanPathToRandomFreePoint function generates a collision-free path to a random destination within 50 meters. It creates waypoints forming a smooth trajectory and visualizes it in the simulation. We convert these points to the required Vector3r format for AirGen’s movement functions, preparing the path for our car to follow.
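
As a quick sanity check (not part of the original notebook), you can inspect the waypoints before driving them. The sketch below assumes Vector3r exposes x_val, y_val, and z_val attributes, matching the dictionary keys used above.

import numpy as np

# Inspect the generated waypoints and estimate the total path length
coords = np.array([[p.x_val, p.y_val, p.z_val] for p in points])
segment_lengths = np.linalg.norm(np.diff(coords, axis=0), axis=1)
print(f"{len(points)} waypoints, total path length ~ {segment_lengths.sum():.1f} m")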

Define a Function to Run the AI Models

def runAIModels(client: airgen.VehicleClient) -> Tuple:
    # Capture RGB image from front camera
    img, _ = client.getImages("front_center", [airgen.ImageType.Scene])[0]
    
    # Run object detection for cars
    boxes, scores, labels = detection.run(img, "car.")
    
    # Run segmentation for road surfaces
    mask = segmentation.run(img, "road")
    
    return img, boxes, scores, labels, mask

This function creates our perception pipeline. It captures an image from the car’s front camera, then uses GroundingDINO to detect other cars (the period after “car” helps with text-prompt parsing). It also uses GSAM2 to create a mask identifying road surfaces. The function returns both raw image data and processed detections, providing the essential environmental understanding that autonomous vehicles need.
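
To make the outputs concrete, here is a small hedged example of consuming them once. It assumes boxes and scores are array-like (one entry per detection) and that mask is a binary array over the image pixels; the exact formats depend on the GRID model wrappers.

import numpy as np

# Run the pipeline once and post-process the results (illustrative thresholds)
img, boxes, scores, labels, mask = runAIModels(client)

confident_cars = [b for b, s in zip(boxes, scores) if s > 0.5]
print(f"{len(confident_cars)} car detections above a 0.5 confidence threshold")

# Fraction of the frame segmented as drivable road
road_fraction = np.asarray(mask).astype(bool).mean()
print(f"Road mask covers ~{road_fraction:.0%} of the image")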

Collect Data While the Car Follows the Path

@data_collector(runAIModels, time_delta=0.1)
def move_task(
    client: airgen.VehicleClient, position: Tuple[float, float, float], **kwargs
) -> None | Tuple[None, List[dict]]:
    client.moveOnPath(points, velocity=5.0)

# Execute the movement task; with _collect_data=True the decorator also
# returns the AI model outputs gathered along the way
_, perception_data = move_task(client, (0, 0, -10), _collect_data=True)

The @data_collector decorator transforms our movement function into a data collection pipeline that calls runAIModels every 0.1 seconds. The car follows our generated path at 5 m/s while continuously collecting perception data. This approach demonstrates how to integrate movement control with perception in autonomous systems.
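
Once the task finishes, the collected records can be inspected. The exact record layout depends on data_collector; the sketch below only assumes that perception_data is the list returned when _collect_data=True.

# Illustrative inspection of the collected perception data
print(f"Collected {len(perception_data)} perception snapshots along the path")
for i, record in enumerate(perception_data[:3]):
    # Each record corresponds to one runAIModels() call made during the drive
    print(f"snapshot {i}: {type(record).__name__}")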

Part 2: Using AI Models with Drones

The notebook for this part can be found here.

Initialize the Drone

from grid.robot.aerial.airgen_drone import AirGenDrone 
airgen_drone_0 = AirGenDrone()

This creates a simulated drone that can move in three-dimensional space. Unlike ground vehicles, drones offer unique aerial perspectives valuable for surveillance and monitoring applications.

Take Off and Position the Drone

airgen_drone_0.client.takeoffAsync().join()
airgen_drone_0.client.moveToZAsync(-25, 2).join()

The drone takes off and ascends to an altitude of 25 meters (negative Z values represent higher altitudes in AirGen) at 2 m/s. This positions the drone at an ideal height for surveillance, providing a wide field of view while maintaining sufficient image detail for object detection.
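
If you want to reposition the drone for a different vantage point, the same coordinate convention applies. The sketch below assumes AirGen exposes the usual moveToPositionAsync(x, y, z, velocity) call.

# Illustrative only: fly 20 m ahead of the spawn point while holding 25 m altitude.
# NED convention: x = north, y = east, z = down, so z = -25 keeps the drone 25 m up.
airgen_drone_0.client.moveToPositionAsync(20, 0, -25, velocity=3).join()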

Import AI Models for Object Detection

import airgen
import numpy as np
from grid.model.perception.detection.gdino import GroundingDINO

groundingdino = GroundingDINO()

We import GroundingDINO for aerial object detection; a short prompt-swapping sketch after the list below shows this flexibility in practice. The model excels at detecting objects from aerial perspectives because of its:

  • Text-guided detection capabilities that adapt to different target objects
  • Zero-shot capabilities that work even for objects not seen during training
  • Transformer architecture that maintains accuracy with small objects in wide views
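
The prompt-swapping sketch below illustrates the zero-shot behavior: the same GroundingDINO instance is queried with several text prompts against one aerial frame, reusing the image-capture and run calls from this tutorial.

# Zero-shot prompt swapping (illustrative): no retraining, just a new text prompt
rgb_image, pose = airgen_drone_0.client.getImages(
    "front_center", [airgen.ImageType.Scene])[0]

for prompt in ["car.", "person.", "building."]:
    boxes, scores, labels = groundingdino.run(rgb_image, prompt)
    print(f"{prompt} -> {len(boxes)} detections")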

Define a Function for Fire Detection

def detect(drone_client, object_name="fire."):
    rgb_image, pose = drone_client.getImages("front_center",
                                             [airgen.ImageType.Scene])[0]
    boxes, scores, labels = groundingdino.run(rgb_image, object_name)
    return boxes, scores, labels

This function creates a fire detection pipeline. It captures an image from the drone’s camera and uses GroundingDINO with “fire.” as the text prompt. The model returns bounding boxes, confidence scores, and labels for detected fires. Fire detection is a critical application for drone monitoring in disaster management and forest protection.
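
Beyond a boolean "fire detected" signal, the bounding boxes can be post-processed, for example to locate the strongest detection in the frame. The helper below is a sketch and assumes each box is given as [x1, y1, x2, y2] pixel coordinates, which may differ from the wrapper's actual format.

def best_fire_box(boxes, scores):
    """Return the center and score of the highest-confidence detection, if any."""
    if len(boxes) == 0:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    x1, y1, x2, y2 = boxes[best]            # assumed [x1, y1, x2, y2] pixel format
    return (x1 + x2) / 2, (y1 + y2) / 2, scores[best]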

Search for Fire by Rotating the Drone

yaw_angles = np.linspace(0, 360, 30, endpoint=False)  # 30 distinct headings (0° and 360° coincide)

for yaw in yaw_angles:
    airgen_drone_0.client.rotateToYawAsync(yaw).join()
    boxes, scores, labels = detect(airgen_drone_0.client)
    if 'fire' in labels:
        print("Fire detected!")
        break

This code implements a 360-degree scanning pattern to search for fires. We create 30 evenly spaced viewing angles and systematically rotate the drone to each one, running fire detection at each position. When a fire is detected, the drone stops scanning and reports the finding. This methodical scanning approach is similar to real-world drone search patterns used in wildfire monitoring.
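
A natural variant, sketched below, sweeps the full circle instead of stopping at the first hit, records the best confidence at each heading, and then faces the most promising direction. It reuses detect() and rotateToYawAsync() exactly as above.

best_yaw, best_score = None, 0.0
for yaw in np.linspace(0, 360, 30, endpoint=False):
    airgen_drone_0.client.rotateToYawAsync(float(yaw)).join()
    boxes, scores, labels = detect(airgen_drone_0.client)
    if len(scores) > 0 and max(scores) > best_score:
        best_yaw, best_score = float(yaw), max(scores)

if best_yaw is not None:
    # Face the heading with the strongest fire signal
    airgen_drone_0.client.rotateToYawAsync(best_yaw).join()
    print(f"Strongest fire signal at yaw {best_yaw:.0f} deg (score {best_score:.2f})")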

Add Smoke Segmentation

from grid.model.perception.segmentation.clipseg import CLIPSeg
clipseg = CLIPSeg(use_local=True)

# Use CLIP features to try and segment out 'smoke'
def segment(drone_client, object_name="smoke"):
    rgb_image, pose = drone_client.getImages("front_center", 
                                    [airgen.ImageType.Scene])[0]
    result = clipseg.run(rgb_image, object_name)
    return result

# Run segmentation
segment_result = segment(airgen_drone_0.client)

CLIPSeg performs text-prompted semantic segmentation using features from CLIP (Contrastive Language-Image Pre-training). Our function captures an image and generates a mask identifying smoke. This complements fire detection, as smoke is often visible before flames and at greater distances. Combining object detection with segmentation creates a more comprehensive fire monitoring system.
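
The segmentation output can also be summarized numerically, for example as the fraction of the frame covered by smoke. The sketch below assumes segment_result is a 2-D score map in [0, 1]; the 0.5 threshold is arbitrary, and the CLIPSeg wrapper's actual output format may differ.

import numpy as np

smoke_map = np.asarray(segment_result, dtype=float)
smoke_fraction = (smoke_map > 0.5).mean()   # 0.5 threshold chosen for illustration
print(f"~{smoke_fraction:.0%} of the frame segmented as smoke")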

Evaluate Vision Models Under Various Weather Conditions

# Evaluate the vision models with weather variations
import time
airgen_drone_0.client.simEnableWeather(True)

for i in range(10):
    # Calculate weather intensity (0% to 90%)
    weather_intensity = i / 10
    
    # Gradually increase rain and fog
    airgen_drone_0.client.simSetWeatherParameter(
        airgen.WeatherParameter.Rain, 
        weather_intensity
    )
    airgen_drone_0.client.simSetWeatherParameter(
        airgen.WeatherParameter.Fog, 
        weather_intensity
    )
    
    # Run detection and segmentation
    detection_result = detect(airgen_drone_0.client)
    segmentation_result = segment(airgen_drone_0.client)
    
    weather_percent = i * 10
    print(f"Weather step {i}: Rain and fog at {weather_percent}%")
    time.sleep(1)

This test evaluates how our perception models perform as visibility deteriorates. The code incrementally increases rain and fog from 0% to 90%, running both detection and segmentation at each step. This systematic approach helps identify how environmental conditions affect perception performance, allowing developers to establish confidence thresholds and adaptive algorithms for real-world deployment.
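
To turn this qualitative test into numbers, you can log simple statistics at each step. The sketch below is one way to do that; it only relies on detect() returning (boxes, scores, labels) as defined earlier and stores a per-step summary for later analysis.

results = []
for i in range(10):
    intensity = i / 10
    airgen_drone_0.client.simSetWeatherParameter(airgen.WeatherParameter.Rain, intensity)
    airgen_drone_0.client.simSetWeatherParameter(airgen.WeatherParameter.Fog, intensity)

    boxes, scores, labels = detect(airgen_drone_0.client)
    results.append({
        "intensity": intensity,
        "num_detections": len(boxes),
        "best_score": float(max(scores)) if len(scores) > 0 else 0.0,
    })
    time.sleep(1)

for row in results:
    print(f"{row['intensity']:.0%} rain/fog -> {row['num_detections']} detections "
          f"(best score {row['best_score']:.2f})")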