Applied Intuition’s perception team has conducted a case study that uses synthetic data to improve a perception algorithm’s object detection performance on underrepresented classes in a real-world dataset.
Real-world data is indispensable when training object detection models for autonomous vehicle (AV) perception systems. Unfortunately, the real world does not always provide all the data needed for successful training. For example, classes such as cyclists and motorcyclists occur far less frequently than pedestrians and cars, making it difficult for perception models trained solely on real-world data to detect them reliably (Figure 1). Similarly, the most dangerous situations, such as accidents, appear in only a small fraction of test drives.
Even though underrepresented classes and long-tail events occur less frequently in the real world, object detection models still need to handle them just as well as more common classes and situations. In the past few years, perception teams have started to use synthetic data to address some of these limitations in real-world datasets. It is important to acknowledge that a domain gap still exists between synthetic and real-world data, but recent methods are narrowing it through a combination of higher-fidelity synthetic data and new machine learning training strategies.
To demonstrate this use case, Applied Intuition’s perception team has conducted a case study that uses synthetic data as a supplemental training resource to address a class imbalance found in a real-world dataset. The study shows that synthetic data may be used to help mitigate class imbalances and address areas where real-world data is limited. Download the full-length case study below.
Goal and Scope
This case study uses nuImages, a widely used dataset from the creators of nuScenes, as the baseline training dataset. In this dataset, the cyclist class occurs 170 times less frequently than prominent classes such as cars and pedestrians (Figure 2).
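As an illustration of how such an imbalance can be quantified, the sketch below counts per-class annotation frequencies with the nuimages devkit (part of the nuscenes-devkit package). The dataset path and version string are placeholders; this is not code from the case study itself.

```python
from collections import Counter

from nuimages import NuImages  # pip install nuscenes-devkit

# Placeholder path and version; point these at a local nuImages download.
nuim = NuImages(dataroot="/data/sets/nuimages", version="v1.0-train", lazy=True)

# Map category tokens to names, then count how often each class is annotated.
cat_names = {cat["token"]: cat["name"] for cat in nuim.category}
counts = Counter(cat_names[ann["category_token"]] for ann in nuim.object_ann)

# Print classes from most to least frequent to expose the long tail.
for name, count in counts.most_common():
    print(f"{name:45s} {count}")
```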
The study generates and uses a synthetic dataset to improve a perception algorithm’s object detection performance on cyclists while retaining or improving object detection performance on other classes. It also explores whether the use of synthetic data can reduce the amount of real-world data needed to improve a model’s object detection performance.
Implementation
The study consists of the following steps:
- Analyze a baseline model trained only on real-world data from the nuImages dataset.
- Generate labeled synthetic data that specifically targets the lack of representation of the cyclist class in the real-world dataset. More examples of cyclists are created in the synthetic dataset.
- Use the above synthetic data as a supplemental training resource in addition to the nuImages data.
1. Baseline model analysis
First, the study measures how a perception model responds to the class imbalance in the real-world nuImages data. A Cascade Mask R-CNN perception model is trained on this dataset until convergence. Its resulting object detection performance on the cyclist class is lower than on all other classes (Figure 3).
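The case study does not specify its training framework or code. As one possible setup, the sketch below trains a Cascade Mask R-CNN baseline with Detectron2, which ships a Cascade Mask R-CNN configuration, on a hypothetical COCO-format export of nuImages. Dataset names, paths, the class count, and the schedule are placeholders.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Hypothetical COCO-format export of the nuImages train/val splits.
register_coco_instances("nuimages_train", {}, "nuimages_coco/train.json", "nuimages_coco/samples")
register_coco_instances("nuimages_val", {}, "nuimages_coco/val.json", "nuimages_coco/samples")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("nuimages_train",)
cfg.DATASETS.TEST = ("nuimages_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 10  # set to the number of classes in your export
cfg.SOLVER.MAX_ITER = 90000           # placeholder; train until validation mAP plateaus
cfg.OUTPUT_DIR = "./baseline_real_only"

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```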
2. Synthetic data generation
Next, synthetic data is generated to upsample the underrepresented cyclist class (Figure 4). This case study uses procedural 3D environment generation, automatic scenario creation, and a synthetic data generation pipeline to enable this process.
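The generation pipeline itself is proprietary, but downstream the synthetic output can be handled like any other labeled dataset. As a purely illustrative sketch (not the study's actual tooling), the snippet below filters a hypothetical COCO-format synthetic annotation file down to frames containing at least one cyclist, so the supplemental data targets the underrepresented class.

```python
import json

# Hypothetical COCO-format annotation file produced by a synthetic data pipeline.
with open("synthetic_coco/annotations.json") as f:
    coco = json.load(f)

# Find the category ids for cyclists (category naming is dataset-specific).
cyclist_ids = {c["id"] for c in coco["categories"] if "cyclist" in c["name"].lower()}

# Keep only images that contain at least one cyclist annotation.
keep_images = {a["image_id"] for a in coco["annotations"] if a["category_id"] in cyclist_ids}
coco["images"] = [img for img in coco["images"] if img["id"] in keep_images]
coco["annotations"] = [a for a in coco["annotations"] if a["image_id"] in keep_images]

with open("synthetic_coco/annotations_cyclist_focused.json", "w") as f:
    json.dump(coco, f)
```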
3. Perception model training with synthetic and real-world data
The above synthetic dataset is then used to improve model performance in the following experiments.
i) Mixed training experiment
Synthetic data and real-world data are combined into one large training dataset. Batches containing both real-world and synthetic data are randomly sampled from this dataset during training (a minimal sampling sketch follows the list below). Two trials are conducted to adjust the ratio of synthetic to real-world data and explore whether using more synthetic data and less real-world data impacts the model's object detection performance:
- Trial with a 0.5:1 ratio of synthetic to real-world data
- Trial with a 1:1 ratio of synthetic to real-world data
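The study does not publish its sampling code; one way to realize such mixed batches in PyTorch is to concatenate the two datasets and weight samples so that synthetic and real images appear at the requested ratio in expectation. The dataset objects and ratio values below are placeholders.

```python
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def mixed_loader(real_ds, synth_ds, synth_to_real_ratio, batch_size=16):
    """Randomly sample batches mixing real and synthetic images at a given ratio."""
    combined = ConcatDataset([real_ds, synth_ds])

    # Give each split a total weight of 1 (real) and `ratio` (synthetic), so the
    # expected ratio of synthetic to real images per batch is e.g. 0.5:1 or 1:1.
    real_w = 1.0 / len(real_ds)
    synth_w = synth_to_real_ratio / len(synth_ds)
    weights = [real_w] * len(real_ds) + [synth_w] * len(synth_ds)

    sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
    return DataLoader(
        combined,
        batch_size=batch_size,
        sampler=sampler,
        collate_fn=lambda batch: tuple(zip(*batch)),  # standard detection collate
    )

# Usage (placeholder datasets): loader = mixed_loader(nuimages_ds, synthetic_ds, 0.5)
```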
ii) Fine-tuning experiment
A model is first trained to convergence on the synthetic dataset alone, using a small held-out synthetic set for validation. Three trials are then conducted to fine-tune this model on the following amounts of real-world data (a sketch of the fine-tuning stage follows the list below):
- Fine-tuning without data ablation: 100% of the nuImages training set
- Fine-tuning with data ablation: 70% of the nuImages training set
- Fine-tuning with data ablation: 50% of the nuImages training set
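Continuing the hypothetical Detectron2 setup from the baseline sketch above, the snippet below subsamples the real-world annotations for a data-ablation trial and fine-tunes the synthetic-pretrained checkpoint on that subset. All paths, the checkpoint name, the learning rate, and the class count are placeholders.

```python
import json
import random

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

def subsample_coco(ann_file, out_file, keep_fraction, seed=0):
    """Write a COCO annotation file containing a random fraction of the images."""
    with open(ann_file) as f:
        coco = json.load(f)
    random.seed(seed)
    kept = random.sample(coco["images"], int(len(coco["images"]) * keep_fraction))
    kept_ids = {img["id"] for img in kept}
    coco["images"] = kept
    coco["annotations"] = [a for a in coco["annotations"] if a["image_id"] in kept_ids]
    with open(out_file, "w") as f:
        json.dump(coco, f)

# 70% ablation trial: fine-tune the synthetic-pretrained model on 70% of nuImages.
subsample_coco("nuimages_coco/train.json", "nuimages_coco/train_70.json", 0.7)
register_coco_instances("nuimages_train_70", {}, "nuimages_coco/train_70.json", "nuimages_coco/samples")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "./synthetic_pretrain/model_final.pth"  # checkpoint from synthetic-only training
cfg.DATASETS.TRAIN = ("nuimages_train_70",)
cfg.DATASETS.TEST = ("nuimages_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 10
cfg.SOLVER.BASE_LR = 0.002  # smaller learning rate for fine-tuning
cfg.OUTPUT_DIR = "./finetune_real_70"

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```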
Key Results
1. Quantitative results
Compared to the baseline model with 100% of the real-world data, mixing synthetic and real-world data (mixed training) leads to improvements on the cyclist class (Figure 5). Pre-training a model on synthetic data and then fine-tuning it on 100% of the real-world data (fine-tuning without data ablation) shows the highest performance improvement, outperforming the baseline model consistently on all classes (Figure 5).
Pre-training a model on synthetic data and then fine-tuning it on 70% of the real-world data (fine-tuning with data ablation) also leads to improved performance, both on the cyclist class (Figure 5) and overall (Figure 6).
2. Qualitative insights
The case study suggests that synthetic data can help improve object detection performance in difficult cases. The following images show cases in which the baseline model fails to adequately detect a cyclist while the model pre-trained on synthetic data succeeds (Figures 7a-7c).
Commercial Applications
This case study provides an early indication that synthetic data is a useful supplement to real-world datasets when training AV perception models on both nominal and edge cases. Even though situations such as fallen objects, pedestrians or live animals on a highway, and poor visibility due to dense fog are rare, AVs must be prepared to navigate them safely. Synthetic data can help address class imbalances by improving a perception model's object detection performance on minority classes such as cyclists. Synthetic data also provides a fast, cost-effective, and ethical way to generate training datasets and target rare or difficult cases when real-world data is too infrequent or dangerous to collect.
Contact our team of perception engineers to learn more about Applied Intuition’s synthetic datasets.