In advanced driver-assistance systems (ADAS) and automated driving (AD) development, the quantity and quality of training datasets directly impact the performance of ML models. However, collecting training data in the real world is slow, expensive, and constrained by logistics. Annotation adds a further challenge: human labeling is also costly, slow, and error-prone.
Why Synthetic Datasets?
Applied Intuition Synthetic Datasets facilitate data-driven ADAS and AD development by helping perception and validation teams define, generate, and utilize synthetic training data for ML models.
Define datasets with natural language, a distribution-based structured language, or the visual editor to get the data your model needs at scale
Manage datasets with tooling to view statistics, filter, and export data
Generate datasets shown to improve model performance in published case studies
Benefits
Speed up ML training
Obtain new labeled datasets and train the next model iteration up to 32x faster.
Reduce data costs
Reduce spending on data collection and labeling by up to 95%.
Improve performance
Improve edge-case performance by 3x and reach aggregate model performance targets up to 20% faster.
Key components
Rapid scene generation
Define and generate synthetic datasets at scale using natural language scenario generation, Synthetic Datasets’ distribution-based domain randomization framework, or scenes extracted and augmented from real-world logs. Directly control distributions so that datasets match the task domain and target specific edge cases while keeping the domain gap minimal during training.
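To make the idea of distribution-based domain randomization concrete, here is a minimal sketch in plain Python. It is not the Synthetic Datasets API: the `SPEC` layout, the parameter names, and the `sample_scenes` helper are all hypothetical and only illustrate how per-parameter distributions could drive scene sampling.

```python
import random

# Hypothetical dataset specification: every scene parameter has its own distribution.
# These names and values are illustrative assumptions, not product settings.
SPEC = {
    "time_of_day": {"kind": "categorical", "weights": {"day": 0.5, "dusk": 0.2, "night": 0.3}},
    "weather": {"kind": "categorical", "weights": {"clear": 0.6, "rain": 0.3, "fog": 0.1}},
    "num_pedestrians": {"kind": "int_uniform", "low": 0, "high": 12},
    "ego_speed_mps": {"kind": "uniform", "low": 0.0, "high": 30.0},
}


def sample_parameter(dist, rng):
    """Draw one value from a single parameter distribution."""
    kind = dist["kind"]
    if kind == "categorical":
        values = list(dist["weights"])
        weights = list(dist["weights"].values())
        return rng.choices(values, weights=weights, k=1)[0]
    if kind == "int_uniform":
        return rng.randint(dist["low"], dist["high"])  # inclusive on both ends
    if kind == "uniform":
        return rng.uniform(dist["low"], dist["high"])
    raise ValueError(f"unknown distribution kind: {kind}")


def sample_scenes(spec, count, seed=0):
    """Sample `count` scene descriptions; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    return [
        {name: sample_parameter(dist, rng) for name, dist in spec.items()}
        for _ in range(count)
    ]


scenes = sample_scenes(SPEC, count=1000, seed=42)
night_share = sum(s["time_of_day"] == "night" for s in scenes) / len(scenes)
print(f"night share: {night_share:.2f}")
```

Raising the weight of a rare condition, such as `"fog"` or `"night"`, is how a specification like this would target an edge case, while the remaining parameters keep the dataset close to the task domain.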
Sensor simulation
Synthetic Datasets build upon the capabilities of Applied Intuition’s Sensor Sim to ensure synthetic data is physically accurate and representative of target sensors and task domains. Because machines interpret data differently from humans, Synthetic Datasets provide the diversity and realism that models need to benefit from training on the data.
Label generation
Programmatically generate ground truth labels ranging from simple bounding boxes and cuboids to dense labels like optical flow and depth. Customize labels to your taxonomy, ontology, and labeling specification to ensure data seamlessly integrates with existing datasets and ML pipelines.
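As an illustration of how a simple label can be derived programmatically from simulation ground truth, the sketch below projects a 3D cuboid into a 2D bounding box with a pinhole camera model. It is an assumption-laden example, not the product's label pipeline: the function names, the camera-frame convention (x right, y down, z forward), and the intrinsics are all hypothetical.

```python
import math
from itertools import product


def cuboid_corners(center, size, yaw):
    """Return the 8 corners of a cuboid in the camera frame.

    Assumed convention: x right, y down, z forward; yaw rotates about the y axis.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy, dz in product((-0.5, 0.5), repeat=3):
        lx, ly, lz = dx * sx, dy * sy, dz * sz
        corners.append((cx + lx * cos_y + lz * sin_y, cy + ly, cz - lx * sin_y + lz * cos_y))
    return corners


def project_to_bbox(corners, fx, fy, ppx, ppy, image_w, image_h):
    """Project corners with a pinhole model; return (x_min, y_min, x_max, y_max) or None."""
    if any(z <= 0 for _, _, z in corners):
        return None  # simplification: skip cuboids that touch or cross the camera plane
    us = [fx * x / z + ppx for x, _, z in corners]
    vs = [fy * y / z + ppy for _, y, z in corners]
    # Clip the box to the image bounds.
    x_min, x_max = max(0.0, min(us)), min(float(image_w), max(us))
    y_min, y_max = max(0.0, min(vs)), min(float(image_h), max(vs))
    if x_min >= x_max or y_min >= y_max:
        return None  # the cuboid projects entirely outside the image
    return (x_min, y_min, x_max, y_max)


corners = cuboid_corners(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0), yaw=0.0)
bbox = project_to_bbox(corners, fx=1000.0, fy=1000.0, ppx=640.0, ppy=360.0,
                       image_w=1280, image_h=720)
print(bbox)
```

Because the simulator already knows each object's pose and dimensions, labels like this come from geometry alone, with no human annotation step. Dense labels such as depth or optical flow follow the same principle but are computed per pixel.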
Domain adaptation
Use domain adaptation based on real training datasets. Re-style or modify synthetic data to match the task domain through a combination of generative and classical algorithms, ensuring that synthetic datasets provide maximal value to ML-enabled systems.
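One very simple classical technique in this family is color statistics matching, which shifts and scales each color channel of a synthetic image so that its mean and standard deviation match those of real data. The sketch below is only an illustration of that idea; the source does not state which algorithms Synthetic Datasets uses, and the function names and toy pixel values are assumptions.

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Return per-channel (mean, std) for a list of (r, g, b) pixels."""
    return [(fmean(channel), pstdev(channel)) for channel in zip(*pixels)]


def match_color_statistics(synthetic, real):
    """Shift and scale each synthetic channel to match the real channel statistics."""
    syn_stats = channel_stats(synthetic)
    real_stats = channel_stats(real)
    adapted_pixels = []
    for pixel in synthetic:
        adapted = []
        for value, (s_mean, s_std), (r_mean, r_std) in zip(pixel, syn_stats, real_stats):
            scale = r_std / s_std if s_std > 0 else 1.0  # avoid dividing by zero on flat channels
            new_value = (value - s_mean) * scale + r_mean
            adapted.append(min(255.0, max(0.0, new_value)))  # keep values in the 8-bit range
        adapted_pixels.append(tuple(adapted))
    return adapted_pixels


# Toy data: three pixels per "image", values in the 0-255 range.
synthetic_pixels = [(10, 20, 30), (20, 40, 60), (30, 60, 90)]
real_pixels = [(100, 100, 100), (120, 110, 105), (140, 120, 110)]
adapted_pixels = match_color_statistics(synthetic_pixels, real_pixels)
print(channel_stats(adapted_pixels))
```

Statistics matching only aligns global color distributions. Generative approaches can additionally adapt texture and sensor noise, which is why the two kinds of algorithm are typically combined.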
Scalable infrastructure
Utilize Applied Intuition’s Cloud Engine to orchestrate thousands of parallel simulations and generate production-scale datasets in a matter of hours.