A Computationally Efficient, Machine Learning-Based Approach to Identify Failure Cases for AV Validation

June 14, 2021

Building a fleet of millions of vehicles that can safely operate in an unconstrained world is an enormous challenge because there is a near-infinite number of scenarios in real-world driving. A single intersection could easily contain a million different scenarios, and it is difficult to conclude how much testing is enough to ensure safety for all road users. Safety is an important vehicle selection factor for consumers today, and regulatory compliance will be required for commercial deployment of AVs. Even with regulations, consumers will still be wary about the safety of AVs. According to a poll conducted by Partners for Automated Vehicle Education (PAVE), 48% of Americans said they “would never get in a taxi or ride-share vehicle that was being driven autonomously.”

To confidently state that the safety of an autonomous vehicle system is validated, development teams must test and pass not only nominal scenarios but also edge case scenarios from a large ‘parameter space’ describing a staggering number of combinations of parameters such as speed limits, environmental conditions (e.g., weather, daytime/nighttime), and the presence of actors (e.g., surrounding vehicles, pedestrians, objects) and their behaviors. While approaches like runtime constraints have been suggested as catch-all solutions to ensure that generated scenarios adequately test the intended interaction, the reality is that a more intelligent, computationally efficient approach is needed to detect more failure cases. In this blog, the Applied Intuition team will first survey today’s common approaches to identifying these interesting scenarios and then describe how a machine learning-based approach can search for failure cases more efficiently and proactively.

Common Approaches to Finding Failing Scenarios

Exhaustively testing an entire parameter space is computationally expensive because of combinatorial explosion (millions of combinations), and it is not very useful because most combinations are uninteresting or even unrealistic. To explore a large parameter space in simulation, then, it is important to filter out invalid scenarios (e.g., unrealistic actor behaviors or situations where the intended interactions don’t occur) and to intelligently select interesting scenarios in which the AV algorithms could potentially fail (e.g., a pedestrian jumping into the street).

Disengagement and crash reports

Typically, the process of understanding scenario types (recurring traffic cases) starts with examining disengagement and crash databases. For example, autonomous vehicle collision reports mandated by the California Department of Motor Vehicles (DMV) are a valuable data source for AV programs to understand collision patterns. By examining prior reports, development teams can gain insights such as crash distributions by feature or condition (e.g., time of day, road conditions), factors contributing to AV crashes and disengagements, and the safety performance of AVs (measured by crash frequency per unit distance) relative to human drivers. These insights can then inform test scenario designs that focus the development effort.

While the analysis of real-world adverse events is helpful, it is not enough to validate the safety of autonomous systems: the collisions encountered by test vehicles represent only a small fraction of the high-risk scenarios that AVs could encounter after production deployment.

Requirements-driven testing

To overcome the constraints of relying on real-world data, a more procedural and methodical approach can be used: deriving scenarios from the system requirements themselves. With this approach, validation programs define test cases for their AV stack directly from system requirements. These requirements and test cases, once parameterized, can be linked to parameterized scenarios that sweep over many permutations of the parameters. These generated scenarios help AV system engineers test their stacks against incredibly diverse situations, many of which would be extremely dangerous or otherwise impossible to test in the real world.
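
As an illustration, here is a minimal sketch of such a parameter sweep in Python. All names (`run_simulation`, the cut-in parameters, the toy metric) are hypothetical, not Applied Intuition’s API; a real sweep would invoke a simulator and the AV stack under test:

```python
from itertools import product

def run_simulation(ego_speed, cut_in_gap, actor_speed):
    """Stand-in for an expensive simulator run; a real test would exercise
    the AV stack and return metrics such as minimum time to collision."""
    # Toy metric so the sketch runs end to end: smaller gaps and larger
    # closing speeds produce lower (worse) TTC values.
    closing_speed = max(ego_speed - actor_speed, 0.1)
    return {"min_ttc_s": cut_in_gap / closing_speed}

# Hypothetical parameter grid for a highway cut-in scenario derived from a
# requirement such as "the ego shall maintain a safe following distance."
ego_speeds_mps = [20.0, 25.0, 30.0]          # ego initial speed (m/s)
cut_in_gaps_m = [5.0, 10.0, 20.0, 40.0]      # gap at which the actor cuts in (m)
actor_speeds_mps = [15.0, 20.0, 25.0, 30.0]  # cut-in actor speed (m/s)

# A naive sweep runs every combination: 3 * 4 * 4 = 48 simulations here,
# but realistic scenarios with more parameters explode combinatorially.
for ego_v, gap, actor_v in product(ego_speeds_mps, cut_in_gaps_m, actor_speeds_mps):
    metrics = run_simulation(ego_v, gap, actor_v)
    print(f"ego={ego_v} gap={gap} actor={actor_v} -> TTC {metrics['min_ttc_s']:.1f}s")
```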

There are a variety of ways to improve on parameter sweep-based requirements testing, the most straightforward of which is parameter-based pruning. Parameter sweeps generally produce three categories of scenarios: those that are realistic and easy for the AV stack, those that are realistic and difficult, and those that are unrealistic. Parameter-based pruning allows scenario creators to exclude scenarios that they know are unrealistic, impossible, or otherwise irrelevant based on the input parameters alone. By combining parameter sweep scenarios with pruning, AV teams can get a better understanding of their true coverage over a realistic scenario space.
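
Continuing the hypothetical cut-in sweep above, parameter-based pruning amounts to a predicate over the input parameters that rejects combinations before any simulation is spent on them (the rules below are invented for illustration):

```python
from itertools import product  # reuses the parameter grids from the sweep sketch above

def is_worth_simulating(ego_speed, cut_in_gap, actor_speed):
    """Pruning predicate over input parameters only (hypothetical rules a
    scenario creator might encode). Returns False for combinations known
    to be unrealistic or irrelevant, so they are never simulated."""
    # A much slower actor cutting into a tiny gap is unrealistic on a
    # free-flowing highway; exclude it.
    if actor_speed < ego_speed - 10.0 and cut_in_gap < 10.0:
        return False
    # A faster actor cutting in far ahead makes the interaction trivial;
    # it is not worth the compute.
    if cut_in_gap >= 40.0 and actor_speed >= ego_speed:
        return False
    return True

grid = list(product(ego_speeds_mps, cut_in_gaps_m, actor_speeds_mps))
pruned = [p for p in grid if is_worth_simulating(*p)]
print(f"pruned {len(grid) - len(pruned)} of {len(grid)} combinations")
```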

Runtime constraints

The biggest drawback of parameter-based pruning is that it relies on scenario creators knowing how scenario parameters will affect simulation properties. Runtime constraints remedy this by allowing scenario creators to place constraints on the simulation properties themselves. By specifying constraints on interactions between actors and the ego during simulation time, it is possible to discard runs that do not exhibit the desired behaviors, or to force those behaviors to occur.

Imagine a scenario whose goal is to test how the ego avoids aggressive actors merging onto a highway. The parameters would include those of a highway traffic spawner and the speed or aggressiveness factor of a designated aggressive actor. Some resulting simulations don’t make sense and should be pruned or adjusted: for example, runs where the background highway traffic collides with the aggressive actor, or where the aggressive actor hits the ego vehicle from behind. These outcomes depend on the ego’s behavior and are not clearly semantically tied to the input parameters, so parameter-based pruning cannot catch them. Runtime constraints, by contrast, would detect these cases early and either trigger a simulation failure or force the simulation to continue within boundary conditions.
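
A minimal sketch of such a runtime constraint, assuming a hypothetical collision-event schema (who struck whom, and from which side); none of these names come from a real simulator API:

```python
from enum import Enum

class Verdict(Enum):
    OK = 0
    INVALID_RUN = 1    # discard: the intended interaction never occurred
    VALID_FAILURE = 2  # keep: a genuine failure worth investigating

def check_collision(event):
    """Runtime constraint over an observed collision event. Unlike
    parameter-based pruning, this inspects what actually happened in
    simulation rather than the input parameters."""
    pair = {event["striker"], event["victim"]}
    if pair == {"aggressive_actor", "ego"}:
        # The aggressive actor rear-ending the ego means it merged too
        # late; the intended merge interaction never happened.
        if event["striker"] == "aggressive_actor" and event["side"] == "rear":
            return Verdict.INVALID_RUN
        return Verdict.VALID_FAILURE
    # Any collision involving background traffic means the scenario setup
    # broke down before the interaction of interest.
    return Verdict.INVALID_RUN

# Example: evaluate the constraint on collision events as they stream in.
event = {"striker": "ego", "victim": "aggressive_actor", "side": "front"}
print(check_collision(event))  # -> Verdict.VALID_FAILURE
```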

This approach faces some limitations. Like parameter-based pruning, runtime constraints can be difficult to define in a semantically meaningful fashion. In the rear-end collision example above, it is easy to filter for collisions from the rear but difficult to determine whether the cause is out of bounds for the test case. For instance, the rear-end collision could be caused by the ego slowing down too quickly, a variation that should be included in the test suite; it could also be caused by the actor merging too late, a setup failure that should be excluded. Additionally, runtime constraints can introduce some unrealism, especially when they force behaviors to occur.

Despite these limitations, runtime constraints are a powerful way to semantically prune pieces of a scenario’s parameter space without knowing how the parameters map to simulation properties. Combined with random or probabilistic sampling methods on the input parameter space, they can greatly reduce the compute required to validate an AV stack.

Machine Learning-Based Approach (Auto-Sampling) 

A machine learning (ML)-based approach can help development teams explore the parameter space for events of interest more intelligently. With this approach, which we call auto-sampling, normal, uninteresting cases are automatically minimized when validation teams execute simulation tests. Because only a subset of combinations is run, chosen to find interesting cases that either result in a failure or come close to one, auto-sampling can drastically reduce compute time and cost.

At a high level, this approach works by modeling the autonomous vehicle stack as a black- or grey-box function mapping a scenario’s parameter space to the results of the simulation. This abstraction allows our auto-sampling mechanism to apply a variety of statistical techniques to sample only the most important, interesting edge cases.
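
In code, this abstraction is nothing more than a function from a parameter vector to one or more scalar simulation metrics. A minimal sketch, with a toy response surface standing in for a full simulator run (all names and the surface itself are hypothetical):

```python
import numpy as np

def response(params):
    """Black- or grey-box view of the stack under test: map scenario
    parameters to a scalar metric, here minimum time to collision (TTC)
    in seconds. This toy surface stands in for an expensive simulation."""
    actor_speed, trigger_distance = params
    # Hypothetical topography: a narrow pocket of low TTC (near-failures)
    # hidden inside a mostly benign parameter space.
    return 4.0 - 3.5 * np.exp(-((actor_speed - 12.0) ** 2 / 4.0
                                + (trigger_distance - 30.0) ** 2 / 50.0))
```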

Auto-sampling has the potential to reduce the number of sampled parameter combinations by orders of magnitude. For example, in a standard unprotected left turn scenario with an ego and two actors, thousands of combinations of initial parameters may need to be sampled naively or randomly in order to induce a failure. With auto-sampling, the scenario’s parameter response function for time to collision (TTC) can be approximated and searched intelligently. Using this technique, it is possible to save as much as 90% of the simulation cost and compute time associated with running complex parameterized scenarios, simply by focusing on the most important areas of the parameter space.
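
As one concrete sketch of such a search (the post does not name the exact technique, so a cross-entropy-style sampler stands in here), the idea is to draw a small batch from the `response` surface above, keep the samples closest to failure, and refit the sampling distribution around them:

```python
rng = np.random.default_rng(0)

# Broad initial guess over (actor_speed, trigger_distance).
mean = np.array([20.0, 60.0])
std = np.array([8.0, 30.0])

for _ in range(15):
    # Sample a small batch instead of sweeping the full grid.
    samples = rng.normal(mean, std, size=(30, 2))
    ttcs = np.array([response(p) for p in samples])
    # Keep the "elite" samples with the lowest TTC (closest to failure)
    # and tighten the sampling distribution around them.
    elite = samples[np.argsort(ttcs)[:6]]
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3

# 15 * 30 = 450 simulations instead of a dense sweep of thousands.
print(f"near-failure region: {mean}, min TTC: {response(mean):.2f}s")
```

The sampler concentrates compute around the low-TTC pocket rather than spending it uniformly, which is the intuition behind the compute savings described above.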

There are limitations to auto-sampling that largely stem from fundamental difficulties in black-box optimization. Response functions need to be relatively well-behaved and fully deterministic in order for auto-sampling to be most effective. Additionally, different sampling techniques are better suited for different response function topographies, and it is difficult to know beforehand what the response function will look like. However, by combining auto-sampling with parameter-based pruning and runtime constraints, it is possible to remedy most of these problems.

Auto-sampling learns from your stack’s failures, explores its unique weak points, and surfaces potential failure modes in a fraction of the time that traditional parameterized scenarios require. These techniques augment verification and validation workflows by finding and testing the 1% of failure cases rather than the 99% of cases that are easy to handle. Ultimately, auto-sampling adds a powerful tool to your verification and validation toolbox, allowing you and your team to tell a stronger coverage story.

Applied Intuition’s Approach

The Applied Intuition team believes that exploratory validation, which proactively searches for potential failure cases, should complement existing validation workflows. Our validation and verification management tool helps AV development teams detect failure cases early and mitigate the risk of costly, unsafe events occurring in the real world. The tool seamlessly integrates with all of Applied Intuition’s simulation products and optimizes simulation runs to discover interesting events intelligently. Through robust in-product visualization, our tool supports the communication of safety and functional requirements to stakeholders and regulators. To learn more about our tool or our approach to validation workflows, contact our engineering team!