The quality of simulation test cases is crucial to developing safe advanced driver-assistance systems (ADAS) and automated driving (AD) systems. Creating high-quality test cases, however, is a difficult task. In this blog, we will explore the different aspects that make some test cases better than others and how ADAS and AD teams can improve the quality of their simulation test cases.
What Makes a Test Case High Quality?
In ADAS and AD simulation, teams create test cases to evaluate the performance of a system (e.g., the perception, prediction, planning, or controls modules) in a virtual environment. Five aspects set apart high-quality test cases from low-quality ones:
- Realism: A high-quality test case describes a situation that could occur in the real world.
- Diversity: A high-quality test case is distinct from other test cases. It describes a situation that differs from the situations described in other test cases.
- Relevance: A high-quality test case is targeted to a specific intended functionality of the ADAS or AD system and effectively covers an appropriate range of test parameter values.
- Adaptability: A high-quality test case is robust to variations in ego behavior and is easily legible and modifiable when needed.
- Evaluation criteria: A high-quality test case measures the relevant metrics and sets the appropriate thresholds so users can accurately assess system performance.
Low-quality test cases can cause the following problems in a team’s ADAS or AD development cycle:
- Unrealistic nature: When a test case describes a situation that is illogical or physically impossible, its simulation results don’t correspond to anything that could happen in the real world and reveal nothing about ADAS or AD performance.
- Limited point of view: When test cases aren’t distinct from one another, teams waste resources testing the same situations multiple times while leaving other areas uncovered.
- Irrelevance: Test cases that fail to test the intended functionality or cover an insufficient or irrelevant range of variations slow down testing and overall development.
- Data inflexibility: Rigid or stale test cases likewise slow down development, as they cannot adapt to changes in ego behavior or the ADAS or AD stack.
- Poor evaluation: Without the appropriate metrics to assess system performance, low-quality test cases ultimately lead to the deployment of unsafe ADAS and AD modules.
How to Create a High-Quality Test Case
When creating test cases for ADAS or AD simulation, teams should pay close attention to the following four areas:
- Actor behaviors and trigger conditions
- Parameter sweeps
- Evaluation criteria and pass/fail thresholds
- Scaling to new maps
1. Actor behaviors and trigger conditions
Every simulation test case consists of the ego and other actors in the scene (e.g., other vehicles, bicycles, and pedestrians). How these actors behave plays an important role in test case quality. For a test case to be high quality, actor behavior needs to be diverse, realistic, and relevant.
First, teams should explore which actor behaviors are represented in their library of test cases. Actors should follow a wide variety of behaviors, ranging from simple to more advanced ones. Simple behaviors include stopping, waiting, decelerating, and accelerating. Advanced behaviors include actors following a specific hand-drawn path, following a lane center with a varying lateral offset, or using a configurable lane change trajectory.
Second, teams should utilize motion and intelligence models to represent real-world driving behavior more faithfully. Realistic actor motion models can prevent unrealistic longitudinal and lateral accelerations or non-physical movements. Good test cases use behavior models like the Intelligent Driver Model (IDM) to allow actors to adjust their speed dynamically to surrounding traffic and react realistically to imminent collisions if necessary. High-quality test cases may also use the lane change model MOBIL to allow actors to move around obstacles or other actors in their path dynamically and naturally.
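To make this concrete, here is a minimal Python sketch of the IDM acceleration rule. The function name and parameter defaults are illustrative, not part of any particular simulation API:

```python
import math

def idm_acceleration(v, v_lead, gap, v0=15.0, T=1.5, a_max=1.5, b=2.0, s0=2.0, delta=4):
    """Intelligent Driver Model: acceleration of a following actor.

    v      -- actor speed (m/s)
    v_lead -- speed of the vehicle ahead (m/s)
    gap    -- bumper-to-bumper distance to the vehicle ahead (m)
    v0     -- desired free-flow speed (m/s); T -- desired time headway (s)
    a_max  -- maximum acceleration; b -- comfortable deceleration (m/s^2)
    s0     -- minimum standstill gap (m)
    """
    dv = v - v_lead  # closing speed
    s_star = s0 + v * T + (v * dv) / (2 * math.sqrt(a_max * b))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (s_star / max(gap, 0.1)) ** 2)

# An actor closing on a slower lead vehicle brakes smoothly instead of
# holding a scripted speed until impact.
print(idm_acceleration(v=14.0, v_lead=8.0, gap=20.0))  # negative: actor decelerates
```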
Finally, teams should ensure that a test case is robust to changes in how the ego might respond in the scene. A common problem with statically defined test cases is that they aren’t adaptable to changes in ego behavior. For example, a poorly built test case might be overfit to the current version of the ADAS or AD stack. Good test cases utilize features such as trigger conditions to allow actors in the scene to move based on events during runtime. Teams can also use dynamic constraints on actor behaviors at runtime, which allows actors to slightly modify their motion to account for variance in ego behavior. These conditional features allow the test case to remain valid even as the ADAS or AD stack improves or changes over time. Teams can also use observers (i.e., metrics measured and evaluated with pass/fail criteria) as automated checks to ensure that the test case continues testing the interaction it was designed to test.
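As a rough illustration of a trigger condition, the sketch below starts an actor’s maneuver once the ego closes within a set range rather than at a fixed timestamp. The `DistanceTrigger` class and its fields are hypothetical names, not a product API:

```python
from dataclasses import dataclass

@dataclass
class DistanceTrigger:
    """Fire once the ego comes within `trigger_range_m` of the actor.

    Tying the maneuver to ego progress (rather than a hardcoded time)
    keeps the interaction intact as ego behavior changes.
    """
    trigger_range_m: float
    fired: bool = False

    def update(self, ego_pos, actor_pos) -> bool:
        if not self.fired:
            dx = ego_pos[0] - actor_pos[0]
            dy = ego_pos[1] - actor_pos[1]
            if (dx * dx + dy * dy) ** 0.5 <= self.trigger_range_m:
                self.fired = True  # latch: the maneuver starts exactly once
        return self.fired

# Each simulation tick: begin the cut-in only once the ego is within 30 m.
trigger = DistanceTrigger(trigger_range_m=30.0)
if trigger.update(ego_pos=(105.0, 0.0), actor_pos=(130.0, 3.5)):
    pass  # start the actor's cut-in maneuver here
```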
2. Parameter sweeps
Parameter sweeps allow teams to expand test coverage by automatically running simulations for a range of different values for each test case parameter. Good parameter sweeps allow teams to create diverse test cases efficiently while preserving relevance and realism.
First, teams should determine which variables in the scene to parameterize and how to “sweep” those parameters. Teams can vary any test case parameter, but the test case type determines which parameters are most useful to vary. In typical test cases involving interactions with other actors, commonly parameterized variables include actor speeds, relative lateral and longitudinal positions, and the relative timing between the ego’s and the actors’ maneuvers. Actor behavior and intelligence model parameters are also often varied (e.g., lateral speed during a cut-in or actor aggressiveness during a merge). Teams can also run the same test case at multiple similar but distinct road locations on a map by varying the initial positions of the ego and the actors.
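Conceptually, a parameter sweep expands one test case into the Cartesian product of its parameter values. A minimal sketch, with made-up parameter names:

```python
import itertools

# Hypothetical sweep definition: each parameter maps to its list of values.
sweep = {
    "actor_speed_mps": [5.0, 10.0, 15.0, 20.0],
    "lateral_offset_m": [-0.5, 0.0, 0.5],
    "trigger_delay_s": [0.0, 1.0],
}

# The full test matrix is the Cartesian product of all value lists.
variations = [
    dict(zip(sweep.keys(), values))
    for values in itertools.product(*sweep.values())
]
print(len(variations))  # 4 * 3 * 2 = 24 variations from one test case
print(variations[0])    # {'actor_speed_mps': 5.0, 'lateral_offset_m': -0.5, ...}
```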
The ranges of parameter sweeps should come from data-backed research into the target operational design domain (ODD). For example, teams should perform drive log analysis to understand the relevant distributions for the varied parameters (e.g., actor speeds, durations of lane changes, maximum decelerations, etc.).
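For instance, a team might derive sweep bounds from logged data rather than round numbers. The snippet below is a sketch using hypothetical log values, not real measurements:

```python
import numpy as np

# Hypothetical: cut-in speeds (m/s) extracted from drive logs in the target ODD.
logged_cut_in_speeds = np.array([7.2, 9.8, 11.5, 12.1, 13.4, 14.8, 16.0, 18.9, 21.3])

# Sweep the 5th-95th percentile range so variations reflect observed behavior.
lo, hi = np.percentile(logged_cut_in_speeds, [5, 95])
sweep_values = np.linspace(lo, hi, num=5)
print(sweep_values)
```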
As the various values of a parameter sweep combine to form many distinct variations, it is essential to verify that each variation is relevant, realistic, and valid. For example, varying the interaction timing while also varying ego and actor speeds indiscriminately will result in many irrelevant variations that do not contain the intended interaction. Parameter sweeps must account for the other sweeps within the test case to preserve the intended interaction. High-quality test cases use dynamic math expressions for certain fields of the test case description, which reference the values of the parameter sweep variables to account for the variation correctly.
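One way to picture such an expression: derive the actor’s start position from the swept speed so that its arrival time at the conflict point stays fixed across variations. The constant and function names below are assumptions for illustration:

```python
# If the ego reaches the conflict point ~8 s into the scenario, place the
# actor far enough upstream that it arrives at the same moment, whatever
# speed the sweep assigned it.
EGO_TIME_TO_CONFLICT_S = 8.0

def actor_start_offset(actor_speed_mps: float) -> float:
    """Distance upstream of the conflict point the actor must start from."""
    return actor_speed_mps * EGO_TIME_TO_CONFLICT_S

for speed in [5.0, 10.0, 20.0]:
    print(speed, actor_start_offset(speed))  # 40 m, 80 m, 160 m upstream
```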
3. Evaluation criteria and pass/fail thresholds
The evaluation criteria and pass/fail thresholds of a test case make or break its quality. Evaluation criteria determine whether the ego passes or fails a test case. Pass/fail thresholds set a value for each variable at which the ego goes from a “pass” to a “fail” and vice versa.
For example, the Applied Development Platform (ADP) includes three types of observers to ensure test case quality and enable teams to evaluate an ADAS or AD system’s performance:
- Validity observers ensure that the test case remains accurate regarding the intended test design. For example, a simple test case might contain an observer to check that the ego reaches the intended destination. In another test case, a validity observer might check whether the ego and another actor were ever within a certain distance threshold of each other.
- Core intent observers ensure that the ego completes the main goal of the test case (e.g., reaching the intended destination in a left-turn scenario).
- Global observers ensure that the ego adheres to all global safety and comfort standards it needs to meet in all real-world situations.
An ADAS or AD system passes a test case only if all of its observers pass. The criteria for passing or failing an observer vary based on the observer type. Some observers are binary, like “collision” or “reach destination,” whereas others monitor a continuous value and pass if that value stays above or below a specified threshold. In cases where a specific threshold must be chosen, Test Suites use well-established values from relevant studies by default. Alternatively, customers can use their own predefined thresholds.
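A simplified sketch of this pass/fail logic follows; the observer names and threshold values are illustrative, not defaults from any particular tool:

```python
from typing import Callable, Dict

def evaluate(observers: Dict[str, Callable[[dict], bool]], sim_result: dict) -> bool:
    """The test case passes only if every observer passes."""
    return all(check(sim_result) for check in observers.values())

observers = {
    "no_collision": lambda r: not r["collided"],          # binary (validity)
    "reached_destination": lambda r: r["reached_goal"],   # binary (core intent)
    "min_ttc": lambda r: r["min_ttc_s"] >= 1.5,           # threshold (safety)
    "max_decel": lambda r: r["max_decel_mps2"] <= 3.5,    # threshold (comfort)
}

sim_result = {"collided": False, "reached_goal": True,
              "min_ttc_s": 2.1, "max_decel_mps2": 2.8}
print(evaluate(observers, sim_result))  # True only if all four checks pass
```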
In addition to using evaluation criteria and pass/fail thresholds to evaluate an ADAS or AD system’s performance on individual test cases, teams should also perform an aggregate analysis of ADAS or AD behavior against all relevant test coverage. This way, teams can better understand the ADAS or AD’s overall performance across the ODD of interest.
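As a sketch of what such aggregate analysis might look like, the snippet below groups hypothetical per-variation results by a swept parameter; the numbers are invented for illustration only:

```python
from collections import defaultdict

# Hypothetical per-variation outcomes: (swept actor speed, passed?).
results = [(5.0, True), (5.0, True), (10.0, True), (10.0, False),
           (15.0, False), (15.0, False), (20.0, True), (20.0, False)]

pass_counts = defaultdict(lambda: [0, 0])
for speed, passed in results:
    pass_counts[speed][0] += passed  # passes
    pass_counts[speed][1] += 1       # total runs

for speed, (passed, total) in sorted(pass_counts.items()):
    print(f"actor speed {speed:>5.1f} m/s: {passed}/{total} passed")
# A pass rate that degrades with actor speed localizes the weakness in the ODD.
```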
4. Scaling to new maps
High-quality test cases should be robust and scalable to different maps and geometries as teams might update or change underlying map data during ADAS and AD development. Applied Intuition’s test cases are map-agnostic, making them robust to map changes and allowing teams to adapt them to different maps as needed. Map-agnostic test cases define the initial ego position, initial actor positions, and observer regions using relative placement in relation to a single pair of coordinates. Using abstract scenarios, which are also map-agnostic via relative placement, is another way to ensure test case scalability.
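To illustrate the idea of relative placement, here is a minimal sketch that resolves offsets from a single anchor pose into map coordinates; the anchor values and function are assumptions for illustration, so the same test case transfers to a new map by changing only the anchor:

```python
import math

def resolve(anchor_xy, anchor_heading_rad, forward_m, left_m):
    """Map coordinates of a point `forward_m` ahead and `left_m` left of the anchor."""
    x = anchor_xy[0] + forward_m * math.cos(anchor_heading_rad) - left_m * math.sin(anchor_heading_rad)
    y = anchor_xy[1] + forward_m * math.sin(anchor_heading_rad) + left_m * math.cos(anchor_heading_rad)
    return (x, y)

anchor = (512.0, -87.0)       # e.g., the stop line of the target intersection
heading = math.radians(90.0)  # travel direction at the anchor on this map

ego_start = resolve(anchor, heading, forward_m=-60.0, left_m=0.0)    # 60 m before the anchor
actor_start = resolve(anchor, heading, forward_m=25.0, left_m=-7.0)  # in the cross street
print(ego_start, actor_start)
```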
Example 1: Low-Quality Test Case
The video below shows an example of a low-quality test case.
The video shows all distinct variations of actor behavior overlaid into a single scene. The ego enters an intersection with a green light while another actor enters the intersection from the right despite having a red light. The test case only contains the default observers (collision and stack health observer…).
Lead time: In this example, both the ego and the actor start too close to the intersection, which results in insufficient lead time to represent a realistic real-world interaction. High-quality test cases allow 5-10 seconds for the ego to initialize and build up state in the simulated environment before the interaction of interest occurs.
Observers: Because this test case only contains default observers, if the ego began to take a right turn instead of proceeding straight at the intersection, the test would break and become meaningless. High-quality test cases contain several additional observers to ensure test validity and provide information about broader safety and comfort performance.
Parameter sweeps: The test case performs a parameter sweep over a wide range in actor speed, but the variation in speed is not accounted for in the initial position or timing of the actor. By the time the ego reaches the beginning of the intersection, the actor has already crossed the ego’s path in more than half of the variations. These variations are trivially passable and not useful for ADAS and AD testing.
The only variations that require a response from the ego are the ones with a slow actor, which is both rare in the real world and the easier case for the ego to handle. High-quality test cases should control for varied actor speed by constraining the actor to arrive at the same point at the same time across all variations. That way, the ego is forced into the desired interaction, and there is no loss of coverage across the parameterized speed range.
Notably, this test case contains only one parameter sweep variable, but a similar principle applies to each additional variable in the test case. When combining multiple parameter sweeps, an even smaller subset of variations will be useful for ADAS and AD testing.
Result: The above issues result in a test case that will likely pass, even though the test case is not successfully testing what it should cover in theory. The test case description might read: “Ensure the ego can safely avoid a collision with a red light runner that proceeds through the intersection at 5-20 m/s.” Stakeholders may see this test case passing and assume that the ego’s behavior is sufficient for the entire range of possible variations. This false sense of confidence can lead to hidden risks and unexpected failures later in the testing process or, even worse, in the real-world production deployment of an unsafe ADAS or AD system.
Example 2: High-Quality Test Case
The following video shows an example of a high-quality test case:
In this example, we preview the timing of the pedestrian crossing in all distinct variations against a constant-velocity ego. The pedestrian speed, crossing angle, and crossing timing are all varied. The test case also has a set of observers that capture relevant test criteria around safety, comfort, and scenario validity (not pictured here).
Lead time: The interaction between the ego and the pedestrian takes place eight seconds or more into the test case, giving the ego plenty of time to respond properly.
Observers: This test case does not simply test for collisions. On the safety side, minimum time-to-collision and minimum distance observers check that the ego never gets too close to the pedestrian. On the comfort side, observers limit maximum lateral and longitudinal acceleration and jerk. Lastly, a test validity observer checks that the ego reaches the intended destination while coming within the general vicinity of the pedestrian at least once in the scenario.
Parameter sweeps: The test case contains significantly more parameterization than the low-quality example above, resulting in broad coverage of this interaction. Teams can easily add more granularity if desired.
Even though the pedestrian’s speed, crossing angle, and lane-crossing point all vary widely, we maintain appropriate interaction timing in every variation by constraining the pedestrian’s motion: the pedestrian must arrive at the lane center at a specified time that coincides with the moment the ego reaches the crossing point. As a result, if the ego does not brake in response to the pedestrian heading toward the road, a collision will ensue.
Result: The high-quality test case shown above provides a breadth of coverage over a safety-critical situation that could occur in the real world. The test gives a valuable signal about the ADAS or AD system’s performance in such a case and safeguards against future changes to the ego’s behavior by using robust scenario design best practices.
Applied Intuition’s Approach
Applied Intuition helps ADAS and AD development teams build high-quality simulation test cases, run deterministic tests effectively, and validate their system end-to-end. Our predefined simulation test cases, Test Suites, allow teams to jump-start their test case creation process and ensure that test cases are always high quality. Contact our Test Suites team to learn more.