Behavior monitoring

Coverage and Performance Metrics

Metrics collected during test execution are used to answer two critical questions: how well was the SUT (System Under Test) tested, and how well did the SUT perform within these tests. The first question is answered by the coverage grade, the multi-dimensional representation of all situations encountered during testing. The second question is answered by the performance grade, the collection of Key Performance Indicators (KPIs), normalized within their context.

Together, these metrics provide insight into the following questions:

Coverage:

What is the current coverage grade (overall and specifically for a given scenario)?
What are the main coverage holes for a scenario? How do they cluster, in other words, are there big uncovered areas?

Performance:

What were the values for a specific KPI (overall and specifically for a given scenario)? Do those cluster in some interesting way?
How well does the SUT perform on specific KPI grades (overall and for a scenario)? Where is this worse / better than the previous SW release?
How many runs actually failed with a SUT error, in other words, with a grade below the threshold? How do they cluster?
What is the trend in all of these relative to the previous week? Which metrics improved and which degraded?

Both coverage and performance metrics defined in OSC2 typically implement a verification plan, specifying goals and thresholds. The verification plan is a result of an engineering effort driven by requirements such as AV performance, ODD, safety standards and so on.

cover()

Purpose

Define a coverage data collection point.

Category

Struct, actor, or scenario member

Note

cover() is also allowed in the with block of field declarations.

Syntax

cover([name: ] <name> 
    [, expression: <exp>] 
    [, <param>* ])

Syntax parameters

<name>: (Required) Is a user-defined identifier composed of any number of characters A-Z, a-z, 0-9, and underscore (_). An identifier must not begin with a digit or an underscore.
<exp>: (Optional) Is an expression using objects in the enclosing construct. The expression must be of scalar type. The value of the cover item is the value of the expression when the cover group event occurs. If <exp> is not provided, the expression is derived from the name.
<param>*: (Optional) See cover() and record() parameters.
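
The naming rule for <name> can be checked mechanically. The following Python sketch is only an illustration of the rule as worded above; the helper is_valid_cover_name is hypothetical and is not part of OSC2 or of any tool.

```python
import re

# Rule as described for <name>: letters, digits and underscore only,
# and the first character must be a letter (no leading digit or underscore).
# COVER_NAME_RE and is_valid_cover_name are hypothetical names for this sketch.
COVER_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_cover_name(name: str) -> bool:
    """Return True if `name` satisfies the identifier rule described above."""
    return COVER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["speed1", "rel_speed_2", "_speed", "1speed"]:
        print(candidate, is_valid_cover_name(candidate))
```

Under this reading, speed1 and rel_speed_2 are accepted, while _speed and 1speed are rejected.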

Description

Coverage is a mechanism for sampling key parameters related to scenario execution. Analyzing aggregate coverage helps determine how safely the AV behaved and what level of confidence you can assign to the results.

For example, to determine the conditions under which a cut_in_and_slow_down scenario failed or succeeded, you might need to measure:

The speed of the sut.car
The relative speed of the passing car
The distance between the two cars

You can specify when to sample these items. For example, the key events for this scenario are the start and end events of the change_lane phase.

Cover items that have the same sampling event are aggregated into a single metric group, along with record data sampled by the same event. The default event for collecting coverage is end.
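
The grouping rule above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not the tool's implementation: the function group_by_event, the item names, and the representation of an item as a (name, event) pair are all hypothetical. The only facts taken from the text are that items sharing a sampling event form one group and that the default event is end.

```python
from collections import defaultdict

# Default sampling event, as stated in the text above.
DEFAULT_EVENT = "end"


def group_by_event(items):
    """Group metric item names by their sampling event.

    `items` is an iterable of (name, event) pairs, where event is None
    when the item does not name one. Hypothetical model for illustration.
    """
    groups = defaultdict(list)
    for name, event in items:
        # Items without an explicit event fall back to the default event.
        groups[event or DEFAULT_EVENT].append(name)
    return dict(groups)


if __name__ == "__main__":
    items = [
        ("sut_speed", "start"),   # hypothetical cover item sampled on start
        ("rel_speed", "start"),   # hypothetical cover item sampled on start
        ("distance", None),       # no event given, so sampled on end
    ]
    print(group_by_event(items))
```

In this model, sut_speed and rel_speed land in one group keyed by start, and distance lands in the group keyed by end.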

If the range of data that you want to collect is large, you might want to slice that range into subranges or buckets. For example, if you expect the SUT to travel at a speed between 10 kph and 130 kph, specifying a bucket size of 10 gives you 12 buckets, the first of which covers 10 kph up to, but not including, 20 kph.

Note

Buckets are always open on the right end, so [1..2] includes 1 but not 2.
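
The bucket arithmetic can be worked through in a few lines of Python. This is a sketch of the slicing rule as described here, not the runtime's actual algorithm; make_buckets and bucket_index are hypothetical helpers. In particular, treating the upper limit of the range (130 in the example) as outside the last bucket is an assumption that follows from applying the right-open rule literally.

```python
import math


def make_buckets(low, high, every):
    """Slice the range low..high into right-open buckets of width `every`.

    Each bucket is returned as a (start, end) pair meaning start <= v < end.
    Hypothetical helper for illustration only.
    """
    if every <= 0 or high <= low:
        raise ValueError("expected every > 0 and high > low")
    count = math.ceil((high - low) / every)
    # The last bucket is clipped to `high` if the range is not an exact multiple.
    return [(low + i * every, min(low + (i + 1) * every, high))
            for i in range(count)]


def bucket_index(value, low, high, every):
    """Return the index of the bucket containing `value`, or None if outside.

    Assumption: the right-open rule also applies to the last bucket,
    so a value equal to `high` is not counted.
    """
    if value < low or value >= high:
        return None
    return int((value - low) // every)


if __name__ == "__main__":
    buckets = make_buckets(10, 130, 10)
    print(len(buckets), buckets[0], buckets[-1])
    print(bucket_index(19.9, 10, 130, 10), bucket_index(20, 10, 130, 10))
```

For the range 10 to 130 with a bucket size of 10, this yields 12 buckets; a sample of 19.9 falls in the first bucket and a sample of exactly 20 falls in the second.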

You can also specify an explanatory line of text to display about the cover item during coverage analysis.

This example defines a name, a unit of measurement, a line of display text, and a range and a range slice for the field speed:

OSC2 code: behavior monitoring with cover()

    speed1: speed
    cover(speed1, unit: kph,
        text: "Absolute speed of ego (in km/h)",
        range: [10..130], every: 10)

Item type	Resolved buckets - cover	Resolved buckets - record
int, uint, float, physical	A single [MIN_VALUE..MAX_VALUE] bucket, where MIN_VALUE and MAX_VALUE are the minimal and maximal values that can be represented in the type.	A new bucket will be opened by the runtime for each sampled value.
enum	A single-value bucket for each member.	A single-value bucket for each member.
bool	Single-value 'true' and 'false' buckets.	Single-value 'true' and 'false' buckets.
string	A new bucket will be opened by the runtime for each distinct value.	A new bucket will be opened by the runtime for each sampled value.

Name	Description
log()	Used to report major events and messages
log_info()	More detailed reporting
log_debug()	Verbose information that may be useful for debugging
log_trace()	Most detailed information used to trace execution
