Scheduled Testing Feedback and Observability

Once a model is in production, Robust Intelligence can provide detailed information on the model’s performance to enable you to identify and correct issues.

For a model under testing, summary information about model health appears on the overview panel of the project that contains the model. Robust Intelligence monitors machine learning model performance across the following three risk categories:

  • Operational tests a model’s overall performance and data stability over time.

  • Security tests a model’s resilience against compromise from external attacks.

  • Fairness tests a model’s outcome for fair treatment among subsets in the data.

Viewing a stress test schedule

  1. Sign in to the Robust Intelligence UI. The Workspaces page appears.

  2. Click a workspace. The Workspaces summary page appears.

  3. Select a project.

    You can filter or sort the list of projects in a workspace with the Sort and Filter controls in the upper right. Click the glyph to the right of the Filter control to switch between list and card display for projects. Type a string in Search Projects… to display only projects that match the string.

  4. The Production panel on the right shows the Schedule ID and Schedule Status for the test. If no schedule is configured, see Scheduled Stress Testing.
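You can also check a schedule programmatically. The sketch below is illustrative only and assumes the Robust Intelligence Python SDK (`rime_sdk`); the `get_project` and `get_schedule_info` calls and the returned field names are assumptions that may not match your installed SDK version, so consult the SDK reference before relying on them.

```python
# Illustrative sketch only. rime_sdk provides a Client class, but the
# get_project and get_schedule_info calls below are assumptions; check the
# SDK reference for your installed version before relying on them.
from rime_sdk import Client

client = Client("rime.<your-domain>.com", api_key="<API_KEY>")  # endpoint and key are placeholders
project = client.get_project("<PROJECT_ID>")                    # assumed lookup by project ID

schedule = project.get_schedule_info()                          # hypothetical schedule accessor
print(schedule["schedule_id"], schedule["status"])
```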

Viewing the CT risk pages

  1. Sign in to the Robust Intelligence UI. The Workspaces page appears.

  2. Click a workspace. The Workspaces summary page appears.

  3. Select a project.

  4. Select the Continuous Testing tab in the left panel. This shows the Continuous Testing overview panel for the project you selected.

Continuous Testing overview panel

The Continuous Testing overview panel shows, for this project:

  • a graph of test results over time; and

  • a list of events summarizing the results of each test run with respect to measures of Operational, Security, and Fairness risk.

    • Click the Filter button at the upper right to help find an event. See the next section for help with filters.

    • Click on a row to inspect an event. This displays the inspection page for the continuous testing run, including a list of all tests that ran. Click on a test to see its measures, Key Insights, and any features that were flagged in this test run.
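The same run-level information can be retrieved programmatically. The sketch below assumes the `rime_sdk` client; `list_test_runs` and `get_result_df`, and the shape of the returned results, are assumptions intended to illustrate the workflow rather than the exact SDK surface.

```python
# Illustrative sketch only. list_test_runs and get_result_df are assumptions
# about the rime_sdk surface; the columns in the result vary by test suite.
from rime_sdk import Client

client = Client("rime.<your-domain>.com", api_key="<API_KEY>")  # endpoint and key are placeholders
project = client.get_project("<PROJECT_ID>")                    # assumed lookup by project ID

for run in project.list_test_runs():        # assumed iteration over continuous test runs
    results = run.get_result_df()           # assumed: per-test results as a pandas DataFrame
    print(run, results.head())
```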

Filter events in the Continuous Testing overview panel

Use the Filter button at the upper right to help find an event. See the filter criteria explanations below.

Filter | Description | Example states
Bin date | Show only events generated from data points whose dates fall within the range you specify. | Last 7 days
Run date | Show only events generated from test runs that were completed within the range you specify. | Last 7 days
Model | Model name | "My image classification"
Status | Run status of the continuous test run. | Completed, Failed, In Progress
Operational | Operational risk result of the test run. | Alert, Warning, Pass
Fairness | Fairness risk result of the test run. | Alert, Warning, Pass
Security | Security risk result of the test run. | Alert, Warning, Pass
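If you export event data yourself, these UI filters map naturally onto simple dataframe filters. The following generic pandas sketch uses made-up column names (`run_date`, `status`, and so on) purely to mirror the Run date and Status filters; it is not a Robust Intelligence export format or API.

```python
# Generic pandas sketch: column names are illustrative, not an RI export format.
import pandas as pd

events = pd.DataFrame({
    "model":       ["My image classification", "My tabular model"],
    "run_date":    pd.to_datetime(["2023-05-01", "2023-05-06"]),
    "status":      ["Completed", "Failed"],
    "operational": ["Pass", "Alert"],
})

# Mirror the "Run date: Last 7 days" and "Status: Completed" filters.
cutoff = pd.Timestamp.today().normalize() - pd.Timedelta(days=7)
recent_completed = events[(events["run_date"] >= cutoff) & (events["status"] == "Completed")]
print(recent_completed)
```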

Tests

Abnormal inputs test

Abnormal inputs monitor | Description
Unseen Categorical | Tests the model's response to data that contains categorical values that are never observed in the reference dataset.
Rare Categories | Tests the model's response to data that contains categorical values that are rarely observed in the reference dataset.
Numeric Outliers | Tests the model's response to data that contains numeric values outside the typical range for that feature in the reference dataset.
Abnormality Rate | Tests the total percentage of rows with any abnormalities.
Feature Type Check - Count | Tests the number of feature values that are of the incorrect type.
Null Check - Count | Tests the number of null values for features that do not have nulls in the reference dataset.
Empty String - Count | Tests the number of empty or null strings for each string feature.
Capitalization - Count | Tests the number of string values that are capitalized differently from those observed in the reference dataset.
Inconsistencies - Count | Tests the number of data points with pairs of feature values that are inconsistent with each other.
Required Characters - Count | Tests the number of required characters in feature values.
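To make the intent of these monitors concrete, the following pandas sketch shows naive versions of three of the checks (unseen categories, null counts, and numeric outliers). It illustrates the concepts only; the actual Robust Intelligence tests are more configurable and more sophisticated than these one-liners.

```python
# Naive, conceptual versions of three abnormal-input checks; not RI's implementation.
import pandas as pd

reference = pd.DataFrame({"color": ["red", "blue", "red"], "price": [10.0, 12.0, 11.0]})
incoming  = pd.DataFrame({"color": ["red", "green", None], "price": [10.5, 500.0, 11.2]})

# Unseen Categorical: values never observed in the reference dataset.
unseen = set(incoming["color"].dropna()) - set(reference["color"])

# Null Check - Count: nulls in a feature that has no nulls in the reference dataset.
null_count = incoming["color"].isna().sum() if reference["color"].notna().all() else 0

# Numeric Outliers: values outside the reference range for the feature (simplistic bound).
low, high = reference["price"].min(), reference["price"].max()
outlier_count = int(((incoming["price"] < low) | (incoming["price"] > high)).sum())

print(unseen, null_count, outlier_count)
```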

Security risk

Tests for security risk assess the security of the model and underlying dataset, providing alerts in cases of model evasion or subversion.

Security risk monitor | Description
Data Poisoning | Tests for corrupted input data.
Model Evasion | Tests for adversarial evasion attacks.

Fairness and Compliance risk

Tests for fairness and compliance risk assess a model’s outcome for fair treatment among subcategories in the data.

Fairness monitor | Description
Intersectional Group Fairness | Tests for changes in model performance over different slices of data from the intersection of two protected features.
Positive Prediction Rate | Tests whether the model's positive prediction rate differs significantly across different subsets of protected features.
Predictive Equality | Tests whether model performance differs significantly across different subsets of protected features.
Equal Opportunity Recall | Tests whether model recall differs significantly across different subsets of protected features.
Class Imbalance | Tests whether any subsets of a feature have high class imbalance bias as a result of having a significantly smaller sample size.
Demographic Parity | Tests how model performance over subsets of protected features compares to the subset with the highest performance.
Protected Feature Drift | Tests for changes in the distribution of protected features.
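As a concrete illustration of what a test like Positive Prediction Rate measures, the sketch below compares positive prediction rates across subsets of a protected feature and flags a large gap. The column names and the 0.2 threshold are illustrative; this is a conceptual example, not the test's actual implementation or default threshold.

```python
# Conceptual sketch of a positive-prediction-rate comparison; threshold is illustrative.
import pandas as pd

preds = pd.DataFrame({
    "gender":     ["A", "A", "B", "B", "B", "A"],
    "prediction": [1, 0, 1, 1, 1, 0],
})

# Positive prediction rate per subset of the protected feature.
rates = preds.groupby("gender")["prediction"].mean()
gap = rates.max() - rates.min()

# Flag if the gap between subsets exceeds an illustrative threshold.
if gap > 0.2:
    print(f"Possible disparity: {rates.to_dict()} (gap={gap:.2f})")
```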

Model monitors

Monitors are an optional Robust Intelligence feature that works with continuous tests to alert you when a test fails. Learn more about optional Monitors.