RIME v11 Changelog

AI Firewall

This new RIME mode allows you to deploy an automatically trained AI Firewall around your machine learning model. The custom firewall is auto-generated from the model impacts and data vulnerabilities identified during AI Stress Testing, in order to prevent AI failures once the model is deployed. Contact Robust Intelligence if you are interested in a trial.

AI Stress Testing

New Test Categories

  • AI Stress Tests are now organized into four categories, each measuring robustness along a different axis. See the documentation for descriptions of all tests in these categories.

    • Abnormal Input: Measures system robustness to erroneous inputs that can arise in a production setting.

    • Distribution Shift: Measures the severity of shifts in the data distribution.

    • Model Behavior: Explores the performance of your model across various slices of data.

    • Data Cleanliness: Measures the prevalence of common data errors.

Updated Test Details

  • The test case views have been reorganized to convey information more clearly.

Custom Columns per Test Category

  • Each test category now has a custom set of columns that better conveys its results.

New Model Impact Calculation

  • Model impact is now a bucketed value from 0 to 10, simplifying the output.

    • In abnormal input tests, model impact is now a combination of two components. The first is adversarial model impact, which measures the change in model performance or model predictions when the input is perturbed in a way specific to each test. The second is observed model impact, which measures the change in model performance or model predictions across the failing rows in the evaluation set. The total model impact combines these two results (a minimal sketch follows this list).

    • In distribution shift tests, we introduce observed model impact, which measures the change in model predictions across rows in the evaluation set that have drifted.
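
To make the combination concrete, here is a minimal Python sketch of how an adversarial and an observed impact component might be blended and bucketed into the 0-10 range. The equal weighting and linear bucketing below are illustrative assumptions, not RIME's actual formula.

```python
# Hypothetical sketch of a combined, bucketed model impact score.
# The component weighting and the 0-10 bucketing are assumptions
# for illustration, not RIME's documented calculation.

def adversarial_impact(perf_clean: float, perf_perturbed: float) -> float:
    """Drop in performance when inputs are perturbed in a test-specific way."""
    return max(0.0, perf_clean - perf_perturbed)

def observed_impact(perf_all: float, perf_failing: float) -> float:
    """Drop in performance on the failing rows relative to the full evaluation set."""
    return max(0.0, perf_all - perf_failing)

def bucketed_model_impact(adv: float, obs: float) -> int:
    """Combine the two components and bucket the result into 0-10."""
    combined = 0.5 * adv + 0.5 * obs       # assumed equal weighting
    return min(10, round(combined * 10))   # assumed linear bucketing

# Example: accuracy drops from 0.92 to 0.78 under perturbation,
# and from 0.92 to 0.85 on the failing rows of the evaluation set.
adv = adversarial_impact(0.92, 0.78)    # 0.14
obs = observed_impact(0.92, 0.85)       # 0.07
print(bucketed_model_impact(adv, obs))  # -> 1
```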

Feature First View

  • This new view of test runs allows users to view test results by feature.

New Use Case - Multi-class Classification

  • RIME AI Stress Testing now supports multi-class classification with additional tests specific to the use case.

Improved Logging

  • Added support for structured logging as well as explicit logging of model latency and test runtime.
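
As an illustration, the sketch below shows one way structured, machine-parseable logs with an explicit model-latency field can be emitted using Python's standard logging module. The JSON layout and the field names (model_latency_ms, test_runtime_s) are assumptions for illustration, not RIME's actual log schema.

```python
import json
import logging
import time

# Emit each record as a single JSON object so logs are machine-parseable.
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Attach any structured fields passed via `extra=`.
        # Field names here are hypothetical, not RIME's schema.
        for key in ("model_latency_ms", "test_runtime_s"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("stress_test")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Time a (stand-in) model call and log the latency explicitly.
start = time.perf_counter()
time.sleep(0.01)  # placeholder for model.predict(batch)
latency_ms = (time.perf_counter() - start) * 1000
logger.info("prediction batch complete",
            extra={"model_latency_ms": round(latency_ms, 2)})
```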

Updated Nonparametric Numeric Outlier Test

  • The numeric outlier test now uses a nonparametric method to detect outliers, which significantly speeds up runtimes.
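
The release notes do not name the specific method, but the interquartile-range (IQR) fences below are one common nonparametric outlier criterion and give a sense of why such methods are fast: they need only two sample quantiles, with no distribution fitting. A minimal sketch:

```python
import numpy as np

def iqr_outliers(values: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Nonparametric: relies only on sample quantiles, so it makes no
    assumption about the distribution of the feature.
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 42.0])
print(iqr_outliers(data))  # [False False False False False  True]
```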

New Tests

  • Empty String - An abnormal input test that measures whether the model is invariant between empty strings and null values.

  • Required Characters - An abnormal input test that measures whether the model is tolerant of feature values that are missing required characters.

  • Mutual Information Shift - A distribution shift test that measures the change in mutual information between features (a minimal sketch follows this list).

  • Categorical Label Drift - A distribution shift test that measures the drift in the distribution of categorical labels.

  • Regression Label Drift (Regression Only) - A distribution shift test that measures the drift of labels in regression use cases.

  • Multi-class Prediction Drift (Multi-class Only) - A distribution shift test that measures the drift of predictions in multi-class use cases.

  • Macro Subset Performance Tests (Multi-class Only) - Model behavior tests that measure the change in model performance in multi-class use cases.
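
As an illustration of the kind of statistic the Mutual Information Shift test tracks, the sketch below compares the mutual information between a feature pair on a reference set and an evaluation set. The pairing scheme and the use of scikit-learn's mutual_info_score are assumptions for illustration, not RIME's implementation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_shift(ref_a, ref_b, eval_a, eval_b) -> float:
    """Absolute change in mutual information between two categorical
    features from the reference set to the evaluation set."""
    return abs(mutual_info_score(eval_a, eval_b)
               - mutual_info_score(ref_a, ref_b))

rng = np.random.default_rng(0)
# Reference: features a and b are strongly coupled.
ref_a = rng.integers(0, 2, 1000)
ref_b = ref_a  # perfectly dependent
# Evaluation: the coupling has broken down.
eval_a = rng.integers(0, 2, 1000)
eval_b = rng.integers(0, 2, 1000)  # independent

shift = mi_shift(ref_a, ref_b, eval_a, eval_b)
print(f"MI shift: {shift:.3f}")  # a large shift flags a dependency change
```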