RIME v12 Changelog

AI Continuous Testing

The UI for this mode has been updated to allow users to more effectively monitor their models over time. Track changes in summary metrics over time and dive into test results at various time intervals to diagnose the underlying causes of issues.

AI Firewall

Updated User Flow - Firewall is configured at the Project level rather than at the Test Run level. No, there is one firewall per project, and the user can choose a test run within a project and specify the firewall configuration from that test run.

AI Stress Testing

Category Specific Views

There are now additional tabs on the Stress Testing page that present custom views per test category.

Actionability - there is a new summary section surfaced for each test category page, and this summary section surfaces custom suggestions users can take to improve their model’s robustness.

Test Run Comparisons

Users now have the ability to compare test run results in the project page. Compare test results between dozens of pre-defined performance and test result metrics to help with model selection.

SDK: Queryable Test Results

Programatically query RIME test results via the SDK (RIME Team) in order to integrate it more closely into existing workflows.

New Use Case - Learning to Rank

RIME AI Stress Testing now supports the Learning to Rank use case with additional relevant tests and performance metrics. The LTR test suite works on pointwise models in the with-model run mode, but works across any model that produces a score per query-document pair in the prediction log run mode.

New NLP Use Case - Text Classification

RIME AI Stress Testing now supports the Text Classification use case in the RIME Lite and RIME Team versions. Contact Robust Intelligence if you are interested in a demo or a trial.

New NLP Use Case - Spell Checking

RIME AI Stress Testing now supports the seq2seq Spell Checking use case in the RIME Lite and RIME Team versions. Contact Robust Intelligence if you are interested iin a demo or a trial.