RIME v15 Changelog

The Ideal RIME User Flow

There are two points in the Machine Learning development process in which RIME can be used – pre-deployment and post-deployment. We’ve historically developed products that sit on either side of this divide, but it is clear that the world’s best machine learning teams see these processes are inextricably linked. The v15 release of RIME tightly joins the pre-deployment AI Stress Testing and post-deployment AI Firewall experience. Now, data science teams can use AI Stress Testing to determine the best model and then natively promote the most robust one into production deployment with the AI Firewall providing realtime protection and batch monitoring.

AI Firewall

AI Firewall Realtime & AI Firewall Monitoring - What were previously two separate post-deployment user flows are now unified into one experience: AI Firewall.
- AI Firewall Realtime - Operates at the datapoint-level to simultaneously alert and protect your model against issues in real-time.
- AI Firewall Monitoring - Operates at the level of data batches, offering an aggregate view of clusters of abnormal inputs and data drift.
Overview Page - The new Overview page is the mission control for your model’s production deployment. In it, you can see the status of firewall events, get notified when model performance degrades, and see the underlying causes of failure.
Summary Metric Thresholding - All RIME summary metrics (Abnormality Rate, Test Pass Rate, Avg Severity Rate) have auto-configured thresholds in the post-deployment setting. These thresholds are configured from statistical analysis of the reference dataset upon which the model is trained, and users can configure alerts when these metrics degrade in a production setting.

AI Stress Testing

Model Impact Improvements - Model Impacts have historically been the Robust Intelligence way of measuring the impact a given feature has on model predictions. It is, in a way, an analogue for feature importance. Our internal research shows it’s a generally superior metric in the production machine learning context. We’ve further improved this measurement in this version by tying it more closely to the performance change associated with a given error. Therefore, model impact columns measurements have been replaced with “Observed Performance Change” and “Simulated Performance Change”. Both of these metrics represent the performance change associated with the error associated with a given test. These changes apply to both Tabular, Natural Language Processing, and Computer Vision use cases. GirishChandrasekar marked this conversation as resolved. Show resolved
RIME CV - RIME now offers native support for AI Stress Testing of Computer Vision models in the classification use case. Reach out to a member of the RI team to learn more about how you can try out this run mode.
Custom Metadata (NLP + CV) - The unstructured data run modes of RIME for NLP and CV now offer functionality to add structured metadata features to each datapoint. This feature automatically runs additional tests that can reveal insights about the robustness of your model across slices of these features in addition to the features that are automatically extracted from the text by RIME.

General

REST API - RIME now offers a native REST API. This enables users to use RIME in any programming language without the constraints imposed by the RIME Python SDK.
NLP SDK Support - RIME’s SDK now offers NLP support. Users can now run Stress Tests, Query Stress Test Results, and run Firewall Monitoring.
General SDK Improvement - RIME now offers new SDK methods that allow the user to upload dataset files and model directories easily.
Alerts - Users can now configure alerts via Slack notification or Email to notify when RIME runs are complete.
Navigation Update - There has been an updated global navigation structure along the left banner of the RIME UI that enables users to easily switch back to the project view at any time.
Test Detail Refresh - Test Details have been refreshed and restructured to provide more relevant information in a more consumable way.
Charting Improvements - Wide charts are now horizontally scrollable instead of relying on truncation. In addition, numerical distribution charts are represented as a histogram rather than as a fitted line graph in order to more realistically represent the distribution.