RIME Model Performance

Overview

This tutorial will guide you through various ways to use RIME to improve model performance. For more information, please see the corresponding reference.

Be sure to complete the initial setup described in RIME Data and Model Setup before proceeding.

Preprocessing

To use RIME’s preprocessing, first import it:

from rime.tabular.performance import preprocess_df

We can then call it with only the data to get back a preprocessed dataframe:

preprocessed_df = preprocess_df(df)

Active Learning

RIME also contains some functionality for “active learning”. More precisely, RIME has several functions that take in an unlabeled dataset and one or more models and suggest points that would be high value to label.

To use this functionality, let’s first import the relevant functions:

from rime.tabular.performance import single_model_active_learning, two_model_active_learning

We can then get N points that would be high value to label:

N = 10
indices_to_label = single_model_active_learning(df, model_wrapper, N)

This returns the indices of the rows in the original dataframe that are high value to label.
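To give some intuition for what “high value to label” can mean, here is a minimal sketch of uncertainty sampling, a common active learning heuristic. It is not RIME’s implementation, which this tutorial does not describe. The function name most_uncertain_indices and the probability array are hypothetical and used only for illustration.

```python
import numpy as np


def most_uncertain_indices(probs, n):
    """Return positions of the n rows with the highest predictive entropy.

    probs: array of shape (num_rows, num_classes) holding predicted class
    probabilities. This is a hypothetical input, not a RIME object.
    """
    probs = np.asarray(probs, dtype=float)
    # Clip to avoid log(0); entropy is highest when the model is least sure.
    clipped = np.clip(probs, 1e-12, 1.0)
    entropy = -(clipped * np.log(clipped)).sum(axis=1)
    # Stable sort on negated entropy so ties keep their original order.
    order = np.argsort(-entropy, kind="stable")
    return order[:n]


# Made-up predictions for four unlabeled rows of a binary classifier.
example_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]
top_two = most_uncertain_indices(example_probs, 2)
```

Here the rows whose predicted probabilities sit closest to 50/50 are ranked first, since a label for them is likely to tell the model the most.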

We also expose functionality to do this using two models. This can be useful if you have two versions of a model (trained either on the same dataset or on different slices). If we have a second container (container_2), we can do:

model_wrapper_2 = container_2.model.base_model
indices_to_label = two_model_active_learning(df, model_wrapper, model_wrapper_2, N)

Like the previous function, this returns indices of points that are high value to label.
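One common way to use two models for this purpose is to prioritize the points on which they disagree most. The sketch below illustrates that idea under stated assumptions: it is not RIME’s algorithm, and most_disagreeing_indices and both probability arrays are hypothetical names introduced for the example.

```python
import numpy as np


def most_disagreeing_indices(probs_a, probs_b, n):
    """Return positions of the n rows where two models disagree most.

    probs_a, probs_b: arrays of shape (num_rows, num_classes) holding each
    model's predicted class probabilities (hypothetical inputs).
    """
    probs_a = np.asarray(probs_a, dtype=float)
    probs_b = np.asarray(probs_b, dtype=float)
    # Total variation distance between the two predicted distributions.
    disagreement = 0.5 * np.abs(probs_a - probs_b).sum(axis=1)
    # Stable sort on negated distance so ties keep their original order.
    order = np.argsort(-disagreement, kind="stable")
    return order[:n]


# Made-up predictions from two versions of a binary classifier.
probs_model_1 = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
probs_model_2 = [[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]
top_one = most_disagreeing_indices(probs_model_1, probs_model_2, 1)
```

The second row is selected because the two models assign it opposite classes, while they roughly agree on the other rows.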

Note that both algorithms involve some randomness. To get deterministic results, make sure to pass the seed parameter.
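As a generic illustration of why a fixed seed makes a randomized selection repeatable, the sketch below uses NumPy’s random generator. It says nothing about how RIME consumes its seed parameter internally; sample_candidates is a hypothetical helper written only for this example.

```python
import numpy as np


def sample_candidates(num_rows, n, seed=None):
    """Randomly pick n distinct row positions out of num_rows.

    With seed=None the result varies between calls; with a fixed integer
    seed the same positions are returned every time.
    """
    rng = np.random.default_rng(seed)
    return rng.choice(num_rows, size=n, replace=False)


# Two calls with the same seed select exactly the same rows.
first = sample_candidates(100, 5, seed=0)
second = sample_candidates(100, 5, seed=0)
same = np.array_equal(first, second)
```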