Scikit-Learn Pipeline

Copy the code snippet below and follow the commented steps to connect RIME to a Scikit-Learn Pipeline.

Once that is done, specify the resulting model file when you configure your model source.

"""Template for connecting a Scikit-Learn Pipeline to RIME.

RIME expects this file to define a `predict_df` function that takes a Pandas
DataFrame containing one or more rows of the dataset. The function should
return a NumPy array with one score between 0 and 1 for each input row.
The DataFrame is loaded from the data sources you configure.

"""

from pathlib import Path

import joblib
import numpy as np
import pandas as pd

# Step 1: Load the pipeline.

model_dir = Path(__file__).parent / "model"
pipeline = joblib.load(model_dir / "pipeline.joblib")
    
# Step 2: If the model requires additional preprocessing of the input data,
# add that logic to the 'preprocess_df' function below or to 'predict_df'.
# By default, RIME passes rows from the dataset defined in the config to
# 'predict_df' with the label and prediction columns omitted.

def preprocess_df(df: pd.DataFrame) -> pd.DataFrame:
    """Apply preprocessing to rows of the dataframe."""
    
    return df

# Step 3: Implement the function below so that it returns one prediction per
# row in the dataset, applying any additional preprocessing if necessary.

def predict_df(df: pd.DataFrame) -> np.ndarray:
    """Return array of probabilities assigned to the positive class."""
    
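    # Apply custom preprocessing to the rows.
    df = preprocess_df(df)

    # Column 1 of predict_proba corresponds to pipeline.classes_[1], which is
    # the positive class when the labels are 0 and 1.
    return pipeline.predict_proba(df)[:, 1]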
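
Before pointing RIME at the model file, it can help to check locally that a saved pipeline loads and that `predict_df` returns what the template's docstring describes. The following is a minimal sketch of such a check, not part of RIME itself. The toy columns `age` and `income`, the labels, the `StandardScaler` plus `LogisticRegression` pipeline, and the temporary directory are all hypothetical stand-ins for your own data, model, and `model` folder.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for your real dataset.
train_df = pd.DataFrame(
    {
        "age": [22, 35, 47, 51, 29, 63],
        "income": [28.0, 54.0, 61.0, 80.0, 33.0, 75.0],
    }
)
labels = np.array([0, 0, 1, 1, 0, 1])

# Fit a small pipeline (an assumed example model, not a RIME requirement).
toy_pipeline = Pipeline(
    [("scale", StandardScaler()), ("clf", LogisticRegression())]
)
toy_pipeline.fit(train_df, labels)

# Save and reload the pipeline using the same layout the template reads:
# a "model" directory containing "pipeline.joblib".
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    model_dir.mkdir()
    joblib.dump(toy_pipeline, model_dir / "pipeline.joblib")
    pipeline = joblib.load(model_dir / "pipeline.joblib")


def predict_df(df: pd.DataFrame) -> np.ndarray:
    """Return the probability of the positive class for each row."""
    return pipeline.predict_proba(df)[:, 1]


# Check the contract: a NumPy array with one score in [0, 1] per row.
scores = predict_df(train_df)
assert isinstance(scores, np.ndarray)
assert scores.shape == (len(train_df),)
assert np.all((scores >= 0) & (scores <= 1))
```

If these assertions pass for a sample of your own data with the label and prediction columns removed, the same `predict_df` should be consistent with the input and output shapes described in the template.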