Specify a Model

RIME expects models to be passed in as Python files that expose one of the following functions:

  • predict_dict: the input is a single row in dictionary form (e.g. x = {'Age': 15, 'Animal': 'Cat', ...}). For binary classification and regression, the output should be a float prediction: between 0 and 1 for binary classification (the probability of the positive class), unbounded for regression. The function signature should look like: predict_dict(x: dict) -> float. For multi-class classification, the output should be a NumPy array whose ith entry is the predicted probability for the ith class, and the signature changes slightly to: predict_dict(x: dict) -> np.ndarray.

  • predict_df: the input is a Pandas DataFrame. If the model task is multi-class classification, the output should be a NumPy array of floats of shape (len(df), num_classes). For all other model tasks, the output should be a NumPy array of floats of shape (len(df),). In either case, the function signature should look like: predict_df(df: pd.DataFrame) -> np.ndarray (see the sketch after this list).

    NOTE: for binary classification the return type should be a single float per row, representing the probability of the positive class. It should not be an array of probabilities for each class, i.e. predict_df should return [0.7, 0.1, ...], NOT [[0.3, 0.7], [0.9, 0.1], ...].
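
For reference, here is a minimal dummy model file illustrating the expected output shapes; the return values and the class count are placeholders, not a real model:

import numpy as np
import pandas as pd

def predict_df(df: pd.DataFrame) -> np.ndarray:
    # Binary classification / regression: one float per row, shape (len(df),).
    return np.zeros(len(df))
    # A multi-class model would instead return shape (len(df), num_classes),
    # e.g. np.full((len(df), 3), 1 / 3) for a hypothetical 3-class model.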

The following shows how to set up the Python interface RIME expects for a model that is loaded from a model binary.

Step 1: Specify model path

Put your model binary, and any other relevant model artifacts, in the same folder as this file. Create a constant for the path to this binary:

from pathlib import Path

cur_dir = Path(__file__).absolute().parent

MODEL_NAME = 'TODO: change this to model name'
MODEL_PATH = cur_dir / MODEL_NAME
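
Optionally, a quick check that the binary exists can catch a mistyped file name early (this assertion is a suggestion, not something RIME requires):

# Optional: fail fast if the binary is not where we expect it.
assert MODEL_PATH.exists(), f"Model binary not found at {MODEL_PATH}"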

Step 2: Retrieve custom code

If custom code is needed to preprocess the data (or to call your API), it must be loaded into the environment. If this code can be installed as a Python package, see the Custom Requirements section.

If your code is NOT a Python package (and is instead a Python file or folder), put all relevant files in the same directory as this file and add the following snippet to the Python file:

import sys
sys.path.append(str(cur_dir))
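
Once the directory is on sys.path, your helper code can be imported like any other module. For example, assuming a hypothetical helpers.py sitting next to this file:

# helpers.py is a hypothetical file placed alongside this model file.
from helpers import clean_features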

Step 3: Access the model

As an example, if you used the Python pickle module to save your model, loading it would look like:

import pickle
with open(MODEL_PATH, 'rb') as f:
    model = pickle.load(f)
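
The same pattern applies to other serialization formats. For instance, if the model was saved with joblib (a common choice for scikit-learn models), a sketch would be:

import joblib

model = joblib.load(MODEL_PATH)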

Step 4: Import / implement preprocessing function

If the model you are using expects inputs with a different schema than the datasets you've provided, or of a different type than dict / pandas.DataFrame, you will need to load or define all custom preprocessing. If your model can take in the raw data directly, you can skip this step! Getting the preprocessing functionality could look like:

from custom_package import preprocessing

or

def preprocessing(x: dict) -> ModelInputType:
    # TODO: implement preprocessing logic.
    ...
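
As a concrete sketch, suppose (hypothetically) that the model was trained on two numeric features, Age and an integer-encoded Animal column, in that order:

import numpy as np

ANIMAL_ENCODING = {'Cat': 0, 'Dog': 1, 'Fish': 2}  # hypothetical encoding

def preprocessing(x: dict) -> np.ndarray:
    # Return a 2D array of shape (1, num_features), as most models expect.
    return np.array([[x['Age'], ANIMAL_ENCODING[x['Animal']]]])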

Step 5: Implement a predict function

Implement either the predict_dict function or the predict_df function. Whichever you choose to implement, it must match one of the function names and signatures shown below. For binary classification, this should look something like:

def predict_dict(x: dict) -> float:
    # Preprocess the single row, then return the probability of the positive class.
    # In the multi-class case, return the full probability array instead (np.ndarray).
    single_input = preprocessing(x)
    model_output = model.predict_proba(single_input)
    return model_output[0][1]

OR

import numpy as np
import pandas as pd

def predict_df(df: pd.DataFrame) -> np.ndarray:
    array_input = preprocessing(df)
    model_output = model.predict_proba(array_input)
    # Return only the positive-class probability for each row.
    return model_output[:, 1]
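
For a multi-class model the structure is the same, except the full probability matrix is returned; a regression model would return model.predict(array_input) instead. A sketch, reusing the model and preprocessing objects from the steps above and assuming a scikit-learn-style predict_proba:

def predict_df(df: pd.DataFrame) -> np.ndarray:
    array_input = preprocessing(df)
    # predict_proba already has shape (len(df), num_classes), which is what
    # RIME expects for multi-class classification.
    return model.predict_proba(array_input)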