Defining a Model Interface
Robust Intelligence expects models to be passed in as Python files that expose one of the following functions:
predict_dict: the input is a single row in dictionary form (e.g. x = {'Age': 15, 'Animal': 'Cat', ...}). The return type depends on the model task: for binary classification the output should be a float value between 0 and 1, and for regression it may be an unbounded float. In either case the function signature should look like: predict_dict(x: dict) -> float. For multi-class classification the output should be a NumPy array whose i'th element represents the predicted probability for the i'th class, so the signature changes slightly: predict_dict(x: dict) -> np.ndarray. For details on the prediction formats for all model tasks, see the Preprocessing your Datasets page.
predict_df: the input is a Pandas DataFrame, and the output is a NumPy array. If the model task is multi-class classification or natural language inference, the output should be a NumPy array of floats of shape (len(df), num_classes). For all other model tasks, the output should be a NumPy array of floats of shape (len(df),), with each element satisfying the prediction format specified in Preprocessing your Datasets. In either case, the function signature should look like: predict_df(df: pd.DataFrame) -> np.ndarray
NOTE: for binary classification the return value should be a single float per row, representing the probability of the positive class; it should not be an array of probabilities for each class. E.g. predict_df should return [0.7, 0.1, ...], NOT [[0.3, 0.7], [0.9, 0.1], ...].
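For instance, if a model's predict_proba produces the full per-class matrix, the positive-class column can be sliced out to match the expected format. A quick sketch using the example values above:
import numpy as np

proba = np.array([[0.3, 0.7], [0.9, 0.1]])  # full per-class probabilities
positive_class = proba[:, 1]                # array([0.7, 0.1]) -- the expected format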
The following steps show how to set up the Python interface Robust Intelligence expects for a model that is invoked by loading a model binary.
As of v2.1, Robust Intelligence supports models written in Python 3.8 to 3.11. Any model files written in Python 3.7 should be migrated to Python 3.8 or later.
Step 1: Specify model path
Put your model binary, and any other relevant model artifacts, in the same folder as this file. Create a constant for the path to this binary:
from pathlib import Path

# Resolve paths relative to this file so the binary loads regardless of the working directory.
cur_dir = Path(__file__).absolute().parent
MODEL_NAME = 'TODO: change this to model name'
MODEL_PATH = cur_dir / MODEL_NAME
Step 2: Retrieve custom code
If custom code is needed to preprocess the data (or to call your API), it must be loaded into the environment. If this code can be installed as a Python package, see the Custom Requirements section.
If your code is NOT a Python package (and is instead a Python file or folder), put all relevant files in the same directory as this file and add the following snippet to the Python file:
import sys
sys.path.append(str(cur_dir))
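Once the directory is on sys.path, local modules can be imported directly. For example, if your preprocessing lives in a (hypothetical) helpers.py next to this file:
# helpers.py is a hypothetical module sitting next to this model file.
from helpers import preprocessing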
Step 3: Access the model
As an example, if you used the Python pickle module to save your model, loading it would look like:
import pickle

# Load the model once at import time; the predict functions below reuse it.
with open(MODEL_PATH, 'rb') as f:
    model = pickle.load(f)
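If you used a different serializer, load the binary accordingly. For instance, a minimal sketch for a model saved with joblib (assuming joblib was used at save time):
import joblib

model = joblib.load(MODEL_PATH)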
Step 4: Import / implement preprocessing function
If the model you are using expects inputs with a different schema than the datasets you've provided, or of a different type than dict / pd.DataFrame, you will need to load or define all custom preprocessing. If your model can take in the raw data directly, you can skip this step! Getting the preprocessing functionality could look like:
from custom_package import preprocessing
or
def preprocessing(x: dict) -> ModelInputType:
    # TODO: implement preprocessing logic.
    ...
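For illustration, a minimal sketch of such a function, assuming (hypothetically) that the model expects a 2-D NumPy array and that the data has the numeric 'Age' and categorical 'Animal' columns from the earlier example:
import numpy as np

# Hypothetical category ordering; must match what the model saw during training.
CATEGORIES = ['Cat', 'Dog']

def preprocessing(x: dict) -> np.ndarray:
    # One-hot encode the categorical feature and stack it with the numeric one.
    animal_one_hot = [1.0 if x['Animal'] == c else 0.0 for c in CATEGORIES]
    return np.array([[x['Age'], *animal_one_hot]])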
Step 5: Implement a predict function
Implement either the predict_dict function or the predict_df function. Whichever you choose to implement, it must match one of the function names and signatures below. For binary classification, this should look something like:
# In the multi-class case, the return type is np.ndarray.
def predict_dict(x: dict) -> float:
    single_input = preprocessing(x)
    model_output = model.predict_proba(single_input)
    # predict_proba returns per-class probabilities; return the positive class.
    return model_output[0][1]
OR
def predict_df(df: pd.DataFrame) -> np.ndarray:
    array_input = preprocessing(df)
    model_output = model.predict_proba(array_input)
    # Return only the positive-class column, shape (len(df),).
    return model_output[:, 1]
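For multi-class classification, return the full probability matrix instead. A minimal sketch, assuming the same model and preprocessing objects as above:
def predict_df(df: pd.DataFrame) -> np.ndarray:
    array_input = preprocessing(df)
    # Full per-class probabilities, shape (len(df), num_classes).
    return model.predict_proba(array_input)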
Step 6: Pass secrets to the model
If your model file expects to receive secrets for use at runtime, you can specify them as model endpoint integrations. For example, you might need secrets when the model file queries an authenticated endpoint to make predictions.
You may specify a model_endpoint_integration_id when registering the model, as described here. The secrets specified in the model endpoint integration will be made available to the predict function as an integration_dict dict parameter. The predict function signatures then change as follows:
def predict_dict(x: dict, integration_dict: dict) -> float:
    """predict logic"""
    API_KEY = integration_dict['SECRET_API_KEY']
    pass
OR
def predict_df(df: pd.DataFrame, integration_dict: dict) -> np.ndarray:
    """predict logic"""
    API_KEY = integration_dict['SECRET_API_KEY']
    pass
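To make this concrete, here is a minimal sketch of a predict_df that calls an authenticated HTTP endpoint. The endpoint URL, the request/response payload shapes, and the 'SECRET_API_KEY' key are all hypothetical; substitute whatever your integration actually defines:
import numpy as np
import pandas as pd
import requests

PREDICT_URL = 'https://example.com/predict'  # hypothetical endpoint

def predict_df(df: pd.DataFrame, integration_dict: dict) -> np.ndarray:
    # Authenticate with the secret made available through the integration.
    headers = {'Authorization': f"Bearer {integration_dict['SECRET_API_KEY']}"}
    response = requests.post(
        PREDICT_URL,
        headers=headers,
        json={'rows': df.to_dict(orient='records')},
    )
    response.raise_for_status()
    # Assumes the (hypothetical) endpoint returns {'predictions': [...]}.
    return np.array(response.json()['predictions'])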