Updating your Continuous Test

In this notebook walkthrough, we show how to update a Continuous Test after it has been deployed to production. A Continuous Test can be updated live to account for service changes such as a new reference dataset, an upgraded model, or changes to the configuration of individual tests.

The latest Colab version of this notebook is available here

Install dependencies

[ ]:
!pip install rime-sdk &> /dev/null
!pip install https://github.com/RobustIntelligence/ri-public-examples/archive/master.zip
[ ]:
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import List

import pandas as pd
from ri_public_examples.download_files import download_files
from rime_sdk import Client

Download and prep data

[ ]:
download_files('tabular-2.0/fraud', 'fraud')

# Split the incremental production data (and its predictions) into two
# halves to simulate two successive batches arriving over time.
ct_data = pd.read_csv("fraud/data/fraud_incremental.csv")
ct_data[:len(ct_data)//2].to_csv("fraud/data/fraud_incremental_0.csv", index=False)
ct_data[len(ct_data)//2:].to_csv("fraud/data/fraud_incremental_1.csv", index=False)

ct_preds = pd.read_csv("fraud/data/fraud_incremental_preds.csv")
ct_preds[:len(ct_preds)//2].to_csv("fraud/data/fraud_incremental_0_preds.csv", index=False)
ct_preds[len(ct_preds)//2:].to_csv("fraud/data/fraud_incremental_1_preds.csv", index=False)
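
As a quick optional sanity check, we can confirm that the two halves cover all rows and that the predictions stay aligned with the data:

[ ]:
# Optional: verify the split is complete and predictions line up row-for-row.
assert len(pd.read_csv("fraud/data/fraud_incremental_0.csv")) \
    + len(pd.read_csv("fraud/data/fraud_incremental_1.csv")) == len(ct_data)
assert len(ct_preds) == len(ct_data)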

Instantiate RIME client and create project

[ ]:
API_TOKEN = '' # PASTE API TOKEN
CLUSTER_URL = '' # PASTE DEDICATED DOMAIN OF RIME SERVICE (e.g., https://rime.example.rbst.io)
AGENT_ID = '' # PASTE AGENT_ID IF USING AN AGENT THAT IS NOT THE DEFAULT
[ ]:
client = Client(CLUSTER_URL, API_TOKEN)
[ ]:
description = (
    "Create a Continuous Test and update the configuration after it is deployed to production."
    " Demonstration uses a tabular binary classification dataset"
    " and model that simulates credit card fraud detection."
)
project = client.create_project(
    "Continuous Testing Configuration Demo",
    description,
    "MODEL_TASK_BINARY_CLASSIFICATION"
)

Upload data to S3 and register dataset and prediction set

For SaaS environments using the default S3 storage location, the Python SDK supports direct file uploads using upload_*().

For other environments and storage technologies, you must upload the artifacts to your own blob store and register them with RIME by URI.
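
If you manage your own S3 bucket, one option is to copy the files up with boto3 before registering them. The helper below is a sketch, not part of the RIME SDK; it assumes the bucket behind BLOB_STORE_URI already exists and your AWS credentials grant write access:

[ ]:
# Sketch (assumption): manual upload to a self-managed S3 bucket with boto3.
# Not needed for SaaS environments, which can use client.upload_file().
import boto3

def upload_to_blob_store(local_path: str, blob_store_uri: str, key: str) -> str:
    """Upload a local file and return its s3:// URI."""
    bucket = blob_store_uri.removeprefix("s3://").rstrip("/")
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"

# Example (hypothetical bucket):
# upload_to_blob_store("fraud/data/fraud_ref.csv", "s3://acmecorp-rime",
#                      "ri_public_examples_fraud/data/fraud_ref.csv")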

[ ]:
IS_SAAS = False # TOGGLE True/False (Note: SaaS environments use URLs ending in "rbst.io" and have an "Internal Agent")

[ ]:
if not IS_SAAS:
    BLOB_STORE_URI = "" # PROVIDE BLOB STORE URI (e.g., "s3://acmecorp-rime")
    assert BLOB_STORE_URI != ""

UPLOAD_PATH = "ri_public_examples_fraud"

[ ]:
if IS_SAAS:
    ref_s3_path = client.upload_file(
        Path('fraud/data/fraud_ref.csv'), upload_path=UPLOAD_PATH
    )
    ref_preds_s3_path = client.upload_file(
        Path("fraud/data/fraud_ref_preds.csv"), upload_path=UPLOAD_PATH
    )
else:
    ref_s3_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/fraud_ref.csv"
    ref_preds_s3_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/fraud_ref_preds.csv"

Once the data and model are uploaded, we can register them with RIME. After registration, we can refer to these resources by their RIME-generated IDs.

[ ]:
from datetime import datetime

dt = str(datetime.now())

# Note: models and datasets need to have unique names.
model_id = project.register_model(f"fraud_model_{dt}", None, agent_id=AGENT_ID)

ref_dataset_id = project.register_dataset_from_file(
    f"ref_dataset_{dt}", ref_s3_path, data_params={"label_col": "label"}, agent_id=AGENT_ID
)
project.register_predictions_from_file(
    ref_dataset_id, model_id, ref_preds_s3_path, agent_id=AGENT_ID
)

Create a Continuous Test
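
The third argument to create_ct is the bin size: incoming production data is grouped into one-day bins, and each bin is tested against the reference dataset.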

[ ]:
from datetime import timedelta

ct = project.create_ct(model_id, ref_dataset_id, timedelta(days=1))
ct

Run Continuous Testing on a batch of production data

[ ]:
if IS_SAAS:
    ct_data_0_s3_path = client.upload_file(
        Path("fraud/data/fraud_incremental_0.csv"), upload_path=UPLOAD_PATH
    )
else:
    ct_data_0_s3_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/fraud_incremental_0.csv"

ct_data_0_id = project.register_dataset_from_file(
    f"fraud_incremental_0_dataset_{dt}",
    ct_data_0_s3_path,
    data_params={"label_col": "label", "timestamp_col": "timestamp"},
    agent_id=AGENT_ID
)
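
The timestamp_col in data_params tells RIME which column to use when assigning each row of production data to a time bin.
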
[ ]:
ct_job = ct.start_continuous_test(ct_data_0_id, agent_id=AGENT_ID)
ct_job.get_status(verbose=True, wait_until_finish=True)
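
To pull these results programmatically once the job finishes, recent rime_sdk releases expose the finished run through the job object. The method names below are assumptions, so verify them against the SDK reference for your release:

[ ]:
# Sketch (assumption): fetch the completed run's results as a DataFrame.
test_run = ct_job.get_test_run()
results_df = test_run.get_result_df()
results_df.head()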

Update the Reference Dataset

Suppose a week has passed and we have retrained our model on new data. We now want to update the deployed Continuous Test so that it uses the new reference dataset.

[ ]:
if IS_SAAS:
    new_ref_data_s3_path = client.upload_file(
        Path("fraud/data/fraud_eval.csv"), upload_path=UPLOAD_PATH
    )
else:
    new_ref_data_s3_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/fraud_eval.csv"

new_ref_data_id = project.register_dataset_from_file(
    f"eval_dataset_{dt}",
    new_ref_data_s3_path,
    data_params={"label_col": "label"},
    agent_id=AGENT_ID
)

# Update the Continuous Test to use the newly registered reference dataset
ct.update_ct(ref_data_id=new_ref_data_id)
# The project overview will now reflect the update
project

Run Continuous Testing on the latest batch of production data

This time, the updated reference dataset serves as the baseline against which the production data is compared.

[ ]:
if IS_SAAS:
    ct_data_1_s3_path = client.upload_file(
        Path("fraud/data/fraud_incremental_1.csv"), upload_path=UPLOAD_PATH
    )
else:
    ct_data_1_s3_path = f"{BLOB_STORE_URI}/{UPLOAD_PATH}/data/fraud_incremental_1.csv"

ct_data_1_id = project.register_dataset_from_file(
    f"fraud_incremental_1_dataset_{dt}",
    ct_data_1_s3_path,
    data_params={"label_col": "label", "timestamp_col": "timestamp"},
    agent_id=AGENT_ID
)
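
We pass override_existing_bins=True so that results for any time bins that were already computed are recomputed against the updated reference dataset.
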
[ ]:
ct_job = ct.start_continuous_test(ct_data_1_id, override_existing_bins=True, agent_id=AGENT_ID)
ct_job.get_status(verbose=True, wait_until_finish=True)