Test Categories
Distribution Drift
Test for differences in the distribution of the reference dataset versus the evaluation dataset. If predictions and labels are provided, measure the performance degradation caused by shifting data as well as drift in the predictions and labels themselves.

Labels and predictions are not required, but they improve results.
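A drift check of this kind can be sketched with a simple distribution-distance statistic. This is a minimal illustration using the Population Stability Index (PSI) between a reference sample and an evaluation sample; the bin count and the 0.2 alert threshold are common rules of thumb, assumed here for illustration and not taken from any specific test suite.

```python
import numpy as np

def psi(reference, evaluation, bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples.

    Bins are fixed from the reference sample; eps avoids log(0).
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    eval_pct = np.histogram(evaluation, bins=edges)[0] / len(evaluation) + eps
    return float(np.sum((eval_pct - ref_pct) * np.log(eval_pct / ref_pct)))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)     # drawn from the same distribution
shifted = rng.normal(1.0, 1.0, 5000)  # mean-shifted: distribution drift

print(psi(ref, same) < 0.2)     # stable feature stays under the threshold
print(psi(ref, shifted) > 0.2)  # drifted feature exceeds the threshold
```

In practice a statistic like this would be computed per feature, and the same idea extends to drift in predictions and labels when they are available.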
Abnormal Input
Check the evaluation dataset for abnormal values commonly encountered in production. If model predictions are provided, test whether the observed abnormal values cause a degradation in your model's performance.

Labels and predictions are not required, but they improve results.
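An abnormal-input scan can be sketched as a row-level check against properties of the reference data. The specific checks below (NaN values, numbers outside the reference range, categories never seen in the reference set) are illustrative assumptions, chosen as common examples of production anomalies.

```python
import math

def find_abnormal(reference_rows, evaluation_rows):
    """Flag evaluation rows whose numeric value is NaN or outside the
    reference range, or whose category was never seen in the reference.

    Each row is an illustrative (numeric_value, category) pair.
    """
    ref_vals = [v for v, _ in reference_rows]
    lo, hi = min(ref_vals), max(ref_vals)
    seen_cats = {c for _, c in reference_rows}
    flagged = []
    for i, (val, cat) in enumerate(evaluation_rows):
        if math.isnan(val) or not (lo <= val <= hi) or cat not in seen_cats:
            flagged.append(i)
    return flagged

reference = [(1.0, "a"), (2.0, "b"), (3.0, "a")]
evaluation = [(2.5, "a"),          # normal
              (float("nan"), "b"), # missing value
              (99.0, "a"),         # out of reference range
              (2.0, "zzz")]        # unseen category

print(find_abnormal(reference, evaluation))  # [1, 2, 3]
```

With predictions available, the natural follow-up is to compare model performance on the flagged rows against the rest.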
Transformations
Test your model's invariance to different types of data transformations.

Model is required. Labels are not required, but they improve results.
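An invariance test of this kind can be sketched by applying a transformation to the data and measuring how often the model's predictions change. The toy threshold model and the two transforms below are illustrative assumptions.

```python
import numpy as np

def model(X):
    """Toy classifier: predicts 1 when a row's feature sum is positive."""
    return (X.sum(axis=1) > 0).astype(int)

def invariance_rate(model, X, transform):
    """Fraction of rows whose prediction is unchanged by the transform."""
    return float(np.mean(model(X) == model(transform(X))))

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))

tiny_noise = lambda X: X + rng.normal(0.0, 1e-6, X.shape)  # near-identity
big_shift = lambda X: X + 5.0                              # large shift

print(invariance_rate(model, X, tiny_noise))  # near 1.0: invariant
print(invariance_rate(model, X, big_shift))   # far below 1.0: not invariant
```

When labels are available, the comparison can go further and report the transform's effect on accuracy rather than only on prediction stability.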
Attacks
Test the robustness of your model by measuring the maximum difference in model predictions that can be caused by small perturbations to data points.

Model is required.
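For intuition, the worst-case effect of a small perturbation can be computed in closed form for a linear scorer: under an L-infinity bound of epsilon, the attack moves each coordinate by epsilon in the direction of its weight's sign (the idea behind FGSM-style attacks). The weights, input, and epsilon below are illustrative assumptions.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # toy linear model: score(x) = w . x

def score(x):
    return float(w @ x)

def worst_case_perturbation(x, eps):
    """Move each coordinate eps in the sign of its weight, which
    maximizes the score increase for a linear model under an
    L-infinity bound of eps."""
    return x + eps * np.sign(w)

x = np.array([1.0, 2.0, -0.5])
eps = 0.1
delta = score(worst_case_perturbation(x, eps)) - score(x)
print(delta)  # eps * ||w||_1, i.e. 0.1 * 3.5 (up to float rounding)
```

For nonlinear models there is no closed form, so the maximum prediction change is instead estimated by an iterative search over the perturbation ball.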
Subset Performance
Test that your model performs equally well across different subsets of the evaluation dataset.

Predictions are required. Labels are required for most tests.
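A subset-performance test can be sketched as computing accuracy per subset and flagging any subset that falls short of the overall accuracy by more than a margin. The subset key, the tiny dataset, and the 0.1 margin are illustrative assumptions.

```python
from collections import defaultdict

def subset_accuracies(subset_keys, labels, preds):
    """Accuracy of predictions within each subset key."""
    correct, total = defaultdict(int), defaultdict(int)
    for key, y, p in zip(subset_keys, labels, preds):
        total[key] += 1
        correct[key] += int(y == p)
    return {k: correct[k] / total[k] for k in total}

keys = ["a", "a", "a", "b", "b", "b"]  # e.g. a categorical feature value
labels = [1, 0, 1, 1, 0, 1]
preds = [1, 0, 1, 0, 1, 0]

accs = subset_accuracies(keys, labels, preds)
overall = sum(y == p for y, p in zip(labels, preds)) / len(labels)
flagged = [k for k, acc in accs.items() if acc < overall - 0.1]

print(accs)     # {'a': 1.0, 'b': 0.0}
print(flagged)  # ['b']
```

A real test would also account for subset size, since accuracy on very small subsets is noisy.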
