Input Data Format

Automated Validation

The Python SDK exposes a command-line utility that can automatically validate your input data:

rime-data-format-check <ARGS>

When the check passes, the utility prints output similar to the following:


Inspecting <REFERENCE_SET>
Done!

Inspecting <EVALUATION_SET>
Done!


---


Your data should work with RIME!

Instructions are available here.


Supported File Formats

RIME Tabular currently supports CSV (.csv) and Parquet (.parquet) files, with task-specific nuances described below. Input files should have a header row of string column names; these are used as feature names.

RIME is most effective when both a label column and a prediction column are provided; however, neither is required for most tasks*.
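
As a rough illustration of the layout described above, the following sketch builds a small CSV with a string header row and optional label and prediction columns. It uses only the Python standard library. The column names (age, income, label, prediction) and the helper names are hypothetical and are not prescribed by RIME.

```python
import csv
import io

# Hypothetical column names; the header row supplies the feature names.
FIELDNAMES = ["age", "income", "label", "prediction"]


def write_rows(rows, fieldnames=FIELDNAMES):
    """Serialize rows to CSV text whose first line is a string header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_header(csv_text):
    """Return the header row (the column names) of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


rows = [
    {"age": 34, "income": 52000.0, "label": 1, "prediction": 0.83},
    {"age": 51, "income": 38500.0, "label": 0, "prediction": 0.12},
]
csv_text = write_rows(rows)
header = read_header(csv_text)
```

Writing csv_text to a .csv file gives a header line followed by one line per row.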

Requirements By Task

Regression

  • Labels should be any real number

  • Predictions should be any real number
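
A minimal pre-check for the regression requirements might look like the sketch below. It is not part of the RIME SDK. Rejecting NaN, infinity, and booleans is an assumption about what "any real number" means here, and the function names are hypothetical.

```python
import math


def is_real_number(value):
    """True for finite ints and floats; bools, NaN, and inf are rejected (an assumption)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def check_regression(labels, predictions):
    """Return True when every label and every prediction is a real number."""
    return all(is_real_number(v) for v in labels) and all(
        is_real_number(v) for v in predictions
    )
```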

Binary Classification

  • Labels should be integer values 0 or 1

  • Predictions should be float values (probabilities) between 0 and 1
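
The binary classification requirements can be checked in a similar way. This is a sketch rather than RIME's own validation logic. It assumes the bounds 0 and 1 are inclusive, and check_binary is a hypothetical helper name.

```python
def check_binary(labels, predictions):
    """Return True when labels are the integers 0 or 1 and predictions are floats in [0, 1]."""
    labels_ok = all(
        isinstance(y, int) and not isinstance(y, bool) and y in (0, 1)
        for y in labels
    )
    # NaN fails the range comparison, so it is rejected as well.
    preds_ok = all(
        isinstance(p, float) and 0.0 <= p <= 1.0 for p in predictions
    )
    return labels_ok and preds_ok
```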

Multi-Class Classification

  • Labels should be integers referring to class index

  • Predictions should be an array summing to 1, with index i representing the probability of the ith class

  • Predictions should be uploaded as a separate .csv or .parquet file, with columns corresponding to prediction classes
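
To illustrate the multi-class layout, the sketch below checks that each label is a valid class index and that each prediction row sums to 1, then serializes the predictions as a separate CSV with one column per class. The column names (class_0, class_1, class_2), the tolerance, and the helper names are assumptions made for the example, not values defined by RIME.

```python
import csv
import io
import math


def check_multiclass(labels, predictions, tol=1e-6):
    """Return True when labels index into each row and each row is a probability vector."""
    if len(labels) != len(predictions):
        return False
    for label, probs in zip(labels, predictions):
        # The label must be a valid index into the probability array.
        if not (isinstance(label, int) and 0 <= label < len(probs)):
            return False
        if any(p < 0.0 or p > 1.0 for p in probs):
            return False
        # Allow for floating-point rounding when summing.
        if not math.isclose(sum(probs), 1.0, abs_tol=tol):
            return False
    return True


def predictions_to_csv(predictions, class_names):
    """Serialize predictions as CSV text with one column per class."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(class_names)
    writer.writerows(predictions)
    return buffer.getvalue()


labels = [0, 2, 1]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.25, 0.5, 0.25]]
predictions_csv = predictions_to_csv(predictions, ["class_0", "class_1", "class_2"])
```

Column i of the resulting file holds the probability of class i, matching the index convention used for the labels.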

Ranking

  • * Labels are required

  • Labels should be any real number

  • Predictions should be any real number

  • ranking_info must be provided in the data configuration