# Troubleshooting

## RIME Installation

1. **How do I upgrade my production version of RIME?**

    New major versions of RIME are released on a six-week cycle. Minor versions
    with bug fixes and similar changes are often released in between major
    releases. The upgrade process involves updating the RIME cluster image,
    updating the image itself, and then updating the SDK. See [Updating RIME](../get_started/installation/enterprise/installation/update.md)
    for detailed instructions on this process.

## RIME Python Package

1. **I'm seeing "Missing option" errors with my `rime-engine` CLI commands! How do I resolve them?**

    ```
    rime-engine run-stress-tests --config-path examples/income/stress_tests_model.json

    Error: Missing option '--upload-endpoint'.

    ..
    ```

    The commands in the RIME CLI [Walkthroughs](/walkthroughs/cli.rst) use **environment variables** to keep the commands short and readable.

    Be sure to set these up in your terminal session before running any commands:

    **Local**

    NOTE: Disabling TLS is recommended *for local uploads only*!

    ```
    export RIME_UPLOAD_URL=localhost:5001
    export RIME_FIREWALL_URL=localhost:5002
    export RIME_DISABLE_TLS=True
    ```

    **Cloud**

    Be sure to replace ``<YOUR_ORG_NAME>`` and ``<YOUR_API_KEY>`` with the specific values for your RIME Cloud instance!

    ```
    export RIME_UPLOAD_URL=rime-backend.<YOUR_ORG_NAME>.rime.dev
    export RIME_FIREWALL_URL=rime-backend.<YOUR_ORG_NAME>.rime.dev
    export RIME_API_KEY=<YOUR_API_KEY>
    ```

    Alternatively, missing options can be provided in the command itself, as flags (e.g., `--upload-endpoint` below):
    ```
    rime-engine run-stress-tests --config-path examples/income/stress_tests_model.json --upload-endpoint <YOUR_UPLOAD_ENDPOINT>
    ```

    Mappings of environment variables to their option names can be found by running `--help` for the chosen command:
    ```
    rime-engine run-stress-tests --help
    ```
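
    As a quick sanity check before running a command, the following minimal Python sketch reports which of these variables are unset in the current environment. The helper name `missing_env_vars` is hypothetical and not part of RIME; the variable names are the ones from the **Cloud** setup above.

    ```python
    import os

    # Variable names taken from the "Cloud" setup above; adjust for a local setup.
    CLOUD_VARS = ("RIME_UPLOAD_URL", "RIME_FIREWALL_URL", "RIME_API_KEY")


    def missing_env_vars(names, env=None):
        """Return the names that are unset or empty in `env` (defaults to os.environ)."""
        env = os.environ if env is None else env
        return [name for name in names if not env.get(name)]


    if __name__ == "__main__":
        missing = missing_env_vars(CLOUD_VARS)
        if missing:
            print("Unset environment variables:", ", ".join(missing))
        else:
            print("All expected environment variables are set.")
    ```

    Run it in the same terminal session as your `rime-engine` commands, since exported variables only apply to that session.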

2. **I'm seeing `ModuleNotFoundError` errors in the console. How do I resolve them?**
    
    These errors likely result from not having the extras installed for your use case.

    Make sure you are inside of the `rime_trial/` directory using your `rime-venv` virtual environment before proceeding.

    If you did not already do so during installation, run the following to generate the necessary requirements lists:
    ```bash
    python rime_helper.py generate-rime-requirements --token-file $PATH_TO_TOKEN_TXT_FILE
    ```

    For Natural Language Processing (NLP) use cases (i.e., text data):
    ```bash
    pip install -r nlp_requirements.txt
    ```

    For Computer Vision (CV) use cases (i.e., image data):
    ```bash
    pip install -r cv_requirements.txt
    ```
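
    To confirm which package is missing before reinstalling, here is a minimal sketch using the standard library's `importlib.util.find_spec`. The helper name `find_missing_modules` is hypothetical, and the module names passed in are placeholders; substitute the module named in your `ModuleNotFoundError` message.

    ```python
    import importlib.util


    def find_missing_modules(module_names):
        """Return the top-level modules from `module_names` that cannot be found."""
        return [
            name for name in module_names
            if importlib.util.find_spec(name) is None
        ]


    # Placeholder names: replace with the modules from your error message.
    print(find_missing_modules(["json", "some_missing_module"]))
    ```

    Run this with the `rime-venv` virtual environment active, so the check looks at the same packages that RIME uses.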

3. **I'm seeing `grpc.FutureTimeoutError`(s) when trying to upload stress tests. How do I resolve them?**

    This error can be caused by differences in how DNS names are resolved across operating systems and protocols (IPv4 vs. IPv6).

    If that is the case, running the following in your terminal session can resolve the issue:
    ```bash
    export GRPC_DNS_RESOLVER=native
    ```
    
    Otherwise, make sure your machine has access to the endpoint(s) in question (e.g., by enabling VPN).
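
    To check whether name resolution is a likely cause, the following minimal sketch uses the standard library's `socket.getaddrinfo` to list which address families (IPv4, IPv6) a host resolves to from your machine. The function name `resolved_families` and the host and port shown are placeholders, not RIME APIs or endpoints.

    ```python
    import socket


    def resolved_families(host, port, resolver=socket.getaddrinfo):
        """Return the sorted names of the address families `host` resolves to."""
        try:
            results = resolver(host, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            return []  # The name did not resolve at all.
        # The first element of each result tuple is the address family.
        return sorted({socket.AddressFamily(info[0]).name for info in results})


    # Placeholder endpoint: replace with your upload endpoint's host and port.
    print(resolved_families("localhost", 5001))
    ```

    An empty list means the name did not resolve from this machine. `AF_INET` corresponds to IPv4 and `AF_INET6` to IPv6.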

## RIME SDK
### Troubleshooting RIME Stress Tests

When running a suite of stress tests on arbitrary models and datasets on a custom image, things can go wrong.
The RIME SDK has tools available to help you debug your stress test jobs.
This section covers a few common failure scenarios and recommended debugging techniques.

#### Lost RIMEStressTestJob Object

If you close your Python notebook or scripting session, you will lose access to the ephemeral in-memory objects such as `RIMEStressTestJob`.
To recover these objects, connect the client to the same backend service.
```Python
rime_client = RIMEClient("my_vpc.rime.com", "api-key")
```
Then use `rime_client.list_stress_test_jobs()` to query the server for a list of jobs from the past two days.
You can filter by status and project ID to reduce the number of jobs returned.
Call `get_status()` on each job to find yours.
The return value from `get_status()` includes the start time and status of the job, which should help you identify the one you started.
```Python
jobs = rime_client.list_stress_test_jobs(status_filters=['RUNNING', 'FAILING'], project_id="bar")
# Print out the metadata for each job to see which one you started most recently.
for job in jobs:
    print(job.get_status())
```

#### Test Run Results Don't Show Up in UI

This indicates that the `RIMEStressTestJob` executing the suite of stress tests failed along the way.
There are a number of reasons why this can happen, including:

* Misspecified `test_run_config`, dataset, or model.
* CustomImage cannot be pulled.
* Resource limits exceeded.

The best place to start is the `get_status()` of the `RIMEStressTestJob` object.
If the job status is `'FAILING'` and the `verbose` flag is set to `True`, `get_status()` will dump the logs to `stdout`.
For configuration issues, this can be very helpful.
```Python
# Assume the job is 'FAILING'.
# With verbose=True, this dumps the logs (if there are any) to stdout.
status = job.get_status(verbose=True, wait_until_finish=True)
```
Looking at the logs can help solve a lot of problems.
If you have trouble making additional progress with debugging, please contact RI support.