
Running the RTiC Server

The docker run command provides a wide variety of options for controlling the RTiC Server by passing the relevant arguments, such as:
• Running the Deci RTiC Inference Engine (the server).
• Connecting the RTiC Server to the Deci platform in order to access optimized models.
• Registering the relevant optimized model in the RTiC Server so that it can serve inference requests received from your application (using the Deci RTiC Client).
• And more.

These arguments enable you to use the docker run command to perform all of these actions at once. Alternatively, you can perform these actions in stages. For example, you can run the Deci RTiC Inference Engine (the server) using the docker run command, and later connect it to the Deci platform and/or register a model in it.

Basic Running RTiC-GPU Example

The following example demonstrates the arguments that can be used when deploying the RTiC Server on a GPU.

Note – Remember that you must pull the RTiC server intended for GPU (not for CPU).

Run the downloaded docker image (RTiC-GPU) with the following arguments.

docker run --runtime=nvidia -d --gpus all --ipc=host -p 8000:8000 -p 8001:8001 deciai/rtic-gpu:latest
  • --runtime=nvidia : Uses the NVIDIA container runtime so that the container can access the host's GPUs.
  • -d : ("Detached") docker run mode – launches the docker container in the background. To launch it in an interactive terminal instead, run the command with the -it flag.
  • --gpus : Specifies which GPUs will be available to this RTiC Server container (all in this example).
  • -p 8000:8000 : [local_machine_port]:[docker_port] – Maps machine port 8000 to port 8000 inside the docker container. This port is used for HTTP communication.
  • -p 8001:8001 : [local_machine_port]:[docker_port] – Maps machine port 8001 to port 8001 inside the docker container. This port is used for GRPC inference sessions.
  • (optional) -e CUDA_VISIBLE_DEVICES=i – Where i is an integer or a comma-delimited list of integers, which assigns GPUs by their integer IDs.
  • (optional) -v HOME_FOLDER/USER_NAME/.aws:/root/.aws – Mounts the user's .aws config into the docker container, where USER_NAME is the user name of the current machine user.
  • (optional) -v PATH/TO/MODELS/FOLDER:/checkpoints/ – Mounts the local machine's model files into /checkpoints/ inside the docker container.
  • (optional) --ipc=host – Enables IPC (InterProcess Communication) mode, which should only be used when deploying both the RTiC Server and the RTiC Client on the same machine.
  • (optional) --ipc=shareable – Enables IPC (InterProcess Communication) mode with other containers on the same host. This should only be used when you need to access RTiC from another container on the same machine. Both the client and the server must run in docker when this flag is used.
  • (optional) -v /dev/shm:/dev/shm – Shares the host's shared memory with the container. When using IPC (InterProcess Communication) with a Linux host, we recommend adding this flag to prevent unexpected docker behavior.

Optional Argument Usage Example

The following is an example of the usage of various optional arguments –

docker run --ipc=host --runtime=nvidia -d --gpus all -e CUDA_VISIBLE_DEVICES=0  -v /home/"USER_NAME"/checkpoints:/checkpoints/ -p 8000:8000 deciai/rtic-gpu:latest

Or, if you would like RTiC to communicate with the platform in order to load a model from the repository, your command should contain one of the arguments described in Connecting to the Deci Platform.

docker run --ipc=host --runtime=nvidia -d --gpus all -e CUDA_VISIBLE_DEVICES=0 -e DECI_PLATFORM_TOKEN="a5b17f8024a34329be5be5eed8237709" -v /home/"USER_NAME"/checkpoints:/checkpoints/ -p 8000:8000 deciai/rtic-gpu:latest

If the integration succeeded, the following appears in the logs of the docker container.

-INFO- Successfully logged in to the platform API

If the integration failed, the following appears in the logs of the docker container.

-WARNING- Failed to login to the platform; Consider specifying the flag SKIP_PLATFORM_LOGIN=true;

Important - Please make sure that the local port to which you are forwarding is available.

The following image shows an example of the resulting terminal output (the last line contains the container ID):

[Image: DockerRun.png]

Basic Running RTiC-CPU Example

The following example demonstrates some of the additional arguments that can be used when deploying the RTiC Server on a CPU.

Note – Remember that you must pull the RTiC server intended for CPU (not for GPU).

Run the downloaded docker image (RTiC-CPU) with the following arguments –

docker run -d -p 8000:8000 -p 8001:8001 deciai/rtic-cpu:latest

Docker Security Flags

Run RTiC without docker's default security restrictions (seccomp, network isolation) in order to leverage the full OS potential for a deployment on a local host.

We recommend specifying the following flags, which disable these docker security features in order to decrease latency –

docker run -d --privileged=true --security-opt seccomp=unconfined --pid=host --network=host deciai/rtic-cpu:latest
  • -p 8000:8000 : [local_machine_port]:[docker_port] – Maps machine port 8000 to port 8000 inside the docker container. This port is used for HTTP communication.
  • -p 8001:8001 : [local_machine_port]:[docker_port] – Maps machine port 8001 to port 8001 inside the docker container. This port is used for GRPC inference sessions.
  • (optional) -v HOME_FOLDER/USER_NAME/.aws:/root/.aws – Mounts the user's .aws config into the docker container, where USER_NAME is the user name of the current machine user.
  • (optional) -v PATH/TO/MODELS/FOLDER:/checkpoints/ – Mounts the local machine’s model files into /checkpoints/ inside the docker container.

Configuring Transport

RTiC provides the following protocols for Input and Output Tensor Exchange, which we refer to as Transport Protocols. Each of these protocols serves a different purpose and yields different benchmarks.

Choosing the right transport for your deployment may significantly reduce inference latency and improve throughput for your architecture.

  • HTTP – A fast serialization for numpy arrays over HTTP body payloads. This is the default protocol for network or cloud deployments, especially if you use large batch sizes across the network, because it provides lower latency than gRPC for larger batch sizes.
  • IPC – When both the RTiC server and the RTiC client are deployed on the same machine (localhost), the default communication method is IPC (Inter-Process-Communication over their shared memory space).
  • GRPC – GRPC uses the HTTP/2 protocol for transport, and therefore provides lower latency than HTTP for small batch sizes.

To use a specific transport communication protocol (HTTP, GRPC or IPC) for your deployment, you can simply override the transport used by the Predict command by passing the transport argument: transport='grpc'.
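For instance, the following is a minimal sketch of overriding the transport for a single predict call. Only transport='grpc' comes from this section; the import path, client class name (RTiCClient), constructor arguments, and the model_name/model_input argument names are assumptions for illustration, so check the Deci RTiC Client reference for the exact API.

import numpy as np

# Hypothetical import path and class name - consult the RTiC Client reference.
from deci_client.rtic import RTiCClient

# Assumed constructor arguments pointing at the HTTP port mapped by docker run.
client = RTiCClient(api_host="localhost", api_port=8000)

# Random input for an already-registered model (the shape is an assumption).
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Override the transport for this call; transport='grpc' is the documented argument.
outputs = client.predict(model_name="my_model", model_input=dummy_input, transport='grpc')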

You can asynchronously apply a registered model to a given set of inputs using the predict_async method in order to increase throughput between the client and the server.
In the example below, we feed the desired model (already registered) with random input and then print out the result.

For each predict_async request, you must provide the same arguments as you would provide for a predict command.

In return, you will receive a Future object, instead of a list of np.ndarray objects. This Future result can be fetched (blocking) using .result().
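The following is a minimal sketch of this flow. The Future and .result() behavior is as described above; the import path, client class name, constructor arguments, and the model_name/model_input argument names are assumptions for illustration.

import numpy as np

# Hypothetical import path and class name - consult the RTiC Client reference.
from deci_client.rtic import RTiCClient

client = RTiCClient(api_host="localhost", api_port=8000)

# Random input for an already-registered model (the shape is an assumption).
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

# predict_async takes the same arguments as predict, but returns a Future
# instead of a list of np.ndarray objects.
future = client.predict_async(model_name="my_model", model_input=dummy_input)

# Do other work while the request is in flight, then block for the result.
outputs = future.result()
print(outputs)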

Concurrency, Replications and Optimizations

RTiC uses different concurrency settings based on the deployment type, and automatically tries to fit the concurrency configuration to the specified machine, based on its hardware specs.

  • (optional) -e NUM_WORKERS_PER_MODEL="" – The number of worker processes per registered model. Default: 1.
  • (optional) -e MAX_CONCURRENCY_PER_WORKER="" – The concurrency level specifies the number of threads per process. This affects the number of threads, and the number of OS context switches and pages that occur during inference. Higher is not better!
    Default: If RTiC runs on a CPU, it will use one thread per CPU core. If RTiC runs on a GPU, it will use half of the available core count in order to prevent GPU memory errors. For instance, if your server has 8 CPU cores WITHOUT a GPU, RTiC will use 8 threads. If your server has 8 CPU cores WITH a GPU, RTiC will use 4 threads.
  • (optional) -e GPU_PERCENTILE_PER_PROCESS="" – When serving more than a single model, each worker (process) should get an entire GPU or a fraction of it. This flag specifies that fraction, and the GPU memory is divided equally between the model's processes.
    For example, set it to 1 to use the entire GPU if there is only a single model, or to 0.25 if there are 4 models, so that each model uses 25% of the GPU's memory and compute. The default is 1.

For example, an 8-core machine without a GPU results in the following runtime configuration:

RTiC Server Runtime Configuration:
CPUS:                            8
GPUS:                            0
Processes Per Model:             1
Worker Concurrency (Threads):    8
GPU percentile per worker:       0%

Logs Output

In order to get live logging output from the RTiC Server, use the docker logs -f command with the container ID (the UUID that is visible in the bottom line of the terminal output produced by the previous docker run command).

docker logs -f [docker-uuid]
