Deci RTiC (Run-Time Inference Container) is a containerized deep learning runtime engine that turns a model into a siloed runtime server, which enables efficient inference and seamless deployment – at scale and on any hardware. This solution is essential for overcoming the complex challenges of making deep learning models production ready.
This RTiC (Run-Time Inference Container) Beta version offers a hardware-optimized container to deploy, run and benchmark deep-learning models in any framework. This results in an immediate performance boost for any inference process.
- Introducing Concurrency. RTiC operations can now run in parallel, which improves performance.
- Deci Platform Integration. RTiC can securely communicate with a user's private Deci model repository, so users can view and fetch models into production easily, which simplifies the CI/CD process.
- Added gRPC Support. Enables faster, smaller and simpler payloads, especially for real-time inference with small batch sizes.
- Fixed IPC Performance Issues.
- Enhanced OpenVINO Support in RTiC.
- Enhanced Supported Capabilities for CUDA 10 and Python 2.
- Asynchronous Predict. A new method of making predictions asynchronously.
- Added IPC Inference Support. IPC (interprocess communication) is optimized for local deployment, where both the client and the server are deployed on the same machine.
Note that IPC mode enables Deci's containers to access the same block of memory, which creates a shared buffer that the processes use to communicate with each other.
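The shared-buffer idea in the note above can be illustrated with Python's standard `multiprocessing.shared_memory` module. This is only a minimal sketch of the general technique, not RTiC's implementation: the function names `write_floats`, `read_floats` and `demo` are hypothetical, and both "sides" run in one process here for brevity, whereas a real client and server would be separate processes that exchange the block name over some other channel.

```python
import struct
from multiprocessing import shared_memory


def write_floats(values):
    """'Client' side (hypothetical): create a shared block and pack inputs into it."""
    payload = struct.pack(f"{len(values)}d", *values)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def read_floats(name, count):
    """'Server' side (hypothetical): attach to the block by name and unpack inputs."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The OS may round the block size up, so read only the bytes we expect.
        nbytes = count * struct.calcsize("d")
        return list(struct.unpack(f"{count}d", bytes(shm.buf[:nbytes])))
    finally:
        shm.close()


def demo():
    """Write three inputs to shared memory and read them back by block name."""
    inputs = [0.5, 1.5, 2.5]
    block = write_floats(inputs)
    try:
        return read_floats(block.name, len(inputs))
    finally:
        block.close()
        block.unlink()  # free the block once both sides are done
```

Calling `demo()` returns the same values that were written. The point of the design is that the payload is never serialized onto a network socket: only the block name has to travel between the two processes.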
- Added IPC/HTTP Flag. Ability to choose the communication mode (IPC or HTTP) between client and server containers when running predict/register commands.
- OpenVINO Support.
- Improved data serialization for optimized performance.
- Added dynamic batching inference in RTiC-GPU.
- RTiC server-client version validation.
- Support for Multiple Frameworks. RTiC can manage any type and quantity of models and is only limited by the system's disk and memory resources. RTiC supports TensorFlow (frozen graph), ONNX runtime, PyTorch (TorchScript) and Keras model formats.
- Optimized for AutoNAC Models. RTiC is optimized for models that have been generated by Deci’s AutoNAC, which provides world-class model execution efficiency.
- Support for Runtime Optimization. RTiC offers an immediate on-demand performance boost using Deci’s out-of-the-box runtime optimizer, which acts as a hardware-aware optimized graph compiler.
- Multi-model, Single/Batching Execution on GPU. Models can run with pre-configured per-model batch size. RTiC can accept requests for a dynamic or static batch of inputs and respond with the corresponding batch of outputs – as long as the request doesn’t exceed the configured batch size.
- API-based Communication. RTiC client/server applications communicate via an HTTP API that provides the following commands:
  - Register a model to fetch it from an external model repository for use in RTiC.
  - Deregister a model to remove it from RTiC.
  - Predict to execute an inference request.
  - Measure performance to get the inference execution performance indicators for the model.
  - Get Models to retrieve a list of all the models currently registered in this RTiC.
- Hot-swapping of Models in Production. It is now possible to replace the model in production without any service downtime. This enables ongoing training and updating of the models.
- Measuring Model Performance in Production. A model’s utilization, memory and performance can be measured using on-demand analysis in order to enable precise benchmarking in the production environment.
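To make the command set, the per-model batch limit and hot-swapping more concrete, here is a toy in-memory stand-in written in Python. It is a sketch under stated assumptions, not RTiC's actual API: the class `ToyModelServer`, its method names and the returned metric keys are all hypothetical, models are plain Python callables, and no HTTP transport is involved.

```python
import threading
import time


class ToyModelServer:
    """Hypothetical in-memory stand-in for the five commands; not RTiC's real API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = {}  # name -> {"fn", "max_batch_size", "latencies"}

    def register(self, name, model_fn, max_batch_size):
        # Registering an existing name replaces its entry in a single step.
        # Requests already running keep the old entry, so there is no gap
        # in which the name is missing (a simple form of hot swap).
        entry = {"fn": model_fn, "max_batch_size": max_batch_size, "latencies": []}
        with self._lock:
            self._models[name] = entry

    def deregister(self, name):
        with self._lock:
            self._models.pop(name)

    def get_models(self):
        with self._lock:
            return sorted(self._models)

    def predict(self, name, batch):
        with self._lock:
            entry = self._models[name]
        if len(batch) > entry["max_batch_size"]:
            raise ValueError("request exceeds the configured batch size")
        start = time.perf_counter()
        outputs = [entry["fn"](item) for item in batch]
        elapsed = time.perf_counter() - start
        with self._lock:
            entry["latencies"].append(elapsed)
        return outputs

    def measure(self, name):
        with self._lock:
            samples = list(self._models[name]["latencies"])
        mean = sum(samples) / len(samples) if samples else 0.0
        return {"calls": len(samples), "mean_latency_s": mean}


# Usage: register a model, run a batch, then swap the model under the same name.
server = ToyModelServer()
server.register("doubler", lambda x: x * 2, max_batch_size=2)
first = server.predict("doubler", [1, 2])
server.register("doubler", lambda x: x * 3, max_batch_size=2)  # hot swap
second = server.predict("doubler", [1, 2])
```

In this sketch `first` holds the outputs of the original model and `second` those of the replacement, while a batch of three items would be rejected because it exceeds the configured size. Keeping the latency samples inside each entry means that the measurements of a swapped-out model are not mixed with those of its replacement.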