INFERY (from the word inference) is Deci’s proprietary deep-learning runtime inference engine. It turns a model into a siloed, efficient runtime server and lets you load and run your model from Python code.
INFERY enables efficient inference and seamless deployment on any hardware, which helps overcome the complex challenges of making deep learning models production-ready.
INFERY's benefits include:
- Simplifies Deployment – Load models using a fast, simple Python package built for scalability and quick deployment.
- Improves Latency and Throughput – Get accelerated inference for deep learning models, optimized by our platform for the target hardware (CPU or GPU).
- Runs Anywhere – INFERY offers inference performance optimization and model portability across common frameworks, hardware types and production hosts. You can switch runtime backends (platforms) out of the box, without touching your code.
- Reduces Cost-to-Serve – Deci reduces total cost of ownership by up to 80% by maximizing hardware utilization. INFERY enables the pipelining and performance scaling of multiple models on a single host.
- Measures Your Model's Performance in Production – Deci reveals how your models really behave on your production hardware. Load your model using INFERY to see its latency in milliseconds. This lets you debug performance issues and estimate the compute capacity you'll need for your task.
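To make the last point concrete, here is a minimal, generic sketch of how a measured latency can be turned into a capacity estimate. It does not use INFERY's API: `predict_fn`, `measure_latency_ms` and `replicas_needed` are hypothetical names introduced for illustration, and the estimate assumes each replica serves requests one batch at a time with no queuing overhead.

```python
import math
import statistics
import time


def measure_latency_ms(predict_fn, sample, warmup=5, runs=50):
    """Time repeated calls to predict_fn(sample); return latency stats in ms.

    predict_fn is a hypothetical stand-in for whatever callable runs
    inference on your loaded model.
    """
    for _ in range(warmup):
        predict_fn(sample)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Nearest-rank 95th percentile over the sorted timings.
    p95_index = min(len(timings) - 1, math.ceil(0.95 * len(timings)) - 1)
    return {"mean_ms": statistics.mean(timings), "p95_ms": timings[p95_index]}


def replicas_needed(latency_ms, target_rps, batch_size=1):
    """Estimate the model replicas needed to sustain target_rps.

    Assumes each replica processes one batch at a time, so its throughput
    is batch_size * 1000 / latency_ms requests per second.
    """
    per_replica_rps = batch_size * 1000.0 / latency_ms
    return math.ceil(target_rps / per_replica_rps)
```

For example, under these assumptions a model with 20 ms latency at batch size 1 serves about 50 requests per second per replica, so `replicas_needed(latency_ms=20.0, target_rps=400)` evaluates to 8. Using the p95 latency rather than the mean gives a more conservative estimate.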