Artificial Intelligence (AI) Inference is the process of using trained Deep Neural Networks (DNNs) to make predictions on previously unseen data. In practice, a trained model is deployed onto a device, where it processes incoming data requests (such as images or videos). For each request, it returns a response of the kind it was trained to produce, such as the type of an object, the name of a person or the number of cars.
Deep learning models are mathematical algorithms that are trained using data and human input so that they automatically produce a decision when given unseen data of the type they were trained to handle. A deep learning model replicates a decision-making process, which enables automation and understanding. Deci uses AI to optimize your model so that it achieves its best performance.
Deci assigns a standardized (normalized) score that describes how efficiently a model runs in a specific production environment, while also accounting for its accuracy. A model has two types of performance: accuracy and efficiency. A Deci Score is a normalized grade between 1 and 10 that evaluates a model’s runtime efficiency on a specific hardware and batch size setup. It comprises the performance metrics for Accuracy, Throughput, Latency, Memory Footprint and Model Size.
Accuracy is the primary metric for evaluating a model. It specifies the proportion of predictions that the model gets right.
Throughput specifies the number of requests that your model processes in a given timeframe while it provides inference services in a production environment.
Latency specifies the delay between receiving a request and returning its response while your model provides inference services in a production environment.
Memory Footprint specifies how much main memory your model uses while it provides inference services in a production environment.
Model Size specifies the storage space (in bytes) that the files of the model occupy, such as its static libraries, executable programs, static data and so on.
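The following is a minimal sketch of how three of these metrics (Accuracy, Latency and Throughput) can be measured in plain Python. It is an illustration only, not Deci's measurement method and not how the Deci Score is calculated. The names accuracy, benchmark and toy_predict are hypothetical, and toy_predict is a stand-in for a real model.

```python
import time


def accuracy(predictions, labels):
    """Proportion of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def benchmark(predict, batch, runs=50, warmup=5):
    """Return (mean latency in seconds per batch, throughput in samples per second)."""
    # Warm-up calls are excluded from timing so one-off setup costs do not skew results.
    for _ in range(warmup):
        predict(batch)
    start = time.perf_counter()
    for _ in range(runs):
        predict(batch)
    elapsed = time.perf_counter() - start
    latency = elapsed / runs                  # average time to answer one batch
    throughput = runs * len(batch) / elapsed  # samples processed per second
    return latency, throughput


def toy_predict(batch):
    # Hypothetical stand-in for a real model: predict 1 for positive inputs, else 0.
    return [1 if x > 0 else 0 for x in batch]


batch = [-2.0, 0.5, 3.1, -0.7]
labels = [0, 1, 1, 1]

acc = accuracy(toy_predict(batch), labels)  # 3 of the 4 predictions are correct
latency, throughput = benchmark(toy_predict, batch)
print(f"accuracy={acc:.2f} latency={latency * 1e6:.1f}us throughput={throughput:.0f} samples/s")
```

Latency and throughput depend on the hardware and on the batch size, which is why they are reported for a specific setup. Larger batches typically raise throughput while also raising the latency of each individual batch.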
Deci AI’s proprietary Automated Neural Architecture Construction (AutoNAC) engine provides a substantial performance boost to existing deep neural network solutions. Its algorithmic acceleration technology autonomously redesigns your deep learning models to provide dramatically increased throughput, significant reductions in inference latency and substantial cost-to-serve savings, often accompanied by improvements in accuracy.
This is Deci’s most powerful optimization feature. It provides up to a 10X performance boost and up to 80% cost savings, while preserving the model’s trained accuracy.
AutoNAC is hardware-aware. It optimizes deep learning models to use their hardware platform more effectively (whether it is a CPU, GPU, FPGA or special-purpose ASIC accelerator) in order to reach top performance in production.
Note – To enjoy the full power of Deci’s AutoNAC accelerator, you must purchase a premium license.
Runtime Inference Container (RTiC) is Deci’s proprietary containerized deep learning runtime inference engine that turns a model into a siloed, efficient runtime server. Deci’s RTiC enables inference performance optimization and model portability across multiple hardware types, platforms and frameworks. It may also be referred to as the RTiC server.
INFERY is a Python package for deploying deep learning models and running inference on them. It supports various frameworks and runs inference on any hardware with just three lines of code: install (pip), load and predict.