Infery
Introduction
Welcome to Infery, Deci.AI's inference library, intended to take your trained neural networks seamlessly to production. Infery provides the ability to convert deep neural networks between frameworks with a single line of code, along with the stable environment needed to do so. It then enables optimized synchronous and asynchronous inference aimed at fully utilizing your available hardware resources - whether they are local or remote.
Getting Started
Infery offers numerous capabilities aimed at streamlining the workflow of taking a trained model to production. Before you can make use of them, you must first install Infery:
pip install --index-url=https://[USERNAME]:[TOKEN]@deci.jfrog.io/artifactory/api/pypi/deciExternal/simple infery==x.x.x
After installing Infery, you may use its CLI to install requirements for different features, such as inference or compilation:
infery install --inference pytorch --compile onnx2tensorrt
- For more in-depth instructions refer to the installation guide.
Having installed Infery, you may now explore any of the functionalities below. First, import Infery and load your model:
import infery
from infery import FrameworkType

# Load an ONNX model. If the framework is not inferred correctly, pass the framework=FrameworkType.XXX kwarg.
onnx_model = infery.Model("/path/to/model.onnx")
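If Infery does not detect the framework automatically, you can pass it explicitly. The snippet below is a minimal sketch: the path is hypothetical, and FrameworkType.ONNX is assumed to be the relevant enum member.
# Explicitly tell Infery which framework the model file uses (enum member name assumed)
onnx_model = infery.Model("/path/to/model.onnx", framework=FrameworkType.ONNX)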
Model Analysis
- Basic analysis:
# Get metadata about the model
metadata = onnx_model.metadata
- For more in-depth instructions refer to the analysis guide.
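As a small illustration (assuming the metadata object has a readable string representation), you can print it to inspect what Infery parsed from the model:
# Print the parsed model metadata (readable repr assumed)
print(onnx_model.metadata)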
Model Compilation
- Basic compilation:
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT)
- For more in-depth instructions refer to the compilation guide.
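Putting the pieces together, the sketch below loads an ONNX model, compiles it to TensorRT, and pushes dummy inputs through the compiled model to verify that it runs. It only uses calls shown on this page, but treat it as a sketch rather than a verified recipe; the model path is hypothetical.
import infery
from infery import FrameworkType

# Load the source model (hypothetical path)
onnx_model = infery.Model("/path/to/model.onnx")
# Compile it to TensorRT
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT)
# Sanity-check the compiled model with dummy inputs
output = trt_model.predict(onnx_model.make_dummy_inputs())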
Model Inference
- Basic inference:
# Run random noise (dummy inputs) through the model
output = trt_model.predict(onnx_model.make_dummy_inputs())
# Benchmark the model's performance
benchmark_result = trt_model.benchmark(batch_size=1)
- For more in-depth instructions refer to the inference guide.
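For a slightly fuller performance picture, you might benchmark at several batch sizes. The sketch below assumes benchmark() accepts batch_size as shown above and that the returned result prints a human-readable summary:
# Compare performance across batch sizes (readable repr of the result assumed)
for batch_size in (1, 8, 32):
    benchmark_result = trt_model.benchmark(batch_size=batch_size)
    print(f"batch_size={batch_size}: {benchmark_result}")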
Model Serving
- For more in-depth instructions refer to the quickstart.
Support and Compatibility
Compilation
Below is a support matrix of the conversions Infery can perform. To check whether Infery can compile (convert) from framework X to framework Y, check the cell at row X and column Y. Frameworks that appear as columns but not as rows cannot be converted from (i.e. they can only be converted to):
From \ To | ONNX | TensorRT | TensorFlow2 | TFLite | TFJS | OpenVino | CoreML | Keras | PyTorch | TorchScript | TensorFlow1 |
---|---|---|---|---|---|---|---|---|---|---|---|
PyTorch | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
TensorFlow2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
Keras | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
ONNX | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
Inference
Below is a support matrix of the frameworks with which Infery can perform inference:
PyTorch | ONNX | TensorRT | TensorFlow 2 | TFLite | TFJS | Keras | OpenVino | CoreML | TensorFlow 1 | TorchScript |
---|---|---|---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
Serving
Below is a support matrix of the protocols with which Infery can serve models:
gRPC - Remote | gRPC - Shared Memory | HTTP - Remote | HTTP - Shared Memory |
---|---|---|---|
✅ | ✅ | ✅ | ✅ |