Compilation
Infery provides a simple interface for converting models from one framework to another. It can also modify model characteristics such as batch size and quantization level.
Compilation-Specific Requirements
infery follows an "install what you need" philosophy (see the [Concepts] section for more details). Running pip install infery will install infery without any additional requirements. However, each compilation flow has a specific set of requirements; for example, any compilation from ONNX requires the onnx library to be installed. These requirements may seem unexpected at times (surprise! compiling PyTorch to TensorRT requires onnx as well).
In order to compile between different frameworks, you'll need to install the dependencies for the desired compilation flow. It is highly recommended to use Infery's CLI to do so; refer to the installation guide for more details.
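For example, preparing an environment for an ONNX-based flow manually amounts to installing the onnx library (the CLI will resolve the full dependency set and compatible versions for you):
pip install onnx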
Examples
ONNX → TensorRT
Converting an ONNX model to NVIDIA's TensorRT framework can yield an easy performance boost on NVIDIA GPUs without compromising accuracy. First, download the ResNet50 model (resnet50-v1-12.onnx) from the ONNX Model Zoo, a repository of pre-trained deep learning models in ONNX format that makes it easy to access and use these models for various machine learning tasks.
from infery import Model, FrameworkType
# Start by wrapping the ONNX model in an `infery.Model` instance
onnx_model = Model("resnet50-v1-12.onnx")
# Compiling the model from this point is just a simple one-liner
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT)
# TensorRT does not support fully dynamic axes; if your model has them, specify a static batch size for the target model
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT, target_batch_size=1)
trt_model is also an infery.Model instance, and supports all of infery's other features like inference and model inspection:
# The model metadata prints information such as input and output shapes, names, dtypes, and more
trt_model.metadata
# You can also verify the new model's framework:
trt_model.framework
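Running inference on the compiled model is then straightforward. Here is a minimal sketch, assuming the model exposes a predict method that accepts numpy arrays (use trt_model.metadata to confirm the exact input shape and dtype your model expects):
import numpy as np
# ResNet50 expects a (batch, 3, 224, 224) float32 tensor
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = trt_model.predict(dummy_input)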
ONNX → OpenVINO
First, make sure you have OpenVINO installed; if not, please check out our installation guide. Then, as before, download the ResNet50 model (resnet50-v1-12.onnx) from the ONNX Model Zoo.
from infery import Model, FrameworkType
# Start by wrapping the ONNX model in an `infery.Model` instance
onnx_model = Model("resnet50-v1-12.onnx")
# Compiling the model from this point is just a simple one-liner
ov_model = onnx_model.compile(target_framework=FrameworkType.OPENVINO)
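As with any compiled infery.Model, you can persist the result to disk for later use (the file name below is just an example; the exact on-disk format depends on the target framework):
ov_model.persist("./resnet50_openvino")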
PyTorch → TFLite
PyTorch is a very popular framework for training neural networks, but it's not always the right choice when shifting to production. Specifically, if our end goal is inference on Android devices, we may prefer to convert our PyTorch network to TFLite. This is easily achieved with infery.
We'll start again by instantiating a torch model, this time with torchvision, and wrapping it in an infery.Model instance:
import infery
import torchvision
resnet18 = torchvision.models.resnet18()
model = infery.Model(resnet18, framework=infery.FrameworkType.PYTORCH, source_type=infery.ModelSourceType.LOADED_MODEL)
Notice how this time we didn't use a local file, but rather a loaded instance of our model. We therefore needed to provide infery with the framework of the loaded model. We can now compile the loaded model to TFLite with a simple one-liner, as before:
tflite_model = model.compile(target_framework=infery.FrameworkType.TFLITE, input_specs=[(1, 3, 640, 640)])
Notice that this time we had to include the input_specs parameter, since torch networks don't have well-defined I/O dimensions that infery can deduce through model inspection.
We can now persist the new model to disk and use it in production:
tflite_model.persist("./resnet18.tflite")
We also had the option of specifying output_path=... in model.compile to persist the model directly, without passing through the tflite_model instance.
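For example, the following is equivalent to the two-step flow above (compile, then persist):
model.compile(
    target_framework=infery.FrameworkType.TFLITE,
    input_specs=[(1, 3, 640, 640)],
    output_path="./resnet18.tflite",  # persists the compiled model directly
)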
Compilation Params
model.compile accepts many parameters that can affect the compilation result. Some of these parameters are common to all compilation flows (for example, target_batch_size or target_quantization), but most are specific to the source and target frameworks of the flow being run. These flow-specific parameters are documented in [Compilation Params]. You can also inspect the available parameters inside your interpreter session by instantiating a compilation params instance. Continuing the last example:
params = model.compile_params(target_framework=infery.FrameworkType.TFLITE)
params will be an instance of PytorchToTfliteParams, and you can inspect its fields to see the available parameters. For example, we can see that params has an allow_tf_ops attribute to enable TensorFlow ops in the resulting TFLite model:
params.allow_tf_ops = True
params.input_specs = [(1, 3, 640, 640)]
tflite_model = model.compile(params)
model.compile also accepts instances of compilation params as input. Any supported parameter can also be passed to model.compile as a keyword argument, like we did with input_specs earlier:
tflite_model = model.compile(target_framework=infery.FrameworkType.TFLITE, input_specs=[(1, 3, 640, 640)], allow_tf_ops=True)
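Common parameters work the same way; for example, you can fix the target batch size in the same call as the flow-specific options:
tflite_model = model.compile(
    target_framework=infery.FrameworkType.TFLITE,
    input_specs=[(1, 3, 640, 640)],
    allow_tf_ops=True,
    target_batch_size=1,  # common to all compilation flows
)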
For further documentation and examples, see the Params Quickstart.