Compilation
Infery provides a simple interface for converting models from one framework to another. It can also modify model characteristics such as batch size and quantization level.
Compilation-Specific Requirements
infery follows an "install what you need" philosophy (see the [Concepts] section for more details). Running pip install infery will install infery without any additional requirements. However, each compilation flow has a specific set of requirements; for example, any compilation from ONNX requires the onnx library to be installed. These requirements may seem unexpected at times (surprise! compiling PyTorch to TensorRT requires onnx as well).
In order to compile between different frameworks, you'll need to install the dependencies for the desired compilation flow. It is highly recommended to use Infery's CLI to do so; refer to the installation guide for more details.
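For example, preparing an environment for an ONNX-based flow manually amounts to installing the onnx library (the CLI will resolve the full dependency set and compatible versions for you):
pip install onnx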
Examples
ONNX → TensorRT
Converting an ONNX model to NVIDIA's TensorRT framework can yield an easy performance boost on NVIDIA GPUs without compromising accuracy. First, download the ResNet50 model (resnet50-v1-12.onnx) from the ONNX Model Zoo, a repository of pre-trained deep learning models in ONNX format that makes it easy to access and use these models for various machine learning tasks.
from infery import Model, FrameworkType
# Start by wrapping the ONNX model in an `infery.Model` instance
onnx_model = Model("resnet50-v1-12.onnx")
# Compiling the model from this point is just a simple one-liner
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT)
# TensorRT does not support fully dynamic axes; if your model has them, specify a static batch size for the target model
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT, target_batch_size=1)
trt_model is also an infery.Model instance, and supports all of infery's other features like inference and model inspection:
# The model metadata prints information such as input and output shapes, names, dtypes, and more
trt_model.metadata
# You can also verify the new model's framework:
trt_model.framework
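Running inference on the compiled model is then straightforward. Here is a minimal sketch, assuming the model exposes a predict method that accepts numpy arrays (use trt_model.metadata to confirm the exact input shape and dtype your model expects):
import numpy as np
# ResNet50 expects a (batch, 3, 224, 224) float32 tensor
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = trt_model.predict(dummy_input)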
ONNX → OpenVINO
First, make sure you have OpenVINO installed; if not, please check out our installation guide. Then, as before, download the ResNet50 model (resnet50-v1-12.onnx) from the ONNX Model Zoo.
from infery import Model, FrameworkType
# Start by wrapping the ONNX model in an `infery.Model` instance
onnx_model = Model("resnet50-v1-12.onnx")
# Compiling the model from this point is just a simple one-liner
ov_model = onnx_model.compile(target_framework=FrameworkType.OPENVINO)
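As with any compiled infery.Model, you can persist the result to disk for later use (the file name below is just an example; the exact on-disk format depends on the target framework):
ov_model.persist("./resnet50_openvino")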
PyTorch → TFLite
PyTorch is a very popular framework for training neural networks, but it's not always the right choice when shifting to production. Specifically, if our end goal is inference on Android devices, we may prefer to convert our PyTorch network to TFLite. This is easily achieved with infery.
We'll start again by instantiating a torch model, this time with torchvision, and wrapping it in an infery.Model instance:
import infery
import torchvision
resnet18 = torchvision.models.resnet18()
model = infery.Model(resnet18, framework=infery.FrameworkType.PYTORCH, source_type=infery.ModelSourceType.LOADED_MODEL)
Notice how this time we didn't use a local file, but rather a loaded instance of our model. We therefore needed to provide infery with the framework of the loaded model. We can now compile the loaded model to TFLite with a simple one-liner, as before:
tflite_model = model.compile(target_framework=infery.FrameworkType.TFLITE, input_specs=[(1, 3, 640, 640)])
Notice that this time we had to include the input_specs parameter, since torch networks don't have well-defined I/O dimensions that infery can deduce through model inspection.
We can now persist the new model to disk and use it in production:
tflite_model.persist("./resnet18.tflite")
We also had the option of specifying output_path=... in model.compile to persist the model directly, without passing through the tflite_model instance.
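For example, the following is equivalent to the two-step flow above (compile, then persist):
model.compile(
    target_framework=infery.FrameworkType.TFLITE,
    input_specs=[(1, 3, 640, 640)],
    output_path="./resnet18.tflite",  # persists the compiled model directly
)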
Compilation Params
model.compile accepts many parameters that can affect the compilation result. Some of these parameters are common to all compilation flows (for example, target_batch_size or target_quantization), but most are specific to the source and target frameworks of the flow being run. These flow-specific parameters are documented in [Compilation Params]. You can also inspect the available parameters inside your interpreter session by instantiating a compilation params instance. Continuing the last example:
params = model.compile_params(target_framework=infery.FrameworkType.TFLITE)
params will be an instance of PytorchToTfliteParams, and you can inspect its fields to see the available parameters. For example, we can see that params has an allow_tf_ops attribute to enable TensorFlow ops in the resulting TFLite model:
params.allow_tf_ops = True
params.input_specs = [(1, 3, 640, 640)]
tflite_model = model.compile(params)
model.compile also accepts instances of compilation params as input. Any supported parameter can also be passed to model.compile as a keyword argument, like we did with input_specs earlier:
tflite_model = model.compile(target_framework=infery.FrameworkType.TFLITE, input_specs=[(1, 3, 640, 640)], allow_tf_ops=True)
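Common parameters work the same way; for example, you can fix the target batch size in the same call as the flow-specific options:
tflite_model = model.compile(
    target_framework=infery.FrameworkType.TFLITE,
    input_specs=[(1, 3, 640, 640)],
    allow_tf_ops=True,
    target_batch_size=1,  # common to all compilation flows
)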
For further documentation and examples, see the Params Quickstart.