Infery provides a simple interface for converting models from one framework to another. Infery can also modify model characteristics such as batch size or quantization level.
Infery follows an "install what you need" philosophy (see the [Concepts] section for details). Running pip install infery installs Infery without any additional requirements. However, each compilation flow has its own set of requirements (for example, any compilation from ONNX requires the onnx library to be installed). These requirements may seem unexpected at times (Surprise! Compiling PyTorch to TensorRT requires onnx to be installed as well).
To compile between frameworks, you'll need to install the dependencies of the desired compilation flow. We highly recommend using Infery's CLI to do so; refer to the installation guide for more details.
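Because each compilation flow pulls in its own optional packages, it can help to check what is missing before compiling. The sketch below uses only the standard library (importlib.util.find_spec, which looks a module up without importing it). The FLOW_REQUIREMENTS table and the missing_modules helper are hypothetical: only the onnx entries come from the text above, the other module names are assumptions, and Infery's real requirement lists may differ. Infery's CLI remains the authoritative way to install them.

```python
import importlib.util
from typing import List

# Hypothetical mapping from (source, target) flow to importable module names.
# Only the "onnx" requirements come from the documentation above; the rest are
# assumptions for illustration and may not match Infery's real requirements.
FLOW_REQUIREMENTS = {
    ("onnx", "tensorrt"): ["onnx", "tensorrt"],
    ("onnx", "openvino"): ["onnx", "openvino"],
    ("pytorch", "tensorrt"): ["torch", "onnx", "tensorrt"],
    ("pytorch", "tflite"): ["torch", "tensorflow"],
}


def missing_modules(source: str, target: str) -> List[str]:
    """Return the modules required by a flow that cannot be found in this environment."""
    required = FLOW_REQUIREMENTS.get((source.lower(), target.lower()))
    if required is None:
        raise KeyError(f"unknown compilation flow: {source} -> {target}")
    # find_spec returns None when a top-level module is not installed.
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules("pytorch", "tensorrt"))
```

An empty list means every module the (assumed) flow needs is importable; anything listed is a candidate for installation through the CLI.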
ONNX → TensorRT
Converting an ONNX model to NVIDIA's TensorRT framework can give an easy performance boost on NVIDIA GPUs without compromising accuracy. First, download the following ResNet50 model from the ONNX Model Zoo, a repository of pre-trained deep learning models in ONNX format that you can easily access and use for various machine learning tasks.
from infery import Model, FrameworkType
# Start by wrapping the ONNX model in an `infery.Model` instance
onnx_model = Model("resnet50-v1-12.onnx")
# Compiling the model from this point is just a simple one-liner
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT)
# TRT does not support fully dynamic axes. If your model has them,
# you need to specify a static batch size for the target model
trt_model = onnx_model.compile(target_framework=FrameworkType.TENSORRT, target_batch_size=1)
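To decide whether you need target_batch_size, you can inspect the ONNX model's inputs for dynamic axes yourself. The sketch below assumes the onnx package is installed (it is already required for ONNX flows). In an ONNX graph each input dimension is either a fixed dim_value or a symbolic dim_param; anything that is not a fixed positive integer is treated as dynamic here. The helper names input_shapes and dynamic_axes are ours, not part of Infery.

```python
from typing import Dict, List, Union

# A dimension is an int (static), a str (symbolic, e.g. "N"), or None (unset).
Dim = Union[int, str, None]


def input_shapes(onnx_path: str) -> Dict[str, List[Dim]]:
    """Read the declared input shapes of an ONNX model (requires the onnx package)."""
    import onnx  # imported lazily so dynamic_axes() works without onnx installed

    model = onnx.load(onnx_path)
    # Some older exporters also list weights under graph.input; skip those.
    initializers = {init.name for init in model.graph.initializer}
    shapes: Dict[str, List[Dim]] = {}
    for inp in model.graph.input:
        if inp.name in initializers:
            continue
        dims: List[Dim] = []
        for d in inp.type.tensor_type.shape.dim:
            if d.HasField("dim_value"):
                dims.append(d.dim_value)
            elif d.HasField("dim_param"):
                dims.append(d.dim_param)
            else:
                dims.append(None)
        shapes[inp.name] = dims
    return shapes


def dynamic_axes(shapes: Dict[str, List[Dim]]) -> Dict[str, List[int]]:
    """Return, per input, the indices of axes that are not fixed positive integers."""
    result: Dict[str, List[int]] = {}
    for name, dims in shapes.items():
        axes = [i for i, d in enumerate(dims) if not (isinstance(d, int) and d > 0)]
        if axes:
            result[name] = axes
    return result
```

For example, dynamic_axes(input_shapes("resnet50-v1-12.onnx")) lists the dynamic axes per input; if axis 0 appears, pass target_batch_size as in the last compile call above.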
trt_model is also an infery.Model instance and supports all of Infery's other features, such as inference and model inspection:
# Model metadata shows information about the model, such as its inputs' and outputs' shapes, names, dtypes and more
trt_model.metadata
# You can also verify the new model's framework using
trt_model.framework
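The claim that TensorRT conversion doesn't compromise accuracy is worth checking on your own model: run the source and compiled models on the same inputs and compare their outputs. How you obtain those outputs depends on Infery's inference API, which this section doesn't show, so the NumPy sketch below covers only the comparison. The compare_outputs helper and its default tolerances are illustrative assumptions; reduced-precision engines (FP16, INT8) will generally need looser tolerances than FP32.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, candidate: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-4) -> dict:
    """Summarize how far a compiled model's output drifts from the reference output."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(abs_diff.max()),
        # Elementwise check: |candidate - reference| <= atol + rtol * |reference|
        "allclose": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
        # For classifiers, agreement of the top-1 class per sample is often what matters.
        "top1_agreement": float(np.mean(reference.argmax(axis=-1) == candidate.argmax(axis=-1))),
    }
```

Looking at top-1 agreement alongside the raw difference helps separate harmless numerical noise from changes that actually flip predictions.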
ONNX → OpenVINO
First, make sure you have OpenVINO installed; if not, check out our installation guide. Then download the same ResNet50 model from the ONNX Model Zoo.
from infery import Model, FrameworkType
# Start by wrapping the ONNX model in an `infery.Model` instance
onnx_model = Model("resnet50-v1-12.onnx")
# Compiling the model from this point is just a simple one-liner
ov_model = onnx_model.compile(target_framework=FrameworkType.OPENVINO)
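To actually run the compiled TensorRT or OpenVINO model you need an input in the layout the ONNX ResNet50 expects. The ONNX Model Zoo ResNet models are typically preprocessed by resizing to 256 on the short side, center-cropping to 224×224, scaling to [0, 1], normalizing with the ImageNet mean and standard deviation, and converting HWC to NCHW float32. The NumPy sketch below is not part of Infery; it assumes the image is already resized (so it skips the resize step) and is in RGB channel order.

```python
import numpy as np

# ImageNet statistics commonly used with the ONNX Model Zoo ResNet models (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc: np.ndarray, crop: int = 224) -> np.ndarray:
    """Center-crop an already-resized HxWx3 uint8 RGB image into a 1x3xcropxcrop float32 batch."""
    h, w, c = image_hwc.shape
    if c != 3 or h < crop or w < crop:
        raise ValueError(f"expected an HxWx3 image of at least {crop}x{crop}, got {image_hwc.shape}")
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = image_hwc[top:top + crop, left:left + crop].astype(np.float32) / 255.0
    patch = (patch - MEAN) / STD                   # broadcast over the channel axis
    chw = np.transpose(patch, (2, 0, 1))           # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])   # add the batch axis -> NCHW
```

For a 256×256×3 uint8 image, preprocess returns an array of shape (1, 3, 224, 224) and dtype float32, matching a static batch size of 1.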
PyTorch → TFLite
PyTorch is a very popular framework for training neural networks, but it's not always the right choice when shifting to production. Specifically, if our end goal is inference on Android devices, we may prefer to convert our PyTorch network to TFLite. Infery makes this easy.
This time, we'll start by instantiating a PyTorch model with torchvision and wrapping it in an infery.Model instance:
import infery
import torchvision
resnet18 = torchvision.models.resnet18()
model = infery.Model(resnet18, framework=infery.FrameworkType.PYTORCH, source_type=infery.ModelSourceType.LOADED_MODEL)
Notice that this time we didn't use a local file but a loaded instance of our model. We therefore had to tell Infery the framework of the loaded model, and that its source type is a loaded model. We can now compile it to TFLite with a one-liner, as before:
tflite_model = model.compile(target_framework=infery.FrameworkType.TFLITE, input_specs=[(1, 3, 640, 640)])
Notice that this time we had to include the input_specs parameter, since PyTorch networks don't have well-defined input/output dimensions that can be deduced through model inspection.
We can now persist the new model to disk and use it in production:
We also had the option of specifying
model.compile to persist the model directly, without passing through the
model.compile accepts many parameters that can affect the compilation result. Some of them are common to all compilation flows (for example, target_quantization), but most are specific to the source and target frameworks of the compilation flow you're running. These flow-specific parameters are documented in [Compilation Params]. You can inspect the available parameters inside your interpreter session by instantiating a compilation params object. Continuing the last example:
params = model.compile_params(target_framework=infery.FrameworkType.TFLITE)
params will be an instance of PytorchToTfliteParams, and you can inspect its fields to see the available parameters. For example, params has an allow_tf_ops attribute that enables TensorFlow ops in the resulting TFLite model:
params.allow_tf_ops = True
params.input_specs = [(1, 3, 640, 640)]
tflite_model = model.compile(params)
As shown above, model.compile also accepts a compilation params instance as input. Alternatively, any supported parameter can be passed to model.compile as a keyword argument, as we did earlier with input_specs:
tflite_model = model.compile(target_framework=infery.FrameworkType.TFLITE, input_specs=[(1, 3, 640, 640)], allow_tf_ops=True)
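Infery's params classes aren't shown here, so the sketch below uses a hypothetical stand-in, a dataclass named DemoTfliteParams with only the three fields mentioned in this section, to illustrate the pattern: one object that can be filled in attribute by attribute or built from keyword arguments, with unknown names rejected early. It also shows a generic way to list an object's fields. Whether Infery's real params classes are dataclasses is an assumption; vars() is the fallback for ordinary objects. The field types and the helpers list_params and build_params are likewise hypothetical.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DemoTfliteParams:
    """Hypothetical stand-in for a compilation params class; not Infery's PytorchToTfliteParams."""
    target_quantization: Optional[str] = None       # common to all flows (type assumed)
    input_specs: List[Tuple[int, ...]] = field(default_factory=list)
    allow_tf_ops: bool = False                       # flow-specific


def list_params(params) -> dict:
    """Return the field names and current values of a params object."""
    if dataclasses.is_dataclass(params):
        return {f.name: getattr(params, f.name) for f in dataclasses.fields(params)}
    return dict(vars(params))  # fallback for plain objects


def build_params(params: Optional[DemoTfliteParams] = None, **kwargs) -> DemoTfliteParams:
    """Return a new params object with keyword overrides applied, rejecting unsupported names."""
    base = params if params is not None else DemoTfliteParams()
    known = {f.name for f in dataclasses.fields(base)}
    unknown = set(kwargs) - known
    if unknown:
        raise TypeError(f"unsupported compilation parameter(s): {sorted(unknown)}")
    return dataclasses.replace(base, **kwargs)  # copies; base is left unchanged
```

Setting allow_tf_ops and input_specs on a DemoTfliteParams() by hand and calling build_params(input_specs=[(1, 3, 640, 640)], allow_tf_ops=True) produce equal objects, mirroring the two calling styles described above, while a misspelled name such as allow_tf_opz fails immediately instead of being silently ignored.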
For further documentation and examples, see the Params Quickstart.