
Quickstart with INFERY

The following describes how to install INFERY, load a model, and run inference from a Python package.
This quickstart shows how to do so using a model that resides in the Deci Lab.

To use your own model or to work without the Lab, refer to the step-by-step guide, starting from Installing INFERY.

Choose A Model (Lab Example)

To get ready to run inference with INFERY –

(1) Open the Deci Lab, which is displayed by default when you launch Deci, or click the Lab tab at the top of the page.

(2) Select the row of the model (baseline or optimized) to be deployed, and click the Download button at the top right of the Lab.


The Download Model window is displayed. It provides three simple copy/paste instructions that describe how to install the INFERY Python package, load the model into INFERY, and run inference on the model on the chosen hardware.


(3) CHOOSE TARGET HARDWARE – If you selected an optimized model, you can skip this step, because the model has already been optimized for a specific hardware environment, such as Intel Xeon.

If you selected a baseline model, then you must specify the target hardware by clicking the relevant button in this window – GPU or CPU, as follows –


(4) Click the Download model option at the top right of this window to download the model from the Deci repository to your machine.


Pip-install the INFERY Python package: click the Copy icon to copy the command, then run it in a CLI (terminal) on the machine on which the model will be deployed.

# For CPU:
python3 -m pip install -U pip
python3 -m pip install infery

# For GPU:
python3 -m pip install -U pip

# Compile pycuda for the local CUDA. The example uses CUDA 11.2
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
python3 -m pip install -U pycuda

# Install infery-gpu from PyPI and TensorRT from NVIDIA's pip repository
python3 -m pip install -U --extra-index-url https://pypi.ngc.nvidia.com infery-gpu


In a Python environment, load the model with the INFERY Python package, as follows –

Click the Copy icon to copy the following command.

Make sure to copy and use the command that is actually displayed in the Deci Download Model window (not the example below), passing your own parameters –

  • model_path – Specify the exact path to where you downloaded/saved your model.
  • framework_type – Specify the deep-learning framework used to develop this model. The supported options are listed in the table below.
  • inference_hardware – Specify either gpu or cpu, according to the target hardware environment on which the model will be run.
import infery, numpy as np
model = infery.load(model_path='MyNewTask_1_0.onnx', framework_type='onnx', inference_hardware='gpu')
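The load call above returns a model object that expects inputs with a specific shape and dtype. Before calling predict, it can be useful to check the input tensor up front. The sketch below is a framework-free illustration of such a check — validate_input is a hypothetical helper (not part of the INFERY API), and the (1, 3, 224, 224) float32 shape is assumed from the example in this guide:

```python
import numpy as np

def validate_input(tensor, expected_shape=(1, 3, 224, 224), expected_dtype=np.float32):
    """Hypothetical helper: sanity-check a tensor before passing it to model.predict."""
    if tensor.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {tensor.shape}")
    if tensor.dtype != expected_dtype:
        raise TypeError(f"expected dtype {expected_dtype}, got {tensor.dtype}")
    return tensor

# A well-formed input passes through unchanged
x = np.random.random((1, 3, 224, 224)).astype('float32')
validate_input(x)
```

Catching a shape or dtype mismatch in your own code, rather than inside the inference engine, usually produces a clearer error message.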

Predict – Run Inference with INFERY

(1) Click the Copy icon next to the RUN MODEL option to copy the command that calls and runs your model.


(2) Run the copied command in a Python environment, as follows –

Pass the object named model that was returned by the LOAD MODEL command. This runs the INFERY inference engine with the selected model inside.

Make sure to copy and use the command that is actually displayed in the Deci Download Model window (not this example).

In the example below, inputs represents the input tensor to be processed by the model. Here it is an automatically generated random tensor; replace it with the real tensor to be processed by the model.predict command.

inputs = np.random.random((1, 3, 224, 224)).astype('float32')
model.predict(inputs)
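In practice, the random tensor is replaced by a preprocessed image. The sketch below shows one common NumPy-only way to turn an HxWx3 image into a (1, 3, 224, 224) float32 tensor — the [0, 1] scaling and channel ordering here are common conventions, assumed for illustration, not requirements imposed by INFERY (resize and normalization details depend on how the model was trained):

```python
import numpy as np

def preprocess(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 image (already resized to 224x224) into an
    NCHW float32 tensor. Scaling to [0, 1] is an assumed convention."""
    x = image_hwc.astype('float32') / 255.0   # scale pixel values to [0, 1]
    x = np.transpose(x, (2, 0, 1))            # HWC -> CHW
    return np.expand_dims(x, axis=0)          # add batch dimension -> NCHW

# Dummy image standing in for a real 224x224 RGB frame
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
inputs = preprocess(dummy)
print(inputs.shape, inputs.dtype)  # (1, 3, 224, 224) float32
```

The resulting inputs tensor can then be passed to model.predict in place of the random example above.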
