
Infery CLI Tool


Infery ships with a command-line tool for those who prefer one-line terminal commands over the Python API. To use it, install infery and run the relevant subcommand.

General usage:

# Top level help
infery --help


Documentation for the installation utility is available in the installation guide.


Example usage (benchmarking):

# Benchmark help
infery benchmark --help

# Benchmark an ONNX model
infery benchmark --model-path=/path/to/model.onnx

# Benchmark an OpenVINO model with a batch size of 4 (the batch size must be compatible with the model's input shape)
infery benchmark --model-path=/path/to/model.xml --batch-size=4

# Benchmark a TensorRT model that has multiple dynamic axes (here, input "input_1" accepts dims [3,224,224] with batch size 1)
infery benchmark --model-path=/path/to/model.engine --input-dims=input_1:1,3,224,224:FP32
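As the example above shows, the `--input-dims` value follows a `name:dims:dtype` layout. A minimal sketch of how such a spec string decomposes (the parser below is illustrative only, not infery's actual implementation):

```python
def parse_input_dims(spec: str):
    """Split a name:dims:dtype spec (e.g. "input_1:1,3,224,224:FP32")
    into its three components."""
    name, dims, dtype = spec.split(":")
    return name, [int(d) for d in dims.split(",")], dtype

print(parse_input_dims("input_1:1,3,224,224:FP32"))
# → ('input_1', [1, 3, 224, 224], 'FP32')
```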

# Benchmark an ONNX model, running forward passes for 15 seconds before calculating the average performance
infery benchmark --model-path=/path/to/model.onnx --duration-secs=15

# Benchmark a TF2 model with 30 warmup repetitions before 1000 repetitions over which the performance is averaged
infery benchmark --model-path=/path/to/model_dir --framework=tensorflow2 --warmups=30 --repetitions=1000
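The warmup/repetition scheme behind these flags can be sketched in plain Python. This is a hypothetical stand-in for infery's benchmark loop, with `forward` standing in for a real model call:

```python
import time

def benchmark(forward, warmups: int = 30, repetitions: int = 1000) -> float:
    """Run `warmups` untimed calls, then average the latency of
    `repetitions` timed calls (returns seconds per call)."""
    for _ in range(warmups):      # warm caches/JIT; not measured
        forward()
    start = time.perf_counter()
    for _ in range(repetitions):  # measured region
        forward()
    return (time.perf_counter() - start) / repetitions

avg = benchmark(lambda: sum(range(1000)), warmups=5, repetitions=100)
print(f"avg latency: {avg * 1e6:.1f} µs")
```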

# Benchmark from JSON file input
echo '{"framework":"ONNX", "model_path":"/home/ubuntu/Downloads/mobilenetv2-12-int8.onnx", "batch_size": 1}' > /tmp/benchmark_request.json
infery benchmark_json -i /tmp/benchmark_request.json
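The request file can also be generated programmatically instead of with `echo`; a minimal sketch using the same fields as the example above:

```python
import json

# Same request as in the shell example above
request = {
    "framework": "ONNX",
    "model_path": "/home/ubuntu/Downloads/mobilenetv2-12-int8.onnx",
    "batch_size": 1,
}
with open("/tmp/benchmark_request.json", "w") as f:
    json.dump(request, f)
# then run: infery benchmark_json -i /tmp/benchmark_request.json
```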


Example usage (compilation):

# Compile help
infery compile --help

# Compile an ONNX model to TensorRT (FP32 precision by default)
infery compile --source=/path/to/source_model.onnx --target=/path/to/compiled_model.engine --target-framework=tensorrt

# Compile an ONNX model to TensorRT, quantizing to INT8 weights, with verbose logging
infery compile --source=/path/to/source_model.onnx --target=/path/to/compiled_model.engine --target-framework=tensorrt --quantization=int8 -E verbose=True

# Compile a TF2 model to an OpenVINO model with static batch size of 4, quantizing to INT8
infery compile --source=/path/to/source_model_dir --target=/path/to/compiled_model.xml --target-framework=openvino --quantization=int8 --batch-size=4 --source-framework=tensorflow2

# Compile an ONNX model to TFLite, quantizing to FP16 and passing the "allow_tf_ops" option
infery compile --source=/path/to/source_model.onnx --target=/path/to/compiled_model.tflite --target-framework=tflite --quantization=fp16 -E allow_tf_ops=True

# Quantize an ONNX model to FP16, leaving the model's inputs and outputs in their original precision
infery compile --source=/path/to/source_model.onnx --target=/path/to/compiled_model.onnx --target-framework=onnx --quantization=fp16 -E fp16_keep_io_types=True

# Run compilation from JSON file
echo '{"source":"/home/ubuntu/Downloads/resnet50_dynamic.onnx", "target":"/home/avi/Downloads/resnet50_dynamic.pkl", "target_framework":"TENSORRT", "batch_size": 64}' > /tmp/compilation_request.json
infery compile_json -i /tmp/compilation_request.json
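When scripting `compile_json` calls, it can help to sanity-check the request before invoking the CLI. The helper below is illustrative, not part of infery; the "required" keys are simply the fields the example above supplies:

```python
import json

REQUIRED = {"source", "target", "target_framework"}

def validate_request(path: str) -> dict:
    """Load a compilation request JSON and check that the key fields exist."""
    with open(path) as f:
        request = json.load(f)
    missing = REQUIRED - request.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return request

# Demo with the request from the shell example above
sample = {
    "source": "/home/ubuntu/Downloads/resnet50_dynamic.onnx",
    "target": "/home/avi/Downloads/resnet50_dynamic.pkl",
    "target_framework": "TENSORRT",
    "batch_size": 64,
}
with open("/tmp/compilation_request.json", "w") as f:
    json.dump(sample, f)
print(validate_request("/tmp/compilation_request.json")["target_framework"])
# → TENSORRT
```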