
Measuring a Model's Performance in INFERY

To measure a model’s performance using INFERY –

(1) Run the model.benchmark command from your application, as follows –

# Benchmark implicitly (positional argument)
model.benchmark(BATCH_SIZE)

# Or, benchmark explicitly (keyword arguments)
model.benchmark(batch_size=BATCH_SIZE, input_dims=INPUT_DIMS, warmup_calls=WARMUP_CALLS, repetitions=REPETITIONS, dtype=DTYPE)

The operation accepts the following parameters –

  • BATCH_SIZE is the batch size for which the measurement will be made. This should be the batch size that the model is configured to handle.
  • INPUT_DIMS (optional) is the size/shape of the input to be used to measure the model’s performance. This should be the size/shape that the model is configured to handle.
  • WARMUP_CALLS (optional) is the number of warmup calls to perform PRIOR to the benchmark. Warmup ramps the clocks of different hardware up to their peak compute power before the benchmark starts, making the results more consistent.
  • REPETITIONS (optional) is the number of times the measurement request will be sent in order to improve accuracy. The measurement that is presented in the Deci platform represents the average of the measurements in the responses to each of these requests.
  • DTYPE (optional) is the data type of the inference (NumPy-compatible, e.g. float32, float16, int8). The data type affects inference time because it changes the amount of data that needs to be copied to memory and back for each inference.
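To illustrate the DTYPE point above, the following standalone sketch (plain NumPy, not an INFERY call) computes how many bytes one batch occupies at different data types, using the example input shape from this page (batch size 24, dimensions (3, 224, 224)):

```python
import numpy as np

# Bytes copied per batch for the example input, at different dtypes.
# Halving the element size halves the data moved to memory and back.
shape = (24, 3, 224, 224)
for dtype in ("float32", "float16", "int8"):
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    print(f"{dtype}: {nbytes / 1e6:.2f} MB")
# float32 → 14.45 MB, float16 → 7.23 MB, int8 → 3.61 MB
```

This is why benchmarking with the same DTYPE your production pipeline uses matters: the copy cost scales directly with the element size.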

The following is an example of a request –

model.benchmark(batch_size=24, input_dims=(3, 224, 224))
The following is an example of a response –

infery_manager -INFO- Benchmarking the model in batch size 24 and dimensions (3, 224, 224)...
<ModelBenchmarks: {
    "batch_inf_time": "8.98 ms",
    "batch_inf_time_variance": "0.08 ms",
    "memory": "2362.00 mb",
    "throughput": "2671.70 fps",
    "sample_inf_time": "0.37 ms",
    "batch_size": 24
}>

The following describes the response parameters –

  • 'batch_size' – Specifies the batch size that was used for the benchmark.
  • 'batch_inf_time' – Specifies the latency for the entire batch.
  • 'sample_inf_time' – Specifies the latency for a single sample within the batch; equivalent to 'batch_inf_time' divided by 'batch_size'.
  • 'memory' – Specifies the memory footprint that the model utilizes while inferencing.
  • 'throughput' – Specifies the number of samples processed (forward passes) per second.
  • 'batch_inf_time_variance' – Specifies the variance of the batch inference times during the benchmark. If the variance is high, we recommend increasing the number passed to 'repetitions' to make the benchmarks more reliable.
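As a sanity check, the derived metrics can be recomputed from 'batch_inf_time' and 'batch_size' alone; this plain-Python snippet (not an INFERY call) reproduces the numbers in the example response above:

```python
batch_size = 24
batch_inf_time_ms = 8.98  # latency for the whole batch, from the example

# sample_inf_time: batch latency divided by the batch size
sample_inf_time_ms = batch_inf_time_ms / batch_size

# throughput: samples processed per second (batch size / batch latency in seconds)
throughput_fps = batch_size / (batch_inf_time_ms / 1000.0)

print(f"{sample_inf_time_ms:.2f} ms")  # 0.37 ms, matching the example
print(f"{throughput_fps:.2f} fps")     # 2672.61 fps, close to the reported 2671.70
```

The small throughput discrepancy comes from the response rounding 'batch_inf_time' to two decimals; the engine computes it from the unrounded timing.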
