
Running Inference

After a model has been registered, you can use the RTiC Client to send inference requests to it in production.

If your main objective is throughput rather than latency, you can run inference asynchronously to boost performance. This option takes advantage of concurrent/parallel processing.

Running Inference on Registered Models

The following example feeds a registered model random input (using the Predict command) and then shows the result.

For each predict request, provide the following –

  • MODEL_NAME – Must exactly match the name of the registered model on which inference will run.
  • BATCH_SIZE – The batch size of the inference request. This should be a batch size that the model is configured to handle.
  • INPUT_DIMS – The size/shape of a single input sample, which Deci uses to interact with the model. This must be the size/shape that the model is configured to handle.
    For example, 3, 224, 224 is the input shape for a computer vision model –
    The first parameter (in this case 3) represents the number of channels, which in this case is RGB.
    The second and third parameters represent the resolution of the image, which in this case is 224 x 224 pixels.

Notes – There is no need to declare the batch size when entering the input dimensions. Enter the input dimensions exactly as the model expects to receive them (for example, channels-first or channels-last), as illustrated in the short sketch below.
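For example, with channels-first input dimensions of 3, 224, 224 and a batch size of 8 (illustrative values), the tensor actually sent to the model simply has the batch dimension prepended –

# INPUT_DIMS as registered for the model: channels-first, no batch dimension.
INPUT_DIMS = (3, 224, 224)
BATCH_SIZE = 8

# The tensor sent for inference prepends the batch dimension.
full_shape = (BATCH_SIZE, *INPUT_DIMS)
print(full_shape)  # (8, 3, 224, 224)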

The following code sample shows how to execute inference on your model –

import numpy as np

YOUR_MODEL_NAME = 'resnet50'
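# Note: 'client' used below is assumed to be the RTiC client instance that was
# initialized earlier in this guide, when the model was registered.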

def get_random_input_tensor(batch_size: int, dummy_input_dims: tuple) -> np.ndarray:
    dummy_batch = np.random.rand(batch_size, *dummy_input_dims)
    dummy_batch = dummy_batch.astype(np.float32)
    return dummy_batch

# Modify according to your model's input shape.
tensor = get_random_input_tensor(batch_size=1,
                                 dummy_input_dims=(3, 224, 224))

# Run the inference
model_output = client.rtic.predict(model_name=YOUR_MODEL_NAME, 
                                   model_input=tensor,
                                   transport='http')

The following is an example of a response –

>>> {'data': array([[-3.25712226e-02, -4.07898799e-02,  4.45684195e-02,
         ....
         ....
         5.61307557e-03, -2.90499590e-02,  4.11636978e-02,
         1.28511060e-03, -2.02523749e-02,  1.58170760e-02,
         4.09304425e-02,  2.17799544e-02,  1.59001257e-03,
         4.30968702e-02]], dtype=float32),
 'message': 'Success',
 'success': True}
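
The response carries data, message, and success fields. Before reading data, you may want a quick sanity check. A minimal sketch, assuming the same attribute-style access used in the snippet below –

# Stop early if the request did not succeed; otherwise use the returned tensor.
if not model_output.success:
    raise RuntimeError(f'Inference failed: {model_output.message}')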

Verify the output dimensions. In this example, we use a single-output model –

predictions = model_output.data
single_model_output_predictions = predictions[0]
single_model_output_predictions.shape

In this example, a classification model trained on ImageNet returns the following –

# np.ndarray
# ImageNet output - 1 batch, 1000 classes
>>> (1, 1000)
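
If the model returns raw ImageNet logits (an assumption for this sketch), you can turn the (1, 1000) output into a predicted class index with a softmax and an argmax –

import numpy as np

logits = single_model_output_predictions  # shape: (1, 1000)

# Numerically stable softmax over the class dimension.
probabilities = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities /= probabilities.sum(axis=1, keepdims=True)

# Index of the most likely ImageNet class for the single sample in the batch.
predicted_class = int(probabilities.argmax(axis=1)[0])
print(predicted_class, float(probabilities[0, predicted_class]))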

Running Inference Asynchronously

The following code sample shows how to execute an asynchronous inference –

import numpy as np

YOUR_MODEL_NAME = 'resnet50'


def get_random_input_tensor(batch_size: int, dummy_input_dims: tuple) -> np.ndarray:
    dummy_batch = np.random.rand(batch_size, *dummy_input_dims)
    dummy_batch = dummy_batch.astype(np.float32)
    return dummy_batch

# Modify according to your model's input shape.
tensor = get_random_input_tensor(batch_size=1, dummy_input_dims=(3,224,224))

# Run the inference asynchronously
futures = []
for i in range(10):
    f = client.rtic.predict_async(YOUR_MODEL_NAME, tensor)
    
    # Optional: If you do not need all the results together, you can execute a function as soon as the result is ready.
    # f.add_done_callback(function_that_handles_inference_result)
    futures.append(f)

# Wait for all the submitted inference results (blocking):
results = [f.result() for f in futures]
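
If you use the optional add_done_callback hook shown above, and assuming the returned future follows the standard concurrent.futures convention of passing the completed future to its done-callbacks, a handler might look like this (the function name matches the commented-out line above; the body is illustrative) –

def function_that_handles_inference_result(future) -> None:
    # Called as soon as this particular inference finishes; the future is already
    # done at this point, so result() returns immediately.
    model_output = future.result()
    print(model_output.data.shape)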
