To submit a request for inference –
(1) Call the client.rtic.predict method from your application, as follows –
model_output = client.rtic.predict(model_name='YOUR_MODEL_NAME', model_input=tensor, transport='grpc')
This request consists of the following parameters –
- model_name is the model’s name, as described above.
- model_input is the input tensor submitted to the model, on which inference is performed (tensor in the example above).
- transport specifies the protocol used to communicate with the RTiC inference engine. Several application protocols are supported. The default is HTTP, except on localhost deployments, where it is IPC (see below).
RTiC offers several protocols for exchanging input and output tensors, which we refer to as Transport Protocols. Each Transport Protocol serves a different purpose and has different performance characteristics. Choosing the right transport for your deployment can reduce inference latency and increase throughput.
- HTTP – Fast serialization of numpy arrays sent as HTTP payloads. This is the default protocol for network or cloud deployments, and is especially suitable if you use large batch sizes across the network.
- IPC – When the server and client are deployed on the same machine (localhost), the default communication method is IPC (inter-process communication over shared memory).
- gRPC – gRPC uses HTTP/2 for transport and therefore offers lower latency than HTTP for small batch sizes.
To use a specific protocol (HTTP, gRPC or IPC) for your deployment, override the transport of the predict call with the transport argument, for example transport='grpc'.
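The guidance above can be expressed as a simple selection rule. The following is a sketch under stated assumptions rather than RTiC behavior: choose_transport, LOCAL_HOSTS and SMALL_BATCH_MAX are hypothetical names, the cutoff of 8 for a "small" batch is arbitrary and should come from your own benchmarks, and the strings 'http' and 'ipc' are assumed to follow the same lowercase form as the documented 'grpc'.

```python
# Hostnames treated as "same machine" (assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

# Assumed cutoff for a "small" batch; tune it from your own measurements.
SMALL_BATCH_MAX = 8


def choose_transport(server_host, batch_size):
    """Pick a transport string following the guidance in this section."""
    if server_host in LOCAL_HOSTS:
        # Client and server share a machine: use shared-memory IPC.
        return "ipc"
    if batch_size <= SMALL_BATCH_MAX:
        # Small batches over the network: gRPC for lower latency.
        return "grpc"
    # Large batches over the network: HTTP.
    return "http"


if __name__ == "__main__":
    for host, batch in [("localhost", 64), ("10.0.0.5", 1), ("10.0.0.5", 64)]:
        print(host, batch, choose_transport(host, batch))
```

The returned string would then be passed as the transport argument of client.rtic.predict.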