Running inference with a TensorRT model on multiple GPUs works as expected as long as the GPUs belong to the same GPU family: the same model is loaded on all GPUs by specifying gpus: [0, 1] in the instance_group section of config.pbtxt, and a single infer API call is enough to have the load balanced across the GPUs. A minimal config along those lines is shown below.
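For reference, this is roughly the working same-family setup (the model name is a placeholder):

```
# config.pbtxt -- minimal sketch of the working same-family setup
name: "my_trt_model"        # placeholder name
platform: "tensorrt_plan"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]          # one instance per listed GPU
  }
]
```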
How can a TensorRT model be served on multiple GPUs when the GPUs belong to different families, and how can we achieve the same kind of optimized load balancing as above?
If such configurations are not natively supported, I would also appreciate insights or suggestions for alternative strategies.
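One strategy I am considering (please correct me if this is off): since a TensorRT engine (.plan) is built for a specific compute capability, build one engine per GPU architecture and let Triton pick the matching file via the cc_model_filenames map in the model configuration. A sketch, assuming a pair of GPUs with compute capabilities 7.5 and 8.6; the file names here are hypothetical:

```
# config.pbtxt -- sketch for mixed GPU families
name: "my_trt_model"
platform: "tensorrt_plan"
# One engine per architecture, all placed in the same model version
# directory. Triton selects the file whose key matches the compute
# capability of the GPU each instance is placed on.
cc_model_filenames [
  { key: "7.5", value: "model_sm75.plan" },   # hypothetical file name
  { key: "8.6", value: "model_sm86.plan" }    # hypothetical file name
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

If this is the intended mechanism, does the default scheduler still distribute a single stream of infer requests across the heterogeneous instances the same way it does in the same-family case?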