You can dynamically load models in explicit model control mode: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_management.html#model-control-mode-explicit As for dynamically loading the image, you should be able to do it via a Python model. You can even use an ensemble model if you want to break the Python model out into its own step.
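A minimal sketch of the load/unload pattern described above, assuming the server was started with `--model-control-mode=explicit`. The `ModelCache` class and its LRU policy are illustrative, not part of Triton; `client` is anything exposing `load_model(name)` / `unload_model(name)`, such as `tritonclient.http.InferenceServerClient`.

```python
from collections import OrderedDict

class ModelCache:
    """Client-side LRU cache of loaded models (illustrative, not a Triton API).

    `client` must provide load_model(name) / unload_model(name), e.g.
    tritonclient.http.InferenceServerClient talking to a server that was
    started with --model-control-mode=explicit.
    """

    def __init__(self, client, capacity):
        self.client = client
        self.capacity = capacity          # max models resident at once
        self.loaded = OrderedDict()       # model name -> None, in LRU order

    def ensure_loaded(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return
        if len(self.loaded) >= self.capacity:
            # Evict the least recently used model to free GPU memory.
            victim, _ = self.loaded.popitem(last=False)
            self.client.unload_model(victim)
        self.client.load_model(name)
        self.loaded[name] = None
```

In real use you would call `cache.ensure_loaded("my_model")` before each inference request, and derive the capacity from GPU memory rather than a flat model count.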
-
I’m implementing Triton for my computer vision infrastructure. There are a couple of details I want to understand before proceeding.
First of all, I wonder whether I can dynamically load models. The models I have don’t all fit on the GPU I will be using, and that is not a problem: model load time is not crucial, because when I need a model I’ll need it for a while, so I’m not worried about that load time. In my head it would work something like unloading the least frequently used models to make room for the most recently requested ones. Is this something Triton can do? If not, is there a way to implement it with API calls?
The second thing I want to sort out is the image loading process. Currently, the worker that analyses images requests them from an external tile engine over HTTP GET and then has to send them on to the Triton server. Is there a way to send Triton the image URL so that it fetches the image from the tile engine directly, saving some redundant transfer time in the process?
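One way the URL-forwarding idea could look inside a Triton Python-backend model: the client sends the tile URL as a string tensor, and the model fetches the bytes itself inside `execute`. The helper below is just that fetch step in isolation; the surrounding `TritonPythonModel` class, tensor names, and decode step are assumptions, not anything Triton provides out of the box.

```python
import urllib.request

def fetch_image_bytes(url, timeout=10.0):
    """Download raw image bytes from a tile-engine URL.

    In a Triton Python-backend model this would run inside execute():
    the request would carry a string tensor holding `url`, and the bytes
    returned here would then be decoded (e.g. with cv2 or PIL) into the
    input tensor expected by the downstream vision model.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Whether this actually saves time depends on the network path between the Triton host and the tile engine; if they are co-located, it removes one image hop through the worker.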