refit_cuda_engine method is too slow #3332
Comments
@davidli313 Could you try TensorRT 9.0? Refitting performance has improved by 1.8x to 15x in TensorRT 9.0. Also, can you give more details on your use case? What is the inference time? Where do the weights come from? Is there a training stage? What percentage of the entire process is spent on refitting?
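To answer the percentage question above, a rough wall-clock measurement is usually enough. The following is a minimal sketch using only the standard library; `timed` and `refit_share` are hypothetical helper names, not part of TensorRT, and the numbers in the usage note are illustrative (the refit time matches the report in this issue, the inference time is made up).

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock.

    GPU work can be asynchronous, so fn should synchronize the device
    before returning (for example by synchronizing its CUDA stream),
    otherwise the measured time may be too short.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def refit_share(refit_seconds, inference_seconds, runs_per_refit=1):
    """Fraction of total time spent refitting, given how many inference
    runs happen between two refits."""
    total = refit_seconds + inference_seconds * runs_per_refit
    return refit_seconds / total if total > 0 else 0.0
```

For example, `refit_share(4.5, 0.5)` gives 0.9, meaning 90% of the time goes to refitting if every inference run is preceded by a refit. In practice the first argument would come from `timed(refitter.refit_cuda_engine)`.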
Actually, an engine built with the refit feature performs poorly at inference time, especially with dynamic shapes, and the inference time is not stable. Inference with the refittable UNet engine takes 500 ms to 1000 ms, which is slower than PyTorch.
@davidli313 @zhangvia Hi, can you point me to some samples of loading LoRA with refit?
I have the same problem. Using Nsight Systems, I see some cudaMalloc and cudaFree calls that take a long time. Can anyone help with C++ sample code for refitting a UNet with LoRA?
Please refer to https://github.com/NVIDIA/TensorRT/blob/release/9.2/demo/Diffusion/utilities.py and pass GPU weights to the refitter instead of CPU weights to avoid internal memory allocation.
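The advice above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked demo: `refit_from_gpu_tensors` and the `gpu_weights` dict (TensorRT weight name mapped to a contiguous CUDA tensor, e.g. from PyTorch) are hypothetical, and it assumes a TensorRT version whose `Refitter.set_named_weights` accepts a `TensorLocation` argument, which may not be the case on 8.6.1.

```python
def refit_from_gpu_tensors(engine, gpu_weights, logger=None):
    """Refit `engine` from a dict {trt_weight_name: CUDA tensor}.

    Assumption: each tensor exposes .is_cuda, .is_contiguous(), .dtype,
    .data_ptr() and .numel(), as torch.Tensor does.
    """
    import tensorrt as trt  # imported lazily; needs a TensorRT install

    # Map the tensor dtype (by its string form) to a TensorRT data type.
    dtype_map = {
        "torch.float32": trt.float32,
        "torch.float16": trt.float16,
    }
    if logger is None:
        logger = trt.Logger(trt.Logger.WARNING)

    refitter = trt.Refitter(engine, logger)
    refittable = set(refitter.get_all_weights())

    for name, tensor in gpu_weights.items():
        if name not in refittable:
            continue  # not a refittable weight in this engine
        if not tensor.is_cuda or not tensor.is_contiguous():
            raise ValueError(f"{name}: expected a contiguous CUDA tensor")
        # Wrap the device pointer directly; no host copy is made here.
        weights = trt.Weights(
            dtype_map[str(tensor.dtype)], tensor.data_ptr(), tensor.numel()
        )
        # Assumption: three-argument overload taking a TensorLocation.
        if not refitter.set_named_weights(
            name, weights, trt.TensorLocation.DEVICE
        ):
            raise RuntimeError(f"failed to set weights for {name}")

    if not refitter.refit_cuda_engine():
        raise RuntimeError("refit_cuda_engine failed")
    return refitter
```

Because `trt.Weights` only wraps a raw pointer, the tensors in `gpu_weights` need to stay alive and unmodified until `refit_cuda_engine` has returned.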
Description
I used the Python TensorRT Refitter class to load the LoRA weights of a Stable Diffusion UNet, but the refitter.refit_cuda_engine method is very slow, usually taking 4 to 5 seconds. Is there any way to improve the performance of refit_cuda_engine?
Environment
TensorRT Version:
8.6.1
NVIDIA GPU:
GeForce RTX 4090
NVIDIA Driver Version:
525.89.02
CUDA Version:
12.0
CUDNN Version:
8.9.2
Operating System:
Ubuntu 20.04.1
Python Version (if applicable):
3.9.16
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
1.12.1
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
):