Description
I am using two threads and two CUDA streams to run inference on two models, one with cuDLA and one with GPU TensorRT.
However, I have observed that while the cuDLA model is running inference, it blocks the TensorRT inference on the other thread, leaving the GPU completely idle.
Is it not possible to run cuDLA and GPU inference simultaneously?
Environment
TensorRT Version: 8.5
NVIDIA GPU: Orin
NVIDIA Driver Version:
CUDA Version: 11.2
CUDNN Version:
Relevant Files
nsys report: https://drive.google.com/file/d/1I1iqgpOwb_FlDpX0Nxbaip7R_osdSDN3/view?usp=drive_link
Steps To Reproduce
mCuDLACtx->submitDLATask(mStream);                 // thread A, stream 1: submit the cuDLA task
trt_context->enqueueV2(buffers, stream, nullptr);  // thread B, stream 2: enqueue TensorRT inference
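For context, here is a minimal sketch of the launch pattern described above. The CuDLAContext wrapper shown is a hypothetical stand-in around cudlaSubmitTask in hybrid mode (the poster's actual wrapper, engine setup, and buffers are not shown in the issue), and the cudaStreamNonBlocking flags are an assumption aimed at ruling out legacy default-stream serialization, not a confirmed fix for the observed blocking.

```cpp
#include <thread>
#include <cuda_runtime.h>
#include <cudla.h>
#include <NvInfer.h>

// Hypothetical stand-in for the poster's wrapper; the real
// CuDLAContext implementation is not shown in the issue.
struct CuDLAContext {
    cudlaDevHandle dev{};   // from cudlaCreateDevice(..., CUDLA_CUDA_DLA)
    cudlaTask task{};       // fully prepared cuDLA task (loadable + tensors)
    void submitDLATask(cudaStream_t stream) {
        // Hybrid mode: the DLA task is submitted onto a CUDA stream.
        cudlaSubmitTask(dev, &task, 1, stream, 0);
    }
};

// Assumed to be initialized elsewhere, as in the poster's setup.
extern CuDLAContext* mCuDLACtx;
extern nvinfer1::IExecutionContext* trt_context;
extern void* buffers[];

int main() {
    cudaStream_t dlaStream, gpuStream;
    // Non-blocking streams avoid implicit synchronization with the
    // legacy default stream, one common source of serialization.
    cudaStreamCreateWithFlags(&dlaStream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&gpuStream, cudaStreamNonBlocking);

    // Thread A: cuDLA model on stream 1.
    std::thread dlaThread([&] {
        mCuDLACtx->submitDLATask(dlaStream);
        cudaStreamSynchronize(dlaStream);
    });
    // Thread B: GPU TensorRT model on stream 2.
    std::thread gpuThread([&] {
        trt_context->enqueueV2(buffers, gpuStream, nullptr);
        cudaStreamSynchronize(gpuStream);
    });

    dlaThread.join();
    gpuThread.join();
    cudaStreamDestroy(dlaStream);
    cudaStreamDestroy(gpuStream);
    return 0;
}
```

Per-thread, non-blocking streams are the standard way to express independent work to the CUDA runtime; if the blocking persists with this pattern, the nsys trace linked above should show whether the serialization happens inside the cuDLA submission path rather than in stream semantics.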