Description
I am using two threads and two CUDA streams to run inference on two models, one with cuDLA and one with GPU TensorRT.
However, I have observed that while the cuDLA model is running inference, it blocks the TensorRT inference on the other thread, leaving the GPU completely idle.
Is it not possible to run cuDLA and GPU inference simultaneously?
Environment
TensorRT Version: 8.5
NVIDIA GPU: Orin
NVIDIA Driver Version:
CUDA Version: 11.2
CUDNN Version:
Relevant Files
nsys report: https://drive.google.com/file/d/1I1iqgpOwb_FlDpX0Nxbaip7R_osdSDN3/view?usp=drive_link
Steps To Reproduce
mCuDLACtx->submitDLATask(mStream);                 // thread A, stream 1: submit the cuDLA task
trt_context->enqueueV2(buffers, stream, nullptr);  // thread B, stream 2: enqueue TensorRT inference
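For context, here is a minimal sketch of the launch pattern described above. The CuDLAContext wrapper shown is a hypothetical stand-in around cudlaSubmitTask in hybrid mode (the poster's actual wrapper, engine setup, and buffers are not shown in the issue), and the cudaStreamNonBlocking flags are an assumption aimed at ruling out legacy default-stream serialization, not a confirmed fix for the observed blocking.

```cpp
#include <thread>
#include <cuda_runtime.h>
#include <cudla.h>
#include <NvInfer.h>

// Hypothetical stand-in for the poster's wrapper; the real
// CuDLAContext implementation is not shown in the issue.
struct CuDLAContext {
    cudlaDevHandle dev{};   // from cudlaCreateDevice(..., CUDLA_CUDA_DLA)
    cudlaTask task{};       // fully prepared cuDLA task (loadable + tensors)
    void submitDLATask(cudaStream_t stream) {
        // Hybrid mode: the DLA task is submitted onto a CUDA stream.
        cudlaSubmitTask(dev, &task, 1, stream, 0);
    }
};

// Assumed to be initialized elsewhere, as in the poster's setup.
extern CuDLAContext* mCuDLACtx;
extern nvinfer1::IExecutionContext* trt_context;
extern void* buffers[];

int main() {
    cudaStream_t dlaStream, gpuStream;
    // Non-blocking streams avoid implicit synchronization with the
    // legacy default stream, one common source of serialization.
    cudaStreamCreateWithFlags(&dlaStream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&gpuStream, cudaStreamNonBlocking);

    // Thread A: cuDLA model on stream 1.
    std::thread dlaThread([&] {
        mCuDLACtx->submitDLATask(dlaStream);
        cudaStreamSynchronize(dlaStream);
    });
    // Thread B: GPU TensorRT model on stream 2.
    std::thread gpuThread([&] {
        trt_context->enqueueV2(buffers, gpuStream, nullptr);
        cudaStreamSynchronize(gpuStream);
    });

    dlaThread.join();
    gpuThread.join();
    cudaStreamDestroy(dlaStream);
    cudaStreamDestroy(gpuStream);
    return 0;
}
```

Per-thread, non-blocking streams are the standard way to express independent work to the CUDA runtime; if the blocking persists with this pattern, the nsys trace linked above should show whether the serialization happens inside the cuDLA submission path rather than in stream semantics.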