Commit

fix: forgot to always set _disable_torch_cuda_device_set
Signed-off-by: Terry Kong <terryk@nvidia.com>
terrykong committed Oct 2, 2024
1 parent 0142ee7 commit 148543d
Showing 1 changed file with 1 addition and 0 deletions.
nemo/export/trt_llm/tensorrt_llm_run.py (1 addition, 0 deletions)

@@ -502,6 +502,7 @@ def load_distributed(engine_dir, model_parallel_rank, gpus_per_node):
         # We want the engine to have the mp_rank, but the python runtime to not reassign the device of the current process
         # So we will set it to the current device
         rank=torch.cuda.current_device(),
+        _disable_torch_cuda_device_set=True,
     )

     tensorrt_llm_worker_context.decoder = decoder
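
For context, below is a minimal sketch of how the call containing this hunk plausibly looks inside load_distributed(). The use of TensorRT-LLM's ModelRunnerCpp.from_dir and the surrounding arguments are assumptions inferred from the diff, not verbatim repo code; only rank=torch.cuda.current_device() and _disable_torch_cuda_device_set=True appear in the hunk itself.

    # Hypothetical reconstruction (assumption: the decoder is built with
    # TensorRT-LLM's ModelRunnerCpp.from_dir; only the last two keyword
    # arguments are taken verbatim from the diff above).
    import torch
    from tensorrt_llm.runtime import ModelRunnerCpp

    def load_distributed_sketch(engine_dir, model_parallel_rank, gpus_per_node):
        # Each model-parallel worker process has already been bound to its GPU,
        # so the Python runtime should not re-issue torch.cuda.set_device()
        # based on the rank passed to the runner.
        decoder = ModelRunnerCpp.from_dir(
            engine_dir,
            rank=torch.cuda.current_device(),
            _disable_torch_cuda_device_set=True,
        )
        return decoder

As the in-line comment describes, the intent appears to be that the engine still receives a rank for its own bookkeeping, while the private _disable_torch_cuda_device_set flag keeps the runtime from reassigning the current process's device; this commit makes sure the flag is always set on that call.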
