Undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12 #1700

anandhu-eng · 2024-05-15T07:15:02Z

Occurred while running: cm run script --tags=generate-run-cmds,inference --model=bert-99 --backend=pytorch --mode=performance --device=cuda --quiet --test_query_count=1000 --network=sut

The same issue was reported at: pytorch/pytorch#111469

I tried setting the environment variable with export LD_LIBRARY_PATH=$HOME/.local/lib/python3.12/site-packages/nvidia/nvjitlink:$LD_LIBRARY_PATH but that did not work.
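Before exporting a path by hand, it may help to confirm where the library actually lives. The following is a minimal sketch, assuming the pip wheel layout `<site-packages>/nvidia/nvjitlink/lib/` (an assumption about the installed wheel, not something confirmed in this thread). The helper name `find_nvjitlink_dirs` is hypothetical.

```python
import glob
import os
import site
import sysconfig


def find_nvjitlink_dirs(site_dirs=None):
    """Return directories that contain a libnvJitLink.so* file.

    Assumes the pip wheel layout <site-packages>/nvidia/nvjitlink/lib/.
    """
    if site_dirs is None:
        # Current environment's site-packages plus the per-user one.
        site_dirs = [sysconfig.get_paths()["purelib"], site.getusersitepackages()]
    found = []
    for base in site_dirs:
        pattern = os.path.join(base, "nvidia", "nvjitlink", "lib", "libnvJitLink.so*")
        for path in sorted(glob.glob(pattern)):
            lib_dir = os.path.dirname(path)
            if lib_dir not in found:
                found.append(lib_dir)
    return found


if __name__ == "__main__":
    for lib_dir in find_nvjitlink_dirs():
        # Print a line that can be pasted into the shell before rerunning cm.
        print(f"export LD_LIBRARY_PATH={lib_dir}:$LD_LIBRARY_PATH")
```

If nothing is printed, the library is probably not installed in the Python environment that runs the script, and exporting a path for it would have no effect.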

Other workarounds mentioned there:

Downgrade torch to 2.0.1
Install a nightly build of torch

arjunsuresh · 2024-05-15T13:47:52Z

This error is most likely because PyTorch was compiled for a later version of CUDA than the one in use. Adding --adr.cuda.version=12.4.1 should fix it.
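To see whether the installed CUDA wheels line up with the torch build, the installed versions can be listed without importing torch (which is the step that fails). This is a rough sketch. The package names below are assumed to be the cu12 wheels from PyPI and may differ in other setups, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package names for a CUDA 12 pip install of PyTorch.
    names = [
        "torch",
        "nvidia-nvjitlink-cu12",
        "nvidia-cusparse-cu12",
        "nvidia-cuda-runtime-cu12",
    ]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

A cusparse wheel that is newer than the nvjitlink wheel would be consistent with the undefined-symbol error reported here.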

mrmhodak · 2024-05-21T16:45:52Z

WG agrees this is resolved

mrmhodak · 2024-05-21T16:46:05Z

Closing

Xi0131 · 2024-12-24T06:52:02Z

Hi, I am facing the same issue with the gpt-j benchmark.
Here is the output:

torch.distributed not initialized, assuming single world_size.
Quantized model exported to /mnt/models/GPTJ-6B/fp8-quantized-ammo/GPTJ-FP8-quantized
Total time used 14.93 s.
make: Leaving directory '/home/usr/CM/repos/local/cache/e6f880f23ece4993/repo/docker'
Traceback (most recent call last):
  File "/home/usr/CM/repos/local/cache/c69993974e204ca1/repo/closed/NVIDIA/code/gptj/tensorrt/onnx_tune.py", line 15, in <module>
    import torch
  File "/home/usr/Benchmark/cm/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/usr/Benchmark/cm/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
CM error: Portable CM script failed (name = get-ml-model-gptj, return code = 256)

with the command:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=gptj-99 --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --docker --quiet --test_query_count=50

I tried adding --adr.cuda.version=12.4.1 to the end of the command, but it did not solve the issue. Should I reinstall torch with the proper version? My torch version is currently 2.5.1.
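The suffix of the missing symbol (for example `_12_4` in `__nvJitLinkComplete_12_4`) appears to encode the nvJitLink major and minor version that the importing library expects. The sketch below extracts it from an error message so it can be compared with the installed nvJitLink version. The function name is hypothetical, and reading the suffix as a version number is an assumption based on the symbol names in this thread.

```python
import re


def required_nvjitlink_version(error_message):
    """Extract (major, minor) from a symbol such as __nvJitLinkComplete_12_4.

    Returns None when no such symbol appears in the message.
    """
    match = re.search(r"__nvJitLink\w+?_(\d+)_(\d+)", error_message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    message = (
        "undefined symbol: __nvJitLinkComplete_12_4, "
        "version libnvJitLink.so.12"
    )
    print(required_nvjitlink_version(message))
```

For the error above this gives (12, 4), while the symbol in the issue title gives (12, 1). That would suggest the two reports involve different CUDA minor versions, even though the symptom looks the same.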

mrmhodak closed this as completed May 21, 2024
