Undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12 #1700

Closed
anandhu-eng opened this issue May 15, 2024 · 4 comments

Comments

@anandhu-eng
Contributor

Occurred while running: cm run script --tags=generate-run-cmds,inference --model=bert-99 --backend=pytorch --mode=performance --device=cuda --quiet --test_query_count=1000 --network=sut

The same issue was reported at: pytorch/pytorch#111469

I tried exporting the environment variable with export LD_LIBRARY_PATH=$HOME/.local/lib/python3.12/sitepackages/nvidia/nvjitlink:$LD_LIBRARY_PATH, but it did not work.

Other workarounds mentioned:

  1. Downgrade torch to 2.0.1
  2. Install a nightly build of torch
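
For the LD_LIBRARY_PATH attempt above, it may help to confirm which directory actually holds the library before exporting it. The following is a minimal sketch, not a verified fix: it assumes the pip wheel places the library under nvidia/nvjitlink inside site-packages, and the helper name find_nvjitlink_dirs is hypothetical.

```python
import site
from pathlib import Path


def find_nvjitlink_dirs(roots=None):
    """Return directories under the given roots that hold a libnvJitLink.so* file."""
    if roots is None:
        # Search both the environment's and the user's site-packages.
        roots = list(site.getsitepackages()) + [site.getusersitepackages()]
    found = []
    for root in roots:
        # Assumed layout of the pip wheel: <site-packages>/nvidia/nvjitlink/...
        base = Path(root) / "nvidia" / "nvjitlink"
        if not base.is_dir():
            continue
        for lib in sorted(base.rglob("libnvJitLink.so*")):
            if lib.parent not in found:
                found.append(lib.parent)
    return found


if __name__ == "__main__":
    # Print export lines that can be pasted into the shell.
    for directory in find_nvjitlink_dirs():
        print(f'export LD_LIBRARY_PATH="{directory}:$LD_LIBRARY_PATH"')
```

If nothing is printed, the library is not installed under any of the searched site-packages directories, and an exported path pointing there would have no effect.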

@arjunsuresh
Contributor

This error is most likely because PyTorch was compiled for a later version of CUDA than the one being picked up. Adding --adr.cuda.version=12.4.1 to the command should fix it.
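
To check for such a mismatch without importing torch (the import itself is what fails), the installed package versions can be read from package metadata. This is a rough sketch; the distribution names in PACKAGES are assumptions based on the CUDA 12 pip wheels and may differ in a given environment.

```python
from importlib import metadata

# Assumed distribution names for the CUDA 12 pip wheels; adjust as needed.
PACKAGES = (
    "torch",
    "nvidia-nvjitlink-cu12",
    "nvidia-cusparse-cu12",
    "nvidia-cuda-runtime-cu12",
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

An nvidia-nvjitlink-cu12 version that is lower than the nvidia-cusparse-cu12 version would be consistent with the undefined-symbol error, although a system-wide libnvJitLink.so.12 found first on the library path can produce the same symptom.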

@mrmhodak
Contributor

WG agrees this is resolved

@mrmhodak
Contributor

Closing

@Xi0131

Xi0131 commented Dec 24, 2024

Hi, I am facing the same issue with the gpt-j benchmark.
Here is the output:

torch.distributed not initialized, assuming single world_size.
Quantized model exported to /mnt/models/GPTJ-6B/fp8-quantized-ammo/GPTJ-FP8-quantized
Total time used 14.93 s.
make: Leaving directory '/home/usr/CM/repos/local/cache/e6f880f23ece4993/repo/docker'
Traceback (most recent call last):
  File "/home/usr/CM/repos/local/cache/c69993974e204ca1/repo/closed/NVIDIA/code/gptj/tensorrt/onnx_tune.py", line 15, in <module>
    import torch
  File "/home/usr/Benchmark/cm/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/usr/Benchmark/cm/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
CM error: Portable CM script failed (name = get-ml-model-gptj, return code = 256)

with the command:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=gptj-99 --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --docker --quiet --test_query_count=50

I tried adding --adr.cuda.version=12.4.1 to the end of the command, but it does not solve the issue. Should I reinstall torch with the proper version? My torch version is currently 2.5.1.
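
As a diagnostic aid, the symbol name in the ImportError appears to carry a version suffix (_12_1 in the original report, _12_4 here). The sketch below extracts it, assuming that suffix reflects the nvJitLink version the calling library expects; that reading is an assumption, and the function name required_nvjitlink_version is hypothetical.

```python
import re

# Matches symbols such as __nvJitLinkComplete_12_4 and captures the two numbers.
_SYMBOL_RE = re.compile(r"__nvJitLink[A-Za-z]+_(\d+)_(\d+)")


def required_nvjitlink_version(error_message):
    """Return (major, minor) taken from an nvJitLink symbol name, or None if absent."""
    match = _SYMBOL_RE.search(error_message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    message = (
        "libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, "
        "version libnvJitLink.so.12"
    )
    print(required_nvjitlink_version(message))
```

Comparing this pair with the installed nvJitLink version can indicate whether the library being loaded is older than the one the caller was built against.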
