RuntimeError: Error building extension 'cpu_adam' because /usr/bin/ld: cannot find -lcurand, help! #5659
Comments
same problem.
same problem! Why?
same problem
Solved it by following the method here: #3929 (comment)
There is no lib64 under /home/enwei/anaconda3/envs/llama, so I copied everything in lib to lib64, and the problem is solved for me.
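The workaround above suggests the build was looking for CUDA libraries in a `lib64` directory that the conda environment does not have. A minimal sketch for checking where `libcurand.so` actually lives, assuming the CUDA root is read from `CUDA_HOME` with `/usr/local/cuda` as a fallback; the helper name `find_cuda_lib` and the candidate subdirectories are illustrative assumptions, not part of DeepSpeed:

```python
import os
from pathlib import Path


def find_cuda_lib(cuda_home, lib_name="libcurand.so", subdirs=("lib64", "lib")):
    """Return the subdirectories of cuda_home that contain lib_name."""
    root = Path(cuda_home)
    return [sub for sub in subdirs if (root / sub / lib_name).exists()]


if __name__ == "__main__":
    # CUDA_HOME and the fallback path are assumptions about the local setup.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    found = find_cuda_lib(cuda_home)
    print(f"{cuda_home}: libcurand.so found in {found or 'no candidate directory'}")
```

If the library shows up only under `lib`, a `lib64` symlink pointing at `lib` may work as a lighter alternative to copying the files, though that is untested here.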
Hi @hekaijie123, Can you please share your environment variables, specifically We've found that explicitly setting
|
Setting |
Closing issue since #5780 has been merged.
python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
[2024-06-14 14:24:07,747] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
Using /home/jxlab03/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /home/jxlab03/.cache/torch_extensions/py310_cu118/cpu_adam...
Emitting ninja build file /home/jxlab03/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/TH -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/THC -isystem /home/jxlab03/anaconda3/envs/minicpm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512 -D__ENABLE_CUDA_ -DBF16_AVAILABLE -c /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[2/3] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/TH -isystem /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/include/THC -isystem /home/jxlab03/anaconda3/envs/minicpm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512 -D__ENABLE_CUDA_ -DBF16_AVAILABLE -c /home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[3/3] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 508, in load
return self.jit_load(verbose)
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 555, in jit_load
op_module = load(name=self.name,
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
But when I run "ldconfig -p | grep libcurand" in a terminal, I can see libcurand.so:
libcurand.so.10 (libc6,x86-64) => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcurand.so.10
libcurand.so (libc6,x86-64) => /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcurand.so
The torch CUDA version and the nvcc version match; both are 11.8.
So I don't know why ninja can't find -lcurand.
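One likely explanation, offered as a hedged note rather than a confirmed diagnosis: "ldconfig -p" lists the runtime loader's cache, whereas `ld` resolves `-lcurand` at link time from the `-L` directories on the command line, the directories in `LIBRARY_PATH` (when invoked through gcc), and its built-in defaults. The failing [3/3] command carries only the torch `-L` directory, so the CUDA library directory is never searched. The sketch below checks which `-l` names in a link command can be found through its `-L` directories; `unresolved_libs` is a hypothetical helper, and it deliberately ignores the linker's built-in default directories, so system libraries may be reported as missing:

```python
import os
import shlex
from pathlib import Path


def unresolved_libs(link_cmd, extra_dirs=()):
    """List -l names with no lib<name>.so or lib<name>.a in the -L or extra dirs."""
    tokens = shlex.split(link_cmd)
    # Directories given as -L<dir>, plus any extra ones (e.g. from LIBRARY_PATH).
    dirs = [Path(t[2:]) for t in tokens if t.startswith("-L") and len(t) > 2]
    dirs += [Path(d) for d in extra_dirs if d]
    names = [t[2:] for t in tokens if t.startswith("-l") and len(t) > 2]
    missing = []
    for name in names:
        candidates = (d / f"lib{name}{ext}" for d in dirs for ext in (".so", ".a"))
        if not any(c.exists() for c in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The failing link command from the log above.
    cmd = (
        "c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand "
        "-L/home/jxlab03/anaconda3/envs/minicpm/lib/python3.10/site-packages/torch/lib "
        "-lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so"
    )
    library_path = os.environ.get("LIBRARY_PATH", "").split(os.pathsep)
    print(unresolved_libs(cmd, extra_dirs=library_path))
```

If `curand` is reported as missing, adding the directory shown by ldconfig (/usr/local/cuda-11.8/targets/x86_64-linux/lib) to `LIBRARY_PATH` before rebuilding may let the link step find it; this is an assumption about this setup, not a verified fix.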