question for building wheel for transformer-engine #516
Comments
Could you try PyTorch 1.14 (or anything 2.x) instead? I believe PyTorch changed the random number generator C++ API between 1.13 and 1.14, which could cause this error.
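A quick way to check whether an installed PyTorch falls on the older side of that suggested boundary is to compare the major and minor parts of its version string. This is a minimal sketch: the helper names `parse_major_minor` and `is_at_least` are hypothetical, the `(1, 14)` threshold simply mirrors the suggestion above, and the sample strings are the versions reported in this thread.

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.13.0a0+d0d6b1f'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_at_least(version: str, minimum: tuple = (1, 14)) -> bool:
    """Return True if the version is at or above the given (major, minor) pair."""
    return parse_major_minor(version) >= minimum


# Version strings reported in this thread; both are 1.13 builds.
for reported in ("1.13.0a0+d0d6b1f", "1.13.0+cu116"):
    print(reported, "->", "ok" if is_at_least(reported) else "older than 1.14")
```

On a real machine, pass `torch.__version__` to `is_at_least` instead of a hard-coded string.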
Same here, but in my case the process stops at:
Any updates on this? I am facing more or less the same issue. CMake fails with an error saying
Please help me solve this problem.
@Mrzhang-dada @ionutmodo @osainz59 The environment configuration is shown below:
NVIDIA-SMI 535.129.03, Driver Version: 535.129.03, CUDA Version: 12.2
nvcc -V: 11.8
cuDNN: 8.9.6
torch: 1.13.0+cu116
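The configuration above pairs a PyTorch wheel tagged `cu116` with an `nvcc` reporting 11.8. Whether that difference contributes to the build failure is not confirmed anywhere in this thread, but it is easy to detect. The sketch below compares the CUDA version encoded in a PyTorch version string with the release reported by `nvcc --version`. The function names are hypothetical, and the sample `nvcc` line is an assumed example of that tool's usual "release X.Y" output rather than text from this thread.

```python
import re
from typing import Optional, Tuple


def cuda_from_torch_version(torch_version: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a local tag like '+cu116', or None if absent."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_from_nvcc_output(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkits_match(torch_version: str, nvcc_output: str) -> Optional[bool]:
    """Compare both versions; None means at least one could not be determined."""
    torch_cuda = cuda_from_torch_version(torch_version)
    nvcc_cuda = cuda_from_nvcc_output(nvcc_output)
    if torch_cuda is None or nvcc_cuda is None:
        return None
    return torch_cuda == nvcc_cuda


# Assumed sample of nvcc output for an 11.8 toolkit.
sample_nvcc = "Cuda compilation tools, release 11.8, V11.8.89"
print(toolkits_match("1.13.0+cu116", sample_nvcc))
```

A result of `None` means the version string carries no `+cuXYZ` tag (as with `1.13.0a0+d0d6b1f`), so the check cannot say anything in that case.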
Same trouble on 2 machines with GeForce RTX 2080 Ti 12GB and GeForce RTX 4090 24GB cards. The environment configuration is shown below:
NVIDIA-SMI 550.54.14
nvcc -V:
cuDNN: 8.9.6
Installing TransformerEngine is a horrible job.
Please try these suggestions: #355 (comment). It may also be worth considering using an NGC PyTorch container, which includes TE.
I don't know why this error occurs. Can anyone help me solve this problem?
My system environment is Ubuntu 20.04, Python 3.8, CUDA 11.8.
root@f77be2ea35c2:/workspace/TransformerEngine# pip install .
WARNING: Ignoring invalid distribution -ransformer-engine (/opt/conda/lib/python3.8/site-packages)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /workspace/TransformerEngine
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (1.8.2)
Requirement already satisfied: torch in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (1.13.0a0+d0d6b1f)
Requirement already satisfied: flash-attn<=2.0.4,>=1.0.6 in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (2.0.4)
Requirement already satisfied: einops in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (0.7.0)
Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (21.3)
Requirement already satisfied: ninja in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (1.11.1.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from pydantic->transformer-engine==1.1.0.dev0+64a3d1d) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (3.0.9)
Building wheels for collected packages: transformer-engine
Building wheel for transformer-engine (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [416 lines of output]
/opt/conda/lib/python3.8/site-packages/setuptools/dist.py:490: UserWarning: Normalizing '1.1.0dev+64a3d1d' to '1.1.0.dev0+64a3d1d'
warnings.warn(tmpl.format(**locals()))
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/transformer_engine
copying transformer_engine/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine
creating build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
creating build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
creating build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
creating build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
creating build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
creating build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
running build_ext
Building CMake extension transformer_engine
Running command /opt/conda/bin/cmake -S /workspace/TransformerEngine/transformer_engine -B /tmp/tmpc_wa7krl -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/workspace/TransformerEngine/build/lib.linux-x86_64-3.8 -GNinja -Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11
-- The CUDA compiler identification is NVIDIA 11.8.89
-- The CXX compiler identification is GNU 9.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so.
-- cudnn_adv_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.
-- Found CUDNN: /usr/include
-- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- cuDNN: /usr/include
-- Found Python: /opt/conda/bin/python3.8 (found version "3.8.13") found components: Interpreter Development Development.Module Development.Embed
-- JAX support: OFF
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer-engine
Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects
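The output above is cut off after the CMake warning, so the step that actually failed is not visible. Re-running with `pip install -v .` and saving the full output makes it possible to search for the first real error. The sketch below scans captured log text for lines that usually mark a failure. The helper name `find_error_lines`, the choice of patterns, and the `FAILED:` sample line are assumptions for illustration; they do not come from this issue's log.

```python
import re

# Markers that commonly flag a failing step in pip, CMake, Ninja, or compiler output.
ERROR_PATTERN = re.compile(r"\berror\b|FAILED:|CMake Error", re.IGNORECASE)


def find_error_lines(log_text: str) -> list:
    """Return (line_number, stripped_line) pairs for lines that look like errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# The second line is a hypothetical Ninja failure marker, not taken from this log.
sample_log = """-- Configuring done
FAILED: example_object.o
ERROR: Failed building wheel for transformer-engine
"""

for number, line in find_error_lines(sample_log):
    print(f"{number}: {line}")
```

The earliest hit is usually the most informative one, since later `ERROR:` lines from pip tend to repeat the summary.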