question for building wheel for transformer-engine #516

Open
Mrzhang-dada opened this issue Nov 13, 2023 · 6 comments


@Mrzhang-dada

Building Transformer Engine from source with pip install . fails while building the wheel, and I don't know why. Can anyone help me solve this problem?
My environment is Ubuntu 20.04, Python 3.8, CUDA 11.8.

root@f77be2ea35c2:/workspace/TransformerEngine# pip install .
WARNING: Ignoring invalid distribution -ransformer-engine (/opt/conda/lib/python3.8/site-packages)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /workspace/TransformerEngine
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (1.8.2)
Requirement already satisfied: torch in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (1.13.0a0+d0d6b1f)
Requirement already satisfied: flash-attn<=2.0.4,>=1.0.6 in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (2.0.4)
Requirement already satisfied: einops in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (0.7.0)
Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (21.3)
Requirement already satisfied: ninja in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (1.11.1.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from pydantic->transformer-engine==1.1.0.dev0+64a3d1d) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (3.0.9)
Building wheels for collected packages: transformer-engine
Building wheel for transformer-engine (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [416 lines of output]
/opt/conda/lib/python3.8/site-packages/setuptools/dist.py:490: UserWarning: Normalizing '1.1.0dev+64a3d1d' to '1.1.0.dev0+64a3d1d'
warnings.warn(tmpl.format(**locals()))
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/transformer_engine
copying transformer_engine/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine
creating build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
creating build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
creating build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
creating build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis
creating build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
creating build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module
running build_ext
Building CMake extension transformer_engine
Running command /opt/conda/bin/cmake -S /workspace/TransformerEngine/transformer_engine -B /tmp/tmpc_wa7krl -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/workspace/TransformerEngine/build/lib.linux-x86_64-3.8 -GNinja -Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11
-- The CUDA compiler identification is NVIDIA 11.8.89
-- The CXX compiler identification is GNU 9.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so.
-- cudnn_adv_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.
-- Found CUDNN: /usr/include
-- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- cuDNN: /usr/include
-- Found Python: /opt/conda/bin/python3.8 (found version "3.8.13") found components: Interpreter Development Development.Module Development.Embed
-- JAX support: OFF
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:

      pybind11_DIR


  -- Build files have been written to: /tmp/tmpc_wa7krl
  Running command /opt/conda/bin/cmake --build /tmp/tmpc_wa7krl
  [1/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o
  [2/29] Building CXX object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_api.cpp.o
  [3/29] Building CXX object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_api.cpp.o
  [4/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o
  [5/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/system.cpp.o
  [6/29] Building CXX object common/CMakeFiles/transformer_engine.dir/transformer_engine.cpp.o
  [7/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/rtc.cpp.o
  [8/29] Building CXX object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o
  [9/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/gemm/cublaslt_gemm.cu.o
  /workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  /workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  /workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  /workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  [10/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/util/cast.cu.o
  [11/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_upper_triang_masked_softmax.cu.o
  [12/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
  [13/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
  [14/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
  [15/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
  [16/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose.cu.o
  [17/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_masked_softmax.cu.o
  [18/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_bwd_semi_cuda_kernel.cu.o
  [19/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose_fusion.cu.o
  [20/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/swiglu.cu.o
  [21/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/relu.cu.o
  [22/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
  [23/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose.cu.o
  [24/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_bwd_semi_cuda_kernel.cu.o
  [25/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_fwd_cuda_kernel.cu.o
  [26/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/multi_cast_transpose.cu.o
  [27/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  [28/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
  [29/29] Linking CXX shared library common/libtransformer_engine.so
  Running command /opt/conda/bin/cmake --install /tmp/tmpc_wa7krl
  -- Install configuration: "Release"
  -- Installing: /workspace/TransformerEngine/build/lib.linux-x86_64-3.8/./libtransformer_engine.so
  -- Set runtime path of "/workspace/TransformerEngine/build/lib.linux-x86_64-3.8/./libtransformer_engine.so" to ""
  building 'transformer_engine_extensions' extension
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc
  creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions
  Emitting ninja build file /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  FAILED: /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.o
  /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu(69): error: expression must have class type but it has type "uint64_t"

  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu(73): error: expression must have class type but it has type "uint64_t"

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  2 errors detected in the compilation of "/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu".
  [2/11] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.o.d -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  In file included from /workspace/TransformerEngine/transformer_engine/common/util/logging.h:17,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.h:34,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions.h:7,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.cpp:8:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [3/11] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.o.d -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  In file included from /workspace/TransformerEngine/transformer_engine/common/util/logging.h:17,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../common.h:34,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../extensions.h:7,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:7:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  In file included from /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/Exceptions.h:13,
                   from /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
                   from /opt/conda/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../common.h:31,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../extensions.h:7,
                   from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:7:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::FP8TensorMeta>’:
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:84:67:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::FP8TensorMeta>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
   1479 | class class_ : public detail::generic_type {
        |       ^~~~~~
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::DType>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<transformer_engine::DType>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:121:70:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::DType>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::FP8FwdTensors>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<transformer_engine::FP8FwdTensors>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:130:66:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::FP8FwdTensors>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::FP8BwdTensors>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<transformer_engine::FP8BwdTensors>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:141:66:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::FP8BwdTensors>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_Bias_Type>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<NVTE_Bias_Type>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:149:48:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_Bias_Type>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_Mask_Type>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<NVTE_Mask_Type>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:154:48:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_Mask_Type>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_QKV_Layout>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<NVTE_QKV_Layout>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:159:50:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_QKV_Layout>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_Fused_Attn_Backend>’:
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7:   required from ‘class pybind11::enum_<NVTE_Fused_Attn_Backend>’
  /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:179:66:   required from here
  /opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_Fused_Attn_Backend>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
  [4/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/misc.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/misc.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [5/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [6/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/transpose.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/transpose.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [7/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/gemm.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/gemm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [8/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [9/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/cast.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/cast.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [10/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/normalization.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/normalization.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  [11/11] /usr/local/cuda/bin/nvcc  -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/softmax.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/softmax.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode

  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
  /workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
     36 |   (..., (str += to_string_like(args)));
        |                                     ^
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1897, in _run_ninja_build
      subprocess.run(
    File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/workspace/TransformerEngine/setup.py", line 626, in <module>
      main()
    File "/workspace/TransformerEngine/setup.py", line 611, in main
      setuptools.setup(
    File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
      self.run_command('build')
    File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/workspace/TransformerEngine/setup.py", line 403, in run
      super().run()
    File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
      _build_ext.run(self)
    File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
      _build_ext.build_ext.run(self)
    File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
      self.build_extensions()
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 839, in build_extensions
      build_ext.build_extensions(self)
    File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
      _build_ext.build_ext.build_extensions(self)
    File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
      self._build_extensions_serial()
    File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
      self.build_extension(ext)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
      _build_ext.build_extension(self, ext)
    File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
      objects = self.compiler.compile(sources,
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 654, in unix_wrap_ninja_compile
      _write_ninja_file_and_compile_objects(
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1569, in _write_ninja_file_and_compile_objects
      _run_ninja_build(
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1913, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer-engine
Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects

@ptrendx
Member

ptrendx commented Nov 16, 2023

Could you try PyTorch 1.14 (or anything 2.x) instead? I believe PyTorch changed the random number generator C++ API between 1.13 and 1.14, which could cause this error.
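
A quick way to act on this suggestion is to check the installed version before starting the build. The sketch below is only an illustration: the helper names and the `SUGGESTED_MINIMUM` constant are hypothetical, and the 1.14 threshold is taken from the comment above rather than from any documented requirement.

```python
import re

# Threshold taken from the suggestion in this thread; treat it as a rule of
# thumb, not as an official requirement of the project.
SUGGESTED_MINIMUM = (1, 14)


def parse_major_minor(version):
    """Return (major, minor) from strings like '1.13.0a0+d0d6b1f' or '2.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def meets_suggested_minimum(version, minimum=SUGGESTED_MINIMUM):
    """True if the version's (major, minor) is at least the suggested minimum."""
    return parse_major_minor(version) >= minimum
```

In a real environment the argument would be `torch.__version__`. The version string from the log in the original post, `1.13.0a0+d0d6b1f`, falls below the suggested minimum.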

@osainz59

Same here, but in my case the process stops at:

[28/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/scratch_local/pip-req-build-10hsu61s/setup.py", line 353, in _build_cmake
          subprocess.run(command, cwd=build_dir, check=True)
        File "/leonardo/prod/spack/03/install/0.19/linux-rhel8-icelake/gcc-11.3.0/python-3.10.8-eauysn2mronkqqffs7r6bvftsdpsfm4b/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/scratch_local/tmpm8306yxu']' returned non-zero exit status 1.

@ionutmodo

Any updates on this? I am facing more or less the same issue. CMake fails with the error "Could NOT find CUDNN (missing: CUDNN_INCLUDE_DIR CUDNN_LIBRARY)". However, these variables are set:

$ echo $CUDNN_LIBRARY
/mnt/nfs/clustersw/shared/cuda/cudnn-linux-x86_64-8.9.0.131_cuda12-archive/lib/libcudnn.so

$ echo $CUDNN_INCLUDE_DIR
/mnt/nfs/clustersw/shared/cuda/cudnn-linux-x86_64-8.9.0.131_cuda12-archive/include

Please help me solve this problem.
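
Before digging into CMake itself, it may help to confirm that the two variables point at files that actually exist. The sketch below only checks the paths. It assumes the header is named `cudnn.h`, the function name is hypothetical, and nothing in this thread confirms that the build reads these two environment variables at all.

```python
import os


def find_cudnn_path_problems(include_dir, library, isfile=os.path.isfile):
    """Return a list of human-readable problems with the given cuDNN paths.

    `isfile` is injectable so the check can be exercised without a real
    cuDNN installation.
    """
    problems = []
    if not include_dir:
        problems.append("CUDNN_INCLUDE_DIR is not set")
    elif not isfile(os.path.join(include_dir, "cudnn.h")):
        problems.append("no cudnn.h under %s" % include_dir)
    if not library:
        problems.append("CUDNN_LIBRARY is not set")
    elif not isfile(library):
        problems.append("library file not found: %s" % library)
    return problems


if __name__ == "__main__":
    # Read the same variables that were echoed in the comment above.
    for problem in find_cudnn_path_problems(
        os.environ.get("CUDNN_INCLUDE_DIR"), os.environ.get("CUDNN_LIBRARY")
    ):
        print(problem)
```

An empty result only means the paths are valid. If CMake still reports cuDNN as missing, the build is probably locating cuDNN some other way.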

@qwerfdsadad

qwerfdsadad commented Jul 11, 2024

@Mrzhang-dada @ionutmodo @osainz59
Have you solved this problem? I'm facing the same one.

My environment configuration is shown below:

NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2

nvcc -V : 11.8

cuDNN: 8.9.6

torch 1.13.0+cu116
torchaudio 0.13.0+cu116
torchsummary 1.5.1
torchvision 0.14.0+cu116
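
One thing worth comparing in a listing like this is the CUDA version PyTorch was built against (the `+cuXYZ` suffix) and the version `nvcc` reports. The sketch below is a rough illustration with hypothetical helper names. It assumes the suffix encodes the minor version in its last digit and that `nvcc -V` prints a "release X.Y" line.

```python
import re


def cuda_from_torch_version(version):
    """Return (major, minor) from a '+cuXYZ' suffix, or None if there is none."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_from_nvcc_output(text):
    """Return (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For the versions listed above, `1.13.0+cu116` parses to CUDA 11.6 while nvcc reports 11.8. Whether a minor-version difference alone breaks this build is not established in this thread, so treat a mismatch as a hint rather than a diagnosis.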

@ican24

ican24 commented Sep 24, 2024

Same trouble on two machines, one with a GeForce RTX 2080 Ti 12GB card and one with a GeForce RTX 4090 24GB card.
I spent 3-4 days testing different configurations without success.

The environment configuration is shown below:

NVIDIA-SMI 550.54.14
Driver Version: 550.54.14
CUDA Version: 12.4

nvcc -V :
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

cuDNN: 8.9.6
torch 2.4.1
torchvision 0.19.1

It is a horrible job to try to install TransformerEngine.
I need it to develop a punctuation and capitalization module.

RuntimeError: Error when running CMake: Command '['/home/deep/.local/lib/python3.10/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-o6ytpyoz/transformer_engine/common', '-B', '/tmp/pip-req-build-o6ytpyoz/build/cmake', '-DPython_EXECUTABLE=/usr/bin/python3', '-DPython_INCLUDE_DIR=/usr/local/include/python3.10', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-o6ytpyoz/build/lib.linux-x86_64-cpython-310', '-Dpybind11_DIR=/home/deep/.local/lib/python3.10/site-packages/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.

@timmoon10
Collaborator

timmoon10 commented Oct 11, 2024

Please try these suggestions: #355 (comment)

It may also be worth considering using an NGC PyTorch container, which includes TE.
