This is from trying to update the spack package to 2.6.2 and provide NCCL/RCCL support, but it doesn't look as if it's related to spack. Building fails when I enable NCCL, but works without it; I'm puzzled why, as this configuration must usually work.
The cmake args which fail (with openmpi-4.1.4, cuda-11.4.1, nccl-2.14.3-1) are
There are two different failures, depending on whether openmpi is built with C++ support.
With openmpi+cxx, the failure is
[ 83%] Linking CXX shared library libcosma.so
cd /tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-neo24soctuz3gh5w75eoivfgvyykwk7v/spack-build-neo24so/src/cosma && /usr/bin/cmake -E cmake_link_script CMakeFiles/cosma.dir/link.txt --verbose=1
/nobackup/projects/bdman01/mdehsdl3/spack.clean/lib/spack/env/gcc/g++ -fPIC -O2 -g -DNDEBUG -Wl,-rpath -Wl,/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-jdxn55a26z4fhc2xtgq7hiihcehuxhgs/lib -Wl,-rpath -Wl,/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/hwloc-2.8.0-bkqulonwqaazeatswgiw3y73tkxry2yo/lib -L/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/hwloc-2.8.0-bkqulonwqaazeatswgiw3y73tkxry2yo/lib -pthread -shared -Wl,-soname,libcosma.so -o libcosma.so CMakeFiles/cosma.dir/blas.cpp.o CMakeFiles/cosma.dir/buffer.cpp.o CMakeFiles/cosma.dir/communicator.cpp.o CMakeFiles/cosma.dir/context.cpp.o CMakeFiles/cosma.dir/interval.cpp.o CMakeFiles/cosma.dir/layout.cpp.o CMakeFiles/cosma.dir/local_multiply.cpp.o CMakeFiles/cosma.dir/mapper.cpp.o CMakeFiles/cosma.dir/math_utils.cpp.o CMakeFiles/cosma.dir/matrix.cpp.o CMakeFiles/cosma.dir/memory_pool.cpp.o CMakeFiles/cosma.dir/multiply.cpp.o CMakeFiles/cosma.dir/one_sided_communicator.cpp.o CMakeFiles/cosma.dir/strategy.cpp.o CMakeFiles/cosma.dir/two_sided_communicator.cpp.o CMakeFiles/cosma.dir/cinterface.cpp.o CMakeFiles/cosma.dir/environment_variables.cpp.o CMakeFiles/cosma.dir/pinned_buffers.cpp.o CMakeFiles/cosma.dir/gpu/nccl_utils.cpp.o CMakeFiles/cosma.dir/gpu/gpu_aware_mpi_utils.cpp.o -Wl,-rpath,/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-neo24soctuz3gh5w75eoivfgvyykwk7v/spack-build-neo24so/libs/COSTA/src/costa:/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-neo24soctuz3gh5w75eoivfgvyykwk7v/spack-build-neo24so/libs/Tiled-MM/src/Tiled-MM:/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/nccl-2.14.3-1-anhrq6463uiydo7xfah7tmhcrrup4zfb/lib:/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-jdxn55a26z4fhc2xtgq7hiihcehuxhgs/lib: ../../libs/COSTA/src/costa/libcosta.so 
../../libs/Tiled-MM/src/Tiled-MM/libTiled-MM.so /nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/nccl-2.14.3-1-anhrq6463uiydo7xfah7tmhcrrup4zfb/lib/libnccl.so /nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-jdxn55a26z4fhc2xtgq7hiihcehuxhgs/lib/libmpi_cxx.so /nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-jdxn55a26z4fhc2xtgq7hiihcehuxhgs/lib/libmpi.so /usr/lib/gcc/ppc64le-redhat-linux/8/libgomp.so /usr/lib64/libpthread.so /opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64/libcublas.so /opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64/libcudart.so
CMakeFiles/cosma.dir/gpu/gpu_aware_mpi_utils.cpp.o: In function `cosma::gpu::check_runtime_status(cudaError)':
/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-jdxn55a26z4fhc2xtgq7hiihcehuxhgs/include/openmpi/ompi/mpi/cxx/intracomm_inln.h:102: multiple definition of `cosma::gpu::check_runtime_status(cudaError)'
CMakeFiles/cosma.dir/gpu/nccl_utils.cpp.o:/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-neo24soctuz3gh5w75eoivfgvyykwk7v/spack-src/src/cosma/gpu/utils.hpp:7: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [src/cosma/CMakeFiles/cosma.dir/build.make:413: src/cosma/libcosma.so] Error 1
make[2]: Leaving directory '/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-neo24soctuz3gh5w75eoivfgvyykwk7v/spack-build-neo24so'
and without cxx it's
[ 83%] Linking CXX shared library libcosma.so
cd /tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-iy3pxeya5oy7n52rsyyzx2zjzv2qry5g/spack-build-iy3pxey/src/cosma && /usr/bin/cmake -E cmake_link_script CMakeFiles/cosma.dir/link.txt --verbose=1
/nobackup/projects/bdman01/mdehsdl3/spack.clean/lib/spack/env/gcc/g++ -fPIC -O2 -g -DNDEBUG -Wl,-rpath -Wl,/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-tngp6b2qcx64wd7ndf53dmdeovlmui4h/lib -Wl,-rpath -Wl,/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/hwloc-2.8.0-bkqulonwqaazeatswgiw3y73tkxry2yo/lib -L/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/hwloc-2.8.0-bkqulonwqaazeatswgiw3y73tkxry2yo/lib -pthread -shared -Wl,-soname,libcosma.so -o libcosma.so CMakeFiles/cosma.dir/blas.cpp.o CMakeFiles/cosma.dir/buffer.cpp.o CMakeFiles/cosma.dir/communicator.cpp.o CMakeFiles/cosma.dir/context.cpp.o CMakeFiles/cosma.dir/interval.cpp.o CMakeFiles/cosma.dir/layout.cpp.o CMakeFiles/cosma.dir/local_multiply.cpp.o CMakeFiles/cosma.dir/mapper.cpp.o CMakeFiles/cosma.dir/math_utils.cpp.o CMakeFiles/cosma.dir/matrix.cpp.o CMakeFiles/cosma.dir/memory_pool.cpp.o CMakeFiles/cosma.dir/multiply.cpp.o CMakeFiles/cosma.dir/one_sided_communicator.cpp.o CMakeFiles/cosma.dir/strategy.cpp.o CMakeFiles/cosma.dir/two_sided_communicator.cpp.o CMakeFiles/cosma.dir/cinterface.cpp.o CMakeFiles/cosma.dir/environment_variables.cpp.o CMakeFiles/cosma.dir/pinned_buffers.cpp.o CMakeFiles/cosma.dir/gpu/nccl_utils.cpp.o CMakeFiles/cosma.dir/gpu/gpu_aware_mpi_utils.cpp.o -Wl,-rpath,/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-iy3pxeya5oy7n52rsyyzx2zjzv2qry5g/spack-build-iy3pxey/libs/COSTA/src/costa:/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-iy3pxeya5oy7n52rsyyzx2zjzv2qry5g/spack-build-iy3pxey/libs/Tiled-MM/src/Tiled-MM:/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/nccl-2.14.3-1-anhrq6463uiydo7xfah7tmhcrrup4zfb/lib:/nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-tngp6b2qcx64wd7ndf53dmdeovlmui4h/lib: ../../libs/COSTA/src/costa/libcosta.so 
../../libs/Tiled-MM/src/Tiled-MM/libTiled-MM.so /nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/nccl-2.14.3-1-anhrq6463uiydo7xfah7tmhcrrup4zfb/lib/libnccl.so /nobackup/projects/bdman01/mdehsdl3/spack.clean/opt/spack/linux-rhel8-power9le/gcc-8.5.0/openmpi-4.1.4-tngp6b2qcx64wd7ndf53dmdeovlmui4h/lib/libmpi.so /usr/lib/gcc/ppc64le-redhat-linux/8/libgomp.so /usr/lib64/libpthread.so /opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64/libcublas.so /opt/software/builder/developers/compilers/cuda/11.4.1/1/default/lib64/libcudart.so
CMakeFiles/cosma.dir/gpu/gpu_aware_mpi_utils.cpp.o: In function `cosma::gpu::check_runtime_status(cudaError)':
/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-iy3pxeya5oy7n52rsyyzx2zjzv2qry5g/spack-src/src/cosma/gpu/utils.hpp:7: multiple definition of `cosma::gpu::check_runtime_status(cudaError)'
CMakeFiles/cosma.dir/gpu/nccl_utils.cpp.o:/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-iy3pxeya5oy7n52rsyyzx2zjzv2qry5g/spack-src/src/cosma/gpu/utils.hpp:7: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [src/cosma/CMakeFiles/cosma.dir/build.make:412: src/cosma/libcosma.so] Error 1
make[2]: Leaving directory '/tmp/mdehsdl3/spack-stage/spack-stage-cosma-2.6.2-iy3pxeya5oy7n52rsyyzx2zjzv2qry5g/spack-build-iy3pxey'
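Both logs end in the same linker complaint: `cosma::gpu::check_runtime_status(cudaError)` is reported as defined at `utils.hpp:7` and ends up in both `nccl_utils.cpp.o` and `gpu_aware_mpi_utils.cpp.o`. That pattern is typical of a non-`inline` function defined in a header that two translation units include, though I haven't confirmed that this is what `utils.hpp` actually contains. A minimal sketch of the usual remedy, using a hypothetical namespace `sketch` and a stand-in `fake_error` enum in place of `cudaError_t` so it compiles without CUDA:

```cpp
// Hypothetical stand-in for a header like src/cosma/gpu/utils.hpp.
// Names here (sketch, fake_error) are illustrative and not taken from COSMA.
#ifndef SKETCH_GPU_UTILS_HPP
#define SKETCH_GPU_UTILS_HPP

#include <stdexcept>
#include <string>

namespace sketch {

// Stand-in for cudaError_t, so the example does not need the CUDA toolkit.
enum class fake_error { success = 0, failure = 1 };

// `inline` lets every translation unit that includes this header carry an
// identical definition; the linker then keeps one copy instead of reporting
// "multiple definition". Without `inline`, each including .cpp file would
// emit its own strong symbol.
inline void check_runtime_status(fake_error status) {
    if (status != fake_error::success) {
        throw std::runtime_error(
            "GPU runtime error, code " +
            std::to_string(static_cast<int>(status)));
    }
}

} // namespace sketch

#endif // SKETCH_GPU_UTILS_HPP
```

The alternative is to keep only a declaration in the header and move the definition into a single `.cpp` file. If this diagnosis is right, it might also explain why the build works without NCCL: presumably only one of the two object files includes the header in that configuration.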
By the way, as something else to add, what exactly does COSMA_WITH_GPU_AWARE_MPI mean? In the case of openmpi, it could be configuring --with-cuda and/or using a UCX built with cuda and/or gdrcopy.
For reference, the build succeeds when -DCOSMA_WITH_NCCL=ON is removed from the cmake args.