From 45fcfd6646d3e9958b07a217b880cdaf731d09fc Mon Sep 17 00:00:00 2001
From: Geoffroy Lesur
Date: Tue, 22 Oct 2024 09:50:48 +0200
Subject: [PATCH 1/2] fix arch for nvidia architectures

---
 doc/source/reference/makefile.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/source/reference/makefile.rst b/doc/source/reference/makefile.rst
index bf8ecdad..094b7533 100644
--- a/doc/source/reference/makefile.rst
+++ b/doc/source/reference/makefile.rst
@@ -146,13 +146,13 @@ We recommend the following modules and environement variables on Jean Zay:

 .. code-block:: bash

-    -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_VOLTA70=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_VOLTA70=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF

 While Ampere A100 GPUs are enabled with

 .. code-block:: bash

-    -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_AMPERE80=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF

 MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual. The malloc async option is here to prevent a bug when using PSM2 with async
 Cuda malloc possibly leading to openmpi crash or hangs on Jean Zay.

From 46d19732f2a021acfc527765fb350c2e4abf60a2 Mon Sep 17 00:00:00 2001
From: Geoffroy Lesur
Date: Thu, 24 Oct 2024 09:36:46 +0200
Subject: [PATCH 2/2] update JZ documentation

---
 doc/source/reference/makefile.rst | 40 ++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/doc/source/reference/makefile.rst b/doc/source/reference/makefile.rst
index 094b7533..4f948e5d 100644
--- a/doc/source/reference/makefile.rst
+++ b/doc/source/reference/makefile.rst
@@ -130,32 +130,56 @@ Finally, *Idefix* can be configured to run on Mi250 by enabling HIP and the desi

 MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual.

-Jean Zay at IDRIS, Nvidia V100 and A100 GPUs
---------------------------------------------
+Jean Zay at IDRIS, Nvidia V100/A100/H100 GPUs
+---------------------------------------------

-We recommend the following modules and environement variables on Jean Zay:
+We recommend the following modules and environment variables on Jean Zay V100/A100:

 .. code-block:: bash

+    module load arch/a100 # ONLY for A100
     module load cuda/12.1.0
     module load gcc/12.2.0
     module load openmpi/4.1.1-cuda
-    module load cmake/3.18.0
+    module load cmake/3.25.2
+
+While for H100:
+
+.. code-block:: bash
+
+    module load arch/h100
+    module load cmake/3.30.1
+    module load cuda/12.1.0
+    module load openmpi/4.1.5-cuda

 *Idefix* can then be configured to run on Nvidia V100 with the following options to ccmake:

 .. code-block:: bash

-    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_VOLTA70=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_VOLTA70=ON

 While Ampere A100 GPUs are enabled with

 .. code-block:: bash

-    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON
+
+And for H100 GPUs:
+
+.. code-block:: bash
+
+    -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_HOPPER90=ON
+
+
+MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual.
+
+
+.. warning::
+
+    As of *Idefix* 2.1.02, we automatically disable Cuda Malloc async (``-DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF``). However, earlier versions of
+    *Idefix* require this flag when calling cmake to prevent a bug when using PSM2 with async Cuda malloc, possibly leading to OpenMPI crashes or hangs on Jean Zay.
+

-MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual. The malloc async option is here to prevent a bug when using PSM2 with async
-Cuda malloc possibly leading to openmpi crash or hangs on Jean Zay.


 .. _setupSpecificOptions: