QUDA on Perlmutter
Instructions last verified on June 7, 2022. Since Perlmutter is still a pre-production system, these instructions may change at any time. Please contact us on the QUDA Slack if they do not work.
Due to the Cray MPI wrappers, some care is needed to set up a build environment that helps QUDA's CMake build (and MILC's Makefile) properly find MPI. The following environment loads CUDA 11.5, gcc 11.2.0, and cmake 3.22, and sets other useful environment variables:
module purge
module load PrgEnv-gnu
module load cmake
module load cudatoolkit
module load craype-accel-nvidia80
export MPICH_GPU_SUPPORT_ENABLED=1
export CRAY_CPU_TARGET=x86-64
export CC=$(which cc)
export CXX=$(which CC)
export MPI_HOME=$MPICH_DIR
export MPI_CXX_COMPILER=$(which CC)
export MPI_CXX_COMPILER_FLAGS=$(CC --cray-print-opts=all)
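Before configuring QUDA, it can help to confirm that the variables above actually made it into the current shell. The following is a minimal sketch only; `require_vars` is a hypothetical helper written for this page, not part of QUDA, MILC, or the Cray programming environment.

```shell
# Hypothetical helper: report which of the named variables are unset or empty.
# Prints "ok" and returns 0 if all are set; prints the missing names and returns 1 otherwise.
require_vars() {
    missing=""
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

For the environment above, one would call it as `require_vars CC CXX MPI_HOME MPI_CXX_COMPILER MPICH_GPU_SUPPORT_ENABLED` and rerun the module and export commands if anything is reported as missing.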
With the environment above in place, compiling QUDA is relatively straightforward. A reference QUDA installation that automatically downloads and builds QMP and QIO, and includes the components needed for use with MILC, is:
WORKING_DIRECTORY=$(pwd)
git clone --branch develop https://github.com/lattice/quda && mkdir build && cd build
cmake \
-DCMAKE_BUILD_TYPE=RELEASE \
-DQUDA_GPU_ARCH=sm_80 \
-DQUDA_DIRAC_DEFAULT_OFF=ON \
-DQUDA_DIRAC_STAGGERED=ON \
-DQUDA_QMP=ON \
-DQUDA_QIO=ON \
-DQUDA_DOWNLOAD_USQCD=ON \
../quda
make -j install
cd $WORKING_DIRECTORY
The MILC+QUDA helper scripts that ship with MILC currently need to be modified to work on Perlmutter. For simplicity, we give the raw commands below; this page will be revised once the compile_* scripts have been updated.
MILC can be downloaded with:
git clone --branch develop https://github.com/milc-qcd/milc_qcd
Compiling MILC with QMP+QIO+QUDA requires the path to CUDA as well as the install directories of QMP, QIO, and QUDA. These can be set via:
# Automated method to find the path to CUDA
PATH_TO_CUDA=$(which nvcc)
PATH_TO_CUDA=${PATH_TO_CUDA/\/bin\/nvcc/}
# Paths to QUDA, QIO, QMP
PATH_TO_QUDA="${WORKING_DIRECTORY}/build/usqcd"
PATH_TO_QMP=$PATH_TO_QUDA
PATH_TO_QIO=$PATH_TO_QUDA
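To illustrate how the CUDA root directory is derived from the location of `nvcc`, here is a small sketch using a made-up path (the path is hypothetical and is not the actual Perlmutter install location). It uses the shell's suffix-removal expansion `${var%suffix}`, which strips the trailing `/bin/nvcc` only if it appears at the end of the string.

```shell
# Hypothetical nvcc location, for illustration only.
nvcc_path="/opt/example/cuda/11.5/bin/nvcc"

# Strip the trailing "/bin/nvcc" to obtain the CUDA root directory.
cuda_root="${nvcc_path%/bin/nvcc}"
echo "$cuda_root"
```

On a real system, `nvcc_path` would come from `which nvcc` after loading the `cudatoolkit` module.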
MILC RHMC can be compiled from the Makefile as:
cd ${WORKING_DIRECTORY}/milc_qcd/ks_imp_rhmc
cp ../Makefile .
MY_CC=cc \
MY_CXX=CC \
CUDA_HOME=${PATH_TO_CUDA} \
QUDA_HOME=${PATH_TO_QUDA} \
WANTQUDA=true \
WANT_FN_CG_GPU=true \
WANT_FL_GPU=true \
WANT_GF_GPU=true \
WANT_FF_GPU=true \
WANT_MIXED_PRECISION_GPU=2 \
PRECISION=2 \
MPP=true \
OMP=true \
WANTQIO=true \
WANTQMP=true \
QIOPAR=${PATH_TO_QIO} \
QMPPAR=${PATH_TO_QMP} \
PATH_TO_NVHPCSDK="" \
make -j 1 su3_rhmd_hisq
The MILC spectrum measurement executable can be built as:
cd ${WORKING_DIRECTORY}/milc_qcd/ks_spectrum
cp ../Makefile .
MY_CC=cc \
MY_CXX=CC \
CUDA_HOME=${PATH_TO_CUDA} \
QUDA_HOME=${PATH_TO_QUDA} \
WANTQUDA=true \
WANT_FN_CG_GPU=true \
WANT_FL_GPU=true \
WANT_GF_GPU=true \
WANT_FF_GPU=true \
WANT_MIXED_PRECISION_GPU=2 \
PRECISION=2 \
MPP=true \
OMP=true \
WANTQIO=true \
WANTQMP=true \
QIOPAR=${PATH_TO_QIO} \
QMPPAR=${PATH_TO_QMP} \
PATH_TO_NVHPCSDK="" \
CGEOM="-DFIX_NODE_GEOM -DFIX_IONODE_GEOM" \
KSCGMULTI="-DKS_MULTICG=HYBRID -DMULTISOURCE" \
make -j 1 ks_spectrum_hisq
Cray MPI requires several environment variables to be set at run time. These are subject to change, but for now a viable set is:
export QUDA_ENABLE_GDR=1
export MPICH_RDMA_ENABLED_CUDA=1
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_NEMESIS_ASYNC_PROGRESS=1
export OMP_NUM_THREADS=16
export SLURM_CPU_BIND=cores
export CRAY_ACCEL_TARGET=nvidia80
An interactive node on Perlmutter can be acquired with the following command, after filling in your account in place of m[####]:
salloc -A m[####]_g -C gpu -t 20 -N 1 --tasks-per-node 4 --gpus 4 --qos interactive
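The value `OMP_NUM_THREADS=16` pairs naturally with 4 tasks per node if the node exposes 64 CPU cores. The arithmetic can be sketched as follows; the core count is an assumption about the Perlmutter GPU node hardware and should be checked against the current NERSC documentation.

```shell
cores_per_node=64   # assumption: 64 CPU cores per Perlmutter GPU node
tasks_per_node=4    # matches --tasks-per-node 4 in the salloc command

# Divide the cores evenly among the MPI tasks on the node.
threads_per_task=$(( cores_per_node / tasks_per_node ))
echo "OMP_NUM_THREADS=${threads_per_task}"
```

If a different number of tasks per node is requested, the thread count would presumably need to be adjusted the same way.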
Make sure that your environment matches the environment defined at the top of this page. Further, the USQCD library paths should be added to your LD_LIBRARY_PATH when running MILC. The USQCD libraries exist in [path to QUDA build]/usqcd/lib.
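One way to add the USQCD libraries to the search path is sketched below. It assumes `PATH_TO_QUDA` already points at the `usqcd` directory inside the QUDA build, as set during the MILC build; the fallback value is a placeholder, not a real path.

```shell
# Assumed: PATH_TO_QUDA is the usqcd install prefix; the fallback is a placeholder only.
PATH_TO_QUDA="${PATH_TO_QUDA:-/path/to/build/usqcd}"

# Prepend the USQCD library directory, avoiding a trailing ':' when
# LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="${PATH_TO_QUDA}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```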
QUDA's test executables can be run from the interactive node via srun. A reference command for staggered_invert_test on 1 GPU is:
srun -n 1 ./staggered_invert_test
Likewise, a 4 GPU run with a 1x1x2x2 decomposition is given by:
srun -n 4 ./staggered_invert_test --gridsize 1 1 2 2
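In the examples above, the number of ranks passed to `srun -n` equals the product of the four `--gridsize` dimensions (1x1x2x2 = 4). A minimal sketch for keeping the two in sync is shown below; `grid_ranks` is a hypothetical helper, not a QUDA or Slurm utility.

```shell
# Hypothetical helper: multiply the four grid dimensions to get the rank count.
grid_ranks() {
    echo $(( $1 * $2 * $3 * $4 ))
}

ranks=$(grid_ranks 1 1 2 2)

# Print the command that would be run for this decomposition.
echo "srun -n ${ranks} ./staggered_invert_test --gridsize 1 1 2 2"
```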
WIP