Build and Run on Summit

Cameron Smith edited this page May 21, 2021 · 12 revisions

This was last tested with the omegah fork joshia5:boundaryfields (based on scorec:master).

Profiling

Install libmeshb

To run the fun3d delta case, Omega_h needs to be rebuilt with libmeshb support.

libmeshb is here:

https://github.com/LoicMarechal/libMeshb

Installation requires CMake and a C (and Fortran) compiler.

module swap xl gcc/7.4.0
module load cmake
module load cuda/10.1.243
git clone git@github.com:LoicMarechal/libMeshb.git
mkdir build-libMeshb
cd $_
cmake ../libMeshb/ -DCMAKE_C_COMPILER=gcc -DCMAKE_Fortran_COMPILER=gfortran -DCMAKE_INSTALL_PREFIX=$PWD/install
make install -j8
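
Before moving on, it can help to confirm that the install step populated the prefix. The following is a minimal sketch, not part of the original instructions: the function name check_libmeshb_prefix is hypothetical, and it assumes the installed header and library file names contain "meshb" in some capitalization.

```shell
# Report whether an install prefix appears to contain libMeshb.
# Assumption: installed file names contain "meshb" (case-insensitive);
# adjust the pattern if your libMeshb version names its files differently.
check_libmeshb_prefix() {
  local prefix="$1"
  if [ ! -d "$prefix" ]; then
    echo "missing: $prefix"
    return 1
  fi
  # find exits 0 even when nothing matches, so test the captured output.
  local hits
  hits=$(find "$prefix" -iname '*meshb*' \( -type f -o -type l \) 2>/dev/null)
  if [ -z "$hits" ]; then
    echo "no libMeshb files under $prefix"
    return 1
  fi
  echo "found libMeshb files under $prefix"
}

# Example, run from inside build-libMeshb:
# check_libmeshb_prefix "$PWD/install"
```

The prefix that passes this check is the path to assign to the libmeshb variable in the environment script used for the Omega_h build.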

Download the input data

Clone the repository that contains the fun3d delta case. The repository is 527 MB.

module load git-lfs
git lfs install
git clone git@github.com:UGAWG/parallel-adapt-results.git
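
If Git LFS was not active during the clone, large files can be left as small text pointers rather than real data. The sketch below detects that case; the function name is_lfs_pointer is hypothetical, and whether a particular mesh file in this repository is tracked by LFS is an assumption.

```shell
# Return 0 if the file looks like an un-fetched Git LFS pointer, i.e. a
# small text file whose first line names the LFS pointer spec.
is_lfs_pointer() {
  head -c 200 "$1" 2>/dev/null | head -n 1 \
    | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Example (path taken from the run scripts on this page):
# f=parallel-adapt-results/delta-wing/fun3d-fv-lp2/delta50k.meshb
# is_lfs_pointer "$f" && echo "$f is a pointer; run 'git lfs pull' in the repo"
```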

Build Omega_h

git clone git@github.com:SCOREC/omega_h.git
cd omega_h
git checkout run_ugawg_delta_cuda

Create envSummitGcc7Cuda10.sh with the following contents. Edit the libmeshb path to point at your libMeshb install directory.

module swap xl gcc/7.4.0
module load cmake
module load cuda/10.1.243
libmeshb=/gpfs/alpine/phy122/scratch/cwsmith/build-libmeshb-gcc7-summit/install
export CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH:$libmeshb
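
After sourcing the environment file, a quick way to confirm that CMake will search the libMeshb install is to check that the path is present in CMAKE_PREFIX_PATH. This is a small sketch; the helper name path_list_contains is hypothetical.

```shell
# Return 0 if the colon-separated list $1 contains the exact entry $2.
# Wrapping both sides in colons avoids matching a partial path.
path_list_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# source envSummitGcc7Cuda10.sh
# path_list_contains "$CMAKE_PREFIX_PATH" "$libmeshb" && echo "libMeshb is on the prefix path"
```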

Build

source envSummitGcc7Cuda10.sh
mkdir build-omegah-gcc7cuda10-summit
cd $_

cmake /path/to/omegah/source \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DOmega_h_USE_MPI=on \
-DOmega_h_USE_CUDA=on \
-DOmega_h_USE_libMeshb=on \
-DCMAKE_CXX_COMPILER=g++ \
-DOmega_h_CUDA_ARCH=70 \
-DBUILD_TESTING=ON \
-DMPIEXEC_EXECUTABLE=`which mpiexec`

make -j ugawg_hsc
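
The run scripts on this page set bin to the src directory of the build, so the ugawg_hsc executable is expected there. A minimal check, assuming that location (the function name check_built is hypothetical):

```shell
# Print a status line and return 0 only if the path is an executable
# regular file.
# Assumption: the target lands in src/ under the build directory, as
# implied by the bin= variable in the run scripts.
check_built() {
  local exe="$1"
  if [ -f "$exe" ] && [ -x "$exe" ]; then
    echo "ok: $exe"
  else
    echo "not built: $exe"
    return 1
  fi
}

# Example, run from inside build-omegah-gcc7cuda10-summit:
# check_built src/ugawg_hsc
```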

Run using NVIDIA Nsight Systems

Create a file named runDeltaProf.sh with the following contents. Edit the path passed to the source command and the paths for the bin and delta variables.

#!/bin/bash -x
#BSUB -P <projectId>
#BSUB -W 0:30
#BSUB -nnodes 1
#BSUB -alloc_flags "smt1"
#BSUB -J delta
#BSUB -o delta.%J
#BSUB -e delta.%J

source /path/to/envSummitGcc7Cuda10.sh
module load nsight-systems
bin=/path/to/build-omegah-gcc7cuda10-summit/src
delta=/path/to/parallel-adapt-results/delta-wing/fun3d-fv-lp2
mesh=$delta/delta50k.meshb
mach=$delta/delta50k-mach.solb 
run="jsrun -n 1 -a 1 -c 1 -g 1 nsys profile --stats=true" 
$run -o delta50k_cuda10 $bin/ugawg_hsc $mesh $mach $delta/delta50k-metric.solb 50k &> 50k.log
#uncomment the following to run a larger case
#$run -o delta500k_cuda10 $bin/ugawg_hsc $mesh $mach $delta/scaled-metric/delta500k-metric.solb 500k &> 500k.log

Make it executable:

chmod +x runDeltaProf.sh

Submit the job:

bsub ./runDeltaProf.sh

This should produce delta50k_cuda10.qdrep (named by the -o argument passed to nsys), which can be viewed in the NVIDIA Nsight Systems tool:

https://developer.nvidia.com/nsight-systems

Run using --osh-time

Create a file named runDeltaOshTime.sh with the following contents. Edit the path passed to the source command and the paths for the bin and delta variables.

#!/bin/bash -x
#BSUB -P <projectId>
#BSUB -W 0:30
#BSUB -nnodes 1
#BSUB -alloc_flags "smt1"
#BSUB -J delta
#BSUB -o delta.%J
#BSUB -e delta.%J

source /path/to/envSummitGcc7Cuda10.sh
module load nsight-systems
bin=/path/to/build-omegah-gcc7cuda10-summit/src
delta=/path/to/parallel-adapt-results/delta-wing/fun3d-fv-lp2
mesh=$delta/delta50k.meshb
mach=$delta/delta50k-mach.solb
run="jsrun -n 1 -a 1 -c 1 -g 1"
for case in 50k 500k; do
  metric=$delta/scaled-metric/delta${case}-metric.solb
  [ "$case" == "50k" ] && metric=$delta/delta${case}-metric.solb
  for opt in time pool timePool; do
    arg=""
    [ "$opt" == "time" ] && arg="--osh-time" && export CUDA_LAUNCH_BLOCKING=1
    [ "$opt" == "pool" ] && arg="--osh-pool" && unset CUDA_LAUNCH_BLOCKING
    [ "$opt" == "timePool" ] && arg="--osh-time --osh-pool" && export CUDA_LAUNCH_BLOCKING=1 
    echo $case $arg
    $run $bin/ugawg_hsc $arg $mesh $mach $metric $case &> ${case}-${opt}.log
  done
done

Make it executable:

chmod +x runDeltaOshTime.sh

Submit the job:

bsub ./runDeltaOshTime.sh
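
The loop in runDeltaOshTime.sh performs six runs. The sketch below is a dry run that mirrors the script's option handling and prints, for each run, the log file name, the flags, and the CUDA_LAUNCH_BLOCKING setting. It launches nothing; the function name list_osh_runs is hypothetical.

```shell
# Dry run of the loop in runDeltaOshTime.sh: one output line per run.
# The flag and environment mapping is copied from the script; nothing
# is executed and no environment variables are changed.
list_osh_runs() {
  local size opt arg blocking
  for size in 50k 500k; do
    for opt in time pool timePool; do
      case "$opt" in
        time)     arg="--osh-time";            blocking=1 ;;
        pool)     arg="--osh-pool";            blocking=unset ;;
        timePool) arg="--osh-time --osh-pool"; blocking=1 ;;
      esac
      echo "${size}-${opt}.log: args='${arg}' CUDA_LAUNCH_BLOCKING=${blocking}"
    done
  done
}

# Example:
# list_osh_runs
```

The script sets CUDA_LAUNCH_BLOCKING=1 only for the runs that pass --osh-time and unsets it for the pool-only run.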

Runs that use --osh-time append a top-down and a bottom-up tree of functions, sorted by time in descending order, to the end of their output.

The --osh-pool argument enables use of an internal memory pool instead of device runtime allocation calls.