From 2a0501647b2661321434306a9c785b3d27ad1b4a Mon Sep 17 00:00:00 2001 From: Soren Rasmussen Date: Tue, 23 Jan 2024 10:35:54 -0700 Subject: [PATCH] documentation: adding general and Derecho specific debugging information --- docs/compiling.md | 51 +++++++++++++++++++++++++++++++++++------------ 1 file changed, 38 insertions(+), 13 deletions(-) diff --git a/docs/compiling.md b/docs/compiling.md index 724887a5..8a662f1c 100644 --- a/docs/compiling.md +++ b/docs/compiling.md @@ -39,27 +39,52 @@ See [instructions](running/running.md#derecho-system-specifics) for running ICAR #### Cray Compiler ``` bash -module load cce ncarcompilers cray-mpich netcdf fftw +module load ncarenv cce cray-mpich netcdf fftw make -j 4 ``` #### GNU Compiler ``` bash -module load gcc ncarcompilers cray-mpich netcdf fftw caf/derecho-2.10.1 +module load gcc ncarenv/23.09 cray-mpich netcdf fftw opencoarrays COMPILER=gnu make -j 4 ``` -#### Intel Compilers - - __Note__: currently the classic Intel compiler is recommended for production runs but testing `ifx` and reporting issues would be useful. - - __Note__: test `debugslow` mode is required for successful running and compilation, issue is being worked on. -##### Classic -``` bash -module load intel-classic ncarcompilers intel-mpi netcdf-mpi fftw-mpi -COMPILER=intel MODE=debugslow make -j 4 -``` -##### OneAPI +### Derecho Debugging +When running ICAR on larger domains there is a chance the program will run out of memory available for coarrays. +This may be difficult to diagnose because the program will stop without outputting any information but the following steps might help. +*NOTE*: The following debugging information is only for Cray compilers, GNU's OpenCoarrays is not setup to use the SHMEM library. + +#### Set Helpful Debug Variables +Set the following variables, further detail and more variables can be found in the [Cray OpenSHMEMX](https://cray-openshmemx.readthedocs.io/en/latest/intro_shmem.html#cray-openshmemx-setup-and-running-specific-environment-variables) documentation. +* `SHMEM_MEMINFO_DISPLAY=1` to display information about the job's memory allocation during initialization +* `SHMEM_ABORT_ON_ERROR=1` and `MPICH_ABORT_ON_ERROR=1`, these are set when loading the ATP module + +#### Increase Memory Available for Coarrays +The default symmetric heap size on Derecho is 64MB per process. +To increase the size set the variable `XT_SYMMETRIC_HEAP_SIZE` to an integer value with the suffix `M` for megabyte or `G` for gigabyte. ``` bash -module load intel-oneapi ncarcompilers intel-mpi netcdf-mpi fftw-mpi -COMPILER=intel MODE=debugslow F90=ifx FC=ifx make -j 4 +export XT_SYMMETRIC_HEAP_SIZE=128M ``` + +#### Further Tools +* [Derecho debugging and profiling documentation](https://arc.ucar.edu/knowledge_base/149323810) +* Use [gdb](https://sourceware.org/gdb/current/onlinedocs/gdb) (with [cheatsheet](https://sourceware.org/gdb/current/onlinedocs/gdb)) or [gdb4hpc](https://cpe.ext.hpe.com/docs/debugging-tools/gdb4hpc.1.html) +* [Linaro Forge Tools](https://docs.linaroforge.com/23.0/html/forge/index.html) such as DTT or MAP. +* If the program returned `died from signal XYZ`, check the signal error against list shown by `$ kill -L` or `$ kill -l` + + + + + + + + + + + + + + + +