From 87326c0bfad9644858b8f788dd33767eafb33c8b Mon Sep 17 00:00:00 2001
From: "Eric T. Johnson"
Date: Fri, 13 Sep 2024 15:25:15 -0400
Subject: [PATCH] Add note about amrex.the_arena_init_size=0 on Perlmutter

---
 sphinx_docs/source/nersc-workflow.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/sphinx_docs/source/nersc-workflow.rst b/sphinx_docs/source/nersc-workflow.rst
index db708e1..6f77015 100644
--- a/sphinx_docs/source/nersc-workflow.rst
+++ b/sphinx_docs/source/nersc-workflow.rst
@@ -22,6 +22,13 @@ includes the restart logic to allow for job chaining.
 .. literalinclude:: ../../job_scripts/perlmutter/perlmutter.submit
    :language: sh
 
+.. note::
+
+   With large reaction networks, you may get GPU out-of-memory errors during
+   the first burner call. If this happens, you can add
+   ``amrex.the_arena_init_size=0`` after ``${restartString}`` in the srun call
+   so AMReX doesn't reserve 3/4 of the GPU memory for the device arena.
+
 Below is an example that runs on CPU-only nodes. Here ``ntasks-per-node``
 refers to number of MPI processes (used for distributed parallelism) per
 node, and ``cpus-per-task`` refers to number of hyper threads used per task
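
For illustration, the change described in the note might look roughly like the sketch below. It is not taken from ``perlmutter.submit``: the executable name, the inputs file name, and the value of ``restartString`` are hypothetical placeholders, and the real script's ``srun`` options are omitted. The sketch only assembles and prints the launch line, so it can be checked without a Slurm allocation.

```shell
#!/bin/sh
# Hypothetical placeholders; the real values are set in perlmutter.submit.
EXEC=./Castro3d.gnu.MPI.CUDA.ex          # assumed executable name
INPUTS=inputs                            # assumed inputs file name
restartString="amr.restart=chk00100"     # assumed form of the restart argument

# Runtime parameter from the note: start the device arena at size 0
# instead of letting AMReX reserve 3/4 of the GPU memory up front.
arena_arg="amrex.the_arena_init_size=0"

# The extra parameter goes after ${restartString} on the command line.
launch_cmd="srun ${EXEC} ${INPUTS} ${restartString} ${arena_arg}"

# Print the command instead of running it, since srun needs a Slurm job.
echo "${launch_cmd}"
```

In the actual job script, only the ``amrex.the_arena_init_size=0`` argument would be appended to the existing ``srun`` line; everything else stays as it is.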