Notes on Communication Patterns
From the meeting on this topic:
There are multiple kinds of communication performed:
- Ghost halo exchange, which is expensive, requires lots of data, and is performed every cycle
- Flux correction, which is less expensive, needs less data, and is also performed every cycle
- Re-meshing and load balancing, which are performed only every re-mesh cycle
Our GPU-performant ghost halo exchange machinery is mostly in src/bvals/cc/bvals_cc_in_one.*. MPI_Irecv is called first, at the beginning of a cycle. Then MPI_Start and MPI_Test are used. Buffers are packed and unpacked in a single kernel.
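To make that pattern concrete, here is a minimal, hypothetical sketch of a receive-early halo exchange with device-side packing. It is not the actual code in bvals_cc_in_one.*: the struct and function names (NeighborBuffer, PostReceives, PackAndSend, TryReceiveAndUnpack) are invented for illustration, the index mapping into the field is a stand-in, and the real implementation packs and unpacks all buffers in a single fused kernel rather than one kernel per buffer.

```cpp
#include <mpi.h>
#include <Kokkos_Core.hpp>
#include <vector>

// Hypothetical per-neighbor communication state (illustrative only).
struct NeighborBuffer {
  int rank, tag, count;
  Kokkos::View<double*> send, recv;  // device-resident buffers
  MPI_Request send_req = MPI_REQUEST_NULL;
  MPI_Request recv_req = MPI_REQUEST_NULL;
  bool unpacked = false;
};

// 1. At the beginning of the cycle, post all non-blocking receives first,
//    so incoming data can land as soon as neighbors send it.
void PostReceives(std::vector<NeighborBuffer> &nbs, MPI_Comm comm) {
  for (auto &nb : nbs) {
    nb.unpacked = false;
    MPI_Irecv(nb.recv.data(), nb.count, MPI_DOUBLE, nb.rank, nb.tag, comm,
              &nb.recv_req);
  }
}

// 2. Pack the send buffers on the device (one kernel per buffer here; a single
//    fused kernel in practice), fence, then start the sends. Each buffer just
//    copies a leading slice of u as a stand-in for the real index maps.
void PackAndSend(std::vector<NeighborBuffer> &nbs,
                 const Kokkos::View<double*> &u, MPI_Comm comm) {
  for (auto &nb : nbs) {
    auto send = nb.send;
    Kokkos::parallel_for("pack", nb.count,
                         KOKKOS_LAMBDA(const int i) { send(i) = u(i); });
  }
  Kokkos::fence();  // buffers must be complete before MPI reads them
  for (auto &nb : nbs) {
    MPI_Isend(nb.send.data(), nb.count, MPI_DOUBLE, nb.rank, nb.tag, comm,
              &nb.send_req);
  }
}

// 3. Poll with MPI_Test; unpack each buffer as soon as it arrives (again, one
//    kernel in practice). Returns true once every neighbor has been unpacked.
bool TryReceiveAndUnpack(std::vector<NeighborBuffer> &nbs,
                         Kokkos::View<double*> &u) {
  bool all_done = true;
  for (auto &nb : nbs) {
    if (nb.unpacked) continue;
    int flag = 0;
    MPI_Test(&nb.recv_req, &flag, MPI_STATUS_IGNORE);
    if (!flag) { all_done = false; continue; }
    auto recv = nb.recv;
    Kokkos::parallel_for("unpack", nb.count,
                         KOKKOS_LAMBDA(const int i) { u(i) = recv(i); });
    nb.unpacked = true;
  }
  return all_done;
}
```

Posting the receives before any packing is the key design choice: it lets MPI progress message delivery while the local kernels are still running, so communication overlaps with computation instead of serializing behind it.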
For flux correction, the most relevant place to look is src/bvals/cc/flux_correction_cc.cpp, which implements the functions declared in BoundaryBuffer in src/bvals/bvals_interfaces.hpp. We don't do anything special for GPUs here, other than putting the calls in Kokkos kernels; nothing more has been needed.
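As a rough illustration of what "implementing the functions in BoundaryBuffer" means, here is a hypothetical, heavily abridged stand-in for such an interface. The member names and signatures are assumptions for the sake of the sketch; the real declarations in src/bvals/bvals_interfaces.hpp are more numerous and may differ.

```cpp
// Hypothetical, abridged stand-in for a BoundaryBuffer-style interface.
// Names are illustrative; consult src/bvals/bvals_interfaces.hpp for the
// actual declarations.
class BoundaryBufferSketch {
 public:
  virtual ~BoundaryBufferSketch() = default;
  // Regular ghost-halo exchange hooks.
  virtual void SendBoundaryBuffers() = 0;
  virtual bool ReceiveBoundaryBuffers() = 0;
  virtual void SetBoundaries() = 0;
  // Flux-correction hooks: finer blocks restrict and send their face fluxes,
  // and the coarser neighbor receives them and overwrites its own fluxes.
  virtual void SendFluxCorrection() = 0;
  virtual bool ReceiveFluxCorrection() = 0;
};
```

In this kind of pattern, the cell-centered flux-correction code would supply the bodies of the last two hooks, with the load/store loops written as Kokkos kernels so the same source runs on CPUs and GPUs.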
The load balancing calculation is done locally. Each rank recomputes the entire global AMR tree locally, and the meshblocks are then assigned to ranks by walking through a space-filling curve. The "cost" of the blocks assigned to each rank is intended to be the same for every rank, which is effected via indexing logic. The only communication required is when moving blocks, as everything else is local. Moving blocks is, of course, very expensive, but re-meshing is only done once every several cycles, whereas the other operations are per-cycle.
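A minimal sketch of that indexing logic is below, assuming a per-block cost list ordered along the space-filling curve. The function name and the exact balancing rule are illustrative, not the actual implementation; the point is only that the calculation is deterministic and purely local, so every rank arrives at the same answer without communicating.

```cpp
#include <vector>

// Hypothetical, simplified block-to-rank assignment. costlist is ordered along
// the space-filling curve, so contiguous runs of blocks stay spatially compact.
// Every rank runs this identical, deterministic calculation locally.
std::vector<int> AssignBlocksSketch(const std::vector<double> &costlist,
                                    int nranks) {
  std::vector<int> ranklist(costlist.size());
  double total = 0.0;
  for (double c : costlist) total += c;
  const double target = total / nranks;  // ideal cost per rank

  int rank = 0;
  double accumulated = 0.0;
  for (std::size_t b = 0; b < costlist.size(); ++b) {
    ranklist[b] = rank;
    accumulated += costlist[b];
    // Move on to the next rank once this one's share of the cost saturates,
    // while leaving at least one block for each remaining rank.
    if (accumulated >= target && rank < nranks - 1 &&
        costlist.size() - b - 1 >= static_cast<std::size_t>(nranks - rank - 1)) {
      ++rank;
      accumulated = 0.0;
    }
  }
  return ranklist;
}
```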
The call stack is something like this. Most of this machinery lives in src/mesh/amr_loadbalance.cpp; however, a little bit of it lives in src/bvals/bvals_base.cpp.
- LoadBalancingAndAdaptiveMeshRefinement wraps the most important functions:
  - UpdateMeshBlockTree recomputes the AMR octree based on refinement criteria.
  - GatherCostListAndCheckBalance has an MPI_Allgather for the total load costs.
  - RedistributeAndRefineMeshBlocks: if refinement happened or a load imbalance is detected, move meshblocks, assign neighbors, etc. All done deterministically.
    - CalculateLoadBalance
      - AssignBlocks walks through the space-filling curve and assigns blocks to MPI ranks as the cost saturates per rank (much like the assignment sketch above). It fills a std::vector<int>, ranklist, which sets the rank of each meshblock and which the meshblocks will later use to find the ranks of their neighbors.
      - UpdateBlockList sets nslist and nblist, which map each MPI rank to the global ID of the first meshblock on that rank and to the total number of meshblocks on that rank, respectively (see the sketch after this list).
    - A bunch of machinery is called to generate buffers, copy meshblock data as needed, prolongate, restrict, etc. MPI communication happens here, on a potentially large amount of data, to, e.g., move data on meshblocks to new ranks.
    - The neighbor list is finally generated/saved in pmb->pbval->SearchAndSetNeighbors, which uses the AMR octree, the ranklist, and nslist to walk through the tree locally. No MPI communication is needed.
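To make the last bookkeeping step concrete, here is a small sketch of deriving nslist and nblist from ranklist. It assumes ranklist is indexed by global meshblock ID in space-filling-curve order and that each rank owns a contiguous run of that ordering; the function name is illustrative, not the actual UpdateBlockList.

```cpp
#include <vector>

// Hypothetical derivation of nslist/nblist from ranklist (illustrative names;
// the real UpdateBlockList lives in src/mesh/amr_loadbalance.cpp).
//   nslist[r] = global ID of the first meshblock owned by rank r
//   nblist[r] = number of meshblocks owned by rank r
void UpdateBlockListSketch(const std::vector<int> &ranklist, int nranks,
                           std::vector<int> &nslist, std::vector<int> &nblist) {
  nslist.assign(nranks, 0);
  nblist.assign(nranks, 0);
  for (int gid = 0; gid < static_cast<int>(ranklist.size()); ++gid) {
    const int r = ranklist[gid];
    if (nblist[r] == 0) nslist[r] = gid;  // first block seen for this rank
    ++nblist[r];
  }
}
```

With ranklist and nslist in hand, neighbor lookup is purely local: a meshblock that knows a neighbor's global ID can read ranklist[neighbor_gid] to find the owning rank, which is consistent with SearchAndSetNeighbors walking the octree without any MPI communication.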