2023.09.21 Meeting Notes

Agenda

LR

further working on multigrid
- CG and BiCGStab now work on uniform grids for Poisson
- Poisson also seems to work for AMR
- still fighting with spatially dependent diffusion coeff
- code not optimized for performance yet
- still tweaking task list (to reduce downstream interface)
now time to get the infrastructure (pieces) merged into main; concerns:
- logical block sizes
- comm patterns
- logical locations/negative levels
- will create final PR with additional docs, and then ping people for reviews
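
For readers unfamiliar with the solver mentioned above, here is a minimal sketch of conjugate gradient (CG) applied to a Poisson problem on a uniform grid. It is not the Parthenon implementation: it assumes a 1D problem, homogeneous Dirichlet boundaries, and a 3-point stencil, and all function names (`apply_laplacian`, `conjugate_gradient`) are hypothetical.

```python
def apply_laplacian(u, h):
    """Apply the 1D operator -d^2/dx^2 with a 3-point stencil.

    Assumes homogeneous Dirichlet boundaries (values outside the grid are 0).
    """
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (2.0 * u[i] - left - right) / (h * h)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(rhs, h, tol=1e-10, max_iter=1000):
    """Solve apply_laplacian(u, h) = rhs; return (u, iterations used)."""
    u = [0.0] * len(rhs)
    r = list(rhs)          # residual for the zero initial guess
    p = list(r)            # first search direction
    rr = dot(r, r)
    rhs_norm2 = rr
    if rr == 0.0:
        return u, 0
    for it in range(1, max_iter + 1):
        Ap = apply_laplacian(p, h)
        alpha = rr / dot(p, Ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        # Stop once the residual is small relative to the right-hand side.
        if rr_new <= tol * tol * rhs_norm2:
            return u, it
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return u, max_iter
```

Plain CG relies on the operator being symmetric positive definite. BiCGStab does not need symmetry, which is one common reason to provide both.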

PM

Josh discovered a performance regression on CPU runs after rebasing Riot onto main
Up to 30% loss in integrated performance, caused solely by the buffer packing kernels
Changing kernel structure fixes regression, but results in a slowdown on GPU runs
Path forward: downstream codes should test the impact of those kernels and then we can decide how to proceed (general versus specialized solution, ...)
Also compared parthenon-hydro to AthenaK to identify performance impact from different block handling and load balancing. To be reported once more data is available.

JD

added capability to pack subset of blocks, useful, for example, if there's significant load imbalance (e.g., when nothing happens in some part of the domain)
-> soft disabling blocks, might also be useful for adaptive timestepping or multigrid
working on PR for timer-based load balancing
- through timer objects inside kernels
- works with/requires hierarchical parallelism (so that the timer is at the outer level and reports back to a view in device memory)
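
As an illustration of how measured per-block times could feed a load balancer, here is a minimal sketch. It is not the code from the PR: it assumes blocks are kept in a fixed order and each rank receives one contiguous chunk, and the function name `assign_blocks` is hypothetical.

```python
def assign_blocks(costs, nranks):
    """Assign ordered blocks to ranks in contiguous chunks of similar cost.

    costs:  measured time per block (e.g., from in-kernel timers), in block order
    nranks: number of ranks to distribute over
    Returns one rank id per block. Ranks may end up empty if a few blocks
    dominate the total cost; a production balancer would need to handle that.
    """
    total = float(sum(costs))
    if total == 0.0:
        # No timing data yet: fall back to balancing by block count.
        costs = [1.0] * len(costs)
        total = float(len(costs))
    ranks = []
    prefix = 0.0
    for c in costs:
        # Place each block according to where its cost midpoint falls
        # in the cumulative cost distribution.
        mid = prefix + 0.5 * c
        ranks.append(min(nranks - 1, int(mid * nranks / total)))
        prefix += c
    return ranks
```

With costs `[1, 1, 1, 1, 4, 4]` on two ranks this yields a 4/2 split of blocks (loads 4 and 8), whereas balancing by block count alone would give a 3/3 split (loads 3 and 9).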

BP

preparing new kharma release with AMR and semi-implicit stepping for viscosity
added a couple of small PRs to Parthenon along the way (QOL and bug fixes)
PEP1 is ready for merge (to customize packages, e.g., allows for customizing source terms/streamlining driver)

FG

working on coordinates
- cyl. coordinates are working (in separate AthenaPK branch for testing)
- sph. should work, but needs testing
- coordinates almost working with yt already
- next step: write example on Parthenon for regression testing and review
kernel timing for AthenaPK: looks like there's room for improvement both around MPI collectives and individual kernel performance (based on initial roofline models, e.g., 30% of HBM bandwidth -- compared to 70-80% in K-Athena)
- looks like kernel size is a key issue (on AMD GPUs) due to register pressure (also observed by BP in kharma)

PG