2021.01.06 Meeting Notes
- Individual/group updates
- Goal Updates:
- AMR Performance
- Sparse Variables
- Face Centered Variables
- Update on LANL proxy (@jlippuner)
- Review non-WIP PRs
@AndrewGaspar continuing work on sparse variables - will give an additional update later in meeting.
@JoshuaSBrown built out the RZAnsel project space, is finishing up a Python app for uploading performance statistics and CI results, and is doing code review.
@carolae is working on roofline models for the advection examples on the cascade-lake partition on Darwin.
@jlippuner is working on LANL's proxy example.
@Yurlungur working on curvilinear coordinates (e.g. cylindrical and spherical coordinate systems).
Ben Ryan still working on MPI for particles - up in WIP PR on GitHub.
CJ Solomon trying to wrap up PR for diagnostics. https://github.com/lanl/parthenon/pull/400
@pgrete fixed an issue in pack-in-one. The problem was in the packing: an additional restriction step was accidentally re-using the cache. Would like some testing in RIOT.
@pgrete fixed bug with timestep calculation.
@pgrete updated AthenaPK to work with hydro and made it say Happy New Years!
@pgrete - with the pack-in-one improvements we now get better performance on a single Volta than on a dual-socket Skylake-X system for a 16^3 block size.
We should see if we can replicate this on RZAnsel.
(see: AthenaPK update) as intro
If GPUs are not fully utilized, performance is not great.
The next most expensive functions on the list are Restriction and Prolongation, which use small kernel launches. Need to use fat launches on MeshData.
Another easy win would be to add options to only perform refinement every N cycles.
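A minimal sketch of the "refinement every N cycles" idea, assuming a simple modulo check on the cycle counter. The names `ShouldCheckRefinement`, `CountRefinementChecks`, and `refine_interval` are hypothetical and only illustrate the control flow; they are not existing Parthenon options or functions.

```cpp
#include <cassert>

// Hypothetical helper: true on cycles where the refinement check should run.
// `refine_interval` is an assumed parameter name (the "N" in "every N cycles").
inline bool ShouldCheckRefinement(int cycle, int refine_interval) {
  assert(refine_interval > 0);
  return cycle % refine_interval == 0;
}

// Counts how many refinement checks a run of `ncycles` cycles would perform.
inline int CountRefinementChecks(int ncycles, int refine_interval) {
  int checks = 0;
  for (int cycle = 0; cycle < ncycles; ++cycle) {
    if (ShouldCheckRefinement(cycle, refine_interval)) {
      ++checks;  // a real driver would tag blocks and remesh here
    }
  }
  return checks;
}
```

With an interval of 1 this reduces to checking every cycle, so a default of 1 would presumably preserve the current behavior.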
Got variable packing refactored to support null variables in block packs. This is to work with sparse variables in MeshData that aren't uniform across all blocks in the MeshData.
Not in progress.
FillDerived samples a number of iterations for each cell, using a power law distribution. There is dummy work in FillDerived, too. RIOT folks should look at that. Running on both CPU and GPU. Random number generation is taking a long time on the GPU runs, so switching to Kokkos_Random.hpp.
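As a rough illustration of sampling a per-cell iteration count from a power law, here is a host-only sketch using inverse-transform sampling for a density proportional to x^(-alpha) on [xmin, xmax]. The function names, the bounds, the exponent, and the use of `std::mt19937` are all assumptions made for this example; the proxy's actual distribution parameters are not given in these notes, and on the GPU the uniform draw would presumably come from a generator in Kokkos_Random.hpp instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Maps a uniform deviate u in [0, 1] to a power-law deviate on [xmin, xmax]
// with density proportional to x^(-alpha), via the inverse CDF (alpha != 1).
inline double PowerLawFromUniform(double u, double xmin, double xmax, double alpha) {
  assert(xmin > 0.0 && xmax > xmin && alpha != 1.0);
  const double e = 1.0 - alpha;
  const double lo = std::pow(xmin, e);
  const double hi = std::pow(xmax, e);
  return std::pow(lo + u * (hi - lo), 1.0 / e);
}

// Hypothetical per-cell helper: draws an iteration count in [min_iter, max_iter].
inline int SampleIterations(std::mt19937 &rng, int min_iter, int max_iter, double alpha) {
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  const double x = PowerLawFromUniform(uniform(rng), min_iter, max_iter, alpha);
  // Truncate to an integer and clamp to guard against floating-point round-off.
  return std::min(std::max(static_cast<int>(x), min_iter), max_iter);
}
```

For alpha greater than 1, most cells draw a count near `min_iter` and a few draw much larger counts, which gives the kind of uneven per-cell work that makes this a load-imbalance test.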
@gshipman would like to do performance comparisons on the proxy between RZAnsel and CTS-1 nodes (Broadwell). Need buffer in one branch.
@carolae will help @Yurlungur with RIOT performance runs on CTS-1 and RZAnsel. @agaspar will set up project space on Snow.