-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate buffer packing functions with less atomics #1199
Conversation
+ Fold in buffer sizing to the load buffer function + Use a sort and a discontinuity check instead of an atomic_fetch_add inside of LoadBuffer_ to get particle index into buffer + Reduce transcendental functions call in particle sourcing in particle example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexrlongne This looks great, I think with just formatting, cleanup of the SwarmKey
cell_idx_1d
name, and removing the dynamic memory allocs, this is ready to go.
+ Call the member variable in SwarmKey the sort_key + Remove CountParticlesInBuffer function + Add buffer_start and buffer_sorted as swarm member variables
f061df6
to
d1e3e37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just one more query about r^2 vs r^-2
Co-authored-by: Ben Ryan <bryan10@illinois.edu>
Co-authored-by: Ben Ryan <bryan10@illinois.edu>
…-lab#1199) * Consolidate buffer packing functions with less atomics + Fold in buffer sizing to the load buffer function + Use a sort and a discontinuity check instead of an atomic_fetch_add inside of LoadBuffer_ to get particle index into buffer + Reduce transcendental functions call in particle sourcing in particle example * Address PR comments + Call the member variable in SwarmKey the sort_key + Remove CountParticlesInBuffer function + Add buffer_start and buffer_sorted as swarm member variables * Update example/particles/particles.cpp Co-authored-by: Ben Ryan <bryan10@illinois.edu> * Update src/interface/swarm_comms.cpp Co-authored-by: Ben Ryan <bryan10@illinois.edu> --------- Co-authored-by: Ben Ryan <bryan10@illinois.edu> Co-authored-by: Ben Ryan <brryan@lanl.gov>
PR Summary
Profiling with NSIGHT systems revealed that the
LoadBuffer_
function inswarm_comms.cpp
took 20% of the GPU time in the particles example that used much more particles (1e6 per block). NSIGHT compute showed that the expensive kernel inLoadBuffer_
had many stalls, we speculated this was due to waiting on theatomic_fetch_add
function inLoadBuffer_
. This PR reworksLoadBuffer_
andCountParticlesToSend_
to remove atomics and instead sort the particles by buffer ID and use their sorted order to determine their size and index into that buffer.Removing atomics reduces the runtime by about 20% in the million particle example and more in a 10 million particle version. A case where the particles are photons may involves even more cell block crossings as c*delta_t could be large compared to a block size (in the particles example the particle's path length is equal to a block size).
The reason for the atomic in both cases was that the loops were over particles where the particle buffer was determined and that was then used to atomically update a buffer specific field.
As a rider, this PR also updates some sampling functions in the particle example to use less transcendentals, which has shown improvement in a Parthenon downstream code. I can separate into another PR if desired.
Changes
PR Checklist