Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate buffer packing functions with less atomics #1199

Merged
merged 5 commits into from
Nov 1, 2024

Conversation

alexrlongne
Copy link
Collaborator

PR Summary

  • Profiling with NSIGHT systems revealed that the LoadBuffer_ function in swarm_comms.cpp took 20% of the GPU time in the particles example that used much more particles (1e6 per block). NSIGHT compute showed that the expensive kernel in LoadBuffer_ had many stalls, we speculated this was due to waiting on the atomic_fetch_add function in LoadBuffer_. This PR reworks LoadBuffer_ and CountParticlesToSend_ to remove atomics and instead sort the particles by buffer ID and use their sorted order to determine their size and index into that buffer.

  • Removing atomics reduces the runtime by about 20% in the million particle example and more in a 10 million particle version. A case where the particles are photons may involves even more cell block crossings as c*delta_t could be large compared to a block size (in the particles example the particle's path length is equal to a block size).

  • The reason for the atomic in both cases was that the loops were over particles where the particle buffer was determined and that was then used to atomically update a buffer specific field.

  • As a rider, this PR also updates some sampling functions in the particle example to use less transcendentals, which has shown improvement in a Parthenon downstream code. I can separate into another PR if desired.

Changes

  • Fold in buffer sizing to the load buffer function
  • Use a sort and a discontinuity check instead of an atomic_fetch_add inside of LoadBuffer_ to get particle index into buffer
  • Reduce transcendental functions call in particle sourcing in particle example

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

+ Fold in buffer sizing to the load buffer function
+ Use a sort and a discontinuity check instead of an
atomic_fetch_add inside of LoadBuffer_ to get particle
index into buffer
+ Reduce transcendental functions call in particle sourcing in
particle example
src/interface/swarm_comms.cpp Outdated Show resolved Hide resolved
src/interface/swarm_comms.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@brryan brryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexrlongne This looks great, I think with just formatting, cleanup of the SwarmKey cell_idx_1d name, and removing the dynamic memory allocs, this is ready to go.

+ Call the member variable in SwarmKey the sort_key
+ Remove CountParticlesInBuffer function
+ Add buffer_start and buffer_sorted as swarm member variables
@alexrlongne alexrlongne force-pushed the along/try_less_atomic branch from f061df6 to d1e3e37 Compare October 29, 2024 17:05
Copy link
Collaborator

@brryan brryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just one more query about r^2 vs r^-2

alexrlongne and others added 2 commits October 29, 2024 14:37
Co-authored-by: Ben Ryan <bryan10@illinois.edu>
Co-authored-by: Ben Ryan <bryan10@illinois.edu>
@brryan brryan enabled auto-merge (squash) November 1, 2024 02:17
@brryan brryan merged commit b5364b7 into develop Nov 1, 2024
50 of 53 checks passed
acreyes pushed a commit to acreyes/parthenon that referenced this pull request Nov 14, 2024
…-lab#1199)

* Consolidate buffer packing functions with less atomics

+ Fold in buffer sizing to the load buffer function
+ Use a sort and a discontinuity check instead of an
atomic_fetch_add inside of LoadBuffer_ to get particle
index into buffer
+ Reduce transcendental functions call in particle sourcing in
particle example

* Address PR comments

+ Call the member variable in SwarmKey the sort_key
+ Remove CountParticlesInBuffer function
+ Add buffer_start and buffer_sorted as swarm member variables

* Update example/particles/particles.cpp

Co-authored-by: Ben Ryan <bryan10@illinois.edu>

* Update src/interface/swarm_comms.cpp

Co-authored-by: Ben Ryan <bryan10@illinois.edu>

---------

Co-authored-by: Ben Ryan <bryan10@illinois.edu>
Co-authored-by: Ben Ryan <brryan@lanl.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants