Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix remaining swarm D->H->D copies #1145

Merged
merged 32 commits into from
Aug 28, 2024
Merged

Conversation

brryan
Copy link
Collaborator

@brryan brryan commented Jul 31, 2024

PR Summary

There are some outstanding places in the particle infrastructure where we copy arrays to host to perform sorting logic. This is almost certainly responsible for a just uncovered major performance bottleneck downstream. Should be fixable with prefix sums on device. This PR aims to do that.

  • Initialize device data in Swarm constructor with kernel launch rather than device/host copies
  • Switch some Swarm method names from camel case to uppercase
  • Rewrite Defragment method to use kernel launches rather than device/host copies to defragment the memory pool of particles
  • Rewrite AddEmptyParticles method to use kernel launches rather than device host copies to provide a list of empty particle indices
  • Remove probably unnecessary check on non-null_ptr-ness of user swarm boundary functions that was not interacting correctly with recent changes to the forest infrastructure.
  • Remove device host copies in swarm_comms.cpp (can possibly defer this to another PR if anyone wants to merge this before middle of next week)
  • Remove copy constructor from AddRegion to prevent segfault if reference is not properly stored when making a task collection.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@brryan brryan changed the title WIP: Fix remaining swarm D->H->D copies Fix remaining swarm D->H->D copies Jul 31, 2024
Copy link
Collaborator

@lroberts36 lroberts36 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, glad to see there was a nice way to update the empty indices entirely on device.

src/interface/swarm_container.cpp Show resolved Hide resolved
Copy link
Collaborator

@pgrete pgrete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this.
Looks good (though I should knowledge on parsing parallel_scans` ;) )

src/interface/swarm.cpp Show resolved Hide resolved
* First kernel down

* Further cleanup

* Notes

* Send seems to work

* Need to promote particles example to latest pattern

* May have fixed particles example

* cycles...

* iterative tasking is only iterating on one meshblock in particles example...

* Still not working...

* New loop seems to work

* Cleaned up some printfs/marked unused code for deletion

* New algorithm is cycling, time to clean up

* Fixed indexing bug...

* Clean up

* finished_transport shouldnt be provided by swarm

* Reverting to old manual iterative tasking

* Still working...

* Starting to make progress...

* Cleaned up

* format

* A few leftover print statements
@brryan
Copy link
Collaborator Author

brryan commented Aug 20, 2024

OK I fixed up the swarm communication logic to also function on device. Those specific changes are shown in #1154. I used a combination of parallel fors/reduces and atomic opartions.

@brryan
Copy link
Collaborator Author

brryan commented Aug 20, 2024

OK this is ready for review/merge and should cure the current host/device copy performance bottlenecks in swarms.

example/particles/particles.cpp Outdated Show resolved Hide resolved
src/interface/swarm.cpp Outdated Show resolved Hide resolved
@Yurlungur
Copy link
Collaborator

@AstroBarker now that you're back you should be paying attention to swarm development in parthenon.

Copy link
Collaborator

@pgrete pgrete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'm done reviewing (some comments are probably worth addressing).
I still want to check downstream (tomorrow at the latest) before final approval.

example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Show resolved Hide resolved
src/interface/swarm_comms.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@brryan brryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pgrete! Code is much improved.

Only major question is what to do for this PR about the double versus Real time comparison issue, because particle time variables are locked in to Real now.

example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
example/particles/particles.cpp Outdated Show resolved Hide resolved
src/interface/swarm_comms.cpp Outdated Show resolved Hide resolved
@pgrete
Copy link
Collaborator

pgrete commented Aug 27, 2024

I did some quick downstream tests and the results look good.

  • for swarm_container.cpp::165::Receive [region]
    orig: 2.02s
    new: 1.25s
  • for swarm_container.cpp::155::Send
    orig: 0.97s
    new: 0.53s

Feel free to hit merge whenever.

@brryan brryan enabled auto-merge (squash) August 27, 2024 20:36
@pgrete pgrete disabled auto-merge August 28, 2024 14:14
@pgrete pgrete enabled auto-merge (squash) August 28, 2024 14:14
@brryan brryan disabled auto-merge August 28, 2024 14:39
@brryan brryan enabled auto-merge (squash) August 28, 2024 14:39
@brryan brryan merged commit 517bf92 into develop Aug 28, 2024
54 of 59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants