Fix remaining swarm D->H->D copies #1145
LGTM, glad to see there was a nice way to update the empty indices entirely on device.
Thanks for taking care of this.
Looks good (though I lack knowledge on parsing `parallel_scan`s ;) )
…swarm_prefix_sums
…enon into brryan/more_swarm_prefix_sums
* First kernel down
* Further cleanup
* Notes
* Send seems to work
* Need to promote particles example to latest pattern
* May have fixed particles example
* cycles...
* iterative tasking is only iterating on one meshblock in particles example...
* Still not working...
* New loop seems to work
* Cleaned up some printfs/marked unused code for deletion
* New algorithm is cycling, time to clean up
* Fixed indexing bug...
* Clean up
* finished_transport shouldnt be provided by swarm
* Reverting to old manual iterative tasking
* Still working...
* Starting to make progress...
* Cleaned up
* format
* A few leftover print statements
OK, I fixed up the swarm communication logic to also function on device. Those specific changes are shown in #1154. I used a combination of parallel …
OK, this is ready for review/merge and should cure the current host/device copy performance bottlenecks in swarms.
@AstroBarker now that you're back you should be paying attention to swarm development in parthenon.
Alright, I'm done reviewing (some comments are probably worth addressing).
I still want to check downstream (tomorrow at the latest) before final approval.
Thanks @pgrete! Code is much improved.
The only major question is what to do in this PR about the `double` versus `Real` time comparison issue, because particle time variables are locked in to `Real` now.
I did some quick downstream tests and the results look good.
Feel free to hit merge whenever.
PR Summary
There are some outstanding places in the particle infrastructure where we copy arrays to host to perform sorting logic. This is almost certainly responsible for a recently uncovered major performance bottleneck downstream. It should be fixable with prefix sums on device, which is what this PR aims to do.
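To illustrate the prefix-sum idea, here is a minimal serial sketch (standard C++17, run on the host) of how a list of empty particle slots can be built without any sorting. It is not the code from this PR: the function name `EmptyIndicesFromMask` and the `std::vector<bool>` mask are hypothetical stand-ins. On device, the scan step would presumably map to something like the `parallel_scan` mentioned in the review comments, and the scatter step to an ordinary parallel kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not Parthenon's API): given a per-slot mask where
// true means the slot holds an active particle, return the indices of the
// empty slots in ascending order.
std::vector<int> EmptyIndicesFromMask(const std::vector<bool>& is_active) {
  const std::size_t n = is_active.size();

  // Step 1: flag each empty slot with 1 and each active slot with 0.
  std::vector<int> is_empty(n);
  for (std::size_t i = 0; i < n; ++i) is_empty[i] = is_active[i] ? 0 : 1;

  // Step 2: exclusive prefix sum. offset[i] counts the empty slots before
  // slot i, which is the position slot i takes in the output if it is empty.
  std::vector<int> offset(n);
  std::exclusive_scan(is_empty.begin(), is_empty.end(), offset.begin(), 0);

  // Total number of empty slots = last offset + last flag.
  const int num_empty = (n == 0) ? 0 : offset[n - 1] + is_empty[n - 1];

  // Step 3: scatter. Each iteration writes to a distinct output location,
  // so the iterations are independent of one another.
  std::vector<int> empty_indices(num_empty);
  for (std::size_t i = 0; i < n; ++i) {
    if (is_empty[i]) {
      assert(offset[i] < num_empty);
      empty_indices[offset[i]] = static_cast<int>(i);
    }
  }
  return empty_indices;
}
```

Because the flagging and scatter loops have no dependencies between iterations and the scan is a standard parallel primitive, all three steps can in principle stay on device, which is what removes the need for a device-to-host copy.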
* `Swarm` constructor with kernel launch rather than device/host copies
* `Swarm` method names from camel case to uppercase
* `Defragment` method to use kernel launches rather than device/host copies to defragment the memory pool of particles
* `AddEmptyParticles` method to use kernel launches rather than device/host copies to provide a list of empty particle indices
* `null_ptr`-ness of user swarm boundary functions that was not interacting correctly with recent changes to the forest infrastructure.
* `swarm_comms.cpp` (can possibly defer this to another PR if anyone wants to merge this before middle of next week)
* `AddRegion` to prevent segfault if reference is not properly stored when making a task collection.
PR Checklist