Fix remaining swarm D->H->D copies #1145

Merged · 32 commits · Aug 28, 2024
Changes from all commits

Commits (32)
60b216c
working
brryan Jul 30, 2024
3867965
Clean up implicit captures
brryan Jul 31, 2024
ed99964
OK something isnt working with updating empty_indices
brryan Jul 31, 2024
f19156b
Creating indices works but defragmentation is broken
brryan Jul 31, 2024
bf06351
Defrag works
brryan Jul 31, 2024
def49d2
Switch to persistent scratch memory
brryan Jul 31, 2024
e9b98c0
Clean up
brryan Jul 31, 2024
4a4110e
Fix GPU issues
brryan Jul 31, 2024
1cf0b8f
Fix compile error by cleaning up code
brryan Jul 31, 2024
b4487b1
Remove unnecessary check against non-null user swarm BCs
brryan Jul 31, 2024
d68e6dd
Remove unused function
brryan Jul 31, 2024
93efdfc
Formatting
brryan Jul 31, 2024
5e5a31a
Fiddle with send logic
brryan Jul 31, 2024
ee95934
Merge branch 'develop' into brryan/more_swarm_prefix_sums
lroberts36 Aug 1, 2024
b2891de
Merge branch 'develop' of github.com:lanl/parthenon into brryan/more_…
brryan Aug 20, 2024
51cc2f1
Merge branch 'brryan/more_swarm_prefix_sums' of github.com:lanl/parth…
brryan Aug 20, 2024
2a6a8ed
Perform swarm boundary logic on device (#1154)
brryan Aug 20, 2024
7326394
Oops ParArray1D isn't a host array when compiled for device
brryan Aug 20, 2024
36c6bc0
implicit this->
brryan Aug 20, 2024
753ca05
bug in nrecvd particles with 1 particle received...
brryan Aug 20, 2024
ed10e33
Found the bug
brryan Aug 21, 2024
85d7bd5
Fixed bug, cleaned up
brryan Aug 21, 2024
f17c013
kokkos parallel_scan -> par_scan
brryan Aug 21, 2024
fb536d1
Merge branch 'develop' into brryan/more_swarm_prefix_sums
brryan Aug 22, 2024
e4092f6
Merge branch 'develop' into brryan/more_swarm_prefix_sums
brryan Aug 23, 2024
87192a7
Fix types, clean up code
brryan Aug 26, 2024
5d93cac
Merge branch 'brryan/more_swarm_prefix_sums' of github.com:lanl/parth…
brryan Aug 26, 2024
fab017d
Merge branch 'develop' into brryan/more_swarm_prefix_sums
brryan Aug 26, 2024
fab834b
Merge branch 'develop' into brryan/more_swarm_prefix_sums
brryan Aug 26, 2024
a205b76
Add warning if using swarms but Real != double
brryan Aug 27, 2024
27879e9
Merge branch 'brryan/more_swarm_prefix_sums' of github.com:lanl/parth…
brryan Aug 27, 2024
5c46e5a
Cleanup from debugging
brryan Aug 27, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -40,6 +40,7 @@
- [[PR 1004]](https://github.com/parthenon-hpc-lab/parthenon/pull/1004) Allow parameter modification from an input file for restarts

### Fixed (not changing behavior/API/variables/...)
- [[PR 1145]](https://github.com/parthenon-hpc-lab/parthenon/pull/1145) Fix remaining swarm D->H->D copies
- [[PR 1150]](https://github.com/parthenon-hpc-lab/parthenon/pull/1150) Reduce memory consumption for buffer pool
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1152]](https://github.com/parthenon-hpc-lab/parthenon/pull/1152) Fix memory leak in task graph outputs related to `abi::__cxa_demangle`
6 changes: 2 additions & 4 deletions example/particles/parthinput.particles
@@ -24,10 +24,8 @@ refinement = none
nx1 = 16
x1min = -0.5
x1max = 0.5
ix1_bc = user
ox1_bc = user
# ix1_bc = periodic # Optionally use periodic boundary conditions everywhere
# ox1_bc = periodic
ix1_bc = periodic
ox1_bc = periodic

nx2 = 16
x2min = -0.5
180 changes: 82 additions & 98 deletions example/particles/particles.cpp
@@ -340,8 +340,7 @@ TaskStatus CreateSomeParticles(MeshBlock *pmb, const double t0) {
return TaskStatus::complete;
}

TaskStatus TransportParticles(MeshBlock *pmb, const StagedIntegrator *integrator,
const double t0) {
TaskStatus TransportParticles(MeshBlock *pmb, const double t0, const double dt) {
PARTHENON_INSTRUMENT

auto swarm = pmb->meshblock_data.Get()->GetSwarmData()->Get("my_particles");
@@ -350,8 +349,6 @@ TaskStatus TransportParticles(MeshBlock *pmb, const StagedIntegrator *integrator

int max_active_index = swarm->GetMaxActiveIndex();

Real dt = integrator->dt;

auto &t = swarm->Get<Real>("t").Get();
auto &x = swarm->Get<Real>(swarm_position::x::name()).Get();
auto &y = swarm->Get<Real>(swarm_position::y::name()).Get();
@@ -469,97 +466,31 @@ TaskStatus TransportParticles(MeshBlock *pmb, const StagedIntegrator *integrator
// Custom step function to allow for looping over MPI-related tasks until complete
TaskListStatus ParticleDriver::Step() {
TaskListStatus status;
integrator.dt = tm.dt;

PARTHENON_REQUIRE(integrator.nstages == 1,
"Only first order time integration supported!");

BlockList_t &blocks = pmesh->block_list;
auto num_task_lists_executed_independently = blocks.size();

// Create all the particles that will be created during the step
status = MakeParticlesCreationTaskCollection().Execute();
PARTHENON_REQUIRE(status == TaskListStatus::complete,
"ParticlesCreation task list failed!");

// Loop over repeated MPI calls until every particle is finished. This logic is
// required because long-distance particle pushes can lead to a large, unpredictable
// number of MPI sends and receives.
bool particles_update_done = false;
while (!particles_update_done) {
status = MakeParticlesUpdateTaskCollection().Execute();

particles_update_done = true;
for (auto &block : blocks) {
// TODO(BRR) Despite this "my_particles"-specific call, this function feels like it
// should be generalized
auto swarm = block->meshblock_data.Get()->GetSwarmData()->Get("my_particles");
if (!swarm->finished_transport) {
particles_update_done = false;
}
}
}
// Transport particles iteratively until all particles reach final time
status = IterativeTransport();
// status = MakeParticlesTransportTaskCollection().Execute();
PARTHENON_REQUIRE(status == TaskListStatus::complete,
"IterativeTransport task list failed!");

// Use a more traditional task list for predictable post-MPI evaluations.
status = MakeFinalizationTaskCollection().Execute();
PARTHENON_REQUIRE(status == TaskListStatus::complete, "Finalization task list failed!");

return status;
}

// TODO(BRR) This should really be in parthenon/src... but it can't just live in Swarm
// because of the loop over blocks
TaskStatus StopCommunicationMesh(const BlockList_t &blocks) {
PARTHENON_INSTRUMENT

int num_sent_local = 0;
for (auto &block : blocks) {
auto sc = block->meshblock_data.Get()->GetSwarmData();
auto swarm = sc->Get("my_particles");
swarm->finished_transport = false;
num_sent_local += swarm->num_particles_sent_;
}

int num_sent_global = num_sent_local; // potentially overwritten by following Allreduce
#ifdef MPI_PARALLEL
for (auto &block : blocks) {
auto swarm = block->meshblock_data.Get()->GetSwarmData()->Get("my_particles");
for (int n = 0; n < block->neighbors.size(); n++) {
NeighborBlock &nb = block->neighbors[n];
// TODO(BRR) May want logic like this if we have non-blocking TaskRegions
// if (nb.snb.rank != Globals::my_rank) {
// if (swarm->vbswarm->bd_var_.flag[nb.bufid] != BoundaryStatus::completed) {
// return TaskStatus::incomplete;
// }
//}

// TODO(BRR) May want to move this logic into a per-cycle initialization call
if (swarm->vbswarm->bd_var_.flag[nb.bufid] == BoundaryStatus::completed) {
swarm->vbswarm->bd_var_.req_send[nb.bufid] = MPI_REQUEST_NULL;
}
}
}

MPI_Allreduce(&num_sent_local, &num_sent_global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#endif // MPI_PARALLEL

if (num_sent_global == 0) {
for (auto &block : blocks) {
auto &pmb = block;
auto sc = pmb->meshblock_data.Get()->GetSwarmData();
auto swarm = sc->Get("my_particles");
swarm->finished_transport = true;
}
}

// Reset boundary statuses
for (auto &block : blocks) {
auto &pmb = block;
auto sc = pmb->meshblock_data.Get()->GetSwarmData();
auto swarm = sc->Get("my_particles");
for (int n = 0; n < pmb->neighbors.size(); n++) {
auto &nb = block->neighbors[n];
swarm->vbswarm->bd_var_.flag[nb.bufid] = BoundaryStatus::waiting;
}
}

return TaskStatus::complete;
}

TaskCollection ParticleDriver::MakeParticlesCreationTaskCollection() const {
TaskCollection tc;
TaskID none(0);
@@ -577,40 +508,93 @@ TaskCollection ParticleDriver::MakeParticlesCreationTaskCollection() const {
return tc;
}

TaskCollection ParticleDriver::MakeParticlesUpdateTaskCollection() const {
TaskStatus CountNumSent(const BlockList_t &blocks, const double tf_, bool *done) {
int num_unfinished = 0;
for (auto &block : blocks) {
auto sc = block->meshblock_data.Get()->GetSwarmData();
auto swarm = sc->Get("my_particles");
int max_active_index = swarm->GetMaxActiveIndex();

auto &t = swarm->Get<Real>("t").Get();

auto swarm_d = swarm->GetDeviceContext();

const auto &tf = tf_;

parthenon::par_reduce(
PARTHENON_AUTO_LABEL, 0, max_active_index,
KOKKOS_LAMBDA(const int n, int &num_unfinished) {
if (swarm_d.IsActive(n)) {
if (t(n) < tf) {
num_unfinished++;
}
}
},
Kokkos::Sum<int>(num_unfinished));
}

#ifdef MPI_PARALLEL
MPI_Allreduce(MPI_IN_PLACE, &num_unfinished, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#endif // MPI_PARALLEL

if (num_unfinished > 0) {
*done = false;
} else {
*done = true;
}

return TaskStatus::complete;
}

TaskCollection ParticleDriver::IterativeTransportTaskCollection(bool *done) const {
TaskCollection tc;
TaskID none(0);
const double t0 = tm.time;
const BlockList_t &blocks = pmesh->block_list;
const int nblocks = blocks.size();
const double t0 = tm.time;
const double dt = tm.dt;

auto num_task_lists_executed_independently = blocks.size();

TaskRegion &async_region0 = tc.AddRegion(num_task_lists_executed_independently);
for (int i = 0; i < blocks.size(); i++) {
TaskRegion &async_region = tc.AddRegion(nblocks);
for (int i = 0; i < nblocks; i++) {
auto &pmb = blocks[i];

auto &sc = pmb->meshblock_data.Get()->GetSwarmData();
auto &tl = async_region[i];

auto &tl = async_region0[i];

auto transport_particles =
tl.AddTask(none, TransportParticles, pmb.get(), &integrator, t0);

auto send = tl.AddTask(transport_particles, &SwarmContainer::Send, sc.get(),
BoundaryCommSubset::all);
auto transport = tl.AddTask(none, TransportParticles, pmb.get(), t0, dt);
auto reset_comms =
tl.AddTask(transport, &SwarmContainer::ResetCommunication, sc.get());
auto send =
tl.AddTask(reset_comms, &SwarmContainer::Send, sc.get(), BoundaryCommSubset::all);
auto receive =
tl.AddTask(send, &SwarmContainer::Receive, sc.get(), BoundaryCommSubset::all);
}

TaskRegion &sync_region0 = tc.AddRegion(1);
TaskRegion &sync_region = tc.AddRegion(1);
{
auto &tl = sync_region0[0];
auto stop_comm = tl.AddTask(none, StopCommunicationMesh, blocks);
auto &tl = sync_region[0];
auto check_completion = tl.AddTask(none, CountNumSent, blocks, t0 + dt, done);
}

return tc;
}

// TODO(BRR) to be replaced by iterative tasklist machinery
TaskListStatus ParticleDriver::IterativeTransport() const {
TaskListStatus status;
bool transport_done = false;
int n_transport_iter = 0;
int n_transport_iter_max = 1000;
while (!transport_done) {
status = IterativeTransportTaskCollection(&transport_done).Execute();

n_transport_iter++;
PARTHENON_REQUIRE(n_transport_iter < n_transport_iter_max,
"Too many transport iterations!");
}

return status;
}

TaskCollection ParticleDriver::MakeFinalizationTaskCollection() const {
TaskCollection tc;
TaskID none(0);
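For context on the particles.cpp changes above: the per-iteration completion check now counts unfinished particles entirely on device (`CountNumSent`'s `par_reduce`), and only a single integer crosses the host boundary before one `MPI_Allreduce`, replacing the old host-side `StopCommunicationMesh` bookkeeping that triggered D->H->D copies. A minimal standalone sketch of that pattern in plain Kokkos plus MPI (hypothetical names, not the Parthenon API) might look like:

```cpp
#include <Kokkos_Core.hpp>
#include <mpi.h>

// Count particles that have not yet reached the target time tf, entirely on
// device, then reduce the single integer across ranks. Only one int per
// iteration is copied back to the host.
int CountUnfinished(const Kokkos::View<double *> &t, const Kokkos::View<bool *> &active,
                    const double tf) {
  int local_unfinished = 0;
  Kokkos::parallel_reduce(
      "count_unfinished", t.extent(0),
      KOKKOS_LAMBDA(const int n, int &count) {
        if (active(n) && t(n) < tf) count++;
      },
      local_unfinished);

  int global_unfinished = local_unfinished;
#ifdef MPI_PARALLEL
  MPI_Allreduce(MPI_IN_PLACE, &global_unfinished, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#endif
  return global_unfinished;
}
```

The driver loop in `IterativeTransport()` then only has to re-execute the transport task collection while this global count stays nonzero, which is what the `done` flag passed to `IterativeTransportTaskCollection` captures.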
4 changes: 3 additions & 1 deletion example/particles/particles.hpp
@@ -33,7 +33,9 @@ class ParticleDriver : public EvolutionDriver {
ParticleDriver(ParameterInput *pin, ApplicationInput *app_in, Mesh *pm)
: EvolutionDriver(pin, app_in, pm), integrator(pin) {}
TaskCollection MakeParticlesCreationTaskCollection() const;
TaskCollection MakeParticlesUpdateTaskCollection() const;
TaskCollection MakeParticlesTransportTaskCollection() const;
TaskListStatus IterativeTransport() const;
TaskCollection IterativeTransportTaskCollection(bool *done) const;
TaskCollection MakeFinalizationTaskCollection() const;
TaskListStatus Step();

7 changes: 7 additions & 0 deletions src/interface/state_descriptor.cpp
@@ -452,6 +452,13 @@ StateDescriptor::CreateResolvedStateDescriptor(Packages_t &packages) {
field_tracker.CategorizeCollection(name, field_dict, &field_provider);
swarm_tracker.CategorizeCollection(name, package->AllSwarms(), &swarm_provider);

if (!package->AllSwarms().empty() && !std::is_same<Real, double>::value) {
PARTHENON_WARN(
"Swarms always use Real precision, even for ParticleVariables containing "
"time data, while Parthenon time variables are fixed to double precision. This "
"may cause inaccurate comparisons with cycle beginning and end times.")
}

// Add package registered boundary conditions
for (int i = 0; i < 6; ++i)
state->UserBoundaryFunctions[i].insert(state->UserBoundaryFunctions[i].end(),
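For context on the new `PARTHENON_WARN` above: when Parthenon is configured with single-precision `Real`, particle time stamps are stored as `float` while cycle times stay `double`, so round-off in the stored value can make a particle that has reached the end of the cycle still compare as unfinished. A hypothetical standalone illustration of the mismatch (not Parthenon code):

```cpp
#include <cstdio>

int main() {
  using Real = float;            // single-precision build assumed for illustration
  const double tf = 0.7;         // end-of-cycle time kept in double precision
  const Real t_particle = 0.7f;  // particle time stored at Real precision

  // 0.7f rounds to ~0.6999999881, so even though the particle has reached the
  // end of the cycle, the mixed-precision comparison reports it as unfinished.
  if (static_cast<double>(t_particle) < tf) {
    std::printf("particle reported unfinished: t = %.10f, tf = %.10f\n",
                static_cast<double>(t_particle), tf);
  }
  return 0;
}
```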