Skip to content

Commit

Permalink
Merge branch 'bumo-kokkos-42' into BenWibking/mainloop-debug-output-c…
Browse files Browse the repository at this point in the history
…allback
  • Loading branch information
BenWibking committed Nov 12, 2024
2 parents 5fc2b1b + e00017c commit 52b0ca0
Show file tree
Hide file tree
Showing 20 changed files with 141 additions and 65 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/ci-extended.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ env:
CMAKE_BUILD_PARALLEL_LEVEL: 5 # num threads for build
MACHINE_CFG: cmake/machinecfg/CI.cmake
OMPI_MCA_mpi_common_cuda_event_max: 1000
# CUDA IPC within docker repeated seem to cause issue on the CI machine
OMPI_MCA_btl_smcuda_use_cuda_ipc: 0
# https://github.com/open-mpi/ompi/issues/4948#issuecomment-395468231
OMPI_MCA_btl_vader_single_copy_mechanism: none

Expand All @@ -34,7 +36,7 @@ jobs:
container:
image: ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent
# map to local user id on CI machine to allow writing to build cache
options: --user 1001
options: --user 1001 --cap-add CAP_SYS_PTRACE --shm-size="8g" --ulimit memlock=134217728
steps:
- uses: actions/checkout@v3
with:
Expand Down
8 changes: 5 additions & 3 deletions .github/workflows/ci-short.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@ env:
CMAKE_BUILD_PARALLEL_LEVEL: 5 # num threads for build
MACHINE_CFG: cmake/machinecfg/CI.cmake
OMPI_MCA_mpi_common_cuda_event_max: 1000
# CUDA IPC within docker repeated seem to cause issue on the CI machine
OMPI_MCA_btl_smcuda_use_cuda_ipc: 0
# https://github.com/open-mpi/ompi/issues/4948#issuecomment-395468231
OMPI_MCA_btl_vader_single_copy_mechanism: none

Expand All @@ -22,7 +24,7 @@ jobs:
container:
image: ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent
# map to local user id on CI machine to allow writing to build cache
options: --user 1001
options: --user 1001 --cap-add CAP_SYS_PTRACE --shm-size="8g" --ulimit memlock=134217728
steps:
- uses: actions/checkout@v3
with:
Expand All @@ -47,7 +49,7 @@ jobs:
container:
image: ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent
# map to local user id on CI machine to allow writing to build cache
options: --user 1001
options: --user 1001 --cap-add CAP_SYS_PTRACE --shm-size="8g" --ulimit memlock=134217728
steps:
- uses: actions/checkout@v3
with:
Expand Down Expand Up @@ -79,7 +81,7 @@ jobs:
container:
image: ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent
# map to local user id on CI machine to allow writing to build cache
options: --user 1001
options: --user 1001 --cap-add CAP_SYS_PTRACE --shm-size="8g" --ulimit memlock=134217728
steps:
- uses: actions/checkout@v3
with:
Expand Down
24 changes: 12 additions & 12 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,19 +3,19 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 1185]](https://github.com/parthenon-hpc-lab/parthenon/pull/1185/files) Bugfix to particle defragmentation
- [[PR 1185]](https://github.com/parthenon-hpc-lab/parthenon/pull/1185) Bugfix to particle defragmentation
- [[PR 1184]](https://github.com/parthenon-hpc-lab/parthenon/pull/1184) Fix swarm block neighbor indexing in 1D, 2D
- [[PR 1183]](https://github.com/parthenon-hpc-lab/parthenon/pull/1183) Fix particle leapfrog example initialization data
- [[PR 1179]](https://github.com/parthenon-hpc-lab/parthenon/pull/1179) Make a global variable for whether simulation is a restart
- [[PR 1171]](https://github.com/parthenon-hpc-lab/parthenon/pull/1171) Add PARTHENON_USE_SYSTEM_PACKAGES build option
- [[PR 1161]](https://github.com/parthenon-hpc-lab/parthenon/pull/1161) Make flux field Metadata accessible, add Metadata::CellMemAligned flag, small perfomance upgrades

### Changed (changing behavior/API/variables/...)
- [[PR 1191]](https://github.com/parthenon-hpc-lab/parthenon/pull/1191) Update Kokkos version to 4.4.1
- [[PR 1206]](https://github.com/parthenon-hpc-lab/parthenon/pull/1206) Leapfrog fix
- [[PR1203]](https://github.com/parthenon-hpc-lab/parthenon/pull/1203) Pin Ubuntu CI image
- [[PR1177]](https://github.com/parthenon-hpc-lab/parthenon/pull/1177) Make mesh-level boundary conditions usable without the "user" flag
- [[PR 1203]](https://github.com/parthenon-hpc-lab/parthenon/pull/1203) Pin Ubuntu CI image
- [[PR 1177]](https://github.com/parthenon-hpc-lab/parthenon/pull/1177) Make mesh-level boundary conditions usable without the "user" flag
- [[PR 1187]](https://github.com/parthenon-hpc-lab/parthenon/pull/1187) Make DataCollection::Add safer and generalize MeshBlockData::Initialize
- [[Issue 1165]](https://github.com/parthenon-hpc-lab/parthenon/issues/1165) Bump Kokkos submodule to 4.4.1
- [[PR 1171]](https://github.com/parthenon-hpc-lab/parthenon/pull/1171) Add PARTHENON_USE_SYSTEM_PACKAGES build option
- [[PR 1172]](https://github.com/parthenon-hpc-lab/parthenon/pull/1172) Make parthenon manager robust against external MPI init and finalize calls

Expand All @@ -32,7 +32,7 @@


### Incompatibilities (i.e. breaking changes)
- [[PR1177]](https://github.com/parthenon-hpc-lab/parthenon/pull/1177) Make mesh-level boundary conditions usable without the "user" flag
- [[PR 1177]](https://github.com/parthenon-hpc-lab/parthenon/pull/1177) Make mesh-level boundary conditions usable without the "user" flag

## Release 24.08
Date: 2024-08-30
Expand Down Expand Up @@ -156,12 +156,12 @@ Date: 2024-03-21
- [[PR 973]](https://github.com/parthenon-hpc-lab/parthenon/pull/973) Multigrid performance upgrades

### Fixed (not changing behavior/API/variables/...)
- [[PR1023]](https://github.com/parthenon-hpc-lab/parthenon/pull/1023) Fix broken param of a scalar bool
- [[PR1012]](https://github.com/parthenon-hpc-lab/parthenon/pull/1012) Remove accidentally duplicated code
- [[PR992]](https://github.com/parthenon-hpc-lab/parthenon/pull/992) Allow custom PR ops with sparse pools
- [[PR988]](https://github.com/parthenon-hpc-lab/parthenon/pull/988) Fix bug in neighbor finding routine for small, periodic, refined meshes
- [[PR986]](https://github.com/parthenon-hpc-lab/parthenon/pull/986) Fix bug in sparse boundary communication BndInfo cacheing
- [[PR978]](https://github.com/parthenon-hpc-lab/parthenon/pull/978) remove erroneous sparse check
- [[PR 1023]](https://github.com/parthenon-hpc-lab/parthenon/pull/1023) Fix broken param of a scalar bool
- [[PR 1012]](https://github.com/parthenon-hpc-lab/parthenon/pull/1012) Remove accidentally duplicated code
- [[PR 992]](https://github.com/parthenon-hpc-lab/parthenon/pull/992) Allow custom PR ops with sparse pools
- [[PR 988]](https://github.com/parthenon-hpc-lab/parthenon/pull/988) Fix bug in neighbor finding routine for small, periodic, refined meshes
- [[PR 986]](https://github.com/parthenon-hpc-lab/parthenon/pull/986) Fix bug in sparse boundary communication BndInfo cacheing
- [[PR 978]](https://github.com/parthenon-hpc-lab/parthenon/pull/978) remove erroneous sparse check

### Infrastructure (changes irrelevant to downstream codes)
- [[PR 1027]](https://github.com/parthenon-hpc-lab/parthenon/pull/1027) Refactor RestartReader as abstract class
Expand Down Expand Up @@ -228,7 +228,7 @@ Date: 2023-11-16
- [[PR 901]](https://github.com/parthenon-hpc-lab/parthenon/pull/901) Implement shared element ownership model

### Removed (removing behavior/API/varaibles/...)
- [[PR 930](https://github.com/parthenon-hpc-lab/parthenon/pull/930) Remove ParthenonManager::ParthenonInit as it is error-prone and the split functions are the recommended usage.
- [[PR 930]](https://github.com/parthenon-hpc-lab/parthenon/pull/930) Remove ParthenonManager::ParthenonInit as it is error-prone and the split functions are the recommended usage.


## Release 0.8.0
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ Parthenon -- a performance portable block-structured adaptive mesh refinement fr

* CMake 3.16 or greater
* C++17 compatible compiler
* Kokkos 4.0.1 or greater
* Kokkos 4.4.1 or greater

## Optional (enabling features)

Expand Down
1 change: 1 addition & 0 deletions cmake/machinecfg/GitHubActions.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ message(STATUS "Loading machine configuration for GitHub Actions CI. ")

# common options
set(NUM_MPI_PROC_TESTING "2" CACHE STRING "CI runs tests with 2 MPI ranks")
set(Kokkos_ENABLE_ROCTHRUST OFF CACHE BOOL "Temporarily disabled as the container needs to be updated to the `-complete` base image.")

set(MACHINE_CXX_FLAGS "")
if (${MACHINE_VARIANT} MATCHES "cuda")
Expand Down
28 changes: 28 additions & 0 deletions doc/sphinx/src/development.rst
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,34 @@ parallelism interface that is needed for managing memory cached in
tightly nested loops. The wrappers are documented
:ref:`here <nested par for>`.

View of Views
-------------

Special care needs to be taken when working with a ``View`` of ``View``.

To repeat the Kokkos documenation: `Don't use them <https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/View.html#can-i-make-a-view-of-views>`__

But if you have to (which is the case in some places inside Parthenon)
then follow this pattern:

.. code:: c++

Kokkos::View<ParArray1D<Real> *> view_of_pararrays(parthenon::ViewOfViewAlloc("myname"), 10);
The ``ViewOfViewAlloc`` ensures that the ``Kokkos::SequentialHostInit`` property is added,
which results in the (inner ``View`` ) deallocators being called on the host (rather than on
the device by default).

Similarly, when you create a host mirror of said ``View`` of ``View`` add the additional
property for the same reason.

.. code:: c++

auto view_of_pararrays_h =
Kokkos::create_mirror_view(Kokkos::view_alloc(Kokkos::SequentialHostInit), view_of_pararrays);

Note that the ``SequentialHostInit`` was only added in Kokkos 4.4.1 (which is now the default in Parthenon).

The need for reductions within function handling ``MeshBlock`` data
-------------------------------------------------------------------

Expand Down
2 changes: 1 addition & 1 deletion external/Kokkos
Submodule Kokkos updated 1031 files
3 changes: 0 additions & 3 deletions src/bvals/bvals.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -101,9 +101,6 @@ class BoundarySwarm : public BoundaryCommunication {
explicit BoundarySwarm(std::weak_ptr<MeshBlock> pmb, const std::string &label);
~BoundarySwarm() = default;

std::vector<ParArrayND<int>> vars_int;
std::vector<ParArrayND<Real>> vars_real;

// (usuallly the std::size_t unsigned integer type)
std::vector<BoundaryCommunication *>::size_type bswarm_index;

Expand Down
5 changes: 3 additions & 2 deletions src/bvals/comms/bnd_info.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,9 @@
namespace parthenon {

void ProResCache_t::Initialize(int n_regions, StateDescriptor *pkg) {
prores_info = ParArray1D<ProResInfo>("prores_info", n_regions);
prores_info_h = Kokkos::create_mirror_view(prores_info);
prores_info = ProResInfoArr_t(ViewOfViewAlloc("prores_info"), n_regions);
prores_info_h = Kokkos::create_mirror_view(
Kokkos::view_alloc(Kokkos::SequentialHostInit), prores_info);
int nref_funcs = pkg->NumRefinementFuncs();
// Note that assignment of Kokkos views resets them, but
// buffer_subset_sizes is a std::vector. It must be cleared, then
Expand Down
7 changes: 4 additions & 3 deletions src/bvals/comms/bnd_info.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
#include "bvals/neighbor_block.hpp"
#include "coordinates/coordinates.hpp"
#include "interface/variable_state.hpp"
#include "kokkos_abstraction.hpp"
#include "mesh/domain.hpp"
#include "mesh/forest/logical_coordinate_transformation.hpp"
#include "utils/communication_buffer.hpp"
Expand Down Expand Up @@ -127,11 +128,11 @@ struct ProResInfo {
int GetBufferSize(MeshBlock *pmb, const NeighborBlock &nb,
std::shared_ptr<Variable<Real>> v);

using BndInfoArr_t = ParArray1D<BndInfo>;
using BndInfoArr_t = Kokkos::View<BndInfo *, LayoutWrapper, DevMemSpace>;
using BndInfoArrHost_t = typename BndInfoArr_t::HostMirror;

using ProResInfoArr_t = ParArray1D<ProResInfo>;
using ProResInfoArrHost_t = typename ParArray1D<ProResInfo>::HostMirror;
using ProResInfoArr_t = Kokkos::View<ProResInfo *, LayoutWrapper, DevMemSpace>;
using ProResInfoArrHost_t = typename ProResInfoArr_t::HostMirror;
class StateDescriptor;
struct ProResCache_t {
ProResInfoArr_t prores_info{};
Expand Down
6 changes: 4 additions & 2 deletions src/bvals/comms/bvals_utils.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
#include "bvals/comms/bnd_info.hpp"
#include "bvals/comms/bvals_in_one.hpp"
#include "interface/variable.hpp"
#include "kokkos_abstraction.hpp"
#include "mesh/domain.hpp"
#include "mesh/mesh.hpp"
#include "mesh/meshblock.hpp"
Expand Down Expand Up @@ -215,8 +216,9 @@ inline void RebuildBufferCache(std::shared_ptr<MeshData<Real>> md, int nbound,
using namespace loops;
using namespace loops::shorthands;
BvarsSubCache_t &cache = md->GetBvarsCache().GetSubCache(BOUND_TYPE, SENDER);
cache.bnd_info = BndInfoArr_t("bnd_info", nbound);
cache.bnd_info_h = Kokkos::create_mirror_view(cache.bnd_info);
cache.bnd_info = BndInfoArr_t(ViewOfViewAlloc("bnd_info"), nbound);
cache.bnd_info_h = Kokkos::create_mirror_view(
Kokkos::view_alloc(Kokkos::SequentialHostInit), cache.bnd_info);

// prolongation/restriction sub-sets
// TODO(JMM): Right now I exclude fluxcorrection boundaries but if
Expand Down
7 changes: 5 additions & 2 deletions src/interface/mesh_data.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
#include "interface/sparse_pack_base.hpp"
#include "interface/swarm_pack_base.hpp"
#include "interface/variable_pack.hpp"
#include "kokkos_abstraction.hpp"
#include "mesh/domain.hpp"
#include "mesh/meshblock.hpp"
#include "mesh/meshblock_pack.hpp"
Expand Down Expand Up @@ -149,8 +150,10 @@ const MeshBlockPack<P> &PackOnMesh(M &map, BlockDataList_t<Real> &block_data_,
}

if (make_new_pack) {
ParArray1D<P> packs("MeshData::PackVariables::packs", nblocks);
auto packs_host = Kokkos::create_mirror_view(packs);
Kokkos::View<P *, LayoutWrapper, DevMemSpace> packs(
ViewOfViewAlloc("MeshData::PackVariables::packs"), nblocks);
auto packs_host =
Kokkos::create_mirror_view(Kokkos::view_alloc(Kokkos::SequentialHostInit), packs);

for (size_t i = 0; i < nblocks; i++) {
const auto &pack = packing_function(block_data_[i], this_map, this_key);
Expand Down
11 changes: 7 additions & 4 deletions src/interface/sparse_pack_base.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
#include "interface/sparse_pack_base.hpp"
#include "interface/state_descriptor.hpp"
#include "interface/variable.hpp"
#include "kokkos_abstraction.hpp"
#include "utils/utils.hpp"
namespace parthenon {
namespace impl {
Expand Down Expand Up @@ -151,8 +152,9 @@ SparsePackBase SparsePackBase::Build(T *pmd, const PackDescriptor &desc,
} else if (contains_face_or_edge) {
leading_dim += 2;
}
pack.pack_ = pack_t("data_ptr", leading_dim, pack.nblocks_, max_size);
pack.pack_h_ = Kokkos::create_mirror_view(pack.pack_);
pack.pack_ = pack_t(ViewOfViewAlloc("data_ptr"), leading_dim, pack.nblocks_, max_size);
pack.pack_h_ = Kokkos::create_mirror_view(
Kokkos::view_alloc(Kokkos::SequentialHostInit), pack.pack_);

// For non-flat packs, shape of pack is type x block x var x k x j x i
// where type here might be a flux.
Expand All @@ -167,8 +169,9 @@ SparsePackBase SparsePackBase::Build(T *pmd, const PackDescriptor &desc,
pack.block_props_ = block_props_t("block_props", nblocks, 27 + 1);
pack.block_props_h_ = Kokkos::create_mirror_view(pack.block_props_);

pack.coords_ = coords_t("coords", desc.flat ? max_size : nblocks);
auto coords_h = Kokkos::create_mirror_view(pack.coords_);
pack.coords_ = coords_t(ViewOfViewAlloc("coords"), desc.flat ? max_size : nblocks);
auto coords_h = Kokkos::create_mirror_view(
Kokkos::view_alloc(Kokkos::SequentialHostInit), pack.coords_);

// Fill the views
int idx = 0;
Expand Down
6 changes: 4 additions & 2 deletions src/interface/sparse_pack_base.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
#include "interface/state_descriptor.hpp"
#include "interface/variable.hpp"
#include "interface/variable_state.hpp"
#include "kokkos_abstraction.hpp"
#include "utils/utils.hpp"

namespace parthenon {
Expand All @@ -55,13 +56,14 @@ class SparsePackBase {

using alloc_t = std::vector<int>;
using include_t = std::vector<bool>;
using pack_t = ParArray3D<ParArray3D<Real, VariableState>>;
using pack_t =
Kokkos::View<ParArray3D<Real, VariableState> ***, LayoutWrapper, DevMemSpace>;
using pack_h_t = typename pack_t::HostMirror;
using bounds_t = ParArray3D<int>;
using bounds_h_t = typename bounds_t::HostMirror;
using block_props_t = ParArray2D<int>;
using block_props_h_t = typename block_props_t::HostMirror;
using coords_t = ParArray1D<ParArray0D<Coordinates_t>>;
using coords_t = Kokkos::View<ParArray0D<Coordinates_t> *, LayoutWrapper, DevMemSpace>;

// Returns a SparsePackBase object that is either newly created or taken
// from the cache in pmd. The cache itself handles the all of this logic
Expand Down
17 changes: 10 additions & 7 deletions src/interface/swarm_pack_base.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
#include "interface/state_descriptor.hpp"
#include "interface/swarm_device_context.hpp"
#include "interface/variable.hpp"
#include "kokkos_abstraction.hpp"
#include "utils/utils.hpp"

namespace parthenon {
Expand All @@ -43,10 +44,10 @@ class SwarmPackBase {
SwarmPackBase() = default;
virtual ~SwarmPackBase() = default;

using pack_t = ParArray3D<ParArray1D<TYPE>>;
using pack_t = Kokkos::View<ParArray1D<TYPE> ***, LayoutWrapper, DevMemSpace>;
using bounds_t = ParArray3D<int>;
using contexts_t = ParArray1D<SwarmDeviceContext>;
using contexts_h_t = typename ParArray1D<SwarmDeviceContext>::HostMirror;
using contexts_t = Kokkos::View<SwarmDeviceContext *, LayoutWrapper, DevMemSpace>;
using contexts_h_t = typename contexts_t::HostMirror;
using max_active_indices_t = ParArray1D<int>;
using desc_t = impl::SwarmPackDescriptor<TYPE>;
using idx_map_t = std::unordered_map<std::string, std::size_t>;
Expand Down Expand Up @@ -108,8 +109,9 @@ class SwarmPackBase {

// Allocate the views
int leading_dim = 1;
pack.pack_ = pack_t("data_ptr", leading_dim, nblocks, max_size);
auto pack_h = Kokkos::create_mirror_view(pack.pack_);
pack.pack_ = pack_t(ViewOfViewAlloc("data_ptr"), leading_dim, nblocks, max_size);
auto pack_h = Kokkos::create_mirror_view(
Kokkos::view_alloc(Kokkos::SequentialHostInit), pack.pack_);

pack.bounds_ = bounds_t("bounds", 2, nblocks, nvar);
auto bounds_h = Kokkos::create_mirror_view(pack.bounds_);
Expand Down Expand Up @@ -153,8 +155,9 @@ class SwarmPackBase {
Kokkos::deep_copy(pack.pack_, pack_h);
Kokkos::deep_copy(pack.bounds_, bounds_h);

pack.contexts_ = contexts_t("contexts", nblocks);
pack.contexts_h_ = Kokkos::create_mirror_view(pack.contexts_);
pack.contexts_ = contexts_t(ViewOfViewAlloc("contexts"), nblocks);
pack.contexts_h_ = Kokkos::create_mirror_view(
Kokkos::view_alloc(Kokkos::SequentialHostInit), pack.contexts_);
pack.max_active_indices_ = max_active_indices_t("max_active_indices", nblocks);
pack.flat_index_map_ = max_active_indices_t("flat_index_map", nblocks + 1);
BuildSupplemental(pmd, desc, pack);
Expand Down
Loading

0 comments on commit 52b0ca0

Please sign in to comment.