Update Kokkos version to 4.4.1 #1191
Definitely in favor (plus the bump to 4.4.1).
Just wanted to note that updating to Kokkos 4.4.x causes finalization of
@pgrete It looks like that didn't fix it. Is there other view-of-views usage elsewhere in the code?
Potentially. But the fix didn't work because I passed the view_alloc info to the device view (which triggered a static assert failure).
I guess all our packs are broken (so the corresponding unit tests fail correctly).
Ugh, this gets uglier by the minute... I'm removing the "trivial" from the PR and adding the
I wonder if there should be some sort of "announcement" for downstream devs that if they invoke views of views, they need to revisit things...
src/bvals/comms/bnd_info.hpp
Outdated
@@ -127,11 +128,11 @@ struct ProResInfo {
 int GetBufferSize(MeshBlock *pmb, const NeighborBlock &nb,
                   std::shared_ptr<Variable<Real>> v);

-using BndInfoArr_t = ParArray1D<BndInfo>;
+using BndInfoArr_t = Kokkos::View<BndInfo *, LayoutWrapper, DevMemSpace>;
No objection... just for my own understanding, can you explain the necessity of this change?
We need to pass the `Kokkos::SequentialHostInit` property, which is done by not passing a string label but a properties object constructed via `Kokkos::view_alloc(Kokkos::SequentialHostInit, label)`.
This works for `View`s, but `ParArray#D`s are not plain `View`s any more but carry a `state`.
Thus, in order to pass the properties object, we'd have to mirror that interface for `ParArray#D`, too.
Given that there's no place in the code (that I could find) where we use the `state` for the outer view in a view of views, I decided to just use plain `View`s again (rather than extending the interface).
In order to not leak (more) "`Kokkos`" pieces into the codebase, I'd be happy to introduce `RawParArray#D`s that directly map to views if that gets a 👍 from more people.
src/bvals/comms/bvals_utils.hpp
Outdated
-cache.bnd_info_h = Kokkos::create_mirror_view(cache.bnd_info);
+cache.bnd_info = BndInfoArr_t(ViewOfViewAlloc("bnd_info"), nbound);
+cache.bnd_info_h = Kokkos::create_mirror_view(
+    Kokkos::view_alloc(Kokkos::SequentialHostInit), cache.bnd_info);
I wonder why `kokkos` can't automatically handle this for us.
Should we have our own type/calls that hide these kokkos calls for views of views to minimize boilerplate?
> I wonder why kokkos can't automatically handle this for us.
Probably because the general answer to views of views is: No :D
> Should we have our own type/calls that hide these kokkos calls for views of views to minimize boilerplate?
Yes, I can add that. How about `ViewOfViewMirror(view)` to mirror the `ViewOfViewAlloc`?
I haven't tested downstream and it looks like some tests are failing but otherwise LGTM. (I can't approve since it was my branch originally ;) )
// on the host. If the ViewOfViews is on the device, then `SequentialHostInit` should be
// passed when calling `create_mirror_view`.
template <typename T = DevMemSpace>
auto ViewOfViewAlloc(const std::string &label) {
👍
The failing tests are due to the increased build size, see discussion on Matrix.
But if you have to (which is the case in some places inside Parthenon)
then follow this pattern:
Why is this preferred over just making the inner `View`s unmanaged in a `View` of `View`s? My understanding is that the unmanaged view doesn't call the destructors, so doing this would also solve the problem since I don't think we ever make a view of views that actually manages the inner views (maybe this is wrong?). I think it would be trivial to make an `UnmanagedParArray#D` by just changing the view type it is templated on. Then we wouldn't have to have raw Kokkos views floating around in the code.
Honestly, I don't know, and I'm afraid of potential unknown side effects.
What happens to the reference counter of the existing managed view when we assign it to an unmanaged element of the outer view?
Similarly for the boundary info and prolong/restrict info outer views, which contain objects that implicitly contain `ParArray#D`s: if they're not reference counted, we'd have to destruct those manually, wouldn't we (or how else would the inner objects know that they don't need to exist anymore)?
From the Kokkos docs:
> “Unmanaged” means that Kokkos does not do reference counting or automatic deallocation for those Views.
I think this implies that the unmanaged view doesn't increment the reference count. So it is possible that the other view pointing to the same memory deallocates the memory while the unmanaged view still points to it, but the unmanaged view will never try to call the destructor.
I believe that in `BndInfo` and prolong/restrict info we are never creating a `ParArray#D` that is only held by those objects. Rather, we are essentially "pointing" to `ParArray#D`s that are held elsewhere (e.g. the communication buffer, variable data). I think the other places holding those objects persist longer than the packs, `BndInfo`, etc., so it is safe to have the internal Views in views of views unmanaged.
I think my real concern is that it is unclear (at least to me) what `SequentialHostInit` is doing.
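The reference-counting argument in this comment can be illustrated with a small analogy in plain C++. This is only a sketch and does not use Kokkos: `std::shared_ptr` stands in for a managed view, a raw pointer stands in for an unmanaged view, and `RefCounts` and `CompareHandles` are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <memory>

// Analogy only (no Kokkos involved): std::shared_ptr plays the role of a
// managed view, a raw pointer the role of an unmanaged view.
struct RefCounts {
  long before;                // owner alone
  long with_managed_copy;     // while a second managed handle exists
  long with_unmanaged_alias;  // while only a raw alias exists
};

// Hypothetical helper: records the owner's reference count in three situations.
RefCounts CompareHandles() {
  auto owner = std::make_shared<double>(1.0);
  RefCounts r{};
  r.before = owner.use_count();
  {
    // A managed copy takes part in reference counting.
    std::shared_ptr<double> managed_copy = owner;
    r.with_managed_copy = owner.use_count();
  }  // managed_copy is destroyed here and the count drops again
  // A raw alias points at the same memory but never touches the count,
  // and it never triggers deallocation when it goes out of scope.
  double *unmanaged_alias = owner.get();
  r.with_unmanaged_alias = owner.use_count();
  (void)unmanaged_alias;
  return r;
}
```

In this analogy the raw alias is only safe as long as `owner` outlives it, which corresponds to the assumption above that the communication buffers and variable data outlive the packs and `BndInfo`.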
> From the Kokkos docs:
> > “Unmanaged” means that Kokkos does not do reference counting or automatic deallocation for those Views.
> I think this implies that the unmanaged view doesn't increment the reference count. So it is possible that the other view pointing to the same memory deallocates the memory while the unmanaged view still points to it, but the unmanaged view will never try to call the destructor.
> I believe that in `BndInfo` and prolong/restrict info we are never creating a `ParArray#D` that is only held by those objects. Rather, we are essentially "pointing" to `ParArray#D`s that are held elsewhere (e.g. the communication buffer, variable data). I think the other places holding those objects persist longer than the packs, `BndInfo`, etc., so it is safe to have the internal Views in views of views unmanaged.
> I think my real concern is that it is unclear (at least to me) what `SequentialHostInit` is doing.
The PR that added this feature to Kokkos is just a few lines (kokkos/kokkos#7229 excluding the test code).
AFAIK the property effectively tells Kokkos to explicitly call the destructor (line 229 https://github.com/kokkos/kokkos/pull/7229/files#diff-0d719ff2418eb0512b065dca1765d2788ddd4a524734f8f96b7af34a22f4b560) which results in a clean deallocation of the inner view.
Otherwise, the default dtor of the outer view would call the dtor of the inner views in a parallel region, which is illegal.
But maybe I'm also wrong.
But if that's correct, then I somehow feel more comfortable with this way (for the moment) as I understand that concept whereas it's unclear to me if there are any side effects (not necessarily from a Kokkos point of view but from the Parthenon usage point of view) in using unmanaged inner views.
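As a rough mental model of the behaviour described here, the following plain C++ sketch (no Kokkos) contrasts releasing an outer buffer without running the element destructors against destroying each element in a serial host loop first. `InnerHandle` and `LiveAfterRelease` are hypothetical stand-ins, and the sketch assumes the description above is right that the property boils down to explicit, sequential destructor calls on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for an inner view: it bumps a counter while alive,
// so handles that were never destructed remain visible in the counter.
struct InnerHandle {
  int *live;
  explicit InnerHandle(int *counter) : live(counter) { ++(*live); }
  ~InnerHandle() { --(*live); }
};

// Builds an "outer view" of n inner handles in raw storage, releases it,
// and returns how many inner handles were never destructed.
int LiveAfterRelease(std::size_t n, bool run_destructors) {
  int live = 0;
  void *raw = ::operator new(n * sizeof(InnerHandle));
  InnerHandle *elems = static_cast<InnerHandle *>(raw);
  for (std::size_t i = 0; i < n; ++i) {
    new (elems + i) InnerHandle(&live);  // construct elements one by one
  }
  if (run_destructors) {
    // Serial loop on the host: each element's destructor is called
    // explicitly before the storage is released.
    for (std::size_t i = 0; i < n; ++i) {
      elems[i].~InnerHandle();
    }
  }
  ::operator delete(raw);  // releases the storage only, runs no destructors
  return live;
}
```

With `run_destructors == false` the counter stays at `n`, which mirrors the leak reported in this PR; with `true` it returns to zero.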
I think what I didn't understand was that it seemed like I could only set `SequentialHostInit` for host space views of device space views and it seemed like we actually needed this for device space views of device space views. But it is very possible that I just don't understand what is going on.
Just throwing some input here -- this PR was necessary for Parthenon to work on Venado with multiple GPUs per node after the recent DST. Old Parthenon builds and downstream codes (with
Updating to this PR avoids the issue.
Alright, I made the changes as discussed last week, i.e.,
Given the extent of the changes, it might be good if someone could have a final look before merging.
@brryan @bprather is this working for your downstreams? @pdmullen @lroberts36 @AstroBarker we should check for phoebus and riot. (Though I don't expect any blocks.)
The HIP builds are failing due to a compiler bug:
But I don't think it makes sense to build for gfx1030, since that's not a datacenter GPU... Would it make sense to change the CI build to
Or is this a compiler issue?
Ah, didn't realize the CI machine had a gfx1030 card. Yes, definitely a compiler issue.
This works for me downstream in Artemis on Venado GPUs, at least with 2 ranks on one interactive node. The ultimate test is 8 ranks on 2 nodes but the machine seems busy right now, so from my perspective it's fine if this gets merged now.
Bumping rocm to 6.2 didn't fix the issue.
PR Summary
Bumps Kokkos to 4.4.1.
This revealed some memory leaks in our views of views (because we never deallocate/destruct the inner views).
This PR fixes this by using the `SequentialHostInit` property that was introduced in 4.4.1 for the outer view allocation.
With this property the destructors of the inner views are called plainly on the host (rather than inside a parallel region, which is illegal in the Kokkos programming model).
Moreover, this required different logic as our views of views are on device (so we need to take special care when to pass `SequentialHostInit` because it cannot be passed to a device outer view).
Thus we only pass `SequentialHostInit` in the outer view allocation when compiling for a host execution space (because mirror views are no-ops then) and only pass `SequentialHostInit` for the outer view host mirror when compiling for device.
Given that our `ParArray#D` interface did not allow passing arbitrary allocation properties in the ctor, I decided to switch the outer views to plain `Kokkos::View`s (without `state` info) rather than changing the ctor interface of the `ParArray#D` to also accept allocation properties (to keep the interface small/clean).
It's probably worth double checking whether any of the outer views need a `state` (but I don't think so because the code compiles fine).
For additional details/discussion, see also #1205 and #1193.
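The rule for when to pass `SequentialHostInit` can be written down as a compile-time decision. The sketch below is a simplified stand-in in plain C++ rather than the actual Parthenon helpers: `HostSpaceTag`, `DeviceSpaceTag`, `AllocProps`, `OuterAllocProps` and `HostMirrorProps` are all hypothetical names, and real code would build a `Kokkos::view_alloc(...)` properties object instead of the toy struct.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for memory spaces; real code would use Kokkos types.
struct HostSpaceTag {};
struct DeviceSpaceTag {};

// Hypothetical stand-in for an allocation-properties object.
struct AllocProps {
  std::string label;
  bool sequential_host_init;
};

// Outer view allocation: request sequential host init only when the outer
// view itself lives in host memory (it cannot be passed to a device view).
template <typename DevMemSpace>
AllocProps OuterAllocProps(const std::string &label) {
  return {label, std::is_same<DevMemSpace, HostSpaceTag>::value};
}

// Host mirror of the outer view: in a device build the mirror is a separate
// host allocation and carries the property; in a host build the mirror is a
// no-op alias of the outer view, so nothing extra is passed.
template <typename DevMemSpace>
AllocProps HostMirrorProps(const std::string &label) {
  return {label, !std::is_same<DevMemSpace, HostSpaceTag>::value};
}
```

Either way exactly one of the two allocations carries the property, which matches the either/or wording of the summary above.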
PR Checklist