Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HIP] Implement coarse-grained memory advice for the HIP adapter #1249

Merged

Conversation

GeorgeWeb
Copy link
Contributor

@GeorgeWeb GeorgeWeb commented Jan 15, 2024

Implements coarse grained memory access via a memory advice flag for HIP platforms on AMD hardware.

Related intel/llvm changes and testing: intel/llvm#12394

@GeorgeWeb GeorgeWeb force-pushed the georgi/hip_memadvise_coarse_grained branch from c817d04 to 52959b8 Compare January 15, 2024 14:11
@GeorgeWeb GeorgeWeb force-pushed the georgi/hip_memadvise_coarse_grained branch from 52959b8 to 7fdaed2 Compare January 15, 2024 14:11
@codecov-commenter
Copy link

codecov-commenter commented Jan 15, 2024

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (79c28d0) 15.80% compared to head (2a9ded6) 15.51%.
Report is 41 commits behind head on main.

Files Patch % Lines
include/ur_print.hpp 11.11% 16 Missing ⚠️
test/conformance/enqueue/urEnqueueUSMAdvise.cpp 0.00% 5 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1249      +/-   ##
==========================================
- Coverage   15.80%   15.51%   -0.29%     
==========================================
  Files         223      238      +15     
  Lines       31481    33793    +2312     
  Branches     3558     3743     +185     
==========================================
+ Hits         4975     5243     +268     
- Misses      26455    28500    +2045     
+ Partials       51       50       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@GeorgeWeb GeorgeWeb requested a review from a team as a code owner January 17, 2024 11:12
Copy link
Contributor

@fabiomestre fabiomestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to spec and tests look fine to me.
Just left a minor comment about the HIP adapter changes.

scripts/core/usm.yml Outdated Show resolved Hide resolved
@@ -92,30 +92,46 @@ ur_result_t setHipMemAdvise(const void *DevPtr, const size_t Size,
if (URAdviceFlags &
(UR_USM_ADVICE_FLAG_SET_NON_ATOMIC_MOSTLY |
UR_USM_ADVICE_FLAG_CLEAR_NON_ATOMIC_MOSTLY |
UR_USM_ADVICE_FLAG_BIAS_CACHED | UR_USM_ADVICE_FLAG_BIAS_UNCACHED)) {
UR_USM_ADVICE_FLAG_BIAS_CACHED | UR_USM_ADVICE_FLAG_BIAS_UNCACHED
#if !defined(__HIP_PLATFORM_AMD__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth it to add ifdef conditions for this? I thought that we don't really support HIP for Nvidia devices.

Copy link
Contributor Author

@GeorgeWeb GeorgeWeb Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actively support it but until we have completely dropped it officially there are a bunch of other entry points in the hip adapter that maintain codepaths for hip-nv platforms. Either way, I do not disagree to removing it and simplifying this function a bit this way.

…ory advise hint

Co-authored-by: Fábio <fabio.m.mestre@gmail.com>
@GeorgeWeb GeorgeWeb force-pushed the georgi/hip_memadvise_coarse_grained branch from 742ea0d to 2a9ded6 Compare January 24, 2024 11:44
@GeorgeWeb
Copy link
Contributor Author

@oneapi-src/unified-runtime-hip-write Can I get some eyes on this one?
I think it's in a good state but would certainly like to hear @ldrumm's opinion.

Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are functional, but I'm not happy about those loops. Will be good to go once that's sorted. Sorry for the late review

/// @cond
UR_USM_ADVICE_FLAG_FORCE_UINT32 = 0x7fffffff
/// @endcond

} ur_usm_advice_flag_t;
/// @brief Bit Mask for validating ur_usm_advice_flags_t
#define UR_USM_ADVICE_FLAGS_MASK 0xffff8000
#define UR_USM_ADVICE_FLAGS_MASK 0xfffe0000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now has fewer bits set. Is this right? If so that's a somewhat unusual definition of "mask"

Copy link
Contributor Author

@GeorgeWeb GeorgeWeb Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also generated from the source generation scripts and initially threw me off too, but I think I got it now though. It is actually the inverse mask that UR uses for validation.

@@ -6441,6 +6447,26 @@ inline ur_result_t printFlag<ur_usm_advice_flag_t>(std::ostream &os, uint32_t fl
}
os << UR_USM_ADVICE_FLAG_CLEAR_PREFERRED_LOCATION_HOST;
}

if ((val & UR_USM_ADVICE_FLAG_SET_NON_COHERENT_MEMORY) == (uint32_t)UR_USM_ADVICE_FLAG_SET_NON_COHERENT_MEMORY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're just copying from the surrounding style here, but if you're doing a bitwise AND of a single bit flag you only need to check against zero. Thus, the equality comparison and uint32_t cast is superfluous and just serves to make the line longer. Not going to block this patch, but it could do with cleaning up later, maybe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that could be brought up to UR because I think it was the generate cmake target that produced the code.

source/adapters/hip/enqueue.cpp Show resolved Hide resolved
source/adapters/hip/enqueue.cpp Show resolved Hide resolved
source/adapters/hip/enqueue.cpp Show resolved Hide resolved
@GeorgeWeb GeorgeWeb added the ready to merge Added to PR's which are ready to merge label Jan 26, 2024
@kbenzie kbenzie merged commit cd97e17 into oneapi-src:main Feb 2, 2024
51 checks passed
sommerlukas pushed a commit to intel/llvm that referenced this pull request Feb 5, 2024
Enables and tests coarse grained memory access via the memadvise
implementation for HIP platforms on AMD hardware.

See related UR changes for the adapter implementation:
oneapi-src/unified-runtime#1249

---------

Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Co-authored-by: aarongreig <aarongreig01@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Added to PR's which are ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants