Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NATIVECPU] Support reqd_work_group_size on Native CPU #1477

Merged
merged 3 commits into from
May 15, 2024

Conversation

PietroGhg
Copy link
Contributor

@PietroGhg PietroGhg commented Mar 27, 2024

Adds support to the reqd_work_group_size kernel attribute. The metadata handling mechanism is similar to what is done in the CUDA adapter.
DPC++ PR: intel/llvm#13175

@@ -67,6 +78,10 @@ struct ur_kernel_handle_t_ : RefCounted {
}
}

bool hasReqdWGSize() { return HasReqdWGSize; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 new methods hasReqdWGSize and getReqdWGSize could be made const but that could probably be done in a subsequent PR to not hold up this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you

size_t MDElemsSize = MetadataElement.size - sizeof(std::uint64_t);

// Expect between 1 and 3 32-bit integer values.
UR_ASSERT(MDElemsSize >= sizeof(std::uint32_t) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to directly check for the three expected multiples of sizeof(std::uint32_t) in case the meta data format has changed or got corrupted. This would also catch a possible underflow in MetadataElement.size - sizeof(std::uint64_t), or a case when MDElemsSize is not a multiple of sizeof(std::uint32_t) for whatever reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are right, I've done that

if (Tag == __SYCL_UR_PROGRAM_METADATA_TAG_REQD_WORK_GROUP_SIZE) {
native_cpu::ReqdWGSize_t reqdWGSize;
getReqdWGSize(mdNode, reqdWGSize);
hProgram->KernelReqdWorkGroupSizeMD[Prefix] = reqdWGSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The right-hand side could perhaps be std::move(reqdWGSize) which may not make a difference now, but just in case native_cpu::ReqdWGSize_t becomes a moveable type in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you

@kbenzie kbenzie added the native-cpu Native CPU adapter specific issues label Apr 10, 2024
auto [Prefix, Tag] = splitMetadataName(mdName);
if (Tag == __SYCL_UR_PROGRAM_METADATA_TAG_REQD_WORK_GROUP_SIZE) {
native_cpu::ReqdWGSize_t reqdWGSize;
getReqdWGSize(mdNode, reqdWGSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the value returned by getReqdWGSize seems to be missing (unless UR_ASSERT is implemented as throwing an exception)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting it, I've added a check


ur_kernel_handle_t_(const ur_kernel_handle_t_ &other)
: _name(other._name), _subhandler(other._subhandler), _args(other._args),
_localArgInfo(other._localArgInfo), _localMemPool(other._localMemPool),
_localMemPoolSize(other._localMemPoolSize) {
_localMemPoolSize(other._localMemPoolSize),
HasReqdWGSize(other.HasReqdWGSize), ReqdWGSize(other.ReqdWGSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there also be a hProgram(other.hProgram) in the mem initiliser list?

if (res != UR_RESULT_SUCCESS) {
return res;
}
hProgram->KernelReqdWorkGroupSizeMD[Prefix] = std::move(reqdWGSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor (for a later PR): It might be worth diagnosing when an existing prefix gets overwritten with a different reqdWGSize

@PietroGhg PietroGhg added the ready to merge Added to PR's which are ready to merge label May 3, 2024
@kbenzie kbenzie merged commit 50452b7 into oneapi-src:main May 15, 2024
51 checks passed
steffenlarsen pushed a commit to intel/llvm that referenced this pull request May 15, 2024
Adds support for the `reqd_work_group_size` attribute on Native CPU. The
change in the driver is required in order to have the metadata node
available in the runtime.
UR PR: oneapi-src/unified-runtime#1477
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
native-cpu Native CPU adapter specific issues ready to merge Added to PR's which are ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants