[SYCL][Matrix] Add joint matrix query for CUDA and HIP backends #12075
Conversation
```cpp
  else
    return false;
}
```
I'd use only one instance of `((sM == 32 && sN == 32 && sK == 8) || (sM == 16 && sN == 16 && sK == 16))`, ANDed with the ORed `std::is_same_v`s.
I've just tried it this way, but the code now looks quite unreadable due to the one extra OR in that case. It takes the shape of the above conditions ORed with the extra case for `double`:

```cpp
if ((((sM == 32 && sN == 32 && sK == 8) ||
      (sM == 16 && sN == 16 && sK == 16)) &&
     (std::is_same_v<Ta, half> && std::is_same_v<Tc, float>) ||
     (std::is_same_v<Ta, int8_t> && std::is_same_v<Tc, int32_t>) ||
     (std::is_same_v<Ta, bfloat16> && std::is_same_v<Tc, float>)) ||
    ((sM == 16 && sN == 16 && sK == 4) &&
     (std::is_same_v<Ta, double> && std::is_same_v<Tc, double>)))
```

By the way, this is already after applying clang-format. I think for the sake of readability this should be left as is.
```cpp
     ((sM == 32 && sN == 32 && sK == 8) ||
      (sM == 16 && sN == 16 && sK == 16))) ||
    (std::is_same_v<Ta, unsigned short> && std::is_same_v<Tc, float> &&
     ((sM == 32 && sN == 32 && sK == 8) ||
```
`unsigned short` is not a supported input type. It seems `bfloat16` is missing here; `bfloat16` is used in the joint_matrix_hip_gfx90a.cpp test.
done
```cpp
  else
    return false;
}
```
You may return the statement directly, without the if/else.
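For illustration, a minimal sketch of that pattern; the function and its name here are hypothetical:

```cpp
#include <cstddef>

// Hypothetical predicate: return the boolean expression directly
// instead of branching on it with if/else.
constexpr bool is_nonzero_shape(std::size_t m, std::size_t n, std::size_t k) {
  // Instead of:
  //   if (m != 0 && n != 0 && k != 0)
  //     return true;
  //   else
  //     return false;
  return m != 0 && n != 0 && k != 0;
}

static_assert(is_nonzero_shape(16, 16, 16));
```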
```cpp
            !std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
            !std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
            std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td>)>::type> {
```
I'd replace `std::enable_if<..>::type` with `std::enable_if_t<..>`. I'd also try to avoid the `static_assert` below by bringing the required logic into the `enable_if` above.
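For illustration, a minimal sketch of the `std::enable_if_t` pattern; the `is_allowed_v` trait and `query` function are hypothetical stand-ins for the real validity checks:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trait standing in for the real type-combination check.
template <typename Ta, typename Tc>
constexpr bool is_allowed_v =
    std::is_same_v<Ta, signed char> && std::is_same_v<Tc, int>;

// std::enable_if_t in a default template argument: invalid combinations
// are rejected by SFINAE, so no static_assert is needed in the body.
template <typename Ta, typename Tc,
          typename = std::enable_if_t<is_allowed_v<Ta, Tc>>>
constexpr bool query(std::size_t m, std::size_t n, std::size_t k) {
  return m != 0 && n != 0 && k != 0;
}

static_assert(query<signed char, int>(16, 16, 16));
// query<float, int>(16, 16, 16); // would not compile: removed by SFINAE
```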
Thanks for pointing that out! I switched to using `std::enable_if_t`. Is there a reason for avoiding the `static_assert` here? I think it gives a more informative error message, giving the user more context as to why such a combination could be wrong.
"Invalid types for AMD gfx90a, supported types are half, float, " | ||
"int8_t, int32_t, double and bf16 (Note that unsigned short" | ||
"should be used in the DPC++ code to implement bf16) "); | ||
|
`bfloat16` is used in DPC++ code, for instance in the joint_matrix_hip_gfx90a.cpp test.
Changed to `bfloat16`.
```cpp
template <typename Ta, typename Tc>
constexpr bool is_combination_valid_amd_gfx90a(size_t sM, size_t sN,
                                               size_t sK) {
```
Not sure why `sM`, `sN` and `sK` are used to represent dimensions. I appreciate you followed them for consistency, though.
```cpp
            !std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
            !std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
            std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td> && sM != 0 &&
            sN != 0 && sK != 0)>::type> {
```
Another instance where you could use `std::enable_if_t`, and also improve it so there is no need for the `static_assert` below.
sycl/source/detail/device_info.hpp
Outdated
```cpp
@@ -718,6 +722,8 @@ struct get_device_info_impl<
  get(const DeviceImplPtr &Dev) {
    using namespace ext::oneapi::experimental::matrix;
    using namespace ext::oneapi::experimental;
    using oneapi_exp_arch = sycl::ext::oneapi::experimental::architecture;
```
This doesn't seem to be used anywhere.
Ah, this is actually used in line 814. The macro `NVIDIA_AMD_ARCHES` defined a few lines above needs it:

```cpp
auto GetArchNum = [](const architecture &arch) {
  NVIDIA_AMD_ARCHES(CMP_NVIDIA_AMD_ARCH);
  ...
```
I removed that line after incorporating the newest changes
```cpp
// RUN: %{run} %t.out
//
// This tests the joint matrix runtime query for the cuda backend.
// This test must be compiled with -Xsycl-target-backend --cuda-gpu-arch=sm_xx,
```
I don't think this statement is actually true. I think that if you compile with default sm_50 the test will pass, even if you run it on e.g. sm_80.
I removed the statement
Please add nvidia to the name of the test.
I see that other cuda tests use "_tensorcores" suffix as well. I think we should keep the name as-is. No need to add "nvidia".
sycl/source/detail/device_info.hpp
Outdated
```cpp
      throw sycl::exception(
          make_error_code(errc::runtime),
          "The current device architecture is not supported by "
          "sycl_ext_oneapi_device_architecture.");
```
Please avoid this duplication. The matrix_combinations query, which is part of one extension, should not re-implement anything from a separate extension; it should re-use the other extension instead.
I removed the error message completely and left only `throw;`, since this part of the lambda will never be executed. The matching arch number will always be found for a given `DeviceArch`; otherwise the error would be thrown earlier, while querying for the `DeviceArch`. Let me know if that looks plausible.
```cpp
template <typename Ta, typename Tc, typename Td>
constexpr bool is_combination_valid_cuda_sm70(size_t sM, size_t sN, size_t sK) {
  return (((std::is_same_v<Ta, half> && std::is_same_v<Tc, float> &&
```
nit: I think it would be better to just call `are_types_valid_cuda_sm70` here instead of repeating the logic.
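For illustration, a minimal sketch of the suggested reuse; the body of `are_types_valid_cuda_sm70` and the shape list below are illustrative stand-ins, not the actual supported sm_70 combinations:

```cpp
#include <cstddef>
#include <type_traits>

// Illustrative stand-in for the existing type check.
template <typename Ta, typename Tc, typename Td>
constexpr bool are_types_valid_cuda_sm70() {
  return std::is_same_v<Tc, Td> && std::is_same_v<Tc, float>;
}

// Call the type check instead of repeating its logic inline.
template <typename Ta, typename Tc, typename Td>
constexpr bool is_combination_valid_cuda_sm70(std::size_t sM, std::size_t sN,
                                              std::size_t sK) {
  return are_types_valid_cuda_sm70<Ta, Tc, Td>() &&
         ((sM == 16 && sN == 16 && sK == 16) ||
          (sM == 8 && sN == 32 && sK == 16) ||
          (sM == 32 && sN == 8 && sK == 16));
}
```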
Yep, that could definitely make use of `are_types_valid`. Changed it now.
```cpp
@@ -0,0 +1,118 @@
// REQUIRES: cuda
// RUN: %{build} -Xsycl-target-backend --cuda-gpu-arch=sm_70 -o %t.out
```
nit, this arch flag isn't necessary for this test, you can use the default which means it will work on all supported devices.
```diff
- // RUN: %{build} -Xsycl-target-backend --cuda-gpu-arch=sm_70 -o %t.out
+ // RUN: %{build} -o %t.out
```
(note also see the below related suggested change)
done
```cpp
    std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
              std::back_inserter(expected_combinations));
  }
```
```cpp
else {
  return 0;
}
```
CUDA part LGTM. Could you write a short description, which acts as a commit message?
Pinging @intel/llvm-reviewers-runtime, is this good to go?
Parens can help to clarify grouping, but at this level they actually make things harder to read. Apart from that, things look sane
Thanks for the review, changes were applied.
@intel/llvm-reviewers-runtime can we get a review for this, please?
sycl/source/detail/device_info.hpp
Outdated
```cpp
    if (Item.second == arch)
      return Item.first;
  }
  throw;
```
What are we throwing here? It's not immediately obvious in this wall of similar patterns.
I changed it to throw `sycl::exception` with an appropriate error message.
sycl/source/detail/device_info.hpp
Outdated
```cpp
    std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
              std::back_inserter(sm_80_combinations));
    std::move(sm_72_combinations.begin(), sm_72_combinations.end(),
              std::back_inserter(sm_80_combinations));
```
If we were using C++20, I would have requested relying on `constexpr` creation of the vectors instead.
done
That's not what I meant, sorry for the confusion. What I was thinking about is that maybe we can avoid `std::move` at runtime altogether in C++20/C++23, and even then I wasn't sure. Do you know how `std::back_inserter` of a `constexpr` vector would behave? I think I'd prefer the `constexpr` to be dropped for now, as it might be unclear to the average reader what happens here.
Yes, I've just realized this is not the way to go. We don't yet have `constexpr std::vector` in C++, but I removed `std::move` and used `vec.insert()` instead.
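For illustration, a minimal sketch of that change, using a placeholder element type:

```cpp
#include <vector>

int main() {
  std::vector<int> sm_70_combinations = {1, 2, 3}; // placeholder elements
  std::vector<int> sm_80_combinations = {4, 5};

  // Instead of:
  //   std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
  //             std::back_inserter(sm_80_combinations));
  // append the whole range in one call:
  sm_80_combinations.insert(sm_80_combinations.end(),
                            sm_70_combinations.begin(),
                            sm_70_combinations.end());
}
```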
I applied all the changes and all checks are passing. Does the last solution with `vec.insert()` sound okay to you, @aelovikov-intel? And if so, could I get an approval on this, please?
```cpp
@@ -0,0 +1,33 @@
// REQUIRES: cuda
```
Can you update `CODEOWNERS` for the new cuda/matrix and hip/matrix directories?
I added `intel/llvm-reviewers-cuda` as the owner of those directories.
.github/CODEOWNERS
Outdated
```
sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
sycl/test/check_device_code/cuda/matrix @intel/llvm-reviewers-cuda
```
I think we can drop this; I wasn't aware its parent is already covered here.
.github/CODEOWNERS
Outdated
```
sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
sycl/test/check_device_code/cuda/matrix @intel/llvm-reviewers-cuda
sycl/test/check_device_code/hip/matrix @intel/llvm-reviewers-cuda
```
We should probably limit this to the parent `hip` directory.
Friendly ping @intel/llvm-gatekeepers, this is ready to be merged now.
This PR adds the joint matrix query for the CUDA and HIP backends, as described in sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc.