Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC][ABI-BREAK] Remove bfloat16 math aspect. #13351

Merged
merged 10 commits into from
Jul 3, 2024

Conversation

JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Apr 10, 2024

This aspect is not required because bfloat16 math functions are implemented for all devices via generic implementations. This PR updates this status inline with the main bfloat16 extension/doc.

This aspect is completely unused in the latest repo and it's
documentation leads to confusion. bfloat16 math functions are supported
for all devices.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk requested a review from a team as a code owner April 10, 2024 15:03
Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to the doc look good.

Will you remove the aspect from the headers in a separate PR?

@JackAKirk
Copy link
Contributor Author

Changes to the doc look good.

Will you remove the aspect from the headers in a separate PR?

I grep'd for it and as far as I can tell it has already been completely removed? Also in Unified-Runtime.

@gmlueck
Copy link
Contributor

gmlueck commented Apr 10, 2024

Oh, I think there was a typo with the name of the aspect in the doc. If you grep for the correct name:

$ ff ext_oneapi_bfloat16_math_functions
./sycl/device_aspect_macros.hpp:#ifndef __SYCL_ALL_DEVICES_HAVE_ext_oneapi_bfloat16_math_functions__
./sycl/device_aspect_macros.hpp:// __SYCL_ASPECT(ext_oneapi_bfloat16_math_functions, 35)
./sycl/device_aspect_macros.hpp:#define __SYCL_ALL_DEVICES_HAVE_ext_oneapi_bfloat16_math_functions__ 0
./sycl/device_aspect_macros.hpp:#ifndef __SYCL_ANY_DEVICE_HAS_ext_oneapi_bfloat16_math_functions__
./sycl/device_aspect_macros.hpp:// __SYCL_ASPECT(ext_oneapi_bfloat16_math_functions, 35)
./sycl/device_aspect_macros.hpp:#define __SYCL_ANY_DEVICE_HAS_ext_oneapi_bfloat16_math_functions__ 0
./sycl/info/aspects.def:__SYCL_ASPECT(ext_oneapi_bfloat16_math_functions, 35)
./sycl/info/device_traits.def:__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_bfloat16_math_functions, bool,

@JackAKirk
Copy link
Contributor Author

Oh, I think there was a typo with the name of the aspect in the doc. If you grep for the correct name:

$ ff ext_oneapi_bfloat16_math_functions
./sycl/device_aspect_macros.hpp:#ifndef __SYCL_ALL_DEVICES_HAVE_ext_oneapi_bfloat16_math_functions__
./sycl/device_aspect_macros.hpp:// __SYCL_ASPECT(ext_oneapi_bfloat16_math_functions, 35)
./sycl/device_aspect_macros.hpp:#define __SYCL_ALL_DEVICES_HAVE_ext_oneapi_bfloat16_math_functions__ 0
./sycl/device_aspect_macros.hpp:#ifndef __SYCL_ANY_DEVICE_HAS_ext_oneapi_bfloat16_math_functions__
./sycl/device_aspect_macros.hpp:// __SYCL_ASPECT(ext_oneapi_bfloat16_math_functions, 35)
./sycl/device_aspect_macros.hpp:#define __SYCL_ANY_DEVICE_HAS_ext_oneapi_bfloat16_math_functions__ 0
./sycl/info/aspects.def:__SYCL_ASPECT(ext_oneapi_bfloat16_math_functions, 35)
./sycl/info/device_traits.def:__SYCL_PARAM_TRAITS_SPEC(device, ext_oneapi_bfloat16_math_functions, bool,

I see thanks. OK I'll go through and remove all these.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk requested review from a team as code owners April 11, 2024 13:38
@JackAKirk JackAKirk changed the title [DOC] Remove reference to bfloat16 math aspect. [DOC][ABI-BREAK][BFLOAT16] Remove reference to bfloat16 math aspect. Apr 11, 2024
@JackAKirk JackAKirk added the abi-break change that's breaking abi and waiting for the next window to be able to merge label Apr 11, 2024
Remove mistaken change.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm-spirv changes should be done in the upstream translator PR, even though I don't think that they are necessary. Perhaps an even better change would be to drop that metadata from those tests completely, because it has nothing to do with SPIR-V translation.

llvm changes LGTM

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Apr 12, 2024

@gmlueck removing this aspect check reveals that CI is failing for intel 11th gen cpu with:

__devicelib_ConvertBF16ToFINTEL [passing parameter 1 with incompatible type]
# | Linking failed
# |  -17 (PI_ERROR_LINK_PROGRAM_FAILURE)

This would seem to be because this backend is choosing the path of some native bf16 specializations irrespective of arch generation. I presume that for AMX cpus this would pass, but I guess this should also be verified.

I see that the aspect check is now made via UR_DEVICE_INFO_BFLOAT16 in unified runtime. L0 is also marked as not supporting this device aspect: https://github.com/oneapi-src/unified-runtime/blob/1473ed8a81ca86e67321dfc90c79e951ead212ac/source/adapters/level_zero/device.cpp#L792 . It looks like we will have to keep this aspect for now.

@JackAKirk JackAKirk marked this pull request as draft April 12, 2024 10:45
@JackAKirk
Copy link
Contributor Author

@gmlueck removing this aspect check reveals that CI is failing for intel 11th gen cpu with:

__devicelib_ConvertBF16ToFINTEL [passing parameter 1 with incompatible type]
# | Linking failed
# |  -17 (PI_ERROR_LINK_PROGRAM_FAILURE)

This would seem to be because this backend is choosing the path of some native bf16 specializations irrespective of arch generation. I presume that for AMX cpus this would pass, but I guess this should also be verified.

I see that the aspect check is now made via UR_DEVICE_INFO_BFLOAT16 in unified runtime. L0 is also marked as not supporting this device aspect: https://github.com/oneapi-src/unified-runtime/blob/1473ed8a81ca86e67321dfc90c79e951ead212ac/source/adapters/level_zero/device.cpp#L792 . It looks like we will have to keep this aspect for now.

Actually the problem is not specific to the math builtins test. It is the same issue that has cpu marked xfail on the bfloat16 type/conversions tests. So it probably does make sense to remove the aspect and then mark the bfloat16 math test xfail on the cpu backend to match the status of the rest of the bfloat16 functions.

@gmlueck
Copy link
Contributor

gmlueck commented Apr 12, 2024

Actually the problem is not specific to the math builtins test. It is the same issue that has cpu marked xfail on the bfloat16 type/conversions tests. So it probably does make sense to remove the aspect and then mark the bfloat16 math test xfail on the cpu backend to match the status of the rest of the bfloat16 functions.

I thought we had logic to emulate the BF16 operations when they are not supported natively on the device. @rdeodhar might remember more.

@rdeodhar
Copy link
Contributor

Actually the problem is not specific to the math builtins test. It is the same issue that has cpu marked xfail on the bfloat16 type/conversions tests. So it probably does make sense to remove the aspect and then mark the bfloat16 math test xfail on the cpu backend to match the status of the rest of the bfloat16 functions.

I thought we had logic to emulate the BF16 operations when they are not supported natively on the device. @rdeodhar might remember more.

When bfloat16 support was first implemented there were "bfloat16 native" and "bfloat16 emulation" libraries. When JITing, a runtime aspect check was made and depending on bfloat16 availability, one or the other library was linked in. In AOT mode, where a specific device was specified, the knowledge of which devices supported bfloat16 was known to the clang Driver and it linked in the proper library. The implementation has gone through some change and I don't know for sure how it is handled now.

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Apr 15, 2024

Actually the problem is not specific to the math builtins test. It is the same issue that has cpu marked xfail on the bfloat16 type/conversions tests. So it probably does make sense to remove the aspect and then mark the bfloat16 math test xfail on the cpu backend to match the status of the rest of the bfloat16 functions.

I thought we had logic to emulate the BF16 operations when they are not supported natively on the device. @rdeodhar might remember more.

When bfloat16 support was first implemented there were "bfloat16 native" and "bfloat16 emulation" libraries. When JITing, a runtime aspect check was made and depending on bfloat16 availability, one or the other library was linked in. In AOT mode, where a specific device was specified, the knowledge of which devices supported bfloat16 was known to the clang Driver and it linked in the proper library. The implementation has gone through some change and I don't know for sure how it is handled now.

Yeah it seems that the switch to opaque pointers #9828 meant that the fallback bfloat16 conversions are not working/being called correctly.

So it will make sense to remove the aspect since there is nothing special about the math function failures on some intel cpus because it is due to the same opaque pointer bfloat16 conversion failure.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk marked this pull request as ready for review July 2, 2024 12:16
@JackAKirk
Copy link
Contributor Author

@KseniyaTikhomirova It would be great to get a review/merge this asap, since the abi break window is now open.

Thanks

@JackAKirk JackAKirk changed the title [DOC][ABI-BREAK][BFLOAT16] Remove reference to bfloat16 math aspect. [DOC][ABI-BREAK] Remove bfloat16 math aspect. Jul 2, 2024
Copy link
Contributor

@KseniyaTikhomirova KseniyaTikhomirova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JackAKirk
Copy link
Contributor Author

@intel/llvm-gatekeepers please merge.

windows failure is a unrelated CI issue seen in other PRs

@sarnex
Copy link
Contributor

sarnex commented Jul 2, 2024

The windows failure preventing the tests from running at all, I'd rather they run, so I'm restarting the CI.

@JackAKirk
Copy link
Contributor Author

This is up to date with sycl branch. Please do not cancel if there is no longer an issue in sycl branch as per the latest communication.

@ldrumm ldrumm merged commit f958dce into intel:sycl Jul 3, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abi-break change that's breaking abi and waiting for the next window to be able to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants