Limit vec alignment to 64 bytes #448

gmlueck · 2023-07-25T21:08:23Z

We currently require vec to be aligned according to its total storage size. For example, a vec<uint32_t, 8> must be aligned on a 32 byte boundary (32 = sizeof(uint32_t) * 8). This comes from OpenCL, which has the same requirement. The largest required alignment is for vec<double, 16>, which is typically 128 bytes.

However, this is problematic in SYCL because we want to support a variety of host compilers. The MSVC compiler has a limit of 64 byte alignment for function parameters. Therefore, the SYCL alignment requirements makes it impossible to pass a vec<double, 16> as a function parameter when compiling with MSVC. This is particularly problematic because many of the SYCL "builtin" math function take vec as a parameter type.

We therefore propose weakening the alignment requirement to make it possible to support MSVC. Of course, implementations are still allowed to align vec more highly if this results in more efficient code.

We considered removing the special alignment requirements entirely for vec and leaving this as an implementation detail. However, existing code may depend on this alignment, so we opted for the most conservative change that still allows MSVC to work. Therefore, we propose capping the required alignment to 64 bytes.

We currently require `vec` to be aligned according to its total storage size. For example, a `vec<uint32_t, 8>` must be aligned on a 32 byte boundary (32 = sizeof(uint32_t) * 8). This comes from OpenCL, which has the same requirement. The largest required alignment is for `vec<double, 16>`, which is typically 128 bytes. However, this is problematic in SYCL because we want to support a variety of host compilers. The MSVC compiler has a limit of 64 byte alignment for function parameters. Therefore, the SYCL alignment requirements makes it impossible to pass a `vec<double, 16>` as a function parameter when compiling with MSVC. This is particularly problematic because many of the SYCL "builtin" math function take `vec` as a parameter type. We therefore propose weakening the alignment requirement to make it possible to support MSVC. Of course, implementations are still allowed to align `vec` more highly if this results in more efficient code. We considered removing the special alignment requirements entirely for `vec` and leaving this as an implementation detail. However, existing code may depend on this alignment, so we opted for the most conservative change that still allows MSVC to work. Therefore, we propose capping the required alignment to 64 bytes.

keryell

That is a new interesting case of conflicting requirements between host and device.
It is not clear about what are the consequences for some hardware.
A program failing to run on a pedantic OpenCL device?
I have the feeling that it prevents using compilation flow with different compilers for host and device when they do not agree on this alignment constraint because it would not be possible to inject a compiler pass into MSVC to arbitrary increase the alignment by adding some padding elements in some structs for example.

gmlueck · 2023-08-21T15:44:46Z

It is not clear about what are the consequences for some hardware.
A program failing to run on a pedantic OpenCL device?

A vendor is responsible for providing a complete SYCL implementation, including both compilers and backends. If the vendor uses a pedantic OpenCL implementation that really does require vec to be highly aligned, they have several options. One option is to use a host compiler that provides this same alignment guarantee and implement vec with high alignment. A vendor is certainly allowed to align vec more highly if they want.

Another option is to implement SYCL vec<double, 16> as two OpenCL vectors, each with 8 elements aligned to 64 bytes. This is conformant to the OpenCL specification, so it should work even on a pedantic implementation.

I have the feeling that it prevents using compilation flow with different compilers for host and device when they do not agree on this alignment constraint because it would not be possible to inject a compiler pass into MSVC to arbitrary increase the alignment by adding some padding elements in some structs for example.

I do no think we ever expected that a vendor could use any arbitrary combination of host and device compiler to implement SYCL. A vendor is responsible for choosing (or configuring) the compilers to agree on certain things like the size of fundamental types, byte order, alignment requirements, etc.

when they do not agree on this alignment constraint

Note that the proposed change here does not add a new alignment constraint. In fact, it does the opposite -- it weakens an alignment constraint that already existed in the SYCL spec.

Perhaps your concern is not with the change proposed in this PR, but rather you are concerned with the original wording of the spec that guarantees any alignment of the vec type?

keryell · 2023-08-23T23:31:11Z

I am still confused about how it can work with a pedantic OpenCL implementation.
You have a sycl::vec<double, 16> on the host and you use it with USM à la HMM as a capture by address in a kernel.
The pointer you get has a 0.5 probability to be misaligned. So, what you propose is to have the device compiler to strip-mine any sycl::vec<double, 16> operation by operations on sycl::vec<double, 8>? This could work, even if I admit some ugliness to it. :-)
More annoying is that if you use OpenCL interoperability mode to call an existing OpenCL kernel, then we fail on using sycl::vec<double, 16> which was designed to be compatible with OpenCL... :-(

gmlueck · 2023-08-24T14:16:18Z

More annoying is that if you use OpenCL interoperability mode to call an existing OpenCL kernel, then we fail on using sycl::vec<double, 16> which was designed to be compatible with OpenCL... :-(

This is a good point. I think you're right, it's probably not possible to have a conformant implementation of SYCL on Windows that uses msvc as the host compiler and also uses a pedantic version of OpenCL that requires high alignment of vec<double, 16>.

The change proposed in this PR still gives implementors a choice, though. They can either use msvc and an OpenCL implementation that tolerates vec<double, 16> aligned at 64 bytes, or they can use a different host compiler, or they can use a backend other than OpenCL.

keryell

Thanks for the argumentation.

tomdeakin · 2023-09-07T16:31:28Z

Please add non-normative note justifying this.

psalz · 2023-09-07T16:54:03Z

Please add non-normative note justifying this.

Following up on this week's discussion on whether we should keep a changelog, this would be another good use-case IMO: including a quick note as to why a change was made, if it is not immediately obvious (like for a bug fix).

Add a non-normative note explaining why the guaranteed alignment is limited to 64 bytes.

gmlueck · 2023-09-08T14:37:38Z

Please add non-normative note justifying this.

Done in 8d41edc

tomdeakin · 2023-09-14T15:54:31Z

Ready to merge.

KhronosGroup/SYCL-Docs#448 limited sycl::vec alignment to 64 bytes

Apply sycl::vec 64 bytes limit. KhronosGroup/SYCL-Docs#448 limited sycl::vec alignment to 64 bytes

Limit vec alignment to 64 bytes

Limit vec alignment to 64 bytes (cherry picked from commit 472b3a2)

keryell reviewed Aug 10, 2023

View reviewed changes

AerialMantis approved these changes Aug 23, 2023

View reviewed changes

keryell approved these changes Aug 24, 2023

View reviewed changes

Add non-normative note

8d41edc

Add a non-normative note explaining why the guaranteed alignment is limited to 64 bytes.

gmlueck merged commit 472b3a2 into KhronosGroup:SYCL-2020/master Sep 14, 2023
1 check passed

KornevNikita added a commit to KornevNikita/SYCL-CTS that referenced this pull request Sep 19, 2023

Apply sycl::vec 64 bytes limit

814e538

KhronosGroup/SYCL-Docs#448 limited sycl::vec alignment to 64 bytes

KornevNikita added a commit to KornevNikita/SYCL-CTS that referenced this pull request Sep 19, 2023

Apply sycl::vec 64 bytes limit

b8f7ec4

KhronosGroup/SYCL-Docs#448 limited sycl::vec alignment to 64 bytes

KornevNikita mentioned this pull request Sep 19, 2023

Apply sycl::vec 64 bytes limit KhronosGroup/SYCL-CTS#802

Merged

keryell added a commit to KhronosGroup/SYCL-CTS that referenced this pull request Sep 29, 2023

Merge pull request #802 from KornevNikita/apply-vec-64-limit

8c8d422

Apply sycl::vec 64 bytes limit. KhronosGroup/SYCL-Docs#448 limited sycl::vec alignment to 64 bytes

gmlueck deleted the gmlueck/vec-align branch November 10, 2023 21:37

keryell pushed a commit that referenced this pull request Sep 10, 2024

Merge pull request #448 from gmlueck/gmlueck/vec-align

01cad54

Limit vec alignment to 64 bytes

gmlueck added a commit that referenced this pull request Nov 7, 2024

Merge pull request #448 from gmlueck/gmlueck/vec-align

e13b4fb

Limit vec alignment to 64 bytes (cherry picked from commit 472b3a2)

gmlueck added a commit that referenced this pull request Nov 7, 2024

Merge pull request #448 from gmlueck/gmlueck/vec-align

6386e94

Limit vec alignment to 64 bytes (cherry picked from commit 472b3a2)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit vec alignment to 64 bytes #448

Limit vec alignment to 64 bytes #448

gmlueck commented Jul 25, 2023

keryell left a comment

gmlueck commented Aug 21, 2023

keryell commented Aug 23, 2023

gmlueck commented Aug 24, 2023

keryell left a comment

tomdeakin commented Sep 7, 2023

psalz commented Sep 7, 2023

gmlueck commented Sep 8, 2023

tomdeakin commented Sep 14, 2023

Limit vec alignment to 64 bytes #448

Limit vec alignment to 64 bytes #448

Conversation

gmlueck commented Jul 25, 2023

keryell left a comment

Choose a reason for hiding this comment

gmlueck commented Aug 21, 2023

keryell commented Aug 23, 2023

gmlueck commented Aug 24, 2023

keryell left a comment

Choose a reason for hiding this comment

tomdeakin commented Sep 7, 2023

psalz commented Sep 7, 2023

gmlueck commented Sep 8, 2023

tomdeakin commented Sep 14, 2023