Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit vec alignment to 64 bytes #448

Merged
merged 2 commits into from
Sep 14, 2023

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Jul 25, 2023

We currently require vec to be aligned according to its total storage size. For example, a vec<uint32_t, 8> must be aligned on a 32 byte boundary (32 = sizeof(uint32_t) * 8). This comes from OpenCL, which has the same requirement. The largest required alignment is for vec<double, 16>, which is typically 128 bytes.

However, this is problematic in SYCL because we want to support a variety of host compilers. The MSVC compiler has a limit of 64 byte alignment for function parameters. Therefore, the SYCL alignment requirements makes it impossible to pass a vec<double, 16> as a function parameter when compiling with MSVC. This is particularly problematic because many of the SYCL "builtin" math function take vec as a parameter type.

We therefore propose weakening the alignment requirement to make it possible to support MSVC. Of course, implementations are still allowed to align vec more highly if this results in more efficient code.

We considered removing the special alignment requirements entirely for vec and leaving this as an implementation detail. However, existing code may depend on this alignment, so we opted for the most conservative change that still allows MSVC to work. Therefore, we propose capping the required alignment to 64 bytes.

We currently require `vec` to be aligned according to its total storage
size.  For example, a `vec<uint32_t, 8>` must be aligned on a 32 byte
boundary (32 = sizeof(uint32_t) * 8).  This comes from OpenCL, which
has the same requirement.  The largest required alignment is for
`vec<double, 16>`, which is typically 128 bytes.

However, this is problematic in SYCL because we want to support a
variety of host compilers.  The MSVC compiler has a limit of 64 byte
alignment for function parameters.  Therefore, the SYCL alignment
requirements makes it impossible to pass a `vec<double, 16>` as a
function parameter when compiling with MSVC.  This is particularly
problematic because many of the SYCL "builtin" math function take
`vec` as a parameter type.

We therefore propose weakening the alignment requirement to make it
possible to support MSVC.  Of course, implementations are still allowed
to align `vec` more highly if this results in more efficient code.

We considered removing the special alignment requirements entirely for
`vec` and leaving this as an implementation detail.  However, existing
code may depend on this alignment, so we opted for the most
conservative change that still allows MSVC to work.  Therefore, we
propose capping the required alignment to 64 bytes.
Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a new interesting case of conflicting requirements between host and device.
It is not clear about what are the consequences for some hardware.
A program failing to run on a pedantic OpenCL device?
I have the feeling that it prevents using compilation flow with different compilers for host and device when they do not agree on this alignment constraint because it would not be possible to inject a compiler pass into MSVC to arbitrary increase the alignment by adding some padding elements in some structs for example.

@gmlueck
Copy link
Contributor Author

gmlueck commented Aug 21, 2023

It is not clear about what are the consequences for some hardware.
A program failing to run on a pedantic OpenCL device?

A vendor is responsible for providing a complete SYCL implementation, including both compilers and backends. If the vendor uses a pedantic OpenCL implementation that really does require vec to be highly aligned, they have several options. One option is to use a host compiler that provides this same alignment guarantee and implement vec with high alignment. A vendor is certainly allowed to align vec more highly if they want.

Another option is to implement SYCL vec<double, 16> as two OpenCL vectors, each with 8 elements aligned to 64 bytes. This is conformant to the OpenCL specification, so it should work even on a pedantic implementation.

I have the feeling that it prevents using compilation flow with different compilers for host and device when they do not agree on this alignment constraint because it would not be possible to inject a compiler pass into MSVC to arbitrary increase the alignment by adding some padding elements in some structs for example.

I do no think we ever expected that a vendor could use any arbitrary combination of host and device compiler to implement SYCL. A vendor is responsible for choosing (or configuring) the compilers to agree on certain things like the size of fundamental types, byte order, alignment requirements, etc.

when they do not agree on this alignment constraint

Note that the proposed change here does not add a new alignment constraint. In fact, it does the opposite -- it weakens an alignment constraint that already existed in the SYCL spec.

Perhaps your concern is not with the change proposed in this PR, but rather you are concerned with the original wording of the spec that guarantees any alignment of the vec type?

@keryell
Copy link
Member

keryell commented Aug 23, 2023

I am still confused about how it can work with a pedantic OpenCL implementation.
You have a sycl::vec<double, 16> on the host and you use it with USM à la HMM as a capture by address in a kernel.
The pointer you get has a 0.5 probability to be misaligned. So, what you propose is to have the device compiler to strip-mine any sycl::vec<double, 16> operation by operations on sycl::vec<double, 8>? This could work, even if I admit some ugliness to it. :-)
More annoying is that if you use OpenCL interoperability mode to call an existing OpenCL kernel, then we fail on using sycl::vec<double, 16> which was designed to be compatible with OpenCL... :-(

@gmlueck
Copy link
Contributor Author

gmlueck commented Aug 24, 2023

More annoying is that if you use OpenCL interoperability mode to call an existing OpenCL kernel, then we fail on using sycl::vec<double, 16> which was designed to be compatible with OpenCL... :-(

This is a good point. I think you're right, it's probably not possible to have a conformant implementation of SYCL on Windows that uses msvc as the host compiler and also uses a pedantic version of OpenCL that requires high alignment of vec<double, 16>.

The change proposed in this PR still gives implementors a choice, though. They can either use msvc and an OpenCL implementation that tolerates vec<double, 16> aligned at 64 bytes, or they can use a different host compiler, or they can use a backend other than OpenCL.

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the argumentation.

@tomdeakin
Copy link
Contributor

Please add non-normative note justifying this.

@psalz
Copy link
Contributor

psalz commented Sep 7, 2023

Please add non-normative note justifying this.

Following up on this week's discussion on whether we should keep a changelog, this would be another good use-case IMO: including a quick note as to why a change was made, if it is not immediately obvious (like for a bug fix).

Add a non-normative note explaining why the guaranteed alignment is
limited to 64 bytes.
@gmlueck
Copy link
Contributor Author

gmlueck commented Sep 8, 2023

Please add non-normative note justifying this.

Done in 8d41edc

@tomdeakin
Copy link
Contributor

Ready to merge.

@gmlueck gmlueck merged commit 472b3a2 into KhronosGroup:SYCL-2020/master Sep 14, 2023
1 check passed
KornevNikita added a commit to KornevNikita/SYCL-CTS that referenced this pull request Sep 19, 2023
KhronosGroup/SYCL-Docs#448 limited sycl::vec
alignment to 64 bytes
KornevNikita added a commit to KornevNikita/SYCL-CTS that referenced this pull request Sep 19, 2023
KhronosGroup/SYCL-Docs#448 limited sycl::vec
alignment to 64 bytes
keryell added a commit to KhronosGroup/SYCL-CTS that referenced this pull request Sep 29, 2023
Apply sycl::vec 64 bytes limit.
KhronosGroup/SYCL-Docs#448 limited sycl::vec alignment to 64 bytes
@gmlueck gmlueck deleted the gmlueck/vec-align branch November 10, 2023 21:37
keryell pushed a commit that referenced this pull request Sep 10, 2024
Limit vec alignment to 64 bytes
gmlueck added a commit that referenced this pull request Nov 7, 2024
Limit vec alignment to 64 bytes

(cherry picked from commit 472b3a2)
gmlueck added a commit that referenced this pull request Nov 7, 2024
Limit vec alignment to 64 bytes

(cherry picked from commit 472b3a2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants