Clarification regarding Alignment decoration #332

Open
Snektron opened this issue Mar 26, 2023 · 5 comments

@Snektron
Contributor

According to the SPIR-V specification, the Alignment decoration is used to assert that a pointer has a "known minimum alignment". It is unclear to me what exactly this entails: judging from the wording, it seems to be used only to aid optimization and to have no impact on the semantic correctness of a module. Sometimes, however, a variable is required to have a particular alignment that differs from the one intrinsic to its type. Consider the following example:

alignas(64) int a;

Clang/LLVM generates the following module (some instructions removed for brevity, see full output here).

               OpDecorate %a Alignment 64
       %uint = OpTypeInt 32 0
     %uint_0 = OpConstant %uint 0
%_ptr_CrossWorkgroup_uint = OpTypePointer CrossWorkgroup %uint
       %void = OpTypeVoid
          %a = OpVariable %_ptr_CrossWorkgroup_uint CrossWorkgroup %uint_0

  1. Should the Alignment (and AlignmentId) decorations impose this alignment requirement on an OpVariable instruction when applied to one, or is the above module actually incorrect?
  2. If the module is incorrect under the current wording, having a way to actually do this would help with code generation for some types of constants, such as unions. I believe that this would also be required to implement the full specification of C++ for OpenCL, for example.

@dneto0
Contributor

dneto0 commented Apr 3, 2023

I think:

  1. Yes, the module snippet shown looks right to me.
  2. n/a

Generally "variable X has known minimum alignment of K" would mean that the byte address of the variable X must be divisible by K. Since the system is in control of allocating space for these OpVariables, and hence of their addresses, the decoration actually constrains that allocation.

If, on the other hand, you bitcast an integer to a pointer to a type with an alignment requirement, then I think, strictly speaking, it's undefined behaviour if that original integer isn't divisible by the alignment requirement.

Some CPUs are forgiving and only run slower if the alignment requirement is violated, e.g. x86 is like this.
Some CPUs are unforgiving. For example, a SPARC processor will trap with a bus error.
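
The divisibility reading can be sketched in host-side C++. This is only an illustration of the arithmetic, not something the SPIR-V specification prescribes; the helper names is_aligned and a_is_64_byte_aligned are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the byte address of p is divisible by k.
// k must be a nonzero power of two, as alignments are.
inline bool is_aligned(const void* p, std::uintptr_t k) {
    assert(k != 0 && (k & (k - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % k == 0;
}

// The implementation allocates `a`, so alignas constrains that allocation.
alignas(64) int a;

// Expected to be true on implementations that support this over-alignment.
inline bool a_is_64_byte_aligned() {
    return is_aligned(&a, 64);
}
```

Because the implementation chooses the address of a, the check on it is expected to hold, whereas an arbitrary integer converted to a pointer carries no such guarantee.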

@Snektron
Contributor Author

Snektron commented Apr 4, 2023

To clarify my use case a bit (for the WG), I want to implement code generation for unions. My idea was to approach this the same way unions are represented in LLVM IR: generate different structs (one for each union field) and use pointer casts to access the different fields. The problem with this approach is that if you initialize a global union variable with an active field of some alignment, and later want to write a different active field with a larger alignment to that union, then the original OpVariable must have been created with sufficient alignment. (Of course, one can work around this using bitcasts and whatnot, but the case can still be massaged in a way where that is not possible.) To give a concrete example:

union A {
    struct {
        short s;
        unsigned i;
    };
    char c;
};

global A a = {.c = 45};

The alignment of A is 4 (because of the unsigned int). If we lower this to SPIR-V as follows, then the final OpVariable does not necessarily have the right alignment:

%short = OpTypeInt 16 1
 %uint = OpTypeInt 32 0
 %char = OpTypeInt 32 0
   %45 = OpConstant %char 45
    %7 = OpConstant %int 7 

%A_padding = OpTypeArray %char %7
      %A_c = OpTypeStruct %char %A_padding
  %ptr_A_c = OpTypePointer CrossWorkgroup %A_c 

%a_padding = OpUndef %A_padding
   %a_init = OpConstantComposite %A_c %45 %a_padding
        %a = OpVariable %ptr_A_c CrossWorkgroup %a_init

but now there is nothing on %a that sets the right alignment, and it cannot be safely written to with the other struct. So the question was: Does OpDecorate %a Alignment 4 fix that?

On a side note, with this exact case clang does not generate any alignment instructions, and just kinda assumes that the OpVariable already has the right alignment when writing a different active union field to it: https://godbolt.org/z/x98vT6xo5.
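
The alignment claim for A can be checked on the host with a few static assertions. This is a sketch under assumptions: it names the inner struct (Fields, a name introduced here) because anonymous structs are a compiler extension in C++, it mirrors the lowered type with the intended 8-bit char, and the concrete numbers hold for common ABIs where unsigned is 4 bytes, not for every target.

```cpp
// Host-side mirror of the union from the example.
struct Fields {   // hypothetical name; the original struct is anonymous
    short s;
    unsigned i;
};

union A {
    Fields f;
    char c;
};

// Mirror of the lowered SPIR-V type %A_c: one char plus 7 bytes of padding.
struct A_c {
    char c;
    char padding[7];
};

// The union takes the strictest alignment among its members...
static_assert(alignof(A) == alignof(Fields), "union is as aligned as its strictest member");
// ...while the char-only view needs less (true on common ABIs).
static_assert(alignof(A_c) < alignof(A), "the lowered struct is less aligned than the union");
// Both views cover the same number of bytes on common ABIs.
static_assert(sizeof(A_c) == sizeof(A), "views cover the same storage size");
```

If %a is allocated with only the alignment of the char view, a later 4-byte-aligned store through the other view is not guaranteed to be valid, which is the gap the question is about.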

@johnkslang
Member

I think the intention was to give hints about externally provided addresses. That's why the wording is like that:

Alignment

Apply only to a pointer. Alignment is an unsigned 32-bit integer declaring a known minimum alignment the pointer has.

Instead of saying "to tell the compiler where to allocate a variable".

Side point: I assume it's just a bug in the example %char is a 32-bit int, yes?

Could you not instead declare the variable with the biggest size needed?

If we need this feature, I recommend we either add an "align allocation to" feature, or generalize the existing feature to discuss both uses.

@Snektron
Contributor Author

> I think the intention was to give hints about externally provided addresses. That's why the wording is like that:

Yes, that was my impression also, hence this issue.

> Side point: I assume it's just a bug in the example %char is a 32-bit int, yes?

Yes, that's supposed to be 8 bits.

> Could you not instead declare the variable with the biggest size needed?

That would not completely solve the problem:

  • It would be impossible to initialize the variable using OpConstant instructions without a lot of hassle, because there would essentially need to be a way to initialize that variable with a struct of a different type. For primitives, a bitcast can be used here, but that does not work with structs. One solution would be to convert each field of the struct manually with OpSpecConstantOp, using bitcasts, bit shifts, and bit masks. The other solution is what clang does, which is to emit an initialization kernel that uses a pointer cast and a memcpy at runtime, see this godbolt.
  • The example in my original comment, with alignas, would still not be expressible in SPIR-V.
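
The manual conversion mentioned in the first bullet amounts to packing field values into wider integer words. Below is a rough C++ sketch of that arithmetic, assuming a little-endian byte layout; the helper names place_byte and extract_byte are invented for illustration, and a real code generator would have to emit the equivalent operations in SPIR-V rather than run this on the host.

```cpp
#include <cstdint>

// Hypothetical helper: return `word` with byte number `index` (0 = least
// significant) replaced by `value`. Assumes index is in 0..3.
constexpr std::uint32_t place_byte(std::uint32_t word, unsigned index, std::uint8_t value) {
    return (word & ~(std::uint32_t{0xFF} << (8u * index)))
         | (std::uint32_t{value} << (8u * index));
}

// Hypothetical helper: read byte number `index` back out of `word`.
constexpr std::uint8_t extract_byte(std::uint32_t word, unsigned index) {
    return static_cast<std::uint8_t>((word >> (8u * index)) & 0xFFu);
}

// The initializer {.c = 45} seen as the first 32-bit word of the union,
// with the remaining bytes chosen as zero for this sketch.
constexpr std::uint32_t first_word = place_byte(0u, 0u, 45);

static_assert(first_word == 45u, "byte 0 is the least significant byte");
static_assert(extract_byte(first_word, 0u) == 45, "value survives the round trip");
```

Doing this for every field of every initializer is the hassle the bullet refers to.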

@Snektron
Contributor Author

Snektron commented Apr 16, 2023

To give a concrete case where this goes wrong:

               OpCapability Kernel
               OpCapability Addresses
               OpCapability Int8
               OpCapability Int16
               OpCapability GenericPointer
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %k "test" %a %b

               OpDecorate %b Alignment 4

       %void = OpTypeVoid
         %u8 = OpTypeInt 8 0
        %u16 = OpTypeInt 16 0
        %u32 = OpTypeInt 32 0

          %1 = OpConstant %u32 1
          %8 = OpConstant %u32 8

          %A = OpTypeArray %u8 %8
        %p_A = OpTypePointer Workgroup %A

          %B = OpTypeStruct %u32 %u32
        %p_B = OpTypePointer Workgroup %B
          %c = OpConstantComposite %B %1 %8

       %p_u8 = OpTypePointer Workgroup %u8
      %pp_u8 = OpTypePointer CrossWorkgroup %p_u8

          %a = OpVariable %p_u8 Workgroup
          %b = OpVariable %p_A Workgroup

          %K = OpTypeFunction %void %pp_u8

          %k = OpFunction %void None %K
      %out_0 = OpFunctionParameter %pp_u8
        %lbl = OpLabel

               ; Pointer-cast %b to %B and write something to it
         %bb = OpBitcast %p_B %b
               OpStore %bb %c Aligned 4

               ; Write address of %a to out[0]
               OpStore %out_0 %a

               ; Write address of %b to out[1]
          %x = OpBitcast %p_u8 %b
      %out_1 = OpInBoundsPtrAccessChain %pp_u8 %out_0 %1
               OpStore %out_1 %x

               OpReturn
               OpFunctionEnd

This SPIR-V is equivalent to the following kernel:

using A = uint8_t[8];

struct B {
    uint32_t a, b;
};

__local uint8_t a;
__local A b;

void test(__local uint8_t* global* out) {
    *(__local B*)b = {1, 8};

    out[0] = &a;
    out[1] = (__local uint8_t*) &b;
}

Some findings:

  • rusticl (with llvmpipe) allocates a at 0x9 and b at 0xA.
  • Intel's CPU OpenCL runtime (which uses SPIRV-LLVM-Translator) allocates b with alignment 8.
    • This runtime seems to honor the OpDecorate attribute up to alignment of 256 bytes.
    • The alignment is influenced by the OpStore to bb: This causes b to be aligned to 8 bytes. Removing this and the OpDecorate causes alignment of 1 byte.
    • The Aligned attribute on the OpStore does not affect alignment of b.
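
One way to read reported addresses like these is to compute the largest power of two that divides each one. The following C++ sketch does that; the function names are invented here, and it assumes the reported values are byte addresses within local memory.

```cpp
#include <cstdint>

// Largest power of two dividing a nonzero address (its lowest set bit).
constexpr std::uint64_t address_alignment(std::uint64_t addr) {
    return addr & (~addr + 1);
}

// True when addr meets the requested alignment (a nonzero power of two).
constexpr bool satisfies(std::uint64_t addr, std::uint64_t alignment) {
    return addr % alignment == 0;
}

// Addresses reported for rusticl with llvmpipe.
constexpr std::uint64_t addr_a = 0x9;
constexpr std::uint64_t addr_b = 0xA;

static_assert(address_alignment(addr_a) == 1, "a is byte aligned");
static_assert(address_alignment(addr_b) == 2, "b is only 2-byte aligned");
static_assert(!satisfies(addr_b, 4), "b does not meet an alignment of 4");
```

By this reading, b at 0xA does not meet the decorated alignment of 4 in the rusticl run.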

I created a repo with the experiment here: https://github.com/Snektron/spirv-alignment-test
