[subgroups][non_uniform_broadcast] Fix broadcasting index generation #1680

Nuullll · 2023-03-23T06:02:28Z

Dynamically activate half of the work items in the current subgroup instead of hardcoding the number of active work items.

Otherwise, if the subgroup size == `NR_OF_ACTIVE_WORK_ITEMS` == 4, we will encounter a divide-by-zero error when evaluating `bcast_index % (n - NR_OF_ACTIVE_WORK_ITEMS)`.

Signed-off-by: Yilong Guo <yilong.guo@intel.com>

The subgroup size is not necessarily greater than `NR_OF_ACTIVE_WORK_ITEMS`, and the broadcasting index needs to be reduced in that case. Otherwise, if the subgroup size == `NR_OF_ACTIVE_WORK_ITEMS` == 4, we will encounter a divide-by-zero error when evaluating `bcast_index % (n - NR_OF_ACTIVE_WORK_ITEMS)`.
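To make the failure mode concrete, here is a minimal host-side sketch of the index arithmetic. It assumes the `if` branch covers lanes `[0, split)` and the `else` branch covers `[split, n)`, with `split = n / 2` as a stand-in for "half of the work items". The names `BroadcastLanes` and `half_split_lanes` are hypothetical and not taken from the CTS; the actual change in this PR may compute the split differently.

```cpp
#include <cassert>

// Source lanes for the two branches of the divergent broadcast kernel.
struct BroadcastLanes {
    unsigned if_lane;   // lane broadcast to work items in [0, split)
    unsigned else_lane; // lane broadcast to work items in [split, n)
};

// Hypothetical sketch (not the CTS code): split a subgroup of size n >= 1
// at split = n / 2 instead of at a fixed NR_OF_ACTIVE_WORK_ITEMS. With a
// fixed split of 4 and n == 4, the else range [4, 4) is empty and
// `bcast_index % (n - 4)` divides by zero.
BroadcastLanes half_split_lanes(unsigned bcast_index, unsigned n) {
    assert(n >= 1);
    const unsigned split = n / 2;         // size of the if range
    const unsigned else_size = n - split; // >= 1 whenever n >= 1
    BroadcastLanes lanes{};
    // For n == 1 the if range is empty, so if_lane is never used.
    lanes.if_lane = split ? bcast_index % split : 0;
    lanes.else_lane = split + bcast_index % else_size;
    return lanes;
}
```

Because both divisors are derived from `n`, neither range can be empty for `n >= 2`, so the modulo is always well defined.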

Nuullll · 2023-03-27T03:19:06Z

gentle ping :-)

Nuullll · 2023-03-30T04:51:19Z

@StuartDBrady Can you please take a look?

Nuullll · 2023-04-12T06:28:09Z

ping @svenvh @StuartDBrady

svenvh · 2023-04-20T13:37:06Z

test_conformance/subgroups/subgroup_common_templates.h

```diff
                    // last workgroup last subgroup
                    if (last_subgroup_size && j == nj - 1
                        && last_subgroup_size < NR_OF_ACTIVE_WORK_ITEMS)
                    {
                        bcast_if = bcast_index % last_subgroup_size;
                        bcast_elseif = bcast_if;
                    }
+                    // reduce broadcasting index in case subgroup size <=
+                    // NR_OF_ACTIVE_WORK_ITEMS (i.e. all items are active)
```


This highlights a problem with the broadcast tests: they are currently not meant to test subgroup sizes <= `NR_OF_ACTIVE_WORK_ITEMS` (i.e., 4). Your proposed fix causes the test to skip an important aspect of non-uniform subgroup operations, as the `else` branch in `sub_group_non_uniform_broadcast_source` will not be executed. That means the subgroup operation will not be tested properly when the subgroup size is <= 4, so I don't think we should commit this.

Instead, we should probably try to get rid of `NR_OF_ACTIVE_WORK_ITEMS` and use work-item masks to introduce divergence in the broadcast tests (as is done for e.g. `sub_group_non_uniform_any`).
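As a rough illustration of mask-driven divergence, the sketch below models on the host which lanes of a subgroup take the `if` branch for a given 128-bit work-item mask held as four 32-bit words. The word/bit layout (word `lane / 32`, bit `lane % 32`) and the helper names `Mask128`, `lane_selected` and `if_branch_lanes` are assumptions for illustration, not taken from the CTS sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// A 128-bit work-item mask held as four 32-bit words, like a cl_uint4.
using Mask128 = std::array<std::uint32_t, 4>;

// Assumed layout: lane i is selected when bit (i % 32) of word (i / 32) is set.
bool lane_selected(const Mask128 &mask, unsigned lane) {
    assert(lane < 128);
    return ((mask[lane / 32] >> (lane % 32)) & 1u) != 0;
}

// Lanes of a subgroup of size n that would take the `if` branch; the
// remaining lanes take the `else` branch.
std::vector<unsigned> if_branch_lanes(const Mask128 &mask, unsigned n) {
    std::vector<unsigned> lanes;
    for (unsigned lane = 0; lane < n && lane < 128; ++lane) {
        if (lane_selected(mask, lane)) lanes.push_back(lane);
    }
    return lanes;
}
```

With an all-ones mask every lane lands in the `if` branch, which is the all-active case that a fixed split never exercises.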

Nuullll · 2023-04-21

Thanks for reviewing. I agree. Will update the PR.


StuartDBrady · 2023-10-10T18:42:17Z

@Nuullll, would you be able to use the same mechanism as used for `sub_group_non_uniform_any`, instead? Without that, I don't think we test with all items active, for example.

Nuullll · 2023-10-12T06:43:34Z

> @Nuullll, would you be able to use the same mechanism as used for sub_group_non_uniform_any, instead? Without that, I don't think we test with all items active, for example.

@StuartDBrady

I did try to use the `uint4 work_item_mask_vector` parameter to represent the active items. The problem is that we need to provide a corresponding broadcasting index for each mask, as the second argument of the `sub_group_non_uniform_broadcast` call. However, the current `gen()` implementation is independent of the actual mask:

OpenCL-CTS/test_conformance/subgroups/subhelpers.h

Lines 1614 to 1633 in c73d6a3

```cpp
        // Generate the desired input for the kernel
        test_params.subgroup_size = subgroup_size;
        Fns::gen(idata.data(), mapin.data(), sgmap.data(), test_params);

        test_status status;

        if (test_params.divergence_mask_arg != -1)
        {
            for (auto &mask : test_params.all_work_item_masks)
            {
                test_params.work_items_mask = mask;
                cl_uint4 mask_vector = bs128_to_cl_uint4(mask);
                clSetKernelArg(kernel, test_params.divergence_mask_arg,
                               sizeof(cl_uint4), &mask_vector);
                status = executor.run_and_check(test_params);

                if (status == TEST_FAIL) break;
            }
        }
```
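In the loop above, `Fns::gen()` runs once, and only afterwards is each mask converted with `bs128_to_cl_uint4` and passed to the kernel. A minimal sketch of such a conversion follows, assuming the mask is a `std::bitset<128>` (as the `bs128` name suggests) and that bit `i` lands in bit `i % 32` of word `i / 32`. `bitset128_to_words` is a hypothetical name, and the real helper's layout may differ.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for bs128_to_cl_uint4 (layout assumed, not taken
// from the CTS): word w receives bits [32 * w, 32 * w + 31] of the mask.
std::array<std::uint32_t, 4> bitset128_to_words(const std::bitset<128> &mask) {
    std::array<std::uint32_t, 4> words{};
    for (std::size_t i = 0; i < 128; ++i) {
        if (mask[i]) words[i / 32] |= std::uint32_t{1} << (i % 32);
    }
    return words;
}
```

Since the input has already been generated before this conversion happens, any broadcast index baked into that input cannot depend on the mask.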

Of course, we could do the following (pseudocode):

```c
__kernel void test(..., uint4 work_item_mask_vector) {
    ...
    // get_index_of_one_bit: hypothetical helper returning the index of a set bit in the mask
    if (elect_work_item & work_item_mask)
        out[gid] = sub_group_non_uniform_broadcast(x, get_index_of_one_bit(work_item_mask_vector));
    else
        out[gid] = sub_group_non_uniform_broadcast(x, get_index_of_one_bit(~work_item_mask_vector));
}
```

`get_index_of_one_bit` could be any function, as long as it returns the index of a set bit in the dynamic mask value. We would then need to implement the same `get_index_of_one_bit` algorithm in the host-side `chk()` function. I think that is just another form of hardcoding the broadcasting index.
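For illustration, one possible host-side counterpart of the hypothetical `get_index_of_one_bit` is sketched below, under the assumption that it picks the lowest set bit among the first `subgroup_size` lanes. Restricting the search to the subgroup size matters for the `~work_item_mask_vector` case, because inverting the mask also sets bits for lanes beyond the end of the subgroup. The lowest-set-bit rule, the bit layout and the helper names are assumptions; any deterministic rule shared by kernel and host would serve.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask128 = std::array<std::uint32_t, 4>;

// Hypothetical host-side get_index_of_one_bit: returns the lowest lane in
// [0, subgroup_size) whose mask bit is set, or -1 if there is none (the
// corresponding branch would then have no active work items).
int get_index_of_one_bit(const Mask128 &mask, unsigned subgroup_size) {
    assert(subgroup_size <= 128);
    for (unsigned lane = 0; lane < subgroup_size; ++lane) {
        if ((mask[lane / 32] >> (lane % 32)) & 1u) return static_cast<int>(lane);
    }
    return -1;
}

// Bitwise complement, mirroring ~work_item_mask_vector in the kernel.
Mask128 invert(const Mask128 &mask) {
    return {~mask[0], ~mask[1], ~mask[2], ~mask[3]};
}
```

Both the kernel and `chk()` would have to agree on exactly this rule, which is the duplication the comment above objects to.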

In my opinion, the current approach (hardcoding the divergence condition in both the kernel code and the `gen()` implementation, and taking the broadcast index from the random input) is much simpler. Or I may have missed something; could you please give some advice?

Nuullll · 2024-02-07T03:13:57Z

any comments?

svenvh · 2024-02-08

> In my opinion, the current approach (hardcoding the divergence condition both in kernel code and gen() implementation, and taking the broadcast index from the random input) is much simpler. Or I could have missed something, could you please give some advice?

As @StuartDBrady pointed out, we don't seem to test `sub_group_non_uniform_broadcast` with all work-items active, which is an existing gap in test coverage. It would be nice to address that, but if you'd rather only remove the hardcoded split of 4 in this PR, that should be fine too, I suppose.

test_conformance/subgroups/subgroup_common_templates.h

Nuullll · 2024-03-04T09:19:36Z

friendly ping.

bashbaug · 2024-03-12T16:25:03Z

Merging as discussed in the March 12th teleconference.

…hronosGroup#1680) (KhronosGroup#58) cherry-pick KhronosGroup@a045f76. CMPLRLLVM-60752. Co-authored-by: Yilong Guo <yilong.guo@intel.com>

svenvh requested a review from StuartDBrady March 23, 2023 09:07

svenvh requested changes Apr 20, 2023


Nuullll added 3 commits September 27, 2023 15:38

Merge branch 'master' into fix-sub_group_non_uniform_broadcast

082f5dd

Revert "[subgroups][non_uniform_broadcast] Fix broadcasting index generation"

2dcf4af

This reverts commit 9bbab53.

[subgroups][non_uniform_broadcast] Fix broadcasting index generation

cf67385

Dynamically activate half of the work items in the current subgroup instead of hardcoding as `NR_OF_ACTIVE_WORK_ITEMS`.

Nuullll requested a review from svenvh September 27, 2023 09:42

bashbaug added the focused review label Feb 7, 2024

svenvh reviewed Feb 8, 2024


test_conformance/subgroups/subgroup_common_templates.h

test_conformance/subgroups/subgroup_common_templates.h

Nuullll added 2 commits February 28, 2024 11:57

Merge branch 'main' into fix-sub_group_non_uniform_broadcast

050239c

Apply suggestion

a4ff687

Nuullll requested review from svenvh and lakshmih February 28, 2024 04:16

svenvh approved these changes Mar 6, 2024


svenvh mentioned this pull request Mar 6, 2024

Subgroups tests - adjust test non_uniform_broadcast and broadcast_first to use mask for active workitems #1198

Open

lakshmih approved these changes Mar 8, 2024


bashbaug merged commit a045f76 into KhronosGroup:main Mar 12, 2024
7 checks passed
