[subgroups][non_uniform_broadcast] Fix broadcasting index generation #1680
The subgroup size is not guaranteed to be greater than `NR_OF_ACTIVE_WORK_ITEMS`. The broadcasting index needs to be reduced in that case. Otherwise, if subgroup size == `NR_OF_ACTIVE_WORK_ITEMS` == 4, we hit a divide-by-zero error when evaluating `bcast_index % (n - NR_OF_ACTIVE_WORK_ITEMS)`.
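To illustrate the failure mode, here is a minimal host-side C sketch. The helper name `pick_inactive_index`, its `-1` return convention, and the offset added to the result are assumptions made for illustration; they are not taken from the CTS sources. Only the expression `bcast_index % (n - NR_OF_ACTIVE_WORK_ITEMS)` and the value 4 come from the description above.

```c
#include <assert.h>

#define NR_OF_ACTIVE_WORK_ITEMS 4

/* Hypothetical helper: maps a random index onto the work items of a
 * subgroup of size n that lie beyond the first NR_OF_ACTIVE_WORK_ITEMS.
 * Returns -1 when there are no such items, instead of evaluating a
 * modulo with a zero (or negative) divisor. */
int pick_inactive_index(int bcast_index, int n)
{
    assert(bcast_index >= 0 && n > 0);
    int inactive = n - NR_OF_ACTIVE_WORK_ITEMS;
    if (inactive <= 0)
    {
        /* n <= NR_OF_ACTIVE_WORK_ITEMS: "bcast_index % inactive" would
         * divide by zero for n == 4, which is undefined behavior in C. */
        return -1;
    }
    return NR_OF_ACTIVE_WORK_ITEMS + bcast_index % inactive;
}
```

With a subgroup size of 4 the guard triggers and no modulo is evaluated; with a subgroup size of 8 the result always falls in the range 4..7.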
gentle ping :-)
@StuartDBrady Can you please take a look?
ping @svenvh @StuartDBrady
// last workgroup last subgroup
if (last_subgroup_size && j == nj - 1
    && last_subgroup_size < NR_OF_ACTIVE_WORK_ITEMS)
{
    bcast_if = bcast_index % last_subgroup_size;
    bcast_elseif = bcast_if;
}
// reduce broadcasting index in case subgroup size <=
// NR_OF_ACTIVE_WORK_ITEMS (i.e. all items are active)
This highlights a problem with the broadcast tests: they currently are not meant to test subgroup sizes <= `NR_OF_ACTIVE_WORK_ITEMS` (i.e., 4). Your proposed fix causes the test to skip an important aspect of non-uniform subgroup operations, as the `else` in `sub_group_non_uniform_broadcast_source` will not be executed. That means the subgroup operation will not be tested properly when the subgroup size is <= 4, so I don't think we should commit this.
Instead, we should probably try to get rid of `NR_OF_ACTIVE_WORK_ITEMS` and use work-item masks to introduce divergence in the broadcast tests (as done for e.g. `sub_group_non_uniform_any`).
Thanks for reviewing. I agree. Will update the PR.
…eration" This reverts commit 9bbab53.
Dynamically activate half of the work items in the current subgroup instead of hardcoding as `NR_OF_ACTIVE_WORK_ITEMS`.
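As a rough host-side illustration of the "activate half" idea, the C sketch below derives both broadcast indices from the subgroup size instead of a fixed constant. The struct `bcast_split`, the function `split_half`, and the choice of the lower half as the active half are hypothetical; the actual kernel and `gen()` code in the PR may differ.

```c
#include <assert.h>

/* Hypothetical result type: how many work items take the "if" branch
 * and which source index each branch broadcasts from. */
typedef struct
{
    int active_count; /* work items taking the "if" branch */
    int if_index;     /* source index inside the active half, or -1 */
    int else_index;   /* source index inside the remaining items */
} bcast_split;

/* Splits a subgroup of n work items in half and reduces the random
 * bcast_index into each half. For n >= 1 the remaining part always
 * holds at least one item, so its modulo never divides by zero. */
bcast_split split_half(int bcast_index, int n)
{
    assert(bcast_index >= 0 && n > 0);
    bcast_split s;
    s.active_count = n / 2;
    int rest = n - s.active_count; /* >= 1 whenever n >= 1 */
    /* With n == 1 no work item is active, so no "if" index exists. */
    s.if_index = s.active_count > 0 ? bcast_index % s.active_count : -1;
    s.else_index = s.active_count + bcast_index % rest;
    return s;
}
```

For a subgroup size of 4, the case that triggered the original divide-by-zero, this gives two active and two remaining work items, and both indices stay inside their own half.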
@Nuullll, would you be able to use the same mechanism as used for |
I did try to use the OpenCL-CTS/test_conformance/subgroups/subhelpers.h Lines 1614 to 1633 in c73d6a3
Of course, we can do the following (pseudo code):
__kernel void test(..., uint4 work_item_mask_vector) {
    ...
    if (elect_work_item & work_item_mask)
        out[gid] = sub_group_non_uniform_broadcast(x, get_index_of_one_bit(work_item_mask_vector));
    else
        out[gid] = sub_group_non_uniform_broadcast(x, get_index_of_one_bit(~work_item_mask_vector));
}
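To make the pseudo code more concrete, here is a host-side C sketch of what a `get_index_of_one_bit`-style helper could compute. It simplifies the `uint4` mask to a single 32-bit word, and the names `index_of_one_bit` and `bcast_source_for` are hypothetical. Clamping the inverted mask to the subgroup size is also an assumption: without it, `~mask` could select a bit beyond the last work item.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the lowest set bit in a 32-bit
 * work-item mask, or -1 if no bit is set. */
int index_of_one_bit(uint32_t mask)
{
    for (int i = 0; i < 32; ++i)
    {
        if (mask & (UINT32_C(1) << i)) return i;
    }
    return -1;
}

/* Broadcast source index seen by work item local_id: work items whose
 * mask bit is set broadcast from a set bit, the others from a clear
 * bit. Both masks are restricted to the first sub_group_size bits. */
int bcast_source_for(uint32_t mask, int sub_group_size, int local_id)
{
    assert(sub_group_size > 0 && sub_group_size <= 32);
    assert(local_id >= 0 && local_id < sub_group_size);
    uint32_t valid = sub_group_size == 32
        ? UINT32_MAX
        : (UINT32_C(1) << sub_group_size) - 1;
    uint32_t active = mask & valid;
    uint32_t inactive = ~mask & valid;
    int is_active = (int)((active >> local_id) & 1u);
    return index_of_one_bit(is_active ? active : inactive);
}
```

Each branch then broadcasts from a work item that actually executes that branch, which is what a non-uniform broadcast needs in order to be meaningful.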
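In my opinion, the current approach (hardcoding the divergence condition both in kernel code and gen() implementation, and taking the broadcast index from the random input) is much simpler. Or I could have missed something, could you please give some advice?
any comments?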
In my opinion, the current approach (hardcoding the divergence condition both in kernel code and gen() implementation, and taking the broadcast index from the random input) is much simpler. Or I could have missed something, could you please give some advice?
As @StuartDBrady pointed out, we don't seem to test `sub_group_non_uniform_broadcast` with all work-items active, which is an existing gap in test coverage. It would be nice to address that, but if you'd rather only remove the hardcoded split of 4 in this PR, that should be fine too I suppose.
friendly ping.
Merging as discussed in the March 12th teleconference.
…hronosGroup#1680) (KhronosGroup#58) cherry-pick KhronosGroup@a045f76. CMPLRLLVM-60752. Co-authored-by: Yilong Guo <yilong.guo@intel.com>
Dynamically activate half of the work items in the current subgroup instead of hardcoding the number of active work items.
Otherwise, if subgroup size == `NR_OF_ACTIVE_WORK_ITEMS` == 4, then we will encounter "divide-by-zero" error when evaluating `bcast_index % (n - NR_OF_ACTIVE_WORK_ITEMS)`.
Signed-off-by: Yilong Guo <yilong.guo@intel.com>