Reduce the Vulkan interop buffer kernel workgroup size to 256 #1828
The shader used in the Vulkan interop tests declares `local_size_x = 512`, which fails on older hardware because it exceeds the maximum barrier size limit. This change reduces it to 256, which is still a multiple of 32 and 64 and is below the hardware limit. These are not performance tests, so any performance penalty is acceptable.
Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
LGTM.
This change is fine and more portable than it was previously, but if I'm reading the Vulkan 1.3 spec right, the required minimum value for `maxComputeWorkGroupInvocations` is 128. Would it be most portable to reduce the work-group size to 128, rather than 256?
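To make the trade-off concrete, here is a minimal C++ sketch of how a work-group size could be chosen against a device limit. It is illustrative only and not code from the test suite: `pick_local_size_x` and `is_always_supported` are hypothetical helpers, the shader is assumed to be one-dimensional (`local_size_y = local_size_z = 1`), and `device_limit` is assumed to have been read from `VkPhysicalDeviceLimits::maxComputeWorkGroupInvocations` by the caller.

```cpp
#include <algorithm>
#include <cstdint>

// Minimum value of maxComputeWorkGroupInvocations that the Vulkan spec
// requires every implementation to support (as cited in the comment above).
constexpr std::uint32_t kGuaranteedInvocations = 128;

// Hypothetical helper: returns the largest power-of-two work-group size that
// does not exceed either `requested` or `device_limit`.
// Assumes both arguments are at least 1 and the work-group is 1D.
std::uint32_t pick_local_size_x(std::uint32_t requested,
                                std::uint32_t device_limit)
{
    const std::uint32_t cap = std::min(requested, device_limit);
    std::uint32_t size = 1;
    // Double while the next power of two still fits under the cap.
    while (size <= cap / 2) size *= 2;
    return size;
}

// Hypothetical helper: true when a hard-coded size is within the guaranteed
// minimum, so no device query is needed before using it.
constexpr bool is_always_supported(std::uint32_t local_size_x)
{
    return local_size_x <= kGuaranteedInvocations;
}
```

Under these assumptions, a device reporting a limit of 256 would turn a request for 512 into 256, while a hard-coded 128 passes `is_always_supported` and needs no query. Hard-coding the size in the shader, as this PR does, avoids the query entirely at the cost of picking a conservative value.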
Merging as discussed on Memory subgroup call of November 14, 2023.
Vulkan guarantees 128 is always supported. Relates to KhronosGroup#1828 Signed-off-by: Kevin Petit <kevin.petit@arm.com>
This fixes issue #1818