Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array wrapper may cause a GPU crash? #2308

Closed
js6i opened this issue Apr 12, 2024 · 6 comments
Closed

Array wrapper may cause a GPU crash? #2308

js6i opened this issue Apr 12, 2024 · 6 comments
Labels
help wanted Contribution from external parties is desired here.

Comments

@js6i
Copy link
Contributor

js6i commented Apr 12, 2024

This is something I found while working on KhronosGroup/MoltenVK#2199. Enabling variable sized arrays caused a bunch of vkd3d tests that otherwise worked fine (modulo validation errors due to declaring huge arrays and binding small buffers to them) to crash the GPU (M1). I tried a few things, like using the regular array<> type instead of spvDescriptor* wrappers, but what ended up fixing the crash was making the entry point variables plain pointers.

Below is one of the kernels that crashed (from vkd3d d3d12 test_register_space), with the plain pointer version #if'd inline:

#pragma clang diagnostic ignored "-Wmissing-prototypes"
#pragma clang diagnostic ignored "-Wunused-variable"

#include <metal_stdlib>
#include <simd/simd.h>
#include <metal_atomic>

using namespace metal;

template<typename T>
struct spvDescriptor
{
    T value;
};

template<typename T>
struct spvDescriptorArray
{
    spvDescriptorArray(const device spvDescriptor<T>* ptr) : ptr(ptr)
    {
    }
    const device T& operator [] (size_t i) const
    {
        return ptr[i].value;
    }
    const device spvDescriptor<T>* ptr;
};

struct push_cb_struct
{
    float4 cb0[1];
    float4 cb1[1];
    uint _m2[1];
};

struct spvDescriptorSetBuffer1
{
    spvDescriptor<texture_buffer<uint>> t0 [[id(0)]][1] /* unsized array hack */;
};

struct spvDescriptorSetBuffer3
{
    spvDescriptor<texture_buffer<uint, access::write>> u0 [[id(0)]][1] /* unsized array hack */;
    // Overlapping binding: spvDescriptor<texture_buffer<uint, access::write>> u2 [[id(0)]][1] /* unsized array hack */;
};

struct spvDescriptorSetBuffer6
{
    spvDescriptor<texture_buffer<uint, access::read_write>> u6 [[id(0)]][1] /* unsized array hack */;
};

kernel void main0(const device spvDescriptorSetBuffer1& spvDescriptorSet1 [[buffer(1)]], const device spvDescriptorSetBuffer3& spvDescriptorSet3 [[buffer(3)]], const device spvDescriptorSetBuffer6& spvDescriptorSet6 [[buffer(6)]], constant push_cb_struct& push_cb [[buffer(9)]])
{
#if CRASHES
    spvDescriptorArray<texture_buffer<uint>> t0 {spvDescriptorSet1.t0};
    spvDescriptorArray<texture_buffer<uint, access::write>> u0 {spvDescriptorSet3.u0};
    spvDescriptorArray<texture_buffer<uint, access::write>> u2 {reinterpret_cast<const device spvDescriptor<texture_buffer<uint, access::write>>*>(&spvDescriptorSet3.u0)};
    spvDescriptorArray<texture_buffer<uint, access::read_write>> u6 {spvDescriptorSet6.u6};
#else // WORKS
    const device texture_buffer<uint> *t0 = &(spvDescriptorSet1.t0[0].value);
    const device texture_buffer<uint, access::write> *u0 = &(spvDescriptorSet3.u0[0].value);
    const device texture_buffer<uint, access::write> *u2 = &(reinterpret_cast<const device spvDescriptor<texture_buffer<uint, access::write>>*>(&spvDescriptorSet3.u0)[0].value);
    const device texture_buffer<uint, access::read_write> *u6 = &(spvDescriptorSet6.u6[0].value);
#endif
    float4 r0;
    r0.x = as_type<float>(t0[0u + push_cb._m2[0u]].read(uint((uint(0) >> 2u))).x);
    u0[6u + push_cb._m2[0u]].write(uint4(as_type<uint>(r0.x)), uint((0u >> 2u)));
    r0.x = as_type<float>(t0[1u + push_cb._m2[0u]].read(uint((uint(0) >> 2u))).x);
    u0[7u + push_cb._m2[0u]].write(uint4(as_type<uint>(r0.x)), uint((0u >> 2u)));
    r0.x = as_type<float>(t0[2u + push_cb._m2[0u]].read(uint(0)).x);
    u0[9u + push_cb._m2[0u]].write(uint4(as_type<uint>(r0.x)), uint((0u >> 2u)));
    r0.x = as_type<float>(t0[3u + push_cb._m2[0u]].read(uint(0)).x);
    u0[8u + push_cb._m2[0u]].write(uint4(as_type<uint>(r0.x)), uint((0u >> 2u)));
    r0.x = as_type<float>(t0[5u + push_cb._m2[0u]].read(uint(((uint(0) * 1u) + (uint(0) >> 2u)))).x);
    u2[10u + push_cb._m2[0u]].write(as_type<uint4>(r0.xxxx), uint(0));
    r0.x = as_type<float>(t0[4u + push_cb._m2[0u]].read(uint(((uint(0) * 1u) + (uint(0) >> 2u)))).x);
    u2[11u + push_cb._m2[0u]].write(as_type<uint4>(r0.xxxx), uint(0));
    u0[12u + push_cb._m2[0u]].write(uint4(as_type<uint>(push_cb.cb0[0u].x)), uint(((uint(0) * 1u) + (uint(0) >> 2u))));
    u0[13u + push_cb._m2[0u]].write(uint4(as_type<uint>(push_cb.cb1[0u].x)), uint(((uint(0) * 1u) + (uint(0) >> 2u))));
    uint _175 = u6[12u + push_cb._m2[0u]].atomic_fetch_add(0u, 1).x;
    r0.x = as_type<float>(_175);
    uint _181 = u6[13u + push_cb._m2[0u]].atomic_fetch_sub(0u, 1).x;
    r0.x = as_type<float>(_181 - 1u);
}

The crash is vkQueueSubmit MTLCommandBuffer on Queue 3-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault

@HansKristian-Work
Copy link
Contributor

This is impossible for me to debug. I don't have any Apple hardware.

@HansKristian-Work HansKristian-Work added the help wanted Contribution from external parties is desired here. label Apr 15, 2024
@Try
Copy link
Contributor

Try commented Apr 22, 2024

Hi, @js6i !

Unless there were major changes, since I've worked on spvDescriptorArray, it should not be used in context of emulation-layers such as MoltenVK. The goal of this class is to map runtime-sized array from GLSL to Metal3 argument buffer.
Code bellow makes no sense then:

struct spvDescriptorSetBuffer1
{
    spvDescriptor<texture_buffer<uint>> t0 [[id(0)]][1] /* unsized array hack */;
};

This is a mix of T1-argument-buffer(spvDescriptorSetBuffer1) and T2 via spvDescriptor.

AFAIR what you need to to set following setting of compiler:

msl_options.argument_buffers      = true; // emulate descriptor sets via ABuffer
msl_options.argument_buffers_tier = Options::ArgumentBuffersTier::Tier1; // disallow Tier2

@js6i
Copy link
Contributor Author

js6i commented Apr 23, 2024

@Try thanks for chiming in. Are you saying that msl_options.argument_buffers are inherently in conflict with the Tier2 setting? MoltenVK needs variable sized arrays supported as well - is there a reason to require disabling msl_options.argument_buffers, or maybe we should just not emit the spvDescriptor wrappers in this case?

Besides, the generated code seems fine other than the weird bug that I addressed in #2314.

@Try
Copy link
Contributor

Try commented Apr 23, 2024

Are you saying that msl_options.argument_buffers are inherently in conflict with the Tier2 setting?

There were not intended to be uses together. Yet, have to say that I didn't followed spirv-cross for several months, and thing maybe different now. Maybe @billhollings knows better, than I am, if there is a use case for argument_buffers + Tier2 in MoltenVK.

In Metal T1-argument buffer is close analog to Vulkan descriptor-set, and T2 - more like descriptor-buffer, allowing pointer to other buffers/textures within the buffer.
Here, IMAO is misleading naming has place. msl_options.argument_buffers de facto mean "use T1-argument buffer to emulate vulkan-descriptor set". And argument_buffers_tier=Tier2, mean to emit array of descriptors as T2-argument buffer.

MoltenVK needs variable sized arrays supported as well

Don't think it needed to emulate vulkan. Vulkan descriptor-indexing model always requires to declare upper bound for each runtime-sized array, on C++ side. Even if VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT is in use.

@js6i
Copy link
Contributor Author

js6i commented Apr 24, 2024

Don't think it needed to emulate vulkan. Vulkan descriptor-indexing model always requires to declare upper bound for each runtime-sized array, on C++ side. Even if VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT is in use.

Yeah, the issue with that is that then binding a smaller buffer (and the descriptor pool only needs to be large enough to fit the variable count) triggers a validation error.

@HansKristian-Work
Copy link
Contributor

Workaround is merged, so should be fine now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Contribution from external parties is desired here.
Projects
None yet
Development

No branches or pull requests

3 participants