Add support for VK_EXT_host_image_copy extension. #2208

billhollings · 2024-04-15T13:45:27Z

Add support for VK_EXT_host_image_copy extension.
MVKResource::getHostMemoryAddress() return nullptr if MVKDeviceMemory::getHostMemoryAddress() returns nullptr, regardless of local offset.
Remove unnecessary enum value kMVKVkFormatFeatureFlagsTexTransfer to reduce redundancy between read and transfer feature flag options
MVKResource remove unnecessary inline qualifiers (unrelated).
MVKDevice remove some obsolete commentary (unrelated).

ovvldc · 2024-04-15T14:10:34Z

Why include unrelated changes in a PR, why not separate them in a housekeeping PR or such? Doesn't that make the commit history difficult to trace, or I am missing something?

billhollings · 2024-04-15T14:38:54Z

Why include unrelated changes in a PR, why not separate them in a housekeeping PR or such? Doesn't that make the commit history difficult to trace, or I am missing something?

A fair question.

To help keep the code tidy over the long run, we try to have a "clean as you go" approach, when it comes to non-functional maintenance, since those kind of maintenance projects never seem to get prioritized.

I've removed one of the "unrelated" notes, as it actually was functional as part of this change.

cdavis5e

You forgot to add this to the list of all supported extensions in the User's Guide.

Also, FYI, it looks like GitHub doesn't recognize the keyword "completes" as indicating that an issue should be closed after a PR is merged. You may have to close the issue manually.

I'm also worried about the implications of the VK_HOST_IMAGE_COPY_MEMCPY_EXT flag. I don't think we can support that for nonlinear images. Managed textures on Intel are where this would crop up.

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm

MoltenVK/MoltenVK/GPUObjects/MVKImage.mm

cdavis5e · 2024-04-16T01:19:59Z

To help keep the code tidy over the long run, we try to have a "clean as you go" approach, when it comes to non-functional maintenance, since those kind of maintenance projects never seem to get prioritized.

Fair point. Coming from Wine, however, which is extremely strict about this, I actually prefer to at least split the housekeeping changes into their own commits. That way, they show up separately in the history, which makes bisection easier. The PR could still include the main change and the unrelated cleanups together.

- MVKResource::getHostMemoryAddress() return nullptr if MVKDeviceMemory::getHostMemoryAddress() returns null pointer, regardless of local offset. - Remove unnecessary enum value kMVKVkFormatFeatureFlagsTexTransfer to reduce redundancy between read and transfer feature flag options. - Fix spelling of mvkVkOffset3DFromMTLOrigin() (unrelated). - MVKResource remove unnecessary inline qualifiers (unrelated). - MVKDevice remove some obsolete commentary (unrelated).

billhollings · 2024-04-16T16:31:51Z

That way, they show up separately in the history, which makes bisection easier. The PR could still include the main change and the unrelated cleanups together.

This makes sense. I'll try to remember that approach on future PR's.

billhollings · 2024-04-16T17:05:26Z

You forgot to add this to the list of all supported extensions in the User's Guide.

D'oh! Good catch. Fixed.

Also, FYI, it looks like GitHub doesn't recognize the keyword "completes" as indicating that an issue should be closed after a PR is merged. You may have to close the issue manually.

Yeah. I noticed that after the fact, too. I've modified it above now.

I'm also worried about the implications of the VK_HOST_IMAGE_COPY_MEMCPY_EXT flag. I don't think we can support that for nonlinear images. Managed textures on Intel are where this would crop up.

I don't know why it didn't make it into the spec proper, but the extension proposal includes:

...flags may be VK_HOST_IMAGE_COPY_MEMCPY_EXT, in which case the data in host memory should have the same swizzling layout as the image. This is mainly useful for embedded systems where this swizzling is known and well defined outside of Vulkan.

This leads me to interpret VK_HOST_IMAGE_COPY_MEMCPY_EXT as a nice-to-have optimization, but that falling back to a linear extraction for both linear and optimal layouts (basically ignoring VK_HOST_IMAGE_COPY_MEMCPY_EXT) is acceptable.

The CTS tests all seem to test both with and without VK_HOST_IMAGE_COPY_MEMCPY_EXT, and so far so good. However, I have not tested on Intel GPU's. Why do you feel any issues would be limited to that platform?

cdavis5e · 2024-04-16T20:34:09Z

The CTS tests all seem to test both with and without VK_HOST_IMAGE_COPY_MEMCPY_EXT, and so far so good. However, I have not tested on Intel GPU's. Why do you feel any issues would be limited to that platform?

Because on Intel Macs, the GPUs support Managed textures which are host-accessible but do not have to be linear. I suppose you could get around that by putting the texture in a placement heap and aliasing it with a buffer, but I'm not sure Metal will let you create a Managed MTLHeap... I suppose we could also forbid setting VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT with Managed textures, which we expose as Vulkan HOST_CACHED memory, but then we'd lose out on being able to use VK_EXT_host_image_copy with these images.

cdavis5e

The changes so far LGTM, but I won't approve this until you've answered my objections about VK_HOST_IMAGE_COPY_MEMCPY_EXT with Managed textures on Intel Macs.

Maybe I should run the CTS against this change. I have a couple of Intel Macs.

billhollings · 2024-04-16T21:41:10Z

Maybe I should run the CTS against this change. I have a couple of Intel Macs.

If you could, that would be helpful. Without an understanding of what may/will go wrong, it's hard to develop and test a solution for how to handle it.

I've been testing using...

cd <mvk-parent-dir>/MoltenVK/Scripts
./runcts <cts-parent-dir>/VK-GL-CTS/external/vulkancts/mustpass/main/vk-default/image/host-image-copy.txt

billhollings · 2024-04-19T01:42:18Z

Maybe I should run the CTS against this change. I have a couple of Intel Macs.

If you could, that would be helpful.

NM. I've dug out my old Intel/Radeon and have managed to get things running on it.

The integrated Intel GPU is fine, but it's definitely misbehaving on the discrete Radeon GPU, as you predicted.

Looking into it.

MoltenVK/MoltenVK/GPUObjects/MVKImage.mm

…s before copying. Discrete GPUs use managed-memory textures, and these need to be synchronized from GPU memory before being available for host-copying to memory using the CPU. Metal automatically handles the reverse sync when copying from memory to a texture.

billhollings · 2024-04-23T19:02:43Z

The changes so far LGTM, but I won't approve this until you've answered my objections about VK_HOST_IMAGE_COPY_MEMCPY_EXT with Managed textures on Intel Macs.
Maybe I should run the CTS against this change. I have a couple of Intel Macs.

If you could, that would be helpful.

NM. I've dug out my old Intel/Radeon and have managed to get things running on it.

The integrated Intel GPU is fine, but it's definitely misbehaving on the discrete Radeon GPU, as you predicted.

Looking into it.

This is now working properly on both Radeon & Intel.

It turns out the failures were not a linear/optimal issue, but a managed-memory texture sync issue. I've added a synchronizeTexture: where appropriate for managed-memory textures on discrete GPUs.

cdavis5e

Well, I suppose if it works without any special munging beyond the extra synchronization...

cdavis5e · 2024-04-23T23:18:25Z

MoltenVK/MoltenVK/GPUObjects/MVKImage.mm

+#if MVK_MACOS
+ // On macOS, if the device doesn't have unified memory, and the texture is using managed memory, we need
+ // to sync the managed memory from the GPU, so the texture content is accessible to be copied by the CPU.
+ if ( !getPhysicalDevice()->getHasUnifiedMemory() && getMTLStorageMode() == MTLStorageModeManaged ) {
+ @autoreleasepool {
+ id<MTLCommandBuffer> mtlCmdBuff = getDevice()->getAnyQueue()->getMTLCommandBuffer(kMVKCommandUseCopyImageToMemory);
+ id<MTLBlitCommandEncoder> mtlBlitEnc = [mtlCmdBuff blitCommandEncoder];
+
+ for (uint32_t imgRgnIdx = 0; imgRgnIdx < pCopyImageToMemoryInfo->regionCount; imgRgnIdx++) {
+ auto& imgRgn = pCopyImageToMemoryInfo->pRegions[imgRgnIdx];
+ auto& imgSubRez = imgRgn.imageSubresource;
+ id<MTLTexture> mtlTex = getMTLTexture(getPlaneFromVkImageAspectFlags(imgSubRez.aspectMask));
+ for (uint32_t imgLyrIdx = 0; imgLyrIdx < imgSubRez.layerCount; imgLyrIdx++) {
+ [mtlBlitEnc synchronizeTexture: mtlTex
+ slice: imgSubRez.baseArrayLayer + imgLyrIdx
+ level: imgSubRez.mipLevel];
+ }
+ }
+
+ [mtlBlitEnc endEncoding];
+ [mtlCmdBuff commit];
+ [mtlCmdBuff waitUntilCompleted];
+ }
+ }
+#endif


I wonder if there's a point where we should be calling -synchronizeTexture:slice:level: where we aren't in response to a barrier command.

I wonder if there's a point where we should be calling -synchronizeTexture:slice:level: where we aren't in response to a barrier command.

I'm not sure what you are wondering here?

I mean, the host has to synchronize its use of the image with the device before calling this function. According to the Vulkan spec:

vkCopyImageToMemoryEXT does not check whether the device memory associated with srcImage is currently in use before performing the copy. The application must guarantee that any previously submitted command that writes to the copy regions has completed before the host performs the copy.

It's unclear, though, if that means that the host is responsible for emitting a barrier, or if this function includes an implicit barrier. I wonder, in particular, if the vkTransitionImageLayoutEXT() function has something to do with this.

It's unclear, though, if that means that the host is responsible for emitting a barrier

Yes. I was unsure about that spec wording too.

The spec wording for vkCopyImageToMemoryEXT() also says

This command (] is functionally similar to `vkCmdCopyImageToBuffer2, except it is executed on the host and writes to host memory instead of a buffer.

which, if you squint hard enough, might imply the same barrier operations apply. But the spec is silent on which barriers to use.

Some of the CTS tests failing on Radeon did indeed emit a pipeline image barrier after rendering, but they only included VK_ACCESS_TRANSFER_READ_BIT, and not the VK_ACCESS_HOST_READ_BIT that I would have expected when preparing for a host-copy.

I actually did initially experiment with modifying MVKImageMemoryBinding::needsHostReadSync(), to force a sync on VK_ACCESS_TRANSFER_READ_BIT if the image included VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT. It did solve the issue, but I was concerned that it would unduly penalize any non-host transfers that the image might be involved with. I wasn't convinced that setting VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT implies that the app will focus on host-copy transfers only.

My other consideration was that since vkCopyImageToMemoryEXT() is not a GPU command, there's an implicit understanding that the CPU is expecting to wait for the GPU to be finished anyway, and so taking the hit when we know it's expected, was more appropriate.

vkTransitionImageLayoutEXT() is about layouts, and not memory access, and I couldn't find anything equivalent in Metal. Plus the problematic CTS tests were not issuing a call to that function anyway.

So, we could follow the path of forcing a sync on every VK_ACCESS_TRANSFER_READ_BIT barrier, if that is more appropriate, but the spec is definitely not clear on what's expected. I've posted a question to the Vulkan Working Group for their thoughts.

billhollings requested a review from cdavis5e April 15, 2024 13:45

billhollings force-pushed the VK_EXT_host_image_copy branch from d0f0c55 to be8f6dc Compare April 15, 2024 14:22

billhollings mentioned this pull request Apr 15, 2024

MoltenVK Enhancement Roadmap #1975

Open

cdavis5e requested changes Apr 16, 2024

View reviewed changes

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm Outdated Show resolved Hide resolved

MoltenVK/MoltenVK/GPUObjects/MVKImage.mm Outdated Show resolved Hide resolved

MoltenVK/MoltenVK/GPUObjects/MVKImage.mm Outdated Show resolved Hide resolved

billhollings force-pushed the VK_EXT_host_image_copy branch from be8f6dc to 2290e86 Compare April 16, 2024 16:18

billhollings requested a review from cdavis5e April 16, 2024 17:05

cdavis5e requested changes Apr 16, 2024

View reviewed changes

spnda reviewed Apr 23, 2024

View reviewed changes

MoltenVK/MoltenVK/GPUObjects/MVKImage.mm Outdated Show resolved Hide resolved

billhollings requested a review from cdavis5e April 23, 2024 19:02

cdavis5e approved these changes Apr 23, 2024

View reviewed changes

billhollings merged commit 6c68ba1 into KhronosGroup:main Apr 24, 2024
6 checks passed

billhollings deleted the VK_EXT_host_image_copy branch April 24, 2024 00:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for VK_EXT_host_image_copy extension. #2208

Add support for VK_EXT_host_image_copy extension. #2208

billhollings commented Apr 15, 2024 •

edited

Loading

ovvldc commented Apr 15, 2024

billhollings commented Apr 15, 2024

cdavis5e left a comment

cdavis5e commented Apr 16, 2024

billhollings commented Apr 16, 2024

billhollings commented Apr 16, 2024

cdavis5e commented Apr 16, 2024

cdavis5e left a comment

billhollings commented Apr 16, 2024

billhollings commented Apr 19, 2024 •

edited

Loading

billhollings commented Apr 23, 2024

cdavis5e left a comment

cdavis5e Apr 23, 2024

billhollings Apr 24, 2024

cdavis5e Apr 24, 2024

billhollings Apr 24, 2024 •

edited

Loading

Add support for VK_EXT_host_image_copy extension. #2208

Add support for VK_EXT_host_image_copy extension. #2208

Conversation

billhollings commented Apr 15, 2024 • edited Loading

ovvldc commented Apr 15, 2024

billhollings commented Apr 15, 2024

cdavis5e left a comment

Choose a reason for hiding this comment

cdavis5e commented Apr 16, 2024

billhollings commented Apr 16, 2024

billhollings commented Apr 16, 2024

cdavis5e commented Apr 16, 2024

cdavis5e left a comment

Choose a reason for hiding this comment

billhollings commented Apr 16, 2024

billhollings commented Apr 19, 2024 • edited Loading

billhollings commented Apr 23, 2024

cdavis5e left a comment

Choose a reason for hiding this comment

cdavis5e Apr 23, 2024

Choose a reason for hiding this comment

billhollings Apr 24, 2024

Choose a reason for hiding this comment

cdavis5e Apr 24, 2024

Choose a reason for hiding this comment

billhollings Apr 24, 2024 • edited Loading

Choose a reason for hiding this comment

billhollings commented Apr 15, 2024 •

edited

Loading

billhollings commented Apr 19, 2024 •

edited

Loading

billhollings Apr 24, 2024 •

edited

Loading