Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for VK_EXT_host_image_copy extension. #2208

Merged
merged 2 commits into from
Apr 24, 2024

Conversation

billhollings
Copy link
Contributor

@billhollings billhollings commented Apr 15, 2024

  • Add support for VK_EXT_host_image_copy extension.
  • MVKResource::getHostMemoryAddress() return nullptr if MVKDeviceMemory::getHostMemoryAddress() returns nullptr, regardless of local offset.
  • Remove unnecessary enum value kMVKVkFormatFeatureFlagsTexTransfer to reduce redundancy between read and transfer feature flag options
  • MVKResource remove unnecessary inline qualifiers (unrelated).
  • MVKDevice remove some obsolete commentary (unrelated).

Fixes #2197.

@ovvldc
Copy link

ovvldc commented Apr 15, 2024

Why include unrelated changes in a PR, why not separate them in a housekeeping PR or such? Doesn't that make the commit history difficult to trace, or I am missing something?

@billhollings
Copy link
Contributor Author

Why include unrelated changes in a PR, why not separate them in a housekeeping PR or such? Doesn't that make the commit history difficult to trace, or I am missing something?

A fair question.

To help keep the code tidy over the long run, we try to have a "clean as you go" approach, when it comes to non-functional maintenance, since those kind of maintenance projects never seem to get prioritized.

I've removed one of the "unrelated" notes, as it actually was functional as part of this change.

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to add this to the list of all supported extensions in the User's Guide.

Also, FYI, it looks like GitHub doesn't recognize the keyword "completes" as indicating that an issue should be closed after a PR is merged. You may have to close the issue manually.

I'm also worried about the implications of the VK_HOST_IMAGE_COPY_MEMCPY_EXT flag. I don't think we can support that for nonlinear images. Managed textures on Intel are where this would crop up.

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKImage.mm Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKImage.mm Outdated Show resolved Hide resolved
@cdavis5e
Copy link
Collaborator

To help keep the code tidy over the long run, we try to have a "clean as you go" approach, when it comes to non-functional maintenance, since those kind of maintenance projects never seem to get prioritized.

Fair point. Coming from Wine, however, which is extremely strict about this, I actually prefer to at least split the housekeeping changes into their own commits. That way, they show up separately in the history, which makes bisection easier. The PR could still include the main change and the unrelated cleanups together.

- MVKResource::getHostMemoryAddress() return nullptr if
  MVKDeviceMemory::getHostMemoryAddress() returns null pointer,
  regardless of local offset.
- Remove unnecessary  enum value kMVKVkFormatFeatureFlagsTexTransfer
  to reduce redundancy between read and transfer feature flag options.
- Fix spelling of mvkVkOffset3DFromMTLOrigin() (unrelated).
- MVKResource remove unnecessary inline qualifiers (unrelated).
- MVKDevice remove some obsolete commentary (unrelated).
@billhollings
Copy link
Contributor Author

That way, they show up separately in the history, which makes bisection easier. The PR could still include the main change and the unrelated cleanups together.

This makes sense. I'll try to remember that approach on future PR's.

@billhollings
Copy link
Contributor Author

You forgot to add this to the list of all supported extensions in the User's Guide.

D'oh! Good catch. Fixed.

Also, FYI, it looks like GitHub doesn't recognize the keyword "completes" as indicating that an issue should be closed after a PR is merged. You may have to close the issue manually.

Yeah. I noticed that after the fact, too. I've modified it above now.

I'm also worried about the implications of the VK_HOST_IMAGE_COPY_MEMCPY_EXT flag. I don't think we can support that for nonlinear images. Managed textures on Intel are where this would crop up.

I don't know why it didn't make it into the spec proper, but the extension proposal includes:

...flags may be VK_HOST_IMAGE_COPY_MEMCPY_EXT, in which case the data in host memory should have the same swizzling layout as the image. This is mainly useful for embedded systems where this swizzling is known and well defined outside of Vulkan.

This leads me to interpret VK_HOST_IMAGE_COPY_MEMCPY_EXT as a nice-to-have optimization, but that falling back to a linear extraction for both linear and optimal layouts (basically ignoring VK_HOST_IMAGE_COPY_MEMCPY_EXT) is acceptable.

The CTS tests all seem to test both with and without VK_HOST_IMAGE_COPY_MEMCPY_EXT, and so far so good. However, I have not tested on Intel GPU's. Why do you feel any issues would be limited to that platform?

@cdavis5e
Copy link
Collaborator

The CTS tests all seem to test both with and without VK_HOST_IMAGE_COPY_MEMCPY_EXT, and so far so good. However, I have not tested on Intel GPU's. Why do you feel any issues would be limited to that platform?

Because on Intel Macs, the GPUs support Managed textures which are host-accessible but do not have to be linear. I suppose you could get around that by putting the texture in a placement heap and aliasing it with a buffer, but I'm not sure Metal will let you create a Managed MTLHeap... I suppose we could also forbid setting VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT with Managed textures, which we expose as Vulkan HOST_CACHED memory, but then we'd lose out on being able to use VK_EXT_host_image_copy with these images.

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes so far LGTM, but I won't approve this until you've answered my objections about VK_HOST_IMAGE_COPY_MEMCPY_EXT with Managed textures on Intel Macs.

Maybe I should run the CTS against this change. I have a couple of Intel Macs.

@billhollings
Copy link
Contributor Author

Maybe I should run the CTS against this change. I have a couple of Intel Macs.

If you could, that would be helpful. Without an understanding of what may/will go wrong, it's hard to develop and test a solution for how to handle it.

I've been testing using...

cd <mvk-parent-dir>/MoltenVK/Scripts
./runcts <cts-parent-dir>/VK-GL-CTS/external/vulkancts/mustpass/main/vk-default/image/host-image-copy.txt

@billhollings
Copy link
Contributor Author

billhollings commented Apr 19, 2024

Maybe I should run the CTS against this change. I have a couple of Intel Macs.

If you could, that would be helpful.

NM. I've dug out my old Intel/Radeon and have managed to get things running on it.

The integrated Intel GPU is fine, but it's definitely misbehaving on the discrete Radeon GPU, as you predicted.

Looking into it.

…s before copying.

Discrete GPUs use managed-memory textures, and these need to be synchronized
from GPU memory before being available for host-copying to memory using the CPU.
Metal automatically handles the reverse sync when copying from memory to a texture.
@billhollings
Copy link
Contributor Author

The changes so far LGTM, but I won't approve this until you've answered my objections about VK_HOST_IMAGE_COPY_MEMCPY_EXT with Managed textures on Intel Macs.
Maybe I should run the CTS against this change. I have a couple of Intel Macs.

If you could, that would be helpful.

NM. I've dug out my old Intel/Radeon and have managed to get things running on it.

The integrated Intel GPU is fine, but it's definitely misbehaving on the discrete Radeon GPU, as you predicted.

Looking into it.

This is now working properly on both Radeon & Intel.

It turns out the failures were not a linear/optimal issue, but a managed-memory texture sync issue. I've added a synchronizeTexture: where appropriate for managed-memory textures on discrete GPUs.

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I suppose if it works without any special munging beyond the extra synchronization...

Comment on lines +715 to +739
#if MVK_MACOS
// On macOS, if the device doesn't have unified memory, and the texture is using managed memory, we need
// to sync the managed memory from the GPU, so the texture content is accessible to be copied by the CPU.
if ( !getPhysicalDevice()->getHasUnifiedMemory() && getMTLStorageMode() == MTLStorageModeManaged ) {
@autoreleasepool {
id<MTLCommandBuffer> mtlCmdBuff = getDevice()->getAnyQueue()->getMTLCommandBuffer(kMVKCommandUseCopyImageToMemory);
id<MTLBlitCommandEncoder> mtlBlitEnc = [mtlCmdBuff blitCommandEncoder];

for (uint32_t imgRgnIdx = 0; imgRgnIdx < pCopyImageToMemoryInfo->regionCount; imgRgnIdx++) {
auto& imgRgn = pCopyImageToMemoryInfo->pRegions[imgRgnIdx];
auto& imgSubRez = imgRgn.imageSubresource;
id<MTLTexture> mtlTex = getMTLTexture(getPlaneFromVkImageAspectFlags(imgSubRez.aspectMask));
for (uint32_t imgLyrIdx = 0; imgLyrIdx < imgSubRez.layerCount; imgLyrIdx++) {
[mtlBlitEnc synchronizeTexture: mtlTex
slice: imgSubRez.baseArrayLayer + imgLyrIdx
level: imgSubRez.mipLevel];
}
}

[mtlBlitEnc endEncoding];
[mtlCmdBuff commit];
[mtlCmdBuff waitUntilCompleted];
}
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a point where we should be calling -synchronizeTexture:slice:level: where we aren't in response to a barrier command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a point where we should be calling -synchronizeTexture:slice:level: where we aren't in response to a barrier command.

I'm not sure what you are wondering here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the host has to synchronize its use of the image with the device before calling this function. According to the Vulkan spec:

vkCopyImageToMemoryEXT does not check whether the device memory associated with srcImage is currently in use before performing the copy. The application must guarantee that any previously submitted command that writes to the copy regions has completed before the host performs the copy.

It's unclear, though, if that means that the host is responsible for emitting a barrier, or if this function includes an implicit barrier. I wonder, in particular, if the vkTransitionImageLayoutEXT() function has something to do with this.

Copy link
Contributor Author

@billhollings billhollings Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear, though, if that means that the host is responsible for emitting a barrier

Yes. I was unsure about that spec wording too.

The spec wording for vkCopyImageToMemoryEXT() also says

This command (] is functionally similar to `vkCmdCopyImageToBuffer2, except it is executed on the host and writes to host memory instead of a buffer.

which, if you squint hard enough, might imply the same barrier operations apply. But the spec is silent on which barriers to use.

Some of the CTS tests failing on Radeon did indeed emit a pipeline image barrier after rendering, but they only included VK_ACCESS_TRANSFER_READ_BIT, and not the VK_ACCESS_HOST_READ_BIT that I would have expected when preparing for a host-copy.

I actually did initially experiment with modifying MVKImageMemoryBinding::needsHostReadSync(), to force a sync on VK_ACCESS_TRANSFER_READ_BIT if the image included VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT. It did solve the issue, but I was concerned that it would unduly penalize any non-host transfers that the image might be involved with. I wasn't convinced that setting VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT implies that the app will focus on host-copy transfers only.

My other consideration was that since vkCopyImageToMemoryEXT() is not a GPU command, there's an implicit understanding that the CPU is expecting to wait for the GPU to be finished anyway, and so taking the hit when we know it's expected, was more appropriate.

vkTransitionImageLayoutEXT() is about layouts, and not memory access, and I couldn't find anything equivalent in Metal. Plus the problematic CTS tests were not issuing a call to that function anyway.

So, we could follow the path of forcing a sync on every VK_ACCESS_TRANSFER_READ_BIT barrier, if that is more appropriate, but the spec is definitely not clear on what's expected. I've posted a question to the Vulkan Working Group for their thoughts.

@billhollings billhollings merged commit 6c68ba1 into KhronosGroup:main Apr 24, 2024
6 checks passed
@billhollings billhollings deleted the VK_EXT_host_image_copy branch April 24, 2024 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for VK_EXT_host_image_copy extension
4 participants