-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for VK_EXT_host_image_copy extension. #2208
Add support for VK_EXT_host_image_copy extension. #2208
Conversation
Why include unrelated changes in a PR, why not separate them in a housekeeping PR or such? Doesn't that make the commit history difficult to trace, or I am missing something? |
d0f0c55
to
be8f6dc
Compare
A fair question. To help keep the code tidy over the long run, we try to have a "clean as you go" approach, when it comes to non-functional maintenance, since those kind of maintenance projects never seem to get prioritized. I've removed one of the "unrelated" notes, as it actually was functional as part of this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to add this to the list of all supported extensions in the User's Guide.
Also, FYI, it looks like GitHub doesn't recognize the keyword "completes" as indicating that an issue should be closed after a PR is merged. You may have to close the issue manually.
I'm also worried about the implications of the VK_HOST_IMAGE_COPY_MEMCPY_EXT
flag. I don't think we can support that for nonlinear images. Managed
textures on Intel are where this would crop up.
Fair point. Coming from Wine, however, which is extremely strict about this, I actually prefer to at least split the housekeeping changes into their own commits. That way, they show up separately in the history, which makes bisection easier. The PR could still include the main change and the unrelated cleanups together. |
- MVKResource::getHostMemoryAddress() return nullptr if MVKDeviceMemory::getHostMemoryAddress() returns null pointer, regardless of local offset. - Remove unnecessary enum value kMVKVkFormatFeatureFlagsTexTransfer to reduce redundancy between read and transfer feature flag options. - Fix spelling of mvkVkOffset3DFromMTLOrigin() (unrelated). - MVKResource remove unnecessary inline qualifiers (unrelated). - MVKDevice remove some obsolete commentary (unrelated).
be8f6dc
to
2290e86
Compare
This makes sense. I'll try to remember that approach on future PR's. |
D'oh! Good catch. Fixed.
Yeah. I noticed that after the fact, too. I've modified it above now.
I don't know why it didn't make it into the spec proper, but the extension proposal includes:
This leads me to interpret The CTS tests all seem to test both with and without |
Because on Intel Macs, the GPUs support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes so far LGTM, but I won't approve this until you've answered my objections about VK_HOST_IMAGE_COPY_MEMCPY_EXT
with Managed
textures on Intel Macs.
Maybe I should run the CTS against this change. I have a couple of Intel Macs.
If you could, that would be helpful. Without an understanding of what may/will go wrong, it's hard to develop and test a solution for how to handle it. I've been testing using...
|
NM. I've dug out my old Intel/Radeon and have managed to get things running on it. The integrated Intel GPU is fine, but it's definitely misbehaving on the discrete Radeon GPU, as you predicted. Looking into it. |
…s before copying. Discrete GPUs use managed-memory textures, and these need to be synchronized from GPU memory before being available for host-copying to memory using the CPU. Metal automatically handles the reverse sync when copying from memory to a texture.
This is now working properly on both Radeon & Intel. It turns out the failures were not a linear/optimal issue, but a managed-memory texture sync issue. I've added a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I suppose if it works without any special munging beyond the extra synchronization...
#if MVK_MACOS | ||
// On macOS, if the device doesn't have unified memory, and the texture is using managed memory, we need | ||
// to sync the managed memory from the GPU, so the texture content is accessible to be copied by the CPU. | ||
if ( !getPhysicalDevice()->getHasUnifiedMemory() && getMTLStorageMode() == MTLStorageModeManaged ) { | ||
@autoreleasepool { | ||
id<MTLCommandBuffer> mtlCmdBuff = getDevice()->getAnyQueue()->getMTLCommandBuffer(kMVKCommandUseCopyImageToMemory); | ||
id<MTLBlitCommandEncoder> mtlBlitEnc = [mtlCmdBuff blitCommandEncoder]; | ||
|
||
for (uint32_t imgRgnIdx = 0; imgRgnIdx < pCopyImageToMemoryInfo->regionCount; imgRgnIdx++) { | ||
auto& imgRgn = pCopyImageToMemoryInfo->pRegions[imgRgnIdx]; | ||
auto& imgSubRez = imgRgn.imageSubresource; | ||
id<MTLTexture> mtlTex = getMTLTexture(getPlaneFromVkImageAspectFlags(imgSubRez.aspectMask)); | ||
for (uint32_t imgLyrIdx = 0; imgLyrIdx < imgSubRez.layerCount; imgLyrIdx++) { | ||
[mtlBlitEnc synchronizeTexture: mtlTex | ||
slice: imgSubRez.baseArrayLayer + imgLyrIdx | ||
level: imgSubRez.mipLevel]; | ||
} | ||
} | ||
|
||
[mtlBlitEnc endEncoding]; | ||
[mtlCmdBuff commit]; | ||
[mtlCmdBuff waitUntilCompleted]; | ||
} | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a point where we should be calling -synchronizeTexture:slice:level:
where we aren't in response to a barrier command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a point where we should be calling
-synchronizeTexture:slice:level:
where we aren't in response to a barrier command.
I'm not sure what you are wondering here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, the host has to synchronize its use of the image with the device before calling this function. According to the Vulkan spec:
vkCopyImageToMemoryEXT
does not check whether the device memory associated withsrcImage
is currently in use before performing the copy. The application must guarantee that any previously submitted command that writes to the copy regions has completed before the host performs the copy.
It's unclear, though, if that means that the host is responsible for emitting a barrier, or if this function includes an implicit barrier. I wonder, in particular, if the vkTransitionImageLayoutEXT()
function has something to do with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear, though, if that means that the host is responsible for emitting a barrier
Yes. I was unsure about that spec wording too.
The spec wording for vkCopyImageToMemoryEXT()
also says
This command (] is functionally similar to `vkCmdCopyImageToBuffer2, except it is executed on the host and writes to host memory instead of a buffer.
which, if you squint hard enough, might imply the same barrier operations apply. But the spec is silent on which barriers to use.
Some of the CTS tests failing on Radeon did indeed emit a pipeline image barrier after rendering, but they only included VK_ACCESS_TRANSFER_READ_BIT
, and not the VK_ACCESS_HOST_READ_BIT
that I would have expected when preparing for a host-copy.
I actually did initially experiment with modifying MVKImageMemoryBinding::needsHostReadSync()
, to force a sync on VK_ACCESS_TRANSFER_READ_BIT
if the image included VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT
. It did solve the issue, but I was concerned that it would unduly penalize any non-host transfers that the image might be involved with. I wasn't convinced that setting VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT
implies that the app will focus on host-copy transfers only.
My other consideration was that since vkCopyImageToMemoryEXT()
is not a GPU command, there's an implicit understanding that the CPU is expecting to wait for the GPU to be finished anyway, and so taking the hit when we know it's expected, was more appropriate.
vkTransitionImageLayoutEXT()
is about layouts, and not memory access, and I couldn't find anything equivalent in Metal. Plus the problematic CTS tests were not issuing a call to that function anyway.
So, we could follow the path of forcing a sync on every VK_ACCESS_TRANSFER_READ_BIT
barrier, if that is more appropriate, but the spec is definitely not clear on what's expected. I've posted a question to the Vulkan Working Group for their thoughts.
VK_EXT_host_image_copy
extension.MVKResource::getHostMemoryAddress()
returnnullptr
ifMVKDeviceMemory::getHostMemoryAddress()
returnsnullptr
, regardless of local offset.kMVKVkFormatFeatureFlagsTexTransfer
to reduce redundancy between read and transfer feature flag optionsMVKResource
remove unnecessary inline qualifiers (unrelated).MVKDevice
remove some obsolete commentary (unrelated).Fixes #2197.