Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "previous" member to MVKPerformanceTracker structure #2183

Merged
merged 6 commits into from
Mar 20, 2024

Conversation

SRSaunders
Copy link
Contributor

@SRSaunders SRSaunders commented Mar 13, 2024

Addresses #2177.

This small PR adds a previous member to the MVKPerformanceTracker struct and saves the former latest value before updating latest within updateActivityPerformance(). It also adds previous to MVKLogInfo.

Note that I did not change the MVK_PRIVATE_API_VERSION.

I still have a question about how to determine version information at runtime. Now that vkGetVersionStringsMVK() is deprecated, what is the offical means of determining MoltenVK's version or the MVKPerformanceStatistics version at runtime? This is to handle cases when dynamic linking is used and one needs to determine if the new field is present.

Is the only mechanism to use the VK_SUCCESS or VK_INCOMPLETE return code from vkGetPerformanceStatisticsMVK()? This approach seems limiting since all you get is a success or failure indication and not version information. For instance if you always want to use latest, and optionally use previous (if available), how would you go about determing that at runtime?

Copy link
Contributor

@billhollings billhollings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for submitting.

Thanks for pointing out about MVK_PRIVATE_API_VERSION. We increment MVK_PRIVATE_API_VERSION once per MoltenVK release cycle (which is the same as Vulkan SDK cycle), if the private ABI changes. We have just cut MoltenVK v1.2.8 for the upcoming SDK. This PR will be in the next future SDK, and does modify the ABI, so you should increment the value of MVK_PRIVATE_API_VERSION to 41.

Regarding accessing the MoltenVK version at runtime, it is encoded in VkPhysicalDeviceProperties::driverVersion and VkLayerProperties::implementationVersion. The encoding is defined by the MVK_MAKE_VERSION() macro. It's monotonically numeric, so you can compare different versions, or use MVK_MAKE_VERSION() to create a version number to compare against.

If needed, similar to MVK_MAKE_VERSION() and MVK_VERSION_STRING, we could potentially add macros to extract three integers, or create a string, from a passed-in encoded version number.

MVK_PRIVATE_API_VERSION is not runtime derivable, but it is connected to the MoltenVK version as indicted above, so you can derive the runtime version of MVK_PRIVATE_API_VERSION, and the layout of any private ABI structures, by comparing the corresponding encoded MoltenVK version, as I describe above.

I'm not sure how we can directly pass MVK_PRIVATE_API_VERSION back from a runtime, except to create a Vulkan extension to do so.

@SRSaunders
Copy link
Contributor Author

SRSaunders commented Mar 14, 2024

Thanks for the hints re VkPhysicalDeviceProperties::driverVersion and VkLayerProperties::implementationVersion for retrieving the MoltenVK version at runtime.

However, after thinking about this problem a bit more I will be reverting back to the simple approach of guarding the new code at compile time using MVK_MAKE_VERSION(1, 2, 9), and checking for VK_SUCCESS returned from vkGetPerformanceStatisticsMVK() at runtime. My logic is since the perf structure is not a linear list, the runtime and compile time ABI versions (i.e. struct sizes) have to match exactly to get the correct offsets into the returned memory blob. Even if you are dynamically linking to a later version of MoltenVK (i.e. with previous defined), the struct sizes must still match exactly for the data to make sense. Rather than carry around multiple version copies of the performance struct in my code (with different versioned offsets), I plan to simply require a match. This is the downside of using a nested struct vs. a linear list, but that's a more general issue than I want to tackle here.

FYI, I looked at mvkCopyGrowingStruct() to confirm how it behaves, and unfortunately I think I have found an error present in the code:

The current logic is: return (*pCopySize == origSize) ? VK_SUCCESS : VK_INCOMPLETE;
Shouldn't it instead be: return (sizeof(S) == origSize) ? VK_SUCCESS : VK_INCOMPLETE;

The original logic might be okay for linearly growing lists, but is not okay for nested structures like performance data. The only concern I have is this function is also used for MVKConfiguration and MVKPhysicalDeviceMetalFeatures which I think are both linear. While the current behaviour is different from the function's doc statement (Returns VK_SUCCESS if the original value of *pCopySize is the same as the actual size of the struct, or VK_INCOMPLETE otherwise), I don't want to break something unintentionally. Please advise.

If you agree with this change, I will submit commits for this as well as bumping MVK_PRIVATE_API_VERSION.

@SRSaunders
Copy link
Contributor Author

SRSaunders commented Mar 17, 2024

I decided to make the proposed changes above and @billhollings can review and comment as needed.

FYI - I have one more thing I would potentially like to add to this - a new MVKQueuePerformance entry that tracks async encoding start delay if the thread is already busy encoding the previous frame. I need to research this a bit more before confirming and will come back with an answer asap. So no rush on the merge.

@SRSaunders
Copy link
Contributor Author

I have completed my research and have submitted an additional commit that proposes and implements two new performance counters in MVKQueuePerformance for measuring asynchronous queue submit wait times: waitSubmitCommandBuffers, and waitPresentSwapchains. These new performance counters are useful when MVK_CONFIG_SYNCHRONOUS_QUEUE_SUBMITS is disabled and queue submits are asynchronous, since there are delays between the initial calls to vkQueueSubmit()/vkQueuePresentKHR() and the actual start of operations.

For instance, waitSubmitCommandBuffers allows you to determine the delay time before the start of command buffer encoding if the thread is already busy encoding the previous frame. And waitPresentSwapchains allows you to predict the delay between calling vkQueuePresentKHR() and the actual swapchain image showing up on the screen - e.g. waitPresentSwapchains + presentSwapchains.

Here are screen grabs of the embedded Optick profiler running inside RBDoom3BFG with MoltenVK. You can see the Submit_Wait times (corresponds to waitSubmitCommandBuffers), Acquire_Wait times (corresponds to retrieveCAMetalDrawable), and Present_Wait times (corresponds to waitPresentSwapchains) and their relationship to command buffer encoding and execution on the GPU.

I have now completed my work on this PR and it is ready for review.

Screenshot 2024-03-18 at 10 43 39 PM

Screenshot 2024-03-18 at 11 51 01 PM

Copy link
Contributor

@billhollings billhollings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating. Some documentation suggestions, and a refactoring request.

MoltenVK/MoltenVK/Vulkan/mvk_api.mm Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKQueue.mm Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKQueue.mm Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKQueue.mm Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKQueue.h Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/API/mvk_private_api.h Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/API/mvk_private_api.h Outdated Show resolved Hide resolved
Copy link
Contributor

@billhollings billhollings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. Thanks for all the updates.

@billhollings billhollings merged commit 73f2b8c into KhronosGroup:main Mar 20, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants