Improvements to bindless resources and descriptor indexing #2260

Merged
merged 7 commits into from
Jul 3, 2024
Changes from 6 commits
12 changes: 2 additions & 10 deletions Docs/MoltenVK_Configuration_Parameters.md
@@ -616,22 +616,14 @@ cleared via a call to the `vkTrimCommandPoolKHR()` command.
---------------------------------------
#### MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS

##### Type: Enumeration
- `0`: Don't use _Metal_ Argument Buffers.
- `1`: Use _Metal_ Argument Buffers for all pipelines.
- `2`: Use _Metal_ Argument Buffers only if the `VK_EXT_descriptor_indexing` extension is enabled.

##### Default: `0`
##### Type: Boolean
##### Default: `1`

Controls whether **MoltenVK** should use _Metal_ argument buffers for resources defined in descriptor sets,
if _Metal_ argument buffers are supported on the platform. Using _Metal_ argument buffers dramatically
increases the number of buffers, textures and samplers that can be bound to a pipeline shader, and in most
cases improves performance.

_**NOTE:**_ Currently, _Metal_ argument buffer support is in beta stage, and is only supported on _macOS 11.0+_,
or on older versions of _macOS_ using an _Intel_ GPU. _Metal_ argument buffers support is not available on _iOS_ or _tvOS_.
Development to support _iOS_ and _tvOS_ and a wider combination of GPU's on older _macOS_ versions is under way.


---------------------------------------
#### MVK_CONFIG_USE_MTLHEAP
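
With the parameter reverted to a Boolean that defaults to `1`, an application that wants to opt out has to set the variable explicitly. The following is a minimal sketch of how an application might inspect or override the setting on its own side before creating a Vulkan instance. The helper names `argumentBuffersRequested` and `disableArgumentBuffers` are hypothetical and are not part of MoltenVK, and treating a legacy value of `2` as "enabled" is an assumption, since the diff does not say how MoltenVK itself interprets it.

```cpp
#include <cstdlib>

// Name of the environment variable documented in the hunk above.
static const char* kArgBuffersEnvVar = "MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS";

// Hypothetical application-side helper (not a MoltenVK function).
// Returns true when the variable is unset or empty, mirroring the documented default of 1.
// Any nonzero integer counts as enabled (assumption: this includes the legacy value 2).
// Non-numeric strings parse as 0 and therefore count as disabled.
bool argumentBuffersRequested() {
    const char* value = std::getenv(kArgBuffersEnvVar);
    if (value == nullptr || *value == '\0') { return true; }
    return std::strtol(value, nullptr, 10) != 0;
}

// Hypothetical helper: opts out of Metal argument buffers for this process.
// Assumes it is called before the Vulkan instance is created.
// setenv() is POSIX and is available on macOS.
void disableArgumentBuffers() {
    setenv(kArgBuffersEnvVar, "0", 1);
}
```

Calling `disableArgumentBuffers()` early in `main()` has the same effect as launching the process with `MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS=0` in the environment.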
9 changes: 9 additions & 0 deletions Docs/Whats_New.md
@@ -18,10 +18,19 @@ MoltenVK 1.2.10

Released TBD

- Improvements to bindless resources and descriptor indexing:
- Add support for Metal3 argument buffers.
- Support argument buffers on all platforms, when Metal 3 is available.
- Support argument buffers on macOS when Metal3 is not available.
- Use Metal argument buffers by default when they are available.
- Revert MVKConfiguration::useMetalArgumentBuffers and env var
`MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS` to a boolean value, and enable it by default.
- Update max number of bindless buffers and textures per stage to 1M, per Apple Docs.
- Add option to generate a GPU capture via a temporary named pipe from an external process.
- Fix shader conversion failure when using native texture atomics.
- MSL shader conversion, only pass resource bindings that apply to current shader stage.
- Update documentation for minimum runtime OS requirements to indicate _macOS 10.15_, _iOS 13_, or _tvOS 13_.
- Update `MVK_PRIVATE_API_VERSION` to version `43`.
- Update to latest SPIRV-Cross:
- MSL: Add option to force depth write in fragment shaders
- MSL: Improve handling of padded descriptors with argument buffers
2 changes: 1 addition & 1 deletion ExternalRevisions/SPIRV-Cross_repo_revision
@@ -1 +1 @@
d47a140735cb44e511d0188a6318c365789e4699
6fd1f75636b1c424b809ad8a84804654cf5ae48b
2 changes: 1 addition & 1 deletion MoltenVK/MoltenVK/API/mvk_config.h
@@ -32,7 +32,7 @@ extern "C" {
/**
* This header is obsolete and deprecated, and is provided for legacy compatibility only.
*
* To configure MoltenVK, use one of the following mechanisms,
* To configure MoltenVK, use one of the following mechanisms,
* as documented in MoltenVK_Configuration_Parameters.md:
*
* - The standard Vulkan VK_EXT_layer_settings extension (layer name "MoltenVK").
17 changes: 5 additions & 12 deletions MoltenVK/MoltenVK/API/mvk_private_api.h
@@ -44,7 +44,7 @@ typedef unsigned long MTLArgumentBuffersTier;
*/


#define MVK_PRIVATE_API_VERSION 41
#define MVK_PRIVATE_API_VERSION 43


#pragma mark -
@@ -140,14 +140,6 @@ typedef enum MVKConfigAdvertiseExtensionBits {
} MVKConfigAdvertiseExtensionBits;
typedef VkFlags MVKConfigAdvertiseExtensions;

/** Identifies the use of Metal Argument Buffers. */
typedef enum MVKUseMetalArgumentBuffers {
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_NEVER = 0, /**< Don't use Metal Argument Buffers. */
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_ALWAYS = 1, /**< Use Metal Argument Buffers for all pipelines. */
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_DESCRIPTOR_INDEXING = 2, /**< Use Metal Argument Buffers only if VK_EXT_descriptor_indexing extension is enabled. */
MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS_MAX_ENUM = 0x7FFFFFFF
} MVKUseMetalArgumentBuffers;

/** Identifies the Metal functionality used to support Vulkan semaphore functionality (VkSemaphore). */
typedef enum MVKVkSemaphoreSupportStyle {
MVK_CONFIG_VK_SEMAPHORE_SUPPORT_STYLE_SINGLE_QUEUE = 0, /**< Limit Vulkan to a single queue, with no explicit semaphore synchronization, and use Metal's implicit guarantees that all operations submitted to a queue will give the same result as if they had been run in submission order. */
@@ -240,7 +232,7 @@ typedef struct {
uint32_t apiVersionToAdvertise; /**< MVK_CONFIG_API_VERSION_TO_ADVERTISE */
MVKConfigAdvertiseExtensions advertiseExtensions; /**< MVK_CONFIG_ADVERTISE_EXTENSIONS */
VkBool32 resumeLostDevice; /**< MVK_CONFIG_RESUME_LOST_DEVICE */
MVKUseMetalArgumentBuffers useMetalArgumentBuffers; /**< MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS */
VkBool32 useMetalArgumentBuffers; /**< MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS */
MVKConfigCompressionAlgorithm shaderSourceCompressionAlgorithm; /**< MVK_CONFIG_SHADER_COMPRESSION_ALGORITHM */
VkBool32 shouldMaximizeConcurrentCompilation; /**< MVK_CONFIG_SHOULD_MAXIMIZE_CONCURRENT_COMPILATION */
float timestampPeriodLowPassAlpha; /**< MVK_CONFIG_TIMESTAMP_PERIOD_LOWPASS_ALPHA */
@@ -352,8 +344,8 @@ typedef struct {
uint32_t minSubgroupSize; /**< The minimum number of threads in a SIMD-group. */
VkBool32 textureBarriers; /**< If true, texture barriers are supported within Metal render passes. Deprecated. Will always be false on all platforms. */
VkBool32 tileBasedDeferredRendering; /**< If true, this device uses tile-based deferred rendering. */
VkBool32 argumentBuffers; /**< If true, Metal argument buffers are supported. */
VkBool32 descriptorSetArgumentBuffers; /**< If true, a Metal argument buffer can be assigned to a descriptor set, and used on any pipeline and pipeline stage. If false, a different Metal argument buffer must be used for each pipeline-stage/descriptor-set combination. */
VkBool32 argumentBuffers; /**< If true, Metal argument buffers are supported on the platform. */
VkBool32 descriptorSetArgumentBuffers; /**< If true, Metal argument buffers can be used for descriptor sets. */
MVKFloatRounding clearColorFloatRounding; /**< Identifies the type of rounding Metal uses for MTLClearColor float to integer conversions. */
MVKCounterSamplingFlags counterSamplingPoints; /**< Identifies the points where pipeline GPU counter sampling may occur. */
VkBool32 programmableSamplePositions; /**< If true, programmable MSAA sample positions are supported. */
@@ -364,6 +356,7 @@ typedef struct {
VkBool32 dynamicVertexStride; /**< If true, VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is supported. */
VkBool32 needsCubeGradWorkaround; /**< If true, sampling from cube textures with explicit gradients is broken and needs a workaround. */
VkBool32 nativeTextureAtomics; /**< If true, atomic operations on textures are supported natively. */
VkBool32 needsArgumentBufferEncoders; /**< If true, Metal argument buffer encoders are needed to populate argument buffer content. */
} MVKPhysicalDeviceMetalFeatures;
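
One way to read the three argument-buffer fields together is as a small decision table. The sketch below is only an interpretation of the field comments in this hunk. `ArgBufferFeaturesSketch`, `ArgBufferPath` and `chooseArgBufferPath` are hypothetical names, the struct is a stand-in rather than the real `MVKPhysicalDeviceMetalFeatures`, and the actual selection logic inside MoltenVK may differ.

```cpp
#include <cstdint>

// Simplified stand-in for the three argument-buffer fields shown above.
// VkBool32 is a 32-bit unsigned integer in Vulkan, so uint32_t is used here.
struct ArgBufferFeaturesSketch {
    uint32_t argumentBuffers;               // Metal argument buffers are supported on the platform.
    uint32_t descriptorSetArgumentBuffers;  // Argument buffers can be used for descriptor sets.
    uint32_t needsArgumentBufferEncoders;   // Encoders are needed to populate argument buffer content.
};

// Hypothetical classification of how descriptor content would be written.
enum class ArgBufferPath { None, DirectWrite, EncoderWrite };

// Hypothetical helper: combines the boolean config setting with the device features.
// Assumption: when encoders are not needed, content can be written to the buffer directly.
inline ArgBufferPath chooseArgBufferPath(bool configEnabled, const ArgBufferFeaturesSketch& f) {
    if (!configEnabled || !f.argumentBuffers || !f.descriptorSetArgumentBuffers) {
        return ArgBufferPath::None;
    }
    return f.needsArgumentBufferEncoders ? ArgBufferPath::EncoderWrite
                                         : ArgBufferPath::DirectWrite;
}
```

Under this reading, `useMetalArgumentBuffers` acts purely as an on/off request, and the feature flags decide whether and how that request can be honored.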


58 changes: 12 additions & 46 deletions MoltenVK/MoltenVK/Commands/MVKCommandEncoderState.mm
@@ -651,7 +651,7 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl

_boundDescriptorSets[descSetIndex] = descSet;

if (descSet->isUsingMetalArgumentBuffers()) {
if (descSet->hasMetalArgumentBuffer()) {
// If the descriptor set has changed, track new resource usage.
if (dsChanged) {
auto& usageDirty = _metalUsageDirtyDescriptors[descSetIndex];
@@ -671,46 +671,22 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl
}
}

// Encode the dirty descriptors to the Metal argument buffer, set the Metal command encoder
// usage for each resource, and bind the Metal argument buffer to the command encoder.
// Encode the Metal command encoder usage for each resource,
// and bind the Metal argument buffer to the command encoder.
void MVKResourcesCommandEncoderState::encodeMetalArgumentBuffer(MVKShaderStage stage) {
if ( !_cmdEncoder->isUsingMetalArgumentBuffers() ) { return; }

bool useDescSetArgBuff = _cmdEncoder->isUsingDescriptorSetMetalArgumentBuffers();

MVKPipeline* pipeline = getPipeline();
uint32_t dsCnt = pipeline->getDescriptorSetCount();
for (uint32_t dsIdx = 0; dsIdx < dsCnt; dsIdx++) {
auto* descSet = _boundDescriptorSets[dsIdx];
if ( !descSet ) { continue; }
if ( !(descSet && descSet->hasMetalArgumentBuffer()) ) { continue; }

auto* dsLayout = descSet->getLayout();

// The Metal arg encoder can only write to one arg buffer at a time (it holds the arg buffer),
// so we need to lock out other access to it while we are writing to it.
auto& mvkArgEnc = useDescSetArgBuff ? dsLayout->getMTLArgumentEncoder() : pipeline->getMTLArgumentEncoder(dsIdx, stage);
lock_guard<mutex> lock(mvkArgEnc.mtlArgumentEncodingLock);

id<MTLBuffer> mtlArgBuffer = nil;
NSUInteger metalArgBufferOffset = 0;
id<MTLArgumentEncoder> mtlArgEncoder = mvkArgEnc.getMTLArgumentEncoder();
if (useDescSetArgBuff) {
mtlArgBuffer = descSet->getMetalArgumentBuffer();
metalArgBufferOffset = descSet->getMetalArgumentBufferOffset();
} else {
// TODO: Source a different arg buffer & offset for each pipeline-stage/desccriptors set
// Also need to only encode the descriptors that are referenced in the shader.
// MVKMTLArgumentEncoder could include an MVKBitArray to track that and have it checked below.
}

if ( !(mtlArgEncoder && mtlArgBuffer) ) { continue; }

auto& argBuffDirtyDescs = descSet->getMetalArgumentBufferDirtyDescriptors();
auto& resourceUsageDirtyDescs = _metalUsageDirtyDescriptors[dsIdx];
auto& shaderBindingUsage = pipeline->getDescriptorBindingUse(dsIdx, stage);

bool mtlArgEncAttached = false;
bool shouldBindArgBuffToStage = false;

uint32_t dslBindCnt = dsLayout->getBindingCount();
for (uint32_t dslBindIdx = 0; dslBindIdx < dslBindCnt; dslBindIdx++) {
auto* dslBind = dsLayout->getBindingAt(dslBindIdx);
@@ -719,32 +695,22 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl
uint32_t elemCnt = dslBind->getDescriptorCount(descSet);
for (uint32_t elemIdx = 0; elemIdx < elemCnt; elemIdx++) {
uint32_t descIdx = dslBind->getDescriptorIndex(elemIdx);
bool argBuffDirty = argBuffDirtyDescs.getBit(descIdx, true);
bool resourceUsageDirty = resourceUsageDirtyDescs.getBit(descIdx, true);
if (argBuffDirty || resourceUsageDirty) {
// Don't attach the arg buffer to the arg encoder unless something actually needs
// to be written to it. We often might only be updating command encoder resource usage.
if (!mtlArgEncAttached && argBuffDirty) {
[mtlArgEncoder setArgumentBuffer: mtlArgBuffer offset: metalArgBufferOffset];
mtlArgEncAttached = true;
}
if (resourceUsageDirtyDescs.getBit(descIdx, true)) {
auto* mvkDesc = descSet->getDescriptorAt(descIdx);
mvkDesc->encodeToMetalArgumentBuffer(this, mtlArgEncoder,
dsIdx, dslBind, elemIdx,
stage, argBuffDirty, true);
mvkDesc->encodeResourceUsage(this, dslBind, stage);
}
}
}
}
descSet->encodeAuxBufferUsage(this, stage);

// If the arg buffer was attached to the arg encoder, detach it now.
if (mtlArgEncAttached) { [mtlArgEncoder setArgumentBuffer: nil offset: 0]; }

// If it is needed, bind the Metal argument buffer itself to the command encoder,
if (shouldBindArgBuffToStage) {
auto& mvkArgBuff = descSet->getMetalArgumentBuffer();
MVKMTLBufferBinding bb;
bb.mtlBuffer = descSet->getMetalArgumentBuffer();
bb.offset = descSet->getMetalArgumentBufferOffset();
bb.mtlBuffer = mvkArgBuff.getMetalArgumentBuffer();
bb.offset = mvkArgBuff.getMetalArgumentBufferOffset();
bb.index = dsIdx;
bindMetalArgumentBuffer(stage, bb);
}
@@ -753,7 +719,7 @@ - (void)setDepthBoundsTestAMD:(BOOL)enable minDepth:(float)minDepth maxDepth:(fl
// the contents of Metal argument buffers. Triggering an extraction of the arg buffer
// contents here, after filling it, seems to correct that.
// Sigh. A bug report has been filed with Apple.
if (getDevice()->isCurrentlyAutoGPUCapturing()) { [descSet->getMetalArgumentBuffer() contents]; }
if (getDevice()->isCurrentlyAutoGPUCapturing()) { [descSet->getMetalArgumentBuffer().getMetalArgumentBuffer() contents]; }
}
}
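
After this change the per-draw loop no longer writes to the argument buffer. It only walks the dirty resource-usage bits and encodes usage for the descriptors that changed. The self-contained model below illustrates that test-and-clear pattern. `DirtyBitsSketch` and `encodeDirtyResourceUsage` are hypothetical stand-ins, not MoltenVK classes, and the assumption that the second argument of `getBit()` clears the bit after reading it is inferred from how the call is used in the hunk.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dirty-bit array with a test-and-clear accessor.
class DirtyBitsSketch {
public:
    explicit DirtyBitsSketch(std::size_t count) : _bits(count, false) {}

    void setBit(std::size_t index) { _bits[index] = true; }

    // Returns the current value of the bit and, if shouldClear is true, resets it.
    bool getBit(std::size_t index, bool shouldClear) {
        bool wasSet = _bits[index];
        if (shouldClear) { _bits[index] = false; }
        return wasSet;
    }

private:
    std::vector<bool> _bits;
};

// Visits every descriptor index, invoking encode(index) only for dirty entries.
// Each dirty bit is cleared as it is read, so a second pass encodes nothing
// unless a descriptor has been marked dirty again. Returns the number encoded.
template <typename EncodeFn>
std::size_t encodeDirtyResourceUsage(DirtyBitsSketch& dirty, std::size_t descCount, EncodeFn encode) {
    std::size_t encodedCount = 0;
    for (std::size_t descIdx = 0; descIdx < descCount; descIdx++) {
        if (dirty.getBit(descIdx, true)) {
            encode(descIdx);
            encodedCount++;
        }
    }
    return encodedCount;
}
```

Because the bits are cleared on read, rebinding an unchanged descriptor set costs only the scan, which matches the intent of the "track new resource usage" comment in the earlier hunk.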

2 changes: 1 addition & 1 deletion MoltenVK/MoltenVK/Commands/MVKMTLBufferAllocation.h
@@ -45,7 +45,7 @@ class MVKMTLBufferAllocation : public MVKBaseObject, public MVKLinkableMixin<MVK
* Returns a pointer to the begining of this allocation memory, taking into
* consideration this allocation's offset into the underlying MTLBuffer.
*/
inline void* getContents() const { return (void*)((uintptr_t)_mtlBuffer.contents + _offset); }
void* getContents() const { return (void*)((uintptr_t)_mtlBuffer.contents + _offset); }

/** Returns the pool whence this object was created. */
MVKMTLBufferAllocationPool* getPool() const { return _pool; }