With the introduction of Variable Rate Shading Tier 2 on Intel Arc A-Series discrete GPUs it is now possible to render images with fewer pixel shader invocations at runtime. Intel now offers a VRS Tier 2 API based on our Velocity & Luminance Adaptive Rasterization algorithm.
The VALAR API can query for VRS Tier 2 Hardware Support, Render VRS Tier 2 Masks with Intel's VALAR algorithm, as well as control VRS Combiners to include or exclude draw calls from the VRS Tier 2 mask. We chose an opaque descriptor pattern to implement the API which should allow multi-GPU support for VRS Tier 2.
It is important to understand that there are limitations to the VALAR API.
- Requires VRS Tier 2 Capable GPU
- DirectX 12 Only
The VALAR API relies on an opaque descriptor pattern. As such each function in the API requires a descriptor parameter. Upon initialization the VALAR_DESCRIPTPOR
struct will store a ID3D12Device
in an internal "opaque" descriptor. Once the descriptor is initialized, it needs to be kept in memory and re-used for each subsequent call to the VALAR API.
struct VALAR_DESCRIPTOR
{
VALAR_DESCRIPTOR();
VALAR_SHADING_RATE m_baseShadingRate = VALAR_SHADING_RATE_1X1;
float m_sensitivityThreshold = 0.50f;
float m_quarterRateShadingModifier = 2.13f;
float m_environmentLuminance = 0.02f;
bool m_allowQuarterRateShading = true;
bool m_weberFechnerMode = false;
float m_weberFechnerConstant = 1.0f;
bool m_useMotionVectors = false;
bool m_useUpscaleMotionVectors = false;
bool m_debugOverlay = false;
bool m_enabled = true;
UINT m_bufferWidth = 0;
UINT m_bufferHeight = 0;
UINT m_upscaleWidth = 0;
UINT m_upscaleHeight = 0;
ID3D12Device* m_device = nullptr;
ID3D12DescriptorHeap* m_uavHeap = nullptr;
ID3D12Resource* m_valarBuffer = nullptr;
ID3DBlob* m_shaderBlobs[VALAR_SHADER_COUNT];
ID3D12GraphicsCommandList5* m_commandList = nullptr;
VALAR_DESCRIPTOR_OPAQUE* m_pOpaque;
VALAR_HARDWARE_FEATURES m_hwFeatures;
};
The VALAR_DESCRIPTOR
contains parameters to control the settings for the VALAR algorithm. To enable or disable the VALAR API use the Enabled (m_enabled
) parameter, additionally the debug overlay can be enabled or disabled by settings the Debug Overlay (m_debugOverlay
) parameter.
The sensitivity of the VALAR algorithm is controlled through several floating point parameters; these include the Sensitivity Threshold (m_sensitivityThreshold
), the Quarter Rate Shading Modifier (m_quarterRateShadingModifier
), Additive Environment Luminance (m_environmentLuminance
)..
There are several parameters that also control the behavior of the VALAR API and the VALAR Algorithm. The Base Shading Rate (m_baseShadingRate
) parameter works when setting VRS Combiners. Allow Quarter Rate Shading (m_allowQuarterRateShading
) with toggle the use of the 2x4, 4x2, and 4x4 shading rates in the mask. The Weber-Fechner Mode parameter (m_weberFechnerMode
) allows you to control the precision of the VALAR algorithm using the Weber-Fechner Constant (m_weberFechnerConstant
).
The use of motion vectors can be enabled or disabled using the Use Motion Vectors (m_useMotionVectors
) and Use Upscale Motion Vectors (m_useUpscaleMotionVectors
) parameters; both toggle the behavior of the velocity portions of the VALAR algorithm. When using Upscaled Motion Vectors you must also supply an upscaled buffer width (m_upscaleWidth
) and height (m_upscaleHeight
).
Several ID3D12 objects are used to parameterize the D3D12 interface for the VALAR API, which include the device (m_device
), a UAV heap (m_uavHeap
), the VRS Buffer (m_valarBuffer
), the color buffer (m_colorBuffer
), the velocity buffer (m_velocityBuffer
), optional shader blobs (m_shaderBlobs
) for custom shader loading, and a command list (m_commandList
). These parameters will be covered in the following sections.
After initializing, a reference to the opaque descriptor (m_pOpaque
) is kept and a hardware feature (m_hwFeatures
) struct is returned which stores device capabilities of the device.
To Initialize the VALAR API create a VALAR_DESCRIPTOR
type and set the m_device
parameter to the chosen device. Once the descriptor is setup, pass it to the Intel::VALAR_Initialize
function to initialize the opaque descriptor and determine hardware feature support for VRS.
// Provide the D3D Device that will use VALAR
m_valarDescriptor.m_device = m_d3dDevice.Get();
// Check VRS Hardware Features & Initialize the Opaque Descriptor
Intel::VALAR_RETURN_CODE returnCode = Intel::VALAR_Initialize(m_valarDescriptor);
The Intel::VALAR_Initialize
function will return VALAR_RETURN_CODE_SUCCESS
if VRS Tier 2 is supported and initialization was successful.
If initialization fails the following return codes may be generated.
VALAR_RETURN_CODE_INVALID_DEVICE
indicates that the device passed in the descriptor isnullptr
VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that VRS Tier 2 is not support on the device.VALAR_RETURN_CODE_INITIALIZED
indicates the descriptor passed into the function has already been initialized.VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates that one or more of the optional Shader Blob parameters werenullptr
VALAR_RETURN_CODE_ROOTSIG_FAIL
indicates that there was an error in root signature creation.VALAR_RETURN_CODE_PSO_FAIL
indicates that either the optional shader blobs or the embeded shaders for VALAR failed during PSO creation.
Additionally, you can initialize the VALAR API for multiple devices, but it requires multiple VALAR descriptors to be initialized.
// Provide the first D3D Device that will use VALAR
m_valarDescriptor1.m_device = m_d3dDevice1.Get();
// Check VRS Hardware Features & Initialize the Opaque Descriptor for the First Device
Intel::VALAR_RETURN_CODE returnCode = Intel::VALAR_Initialize(m_valarDescriptor1);
// Provide the second D3D Device that will use VALAR
m_valarDescriptor2.m_device = m_d3dDevice2.Get();
// Check VRS Hardware Features & Initialize the Opaque Descriptor for the Second Device
Intel::VALAR_RETURN_CODE returnCode = Intel::VALAR_Initialize(m_valarDescriptor2);
// Initialize Additional Devices As needed
// ...
There are four shaders that are used to support VALAR at runtime, there are two VALAR compute shaders for the common shading rate tile sizes of 8x8 and 16x16 along with a Low-Power (LP) compute shading and a debug overlay compute shader.
Valar8x8CS.hlsl
VALAR Compute Shader for 8x8 Shading Rate Tile Size (Supported by Intel)Valar16x16CS.hlsl
VALAR Compute Shader for 16x16 Shading Rate Tile Size (Other Vendors)ValarLPCS.hlsl
VALAR Low Power Compute Shading (Any Vendor)ValarDebugCS.hlsl
VALAR Debug Overlay Shader for 8x8 & 16x16 Shading Rate Tile Size
By default these shaders are embedded into the .lib
file generated at compile time. The API uses the #define EMBED_VALAR_SHADERS
to control the inclusion of the embedded shaders. However, if EMBED_VALAR_SHADERS
is not defined shader blobs must be provided at initialize time. Failure to supply blobs in the VALAR descriptor will result in a VALAR_RETURN_CODE_PSO_FAIL
return code. For example, the following code initializes the VALAR API using byte code arrays as ID3DBlobs
. It is up to the application programmer to determine how to load the byte code arrays at runtime.
// Provide the D3D Device that will use VALAR
m_valarDescriptor.m_device = m_d3dDevice.Get();
// Provide the VALAR 8x8 Shader Blob Byte Code.
size[VALAR_SHADER_8X8] = sizeof(g_valar8x8ByteCode) / sizeof(const unsigned char);
D3DCreateBlob(size[VALAR_SHADER_8X8], &m_valarDescriptor.m_shaderBlobs[VALAR_SHADER_8X8]);
memcpy(m_valarDescriptor.m_shaderBlobs[VALAR_SHADER_8X8]->GetBufferPointer(), g_valar8x8ByteCode, size[VALAR_SHADER_8X8]);
// Provide the VALAR 16x16 Shader Blob Byte Code.
size[VALAR_SHADER_16X16] = sizeof(g_valar16x16ByteCode) / sizeof(const unsigned char);
D3DCreateBlob(size[VALAR_SHADER_16X16], &m_valarDescriptor.m_shaderBlobs[VALAR_SHADER_16X16]);
memcpy(m_valarDescriptor.m_shaderBlobs[VALAR_SHADER_16X16]->GetBufferPointer(), g_valar16x16ByteCode, size[VALAR_SHADER_16X16]);
// Provide the VALAR Low Power Shader Blob Byte Code.
size[VALAR_LP_SHADER] = sizeof(g_valarLPByteCode) / sizeof(const unsigned char);
D3DCreateBlob(size[VALAR_LP_SHADER], &m_valarDescriptor.m_shaderBlobs[VALAR_LP_SHADER]);
memcpy(m_valarDescriptor.m_shaderBlobs[VALAR_LP_SHADER]->GetBufferPointer(), g_valarLPByteCode, size[VALAR_LP_SHADER]);
// Provide the VALAR Debug Shader Blob Byte Code.
size[VALAR_DEBUG_SHADER] = sizeof(g_valarDebugByteCode) / sizeof(const unsigned char);
D3DCreateBlob(size[VALAR_DEBUG_SHADER], &m_valarDescriptor.m_shaderBlobs[VALAR_DEBUG_SHADER]);
memcpy(m_valarDescriptor.m_shaderBlobs[VALAR_DEBUG_SHADER]->GetBufferPointer(), g_valarDebugByteCode, size[VALAR_DEBUG_SHADER]);
// Check VRS Hardware Features & Initialize the Opaque Descriptor with Custom Shaders
Intel::VALAR_RETURN_CODE returnCode = Intel::VALAR_Initialize(m_valarDescriptor);
Once VALAR initialization is successful be sure to keep a reference to the VALAR descriptor object to use later when generating and applying masks.
After the VALAR_DESCRIPTOR
has been successfully initialized, the descriptor will contain a VALAR_HARDWARE_FEATURES
struct in the m_hwFeatures
field, that can be used to examine hardware support for variable rate shading.
struct VALAR_HARDWARE_FEATURES
{
UINT m_shadingRateTileSize = 0;
bool m_additionalShadingRatesSupported = false;
bool m_sumCombinerSupported = false;
bool m_meshShaderPerPrimSupported = false;
VALAR_VARIABLE_SHADING_RATE_TIER m_shadingRateTier = {};
bool m_vrsTier1Support = false;
bool m_vrsTier2Support = false;
bool m_isVALARSupported = false;
};
The shading rate tile size is returned in m_shadingRateTileSize
and should be used to initialize VRS Buffers used to generate the VALAR mask. The supported shading rate tier is returned in m_shadingRateTier, however, there are also boolean fields to explicitly determine if VRS Tier 1 (m_vrsTier1Support
) or VRS Tier 2 (m_vrsTier2Support
)is supported. m_isVALARSupported
will indicate if there is sufficient hardware support for the VALAR algorithm to work correctly on the device. Additional shading rate support is indicated by m_additionalShadingRatesSupported
and determines if 2x4, 4x2, and 4x4 shading rates are supported. Additionally, a few other VRS related hardware features are detected for convenience such as VRS Sum Combiner Support (m_sumCombinerSupported
) and VRS Mesh Shader Per-Primitive Support (m_meshShaderPerPrimSupported
). At runtime values can be reported or logged similar to the following example
Logger::WriteLine("VALAR Init Succeeded");
Logger::WriteLine("Tier1 Supported: {0}", m_valarDescriptor.m_hwFeatures.m_vrsTier1Support ? "True" : "False");
Logger::WriteLine("T14x4 Supported: {0}", m_valarDescriptor.m_hwFeatures.m_additionalShadingRatesSupported ? "True" : "False");
Logger::WriteLine("Tier2 Supported: {0}", m_valarDescriptor.m_hwFeatures.m_vrsTier2Support ? "True" : "False");
Logger::WriteLine("Tier2 Tile Size: {0}", m_valarDescriptor.m_hwFeatures.m_shadingRateTileSize ? "True" : "False");
Logger::WriteLine("VALAR Supported: {0}", m_valarDescriptor.m_hwFeatures.m_isVALARSupported ? "True" : "False");
Logger::WriteLine("T2Sum Combiners: {0}", m_valarDescriptor.m_hwFeatures.m_sumCombinerSupported ? "True" : "False");
Logger::WriteLine("MeshShader Prim: {0}", m_valarDescriptor.m_hwFeatures.m_meshShaderPerPrimSupported ? "True" : "False");
Once a VALAR_DESCRIPTOR
has been initialized it is possible to generate a VALAR mask. In addition to providing an initialized descriptor the application is also responsible for supplying a Graphics Command List (m_commandList
), a 4 slot UAV Heap (m_uavHeap
), and a VRS Buffer (m_valarBuffer
).
The m_uavHeap
parameter must contain at least two UAVs; Slot 0 is reserved for the VALAR buffer, Slot 1 is reserved for the native resolution color buffer, and optionally Slot 3 is reserved for native resolution motion vectors, while slot 4 can optionally be used with XeSS to provide upscaled motion vectors.
auto uavDescriptorSize = m_d3dDevice->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_TEXTURE2DARRAY;
uavDesc.Texture2DArray.PlaneSlice = 0;
uavDesc.Texture2DArray.FirstArraySlice = 0;
uavDesc.Texture2DArray.MipSlice = 0;
uavDesc.Texture2DArray.ArraySize = 1;
// Use UAV Slot 0 to pass in VALAR buffer UAV
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), 0, uavDescriptorSize);
uavDesc.Format = DXGI_FORMAT_R8_UINT;
m_d3dDevice->CreateUnorderedAccessView(g_VRSTier2Buffer.GetResource(), nullptr, &uavDesc, uavHandle);
}
// Use UAV Slot 1 to pass in Color buffer UAV
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), 1, uavDescriptorSize);
uavDesc.Format = Target.GetFormat();
m_d3dDevice->CreateUnorderedAccessView( Target.GetResource(), nullptr, &uavDesc, uavHandle);
}
// Use UAV Slot 2 to pass in Velocity buffer UAV
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), 2, uavDescriptorSize);
uavDesc.Format = DXGI_FORMAT_R32_UINT;
m_d3dDevice->CreateUnorderedAccessView(g_VelocityBuffer.GetResource(), nullptr, &uavDesc, uavHandle);
}
// Use UAV Slot 3 to pass in Upscaled Velocity buffer UAV
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), 3, uavDescriptorSize);
uavDesc.Format = DXGI_FORMAT_R16G16_FLOAT;
m_d3dDevice->CreateUnorderedAccessView(g_UpscaledVelocityBuffer.GetResource(), nullptr, &uavDesc, uavHandle);
}
Once the SRV/UAV heap and buffers are initialized update the descriptor with the descriptor heap (m_uavHeap
), color buffer (m_colorBuffer
), vrs buffer (m_valarBuffer
), and velocity buffers (m_velocityBuffer
). The width and height of the color buffer should also be provided in the m_bufferWidth
and m_bufferHeight
descriptor parameters.
Intel::VALAR_DESCRIPTOR& valarDesc = m_valarDescriptor;
valarDesc.m_valarBuffer = m_valarBuffer.Get();
valarDesc.m_uavHeap = m_valarDescriptorHeap.Get();
valarDesc.m_commandList = commandList.Get();
valarDesc.m_bufferHeight = m_height;
valarDesc.m_bufferWidth = m_width;
// ...
After rendering to your velocity and color buffers the application should call Intel::VALAR_ComputeMask
with a command list and parameters to control the VALAR algorithm.
// ...
valarDesc.m_enable = true;
valarDesc.m_commandList = commandList.Get();
valarDesc.m_sensitivityThreshold = 0.5f;
valarDesc.m_quarterRateShadingModifier = 2.13f
valarDesc.m_environmentLuminance = 0.0f;
valarDesc.m_colorBuffer = m_colorBuffer.Get();
// Optional Paramters
// valarDesc.m_useWeberFechner = false;
// valarDesc.m_weberFechnerConstant = 1.0;
// valarDesc.m_allowQuarterRateShading = true;
// ...
// Render Scene
// ...
Intel::VALAR_RETURN_CODE retCode = Intel::VALAR_ComputeMask(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
// ...
// Post-Process / GUI
// ...
VALAR supports both native-resolution and XeSS upscaled-resolution motion vectors. As mentioned in the previous section native-resolution velocity is passed in to the compute shader in m_uavHeap
UAV slot 2, while the upscaled motion vectors are passed in UAV slot 3. To enable native-resolution motion vectors set m_useMotionVectors = true
and m_useUpscaledMotionVectors = false
. To enable upscaled-resolution motion vectors for super-sampled render targets set m_useMotionVectors = true
and m_useUpscaledMotionVectors = true
and specify upscaled width and height in m_upscaledWidth
and m_upscaledHeight
fields.
// Enable Native Resolution Motion Vector Support
m_valarDescriptor.m_useMotionVectors = true;
m_valarDescriptor.m_useUpscaleMotionVectors = false;
// ...
// Enable Upscaled Resolution Motion Vector Support
m_valarDescriptor.m_useMotionVectors = true;
m_valarDescriptor.m_useUpscaleMotionVectors = true;
m_valarDescriptor.m_upscaledWidth = m_upscaledWidth;
m_valarDescriptor.m_upscaledHeight = m_upscaledHeight;
Intel::VALAR_RETURN_CODE retCode = Intel::VALAR_ComputeMask(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
Intel::VALAR_ComputeMask
will return an return code of VALAR_RETURN_CODE_SUCCESS
if the mask is successfully generated. Otherwise the following VALAR error codes will be returned.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates thatIntel::VALAR_Initialize
function failed or was never called.VALAR_RETURN_CODE_INVALID_DEVICE
indicates that the opaque descriptors internal device is invalid.VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that the device does not support VRS Tier 2VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates that the Command List, UAV Heap, or VRS buffer is invalid.
Once the Intel::VALAR_ComputeMask
function returns successfully you can apply the mask to any valid graphics command list.
While Intel::VALAR_ComputeMask
does produce a high quality VRS mask it can be expensive for large render targets or integrated GPUs. Intel::VALAR_ComputeMaskLP
can execute an approximation of the VALAR algorithm which reduces the cost of the compute shader with minimal quality loss in the VRS buffer. The descriptor parameters for Intel::VALAR_ComputeMaskLP
are exactly the same as Intel::VALAR_ComputeMask
making Intel::VALAR_ComputeMaskLP
an drop-in replacement for Intel::VALAR_ComputeMask
. It should be noted that Low Power Mode works best with 2x2 Only Mode.
Intel::VALAR_ComputeMaskLP
works by reducing the number of thread groups and threads dispatched. To reduce the number of threads groups the dispatch size is modified to be width & height of the native resolution buffer divided by the shading rate tile size, which is then divided by the thread group size of 8.
// # Thread Groups in X = (width / tileSize) / 8
// # Thread Groups in Y = (height / tileSize) / 8
m_commandList.Dispatch(
(UINT)ceilf(((float)desc.m_bufferWidth / (float)desc.m_pOpaque->m_featureSupport.m_shadingRateTileSize) / 8.0f),
(UINT)ceilf(((float)desc.m_bufferHeight / (float)desc.m_pOpaque->m_featureSupport.m_shadingRateTileSize) / 8.0f), 1);
For Example, to render a VALAR mask for an 1920x1080 native resolution render target with a VRS tile size of 8x8 the following equations are used to determine how many thread groups are dispatched.
Thread Groups X: 1920 / 8 = 240
Thread Groups Y: 1080 / 8 = 135
Standard Thread Groups: 240 * 135 = 32,640
Thread Groups X: = 1920 / 8 / 8 = 30
Thread Groups Y: = 1080 / 8 / 8 = 16.8 ~= 17
Low-Power Thread Groups: = 30 * 17 = 510
In ValarLPCS.hlsl the thread group size for the shader is is defined as 8x8x1. Unlike Intel::VALAR_ComputeMask
the [numthreads()] intrinsic of the Intel::VALAR_ComputeMaskLP
compute shader is not coupled to the shading rate tile size.
// 8x8 threads dispatched per thread group
[numthreads(8, 8, 1)]
For example, to compute the total number of threads dispatched for a 1920x1080 native resolution render target the following equations are used.
Threads X: 240 * 8 = 1920
Threads Y: 135 * 8 = 1080
# VALAR Standard Threads: 1920 * 1080 = 2,073,600
Threads X: 30 * 8 = 240
Threads Y: 17 * 8 = 136
# VALAR Low-Power Threads: 240 * 136 = 32,640
Each thread is then responsible for computing the average luminance and luminance differences for each tile in the VRS buffer. To reduce the number of reads from the original native resolution buffer, only 4 pixels are sampled from the native resolution buffer per thread. These pixels are the pixels adjacent to the centroid of the VRS Tile.
By only sampling four adjacent pixels the local luminance differences in the X-Axis and Y-Axis can be computed without additional sampling.
For an 8x8 VRS Tile, there are 3 samples per pixel in the native resolution buffer. However, by only executing 1 thread per tile and 4 samples per thread, the number of samples is thus reduced from 8 * 8 * 3 = 192 Samples to 1 * 1 * 4 = 4 Samples
Additionally, due to the modified thread group dispatch and sampling patterns two GroupMemoryBarrierWithGroupSync()
calls are removed from ValarLPCS.hlsl
Due to the increased sensitivity of a reduced number of luminance samples it was necessary to reduce the incoming sensitivity threshold of the algorithm to get the approximation to more closely match the original implementation. This was done by dividing the sensitivity threshold by 2.
JND = (SensitivityThreshold / 2) * (AverageLuminance + EnvironmentalLuminance)
Due to the modifications to the thread groups and sampling pattern, the resulting VALAR mask produced by Intel::VALAR_ComputeMaskLP
is only an approximation of a mask generated by Intel::VALAR_ComputeMask
. However, the mask that is produced is very similar to the original with only some slight noise introduced. The differences in each mask can be compared using an image comparison tool such as Beyond Compare.
The number of similar and different pixels can be computed from the image diff for each of the masks above. At 1080p in the sample image used, ~79% of pixels where the same while ~21% changed. As the resolution increased to 2k, the percent of similar pixels increased to ~88% and the number of different pixels decreased to ~12%. Increasing the resolution further to 4k resulted in ~91% of pixels staying the same and only ~9% of pixels changing.
Intel::VALAR_ComputeMaskLP
will return an return code of VALAR_RETURN_CODE_SUCCESS
if the mask is successfully generated. Otherwise the following VALAR error codes will be returned.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates thatIntel::VALAR_Initialize
function failed or was never called.VALAR_RETURN_CODE_INVALID_DEVICE
indicates that the opaque descriptors internal device is invalid.VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that the device does not support VRS Tier 2VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates that the Command List, UAV Heap, or VRS buffer is invalid.
Once the Intel::VALAR_ComputeMaskLP
function returns successfully you can apply the mask to any valid graphics command list.
After a mask has been generated it needs to be applied to the next frame. Masks can be applied using the Intel::VALAR_ApplyMask
function. Internally Intel::VALAR_ApplyMask
calls ID3D12GraphicsCommandList5::RSSetShadingRateImage
. To apply a mask, a valid VALAR_DESCRIPTOR
must be passed with a valid ID3D12GraphicsCommandList5
assigned to m_commandList
parameter along with a valid ID3D12Resource
passed in the m_valarBuffer
parameter.
valarDesc.m_enable = true;
valarDesc.m_commandList = commandList.Get();
valarDesc.m_valarBuffer = m_valarBuffer.Get();
// ...
// Start Frame
// ...
// ...
// Set VRS Combiners
// ...
Intel::VALAR_RETURN_CODE retCode = Intel::VALAR_ApplyMask(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
// ...
// Render Frame (With Mask)
// ...
// ...
// Compute Mask For Next Frame
// ...
Intel::VALAR_ApplyMask
will return VALAR_RETURN_CODE_SUCCESS
when the mask is successfully applied or the m_enable
flag is set to false. Otherwise, Intel::VALAR_ApplyMask
will return the following error codes.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates that theVALAR_DESCRIPTOR
was not properly initialized.VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that the device used to initialize VALAR does not support VRS Tier 2VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates that eitherm_commandList
orm_valarBuffer
isnullptr
Intel::VALAR_ApplyMask
can also be used to provide a custom mask generated by the application. The mask does not need to be generated by Intel::VALAR_ComputeMask
it just needs to be a valid ID3D12Resource
with a format of DXGI_FORMAT_R8_UINT
. Applying the mask can also be done on any valid ID3D12GraphicsCommandList5
and each command list can apply a different mask if needed.
In some cases after a mask has been applied using Intel::VALAR_ApplyMask
it may be necessary to disable the mask before a specific render pass is rendered, such as post-processing or a GUI. In this case the application can call Intel::VALAR_ResetMask
. The Intel::VALAR_ResetMask
function requires a valid descriptor with m_commandList
to a valid ID3D12GraphicsCommandList5
.
valarDesc.m_enable = true;
valarDesc.m_commandList = commandList.Get();
// ...
// Apply VALAR Mask
// ...
// ...
// Render Frame
// ...
Intel::VALAR_RETURN_CODE retCode = Intel::VALAR_ResetMask(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
// ...
// Render GUI
// ...
Intel::VALAR_ResetMask
will return VALAR_RETURN_CODE_SUCCESSS
when called succesfully, otherwise the following error codes will be returned.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates theVALAR_DESCRIPTOR
was not properly initialized.VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that the device used to initialize the descriptor does not support VRS Tier 2VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates thatm_commandList
isnullptr
.
Before applying a VRS Tier 2 mask it is necessary to properly configure the VRS combiners to allow screen-space shading rates or per-draw shading rates to take precedent. The VALAR API provides a few helper functions to set a "Hero" Passthrough + Passthrough combiner, a screen-space Passthrough + Override combiner, or a custom combiner. Each function requires a valid ID3D12GraphicsCommandList5
pointer to be passed in the m_commandList
field of the VALAR descriptor. Optionally, these functions can also be used to provide a base VRS Tier 1 shading rate to render draw calls with by setting m_baseShadingRate
in the VALAR descriptor struct.
Intel::VALAR_SetScreenSpaceCombiners
is a Passthrough + Override combiner which allows the screen-space VRS Tier 2 shading rates to override the per-draw shading rates provided by RSSetShadingRate
. Draw calls drawn after applying the Screen-Space combiners will be drawn using the VRS Tier 2 shading rate.
valarDesc.m_baseShadingRate = VALAR_SHADING_RATE_1X1;
valarDesc.m_commandList = commandList;
retCode = Intel::VALAR_SetScreenSpaceCombiners(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
// ...
// Apply VALAR Mask
// ...
// ...
// Render Frame
// ...
Intel::VALAR_SetHeroCombiners
is a Passthrough + Passthrough combiner which will allow individual draw calls to take precedent over the VRS Tier 2 shading rate. These combiners can be used to preserve visual quality for "Hero" art assets or to fix visual corruption from shaders that don't work well with Variable Rate Shading.
valarDesc.m_baseShadingRate = VALAR_SHADING_RATE_2X2;
valarDesc.m_commandList = commandList;
// ...
// Apply VALAR Mask
// ...
// ...
// Render Frame With Screen Space Shading Rate
// ...
retCode = Intel::VALAR_SetHeroCombiners(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
// ...
// Render Frame "Hero" Assets
// ...
Intel::VALAR_SetCustomCombiners
allows the application to set a custom set of combiners if needed.
valarDesc.m_baseShadingRate = VALAR_SHADING_RATE_4X4;
valarDesc.m_commandList = commandList;
retCode = Intel::VALAR_SetCustomCombiners(valarDesc,
VALAR_SHADING_RATE_COMBINER_MIN,
VALAR_SHADING_RATE_COMBINER_MAX);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
If the call to any of these functions is successful a return code VALAR_RETURN_CODE_SUCCESS
is returned. Otherwise one of the following error codes will be returned.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates theVALAR_DESCRIPTOR
was not properly initialized.VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that the device used to initialize the descriptor does not support VRS Tier 1VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates thatm_commandList
isnullptr
.
The VALAR API provides a debug overlay that can be used to visualize the VALAR buffer at runtime to aid in debugging. The debug visualization works on both native resolution render targets as well as upscaled render targets. To use the debug overlay a VALAR_DESCRIPTOR
must be used with the m_debugOverlay
field set to true. The VALAR_DESCRIPTOR
must also have a valid ID3D12GraphicsCommandList5
pointer set for the m_commandList
field along with valid resources for the color buffer (m_colorBuffer
), VRS buffer (m_valarBuffer
), and SRV/UAV heap (m_uavHeap
).
The Debug Overlay requires a 2 slot UAV heap; UAV slot 0 contains the VRS Buffer to visualize, while UAV slot 2 will contain a native resolution color UAV or an upscaled color UAV to draw the overlay on. An upscaled width and height must always be passed in through the descriptor using m_upscaledWidth
and m_upscaledHeight
. When using a native resolution color buffer, set m_upscaledWidth
and m_upscaledHeight
equal to m_nativeWidth
and m_nativeHeight
respectively.
// ...
// Render Scene
// ...
// ...
// Generate VALAR Mask
// ...
// Render the Debug Overlay
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.Texture2DArray.PlaneSlice = 0;
uavDesc.Texture2DArray.FirstArraySlice = 0;
uavDesc.Texture2DArray.MipSlice = 0;
uavDesc.Texture2DArray.ArraySize = 1;
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_TEXTURE2DARRAY;
// Use UAV Slot 0 for VRS Buffer with format of DXGI_FORMAT_R8_UINT
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDebugHeap->GetCPUDescriptorHandleForHeapStart(), 0, uavDescriptorSize);
uavDesc.Format = DXGI_FORMAT_R8_UINT;
m_d3dDevice->CreateUnorderedAccessView(g_VRSTier2Buffer.GetResource(), nullptr, &uavDesc, uavHandle);
}
// Use UAV Slot 1 for Upscaled Color buffer or Native Color Buffer
if(upscaledOverlay)
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDebugHeap->GetCPUDescriptorHandleForHeapStart(), 1, uavDescriptorSize);
uavDesc.Format = g_UpscaledSceneColorBuffer.GetFormat();
m_d3dDevice->CreateUnorderedAccessView(g_UpscaledSceneColorBuffer.GetResource(), nullptr, &uavDesc, uavHandle);
valarDesc.m_upscaleHeight = g_UpscaledSceneColorBuffer.GetHeight();
valarDesc.m_upscaleWidth = g_UpscaledSceneColorBuffer.GetWidth();
}
else
{
CD3DX12_CPU_DESCRIPTOR_HANDLE uavHandle(m_valarDebugHeap->GetCPUDescriptorHandleForHeapStart(), 1, uavDescriptorSize);
uavDesc.Format = g_SceneColorBuffer.GetFormat();
m_d3dDevice->CreateUnorderedAccessView(g_SceneColorBuffer.GetResource(), nullptr, &uavDesc, uavHandle);
valarDesc.m_upscaleHeight = g_SceneColorBuffer.GetHeight();
valarDesc.m_upscaleWidth = g_SceneColorBuffer.GetWidth();
}
valarDesc.m_debugOverlay = true;
valarDesc.m_debugGrid = VRS::DebugDrawDrawGrid;
valarDesc.m_bufferHeight = Target.GetHeight();
valarDesc.m_bufferWidth = Target.GetWidth();
// Use the 2 slot Debug Heap
valarDesc.m_uavHeap = m_valarDebugHeap;
Intel::VALAR_RETURN_CODE retCode = Intel::VALAR_DebugOverlay(valarDesc);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
// ...
// Render GUI
// ...
Intel::VALAR_DebugOverlay
will return VALAR_RETURN_CODE_SUCCESS
when successfully called. Otherwise the following error codes may be returned on failure.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates thatIntel::VALAR_Initialize
function failed or was never called.VALAR_RETURN_CODE_INVALID_DEVICE
indicates that the opaque descriptors internal device is invalid.VALAR_RETURN_CODE_NOT_SUPPORTED
indicates that the device does not support VRS Tier 2VALAR_RETURN_CODE_INVALID_ARGUMENT
indicates that the Command List or UAV Heap is invalid.
When the application shuts down it is important to cleanup the internal resources required by the VALAR API. This can be done by calling the Intel::VALAR_Release
function with a valid VALAR_DESCRIPTOR
reference.
// Release Internal VALAR API Resources
Intel::VALAR_RETURN_CODE retCode = Intel::VALAR_Release(m_valarDescriptor);
assert(retCode == Intel::VALAR_RETURN_CODE_SUCCESS);
Intel::VALAR_Release
will return VALAR_RETURN_CODE_SUCCESS
when successfully called. Otherwise the following error codes may be returned on failure.
VALAR_RETURN_CODE_NOT_INITIALIZED
indicates thatIntel::VALAR_Initialize
function failed or was never called.
After calling Intel::VALAR_Release
the VALAR descriptor used will no longer be valid and cannot be used unless Intel::VALAR_Initialize
is called first.
Many thanks to Lei Yang and his paper on visually lossless motion adaptive shading in games, http://leiy.cc/publications/nas/nas-pacmcgit.pdf.
Thanks to Adam Lake, Aria Kraft, Dietmar Suoch, Daniele Pieroni, Meghan Weicht, Ani Alston, for co-authoring, reviewing and supporting this work as well as many others who helped along the way!
Many thanks to the developers of the following open-source libraries or projects that helped make this project possible.
- dear imgui (https://github.com/ocornut/imgui)
- AMD's AOFX (https://github.com/shadercoder/AOFX)
- fx-gltf by Jesse Yurkovich (https://github.com/jessey-git/fx-gltf)
- [Yang, 2019]: AS/NAS, Visually Lossless Content and Motion Adaptive Shading in Games, Lei Yang, 2019, http://leiy.cc/publications/nas/nas-pacmcgit.pdf
- [Weicht, Lake, Kraft, du Bois, 2022]: Velocity and Luminance Adaptive Rasterization Using VRS Tier 2, Weicht, Lake, Kraft, du Bois, 2022, https://www.intel.com/content/www/us/en/developer/articles/technical/velocity-luminance-adaptive-rasterization-vrs-tier.html
Sample and its code provided under MIT license, please see LICENSE. All third-party source code provided under their own respective and MIT-compatible Open Source licenses.
Copyright (C) 2024, Intel Corporation