Configuration flags for Linux Release

Overview

The Linux release build lets users enable selected configuration flags. The flags are available once the release build has been installed according to the instructions here. This file is autogenerated from igc_flags.h.

Important notice

Configuration flags are generally used either for debug purposes or to experimentally change the compiler's behavior. Intel does not guarantee full performance and conformance when using configuration flags.

How to enable a flag

A flag is enabled by setting it as an environment variable.

The syntax is as follows:

IGC_<flag>=<value>

For example, to enable the ShaderDumpEnable flag in a shell:

$ export IGC_ShaderDumpEnable=1
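
The same environment mechanism supports a few other common patterns. The sketch below assumes a POSIX-compatible shell (bash, dash, zsh); it sets a flag for a single command, exports several flags for the rest of a session, and removes a flag again. The path /tmp/igc_dumps is an arbitrary example, and `sh -c` only stands in for a real application that loads IGC.

```shell
# Set a flag for one command only: the child process sees it,
# but it is not kept in the current shell.
IGC_ShaderDumpEnable=1 sh -c 'echo "child sees IGC_ShaderDumpEnable=$IGC_ShaderDumpEnable"'

# Set several flags for every later command in this shell session.
export IGC_ShaderDumpEnable=1
export IGC_DumpToCustomDir=/tmp/igc_dumps   # example path (assumption)
# DumpToCustomDir requires the parent directory to exist; creating the
# target directory up front is the simplest way to satisfy that.
mkdir -p /tmp/igc_dumps

# List the IGC flags currently exported.
env | grep '^IGC_'

# Turn a flag off again by removing it from the environment.
unset IGC_ShaderDumpEnable
```

Exporting is convenient for interactive debugging; the single-command form avoids leaving flags set for unrelated programs started later from the same shell.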

VISA optimization

Flag Description Release builds
AssumeUniformIndirectCall Assume indirect call is uniform to avoid looping code -
AvoidDstSrcGRFOverlap avoid GRF overlap for destination and source operands of an SIMD16/SIMD32 instruction -
AvoidSrc1Src2Overlap avoid src1 and src2 GRF overlap to avoid the conflict without read suppression -
CSSIMD16_SpillThreshold Percentage of instructions allowed for spilling on CS SIMD16 -
CSSIMD32_SpillThreshold Percentage of instructions allowed for spilling on CS SIMD32 -
DPASTokenReduction optimization to reduce the tokens used for DPAS instruction. Available
DisableCSEL disable csel peep-hole -
DisableFlagOpt Disable optimization cmp with logic op -
DisableGatherRSFusionSyncWA Disable WA for gather instruction when read suppression and EU fusion are enabled. Available
DisableHFMath Disables HF math instructions. -
DisableIfCvt Disable ifcvt -
DisableMixMode Disables mix mode in vISA BE. -
DisableRegDistDep Disable regDist dependence Available
DisableSendS Setting this to 1/true adds a compiler switch to not generate sends commands, default is to enable sends -
DisableThreeALUPipes Disable three ALU Pipelines. XeHP only Available
DisableWriteCombine Disable write combine. PVC+ only -
DumpASMToConsole Dump ASM to console and do early exit Available
DumpPromoteI8 Dump useful info during promoting i8 to i16 Available
DumpVISAASMToConsole Dump VISAASM to console and do early exit Available
Enable16DWURBWrite Enable 16 Dword URB Write messages Available
Enable16OWSLMBlockRW Enable 16 OWord (8 GRF) SLM block read/write message Available
Enable64BMediaBlockRW Enable 64 byte wide media block read/write message Available
EnableAdd3 Enable Add3. XeHP+ only Available
EnableAtomicFusion To enable/disable atomic send fusion (simd8 shaders). Valid if EnableSendFusion is on. -
EnableBCR Enable bank conflict reduction. Available
EnableBfn Enable Bfn. XeHP+ only Available
EnableCallUniform [tmp, testing] Ignore indirect call's uniform Available
EnableCallWA Control call WA when EU fusion is on. 0: off; 1: on Available
EnableCoalesceScalarMoves Enable scalar moves to be coalesced into fewer moves Available
EnableForceDebugSWSB Enable force debugging functionality for software scoreboard generation Available
EnableGroupScheduleForBC Enable bank conflict reduction in scheduling. Available
EnableHWGenerateThreadID Enable new behavior of HW generating threadID for GPGPU pipe. XeHP and non-OCL only. Available
EnableHWGenerateThreadIDForTileY Enable HW generating threadID for GPGPU pipe for TileY mode. XeHP and non-OCL only. Available
EnableIGAEncoder Enable VISA IGA encoder -
EnableIGASWSB Use IGA for SWSB Available
EnableMathDPASWA PVC math instruction running with DPAS issue -
EnableNonOCLWalkOrderSel Enable WalkOrder selection for HW generating threadID for GPGPU pipe. XeHP and non-OCL only. Available
EnablePassInlineData 1: Force pass 1st GRF of cross-thread payload as inline data; -1: Force disable passing inline data Available
EnablePreemption Enable generating preemptable code (SKL+) -
EnablePromoteI8 Enable promoting i8 (char) to i16 on all ALU insts that do support i8. It's only for XeHPC+ for now. Available
EnablePromoteI8Vec Control if a certain i8 vector needs to be promoted (detail in code) Available
EnablePvtMemHalfToFloat Enable conversion from half to float for private memory. Available
EnableRemoveLoopDependency Enable removing the phantom loop dependency introduced by SROA Available
EnableQWRotateInstructions Enable QW type support for rotate instructions. PVC only. Available
EnableQuickTokenAlloc Insert dependence resolve for kernel stitching Available
EnableSWSBInstStall Enable a forced stall at a specific (start) instruction for software scoreboard generation Available
EnableSWSBInstStallEnd Enable force stall to end instruction for software scoreboard generation Available
EnableSWSBStitch Insert dependence resolve for kernel stitching Available
EnableSWSBTokenBarrier Enable force specific instruction as a barrier for software scoreboard generation Available
EnableSendFusion Enable(!=0)/disable(0)/force(2) send fusion. Valid for simd8 shader/kernel only. -
EnableSeparateScratchWA Apply the workaround in slot0 and slot1 sizes when separating scratch spaces. Available
EnableSpillSpaceCompression Enable spill space compression. 0 - off, 1 - on, 2 - platform default -
EnableUntypedSurfRWofSS Enable untyped surface RW to scratch space. XeHP A0 only. Available
EnableVISABinary Enable VISA Binary Available
EnableVISABoundsChecking Enable VISA bounds checking. -
EnableVISADebug Runs VISA in debug mode, all optimizations disabled -
EnableVISADotAll Enable VISA DotAll. Dumps dot files for intermediate stages -
EnableVISADumpCommonISA Enable VISA Dump Common ISA Available
EnableVISAJmpi Enable/Disable VISA generating jmpi (scalar jump). -
EnableVISANoBXMLEncoder Enable VISA No-BXML encoder -
EnableVISANoSchedule Enable VISA No-Schedule Available
EnableVISAOutput Enable VISA GenISA output Available
EnableVISAPreSched Enable VISA Pre-RA Scheduler Available
EnableVISASlowpath Enable VISA Slowpath. Needed to dump .visaasm Available
EnableVISAStructurizer Enable/Disable VISA structurizer. See value defs in igc_flags.hpp. -
ExpandPlane Enable pln to mad macro expansion. -
Force32bitConstantGEPLowering Go back to old version of GEP lowering for constant address space. PVC only -
ForceAllowSmallSpill Allow small spills regardless of SIMD, API, or platform. The spill amount is set below -
ForceBCR Force bank conflict reduction, no matter spill or not. Available
ForceHWThreadNumberPerEU Total HW thread number per-EU. -
ForceInlineDataForXeHPC Force InlineData for XeHPC. For testing purposes. Available
ForceNoMaskWA [tmp, testing] Force NoMaskWA on any platforms -
ForcePreemptionWA Force generating preemptable code across platforms Available
ForcePreserveR0 Setting this to true makes VISA preserve r0 in r0 Available
ForcePromoteI8 Force promoting i8 (char) to i16 on all ALU insts (for testing). Available
ForceSubReturn If a subroutine does not have a return, generate a dummy return if this key is set (to meet visa requirement) -
ForceTexelMaskClear If set to 1 or 2, forces evaluate messages to clear the texel mask to 0 or 1, respectively. Available
ForceUniformBuffer Force buffer operand to be uniform -
ForceUniformSurfaceSampler Force surface and sampler operand to be uniform -
ForceVISAPreSched Force enabling of VISA Pre-RA Scheduler -
ForceVISAStructurizer Force VISA structurizer for testing. Used on platforms on which SCF is turned off and UCF is used by default -
GlobalSendVarSplit Enable global send variable splitting when we are about to spill -
NewSpillCostFunction Use new spill cost function in VISA RA -
NoMaskWA Enable NoMask WA by using software-computed emask flag -
ReplaceIndirectCallWithJmpi Replace indirect call with jmpi instruction (HW WA) Available
ReservedRegisterNum Reserve register number for spill cost testing. -
SIMD16_SpillThreshold Percentage of instructions allowed for spilling on SIMD16 -
SIMD32_SpillThreshold Percentage of instructions allowed for spilling on SIMD32 -
SIMD8_SpillThreshold Percentage of instructions allowed for spilling on SIMD8 -
SWSBMakeLocalWAR make WAR SBID dependence tracking BB local Available
SWSBTokenNum Total tokens used for SWSB. Available
ScratchSpaceSizeLimit Size limit of scratch space. XeHP and above only. Test only. Remove it once stabilized. Available
ScratchSpaceSizeReserved Reserved size of scratch space. XeHP and above only. Test only. Remove it once stabilized. Available
SeparateSpillPvtScratchSpace Separate scratch spaces for spill/fill and private memory. XeHP and above only. Test only. Remove it once stabilized. Available
SetA0toTdrForSendc Set A0 to tdr0 before each sendc/sendsc Available
SpillCompressionThresholdOverride Set a threshold number (1K based) to run with spill compression -
TotalGRFNum Total GRF setting for both IGC-LLVM and vISA -
TotalGRFNum4CS Total GRF setting for both IGC-LLVM and vISA, for ComputeShader-only experiment. -
UnifiedSendCycle Using unified send cycle. -
Use16ByteBindlessSampler True if 16-byte aligned bindless sampler state is used -
UseLinearScanRA use Linear Scan as default register allocation algorithm -
UseMathWithLUT Use the implementations of cos, cospi, log, sin, sincos, and sinpi with Look-Up Tables (LUT). -
VISALTO vISA LTO optimization flags. Check LINKER_TYPE for more details -
VISAOptions Options to vISA. Space-separated options. Available
VISAPostScheduleEndBBID The ID of BB which will be last scheduled -
VISAPostScheduleStartBBID The ID of BB which will be first scheduled -
VISAPreSchedCtrl Configure Pre-RA Scheduler, default(0), logging(1), latency(2), pressure(4) -
VISAPreSchedExtraGRF Bump up GRF number to make pre-RA Scheduling more greedy, 0 for the default -
VISAPreSchedRPThreshold Threshold to commit a pre-RA Scheduling without spills, 0 for the default -
VISAScheduleEndBBID The ID of BB which will be last scheduled -
VISAScheduleStartBBID The ID of BB which will be first scheduled -
WARSWSBLocalEnd WAR localization end BB Available
WARSWSBLocalStart WAR localization start BB Available
disableCompaction Disables compaction. Available
disableIGASyntax Disables GEN isa text output using IGA and new syntax. -
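
Some flags in the table above take small encoded integers rather than a plain on/off value. The sketch below shows how such a value might be composed in the shell. It assumes that the non-zero VISAPreSchedCtrl values listed in its description (logging(1), latency(2), pressure(4)) are independent bits that can be combined with a bitwise OR; the powers of two suggest this, but the description does not state it.

```shell
# Values listed in the VISAPreSchedCtrl description.
# Assumption: the non-zero values are independent bits that can be OR-ed.
PRESCHED_LOGGING=1
PRESCHED_LATENCY=2   # listed for completeness; not used below
PRESCHED_PRESSURE=4

# Request logging together with the pressure mode: 1 | 4 = 5.
export IGC_VISAPreSchedCtrl=$(( PRESCHED_LOGGING | PRESCHED_PRESSURE ))
echo "IGC_VISAPreSchedCtrl=$IGC_VISAPreSchedCtrl"

# Tri-state flags such as EnableSpillSpaceCompression are not bit masks:
# 0 = off, 1 = on, 2 = platform default.
export IGC_EnableSpillSpaceCompression=0
```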

IGC Optimization

Flag Description Release builds
AllowMem2Reg Setting this to true makes IGC run mem2reg even when optimizations are disabled Available
BlockPushConstantGRFThreshold Set the maximum limit for block push constants, i.e. UBO data pushed. Set to 0xFFFFFFFF to use the default threshold for the platform. Note that for small pixel shaders the PayloadSizeThreshold may be the limiting factor. -
CodeLoopSinkingMinSize Don't sink in the loop if the number of instructions in the kernel is less than this value -
CodeSinkingLoadSchedulingInstr Number of instructions by which loads are scheduled ahead of their use to cover latency. 1 inserts the load immediately before its use -
CodeSinkingMinSize Don't sink if the number of instructions in the kernel is less than this value -
DisableAttributePush Bit mask to disable push Attribute per shader stages. bit0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS -
DisableBranchSwaping Setting this to 1/true adds a compiler switch to disable branch swapping. -
DisableCodeHoisting Setting this to 1/true adds a compiler switch to disable code-hoisting -
DisableCodeSinking Setting this to 1/true adds a compiler switch to disable code-sinking -
DisableCodeSinkingInputVec Setting this to 1/true disable sinking inputVec inst (test) -
DisableConstBaseGlobalBaseArg Do not generate kernel implicit arguments: constBase and globalBase -
DisableConstantCoalescing Setting this to 1/true adds a compiler switch to disable constant coalescing -
DisableConstantCoalescingOfStatefulNonUniformLoads Disable merging non-uniform loads from stateful buffers. Note: does not affect merging to sampler loads -
DisableConstantCoalescingOutOfBoundsCheck Setting this to 1/true adds a compiler switch to disable the constant coalescing out-of-bounds check -
DisableCustomUnsafeOpt Disable IGC to run custom unsafe optimizations -
DisableDX9LowPrecision Disables HF in DX9. -
DisableDotAddToDp4aMerge Disable Dot and Add ops to Dp4a merge optimization. -
DisableDynamicResInfoFolding Disable Dynamic ResInfo Instruction Folding -
DisableDynamicTextureFolding Disable Dynamic Texture Folding -
DisableEmptyBlockRemoval Setting this to 1/true adds a compiler switch to disable empty block optimization -
DisableFDivReassociation Disable reassociation for Fdiv operations to avoid precision difference -
DisableFlattenSmallSwitch Disable the flatten small switch pass -
DisableGatingSimilarSamples Disable Gating of similar sample instructions -
DisableIGCOptimizations Setting this to 1/true adds a compiler switch to disable all the above IGC optimizations -
DisableIPConstantPropagation Disable inter-procedural constant propagation -
DisableIRVerification Setting this to 1/true adds a compiler switch to disable IGC IR verification. -
DisableImmConstantOpt Disable IGC IndirectICBPropagation optimization -
DisableLLVMGenericOptimizations Disable LLVM generic optimization passes -
DisableLoadSinking Setting this to 1/true adds a compiler switch to disable load sinking during retry -
DisableLoopSink Disable sinking in all loops -
DisableLoopSplitWidePHIs Disable splitting of loop PHI values to eliminate subvector extract operations -
DisableLoopUnroll Setting this to 1/true adds a compiler switch to disable loop unrolling. Available
DisableMCSOpt Disable IGC to run MCS optimization -
DisableMatchFloor Setting this to 1/true adds a compiler switch to disable sub-frc = floor optimization -
DisableMatchMad Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization -
DisableMatchPow Setting this to 1/true adds a compiler switch to disable log2/mul/exp2 = pow optimization -
DisableMatchPredAdd Setting this to 1/true adds a compiler switch to disable pred+add = predAdd optimization -
DisableMatchSimpleAdd Setting this to 1/true adds a compiler switch to disable simple cmp+and+add optimization -
DisableMovingInstanceIDIndexOfVS Disable moving index of InstanceID in VS to last location. -
DisablePayloadCoalescing Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for all types -
DisablePayloadCoalescing_AtomicTyped Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for atomic typed only -
DisablePayloadCoalescing_RT Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for RT only -
DisablePayloadCoalescing_Sample Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for Samplers only -
DisablePayloadCoalescing_URB Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for URB writes only -
DisablePromotePrivMem Setting this to 1/true adds a compiler switch to disable IGC private array promotion -
DisablePullConstantHeuristics Disable the heuristics that determine the number of push constants based on payload size. -
DisablePushConstant Bit mask to disable push constant per shader stages. bit0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS, Bit 5 = PS -
DisableRectListOpt Disable Rect List optimization -
DisableReducePow Disable IGC to reduce pow instructions -
DisableSIMD32Slicing Setting this to 1/true adds a compiler switch to disable emitting SIMD32 VISA code in slices -
DisableSimplePushWithDynamicUniformBuffers Disable Simple Push Constants Optimization for dynamic uniform buffers. -
DisableSqrtOpt Prevent IGC from doing the optimization y*y = x if y = sqrt(x) -
DisableStaticCheck Disable static check to push constants. -
DisableStaticCheckForConstantFolding Disable static check to fold constants. -
DisableSynchronizationObjectCoalescingPass Disable SynchronizationObjectCoalescing pass -
DisableURBPartialWritesPass Disable IGC pass that converts URB partial writes to full-mask writes. -
DisableURBReadMerge Disable IGC pass that merges URB Read instructions. -
DisableURBWriteMerge Setting this to 1/true adds a compiler switch to disable URB write merge -
DisableUniformAnalysis Setting this to 1/true adds a compiler switch to disable uniform_analysis -
DisableUniformTypedAccess Setting this will disable uniform typed access handling -
DisableUniformURBWrite Disables generation of uniform URB write messages -
EnableAtomicBranch Enable the atomic branch optimization, which breaks an atomic into if/else. 1: if Val == 0, ignore iadd/sub/umax of 0. 2: check whether memory is lower than Val before doing umax. 3: apply both, 1 for iadd/sub and 2 for umax -
EnableBitcastedLoadNarrowing Enable narrowing of vector loads in bitcasts patterns. -
EnableBitcastedLoadNarrowingToScalar Enable narrowing of vector loads to scalar ones in bitcasts patterns. -
EnableBlendToDiscard Enable blend to discard based on blend state. -
EnableBlendToFill Enable blend to fill based on blend state. -
EnableCodeAssumption If set (> 0), generate llvm.assume to help certain optimizations. It is OCL only for now. Only 1 and 2 are valid; 2 is 1 plus additional assumptions. It also makes other minor changes. -
EnableCustomLoopVersioning Enable IGC to do custom loop versioning -
EnableDeSSA Setting this to 0/false adds a compiler switch to disable De-SSA -
EnableDeSSAWA [tmp] Keep some code to avoid a perf regression -
EnableExtractCommonMultiplier Enable ExtractCommonMultiplier optimization in CustomUnsafeOptPass. -
EnableFastMath Enable fast math optimizations in IGC -
EnableFastSampleD Enable fast sample D opt. -
EnableGEPLSR Enables GEP Loop Strength Reduction pass -
EnableGEPLSRAnyIntBitWidth Experimental: Enables reduction of SCEV with illegal integers. Requires legalization pass to clear up expanded code. Available
EnableGEPLSRToPreheader Enables reduction to loop's preheader in GEP Loop Strength Reduction pass -
EnableGVN Enable LLVM global value numbering -
EnableGenUpdateCB Enable derived constant optimization. -
EnableGenUpdateCBResInfo Enable derived constant optimization with resinfo. -
EnableHighestSIMDForNoSpill When there is no spill choose highest SIMD (compute shader only). -
EnableHoistDp3 Enable dp3 Hoisting. -
EnableHoistMulInLoop Hoist multiplies with loop invariants out of the loop, FP unsafe -
EnableIndependentSharedMemoryFenceFunctionality Enable treating global memory fences as shared memory fences in SynchronizationObjectCoalescing pass -
EnableIntegerMad Setting this to 1/true adds a compiler switch to enable integer mul+add = mad optimization -
EnableJumpThreading Setting this to 1/true adds a compiler switch to enable llvm jumpThreading pass. Available
EnableLSCFence Enable LSC Fence in ConvertDXIL for the device has LSC -
EnableLoadChainLoopSink Allow sinking of load address calculations when the load was sunk into the loop, even if the needed register pressure is already achieved (single-use instructions only) -
EnableLoadsLoopSink Allow sinking of loads in the loop -
EnableLogicalAndToBranch Enable convert logical AND to conditional branch -
EnableLoopHoistConstant Enables pass to check for specific loop patterns where variables are constant across all but the last iteration, and hoist them out of the loop. -
EnableNewTileYCheck Enable new TileY check. 0 - off, 1 - on, 2 - platform default -
EnableOptReportLoadNarrowing Generate opt report for narrowing of vector loads. -
EnablePingPongTextureOpt Enables the Ping Pong texture optimization which is used only for Compute Shaders for back to back dispatches -
EnablePlatformFenceOpt Force fence optimization -
EnablePowToLogMulExp Enable pow to exp(log(x)*y) optimization in CustomUnsafeOptPass. -
EnableRobustBufferAccessPush Setting to 1/true will allow a single push buffer to be supported when the client requests robust buffer access (DG2+ only) -
EnableSLMConstProp Enable SLM constant propagation (compute shader only). -
EnableSamplerChannelReturn Setting this to 1/true adds a compiler switch to enable using header to return selective channels from sampler -
EnableSimplePushSizeBasedOpimization Enable the simplepush optimization to do push based on size -
EnableSimplifyGEP Enable IGC to simplify indices expr of GEP. -
EnableSoftwareStencil Enable software stencil for PS. -
EnableSoftwareVertexFetch Enable software vertex fetch for VS. -
EnableSplitIndirectEEtoSel Enable the split indirect extractelement to icmp+sel pass -
EnableSplitUnalignedVector Enable Splitting of unaligned vectors for loads and stores -
EnableStatefulAtomic Enable promoting stateless atomic to stateful atomic. -
EnableStatefulToken Enable generating patch token to indicate a ptr argument is fully converted to stateful (temporary) -
EnableStatelessToStateful Enable Stateless To Stateful transformation for global and constant address space in OpenCL kernels -
EnableSumFractions Enable SumFractions optimization in CustomUnsafeOptPass. -
EnableTextureLoadCoalescing Enable merging non-uniform loads from bindless textures -
EnableThreadCombiningOpt Enables the thread combining optimization which is used only for Compute Shaders for combining a number of software threads to dispatch smaller number of hardware threads -
EnableThreeWayLoadSpiltOpt Enable the three-way load split optimization. -
EnableTrigFuncRangeReduction Reduce the domain range of the sin and cos functions Available
EnableUnmaskedFunctions Enable the unmasked functions SYCL feature. Available
EnableWaveForce32 Force Wave to use simd32 -
EnableWorkGroupUniformGoto Setting to 1 enables generating uniform goto for work group uniform [eu fusion only] -
FPRoundingModeCoalescingMaxDistance Max distance in instructions for reordering FP instructions with common rounding mode -
ForceAddressArithSinking Force sinking address arithmetic closer to the usage -
ForceHoistDp3 force dp3 Hoisting. -
ForceLinearWalkOnLinearUAV Force linear walk on linear UAV buffer -
ForceLoadsLoopSink Force sinking of loads in the loop from the beginning -
ForceLoopSink Force sinking in all loops -
ForceSupportsAutoGRFSelection ForceSupportsAutoGRFSelection Available
ForceSupportsStaticRegSharing ForceSupportsStaticRegSharing Available
ForceTileY Force TileY mode on DG2 -
GEPLSRThresholdRatio Ratio for register pressure threshold in GEP Loop Strength Reduction pass -
KeepTileYForFlattened Keep TileY for FlattenedThreadIdInGroup. 0 - off, 1 - on, 2 - platform default -
LLVMCommandLine applies LLVM command line -
LoopSinkMinSave If loop sinking can save more 32-bit values than this minimum, do it; otherwise, skip -
LoopSinkMinSaveUniform If loop sinking can save more scalar (uniform) values than this minimum, do it; otherwise, skip -
LoopSinkRegpressureMargin Sink into the loop until the pressure becomes less than #grf-margin -
LoopSinkRollbackThreshold Rollback loop sinking if the estimated regpressure after the sinking is still higher than this + #available registers, and the number of registers can be increased -
LoopSinkThresholdDelta Do loop sinking if the estimated register pressure is higher than this + #available registers -
MaxImmConstantSizePushed Set the max size of immediate constant buffer pushed -
PSSIMD32HeuristicFP16 enable PS SIMD32 heuristic based on fp16 characteristic -
PSSIMD32HeuristicLoopAndDiscard enable PS SIMD32 heuristic based on loop info and discard -
PayloadSizeThreshold Set the max payload size threshold for short shaders that have a PSD bottleneck. -
PrepopulateLoadChainLoopSink Check the loop for loop chains before sinking to use the existing chains in a heuristic -
RovOpt Bitmask for ROV optimizations. 0 for all off, 1 for force fence flush none, 2 for setting LSC_L1UC_L3C_WB, 3 for both opt on -
RuntimeLoopUnrolling Switch runtime loop unrolling on or off. 0: default (on), 1: force on, 2: force off -
SelectiveHashOptions applies options to hash range via string -
SetBranchSwapThreshold Set the branch swapping threshold. -
SetDefaultTileYWalk Use TileY walk as default for HW generating threadID Available
SetLoopUnrollThreshold Set the loop unroll threshold. Value 0 will use the default threshold. -
SetLoopUnrollThresholdForHighRegPressure Set the loop unroll threshold for shaders with high reg pressure. Value 0 will use the default threshold. -
SetRegisterPressureThresholdForLoopUnroll Set the register pressure threshold for limiting the loop unroll to smaller loops -
SetURBFullWriteGranularity Overrides the minimum access granularity for URB full writes. Valid values are 0, 16 and 32; value 0 means use the default for the platform. Available
SplitIndirectEEtoSelThreshold Split indirect extractelement cost threshold -
SynchronizationObjectCoalescingConfig Modify the default behavior of SynchronizationObjectCoalescing. The value is a bitmask; bit0 removes fences in the read-barrier-write scenario Available
UseHDCTypedReadForAllTextures Setting this uses an HDC message rather than sampler ld for texture reads -
UseHDCTypedReadForAllTypedBuffers Setting this uses an HDC message rather than sampler ld for buffer reads -
UseTiledCSThreadOrder Use 4x4 dispatch for CS order when it seems beneficial -
WaAllowMatchMadOptimizationforVS Setting this to 1/true adds a compiler switch to enable mul+add = mad optimization for VS -
WaDisableMatchMadOptimizationForCS Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization for CS -
forceFullUrbWriteMask Set Full URB write mask. -
forcePushConstantMode set the push constant mode, 0 is default behavior, 1 is simple push, 2 is gather constant, 3 is none/pull constants -
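
DisablePushConstant (and, with the same layout minus the PS bit, DisableAttributePush) is a per-stage bit mask. The sketch below computes a mask that disables push constants for the vertex and pixel shader stages only. It passes the value in decimal, because the description does not say whether hexadecimal input is accepted for this flag; the STAGE_* variable names are illustrative only.

```shell
# Bit layout from the DisablePushConstant description:
# bit 0 = all stages, bit 1 = VS, bit 2 = HS, bit 3 = DS, bit 4 = GS, bit 5 = PS
STAGE_ALL=$(( 1 << 0 ))
STAGE_VS=$(( 1 << 1 ))
STAGE_HS=$(( 1 << 2 ))
STAGE_DS=$(( 1 << 3 ))
STAGE_GS=$(( 1 << 4 ))
STAGE_PS=$(( 1 << 5 ))

# Disable push constants for the VS and PS stages only: 2 | 32 = 34 (0x22).
mask=$(( STAGE_VS | STAGE_PS ))
export IGC_DisablePushConstant="$mask"
printf 'IGC_DisablePushConstant=%d (0x%X)\n' "$mask" "$mask"

# Setting STAGE_ALL (bit 0) instead would disable push constants for all stages.
```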

Shader debugging

Flag Description Release builds
CompileOneAtTime Compile only one kernel (out of many in llvm::module) at a time. Prints compiled kernel names to stdout. Useful for debugging compilation time and crashes; it does not produce a valid binary. -
CopyA0ToDBG0 Copy a0 used for extended msg descriptor to dbg0 to help debug -
DPASReadSuppressionWA Enable read suppression WA for the send and indirect access -
DebugInternalSwitch Code pass selection, debug only -
DisablePassToggles Disable each IGC pass by setting the bit. HEXADECIMAL ONLY!. Ex: C0 is to disable pass 6 and pass 7. -
DisableSendSrcDstOverlapWA Disable Send Source/destination overlap WA which is enabled for GEN10/GEN11 and whenever Wddm2Svm is set in WATable -
DumpPayloadToScratch Setting this to 1/true dumps the thread payload to scratch space. Used for workloads that don't use scratch space for other purposes -
EnableBitcastExtractInsertPattern Enable BitcastExtractInsertPattern in CustomSafeOptPass. Available
EnableCSSIMD32 Enable compute shader SIMD32 mode, and fall back to a lower SIMD when spilling -
EnableDebugging Enable shader debugging for release internal -
EnableDivergentBarrierCheck Uses WIAnalysis to find barriers in divergent flow control. May have false positives. -
EnableHashMovsAtPrologue Rather than after EOT, insert hash code movs at shader entry Available
EnableLSCFenceUGMBeforeEOT Enable inserting fence.ugm.06.tile before EOT if a kernel has any write to UGM [XeHPC, PVC]. Available
EnableOptionalBufferOffset For StatelessToStateful optimization [OCL], if true, make buffer offset optional. Valid only if buffer offset is supported. Available
EnableRTLSCFenceUGMBeforeEOT [tmp] Enable inserting fence.ugm.06.tile before EOT for RT shader [XeHPC, PVC]. -
EnableRTmaskPso Enable render target mask optimization in PSO opt -
EnableSIPOverride This key forces loading of SIP from a local file. -
EnableSupportBufferOffset [debugging] For StatelessToStateful optimization [OCL], support implicit buffer offset argument (same as -cl-intel-has-buffer-offset-arg). -
EnableTestIGCBuiltin Enable testing igc builtin (precompiled kernels) using OCL. -
EnableTrivialEmulateSinCos Enable Emulation for Sine and Cosine instructions -
EnableZeroSomeARF If set, insert mov inst to zero a0, acc, etc to assist HW debugging. -
EnablerReadSuppressionWA Enable read suppression WA for the send and indirect access -
ForceCSLeastSIMD Force compute shader to the lowest allowed SIMD mode -
ForceCSSIMD16 Force compute shader SIMD16 mode if allowed, otherwise it will use SIMD32 -
ForceCSSIMD32 Force compute shader SIMD32 mode -
ForceDisableShaderDebugHashCodeInKernel Disable hash code addition to the binary after EOT Available
ForceEmuKind Force emuKind used by PreCompiledFuncImport pass. This flag takes emulation kind value that is defined in EmuKind enum in PreCompiledFuncImport.hpp [TEST ONLY] -
ForceFunctionsToNop Replace functions with immediate return to help narrow down shaders; use with Options.txt. -
ForceLoosenSimd32Occu Control loosenSimd32occu return value. 0 - off, 1 - on, 2 - platform default -
ForceMemoryFenceBeforeEOT Forces inserting an SLM or global memory fence before EOT if the shader writes to SLM or global memory, respectively. -
ForcePerThreadPrivateMemorySize Useful for ensuring a certain amount of private memory when doing a shader override. Available
ForceStatelessForQueueT In OCL, force to use stateless memory to hold queue_t*. This is a legacy feature to be removed. -
ForceRecompilation Force RetryManager to make recompilation. -
MSAAClearedKernel Insert the discard code for MSAA_MSC_Cleared kernels. 2/4/8/16 -
PrintVerboseGenericControlFlowLog Forces compiler to print detailed log about additional control flow generated due to a presence of generic memory operations Available
RetryManagerFirstStateId For debugging purposes, it can be useful to start on a particular id rather than id 0. -
RouteByLodHint An integer offset addon to route the resource to HDC on DG2 -
SIPOverrideFilePath When enabled together with EnableSIPOverride, this key loads SIP from the specified path. -
SToSProducesPositivePointer This key is for the StatelessToStateful optimization, when the user knows that the pointer offset relative to the kernel argument is positive. -
ShaderDebugHashCode The driver will set a breakpoint in the first instruction of the shader that has the provided hash code. It works only when the value is different from 0 and SystemThreadEnable is set to TRUE. For compatibility reasons, only the low part needs to be entered in the registry, i.e. the lower 8 hex digits of the 16-digit hash code. Ex: for VS_asm2df26246434553ad_nos0000000000000000, enter 0x434553ad -
ShaderDebugHashCodeInKernel Add hash code to the binary Available
ShaderDisableOptPassesAfter Will only run first N optimization passes, any further passes will be ignored. This flag can be used to bisect optimization passes. -
ShaderDisplayAllPassesNames Display to console all passes name with their ID and occurrence number. -
ShaderOverride Will override any LLVM shader with matching name in c:\Intel\IGC\ShaderOverride -
ShaderPassDisable Disable specific passes, e.g. '9;17-19;239-;Error Check;ResolveOCLAtomics:2;Dead Code Elimination:3-5;BreakConstantExprPass:7-' disables pass 9, passes 17 to 19, all passes after 238, all occurrences of pass Error Check, the second occurrence of ResolveOCLAtomics, occurrences 3 to 5 of pass Dead Code Elimination, and all occurrences of BreakConstantExprPass after the 6th. To show a list of pass names and their occurrences, set ShaderDisplayAllPassesNames. Must be used with the ShaderDumpEnableAll flag. -
SystemThreadEnable This key forces software to create a system thread. The system thread may still be created by software even if this control is set to false. The system thread is invoked if either the software requires exception handling or kernel debugging is active and a breakpoint is hit. -
TestIGCPreCompiledFunctions Enable testing for precompiled kernels. [TEST ONLY] -
ld2dmsInstsClubbingThreshold Do not club more than this many ld2dms instructions into the new BB during MCSOpt -
manualEnableRSWA Enable read suppression WA for the send and indirect access -
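
Several debugging flags in the table above expect values in a specific textual format. The sketch below derives them from the examples given in the descriptions: the hexadecimal DisablePassToggles mask for passes 6 and 7, a quoted ShaderPassDisable list (an unquoted `;` would end the shell command), and the low 8 hex digits of a ShaderDebugHashCode. The hash is the one quoted in the ShaderDebugHashCode description; the `<stage>_asm<hash>_nos<...>` name layout and the chosen pass list are assumptions made for illustration.

```shell
# DisablePassToggles: bit N disables pass N, and the value is read as hex.
# Passes 6 and 7 -> bits 6 and 7 -> 64 + 128 = 192 = 0xC0.
toggles=$(printf '%X' $(( (1 << 6) | (1 << 7) )))
export IGC_DisablePassToggles="$toggles"   # "C0", matching the table's example

# ShaderPassDisable: quote the value, since ';' would otherwise end the
# shell command and pass names may contain spaces.
export IGC_ShaderPassDisable='9;17-19;Dead Code Elimination:3-5'
# Per the ShaderPassDisable description, use it with ShaderDumpEnableAll.
export IGC_ShaderDumpEnableAll=1

# ShaderDebugHashCode: keep only the low 8 hex digits of the 16-digit hash.
# Assumption: the dump name has the form <stage>_asm<hash>_nos<...>.
name=VS_asm2df26246434553ad_nos0000000000000000
hash=${name#*_asm}        # strip everything up to and including "_asm"
hash=${hash%%_*}          # strip "_nos..." -> 2df26246434553ad
low=${hash#????????}      # drop the first 8 digits -> 434553ad
export IGC_ShaderDebugHashCode="0x$low"
# The breakpoint only takes effect when SystemThreadEnable is set.
export IGC_SystemThreadEnable=1
```

Only POSIX parameter expansion is used here, so the snippet does not depend on bash-specific substring syntax.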

Shader dumping

Flag Description Release builds
AddExtraIntfInfo Will add extra interference info from .extraintf files from c:\Intel\IGC\ShaderOverride -
DebugDumpNamePrefix Set a prefix for debug info dump filenames (with path) and drop hash info from them (for testing purposes) Available
DumpDeSSA dump DeSSA info into file. Available
DumpHasNonKernelArgLdSt Print if hasNonKernelArg load/store to stderr Available
DumpLLVMIR dump LLVM IR Available
DumpLoopSink Dump debug info in LoopSink -
DumpOCLProgramInfo dump OpenCL Patch Tokens, Kernel/Program Binary Header Available
DumpPatchTokens Enable dumping of patch tokens. Available
DumpResourceLoop dump resource loop detected by ResourceLoopAnalysis Available
DumpTimeStats Timing of translation, code generation, finalizer, etc Available
DumpTimeStatsCoarse Only collect/dump coarse level time stats, i.e. skip opt detail timer for now Available
DumpTimeStatsPerPass Collect Timing of IGC/LLVM passes Available
DumpToCurrentDir dump shaders to the current directory Available
DumpToCustomDir Dump shaders to custom directory. Parent directory must exist. Available
DumpUseShorterName If set, use an internal shader name(_entry_id) in dump file name Available
DumpVariableAlias Dump variable alias info, valid if EnableVariableAlias is on Available
DumpWIA Dump WI (uniform) information into files in the dump directory if set to true -
DumpZEInfoToConsole Dump zeinfo to console Available
ElfDumpEnable dump ELF file Available
ElfTempDumpEnable dump temporary ELF files Available
EnableCapsDump Enable hardware caps dump Available
EnableCisDump Enable cis dump Available
EnableCosDump Enable cos dump Available
EnableKernelNamesBasedHash If set, use kernels' names to calculate the hash. Doesn't work on .cl dump's hash. Will overwrite dumps if multiple modules have the same kernel names. -
EnableLivenessDump Enable dumping out liveness info on stderr. Available
EnableScalarizerDebugLog print step by step scalarizer debug info. Available
EnableShaderNumbering Number shaders in the order they are dumped based on their hashes Available
ForceRPE Force RPE (RegisterEstimator) computation if > 0. If 2, force RPE per inst. Available
InterleaveSourceShader Interleave the source shader in asm dump Available
PrintAfter Takes either all or a comma/semicolon-separated list of pass names. If set, prints LLVM IR after the given passes run (mimics LLVM print-after) Available
PrintBefore Takes either all or a comma/semicolon-separated list of pass names. If set, prints LLVM IR before the given passes run (mimics LLVM print-before) Available
PrintHexFloatInShaderDumpAsm print floats in hex in asm dump Available
PrintInstOffsetInShaderDumpAsm print instruction offsets as comments in asm dump Available
PrintMDBeforeModule Print metadata of the module at the beginning of the dump. Used for LIT tests. Available
PrintPsoDdiHash Print psoDDIHash in TimeStats_Shaders.csv file Available
PrintToConsole dump to console Available
ProgbinDumpFileName Specify filename to use for dumping progbin file to current dir Available
QualityMetricsEnable Enable Quality Metrics for IGC Available
RPEDumpLevel > 0 : dump info of register pressure estimate on stderr. See igc_flags.hpp level defs. -
ShaderDataBaseStats Enable gathering sends' sizes for shader statistics -
ShaderDataBaseStatsFilePath Path to a file with additional dumped shader stats, e.g. data available only during compilation -
ShaderDumpEnable dump LLVM IR, visaasm, and GenISA Available
ShaderDumpEnableAll dump all LLVM IR passes, visaasm, and GenISA Available
ShaderDumpEnableG4 same as ShaderDumpEnable but adds G4 dumps (0 = off, 1 = some, 2 = all) -
ShaderDumpEnableIGAJSON adds IGA JSON output to shader dumps (0 = off, 1 = enabled, 2 = include def/use info but causes longer compile times) -
ShaderDumpEnableRAMetadata adds RA Metadata file to shader dumps Available
ShaderDumpFilter Only dump files matching the given regex Available
ShaderDumpInstNamer Name all unnamed LLVM IR instructions 'tmp' in dumps, which makes shader overriding easier Available
ShaderDumpPidDisable Disable adding the PID to the name of the shader dump directory Available
ShowFullVectorsInShaderDumps print all elements of vectors in ShaderDumps, can dramatically increase ShaderDumps size Available
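A typical dump setup combines several of the flags above. The sketch below assumes `/tmp/igc_dumps/run1` as an arbitrary, hypothetical dump location; since DumpToCustomDir requires the parent directory to exist, it is created first.

```shell
# Enable the basic dumps (LLVM IR, visaasm, and GenISA).
export IGC_ShaderDumpEnable=1

# DumpToCustomDir requires the parent directory to exist already.
mkdir -p /tmp/igc_dumps
export IGC_DumpToCustomDir=/tmp/igc_dumps/run1

# Optional: interleave the source shader in the asm dump
# and number shaders in the order they are dumped.
export IGC_InterleaveSourceShader=1
export IGC_EnableShaderNumbering=1
```

To dump every LLVM IR pass rather than only the default stages, replace ShaderDumpEnable with ShaderDumpEnableAll; expect much larger dump directories.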

Debugging features

Flag Description Release builds
AvoidUsingR0R1 Do not use r0 and r1 as generic usage registers -
BufferBoundsChecking Setting this to 1 (true) enables buffer bounds checking -
DebugInfoEnforceAmd64EM Enforces the ELF file with the debug information to have eMachine set to AMD64 -
DebugInfoValidation Enable optional (strict) checks to detect debug information inconsistencies -
EnableRelocations Setting this to 1 (true) makes IGC emit relocatable ELF with debug info Available
EnableTestSplitI64 Test legalization that splits i64 stores unnecessarily; to be deleted once testing is done [temp] Available
EnableWriteOldFPToStack Setting this to 1 (true) writes the caller frame's frame-pointer to the start of callee's frame on stack, to support stack walk -
ExtraOCLInternalOptions Extra internal options for OpenCL Available
ExtraOCLOptions Extra options for OpenCL Available
ForceAssignRhysicalReg Force assigning dclId to a physical register. Available
ForceSpillVariables Comma-separated list of the declare IDs of variables that will be spilled Available
InitializeAddressRegistersBeforeUse Setting this to 1 (true) initializes address register to 0 before each use -
InitializeRegistersEnable Setting this to 1/true initializes all GRFs, Flag and address registers to 0 at the beginning of the shader -
InitializeUndefValueEnable Setting this to 1/true initializes all undefs in URB payload to 0 -
MetricsDumpEnable Dump IGC Metrics to a *.optrpt file in the current working directory.
0 - disabled, 1 - dump in binary format, 2 - dump in plain-text format.
Available
MinimumValidAddress If it's greater than 0, it enables minimal valid address checking where the threshold is the given value (in hex). -
NoCatchAllDebugLine Don't emit special placeholder instruction to map VISA orphan instructions -
PrintDebugSettings Prints all non-default debug settings -
ShaderDumpTranslationOnly Dump LLVM IR right after translation from SPIRV to stderr and ignore all passes -
StackOverflowDetection Inserts checks for stack overflow when stack calls are used. Available
UseMTInLLD Use multi-threading when linking multiple elf files Available
UseVISAVarNames Make VISA generate names for virtual variables so they match with dbg file Available
UseVMaskPredicate Use VMask as predicate for subspan usage -
UseVMaskPredicateForIndirectMove Use VMask as predicate for subspan usage (indirect mov only) Available
UseVMaskPredicateForLoads Use VMask as predicate for subspan usage (loads only) Available
ZeBinCompatibleDebugging Setting this to 1 (true) enables embedding debug info in zeBinary Available
deadLoopForFloatException Enable a dead loop if a float exception happens -
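For a debugging session, it can help to log which non-default settings are active and to collect metrics alongside. This sketch only uses flags and values listed in the table above; the choice of plain-text metrics (value 2) is an arbitrary example.

```shell
# Print all non-default debug settings so the active configuration shows up in logs.
export IGC_PrintDebugSettings=1

# Write IGC metrics as a plain-text *.optrpt file in the current working directory.
# 0 = disabled, 1 = binary format, 2 = plain-text format.
export IGC_MetricsDumpEnable=2

# Insert stack overflow checks when stack calls are used.
export IGC_StackOverflowDetection=1
```

Because MetricsDumpEnable writes to the current working directory, run the application from a directory where the *.optrpt output is easy to find.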

IGC Features

Flag Description Release builds
AdvCodeMotionControl Control bits to fine-tune advanced code motion -
AdvRuntimeUnrollCount Advanced runtime unroll count -
AllowedSpillRegCount Max allowed spill size without recompile -
CSSpillThreshold2xGRFRetry Spill Threshold for CS to trigger 2xGRFRetry -
CSSpillThresholdNoSLM Spill Threshold for CS SIMD16 without SLM -
CSSpillThresholdSLM Spill Threshold for CS SIMD16 with SLM -
CheckCSSLMLimit Check SLM or threads limit on compute shader to turn on Enable2xGRF on DG2+
0 - off, 1 - SLM limit heuristic, 2 - platform based heuristic (XE2 - threads limit, others - SLM limit)
-
DPEmuNeedI64Emu Double emulation needs I64 emulation. Unset it to disable I64 emulation for testing. -
DisableCorrectlyRoundedMacros Temporary flag to disable correctly rounded macros for BMG+. This flag will be removed in the future. -
DisableDSDualPatch Setting it to true will enable Single and Dual Patch dispatch mode for Domain Shader -
DisableEarlyOutPatterns Disable optimization trying to create an early out after sampleC messages -
DisableGPGPUIndirectPayload Disable OCL indirect GPGPU payload -
DisableLSCForTypedUAV Forces legacy HDC messages for typed UAV read/write.
Temporary knob for XE2 bringup.
Available
DisableLSCSIMD32TGMMessages Forces splitting SIMD32 typed messages into 2xSIMD16.
Only valid on XE2+.
Available
DisableMemOpt Disable MemOpt, merging load/store Available
DisableMemOpt2 Disable MemOpt2 -
DisableMergeStore [temp] If EnableLdStCombine is on, disable mergestore (memopt) if this is set. Temp key for testing Available
DisablePrefetchToL1Cache Disable prefetch to L1 cache Available
DisablePromoteToDirectAS This key disables the PromoteResourceToDirectAS pass -
DisableRecompilation Disable recompilation, skip retry stage Available
DisableScalarAtomics Disable the Scalar Atomics optimization -
DisableSystemMemoryCachingInGPUForConstantBuffers Disables caching system memory in GPU for loads from constant buffers -
DisableWaSampleLZ Disable The Sample Lz workaround and generate Sample LZ -
DivergentBarrierUniformLoad Optimize loads for spill/fill generated by DivergentBarrier with uniform analysis Available
Enable16BitLDMCS Enable 16-bit ld_mcs on supported platforms Available
Enable2xGRF Enable 2x GRF for high SLM or high threads usage
0 - off, 1 - on, 2 - platform default
-
Enable64BitEmulation Enable 64-bit emulation -
Enable64BitEmulationOnSelectedPlatform Enable 64-bit emulation on selected platforms -
EnableAIParameterCombiningWithLODBias Enable AI parameter combining With LOD Bias parameter. XeHP Available
EnableAdvCodeMotion Enable advanced code motion -
EnableAdvMemOpt Enable advanced memory optimization -
EnableAdvRuntimeUnroll Enable advanced runtime unroll -
EnableCPSMSAAOMaskWA Enable WA which forces rt writes to happen at pixel rate when cps, msaa, and omask are present. Available
EnableCPSOmaskWA Enable workaround for oMask with CPS -
EnableConstIntDivReduction Enables strength reduction on integer division/remainder with constant divisors/moduli Available
EnableDG2LSCSIMD8WA Enables WA for DG2 LSC simd8 d32-v8/d64-v3/d64-v4. [temp, should be replaced with WA id] -
EnableDPEmulation Enforce double precision floating point operations emulation on platforms that do not support it natively Available
EnableDivergentBarrierWA Generate continuation code to handle shaders that place barriers in divergent control flow -
EnableDualSIMD8 enable dual SIMD8 on supported platforms Available
EnableExplicitCopyForByVal Enable generating an explicit copy (alloca + memcpy) in a caller for aggregate arguments with byval attribute Available
EnableFallbackToBindless This key enables fallback to bindless mode on all shaders -
EnableFallbackToStateless This key enables fallback to stateless mode on all shaders -
EnableFunctionPointer Enables support for function pointers and indirect calls -
EnableGASResolver Enable GAS Resolver -
EnableGEPSimplification Enable GEP simplification Available
EnableGen11TwoStackTSG Enable Two stack TSG gen11 feature -
EnableGlobalStateBuffer This key allows stack calls to read implicit args from side buffer. It also emits a relocatable add in VISA. Available
EnableHFpacking Enable HF packing -
EnableHSSinglePatchDispatch Setting this to 1/true enables SIMD8 single-patch dispatch in HullShader. Default is either SIMD8 single patch/dual patch dispatch based on control point count -
EnableImplicitArgAsIntrinsic Use GenISAIntrinsic instructions for supported implicit args instead of passing them as function arguments Available
EnableIndirectCallOptimization Enables inlining indirect calls by comparing function addresses -
EnableInsertingPairedResourcePointer Enable to insert a bindless paired resource address into sampler headers in context of sampling feedback resources Available
EnableIntDivRemCombine Merge div/rem pairs with the same operands, replacing rem with mul+sub on the quotient; 0x3 (set bit[1]) also forces this for constant power-of-two divisors Available
EnableL3FlushForGlobal Enable/disable flushing L3 cache for globals -
EnableLSC Enables the new dataport encoding for LSC messages. Available
EnableLdStCombine Enable load/store combine pass if set to 1 (LSC messages only) or 2; bit 3 = 1 [tmp for testing]: enable load combine (intended to replace memopt) Available
EnableLowerGPCallArg Enable pass to lower generic pointers in function arguments -
EnableLscSamplerRouting Enables conversion of LD to LD_L instructions. -
EnableMadLoopSlice Enables the slicing of mad loops. Available
EnableMaxWGSizeCalculation Enable max work group size calculation [OCL only] Available
EnableMeshSLMCache Enables caching Mesh shader outputs in SLM,
bitmask:
bit0 - cache AND flush mode, enable caching of Primitive Count and Primitive Indices,
bit1 - cache AND flush mode, enable caching of per-vertex outputs,
bit2 - cache AND flush mode, enable caching of per-primitive outputs,
bit3 - mirror mode, if this bit is set bits 0, 1 and 2 are ignored,
enable caching of outputs that are read in the shader
data is only mirrored in SLM
Available
EnableMeshShaderSimdSize Set allowed simd sizes for mesh shader compilation,
bitmask bit0 - simd8, bit1 - simd16, bit2 - simd32,
e.g. 0x7 enables all simd sizes and 0x2 enables only simd16,
valid values are from 0 to 7
ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size,
ignored if ForceMeshShaderSimdSize is set
Available
EnableOCLSIMD16 Enable OCL SIMD16 mode Available
EnableOCLSIMD32 Enable OCL SIMD32 mode Available
EnableOCLScratchPrivateMemory Enable the use of scratch space for private memory [OCL only] Available
EnablePartialEmuI64 Enable the partial I64 emulation for PVC-B, Xe2 Available
EnablePostCullPatchFIFOHP Enable Post-Cull Patch Decoupling FIFO. XeHP. Available
EnablePostCullPatchFIFOLP Enable Post-Cull Patch Decoupling FIFO. GEN12LP. Available
EnablePreRARematFlag Enable PreRA Rematerialization of Flag -
EnablePromotionToSampleMlod Enables promotion of sample and sample_c to sample_mlod and sample_c_mlod instructions when min lod is present -
EnableReadGTPinInput Enables setting GTPin context flags by reading the input to the compiler adapters -
EnableRecursionOpenCL Enable recursion with OpenCL user functions -
EnableSIMD16ForNonWaveXe2 Enable SIMD16 for Xe2 if the shader doesn't have wave -
EnableSIMD16ForXe2 Enable SIMD16 for Xe2 -
EnableSIMDVariantCompilation Enables compiling kernels in variant SIMD sizes -
EnableSMRescheduling Change instruction order to enable extra Sample Multiversioning cases -
EnableSampleBMLODWA Enable workaround for sample_b messages that use the mlod parameter -
EnableSampleDEmulation Enable emulation of sample_d. Available
EnableSampleDEmulationForTesting Enable emulation of sample_d on pre-XeHP platforms. Available
EnableSamplerSupport Enables sampler messages generation for PVC. Available
EnableScalarTypedAtomics Enable the Scalar Typed Atomics optimization -
EnableScratchMessageD64WA Enables WA to legalize D64 scratch messages to D32 -
EnableSelectiveScalarizer enable selective scalarizer on GPGPU path Available
EnableSingleVertexDispatch Vertex Shader Single Patch Dispatch Regkey -
EnableTaskShaderSimdSize Set allowed simd sizes for task shader compilation,
bitmask bit0 - simd8, bit1 - simd16, bit2 - simd32,
e.g. 0x7 enables all simd sizes and 0x2 enables only simd16,
valid values are from 0 to 7
ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size,
ignored if ForceMeshShaderSimdSize is set
Available
EnableTileYForExperiments Enable TileY heuristics for experiments -
EnableTypeDemotion Enable Type Demotion -
Enable_Wa14010017096 Enable Wa_14010017096 regardless of the platform stepping Available
Enable_Wa1507979211 Enable Wa_1507979211 regardless of the platform stepping Available
Enable_Wa1807084924 Enable Wa_1807084924 regardless of the platform stepping Available
Enable_Wa22010487853 Enable Wa_22010487853 regardless of the platform stepping Available
Enable_Wa22010493955 Enable Wa_22010493955 regardless of the platform stepping Available
Force32BitIntDivRemEmu Force 32-bit Int Div/Rem emulation using fp64, ignored if no native fp64 support Available
Force32BitIntDivRemEmuSP Force 32-bit Int Div/Rem emulation using fp32, ignored if Force32BitIntDivRemEmu is set and actually used Available
ForceDPEmulation Force double emulation for testing purpose -
ForceFFIDOverwrite Force overwriting ffid in sr0.0 -
ForceFormatConversionDG2Plus Forces SW image format conversion for R10G10B10A2_UNORM, R11G11B10_FLOAT, R10G10B10A2_UINT image formats on DG2+ platforms Available
ForceI64DivRemEmu Forces specific int64 div/rem emulation: 0 = platform default, 1 = int based, 2 = SP based, 3 = DP based -
ForceMeshShaderSimdSize Force mesh shader simd size,
valid values are 0 (not set), 8, 16 and 32
ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size
Available
ForceNoLSC Disables the new dataport encoding for LSC messages. Available
ForceOCLSIMDWidth Force using SIMD width specified. 0 : no forcing. This overrides driver forced SIMD value(if any) and runtime behaviour could be different if driver expects something fixed Available
ForcePrefetchToL1Cache Forces standard builtin prefetch to use L1 cache Available
ForceSPDivEmulation Force SP Div emulation for testing purpose -
ForceStaticToDynamic Force write of vertex count in GS -
ForceTaskShaderSimdSize Force task shader simd size,
valid values are 0 (not set), 8, 16 and 32
ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size
Available
ForceXYZworkGroupWalkOrder Force X/Y/Z WorkGroup walk order Available
HoistPSConstBufferValues Hoists down-conversions of constant buffer accesses so they can be vectorized more easily. -
LICMStatThreshold LICM stat threshold to avoid retry SIMD16 for CS -
LateInlineUnmaskedFunc Postpone inlining of Unmasked functions till end of CG to avoid code movement inside/outside of unmasked region -
LscForceSpillNonStackcall Non-stack call kernels that spill will use LSC on DG2+ Available
LscImmOffsMatch
Match address patterns that have an immediate offset for the vISA LSC API
(0 means off/no matching,
1 means on/match for supported platforms (Xe2+) and APIs,
2 means force on for all platforms (vISA will emulate the addition if HW lacks support) and APIs);
also see LscImmOffsVisaOpts
Available
LscImmOffsVisaOpts
This maps to vISA_lscEnableImmOffsFor
(enables/disables immediate offsets for various address types;
see that option for semantics)
Available
MaxLiveOutThreshold Max LiveOut Threshold in MemOpt2 -
MaxLoadVectorSizeInBytes [LdStCombine] the max non-uniform vector size for the coalesced load. 0: compiler choice (default, 16(4DW)); others: 4/8/16/32 Available
MaxStoreVectorSizeInBytes [LdStCombine] the max non-uniform vector size for the coalesced store. 0: compiler choice (default, 16(4DW)); others: 4/8/16/32 Available
MemOptGEPCanon [test] GEP canonicalization in MemOpt. 0 : enable; 1: disable; 2: disable only for OCL; Available
OCLEnableReassociate Enable reassociation Available
OCLSIMD16SelectionMask Select SIMD 16 heuristics. Valid values are 0, 1, 2 and 3 -
OverrideDeviceIdForWA Enable this to override DeviceId -
OverrideProductFamilyForWA Enable this to override the product family, get the correct enum from igfxfmid.h -
OverrideRevIdForWA Enable this to override the stepping/RevId, default is a0 = 0, b0 = 1, c0 = 2, so on... -
RemoveLegacyOCLStatelessPrivateMemoryCases Remove cases where OCL uses stateless private memory. XeHP and above only! [OCL only] Available
SampleMultiversioning Create branches around samplers which can be redundant with some values -
SelectiveLoopUnrollForDPEmu Setting this to 0/false disables selective loop unrolling for DP emu. Available
SendMultipleSIMDModesCS Send multiple SIMD modes for CS -
SkipPsSimdWithDualSimd Setting it to values def in igc.h will force SIMD mode to skip if the dual-SIMD8 kernel exists Available
TestGEPSimplification [Test] Testing GEP simplification without actually lowering GEP. Used in lit test -
UniformMemOpt4OW increase uniform memory optimization from 2 owords to 4 owords Available
allowLICM Enable LICM in IGC. Available
allowDecompose2DBlockFuncs Enable decomposition of 2D block intrinsics in IGC. Available
allowImmOff2DBlockFuncs Allow compiler to decide to use immediate offsets in 2D block intrinsics in IGC. Available
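The SIMD-size bitmasks for EnableMeshShaderSimdSize and EnableTaskShaderSimdSize can be composed with shell arithmetic. This sketch follows the documented bit assignments (bit0 SIMD8, bit1 SIMD16, bit2 SIMD32); the arithmetic yields decimal values (6 and 7), the same numbers as 0x6 and 0x7, and that IGC accepts decimal input for these keys is an assumption of this sketch.

```shell
# Bit positions documented for EnableMeshShaderSimdSize / EnableTaskShaderSimdSize.
SIMD8=$((1 << 0))   # bit0
SIMD16=$((1 << 1))  # bit1
SIMD32=$((1 << 2))  # bit2

# Allow only SIMD16 and SIMD32 for mesh shaders: 2 | 4 = 6.
export IGC_EnableMeshShaderSimdSize=$((SIMD16 | SIMD32))

# Allow every SIMD size for task shaders: 1 | 2 | 4 = 7.
export IGC_EnableTaskShaderSimdSize=$((SIMD8 | SIMD16 | SIMD32))

echo "mesh=$IGC_EnableMeshShaderSimdSize task=$IGC_EnableTaskShaderSimdSize"
```

This prints `mesh=6 task=7`. Keep in mind that the mask is ignored if ForceMeshShaderSimdSize is set, or if the resulting configuration is invalid for the workgroup size.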

Performance experiments

Flag Description Release builds
AddNoInlineToTrimmedFunctions Tell late passes not to inline trimmed functions -
AllocaRAPressureThreshold The threshold for the register pressure potential -
AllocateZeroInitializedVarsInBss Allocate zero initialized global variables in .bss section in ZEBinary Available
AllowNonLoopConstantPromotion Allows promotion for constants not in loop (e.g. used once) -
AllowStackCallRetry Enable/Disable retry when stack function spill. 0 - Don't allow, 1 - Allow retry on kernel group, 2 - Allow retry per function -
BlockFrequencySampling Use block frequencies to derive a distribution Available
ByPassAllocaSizeHeuristic Force some Alloca to pass the pressure heuristic until the given size Available
CodePatch Enable Pixel Shader code patching to directly emit code after stitching -
CodePatchExperiments Experiment with code patching when != 0 -
CodePatchFilter Filter out unsupported patterns -
CodePatchLimit Debug CodePatch via limiting the number of shader been patched -
ConstantPromotionCmpSelSize Array size threshold for cmp-sel transform -
ConstantPromotionSize Threshold in number of GRFs -
ControlInlineImplicitArgs Avoid trimming functions with implicit args Available
ControlInlineTinySize Tiny function size for controlling kernel total size Available
ControlInlineTinySizeForSPGT Tiny function size for controlling kernel total size Available
ControlKernelTotalSize Control kernel total size Available
ControlUnitSize Control compilation unit size by unit trimming Available
DelayEmuInt64AddLimit Delay emulating Int64 Add operations in vISA -
DetectCastToGAS Check if the module contains a local/private to GAS (Generic Address Space) cast; it also checks internal flags Available
DiableWaSamplerNoMask Disable WA DiableWaSamplerNoMask -
DisableAddingAlwaysAttribute Disable adding always attribute Available
DisableCSContentCheck Disable CS content check that can force SIMD32 Available
DisableDualBlendSource Force the compiler to never use dual blend source messages -
DisableFDIV Disable fdiv support -
DisableFastMathConstantHandling Disable Fast Math Constant Handling Available
DisableFastRAWA Disable Fast RA for hanging issues on large workloads -
DisableFastestGopt Disable global optimizations for stage 1 shaders. -
DisableFastestLinearScan Disable LinearScanRA in FastestSIMD. -
DisableUndefAlphaOutputAsRed Disable output red for undefined alpha output -
DisableWaDisableSIMD16On3SrcInstr Disable C0 WA WaDisableSIMD16On3SrcInstr, may be unsafe -
DisableWaSendSEnableIndirectMsgDesc Disable a C0 WA WaSendSEnableIndirectMsgDesc, may be unsafe -
DisbleLocalFences On CNL+ we need to emit local fences. Setting this to true removes those. It may not be functionally correct. -
DispatchAlongY_XY_ratio min threshold for thread group size x / y for dispatchAlongY -
DispatchAlongY_X_threshold min threshold for thread group size x for dispatchAlongY -
DispatchGPGPUWalkerAlongYFirst 0 = No SW Y-walk, 1 = Dispatch GPGPU walker along Y first -
DownConvertI32Sampler Convert i32 sampler messages to return i16.
This optimization can only be enabled for resources with 16bit integer format
or if it is known that the upper 16bits of data is always 0.
-
DumpRegPressureEstimate Dump RegPressureEstimate to a file -
DumpRegPressureEstimateFilter Only dump RegPressureEstimate for functions matching the given regex -
EmitPreDefinedForAllFunctions When enabled, pre-defined variables for gid, grid, lid are emitted for all functions. This causes those functions to be inlined even when stack calls is enabled. Available
EmulateFDIV Emulate fdiv instructions -
EmulationFunctionControl FunctionControl on some DP emulation functions. It has the same value as FunctionControl. Available
EnableA64WA Guarantee A64 load/store address-hi is uniform Available
EnableAccSub Enable accumulator substitution -
EnableByValStructArgPromotion If enabled, byval/sret struct arguments are promoted to pass-by-value if possible. Available
EnableConstantPromotion Enable global constant data to register promotion -
EnableDisableMidThreadPreemptionOpt Disable mid thread preemption -
EnableEvaluateSamplerSplit Split evaluate messages to sampler into either SIMD8 or SIMD1 messages -
EnableExtractMask When enabled, mostly reduces the response size of send messages. -
EnableFastestSingleCSSIMD Enable selecting single CS SIMD in staged compilation. -
EnableForceGroupSize Enable forcing thread Group Size ForceGroupSizeX and ForceGroupSizeY -
EnableForceThreadCombining Enable forcing Thread Combining with thread Group Size ForceGroupSizeX and ForceGroupSizeY -
EnableFunctionCloningControl If enabled, limits function cloning by converting stackcalls to indirect calls based on the FunctionCloningThreshold value. Available
EnableGPUFenceScopeOnSingleTileGPUs Allow the use of GPU fence scope on single-tile GPUs. By default the TILE scope is used instead of GPU scope on single-tile GPUs. Available
EnableGSURBEntryPadding Enable padding of GS URB Entry by adding extra portions of Control Data Header. -
EnableGSVtxCountMsgHalfCLSize Enable the Vertex Count msg of half CL size, instead of 1DW size. -
EnableGather4cpoWA Enable WA transforming gather4cpo/gather4po into gather4c/gather4 -
EnableGreedyTrimming Find the optimal set of functions to trim Available
EnableHalfPromotion Enable pass that replaces instructions using halves with corresponding float counterparts for pre-SKL -
EnableInsertElementScalarCoalescing Enable coalescing on the scalar operand of insertelement -
EnableIntelFast Enable intel fast, experimental flag. -
EnableLTO Enable link time optimization -
EnableLTODebug Enable debug information for LTO Available
EnableLeafCollapsing Collapse leaf functions in order to avoid trimming small leaf functions Available
EnableLocalIdCalculationInShader Enables calculation of local thread IDs in the shader. Valid only in compute
shaders on XeHP+. IDs are calculated only if HW generated IDs cannot be
used.
Available
EnableMixIntOperands Enable generating mix-sized operands for int ALU -
EnableOptReportPrivateMemoryToSLM [POC] Generate opt report file for moving private memory allocations to SLM. -
EnablePreRAAccSchedAndSub Enable accumulator substitution -
EnablePrivMemNewSOATranspose 0 : disable new algo; 1 and up : enable new algo.
1 : enable new algo just for array of struct;
2 : 1 plus new algo for array of dw[xn]/qw[xn],etc
3 : 2 plus new algo for array of complicated struct.
Available
EnableProgrammableOffsetsMessageBitInHeader Use pre-delta feature (legacy) method of passing MSB of PO messages opcode. -
EnableReusingLSCStoreConstPayload Enable reusing LSC stores const payload -
EnableReusingXYZWStoreConstPayload Enable reusing XYZW stores const payload -
EnableSOAPromotionDisablingHeuristic Enable heuristic to disable SOA promotion when it may be not beneficial -
EnableSamplerSplit Split Sampler 3d message to odd and even -
EnableSizeContributionOptimization Put more weight on a function when its potential size contribution is big Available
EnableStackCallFuncCall If enabled, the default function call mode will be set to stack call. Otherwise, subroutine call is used. -
EnableTCSHWBarriers Enable TCS pass with HW barriers support. Default TCS pass is TCS pass with multiple continuation functions. -
EnableTEFactorsClear Enable clearing of tessellation factors. -
EnableTEFactorsPadding Enable padding of the TE factors. -
EnableThreadCombiningWithNoSLM Enable thread combining opt for shader without SLM -
EnableTrackPtr Track Staging Context alloc/dealloc -
EnableVariableAlias Enable variable aliases (part of VariableReuse Pass, but separate functionality) -
EnableVariableReuse Enable local variable reuse -
EnableVector8LoadStore Enable Vectorizer to generate 8x32i and 4x64i loads and stores Available
ExcludeIRFromZEBinary Exclude IR sections from ZE binary Available
ExpandedUnitSizeThreshold Trimming target of compilation unit size Available
ExtraRetrySIMD16 Enable extra simd16 with retry for STAGE1_BEST_PREF -
FastCompileRA Provide the fast compilation path for RA, fail safe at first iteration -
FastSpill Fast spill code gen. This may produce worse quality code for the spilling shader -
FastestS1Experiments Select configs for fastest compilation by bits. -
FirstStagedSIMD Force Pixel shader to be 1: FastSIMD (SIMD8), 2: BestSIMD (SIMD16 or SIMD8), 3: FastestSIMD (SIMD8 opt off) -
ForceAddingStackcallKernelPrerequisites Force adding static overhead for stackcall to the kernel entry such as HWTID instructions for experiments Available
ForceAllPrivateMemoryToSLM [POC] Force moving all private memory allocations to SLM. -
ForceBestSIMD Force pixel shader to return the best SIMD, either SIMD16 or SIMD8. -
ForceDisableSrc0Alpha Force the compiler to skip sending src0 alpha. Only works if we are sure alpha to coverage and alpha test is off -
ForceFastestSIMD Force PS, CS, VS to return lowest possible SIMD as fast as possible. -
ForceFastestSingleCSSIMD Force selecting single CS SIMD in staged compilation on unsupported platforms. -
ForceGroupSizeShaderHash Shader hash for forcing thread group size or thread combining (lower 8 hex digits) -
ForceGroupSizeX force group size along X -
ForceGroupSizeY force group size along Y -
ForceHalfPromotion Force enable pass that replaces instructions using halves with corresponding float counterparts -
ForceInlineExternalFunctions not to trim functions called from multiple kernels Available
ForceInlineStackCallWithImplArg If enabled, stack calls that uses implicit args will be force inlined. Available
ForceLowestSIMDForStackCalls If enabled, compile to the lowest allowed SIMD mode when stack calls or indirect calls are present Available
ForceMCFBarriers Force TCS pass with MCF (SW) barriers support. Default TCS pass is TCS pass with multiple continuation functions. -
ForceMixMode force enable mix mode even on platforms that do not support it -
ForceNoFP64bRegioning force regioning rules for FP and 64b FPU instructions -
ForceNoInfiniteLoops Limit # of loop iterations to UINT_MAX in while/for loops. Can be used to detect infinite loops in shaders -
ForceNonCoherentStatelessBTI Enable generation of non-cache-coherent stateless messages -
ForcePixelShaderSIMDMode Setting it to values def in igc.h will force SIMD mode compilation for pixel shaders. Note that only SIMD8 is compiled unless other ForcePixelShaderSIMD* are also selected. 1-SIMD8, 2-SIMD16,4-SIMD32 -
ForcePrivateMemoryToGlobalOnGeneric Force moving private memory allocations to global buffer when generic pointer is present Available
ForcePrivateMemoryToSLMOnBuffers [POC] Force moving private memory allocations to SLM, semicolon-separated list of buffers. -
ForceSWCoalescingOfAtomicCounter Force software coalescing of atomic counter -
ForceScratchSpaceSize Override Scratch Space Size in bytes for perf testing -
ForceSendsSupportOnSKLA0 Allow sends on SKL A0, may be unsafe -
FunctionCloningThreshold Limits the number of cloned functions when called from multiple function groups.
If number of cloned functions exceeds the threshold, compile the function only once and use address relocation instead.
Setting this to '0' allows IGC to choose the default threshold.
Available
FunctionControl Control function inlining/subroutine/stackcall. See value defs in igc_flags.hpp. Available
FuseResourceLoop Enable fusing resource loops -
FuseTypedWrite Enable fusing of simd8 typed write -
HPCFastCompilation Force to do fast compilation for HPC kernel -
HPCGlobalInstNumThreshold The threshold for the register pressure potential -
HPCInstNumThreshold The threshold for the register pressure potential -
HasDoubleAcc has doubled accumulators -
HybridRAWithSpill Did Hybrid RA with Spill -
InlinedEmulationThreshold Inlined instruction threshold for enabling subroutines -
JointMatrixLoadStoreOpt Selects subgroup (0), or block read/write (1), or optimized block read/write (2), 2d block read/write (3) implementation of Joint Matrix Load/Store built-ins Available
KernelTotalSizeThreshold Trimming target of kernel total size Available
LTOForStage1Compilation LTO for stage 1 compilation -
LimitConstantBuffersPushed Limit max number of CBs pushed when SupportIndirectConstantBuffer is true -
MSAA16BitPayloadEnable Enable support for the MSAA 16-bit payload, a hardware DCN supported from ICL+ to improve performance on MSAA workloads -
MemCpyLoweringUnrollThreshold Minimum number of memory instructions that requires a non-unrolled loop when lowering memcpy -
MemOptWindowSize Size of the window in unit of instructions in which load/stores are allowed to be coalesced. Keep it limited in order to avoid creating long liveranges. Default value is 150 -
MetricForKernelSizeReduction Set 1 to activate a normal distribution, 2 a long-tail distribution, and 4 an average% Available
MidThreadPreemptionDisableThreshold Threshold to disable mid thread preemption -
NewSOATransposeForOpenCL If true, EnablePrivMemNewSOATranspose only applies to OpenCL kernels. For testing purposes Available
NumGeneralAcc set the number [1-8] of general acc for accumulator substitution. 0 means using the platform-default value -
OCLInlineThreshold Sets the OCL inline threshold Available
OverrideCsTileLayout Override compute walker tile layout. False is linear. True is TileY Available
OverrideCsTileLayoutEnable Enable overriding compute walker tile layout Available
OverrideCsWalkOrder Override compute walker walk order Available
OverrideCsWalkOrderEnable Enable overriding compute walker walk order Available
OverrideOCLMaxParamSize Override the value imposed on the kernel by CL_DEVICE_MAX_PARAMETER_SIZE. Value in bytes, if value==0 no override happens. Available
ParameterForColdFuncThreshold C/10-STD for a normal distribution / low K% for a long-tail distribution Available
PartitionUnit Partition compilation unit Available
PartitionWithFastHybridRA Enable FastRA and HybridRA when partition is enabled Available
PixelShaderDoNotAbortOnSpill Do not abort on a spill -
PrintControlKernelTotalSize Print Control kernel total size Available
PrintControlUnitSize Print information about unit trimming Available
PrintFunctionSizeAnalysis Print analysis data of function sizes Available
PrintPartitionUnit Print information about compilation unit partitioning Available
PrintStackCallDebugInfo Print all debug info to command line related to stack call debugging Available
PrintStaticProfileGuidedKernelSizeReduction Print information about static profile-guided trimming and partitioning Available
PrintStaticProfileGuidedSpillCostAnalysis Print debug messages for profile embedding Available
RegPressureVerbocity Different printing types -
RematAddrSpaceCastToUse Allow rematerialization of inttoptr that are used inside AddrSpaceCastInst -
RematAllowExtractElement Allow adding ExtractElement to the computation chain -
RematAllowLoads Allow remat to move loads without any checks, exclusively for testing purposes -
RematAllowOneUseLoad Allow remat to move loads that have a single use inside the chain -
RematCallsOperand Allow rematerialization of inttoptr that are used as a call's operand -
RematChainLimit If the number of collected instructions exceeds this value, remat bails out on the chain -
RematEnable Enable the clone address arithmetic pass, not only on retry -
RematFlowThreshold Proportion of the whole rematerialization targets to cutoff remat chain -
RematInstCombineBefore Enable a short sequence of passes before the clone address arithmetic pass to potentially decrease the number of operations that will be rematerialized -
RematLog Dump Remat Log, useful for analyzing spills as well -
RematRPELimit Cutoff value for the register estimator; kernels below this value won't be rematerialized -
RematReassocBefore Enable a short sequence of passes before the clone address arithmetic pass to potentially decrease the number of operations that will be rematerialized -
RematRespectUniformity Cutoff computation chain on uniform values -
RematSameBBScope Confine rematerialization only to variables within the same BB; values are not pulled down from predecessors -
RequestStage2 Enable staged compilation via requesting stage 2 -
RetryRevertExcessiveSpillingKernelCoefficient Sets the coefficient for the Retry Manager to know whether it should revert to a previously compiled kernel -
RetryRevertExcessiveSpillingKernelThreshold Sets the threshold for the Retry Manager to know which kernel is considered Excessive Spilling and applies a different set of rules -
SSOShifter Adjust ScratchSurfaceOffset with shl(hwtid, shifter). 0 means disabling padding -
SaveRestoreIR Save/Restore IR for staged compilation to avoid duplicated compilations -
SelectiveFastRA Apply fast RA with spills selectively using heuristics Available
SelectiveFunctionControl Selectively enables FunctionControl for a list of line-separated function names in 'FunctionDebug.txt' in the IGC output dir. When set by this flag, the functions in the FunctionDebug list will override the default FunctionControl mode. 0 - Disable, 1 - Enable and read from FunctionDebug.txt, 2 - Print all callable functions to FunctionDebug.txt. See comments in ProcessFuncAttributes.cpp for how to use this flag. Available
SelectiveTrimming Choose a specific function to trim Available
SkipPaddingScratchSpaceSize Skip adding padding when estimated scratch space size is smaller than or equal to this value -
SkipTREarlyExitCheck Skip SIMD16 early exit check in ShaderCodeGen -
SkipTrimmingOneCopyFunction Don't trim a function whose size contribution is no more than its size Available
StagedCompilationExperiments Experiment with staged compilation when != 0 -
StaticProfileGuidedPartitioning Enable static analysis in the partitioning algorithm. Available
StaticProfileGuidedSpillCostAnalysis Use static profile information to estimate spill cost: 1 for profile generation, 2 for profile transfer, 4 for profile embedding, 8 for spill computation, and 16 for enabling frequency-based spill selection Available
StaticProfileGuidedSpillCostAnalysisFunc Spill cost function where 0 is based on a new spill cost and 1 the existing one Available
StaticProfileGuidedSpillCostAnalysisScale Scale adjustment for static profile guided spill cost analysis Available
StaticProfileGuidedTrimming Enable static analysis in the kernel trimming Available
StripDebugInfo Strip debug info from LLVM IR lowered from input to IGC. Possible values: 0 - don't strip, 1 - strip all, 2 - strip non-line info -
SubroutineInlinerThreshold Subroutine inliner threshold -
SubroutineThreshold Minimal kernel size to enable subroutines -
UnitSizeThreshold Compilation unit size threshold Available
UpConvertF16Sampler up-convert fp16 sampler message to return fp32 -
UseFrequencyInfoForSPGT Consider frequency information for trimming functions Available
UseOldSubRoutineAugIntf Use the old subroutine augmentation code which is slower -
VFPackingDisablePartialElements disable packing for partial vertex element as it causes performance drops -
VariableReuseByteSize The byte size threshold for variable reuse -
VectorAlias Vector aliasing control under EnableVariableAlias. Some features are still experimental Available
VectorAliasBBThreshold Max number of BBs of a function that VectorAlias will apply. VectorAlias will skip for functions beyond this threshold Available
ScalarAliasBBSizeThreshold Max size of BB for which scalar aliasing will apply. Scalar aliasing will skip for BBs beyond this threshold Available
cl_khr_srgb_image_writes Enable cl_khr_srgb_image_writes extension -
disableRemat disable re-materialization -
disableUnormTypedReadWA disable software conversion for UNORM surface in Dx10 -
disableVarSplit disable variable splitting -
forceGlobalRA force global register allocator -
forceSamplerHeader force sampler messages to use header -
samplerHeaderWA enable sampler header to solve HW WA -
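
StaticProfileGuidedSpillCostAnalysis (available in release builds) is described above with the values 1, 2, 4, 8 and 16. Because these are powers of two, a reasonable assumption is that they are bit flags meant to be combined with bitwise OR; confirm this against igc_flags.h before relying on it. The sketch below builds such a value in a POSIX shell and exports it with the `IGC_<flag>=<value>` syntax. The helper variable names (PROFILE_GENERATION and so on) are hypothetical and exist only to make the bits readable.

```shell
#!/bin/sh
# Hypothetical names for the bit values listed for StaticProfileGuidedSpillCostAnalysis.
# Assumption: the flag value is interpreted as a bitmask of these stages.
PROFILE_GENERATION=1
PROFILE_TRANSFER=2
PROFILE_EMBEDDING=4
SPILL_COMPUTATION=8
FREQ_SPILL_SELECTION=16

# Enable every stage except frequency-based spill selection: 1|2|4|8 = 15
IGC_StaticProfileGuidedSpillCostAnalysis=$(( PROFILE_GENERATION | PROFILE_TRANSFER | PROFILE_EMBEDDING | SPILL_COMPUTATION ))
export IGC_StaticProfileGuidedSpillCostAnalysis

# Show the value that programs launched from this shell will see
echo "IGC_StaticProfileGuidedSpillCostAnalysis=$IGC_StaticProfileGuidedSpillCostAnalysis"
```

Adding FREQ_SPILL_SELECTION to the OR expression would give 31, which under the same bitmask assumption would also enable frequency-based spill selection.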

Generating precompiled headers

Flag Description Release builds
ApplyConservativeRastWAHeader Apply WaConservativeRasterization for the platforms enabled -

Raytracing Options

Flag Description Release builds
ContinuationInlineThreshold If number of continuations is greater than threshold, default to indirect Available
DeferCollectionStateObjectCompilation Wait to compile till the RTPSO stage Available
DisableCanonizationWA WA for A0 to inject shifts to canonize global and local pointers Available
DisableCompactifySpills Just emit spill/fill at the point of def/use Available
DisableCrossFillRemat Rematerialize values if they use already spilled values Available
DisableDPSE Disable Dead PayloadStore Elimination. Available
DisableEarlyRemat Disable quick remats to avoid some spills Available
DisableEntryFences Don't emit the evict and invalidate fences for A0 WA -
DisableExamineRayFlag Don't do IPO to see if we can fold control flow given knowledge of possible rayflag values -
DisableFuseContinuations If set, we will look for small duplicated continuations to merge into one. Available
DisableInvalidateRTStackAfterLastRead Disables L1 cache invalidation after the last read of the RT stack. Affects rayqueries only Available
DisableInvariantLoad Disable !invariant_load metadata for raytracing shaders Available
DisableLSCControlsForRayTracing Disable different LSC Controls for HW and SW portions of the RTStack Available
DisableLateRemat Disable quick remats to avoid some spills Available
DisableMatchRegisterRegion Disable matching for debug purposes Available
DisablePayloadSinking sink stores to payload into inlined continuations Available
DisablePreSplitOpts Disable last-minute optimizations before shader splitting Available
DisablePredicatedStackIDRelease Emit a single stack ID release at the end of the shader Available
DisablePrepareLoadsStores Disable preparation for MemOpt Available
DisableProceedBasedApproachForRayQueryDynamicRayManagementMechanism Disables proceed based approach for dynamic ray management mechanism Available
DisablePromoteContinuation BTD-able continuations in the raygen may be moved to the shader identifier -
DisablePromoteToScratch Use scratch space rather than SWStack when possible. Available
DisableRTAliasAnalysis Disable Raytracing Alias Analysis -
DisableRTBindlessAccess do bindful rather than bindless accesses to raytracing memory Available
DisableRTFenceElision Disable optimization to remove unneeded fences -
DisableRTGlobalsKnownValues load MaxBVHLevels from RTGlobals rather than assuming = 2 Available
DisableRTMemDSE Analyze stores to SWStack, etc. that aren't read before Stack ID Release -
DisableRTRetryPickBetter Disables raytracing retry to pick the best compilation instead of always using the retry compilation. -
DisableRTStackOpts Disable some optimizations that minimize reads/writes to the RTStack Available
DisableRayQueryDynamicRayManagementMechanism Dynamic ray management mechanism for Synchronous Ray Tracing Available
DisableRayQueryDynamicRayManagementMechanismForBarriers Disable dynamic ray management mechanism for shaders with barriers Available
DisableRayQueryDynamicRayManagementMechanismForExternalFunctionsCalls Disable dynamic ray management mechanism for shaders with external functions calls Available
DisableRayTracingConstantCoalescing Disable coalescing Available
DisableRayTracingOptimizations Disable RayTracing Optimizations for debugging Available
DisableRaytracingIntrinsicAttributes Turn off noalias and dereferenceable attributes Available
DisableSWStackOffsetElision Avoid loading offsets when they are known at compile time -
DisableShaderFusion Don't check for duplicate, renamed shaders -
DisableSpillReorder Disables reordering of spills to try to minimize spills in a loop -
DisableStatefulRTStackAccess do stateless rather than stateful accesses to the HW portion of the async stack Available
DisableStatefulRTSyncStackAccess do stateless rather than stateful accesses to the HW portion of the sync stack Available
DisableStatefulRTSyncStackAccess4RTShader do stateless rather than stateful accesses to the HW portion of the sync stack. RT Shader only. Available
DisableStatefulRTSyncStackAccess4nonRTShader do stateless rather than stateful accesses to the HW portion of the sync stack. nonRT Shader only. Available
DisableStatefulSWHotZoneAccess do stateless rather than stateful accesses to the SW HotZone Available
DisableStatefulSWStackAccess do stateless rather than stateful accesses to the SW Stack Available
DisableWideTraceRay Disable SIMD16 style message payloads for send.rta Available
EnableCompressedRayIndices Use an alternate form with bit twiddling to pack stack pointer and indices into two DWORDs Available
EnableFillScheduling Schedule fills for reduced register pressure -
EnableHoistRemat Hoist rematerialized instructions to shader entry. Longer live ranges but common values fused. Available
EnableIndirectContinuations Enable BTD for continuation shaders (regardless of inline threshold). Available
EnableInlinedContinuations Forcibly inline all continuations Available
EnableKnownBTIBase For testing, assume that we know what baseBTI is in RTGlobals Available
EnableLSCCacheOptimization Optimize store instructions for utilizing the LSC-L1 cache -
EnableOuterLoopHoistingForRayQueryDynamicRayManagementMechanism Disable dynamic ray management mechanism for shaders with barriers Available
EnableRQHideLatency Hide RayQuery Proceed latency. -
EnableRTDispatchAlongY Dispatch Compute Walker along Y first Available
EnableRTPrintf Enable printf for ray tracing. Available
EnableRayTracingTGMFence Enable tgm fence in RT workloads for debugging -
EnableSingleRQMemRayStore Store RayQuery MemRay[TOP] only once. -
EnableStackIDReleaseScheduling Schedule Stack ID Release messages prior to the end of the shader -
EnableSyncDispatchRays Enable sync DispatchRays implementation -
ForceCSLeastSIMD4RQ Force compute shader with RayQuery to the lowest allowed SIMD mode -
ForceCSSimdSize4RQ Force RayQuery compute shader SIMD size; valid values are 0 (not set), 8, 16 and 32. Ignored if it produces an invalid configuration, e.g. SIMD size too small for the workgroup size. Available
ForceFirstFencesEvict Force evict fence op on fences prior to the stack ID release Available
ForceGenMemDefaultCacheCtrl If enabled, no message specific cache ctrls are set on memory outside of RTStack, SWStack, and SWHotZone Available
ForceGenMemLoadCacheCtrl Enables GenMemLoadCacheCtrl regkey for custom lsc load cache controls in other memory Available
ForceGenMemStoreCacheCtrl Enables GenMemStoreCacheCtrl regkey for custom lsc store cache controls in other memory Available
ForceIndirectCallsInSyncDispatchRays Will skip direct calls in synchronous raytracing and immediately call raytracing shaders via KSP shader ptr -
ForceInliningTraceRayCallsInSyncDispatchRays Will inline calls to __TraceRay, __Invoke and __TraceRaySyncToAsyncAdapter even when indirect calls are not necessary -
ForceNullBVH Swap BVH with null pointer. Infinitely fast ray traversal. Available
ForceRTCheckInstanceLeafPtr Check MemHit::valid before loading GeometryIndex, PrimitiveIndex, etc. Available
ForceRTCheckInstanceLeafPtrMask Test only. 1: committedindex; 2: potentialindex Available
ForceRTConstantBufferCacheCtrl Enables RTConstantBufferCacheCtrl regkey for custom lsc load cache controls for constant buffers Available
ForceRTRetry Raytracing is compiled in the second retry state -
ForceRTShortCircuitingOR Only for specific tests. Short-circuit the OR condition if CommittedGeometryIndex is used Available
ForceRTStackLoadCacheCtrl Enables RTStackLoadCacheCtrl regkey for custom lsc load cache controls in the RTStack Available
ForceRTStackStoreCacheCtrl Enables RTStackStoreCacheCtrl regkey for custom lsc store cache controls in the RTStack Available
ForceSWHotZoneLoadCacheCtrl Enables SWHotZoneLoadCacheCtrl regkey for custom lsc load cache controls in the SWHotZone Available
ForceSWHotZoneStoreCacheCtrl Enables SWHotZoneStoreCacheCtrl regkey for custom lsc store cache controls in the SWHotZone Available
ForceSWStackLoadCacheCtrl Enables SWStackLoadCacheCtrl regkey for custom lsc load cache controls in the SWStack Available
ForceSWStackStoreCacheCtrl Enables SWStackStoreCacheCtrl regkey for custom lsc store cache controls in the SWStack Available
ForceWholeProgramCompile Compile as if we know all of the shaders upfront Available
KnownBTIBaseValue If EnableKnownBTIBase is set, use this value for baseBTI Available
OverrideTMax Force TMax to the given value. When 0, do nothing. -
PrintfBufferSize Set printf buffer size. Unit: KB. Available
RTFenceToggle Toggle fences Available
RTInValidDefaultIndex If MemHit::valid is false, the default value to return for some intrinsics like GeometryIndex or PrimitiveIndex etc. Available
RayTracingConstantCoalescingMinBlockSize Set the minimum load size in # OWords = [1,2,4,8,16]. Available
RayTracingCustomTileXDim1D X dimension of tile (default: 256) Available
RayTracingCustomTileXDim2D X dimension of tile (default: 32) Available
RayTracingCustomTileYDim1D Y dimension of tile (default: 1) Available
RayTracingCustomTileYDim2D Y dimension of tile (default: 4 for XE, 32 for XE2+) Available
RayTracingDumpYaml Dump yaml input/output files Available
RayTracingKeepUDivRemWA Workaround till jitIsa supports cr0 for rtz conversions Available
RematThreshold Tunes how aggressively we should remat values into continuations Available
RetryRTPickBetterThreshold Only pick the retry shader if the spill cost of the 2nd compilation is at least this percentage better than the previous compilation -
RetryRTSpillCostThreshold Only retry if the percentage of spills (over total instructions) is more than this value -
RetryRTSpillMemThreshold Only retry if spill mem used is more than this value -
ShaderFusionThrehold If there are fewer shaders than this, don't spend time checking for duplicates -
TotalGRFNum4RQ Total GRF used for register allocation for RayQuery only. Test only. Delete later. -