The Linux release build allows enabling user-selected configuration flags. They are available after installing the release build according to the instructions here. This file is autogenerated from igc_flags.h.
Configuration flags are generally used either for debug purposes or to experimentally change the compiler's behavior. Intel does not guarantee full performance and conformance when using configuration flags.
A flag is enabled by setting it as an environment variable.
The syntax is as follows:
IGC_<flag>=<value>
For example, to enable the ShaderDumpEnable flag in a shell:
$ export IGC_ShaderDumpEnable=1
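
Because a flag is an ordinary environment variable, it can also be scoped to a single command rather than exported for the whole session. The sketch below illustrates both forms with standard shell features only. `printenv` is merely a stand-in for whatever application loads IGC, which is an assumption made for illustration.

```shell
# Scope the flag to a single command instead of the whole session.
# `printenv` is only a stand-in for the application that loads IGC.
IGC_ShaderDumpEnable=1 printenv IGC_ShaderDumpEnable   # prints 1

# Export the flag for the session, list the IGC_* variables, then clear it.
export IGC_ShaderDumpEnable=1
env | grep '^IGC_'
unset IGC_ShaderDumpEnable
```

Scoping a flag to one command avoids leaving experimental settings active for unrelated programs started later from the same shell.
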
Flag | Description | Release builds |
---|---|---|
AssumeUniformIndirectCall | Assume indirect call is uniform to avoid looping code | - |
AvoidDstSrcGRFOverlap | Avoid GRF overlap for destination and source operands of a SIMD16/SIMD32 instruction | - |
AvoidSrc1Src2Overlap | Avoid src1 and src2 GRF overlap to avoid the conflict without read suppression | - |
CSSIMD16_SpillThreshold | Percentage of instructions allowed for spilling on CS SIMD16 | - |
CSSIMD32_SpillThreshold | Percentage of instructions allowed for spilling on CS SIMD32 | - |
DPASTokenReduction | Optimization to reduce the tokens used for DPAS instruction. | Available |
DisableCSEL | Disable csel peep-hole | - |
DisableFlagOpt | Disable optimization cmp with logic op | - |
DisableGatherRSFusionSyncWA | Disable WA for gather instruction when read suppression and EU fusion are enabled. | Available |
DisableHFMath | Disables HF math instructions. | - |
DisableIfCvt | Disable ifcvt | - |
DisableMixMode | Disables mix mode in vISA BE. | - |
DisableRegDistDep | Disable regDist dependence | Available |
DisableSendS | Setting this to 1/true adds a compiler switch to not generate sends commands, default is to enable sends | - |
DisableThreeALUPipes | Disable three ALU Pipelines. XeHP only | Available |
DisableWriteCombine | Disable write combine. PVC+ only | - |
DumpASMToConsole | Dump ASM to console and do early exit | Available |
DumpPromoteI8 | Dump useful info during promoting i8 to i16 | Available |
DumpVISAASMToConsole | Dump VISAASM to console and do early exit | Available |
Enable16DWURBWrite | Enable 16 Dword URB Write messages | Available |
Enable16OWSLMBlockRW | Enable 16 OWord (8 GRF) SLM block read/write message | Available |
Enable64BMediaBlockRW | Enable 64 byte wide media block read/write message | Available |
EnableAdd3 | Enable Add3. XeHP+ only | Available |
EnableAtomicFusion | To enable/disable atomic send fusion (simd8 shaders). Valid if EnableSendFusion is on. | - |
EnableBCR | Enable bank conflict reduction. | Available |
EnableBfn | Enable Bfn. XeHP+ only | Available |
EnableCallUniform | [tmp, testing] Ignore indirect call's uniform | Available |
EnableCallWA | Control call WA when EU fusion is on. 0: off; 1: on | Available |
EnableCoalesceScalarMoves | Enable scalar moves to be coalesced into fewer moves | Available |
EnableForceDebugSWSB | Enable force debugging functionality for software scoreboard generation | Available |
EnableGroupScheduleForBC | Enable bank conflict reduction in scheduling. | Available |
EnableHWGenerateThreadID | Enable new behavior of HW generating threadID for GPGPU pipe. XeHP and non-OCL only. | Available |
EnableHWGenerateThreadIDForTileY | Enable HW generating threadID for GPGPU pipe for TileY mode. XeHP and non-OCL only. | Available |
EnableIGAEncoder | Enable VISA IGA encoder | - |
EnableIGASWSB | Use IGA for SWSB | Available |
EnableMathDPASWA | PVC math instruction running with DPAS issue | - |
EnableNonOCLWalkOrderSel | Enable WalkOrder selection for HW generating threadID for GPGPU pipe. XeHP and non-OCL only. | Available |
EnablePassInlineData | 1: Force pass 1st GRF of cross-thread payload as inline data; -1: Force disable passing inline data | Available |
EnablePreemption | Enable generating preemptable code (SKL+) | - |
EnablePromoteI8 | Enable promoting i8 (char) to i16 on all ALU insts that do support i8. It's only for XeHPC+ for now. | Available |
EnablePromoteI8Vec | Control if a certain i8 vector needs to be promoted (detail in code) | Available |
EnablePvtMemHalfToFloat | Enable conversion from half to float for private memory. | Available |
EnableRemoveLoopDependency | Enable removing of phantom loop dependency introduced by SROA | Available |
EnableQWRotateInstructions | Enable QW type support for rotate instructions. PVC only. | Available |
EnableQuickTokenAlloc | Insert dependence resolve for kernel stitching | Available |
EnableSWSBInstStall | Enable force stall to specific (start) instruction start for software scoreboard generation | Available |
EnableSWSBInstStallEnd | Enable force stall to end instruction for software scoreboard generation | Available |
EnableSWSBStitch | Insert dependence resolve for kernel stitching | Available |
EnableSWSBTokenBarrier | Enable force specific instruction as a barrier for software scoreboard generation | Available |
EnableSendFusion | Enable(!=0)/disable(0)/force(2) send fusion. Valid for simd8 shader/kernel only. | - |
EnableSeparateScratchWA | Apply the workaround in slot0 and slot1 sizes when separating scratch spaces. | Available |
EnableSpillSpaceCompression | Enable spill space compression. 0 - off, 1 - on, 2 - platform default | - |
EnableUntypedSurfRWofSS | Enable untyped surface RW to scratch space. XeHP A0 only. | Available |
EnableVISABinary | Enable VISA Binary | Available |
EnableVISABoundsChecking | Enable VISA bounds checking. | - |
EnableVISADebug | Runs VISA in debug mode, all optimizations disabled | - |
EnableVISADotAll | Enable VISA DotAll. Dumps dot files for intermediate stages | - |
EnableVISADumpCommonISA | Enable VISA Dump Common ISA | Available |
EnableVISAJmpi | Enable/Disable VISA generating jmpi (scalar jump). | - |
EnableVISANoBXMLEncoder | Enable VISA No-BXML encoder | - |
EnableVISANoSchedule | Enable VISA No-Schedule | Available |
EnableVISAOutput | Enable VISA GenISA output | Available |
EnableVISAPreSched | Enable VISA Pre-RA Scheduler | Available |
EnableVISASlowpath | Enable VISA Slowpath. Needed to dump .visaasm | Available |
EnableVISAStructurizer | Enable/Disable VISA structurizer. See value defs in igc_flags.hpp. | - |
ExpandPlane | Enable pln to mad macro expansion. | - |
Force32bitConstantGEPLowering | Go back to old version of GEP lowering for constant address space. PVC only | - |
ForceAllowSmallSpill | Allow small spills regardless of SIMD, API, or platform. The spill amount is set below | - |
ForceBCR | Force bank conflict reduction, no matter spill or not. | Available |
ForceHWThreadNumberPerEU | Total HW thread number per-EU. | - |
ForceInlineDataForXeHPC | Force InlineData for XeHPC. For testing purposes. | Available |
ForceNoMaskWA | [tmp, testing] Force NoMaskWA on any platforms | - |
ForcePreemptionWA | Force generating preemptable code across platforms | Available |
ForcePreserveR0 | Setting this to true makes VISA preserve r0 in r0 | Available |
ForcePromoteI8 | Force promoting i8 (char) to i16 on all ALU insts (for testing). | Available |
ForceSubReturn | If a subroutine does not have a return, generate a dummy return if this key is set (to meet visa requirement) | - |
ForceTexelMaskClear | If set to 1 or 2, forces evaluate messages to clear the texel mask to 0 or 1, respectively. | Available |
ForceUniformBuffer | Force buffer operand to be uniform | - |
ForceUniformSurfaceSampler | Force surface and sampler operand to be uniform | - |
ForceVISAPreSched | Force enabling of VISA Pre-RA Scheduler | - |
ForceVISAStructurizer | Force VISA structurizer for testing. Used on platforms on which we turn off SCF and use UCF by default | - |
GlobalSendVarSplit | Enable global send variable splitting when we are about to spill | - |
NewSpillCostFunction | Use new spill cost function in VISA RA | - |
NoMaskWA | Enable NoMask WA by using software-computed emask flag | - |
ReplaceIndirectCallWithJmpi | Replace indirect call with jmpi instruction (HW WA) | Available |
ReservedRegisterNum | Reserve register number for spill cost testing. | - |
SIMD16_SpillThreshold | Percentage of instructions allowed for spilling on SIMD16 | - |
SIMD32_SpillThreshold | Percentage of instructions allowed for spilling on SIMD32 | - |
SIMD8_SpillThreshold | Percentage of instructions allowed for spilling on SIMD8 | - |
SWSBMakeLocalWAR | Make WAR SBID dependence tracking BB local | Available |
SWSBTokenNum | Total tokens used for SWSB. | Available |
ScratchSpaceSizeLimit | Size limit of scratch space. XeHP and above only. Test only. Remove it once stabilized. | Available |
ScratchSpaceSizeReserved | Reserved size of scratch space. XeHP and above only. Test only. Remove it once stabilized. | Available |
SeparateSpillPvtScratchSpace | Separate scratch spaces for spill/fill and private memory. XeHP and above only. Test only. Remove it once stabilized. | Available |
SetA0toTdrForSendc | Set A0 to tdr0 before each sendc/sendsc | Available |
SpillCompressionThresholdOverride | Set a threshold number (1K based) to run with spill compression | - |
TotalGRFNum | Total GRF setting for both IGC-LLVM and vISA | - |
TotalGRFNum4CS | Total GRF setting for both IGC-LLVM and vISA, for ComputeShader-only experiment. | - |
UnifiedSendCycle | Using unified send cycle. | - |
Use16ByteBindlessSampler | True if 16-byte aligned bindless sampler state is used | - |
UseLinearScanRA | Use Linear Scan as default register allocation algorithm | - |
UseMathWithLUT | Use the implementations of cos, cospi, log, sin, sincos, and sinpi with Look-Up Tables (LUT). | - |
VISALTO | vISA LTO optimization flags. Check LINKER_TYPE for more details | - |
VISAOptions | Options to vISA. Space-separated options. | Available |
VISAPostScheduleEndBBID | The ID of BB which will be last scheduled | - |
VISAPostScheduleStartBBID | The ID of BB which will be first scheduled | - |
VISAPreSchedCtrl | Configure Pre-RA Scheduler, default(0), logging(1), latency(2), pressure(4) | - |
VISAPreSchedExtraGRF | Bump up GRF number to make pre-RA Scheduling more greedy, 0 for the default | - |
VISAPreSchedRPThreshold | Threshold to commit a pre-RA Scheduling without spills, 0 for the default | - |
VISAScheduleEndBBID | The ID of BB which will be last scheduled | - |
VISAScheduleStartBBID | The ID of BB which will be first scheduled | - |
WARSWSBLocalEnd | WAR localization end BB | Available |
WARSWSBLocalStart | WAR localization start BB | Available |
disableCompaction | Disables compaction. | Available |
disableIGASyntax | Disables GEN isa text output using IGA and new syntax. | - |
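
The VISAPreSchedCtrl row lists the values default(0), logging(1), latency(2) and pressure(4). Since the non-default values are powers of two, they look like bit flags. The sketch below combines two of them under that assumption, which the table itself does not state. The `PRESCHED_*` variable names are hypothetical and exist only in this example. Note also that the table marks this flag as not available in release builds.

```shell
# Hypothetical names; the numeric values come from the VISAPreSchedCtrl row.
PRESCHED_LOGGING=1
PRESCHED_LATENCY=2
PRESCHED_PRESSURE=4

# Assumption: modes combine with bitwise OR because the values are powers of two.
presched_ctrl=$(( PRESCHED_LOGGING | PRESCHED_LATENCY ))
echo "IGC_VISAPreSchedCtrl=$presched_ctrl"   # prints IGC_VISAPreSchedCtrl=3
```

If the modes turn out to be mutually exclusive, use one of the listed values on its own.
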
Flag | Description | Release builds |
---|---|---|
AllowMem2Reg | Setting this to true makes IGC run mem2reg even when optimizations are disabled | Available |
BlockPushConstantGRFThreshold | Set the maximum limit for block push constants i.e. UBO data pushed. Set to 0xFFFFFFFF to use the default threshold for the platform. Note that for small pixel shaders the PayloadSizeThreshold may be the limiting factor. | - |
CodeLoopSinkingMinSize | Don't sink in the loop if the number of instructions in the kernel is less | - |
CodeSinkingLoadSchedulingInstr | Instructions number to step to schedule loads in advance before the load use to cover latency. 1 to insert it immediately before use | - |
CodeSinkingMinSize | Don't sink if the number of instructions in the kernel is less | - |
DisableAttributePush | Bit mask to disable push Attribute per shader stages. Bit 0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS | - |
DisableBranchSwaping | Setting this to 1/true adds a compiler switch to disable branch swapping. | - |
DisableCodeHoisting | Setting this to 1/true adds a compiler switch to disable code-hoisting | - |
DisableCodeSinking | Setting this to 1/true adds a compiler switch to disable code-sinking | - |
DisableCodeSinkingInputVec | Setting this to 1/true disables sinking inputVec inst (test) | - |
DisableConstBaseGlobalBaseArg | Do not generate kernel implicit arguments: constBase and globalBase | - |
DisableConstantCoalescing | Setting this to 1/true adds a compiler switch to disable constant coalescing | - |
DisableConstantCoalescingOfStatefulNonUniformLoads | Disable merging non-uniform loads from stateful buffers. Note: does not affect merging to sampler loads | - |
DisableConstantCoalescingOutOfBoundsCheck | Setting this to 1/true adds a compiler switch to disable constant coalescing out of bounds check | - |
DisableCustomUnsafeOpt | Disable IGC to run custom unsafe optimizations | - |
DisableDX9LowPrecision | Disables HF in DX9. | - |
DisableDotAddToDp4aMerge | Disable Dot and Add ops to Dp4a merge optimization. | - |
DisableDynamicResInfoFolding | Disable Dynamic ResInfo Instruction Folding | - |
DisableDynamicTextureFolding | Disable Dynamic Texture Folding | - |
DisableEmptyBlockRemoval | Setting this to 1/true adds a compiler switch to disable empty block optimization | - |
DisableFDivReassociation | Disable reassociation for Fdiv operations to avoid precision difference | - |
DisableFlattenSmallSwitch | Disable the flatten small switch pass | - |
DisableGatingSimilarSamples | Disable Gating of similar sample instructions | - |
DisableIGCOptimizations | Setting this to 1/true adds a compiler switch to disable all the above IGC optimizations | - |
DisableIPConstantPropagation | Disable inter-procedural constant propagation | - |
DisableIRVerification | Setting this to 1/true adds a compiler switch to disable IGC IR verification. | - |
DisableImmConstantOpt | Disable IGC IndirectICBPropagaion optimization | - |
DisableLLVMGenericOptimizations | Disable LLVM generic optimization passes | - |
DisableLoadSinking | Setting this to 1/true adds a compiler switch to disable load sinking during retry | - |
DisableLoopSink | Disable sinking in all loops | - |
DisableLoopSplitWidePHIs | Disable splitting of loop PHI values to eliminate subvector extract operations | - |
DisableLoopUnroll | Setting this to 1/true adds a compiler switch to disable loop unrolling. | Available |
DisableMCSOpt | Disable IGC to run MCS optimization | - |
DisableMatchFloor | Setting this to 1/true adds a compiler switch to disable sub-frc = floor optimization | - |
DisableMatchMad | Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization | - |
DisableMatchPow | Setting this to 1/true adds a compiler switch to disable log2/mul/exp2 = pow optimization | - |
DisableMatchPredAdd | Setting this to 1/true adds a compiler switch to disable pred+add = predAdd optimization | - |
DisableMatchSimpleAdd | Setting this to 1/true adds a compiler switch to disable simple cmp+and+add optimization | - |
DisableMovingInstanceIDIndexOfVS | Disable moving index of InstanceID in VS to last location. | - |
DisablePayloadCoalescing | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for all types | - |
DisablePayloadCoalescing_AtomicTyped | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for atomic typed only | - |
DisablePayloadCoalescing_RT | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for RT only | - |
DisablePayloadCoalescing_Sample | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for Samplers only | - |
DisablePayloadCoalescing_URB | Setting this to 1/true adds a compiler switch to disable payload coalescing optimization for URB writes only | - |
DisablePromotePrivMem | Setting this to 1/true adds a compiler switch to disable IGC private array promotion | - |
DisablePullConstantHeuristics | Disable the heuristics to determine the number of push constants based on payload size. | - |
DisablePushConstant | Bit mask to disable push constant per shader stages. Bit 0 = All, Bit 1 = VS, Bit 2 = HS, Bit 3 = DS, Bit 4 = GS, Bit 5 = PS | - |
DisableRectListOpt | Disable Rect List optimization | - |
DisableReducePow | Disable IGC to reduce pow instructions | - |
DisableSIMD32Slicing | Setting this to 1/true adds a compiler switch to disable emitting SIMD32 VISA code in slices | - |
DisableSimplePushWithDynamicUniformBuffers | Disable Simple Push Constants Optimization for dynamic uniform buffers. | - |
DisableSqrtOpt | Prevent IGC from doing the optimization y*y = x if y = sqrt(x) | - |
DisableStaticCheck | Disable static check to push constants. | - |
DisableStaticCheckForConstantFolding | Disable static check to fold constants. | - |
DisableSynchronizationObjectCoalescingPass | Disable SynchronizationObjectCoalescing pass | - |
DisableURBPartialWritesPass | Disable IGC pass that converts URB partial writes to full-mask writes. | - |
DisableURBReadMerge | Disable IGC pass that merges URB Read instructions. | - |
DisableURBWriteMerge | Setting this to 1/true adds a compiler switch to disable URB write merge | - |
DisableUniformAnalysis | Setting this to 1/true adds a compiler switch to disable uniform_analysis | - |
DisableUniformTypedAccess | Setting this will disable uniform typed access handling | - |
DisableUniformURBWrite | Disables generation of uniform URB write messages | - |
EnableAtomicBranch | Enable Atomic branch optimization that breaks an atomic into if/else. 1: if Val == 0 ignore iadd/sub/umax 0. 2: checks if memory is lower than Val before doing umax. 3: applies both 1 for iadd/sub and 2 for umax | - |
EnableBitcastedLoadNarrowing | Enable narrowing of vector loads in bitcast patterns. | - |
EnableBitcastedLoadNarrowingToScalar | Enable narrowing of vector loads to scalar ones in bitcast patterns. | - |
EnableBlendToDiscard | Enable blend to discard based on blend state. | - |
EnableBlendToFill | Enable blend to fill based on blend state. | - |
EnableCodeAssumption | If set (> 0), generate llvm.assume to help certain optimizations. It is OCL only for now. Only 1 and 2 are valid. 2 will be 1 plus additional assumption. It also does other minor changes. | - |
EnableCustomLoopVersioning | Enable IGC to do custom loop versioning | - |
EnableDeSSA | Setting this to 0/false adds a compiler switch to disable De-SSA | - |
EnableDeSSAWA | [tmp] Keep some piece of code to avoid perf regression | - |
EnableExtractCommonMultiplier | Enable ExtractCommonMultiplier optimization in CustomUnsafeOptPass. | - |
EnableFastMath | Enable fast math optimizations in IGC | - |
EnableFastSampleD | Enable fast sample D opt. | - |
EnableGEPLSR | Enables GEP Loop Strength Reduction pass | - |
EnableGEPLSRAnyIntBitWidth | Experimental: Enables reduction of SCEV with illegal integers. Requires legalization pass to clear up expanded code. | Available |
EnableGEPLSRToPreheader | Enables reduction to loop's preheader in GEP Loop Strength Reduction pass | - |
EnableGVN | Enable LLVM global value numbering | - |
EnableGenUpdateCB | Enable derived constant optimization. | - |
EnableGenUpdateCBResInfo | Enable derived constant optimization with resinfo. | - |
EnableHighestSIMDForNoSpill | When there is no spill choose highest SIMD (compute shader only). | - |
EnableHoistDp3 | Enable dp3 Hoisting. | - |
EnableHoistMulInLoop | Hoist multiply with loop invariant out of loop, FP unsafe | - |
EnableIndependentSharedMemoryFenceFunctionality | Enable treating global memory fences as shared memory fences in SynchronizationObjectCoalescing pass | - |
EnableIntegerMad | Setting this to 1/true adds a compiler switch to enable integer mul+add = mad optimization | - |
EnableJumpThreading | Setting this to 1/true adds a compiler switch to enable llvm jumpThreading pass. | Available |
EnableLSCFence | Enable LSC Fence in ConvertDXIL for devices that have LSC | - |
EnableLoadChainLoopSink | Allow sinking of load address calculation when the load was sunk to the loop, even if the needed regpressure is achieved (only single use instructions) | - |
EnableLoadsLoopSink | Allow sinking of loads in the loop | - |
EnableLogicalAndToBranch | Enable converting logical AND to conditional branch | - |
EnableLoopHoistConstant | Enables pass to check for specific loop patterns where variables are constant across all but the last iteration, and hoist them out of the loop. | - |
EnableNewTileYCheck | Enable new TileY check. 0 - off, 1 - on, 2 - platform default | - |
EnableOptReportLoadNarrowing | Generate opt report for narrowing of vector loads. | - |
EnablePingPongTextureOpt | Enables the Ping Pong texture optimization which is used only for Compute Shaders for back to back dispatches | - |
EnablePlatformFenceOpt | Force fence optimization | - |
EnablePowToLogMulExp | Enable pow to exp(log(x)*y) optimization in CustomUnsafeOptPass. | - |
EnableRobustBufferAccessPush | Setting to 1/true will allow a single push buffer to be supported when the client requests robust buffer access (DG2+ only) | - |
EnableSLMConstProp | Enable SLM constant propagation (compute shader only). | - |
EnableSamplerChannelReturn | Setting this to 1/true adds a compiler switch to enable using header to return selective channels from sampler | - |
EnableSimplePushSizeBasedOpimization | Enable the simplepush optimization to do push based on size | - |
EnableSimplifyGEP | Enable IGC to simplify indices expr of GEP. | - |
EnableSoftwareStencil | Enable software stencil for PS. | - |
EnableSoftwareVertexFetch | Enable software vertex fetch for VS. | - |
EnableSplitIndirectEEtoSel | Enable the split indirect extractelement to icmp+sel pass | - |
EnableSplitUnalignedVector | Enable Splitting of unaligned vectors for loads and stores | - |
EnableStatefulAtomic | Enable promoting stateless atomic to stateful atomic. | - |
EnableStatefulToken | Enable generating patch token to indicate a ptr argument is fully converted to stateful (temporary) | - |
EnableStatelessToStateful | Enable Stateless To Stateful transformation for global and constant address space in OpenCL kernels | - |
EnableSumFractions | Enable SumFractions optimization in CustomUnsafeOptPass. | - |
EnableTextureLoadCoalescing | Enable merging non-uniform loads from bindless textures | - |
EnableThreadCombiningOpt | Enables the thread combining optimization which is used only for Compute Shaders for combining a number of software threads to dispatch smaller number of hardware threads | - |
EnableThreeWayLoadSpiltOpt | Enable three-way load split opt. | - |
EnableTrigFuncRangeReduction | Reduce the sine and cosine function domain range | Available |
EnableUnmaskedFunctions | Enable unmasked functions SYCL feature. | Available |
EnableWaveForce32 | Force Wave to use simd32 | - |
EnableWorkGroupUniformGoto | Setting to 1 enables generating uniform goto for work group uniform [eu fusion only] | - |
FPRoundingModeCoalescingMaxDistance | Max distance in instructions for reordering FP instructions with common rounding mode | - |
ForceAddressArithSinking | Force sinking address arithmetic closer to the usage | - |
ForceHoistDp3 | Force dp3 Hoisting. | - |
ForceLinearWalkOnLinearUAV | Force linear walk on linear UAV buffer | - |
ForceLoadsLoopSink | Force sinking of loads in the loop from the beginning | - |
ForceLoopSink | Force sinking in all loops | - |
ForceSupportsAutoGRFSelection | ForceSupportsAutoGRFSelection | Available |
ForceSupportsStaticRegSharing | ForceSupportsStaticRegSharing | Available |
ForceTileY | Force TileY mode on DG2 | - |
GEPLSRThresholdRatio | Ratio for register pressure threshold in GEP Loop Strength Reduction pass | - |
KeepTileYForFlattened | Keep TileY for FlattenedThreadIdInGroup. 0 - off, 1 - on, 2 - platform default | - |
LLVMCommandLine | Applies LLVM command line | - |
LoopSinkMinSave | If loop sink can save more 32-bit values than this Minimum, do it; otherwise, skip | - |
LoopSinkMinSaveUniform | If loop sink can save more scalar (uniform) values than this Minimum, do it; otherwise, skip | - |
LoopSinkRegpressureMargin | Sink into the loop until the pressure becomes less than #grf-margin | - |
LoopSinkRollbackThreshold | Rollback loop sinking if the estimated regpressure after the sinking is still higher than this + #available registers, and the number of registers can be increased | - |
LoopSinkThresholdDelta | Do loop sink if the estimated register pressure is higher than this + #available registers | - |
MaxImmConstantSizePushed | Set the max size of immediate constant buffer pushed | - |
PSSIMD32HeuristicFP16 | Enable PS SIMD32 heuristic based on fp16 characteristic | - |
PSSIMD32HeuristicLoopAndDiscard | Enable PS SIMD32 heuristic based on loop info and discard | - |
PayloadSizeThreshold | Set the max payload size threshold for short shaders that have PSD bottleneck. | - |
PrepopulateLoadChainLoopSink | Check the loop for loop chains before sinking to use the existing chains in a heuristic | - |
RovOpt | Bitmask for ROV optimizations. 0 for all off, 1 for force fence flush none, 2 for setting LSC_L1UC_L3C_WB, 3 for both opt on | - |
RuntimeLoopUnrolling | Setting this to switch on/off runtime loop unrolling. 0: default (on), 1: force on, 2: force off | - |
SelectiveHashOptions | Applies options to hash range via string | - |
SetBranchSwapThreshold | Set the branch swapping threshold. | - |
SetDefaultTileYWalk | Use TileY walk as default for HW generating threadID | Available |
SetLoopUnrollThreshold | Set the loop unroll threshold. Value 0 will use the default threshold. | - |
SetLoopUnrollThresholdForHighRegPressure | Set the loop unroll threshold for shaders with high reg pressure. Value 0 will use the default threshold. | - |
SetRegisterPressureThresholdForLoopUnroll | Set the register pressure threshold for limiting the loop unroll to smaller loops | - |
SetURBFullWriteGranularity | Overrides the minimum access granularity for URB full writes. Valid values are 0, 16 and 32, value 0 means use default for the platform. | Available |
SplitIndirectEEtoSelThreshold | Split indirect extractelement cost threshold | - |
SynchronizationObjectCoalescingConfig | Modify the default behavior of SynchronizationObjectCoalescing. The value is a bitmask: bit0 - remove fences in read barrier write scenario | Available |
UseHDCTypedReadForAllTextures | Setting this to use HDC message rather than sampler ld for texture read | - |
UseHDCTypedReadForAllTypedBuffers | Setting this to use HDC message rather than sampler ld for buffer read | - |
UseTiledCSThreadOrder | Use 4x4 dispatch for CS order when it seems beneficial | - |
WaAllowMatchMadOptimizationforVS | Setting this to 1/true adds a compiler switch to enable mul+add = mad optimization for VS | - |
WaDisableMatchMadOptimizationForCS | Setting this to 1/true adds a compiler switch to disable mul+add = mad optimization for CS | - |
forceFullUrbWriteMask | Set Full URB write mask. | - |
forcePushConstantMode | Set the push constant mode, 0 is default behavior, 1 is simple push, 2 is gather constant, 3 is none/pull constants | - |
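
DisablePushConstant takes a per-stage bit mask (bit 0 = All, bit 1 = VS, bit 2 = HS, bit 3 = DS, bit 4 = GS, bit 5 = PS). The sketch below shows one way to compute such a mask from stage names. The helper names `stage_bit` and `push_constant_mask` are hypothetical, and the number format the variable expects is not stated in the table, so plain decimal output is an assumption. The table also marks this flag as not available in release builds.

```shell
# Map a shader-stage name to its bit position, per the DisablePushConstant row.
stage_bit() {
  case "$1" in
    All) echo 0 ;;
    VS)  echo 1 ;;
    HS)  echo 2 ;;
    DS)  echo 3 ;;
    GS)  echo 4 ;;
    PS)  echo 5 ;;
    *)   echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: OR together the bits for every stage given as argument.
push_constant_mask() {
  mask=0
  for stage in "$@"; do
    bit=$(stage_bit "$stage") || return 1
    mask=$(( mask | (1 << bit) ))
  done
  echo "$mask"
}

push_constant_mask VS PS   # prints 34, i.e. (1 << 1) | (1 << 5)
```

The result could then be assigned in the usual way, for example `export IGC_DisablePushConstant=34`, provided decimal input is accepted.
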
Flag | Description | Release builds |
---|---|---|
CompileOneAtTime |
Compile only one kernel (out of many in llvm::module) at a time. Prints compiled kenrels names to stdout. Useful to debug compilation time and crashes - it does not produce valid binary. | - |
CopyA0ToDBG0 |
Copy a0 used for extended msg descriptor to dbg0 to help debug | - |
DPASReadSuppressionWA |
Enable read suppression WA for the send and indirect access | - |
DebugInternalSwitch |
Code pass selection, debug only | - |
DisablePassToggles |
Disable each IGC pass by setting the bit. HEXADECIMAL ONLY!. Ex: C0 is to disable pass 6 and pass 7. | - |
DisableSendSrcDstOverlapWA |
Disable Send Source/destination overlap WA which is enabled for GEN10/GEN11 and whenever Wddm2Svm is set in WATable | - |
DumpPayloadToScratch |
Setting this to 1/true dumps thread payload to scartch space. Used for workloads which doesnt use scartch space for other purposes | - |
EnableBitcastExtractInsertPattern |
Enable BitcastExtractInsertPattern in CustomSafeOptPass. | Available |
EnableCSSIMD32 |
Enable computer shader SIMD32 mode, and fall back to lower SIMD when spill | - |
EnableDebugging |
Enable shader debugging for release internal | - |
EnableDivergentBarrierCheck |
Uses WIAnalysis to find barriers in divergent flow control. May have false positives. | - |
EnableHashMovsAtPrologue |
Rather than after EOT, insert hash code movs at shader entry | Available |
EnableLSCFenceUGMBeforeEOT |
Enable inserting fence.ugm.06.tile before EOT if a kernel has any write to UGM [XeHPC, PVC]. | Available |
EnableOptionalBufferOffset |
For StatelessToStateful optimization [OCL], if true, make buffer offset optional. Valid only if buffer offset is supported. | Available |
EnableRTLSCFenceUGMBeforeEOT |
[tmp]Enable inserting fence.ugm.06.tile before EOT for RT shader [XeHPC, PVC]. | - |
EnableRTmaskPso |
Enable render target mask optimization in PSO opt | - |
EnableSIPOverride |
This key forces load of SIP from a a Local File. | - |
EnableSupportBufferOffset |
[debugging]For StatelessToStateful optimization [OCL], support implicit buffer offset argument (same as -cl-intel-has-buffer-offset-arg). | - |
EnableTestIGCBuiltin |
Enable testing igc builtin (precompiled kernels) using OCL. | - |
EnableTrivialEmulateSinCos |
Enable Emulation for Sine and Cosine instructions | - |
EnableZeroSomeARF |
If set, insert mov inst to zero a0, acc, etc to assist HW debugging. | - |
EnablerReadSuppressionWA |
Enable read suppression WA for the send and indirect access | - |
ForceCSLeastSIMD |
Force computer shader to the lowest allowed SIMD mode | - |
ForceCSSIMD16 |
Force computer shader SIMD16 mode if allowed, otherwise it will use SIMD32 | - |
ForceCSSIMD32 |
Force computer shader SIMD32 mode | - |
ForceDisableShaderDebugHashCodeInKernel |
Disable hash code addition to the binary after EOT | Available |
ForceEmuKind |
Force emuKind used by PreCompiledFuncImport pass. This flag takes emulation kind value that is defined in EmuKind enum in PreCompiledFuncImport.hpp [TEST ONLY] | - |
ForceFunctionsToNop |
Replace functions with immediate return to help narrow down shaders; use with Options.txt. | - |
ForceLoosenSimd32Occu |
Control loosenSimd32occu return value. 0 - off, 1 - on, 2 - platform default | - |
ForceMemoryFenceBeforeEOT |
Forces inserting SLM or gloabal memory fence before EOT if shader writes to SLM or goblam memory respectively. | - |
ForcePerThreadPrivateMemorySize |
Useful for ensuring a certain amount of private memory when doing a shader override. | Available |
ForceStatelessForQueueT |
In OCL, force to use stateless memory to hold queue_t*. This is a legacy feature to be removed. | - |
ForceRecompilation |
Force RetryManager to make recompilation. | - |
MSAAClearedKernel |
Insert the discard code for MSAA_MSC_Cleared kernels. 2/4/8/16 | - |
PrintVerboseGenericControlFlowLog |
Forces compiler to print detailed log about additional control flow generated due to a presence of generic memory operations | Available |
RetryManagerFirstStateId |
For debugging purposes, it can be useful to start on a particular id rather than id 0. | - |
RouteByLodHint |
An integer offset addon to route the resource to HDC on DG2 | - |
SIPOverrideFilePath |
This key when enabled with EnableSIPOverride load of SIP from a specified path. | - |
SToSProducesPositivePointer |
This key is for StatelessToStateful optimization if the user knows the pointer offset is postive to the kernel argument. | - |
ShaderDebugHashCode |
The driver will set a breakpoint in the first instruction of the shader which has the provided hash code. It works only when the value is different then 0 and SystemThreadEnable is set to TRUE. Ex: VS_asm2df26246434553ad_nos0000000000000000 , only the LowPart Need to be Enterd in Registry Ex : 0x434553ad ,i.e Lower 8 Hex Digits of the 16 Digit Hash Code for Compatibilty Reasons |
- |
ShaderDebugHashCodeInKernel |
Add hash code to the binary | Available |
ShaderDisableOptPassesAfter |
Will only run first N optimization passes, any further passes will be ignored. This flag can be used to bisect optimization passes. | - |
ShaderDisplayAllPassesNames |
Display to console all passes name with their ID and occurrence number. | - |
ShaderOverride |
Will override any LLVM shader with matching name in c:\Intel\IGC\ShaderOverride | - |
ShaderPassDisable |
Disable specific passes, e.g. '9;17-19;239-;Error Check;ResolveOCLAtomics:2;Dead Code Elimination:3-5;BreakConstantExprPass:7-' disables pass 9, disables passes from 17 to 19, disables all passes after 238, disables all occurrences of pass Error Check, disables the second occurrence of ResolveOCLAtomics, disables occurrences 3 to 5 of pass Dead Code Elimination, and disables all occurrences of BreakConstantExprPass after its 6th occurrence. To show a list of pass names and their occurrences, set ShaderDisplayAllPassesNames. Must be used with the ShaderDumpEnableAll flag. | - |
SystemThreadEnable |
This key forces software to create a system thread. The system thread may still be created by software even if this control is set to false. The system thread is invoked if either the software requires exception handling or if kernel debugging is active and a breakpoint is hit. | - |
TestIGCPreCompiledFunctions |
Enable testing for precompiled kernels. [TEST ONLY] | - |
ld2dmsInstsClubbingThreshold |
Do not club more than these ld2dms insts into the new BB during MCSOpt | - |
manualEnableRSWA |
Enable read suppression WA for the send and indirect access | - |
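The pass-control flags above are easiest to follow with a concrete setup. The following is a minimal sketch, assuming a POSIX shell and the `IGC_<flag>=<value>` syntax described at the top of this document. Note that these flags are marked `-` in the Release builds column, so they are presumably not honored by a release build. The value `100` for ShaderDisableOptPassesAfter is an arbitrary illustration.

```shell
# Step 1: list every pass with its ID and occurrence number.
export IGC_ShaderDisplayAllPassesNames=1

# Step 2: disable pass 9, passes 17 to 19, and the second occurrence of
# ResolveOCLAtomics. Single quotes keep the shell from treating ';' as a
# command separator.
export IGC_ShaderDumpEnableAll=1
export IGC_ShaderPassDisable='9;17-19;ResolveOCLAtomics:2'

# Alternative for bisecting: run only the first N optimization passes.
# N=100 is an arbitrary example value.
# export IGC_ShaderDisableOptPassesAfter=100

echo "IGC_ShaderPassDisable=$IGC_ShaderPassDisable"
```

Quoting matters here: without the single quotes, the shell would split the value at the first semicolon and try to run `17-19` as a command.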
Flag | Description | Release builds |
---|---|---|
AddExtraIntfInfo |
Will add extra interference info from .extraintf files from c:\Intel\IGC\ShaderOverride | - |
DebugDumpNamePrefix |
Set a prefix to debug info dump filenames(with path) and drop hash info from them (for testing purposes) | Available |
DumpDeSSA |
dump DeSSA info into file. | Available |
DumpHasNonKernelArgLdSt |
Print if hasNonKernelArg load/store to stderr | Available |
DumpLLVMIR |
dump LLVM IR | Available |
DumpLoopSink |
Dump debug info in LoopSink | - |
DumpOCLProgramInfo |
dump OpenCL Patch Tokens, Kernel/Program Binary Header | Available |
DumpPatchTokens |
Enable dumping of patch tokens. | Available |
DumpResourceLoop |
dump resource loop detected by ResourceLoopAnalysis | Available |
DumpTimeStats |
Timing of translation, code generation, finalizer, etc | Available |
DumpTimeStatsCoarse |
Only collect/dump coarse level time stats, i.e. skip opt detail timer for now | Available |
DumpTimeStatsPerPass |
Collect Timing of IGC/LLVM passes | Available |
DumpToCurrentDir |
dump shaders to the current directory | Available |
DumpToCustomDir |
Dump shaders to custom directory. Parent directory must exist. | Available |
DumpUseShorterName |
If set, use an internal shader name(_entry_id) in dump file name | Available |
DumpVariableAlias |
Dump variable alias info, valid if EnableVariableAlias is on | Available |
DumpWIA |
dump WI (uniform) information into files in dump directory if set to true | - |
DumpZEInfoToConsole |
Dump zeinfo to console | Available |
ElfDumpEnable |
dump ELF file | Available |
ElfTempDumpEnable |
dump temporary ELF files | Available |
EnableCapsDump |
Enable hardware caps dump | Available |
EnableCisDump |
Enable cis dump | Available |
EnableCosDump |
Enable cos dump | Available |
EnableKernelNamesBasedHash |
If set, use kernels' names to calculate the hash. Doesn't work on .cl dump's hash. Will overwrite dumps if multiple modules have the same kernel names. | - |
EnableLivenessDump |
Enable dumping out liveness info on stderr. | Available |
EnableScalarizerDebugLog |
print step by step scalarizer debug info. | Available |
EnableShaderNumbering |
Number shaders in the order they are dumped based on their hashes | Available |
ForceRPE |
Force RPE (RegisterEstimator) computation if > 0. If 2, force RPE per inst. | Available |
InterleaveSourceShader |
Interleave the source shader in asm dump | Available |
PrintAfter |
Take either all or comma/semicolon-separated list of pass names. If set, enable print LLVM IR after the given pass is done (mimic llvm print-after) | Available |
PrintBefore |
Take either all or comma/semicolon-separated list of pass names. If set, enable print LLVM IR before the given pass is done (mimic llvm print-before) | Available |
PrintHexFloatInShaderDumpAsm |
print floats in hex in asm dump | Available |
PrintInstOffsetInShaderDumpAsm |
print instruction offsets as comments in asm dump | Available |
PrintMDBeforeModule |
Print metadata of the module at the beginning of the dump. Used for LIT tests. | Available |
PrintPsoDdiHash |
Print psoDDIHash in TimeStats_Shaders.csv file | Available |
PrintToConsole |
dump to console | Available |
ProgbinDumpFileName |
Specify filename to use for dumping progbin file to current dir | Available |
QualityMetricsEnable |
Enable Quality Metrics for IGC | Available |
RPEDumpLevel |
> 0 : dump info of register pressure estimate on stderr. See igc_flags.hpp level defs. | - |
ShaderDataBaseStats |
Enable gathering sends' sizes for shader statistics | - |
ShaderDataBaseStatsFilePath |
Path to a file with dumped shader stats additional data e.g. data available during compilation only | - |
ShaderDumpEnable |
dump LLVM IR, visaasm, and GenISA | Available |
ShaderDumpEnableAll |
dump all LLVM IR passes, visaasm, and GenISA | Available |
ShaderDumpEnableG4 |
same as ShaderDumpEnable but adds G4 dumps (0 = off, 1 = some, 2 = all) | - |
ShaderDumpEnableIGAJSON |
adds IGA JSON output to shader dumps (0 = off, 1 = enabled, 2 = include def/use info but causes longer compile times) | - |
ShaderDumpEnableRAMetadata |
adds RA Metadata file to shader dumps | Available |
ShaderDumpFilter |
Only dump files matching the given regex | Available |
ShaderDumpInstNamer |
dump all unnamed LLVM IR instructions with variable names 'tmp', which makes shader overriding easier | Available |
ShaderDumpPidDisable |
disable adding PID to the name of shader dump directory | Available |
ShowFullVectorsInShaderDumps |
print all elements of vectors in ShaderDumps, can dramatically increase ShaderDumps size | Available |
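As an illustration of how the dump flags above combine, here is a minimal sketch, assuming a POSIX shell. The directory names are hypothetical, and the exact path format expected by DumpToCustomDir (for example, whether a trailing slash is needed) is not stated in this document. The only documented constraint is that the parent directory must exist, which the sketch ensures with `mktemp -d`.

```shell
# Hypothetical dump location: mktemp -d creates the parent directory,
# so it is guaranteed to exist before the compiler runs.
dump_root="$(mktemp -d)"
dump_dir="$dump_root/igc_dumps"

export IGC_ShaderDumpEnable=1           # dump LLVM IR, visaasm, and GenISA
export IGC_DumpToCustomDir="$dump_dir"  # parent directory must exist
export IGC_ShaderDumpPidDisable=1       # no PID in the dump directory name

echo "Dumps requested under: $IGC_DumpToCustomDir"
```

Any application started from this shell afterwards inherits the three variables.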
Flag | Description | Release builds |
---|---|---|
AvoidUsingR0R1 |
Do not use r0 and r1 as generic usage registers | - |
BufferBoundsChecking |
Setting this to 1 (true) enables buffer bounds checking | - |
DebugInfoEnforceAmd64EM |
Enforces elf file with the debug information to have eMachine set to AMD64 | - |
DebugInfoValidation |
Enable optional (strict) checks to detect debug information inconsistencies | - |
EnableRelocations |
Setting this to 1 (true) makes IGC emit relocatable ELF with debug info | Available |
EnableTestSplitI64 |
Test legalization that split i64 store unnecessarily, to be deleted once test is done[temp] | Available |
EnableWriteOldFPToStack |
Setting this to 1 (true) writes the caller frame's frame-pointer to the start of callee's frame on stack, to support stack walk | - |
ExtraOCLInternalOptions |
Extra internal options for OpenCL | Available |
ExtraOCLOptions |
Extra options for OpenCL | Available |
ForceAssignRhysicalReg |
Force assigning dclId to physical reg. | Available |
ForceSpillVariables |
comma-separated string; each entry provides the declare id of a variable which will be spilled | Available |
InitializeAddressRegistersBeforeUse |
Setting this to 1 (true) initializes address register to 0 before each use | - |
InitializeRegistersEnable |
Setting this to 1/true initializes all GRFs, Flag and address registers to 0 at the beginning of the shader | - |
InitializeUndefValueEnable |
Setting this to 1/true initializes all undefs in URB payload to 0 | - |
MetricsDumpEnable |
Dump IGC Metrics to file *.optrpt in current working directory. Setting to 0 - disabled, 1 - makes in binary format, 2 - makes in plain-text format. | Available |
MinimumValidAddress |
If it's greater than 0, it enables minimal valid address checking where the threshold is the given value (in hex). | - |
NoCatchAllDebugLine |
Don't emit special placeholder instruction to map VISA orphan instructions | - |
PrintDebugSettings |
Prints all non-default debug settings | - |
ShaderDumpTranslationOnly |
Dump LLVM IR right after translation from SPIRV to stderr and ignore all passes | - |
StackOverflowDetection |
Inserts checks for stack overflow when stack calls are used. | Available |
UseMTInLLD |
Use multi-threading when linking multiple elf files | Available |
UseVISAVarNames |
Make VISA generate names for virtual variables so they match with dbg file | Available |
UseVMaskPredicate |
Use VMask as predicate for subspan usage | - |
UseVMaskPredicateForIndirectMove |
Use VMask as predicate for subspan usage (indirect mov only) | Available |
UseVMaskPredicateForLoads |
Use VMask as predicate for subspan usage (loads only) | Available |
ZeBinCompatibleDebugging |
Setting this to 1 (true) enables embedding debug info in zeBinary | Available |
deadLoopForFloatException |
enable a dead loop if float exception happened | - |
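PrintDebugSettings is marked `-` for release builds, so a simple substitute is to inspect the environment directly. The following is a minimal sketch, assuming a POSIX shell; it sets two flags from the table above that are marked Available and then lists every `IGC_` variable a child process would inherit.

```shell
export IGC_MetricsDumpEnable=2       # 0 = disabled, 1 = binary, 2 = plain text
export IGC_StackOverflowDetection=1  # insert stack overflow checks

# List all IGC_ variables currently exported in this shell.
env | grep '^IGC_' | sort
```

This only shows what the shell exports; it does not confirm that the compiler accepted the values.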
Flag | Description | Release builds |
---|---|---|
AdvCodeMotionControl |
Control bits to fine-tune advanced code motion | - |
AdvRuntimeUnrollCount |
Advanced runtime unroll count | - |
AllowedSpillRegCount |
Max allowed spill size without recompile | - |
CSSpillThreshold2xGRFRetry |
Spill Threshold for CS to trigger 2xGRFRetry | - |
CSSpillThresholdNoSLM |
Spill Threshold for CS SIMD16 without SLM | - |
CSSpillThresholdSLM |
Spill Threshold for CS SIMD16 with SLM | - |
CheckCSSLMLimit |
Check SLM or threads limit on compute shader to turn on Enable2xGRF on DG2+. 0 - off, 1 - SLM limit heuristic, 2 - platform based heuristic (XE2 - threads limit, others - SLM limit) | - |
DPEmuNeedI64Emu |
Double Emulation needs I64 emulation. Unsetting it to disable I64 Emulation for testing. | - |
DisableCorrectlyRoundedMacros |
Tmp flag to disable correctly rounded macros for BMG+. This flag will be removed in the future. | - |
DisableDSDualPatch |
Setting it to true will enable Single and Dual Patch dispatch mode for Domain Shader | - |
DisableEarlyOutPatterns |
Disable optimization trying to create an early out after sampleC messages | - |
DisableGPGPUIndirectPayload |
Disable OCL indirect GPGPU payload | - |
DisableLSCForTypedUAV |
Forces legacy HDC messages for typed UAV read/write. Temporary knob for XE2 bringup. | Available |
DisableLSCSIMD32TGMMessages |
Forces splitting SIMD32 typed messages into 2xSIMD16. Only valid on XE2+. | Available |
DisableMemOpt |
Disable MemOpt, merging load/store | Available |
DisableMemOpt2 |
Disable MemOpt2 | - |
DisableMergeStore |
[temp]If EnableLdStCombine is on, disable mergestore (memopt) if this is set. Temp key for testing | Available |
DisablePrefetchToL1Cache |
Disable prefetch to L1 cache | Available |
DisablePromoteToDirectAS |
This key disables the PromoteResourceToDirectAS pass | - |
DisableRecompilation |
Disable recompilation, skip retry stage | Available |
DisableScalarAtomics |
Disable the Scalar Atomics optimization | - |
DisableSystemMemoryCachingInGPUForConstantBuffers |
Disables caching system memory in GPU for loads from constant buffers | - |
DisableWaSampleLZ |
Disable The Sample Lz workaround and generate Sample LZ | - |
DivergentBarrierUniformLoad |
Optimize loads for spill/fill generated by DivergentBarrier with uniform analysis | Available |
Enable16BitLDMCS |
Enable 16-bit ld_mcs on supported platforms | Available |
Enable2xGRF |
Enable 2x GRF for high SLM or high threads usage. 0 - off, 1 - on, 2 - platform default | - |
Enable64BitEmulation |
Enable 64-bit emulation | - |
Enable64BitEmulationOnSelectedPlatform |
Enable 64-bit emulation on selected platforms | - |
EnableAIParameterCombiningWithLODBias |
Enable AI parameter combining With LOD Bias parameter. XeHP | Available |
EnableAdvCodeMotion |
Enable advanced code motion | - |
EnableAdvMemOpt |
Enable advanced memory optimization | - |
EnableAdvRuntimeUnroll |
Enable advanced runtime unroll | - |
EnableCPSMSAAOMaskWA |
Enable WA which forces rt writes to happen at pixel rate when cps, msaa, and omask are present. | Available |
EnableCPSOmaskWA |
Enable workaround for oMask with CPS | - |
EnableConstIntDivReduction |
Enables strength reduction on integer division/remainder with constant divisors/moduli | Available |
EnableDG2LSCSIMD8WA |
Enables WA for DG2 LSC simd8 d32-v8/d64-v3/d64-v4. [temp, should be replaced with WA id] | - |
EnableDPEmulation |
Enforce double precision floating point operations emulation on platforms that do not support it natively | Available |
EnableDivergentBarrierWA |
Generate continuation code to handle shaders that places barriers in divergent control flow | - |
EnableDualSIMD8 |
enable dual SIMD8 on supported platforms | Available |
EnableExplicitCopyForByVal |
Enable generating an explicit copy (alloca + memcpy) in a caller for aggregate arguments with byval attribute | Available |
EnableFallbackToBindless |
This key enables fallback to bindless mode on all shaders | - |
EnableFallbackToStateless |
This key enables fallback to stateless mode on all shaders | - |
EnableFunctionPointer |
Enables support for function pointers and indirect calls | - |
EnableGASResolver |
Enable GAS Resolver | - |
EnableGEPSimplification |
Enable GEP simplification | Available |
EnableGen11TwoStackTSG |
Enable Two stack TSG gen11 feature | - |
EnableGlobalStateBuffer |
This key allows stack calls to read implicit args from side buffer. It also emits a relocatable add in VISA. | Available |
EnableHFpacking |
Enable HF packing | - |
EnableHSSinglePatchDispatch |
Setting this to 1/true enables SIMD8 single-patch dispatch in HullShader. Default is either SIMD8 single patch/dual patch dispatch based on control point count | - |
EnableImplicitArgAsIntrinsic |
Use GenISAIntrinsic instructions for supported implicit args instead of passing them as function arguments | Available |
EnableIndirectCallOptimization |
Enables inlining indirect calls by comparing function addresses | - |
EnableInsertingPairedResourcePointer |
Enable to insert a bindless paired resource address into sampler headers in context of sampling feedback resources | Available |
EnableIntDivRemCombine |
Given div/rem pairs with same operands merged; replace rem with mul+sub on quotient; 0x3 (set bit[1]) forces this on constant power of two divisors as well | Available |
EnableL3FlushForGlobal |
Enable/disable flushing L3 cache for globals | - |
EnableLSC |
Enables the new dataport encoding for LSC messages. | Available |
EnableLdStCombine |
Enable load/store combine pass if set to 1 (lsc message only) or 2; bit 3 = 1 [tmp for testing] : enabled load combine (intend to replace memopt) | Available |
EnableLowerGPCallArg |
Enable pass to lower generic pointers in function arguments | - |
EnableLscSamplerRouting |
Enables conversion of LD to LD_L instructions. | - |
EnableMadLoopSlice |
Enables the slicing of mad loops. | Available |
EnableMaxWGSizeCalculation |
Enable max work group size calculation [OCL only] | Available |
EnableMeshSLMCache |
Enables caching Mesh shader outputs in SLM, bitmask: bit0 - cache AND flush mode, enable caching of Primitive Count and Primitive Indices, bit1 - cache AND flush mode, enable caching of per-vertex outputs, bit2 - cache AND flush mode, enable caching of per-primitive outputs, bit3 - mirror mode, if this bit is set bits 0, 1 and 2 are ignored, enable caching of outputs that are read in the shader data is only mirrored in SLM | Available |
EnableMeshShaderSimdSize |
Set allowed simd sizes for mesh shader compilation, bitmask: bit0 - simd8, bit1 - simd16, bit2 - simd32, e.g. 0x7 enables all simd sizes and 0x2 enables only simd16; valid values are from 0 to 7; ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size; ignored if ForceMeshShaderSimdSize is set | Available |
EnableOCLSIMD16 |
Enable OCL SIMD16 mode | Available |
EnableOCLSIMD32 |
Enable OCL SIMD32 mode | Available |
EnableOCLScratchPrivateMemory |
Enable the use of scratch space for private memory [OCL only] | Available |
EnablePartialEmuI64 |
Enable the partial I64 emulation for PVC-B, Xe2 | Available |
EnablePostCullPatchFIFOHP |
Enable Post-Cull Patch Decoupling FIFO. XeHP. | Available |
EnablePostCullPatchFIFOLP |
Enable Post-Cull Patch Decoupling FIFO. GEN12LP. | Available |
EnablePreRARematFlag |
Enable PreRA Rematerialization of Flag | - |
EnablePromotionToSampleMlod |
Enables promotion of sample and sample_c to sample_mlod and sample_c_mlod instructions when min lod is present | - |
EnableReadGTPinInput |
Enables setting GTPin context flags by reading the input to the compiler adapters | - |
EnableRecursionOpenCL |
Enable recursion with OpenCL user functions | - |
EnableSIMD16ForNonWaveXe2 |
Enable SIMD16 for Xe2 if the shader doesn't have wave | - |
EnableSIMD16ForXe2 |
Enable SIMD16 for Xe2 | - |
EnableSIMDVariantCompilation |
Enables compiling kernels in variant SIMD sizes | - |
EnableSMRescheduling |
Change instruction order to enable extra Sample Multiversioning cases | - |
EnableSampleBMLODWA |
Enable workaround for sample_b messages that use the mlod parameter | - |
EnableSampleDEmulation |
Enable emulation of sample_d. | Available |
EnableSampleDEmulationForTesting |
Enable emulation of sample_d on pre-XeHP platforms. | Available |
EnableSamplerSupport |
Enables sampler messages generation for PVC. | Available |
EnableScalarTypedAtomics |
Enable the Scalar Typed Atomics optimization | - |
EnableScratchMessageD64WA |
Enables WA to legalize D64 scratch messages to D32 | - |
EnableSelectiveScalarizer |
enable selective scalarizer on GPGPU path | Available |
EnableSingleVertexDispatch |
Vertex Shader Single Patch Dispatch Regkey | - |
EnableTaskShaderSimdSize |
Set allowed simd sizes for task shader compilation, bitmask: bit0 - simd8, bit1 - simd16, bit2 - simd32, e.g. 0x7 enables all simd sizes and 0x2 enables only simd16; valid values are from 0 to 7; ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size; ignored if ForceMeshShaderSimdSize is set | Available |
EnableTileYForExperiments |
Enable TileY heuristics for experiments | - |
EnableTypeDemotion |
Enable Type Demotion | - |
Enable_Wa14010017096 |
Enable Wa_14010017096 regardless of the platform stepping | Available |
Enable_Wa1507979211 |
Enable Wa_1507979211 regardless of the platform stepping | Available |
Enable_Wa1807084924 |
Enable Wa_1807084924 regardless of the platform stepping | Available |
Enable_Wa22010487853 |
Enable Wa_22010487853 regardless of the platform stepping | Available |
Enable_Wa22010493955 |
Enable Wa_22010493955 regardless of the platform stepping | Available |
Force32BitIntDivRemEmu |
Force 32-bit Int Div/Rem emulation using fp64, ignored if no native fp64 support | Available |
Force32BitIntDivRemEmuSP |
Force 32-bit Int Div/Rem emulation using fp32, ignored if Force32BitIntDivRemEmu is set and actually used | Available |
ForceDPEmulation |
Force double emulation for testing purpose | - |
ForceFFIDOverwrite |
Force overwriting ffid in sr0.0 | - |
ForceFormatConversionDG2Plus |
Forces SW image format conversion for R10G10B10A2_UNORM, R11G11B10_FLOAT, R10G10B10A2_UINT image formats on DG2+ platforms | Available |
ForceI64DivRemEmu |
Forces specific int64 div/rem emulation: 0 = platform default, 1 = int based, 2 = SP based, 3 = DP based | - |
ForceMeshShaderSimdSize |
Force mesh shader simd size; valid values are 0 (not set), 8, 16 and 32; ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size | Available |
ForceNoLSC |
Disables the new dataport encoding for LSC messages. | Available |
ForceOCLSIMDWidth |
Force using SIMD width specified. 0 : no forcing. This overrides driver forced SIMD value(if any) and runtime behaviour could be different if driver expects something fixed | Available |
ForcePrefetchToL1Cache |
Forces standard builtin prefetch to use L1 cache | Available |
ForceSPDivEmulation |
Force SP Div emulation for testing purpose | - |
ForceStaticToDynamic |
Force write of vertex count in GS | - |
ForceTaskShaderSimdSize |
Force task shader simd size; valid values are 0 (not set), 8, 16 and 32; ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size | Available |
ForceXYZworkGroupWalkOrder |
Force X/Y/Z WorkGroup walk order | Available |
HoistPSConstBufferValues |
Hoists up down-converts for constant buffer accesses, so they can be vectorized more easily. | - |
LICMStatThreshold |
LICM stat threshold to avoid retry SIMD16 for CS | - |
LateInlineUnmaskedFunc |
Postpone inlining of Unmasked functions till end of CG to avoid code movement inside/outside of unmasked region | - |
LscForceSpillNonStackcall |
Non-stack call kernels that spill will use LSC on DG2+ | Available |
LscImmOffsMatch |
Match address patterns that have an immediate offset for the vISA LSC API (0 means off/no matching, 1 means on/match for supported platforms (Xe2+) and APIs, 2 means force on for all platforms (vISA will emulate the addition if HW lacks support) and APIs; also see LscImmOffsVisaOpts) | Available |
LscImmOffsVisaOpts |
This maps to vISA_lscEnableImmOffsFor (enables/disables immediate offsets for various address types; see that option for semantics) | Available |
MaxLiveOutThreshold |
Max LiveOut Threshold in MemOpt2 | - |
MaxLoadVectorSizeInBytes |
[LdStCombine] the max non-uniform vector size for the coalesced load. 0: compiler choice (default, 16(4DW)); others: 4/8/16/32 | Available |
MaxStoreVectorSizeInBytes |
[LdStCombine] the max non-uniform vector size for the coalesced store. 0: compiler choice (default, 16(4DW)); others: 4/8/16/32 | Available |
MemOptGEPCanon |
[test] GEP canonicalization in MemOpt. 0 : enable; 1: disable; 2: disable only for OCL; | Available |
OCLEnableReassociate |
Enable reassociation | Available |
OCLSIMD16SelectionMask |
Select SIMD 16 heuristics. Valid values are 0, 1, 2 and 3 | - |
OverrideDeviceIdForWA |
Enable this to override DeviceId | - |
OverrideProductFamilyForWA |
Enable this to override the product family, get the correct enum from igfxfmid.h | - |
OverrideRevIdForWA |
Enable this to override the stepping/RevId, default is a0 = 0, b0 = 1, c0 = 2, so on... | - |
RemoveLegacyOCLStatelessPrivateMemoryCases |
Remove cases where OCL uses stateless private memory. XeHP and above only! [OCL only] | Available |
SampleMultiversioning |
Create branches around samplers which can be redundant with some values | - |
SelectiveLoopUnrollForDPEmu |
Setting this to 0/false disable selective loop unrolling for DP emu. | Available |
SendMultipleSIMDModesCS |
Send multiple SIMD modes for CS | - |
SkipPsSimdWithDualSimd |
Setting it to values def in igc.h will force SIMD mode to skip if the dual-SIMD8 kernel exists | Available |
TestGEPSimplification |
[Test] Testing GEP simplification without actually lowering GEP. Used in lit test | - |
UniformMemOpt4OW |
increase uniform memory optimization from 2 owords to 4 owords | Available |
allowLICM |
Enable LICM in IGC. | Available |
allowDecompose2DBlockFuncs |
Enable decomposition of 2D block intrinsics in IGC. | Available |
allowImmOff2DBlockFuncs |
Allow compiler to decide to use immediate offsets in 2D block intrinsics in IGC. | Available |
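Several flags in the table above take a bitmask, for example EnableMeshShaderSimdSize (bit0 - simd8, bit1 - simd16, bit2 - simd32). The following is a minimal sketch of building such a mask with POSIX shell arithmetic. The variable names are hypothetical helpers, and passing the value in decimal is an assumption, since this document does not state which number formats the flag parser accepts.

```shell
# Bit positions as documented for EnableMeshShaderSimdSize.
SIMD8=$((1 << 0))
SIMD16=$((1 << 1))
SIMD32=$((1 << 2))

# Allow simd16 and simd32 only: 2 | 4 = 6 (0x6).
mask=$((SIMD16 | SIMD32))

# Assumption: the value is passed in decimal.
export IGC_EnableMeshShaderSimdSize="$mask"
echo "mask=$mask"
```

The flag is ignored if ForceMeshShaderSimdSize is set or if the mask produces an invalid configuration, so the mask only narrows the candidates.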
Flag | Description | Release builds |
---|---|---|
AddNoInlineToTrimmedFunctions |
Tell late passes not to inline trimmed functions | - |
AllocaRAPressureThreshold |
The threshold for the register pressure potential | - |
AllocateZeroInitializedVarsInBss |
Allocate zero initialized global variables in .bss section in ZEBinary | Available |
AllowNonLoopConstantPromotion |
Allows promotion for constants not in loop (e.g. used once) | - |
AllowStackCallRetry |
Enable/Disable retry when stack function spill. 0 - Don't allow, 1 - Allow retry on kernel group, 2 - Allow retry per function | - |
BlockFrequencySampling |
Use block frequencies to derive a distribution | Available |
ByPassAllocaSizeHeuristic |
Force some Alloca to pass the pressure heuristic until the given size | Available |
CodePatch |
Enable Pixel Shader code patching to directly emit code after stitching | - |
CodePatchExperiments |
Experiment with code patching when != 0 | - |
CodePatchFilter |
Filter out unsupported patterns | - |
CodePatchLimit |
Debug CodePatch by limiting the number of shaders being patched | - |
ConstantPromotionCmpSelSize |
Array size threshold for cmp-sel transform | - |
ConstantPromotionSize |
Threshold in number of GRFs | - |
ControlInlineImplicitArgs |
Avoid trimming functions with implicit args | Available |
ControlInlineTinySize |
Tiny function size for controlling kernel total size | Available |
ControlInlineTinySizeForSPGT |
Tiny function size for controlling kernel total size | Available |
ControlKernelTotalSize |
Control kernel total size | Available |
ControlUnitSize |
Control compilation unit size by unit trimming | Available |
DelayEmuInt64AddLimit |
Delay emulating Int64 Add operations in vISA | - |
DetectCastToGAS |
Check if the module contains local/private to GAS (Generic Address Space) cast; it also checks internal flags | Available |
DiableWaSamplerNoMask |
Disable WA DiableWaSamplerNoMask | - |
DisableAddingAlwaysAttribute |
Disable adding always attribute | Available |
DisableCSContentCheck |
Disable CS content check that can force SIMD32 | Available |
DisableDualBlendSource |
Force the compiler to never use dual blend source messages | - |
DisableFDIV |
Disable fdiv support | - |
DisableFastMathConstantHandling |
Disable Fast Math Constant Handling | Available |
DisableFastRAWA |
Disable Fast RA for hanging issues on large workloads | - |
DisableFastestGopt |
Disable global optimizations for stage 1 shaders. | - |
DisableFastestLinearScan |
Disable LinearScanRA in FastestSIMD. | - |
DisableUndefAlphaOutputAsRed |
Disable output red for undefined alpha output | - |
DisableWaDisableSIMD16On3SrcInstr |
Disable C0 WA WaDisableSIMD16On3SrcInstr, may be unsafe | - |
DisableWaSendSEnableIndirectMsgDesc |
Disable a C0 WA WaSendSEnableIndirectMsgDesc, may be unsafe | - |
DisbleLocalFences |
On CNL+ we need to emit local fences. Setting this to true removes those. It may not be functionally correct. | - |
DispatchAlongY_XY_ratio |
min threshold for thread group size x / y for dispatchAlongY | - |
DispatchAlongY_X_threshold |
min threshold for thread group size x for dispatchAlongY | - |
DispatchGPGPUWalkerAlongYFirst |
0 = No SW Y-walk, 1 = Dispatch GPGPU walker along Y first | - |
DownConvertI32Sampler |
Convert i32 sampler messages to return i16. This optimization can only be enabled for resources with 16bit integer format or if it is known that the upper 16bits of data is always 0. | - |
DumpRegPressureEstimate |
Dump RegPressureEstimate to a file | - |
DumpRegPressureEstimateFilter |
Only dump RegPressureEstimate for functions matching the given regex | - |
EmitPreDefinedForAllFunctions |
When enabled, pre-defined variables for gid, grid, lid are emitted for all functions. This causes those functions to be inlined even when stack calls is enabled. | Available |
EmulateFDIV |
Emulate fdiv instructions | - |
EmulationFunctionControl |
FunctionControl on some DP emulation functions. It has the same value as FunctionControl. | Available |
EnableA64WA |
Guarantee A64 load/store address-hi is uniform | Available |
EnableAccSub |
Enable accumulator substitution | - |
EnableByValStructArgPromotion |
If enabled, byval/sret struct arguments are promoted to pass-by-value if possible. | Available |
EnableConstantPromotion |
Enable global constant data to register promotion | - |
EnableDisableMidThreadPreemptionOpt |
Disable mid thread preemption | - |
EnableEvaluateSamplerSplit |
Split evaluate messages to sampler into either SIMD8 or SIMD1 messages | - |
EnableExtractMask |
When enabled, it is mostly for reducing response size of send messages. | - |
EnableFastestSingleCSSIMD |
Enable selecting single CS SIMD in staged compilation. | - |
EnableForceGroupSize |
Enable forcing thread Group Size ForceGroupSizeX and ForceGroupSizeY | - |
EnableForceThreadCombining |
Enable forcing Thread Combining with thread Group Size ForceGroupSizeX and ForceGroupSizeY | - |
EnableFunctionCloningControl |
If enabled, limits function cloning by converting stackcalls to indirect calls based on the FunctionCloningThreshold value. | Available |
EnableGPUFenceScopeOnSingleTileGPUs |
Allow the use of GPU fence scope on single-tile GPUs. By default the TILE scope is used instead of GPU scope on single-tile GPUs. | Available |
EnableGSURBEntryPadding |
Enable padding of GS URB Entry by adding extra portions of Control Data Header. | - |
EnableGSVtxCountMsgHalfCLSize |
Enable the Vertex Count msg of half CL size, instead of 1DW size. | - |
EnableGather4cpoWA |
Enable WA transforming gather4cpo/gather4po into gather4c/gather4 | - |
EnableGreedyTrimming |
Find the optimal set of functions to trim | Available |
EnableHalfPromotion |
Enable pass that replaces instructions using halfs with corresponding float counterparts for pre-SKL | - |
EnableInsertElementScalarCoalescing |
Enable coalescing on the scalar operand of insertelement | - |
EnableIntelFast |
Enable intel fast, experimental flag. | - |
EnableLTO |
Enable link time optimization | - |
EnableLTODebug |
Enable debug information for LTO | Available |
EnableLeafCollapsing |
Collapse leaf functions in order to avoid trimming small leaf functions | Available |
EnableLocalIdCalculationInShader |
Enables calculation of local thread IDs in shader. Valid only in compute shaders on XeHP+. IDs are calculated only if HW generated IDs cannot be used. | Available |
EnableMixIntOperands |
Enable generating mix-sized operands for int ALU | - |
EnableOptReportPrivateMemoryToSLM |
[POC] Generate opt report file for moving private memory allocations to SLM. | - |
EnablePreRAAccSchedAndSub |
Enable accumulator substitution | - |
EnablePrivMemNewSOATranspose |
0 : disable new algo; 1 and up : enable new algo. 1 : enable new algo just for array of struct; 2 : 1 plus new algo for array of dw[xn]/qw[xn], etc.; 3 : 2 plus new algo for array of complicated struct. | Available |
EnableProgrammableOffsetsMessageBitInHeader |
Use pre-delta feature (legacy) method of passing MSB of PO messages opcode. | - |
EnableReusingLSCStoreConstPayload |
Enable reusing LSC stores const payload | - |
EnableReusingXYZWStoreConstPayload |
Enable reusing XYZW stores const payload | - |
EnableSOAPromotionDisablingHeuristic |
Enable heuristic to disable SOA promotion when it may be not beneficial | - |
EnableSamplerSplit |
Split Sampler 3d message to odd and even | - |
EnableSizeContributionOptimization |
Put more weight on a function when the potential size contribution is big | Available |
EnableStackCallFuncCall |
If enabled, the default function call mode will be set to stack call. Otherwise, subroutine call is used. | - |
EnableTCSHWBarriers |
Enable TCS pass with HW barriers support. Default TCS pass is TCS pass with multiple continuation functions. | - |
EnableTEFactorsClear |
Enable clearing of tessellation factors. | - |
EnableTEFactorsPadding |
Enable padding of the TE factors. | - |
EnableThreadCombiningWithNoSLM |
Enable thread combining opt for shader without SLM | - |
EnableTrackPtr |
Track Staging Context alloc/dealloc | - |
EnableVariableAlias |
Enable variable aliases (part of VariableReuse Pass, but separate functionality) | - |
EnableVariableReuse |
Enable local variable reuse | - |
EnableVector8LoadStore |
Enable Vectorizer to generate 8x32i and 4x64i loads and stores | Available |
ExcludeIRFromZEBinary |
Exclude IR sections from ZE binary | Available |
ExpandedUnitSizeThreshold |
Trimming target of compilation unit size | Available |
ExtraRetrySIMD16 |
Enable extra simd16 with retry for STAGE1_BEST_PREF | - |
FastCompileRA |
Provide the fast compilation path for RA, fail safe at first iteration | - |
FastSpill |
fast spill code gen. This may produce worse-quality code for the spilling shader | - |
FastestS1Experiments |
Select configs for fastest compilation by bits. | - |
FirstStagedSIMD |
Force Pixel shader to be 1: FastSIMD (SIMD8), 2: BestSIMD (SIMD16 or SIMD8), 3: FastestSIMD (SIMD8 opt off) | - |
ForceAddingStackcallKernelPrerequisites |
Force adding static overhead for stackcall to the kernel entry such as HWTID instructions for experiments | Available |
ForceAllPrivateMemoryToSLM |
[POC] Force moving all private memory allocations to SLM. | - |
ForceBestSIMD |
Force pixel shader to return the best SIMD, either SIMD16 or SIMD8. | - |
ForceDisableSrc0Alpha |
Force the compiler to skip sending src0 alpha. Only works if we are sure alpha to coverage and alpha test is off | - |
ForceFastestSIMD |
Force PS, CS, VS to return lowest possible SIMD as fast as possible. | - |
ForceFastestSingleCSSIMD |
Force selecting single CS SIMD in staged compilation on unsupported platforms. | - |
ForceGroupSizeShaderHash |
Shader hash for forcing thread group size or thread combining (lower 8 hex digits) | - |
ForceGroupSizeX |
force group size along X | - |
ForceGroupSizeY |
force group size along Y | - |
ForceHalfPromotion |
Force enable pass that replaces instructions using halfs with corresponding float counterparts | - |
ForceInlineExternalFunctions |
Do not trim functions called from multiple kernels | Available |
ForceInlineStackCallWithImplArg |
If enabled, stack calls that use implicit args will be force inlined. | Available |
ForceLowestSIMDForStackCalls |
If enabled, compile to the lowest allowed SIMD mode when stack calls or indirect calls are present | Available |
ForceMCFBarriers |
Force TCS pass with MCF (SW) barriers support. Default TCS pass is TCS pass with multiple continuation functions. | - |
ForceMixMode |
force enable mix mode even on platforms that do not support it | - |
ForceNoFP64bRegioning |
force regioning rules for FP and 64b FPU instructions | - |
ForceNoInfiniteLoops |
Limit # of loop iterations to UINT_MAX in while/for loops. Can be used to detect infinite loops in shaders | - |
ForceNonCoherentStatelessBTI |
Enable generation of non cache coherent stateless messages | - |
ForcePixelShaderSIMDMode |
Setting it to values def in igc.h will force SIMD mode compilation for pixel shaders. Note that only SIMD8 is compiled unless other ForcePixelShaderSIMD* are also selected. 1-SIMD8, 2-SIMD16, 4-SIMD32 | - |
ForcePrivateMemoryToGlobalOnGeneric |
Force moving private memory allocations to global buffer when generic pointer is present | Available |
ForcePrivateMemoryToSLMOnBuffers |
[POC] Force moving private memory allocations to SLM, semicolon-separated list of buffers. | - |
ForceSWCoalescingOfAtomicCounter |
Force software coalescing of atomic counter | - |
ForceScratchSpaceSize |
Override Scratch Space Size in bytes for perf testing | - |
ForceSendsSupportOnSKLA0 |
Allow sends on SKL A0, may be unsafe | - |
FunctionCloningThreshold |
Limits the number of cloned functions when called from multiple function groups. If number of cloned functions exceeds the threshold, compile the function only once and use address relocation instead. Setting this to '0' allows IGC to choose the default threshold. | Available |
FunctionControl |
Control function inlining/subroutine/stackcall. See value defs in igc_flags.hpp. | Available |
FuseResourceLoop |
Enable fusing resource loops | - |
FuseTypedWrite |
Enable fusing of simd8 typed write | - |
HPCFastCompilation |
Force to do fast compilation for HPC kernel | - |
HPCGlobalInstNumThreshold |
The threshold for the register pressure potential | - |
HPCInstNumThreshold |
The threshold for the register pressure potential | - |
HasDoubleAcc |
has doubled accumulators | - |
HybridRAWithSpill |
Did Hybrid RA with Spill | - |
InlinedEmulationThreshold |
Inlined instruction threshold for enabling subroutines | - |
JointMatrixLoadStoreOpt |
Selects subgroup (0), or block read/write (1), or optimized block read/write (2), 2d block read/write (3) implementation of Joint Matrix Load/Store built-ins | Available |
KernelTotalSizeThreshold |
Trimming target of kernel total size | Available |
LTOForStage1Compilation |
LTO for stage 1 compilation | - |
LimitConstantBuffersPushed |
Limit max number of CBs pushed when SupportIndirectConstantBuffer is true | - |
MSAA16BitPayloadEnable |
Enable support for MSAA 16 bit payload, a hardware DCN supporting this from ICL+ to improve perf on MSAA workloads | - |
MemCpyLoweringUnrollThreshold |
Min number of mem instructions that require non-unrolled loop when lowering memcpy | - |
MemOptWindowSize |
Size of the window in unit of instructions in which load/stores are allowed to be coalesced. Keep it limited in order to avoid creating long liveranges. Default value is 150 | - |
MetricForKernelSizeReduction |
Set 1 to activate a normal distribution, 2 a long-tail distribution, and 4 an average% | Available |
MidThreadPreemptionDisableThreshold |
Threshold to disable mid thread preemption | - |
NewSOATransposeForOpenCL |
If true, EnablePrivMemNewSOATranspose only applies to OpenCL kernels. For testing purpose | Available |
NumGeneralAcc |
set the number [1-8] of general acc for accumulator substitution. 0 means using the platform-default value | - |
OCLInlineThreshold |
Setting OCL inline threshold | Available |
OverrideCsTileLayout |
Override compute walker tile layout. False is linear. True is TileY | Available |
OverrideCsTileLayoutEnable |
Enable overriding compute walker tile layout | Available |
OverrideCsWalkOrder |
Override compute walker walk order | Available |
OverrideCsWalkOrderEnable |
Enable overriding compute walker walk order | Available |
OverrideOCLMaxParamSize |
Override the value imposed on the kernel by CL_DEVICE_MAX_PARAMETER_SIZE. Value in bytes, if value==0 no override happens. | Available |
ParameterForColdFuncThreshold |
C/10-STD for a normal distribution / low K% for a long-tail distribution | Available |
PartitionUnit |
Partition compilation unit | Available |
PartitionWithFastHybridRA |
Enable FastRA and HybridRA when partition is enabled | Available |
PixelShaderDoNotAbortOnSpill |
Do not abort on a spill | - |
PrintControlKernelTotalSize |
Print Control kernel total size | Available |
PrintControlUnitSize |
Print information about unit trimming | Available |
PrintFunctionSizeAnalysis |
Print analysis data of function sizes | Available |
PrintPartitionUnit |
Print information about compilation unit partitioning | Available |
PrintStackCallDebugInfo |
Print all debug info to command line related to stack call debugging | Available |
PrintStaticProfileGuidedKernelSizeReduction |
Print information about static profile-guided trimming and partitioning | Available |
PrintStaticProfileGuidedSpillCostAnalysis |
Print debug messages for profile embedding | Available |
RegPressureVerbocity |
Different printing types | - |
RematAddrSpaceCastToUse |
Allow rematerialization of inttoptr that are used inside AddrSpaceCastInst | - |
RematAllowExtractElement |
Allow Extract Element to computation chain | - |
RematAllowLoads |
Remat allow to move loads, no checks, exclusively for testing purposes | - |
RematAllowOneUseLoad |
Remat allow to move loads that have one use and it's inside the chain | - |
RematCallsOperand |
Allow rematerialization of inttoptr that are used as call's operand | - |
RematChainLimit |
If number of instructions we've collected is more than this value, we bail on it | - |
RematEnable |
Enable clone address arithmetic pass not only on retry | - |
RematFlowThreshold |
Proportion of the whole rematerialization targets to cutoff remat chain | - |
RematInstCombineBefore |
Enable short sequence of passes before clone address arithmetic pass to potentially decrease amount of operations that will be rematerialized | - |
RematLog |
Dump Remat Log, useful for analyzing spills as well | - |
RematRPELimit |
Cutoff value for register estimator, lower than that, kernel won't be rematted | - |
RematReassocBefore |
Enable short sequence of passes before clone address arithmetic pass to potentially decrease amount of operations that will be rematerialized | - |
RematRespectUniformity |
Cutoff computation chain on uniform values | - |
RematSameBBScope |
Confine rematerialization only to variables within the same BB, we won't pull down values from predecessors | - |
RequestStage2 |
Enable staged compilation via requesting stage 2 | - |
RetryRevertExcessiveSpillingKernelCoefficient |
Sets the coefficient for Retry Manager to know whether we should revert back to a previously compiled kernel | - |
RetryRevertExcessiveSpillingKernelThreshold |
Sets the threshold for Retry Manager to know which kernel is considered as Excessive Spilling and applies different set of rules | - |
SSOShifter |
Adjust ScratchSurfaceOffset with shl(hwtid, shifter). 0 means disabling padding | - |
SaveRestoreIR |
Save/Restore IR for staged compilation to avoid duplicated compilations | - |
SelectiveFastRA |
Apply fast RA with spills selectively using heuristics | Available |
SelectiveFunctionControl |
Selectively enables FunctionControl for a list of line-separated function names in 'FunctionDebug.txt' in the IGC output dir. When set by this flag, the functions in the FunctionDebug list will override the default FunctionControl mode. 0 - Disable, 1 - Enable and read from FunctionDebug.txt, 2 - Print all callable functions to FunctionDebug.txt. See comments in ProcessFuncAttributes.cpp for how to use this flag. | Available |
SelectiveTrimming |
Choose a specific function to trim | Available |
SkipPaddingScratchSpaceSize |
Skip adding padding when estimated scratch space size is smaller than or equal to this value | - |
SkipTREarlyExitCheck |
Skip SIMD16 early exit check in ShaderCodeGen | - |
SkipTrimmingOneCopyFunction |
Don't trim a function whose size contribution is no more than its size | Available |
StagedCompilationExperiments |
Experiment with staged compilation when != 0 | - |
StaticProfileGuidedPartitioning |
Enable static analysis in the partitioning algorithm. | Available |
StaticProfileGuidedSpillCostAnalysis |
Use static profile information to estimate spill cost, 1 for profile generation, 2 for profile transfer, 4 for profile embedding, 8 for spill computation, and 16 for enabling frequency-based spill selection | Available |
StaticProfileGuidedSpillCostAnalysisFunc |
Spill cost function where 0 is based on a new spill cost and 1 the existing one | Available |
StaticProfileGuidedSpillCostAnalysisScale |
Scale adjustment for static profile guided spill cost analysis | Available |
StaticProfileGuidedTrimming |
Enable static analysis in the kernel trimming | Available |
StripDebugInfo |
Strip debug info from llvm IR lowered from input to IGC. Possible values: 0 - don't strip, 1 - strip all, 2 - strip non-line info | - |
SubroutineInlinerThreshold |
Subroutine inliner threshold | - |
SubroutineThreshold |
Minimal kernel size to enable subroutines | - |
UnitSizeThreshold |
Compilation unit size threshold | Available |
UpConvertF16Sampler |
up-convert fp16 sampler message to return fp32 | - |
UseFrequencyInfoForSPGT |
Consider frequency information for trimming functions | Available |
UseOldSubRoutineAugIntf |
Use the old subroutine augmentation code which is slower | - |
VFPackingDisablePartialElements |
disable packing for partial vertex element as it causes performance drops | - |
VariableReuseByteSize |
The byte size threshold for variable reuse | - |
VectorAlias |
Vector aliasing control under EnableVariableAlias. Some features are still experimental | Available |
VectorAliasBBThreshold |
Max number of BBs of a function that VectorAlias will apply. VectorAlias will skip for functions beyond this threshold | Available |
ScalarAliasBBSizeThreshold |
Max size of BB for which scalar aliasing will apply. Scalar aliasing will skip for BBs beyond this threshold | Available |
cl_khr_srgb_image_writes |
Enable cl_khr_srgb_image_writes extension | - |
disableRemat |
disable re-materialization | - |
disableUnormTypedReadWA |
disable software conversion for UNORM surface in Dx10 | - |
disableVarSplit |
disable variable splitting | - |
forceGlobalRA |
force global register allocator | - |
forceSamplerHeader |
force sampler messages to use header | - |
samplerHeaderWA |
enable sampler header to solve HW WA | - |
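
Several flags in the table above, such as StaticProfileGuidedSpillCostAnalysis, list values that are powers of two (1, 2, 4, 8, 16). The sketch below assumes, without confirmation from igc_flags.h, that such values are bit flags that can be combined with a bitwise OR. The shell variable names are illustrative only and are not part of IGC.

```shell
# Assumption: the listed values are bit flags that may be OR-ed together.
# These helper names are hypothetical and exist only in this example.
PROFILE_GENERATION=1
SPILL_COMPUTATION=8

# Combine the two stages with shell arithmetic and export the result
# using the IGC_<flag>=<value> syntax described at the top of this file.
export IGC_StaticProfileGuidedSpillCostAnalysis=$(( PROFILE_GENERATION | SPILL_COMPUTATION ))

echo "IGC_StaticProfileGuidedSpillCostAnalysis=${IGC_StaticProfileGuidedSpillCostAnalysis}"
```

If the flag turns out to accept only a single value at a time, set one of the listed numbers directly instead of a combination.
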
Flag | Description | Release builds |
---|---|---|
ApplyConservativeRastWAHeader |
Apply WaConservativeRasterization for the platforms enabled | - |
Flag | Description | Release builds |
---|---|---|
ContinuationInlineThreshold |
If number of continuations is greater than threshold, default to indirect | Available |
DeferCollectionStateObjectCompilation |
Wait to compile till the RTPSO stage | Available |
DisableCanonizationWA |
WA for A0 to inject shifts to canonize global and local pointers | Available |
DisableCompactifySpills |
Just emit spill/fill at the point of def/use | Available |
DisableCrossFillRemat |
Rematerialize values if they use already spilled values | Available |
DisableDPSE |
Disable Dead PayloadStore Elimination. | Available |
DisableEarlyRemat |
Disable quick remats to avoid some spills | Available |
DisableEntryFences |
Don't emit the evict and invalidate fences for A0 WA | - |
DisableExamineRayFlag |
Don't do IPO to see if we can fold control flow given knowledge of possible rayflag values | - |
DisableFuseContinuations |
If set, we will look for small duplicated continuations to merge into one. | Available |
DisableInvalidateRTStackAfterLastRead |
Disables L1 cache invalidation after the last read of the RT stack. Affects rayqueries only | Available |
DisableInvariantLoad |
Disabled !invariant_load metadata for raytracing shaders | Available |
DisableLSCControlsForRayTracing |
Disable different LSC Controls for HW and SW portions of the RTStack | Available |
DisableLateRemat |
Disable quick remats to avoid some spills | Available |
DisableMatchRegisterRegion |
Disable matching for debug purposes | Available |
DisablePayloadSinking |
sink stores to payload into inlined continuations | Available |
DisablePreSplitOpts |
Disable last minute optimizations before shader splitting | Available |
DisablePredicatedStackIDRelease |
Emit a single stack ID release at the end of the shader | Available |
DisablePrepareLoadsStores |
Disable preparation for MemOpt | Available |
DisableProceedBasedApproachForRayQueryDynamicRayManagementMechanism |
Disables proceed based approach for dynamic ray management mechanism | Available |
DisablePromoteContinuation |
BTD-able continuations in the raygen may be moved to the shader identifier | - |
DisablePromoteToScratch |
Use scratch space rather than SWStack when possible. | Available |
DisableRTAliasAnalysis |
Disable Raytracing Alias Analysis | - |
DisableRTBindlessAccess |
do bindful rather than bindless accesses to raytracing memory | Available |
DisableRTFenceElision |
Disable optimization to remove unneeded fences | - |
DisableRTGlobalsKnownValues |
load MaxBVHLevels from RTGlobals rather than assuming = 2 | Available |
DisableRTMemDSE |
Analyze stores to SWStack, etc. that aren't read before Stack ID Release | - |
DisableRTRetryPickBetter |
Disables raytracing retry to pick the best compilation instead of always using the retry compilation. | - |
DisableRTStackOpts |
Disable some optimizations that minimize reads/writes to the RTStack | Available |
DisableRayQueryDynamicRayManagementMechanism |
Dynamic ray management mechanism for Synchronous Ray Tracing | Available |
DisableRayQueryDynamicRayManagementMechanismForBarriers |
Disable dynamic ray management mechanism for shaders with barriers | Available |
DisableRayQueryDynamicRayManagementMechanismForExternalFunctionsCalls |
Disable dynamic ray management mechanism for shaders with external functions calls | Available |
DisableRayTracingConstantCoalescing |
Disable coalescing | Available |
DisableRayTracingOptimizations |
Disable RayTracing Optimizations for debugging | Available |
DisableRaytracingIntrinsicAttributes |
Turn off noalias and dereferenceable attributes | Available |
DisableSWStackOffsetElision |
Avoid loading the offset when it is known at compile-time | - |
DisableShaderFusion |
Don't check for duplicate, renamed shaders | - |
DisableSpillReorder |
Disables reordering of spills to try to minimize spills in a loop | - |
DisableStatefulRTStackAccess |
do stateless rather than stateful accesses to the HW portion of the async stack | Available |
DisableStatefulRTSyncStackAccess |
do stateless rather than stateful accesses to the HW portion of the sync stack | Available |
DisableStatefulRTSyncStackAccess4RTShader |
do stateless rather than stateful accesses to the HW portion of the sync stack. RT Shader only. | Available |
DisableStatefulRTSyncStackAccess4nonRTShader |
do stateless rather than stateful accesses to the HW portion of the sync stack. nonRT Shader only. | Available |
DisableStatefulSWHotZoneAccess |
do stateless rather than stateful accesses to the SW HotZone | Available |
DisableStatefulSWStackAccess |
do stateless rather than stateful accesses to the SW Stack | Available |
DisableWideTraceRay |
Disable SIMD16 style message payloads for send.rta | Available |
EnableCompressedRayIndices |
Use an alternate form with bit twiddling to pack stack pointer and indices into two DWORDs | Available |
EnableFillScheduling |
Schedule fills for reduced register pressure | - |
EnableHoistRemat |
Hoist rematerialized instructions to shader entry. Longer live ranges but common values fused. | Available |
EnableIndirectContinuations |
Enable BTD for continuation shaders (regardless of inline threshold). | Available |
EnableInlinedContinuations |
Forcibly inline all continuations | Available |
EnableKnownBTIBase |
For testing, assume that we know what baseBTI is in RTGlobals | Available |
EnableLSCCacheOptimization |
Optimize store instructions for utilizing the LSC-L1 cache | - |
EnableOuterLoopHoistingForRayQueryDynamicRayManagementMechanism |
Disable dynamic ray management mechanism for shaders with barriers | Available |
EnableRQHideLatency |
Hide RayQuery Proceed latency. | - |
EnableRTDispatchAlongY |
Dispatch Compute Walker along Y first | Available |
EnableRTPrintf |
Enable printf for ray tracing. | Available |
EnableRayTracingTGMFence |
Enable tgm fence in RT workloads for debugging | - |
EnableSingleRQMemRayStore |
Store RayQuery MemRay[TOP] only once. | - |
EnableStackIDReleaseScheduling |
Schedule Stack ID Release messages prior to the end of the shader | - |
EnableSyncDispatchRays |
Enable sync DispatchRays implementation | - |
ForceCSLeastSIMD4RQ |
Force compute shader with RayQuery to the lowest allowed SIMD mode | - |
ForceCSSimdSize4RQ |
Force RayQuery compute shader simd size. Valid values are 0 (not set), 8, 16 and 32; ignored if it produces an invalid configuration, e.g. simd size too small for workgroup size | Available |
ForceFirstFencesEvict |
Force evict fence op on fences prior to the stack ID release | Available |
ForceGenMemDefaultCacheCtrl |
If enabled, no message specific cache ctrls are set on memory outside of RTStack, SWStack, and SWHotZone | Available |
ForceGenMemLoadCacheCtrl |
Enables GenMemLoadCacheCtrl regkey for custom lsc load cache controls in other memory | Available |
ForceGenMemStoreCacheCtrl |
Enables GenMemStoreCacheCtrl regkey for custom lsc store cache controls in other memory | Available |
ForceIndirectCallsInSyncDispatchRays |
Will skip direct calls in synchronous raytracing and immediately call raytracing shaders via KSP shader ptr | - |
ForceInliningTraceRayCallsInSyncDispatchRays |
Will inline calls to __TraceRay, __Invoke and __TraceRaySyncToAsyncAdapter even when indirect calls are not necessary | - |
ForceNullBVH |
Swap BVH with null pointer. Infinitely fast ray traversal. | Available |
ForceRTCheckInstanceLeafPtr |
Check MemHit::valid before loading GeometryIndex, PrimitiveIndex, etc. | Available |
ForceRTCheckInstanceLeafPtrMask |
Test only. 1: committedindex; 2: potentialindex | Available |
ForceRTConstantBufferCacheCtrl |
Enables RTConstantBufferCacheCtrl regkey for custom lsc load cache controls for constant buffers | Available |
ForceRTRetry |
Raytracing is compiled in the second retry state | - |
ForceRTShortCircuitingOR |
Only for specific tests. Short-circuit the OR condition if CommittedGeometryIndex is used | Available |
ForceRTStackLoadCacheCtrl |
Enables RTStackLoadCacheCtrl regkey for custom lsc load cache controls in the RTStack | Available |
ForceRTStackStoreCacheCtrl |
Enables RTStackStoreCacheCtrl regkey for custom lsc store cache controls in the RTStack | Available |
ForceSWHotZoneLoadCacheCtrl |
Enables SWHotZoneLoadCacheCtrl regkey for custom lsc load cache controls in the SWHotZone | Available |
ForceSWHotZoneStoreCacheCtrl |
Enables SWHotZoneStoreCacheCtrl regkey for custom lsc store cache controls in the SWHotZone | Available |
ForceSWStackLoadCacheCtrl |
Enables SWStackLoadCacheCtrl regkey for custom lsc load cache controls in the SWStack | Available |
ForceSWStackStoreCacheCtrl |
Enables SWStackStoreCacheCtrl regkey for custom lsc store cache controls in the SWStack | Available |
ForceWholeProgramCompile |
Compile as if we know all of the shaders upfront | Available |
KnownBTIBaseValue |
If EnableKnownBTIBase is set, use this value for baseBTI | Available |
OverrideTMax |
Force TMax to the given value. When 0, do nothing. | - |
PrintfBufferSize |
Set printf buffer size. Unit: KB. | Available |
RTFenceToggle |
Toggle fences | Available |
RTInValidDefaultIndex |
If MemHit::valid is false, the default value to return for some intrinsics like GeometryIndex or PrimitiveIndex etc. | Available |
RayTracingConstantCoalescingMinBlockSize |
Set the minimum load size in # OWords = [1,2,4,8,16]. | Available |
RayTracingCustomTileXDim1D |
X dimension of tile (default: 256) | Available |
RayTracingCustomTileXDim2D |
X dimension of tile (default: 32) | Available |
RayTracingCustomTileYDim1D |
Y dimension of tile (default: 1) | Available |
RayTracingCustomTileYDim2D |
Y dimension of tile (default: 4 for XE, 32 for XE2+) | Available |
RayTracingDumpYaml |
Dump yaml input/output files | Available |
RayTracingKeepUDivRemWA |
Workaround till jitIsa supports cr0 for rtz conversions | Available |
RematThreshold |
Tunes how aggressively we should remat values into continuations | Available |
RetryRTPickBetterThreshold |
Only pick the retry shader if the spill cost of the 2nd compilation is at least this percentage better than the previous compilation | - |
RetryRTSpillCostThreshold |
Only retry if the percentage of spills (over total instructions) is more than this value | - |
RetryRTSpillMemThreshold |
Only retry if spill mem used is more than this value | - |
ShaderFusionThrehold |
If there are fewer shaders than this, don't spend time checking duplicates | - |
TotalGRFNum4RQ |
Total GRF used for register allocation for RayQuery only. Test only. Delete later. | - |
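
Flags marked Available can also be set for a single process instead of being exported for the whole shell session. The following is a minimal sketch using two release-build flags from the table above. The value 1024 is illustrative rather than a recommendation, and `env` stands in for the application that actually invokes IGC.

```shell
# Set the flags for one command only by prefixing the assignments.
# `env` is a stand-in for the real application; it just prints its environment.
# 1024 (KB) is an illustrative buffer size, not a recommended setting.
rt_env=$(IGC_EnableRTPrintf=1 IGC_PrintfBufferSize=1024 env | grep '^IGC_' | sort)
echo "$rt_env"

# The prefixed assignments do not persist in the calling shell.
echo "after: ${IGC_EnableRTPrintf:-unset}"
```

Scoping flags this way keeps experimental settings from leaking into later compilations started from the same shell.
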