Suggestion Description
As new instructions/features are added with each new arch, it is useful to know the target architecture at compile time to employ separate code paths. For example: FP64 MFMA was added in CDNA2, so CDNA2 and later can use one code path while CDNA1 uses a different code path.
It gets tedious because all the archs need to be enumerated, and code needs to be updated as new archs become available:
#if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
// Code path for CDNA3
#elif defined(__gfx90a__)
// Code path for CDNA2
#elif defined(__gfx908__)
// Code path for CDNA1
#endif
It would be nice if we had something like:
#if CDNA_VERSION >= 3
// Code path for CDNA3 and later
#elif CDNA_VERSION >= 2
// Code path for CDNA2
#else
// Code path for CDNA1
#endif
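Until something like this exists in the toolchain, a user-side shim can approximate it. The following is only a sketch: `CDNA_VERSION` and `selected_code_path()` are hypothetical names, and the mapping assumes the `__gfx908__`, `__gfx90a__`, and `__gfx940__`/`__gfx941__`/`__gfx942__` macros listed above are the ones the compiler defines for CDNA1, CDNA2, and CDNA3 targets. It still has to be updated for each new arch, but only in one place.

```cpp
// Hypothetical helper header: collapses the per-target __gfxNNN__ macros
// into one ordered value. CDNA_VERSION is NOT an existing compiler macro.
#if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
#define CDNA_VERSION 3
#elif defined(__gfx90a__)
#define CDNA_VERSION 2
#elif defined(__gfx908__)
#define CDNA_VERSION 1
#else
#define CDNA_VERSION 0  // host compilation pass or a non-CDNA target
#endif

// Illustrative function (hypothetical): reports which code path the
// preprocessor selected, using ordered comparisons instead of enumeration.
constexpr int selected_code_path()
{
#if CDNA_VERSION >= 3
    return 3;  // CDNA3 and later
#elif CDNA_VERSION >= 2
    return 2;  // CDNA2
#elif CDNA_VERSION == 1
    return 1;  // CDNA1
#else
    return 0;  // fallback: no CDNA-specific instructions assumed
#endif
}
```

The explicit `0` fallback matters because the same source is also compiled for the host, where none of the `__gfx*__` macros are defined.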
This would mirror the way it is done in CUDA:
__device__ void func()
{
#if __CUDA_ARCH__ >= 800
// Code path for compute capability 8.x and later
#elif __CUDA_ARCH__ >= 700
// Code path for compute capability 7.x
#else
// Code path for compute capability < 7.0
#endif
}
Operating System
No response
GPU
No response
ROCm Component
No response