Currently `GPU_CFLAGS` (containing `--gpu-code` and `--gpu-architecture`) is passed to `nvcc` during compilation, but not when producing the shared object (`.so`) library. As a result, the link step uses `nvcc`'s defaults, which may or may not match what the object (`.o`) files contain, depending on the CUDA version used. A mismatch generates warnings.

A safe way to avoid the mismatch is to always link with the same `GPU_CFLAGS` as used during compilation.

Example warnings on CUDA-11.2 when `GPU_ARCHS:=sm_61` and `GPU_PTX_ARCH:=compute_61`:

This PR adds `$(GPU_CFLAGS)` to the `nvcc` parameters used in the linking phase.
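
For illustration, the change can be sketched as the Makefile fragment below. This is a minimal sketch, not the project's actual build rules: only `GPU_ARCHS`, `GPU_PTX_ARCH`, and `GPU_CFLAGS` come from this PR, while the way `GPU_CFLAGS` is assembled, the `OBJS` variable, and the `libexample.so` target are assumptions made up for the example.

```make
# Values taken from the example in this PR.
GPU_ARCHS    := sm_61
GPU_PTX_ARCH := compute_61

# Assumption: one plausible way to build GPU_CFLAGS from the two variables above.
GPU_CFLAGS   := --gpu-architecture=$(GPU_PTX_ARCH) --gpu-code=$(GPU_ARCHS),$(GPU_PTX_ARCH)

# Hypothetical object list and library name.
OBJS := kernel.o

# Compilation already receives GPU_CFLAGS.
%.o: %.cu
	nvcc $(GPU_CFLAGS) -Xcompiler -fPIC -c $< -o $@

# Linking: passing the same GPU_CFLAGS keeps nvcc from falling back to its
# default architecture, which can differ between CUDA versions.
libexample.so: $(OBJS)
	nvcc $(GPU_CFLAGS) -shared -o $@ $^
```

Reusing one variable for both steps means the architecture settings cannot drift apart between compilation and linking.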