[CUDA][HIP] Use device to get native context #425

hdelan · 2023-12-07T15:30:29Z

Since oneapi-src/unified-runtime#999 it is no longer valid to get the native context from the SYCL context on a multi-GPU system. The get_native function for contexts has been deprecated for this reason. See intel/llvm#10975

Similar ticket: oneapi-src/oneDNN#1765

jinz2014 · 2023-12-11T15:22:59Z

Reading your changes, I have a question.

For example,

auto cudaDevice = sycl::get_native<sycl::backend::ext_oneapi_cuda>(queue.get_device());

Is the type of cudaDevice "CUdevice" ?

hdelan · 2023-12-11T15:24:39Z

Hi @jinz2014 yes you are correct!

src/dft/backends/cufft/commit.cpp

FMarno · 2023-12-11T16:23:48Z

cufft_run.txt
All the DFT changes look good to me and I've run the DFT tests successfully.
I'd like to see test logs for the other backends before I approve.

hdelan · 2023-12-22T17:05:31Z

AMD tests for lapack and blas all passing:
test_amd.txt

8 LAPACK NVIDIA tests fail on the GTX 1050, but these tests also fail on the develop branch:
test_cuda_lapack.txt

Nvidia blas tests passing
test_cuda_blas.txt

muhammad-tanvir-1211 · 2024-01-12T11:25:27Z

I see all the buffer tests failing for the rocblas backend with PI_ERROR_INVALID_OPERATION.

Logs:
PR_425.txt

The failures are not because of the changes in this PR, but rather a recent change in the compiler. All these tests are expected to pass once oneapi-src/unified-runtime#1226 and intel/llvm#12297 are merged.

Rbiessy

Would you be able to attach test logs again? Also does this change compile with the 2024.0 icpx release?

Rbiessy · 2024-03-25T15:16:45Z

src/blas/backends/cublas/cublas_scope_handle.cpp

+    // Getting the primary context also sets it as the active context
+    CUDA_ERROR_FUNC(cuDevicePrimaryCtxRetain, err, &desired, cudaDevice);


Should we expect a performance cost from this change? From what I understand, cuCtxSetCurrent was previously expected to be called only once, assuming the active context was not changed outside of oneMKL.
This constructor is called once before each call to a BLAS function, so I am wary that the cost may add up.

The cost of cuDevicePrimaryCtxRetain is minimal when the primary context has already been initialized, which should be the case here. A simple benchmark like this:

for (int i = 0; i < NUM_ITERATIONS; i++) {
    CHECK(cuDevicePrimaryCtxRetain(&context, device));
    CHECK(cuDevicePrimaryCtxRelease(device));
}

gives about 32 ns per iteration, so calls to these functions are almost free.
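For readers who want to reproduce a per-iteration figure like the one above, here is a minimal sketch of a timing harness. It is not the benchmark that was actually run: `StubPrimaryContext` is a hypothetical stand-in for the cuDevicePrimaryCtxRetain / cuDevicePrimaryCtxRelease pair so that the block builds without the CUDA toolkit, and `mean_ns_per_iteration` is an illustrative helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the cuDevicePrimaryCtxRetain /
// cuDevicePrimaryCtxRelease pair; it only counts calls.
struct StubPrimaryContext {
    std::uint64_t retain_calls = 0;
    std::uint64_t release_calls = 0;
    void retain() { ++retain_calls; }
    void release() { ++release_calls; }
};

// Runs `body` `iterations` times and returns the mean wall-clock
// cost of one iteration in nanoseconds.
template <typename Body>
double mean_ns_per_iteration(std::uint64_t iterations, Body&& body) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

To measure the real calls, the lambda passed to `mean_ns_per_iteration` would wrap the two driver API calls instead of the stub. The iteration count should be large enough to amortize the clock reads.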

Using setup:

$ nvidia-smi
Mon Mar 25 16:27:32 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti     On  | 00000000:01:00.0 Off |                  N/A |
| 31%   24C    P8              N/A /  75W |     14MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

hjabird

Generally LGTM. I've tested the rocFFT and cuFFT backends with DPC++ 2024.0's icpx.
It would be good to see logs again after the rebase as Rbiessy suggests.

SYCL contexts have a many-to-one mapping to native contexts. Therefore it is necessary to get the desired native context from a SYCL device, as SYCL devices have a one-to-one mapping to native contexts.
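The mapping described here can be illustrated with a toy model. This is only a sketch under stated assumptions, not the oneMKL or CUDA implementation: plain integers stand in for CUdevice and CUcontext, and `PrimaryContextRegistry` is a hypothetical class that mimics the reference-counted idea behind cuDevicePrimaryCtxRetain and cuDevicePrimaryCtxRelease.

```cpp
#include <cassert>
#include <map>

// Toy model only: integer ids stand in for CUdevice and CUcontext.
using DeviceId = int;
using NativeContextId = int;

class PrimaryContextRegistry {
public:
    // While a device's context stays retained, every retain on that
    // device yields the same native context and bumps a reference count.
    NativeContextId retain(DeviceId device) {
        Entry& entry = entries_[device];
        if (entry.ref_count == 0) {
            entry.context = next_context_id_++;  // first retain creates it
        }
        ++entry.ref_count;
        return entry.context;
    }

    // Drops one reference; the device must currently be retained.
    void release(DeviceId device) {
        auto it = entries_.find(device);
        assert(it != entries_.end() && it->second.ref_count > 0);
        --it->second.ref_count;
    }

    // Number of outstanding retains for the device (0 if never retained).
    int ref_count(DeviceId device) const {
        auto it = entries_.find(device);
        return it == entries_.end() ? 0 : it->second.ref_count;
    }

private:
    struct Entry {
        NativeContextId context = -1;
        int ref_count = 0;
    };
    std::map<DeviceId, Entry> entries_;
    NativeContextId next_context_id_ = 0;
};
```

In this model, two SYCL contexts that contain the same device would both resolve to the same native context by going through the device, whereas starting from the SYCL context alone gives no way to tell which device's native context is wanted on a multi-GPU system.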

hdelan · 2024-03-26T14:17:53Z

Some test results:

CUDA

gtx1050.txt
Some failures due to precision also present on develop branch.

HIP

gfx90a_oneMKL_test.txt
Test failures in HIP are also present on the develop branch:
gfx90a_oneMKL_test_develop_branch.txt

I am not sure how to build/run the FFT tests. Are there some build/test instructions that I can follow?

hdelan · 2024-03-26T15:19:41Z

~~In terms of building with icpx 2024.0.0 for CUDA. I am getting a segfault at linking with develop branch.~~

Fixed. LD_LIBRARY_PATH problems -_-

I can successfully build this branch with icpx 2024.0.2 for CUDA

Rbiessy · 2024-03-27T08:58:21Z

Thanks a lot @hdelan ! The instructions are here but need to be improved.

The short answer is that you should just need to add -DENABLE_CUFFT_BACKEND=True -DENABLE_ROCFFT_BACKEND=True to also test the DFT domain with the native CUDA and HIP backends.
If you are explicitly setting -DTARGET_DOMAINS in your CMake command you will also need to append dft to the list, otherwise it will be enabled by default.
If you don't want to build and test the other domains again you can use -DTARGET_DOMAINS=dft.

hdelan · 2024-03-27T10:45:32Z

Thanks @Rbiessy !

Building rocFFT is broken for me, but this PR does not touch that code. Building with cuFFT is OK. Here are updated test results for all of oneMKL on CUDA, including cuBLAS, cuFFT, cuRAND, and cuSOLVER:

gtx1050.txt

ericlars · 2024-03-28T16:00:20Z

Thanks! LGTM

Rbiessy · 2024-03-28T16:50:15Z

Thanks for the review. Let me know @lhuot or @mmeterel if you need more time, otherwise I will go ahead and merge this on Monday.

hdelan force-pushed the get-context-from-device branch from 8243065 to 9669252 Compare December 7, 2023 15:32

FMarno reviewed Dec 11, 2023

hdelan force-pushed the get-context-from-device branch from 396cd30 to 0ea3b84 Compare December 22, 2023 16:44

muhammad-tanvir-1211 mentioned this pull request Jan 12, 2024

[UR] Add extra param to urMemGetNativeHandle oneapi-src/unified-runtime#1226

Merged

muhammad-tanvir-1211 mentioned this pull request Jan 12, 2024

[SYCL][HIP][CUDA] Use new version of piMemGetNativeHandle and add test intel/llvm#12297

Merged

hdelan requested a review from FMarno March 25, 2024 14:39

hdelan force-pushed the get-context-from-device branch from c8a758d to 878b981 Compare March 25, 2024 14:48

Rbiessy requested review from lhuot, mmeterel and ericlars March 25, 2024 14:54

Rbiessy self-assigned this Mar 25, 2024

Rbiessy requested a review from hjabird March 25, 2024 14:56

Rbiessy reviewed Mar 25, 2024

hjabird reviewed Mar 25, 2024

Use device to get native context

8d14c8a

SYCL contexts have a many to one mapping to native contexts. Therefore it is necessary to get the desired native context from a SYCL device, as SYCL devices have a one to one mapping to native contexts.

hdelan force-pushed the get-context-from-device branch from 878b981 to 8d14c8a Compare March 25, 2024 16:22

Rbiessy approved these changes Mar 27, 2024

hjabird approved these changes Mar 28, 2024

ericlars approved these changes Mar 28, 2024

Rbiessy merged commit 4635cad into oneapi-src:develop Apr 1, 2024

normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024

[CUDA][HIP] Use device to get native context (oneapi-src#425)

2dab082
