[CUDA] Dynamically load the CUPTI library when tracing #1070

pasaulais · 2023-11-13T15:01:49Z

With these changes, libcupti.so is loaded dynamically when CUDA tracing is enabled. This allows builds with XPTI tracing enabled to run on systems where libcupti.so is not installed or cannot be located.
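The following is a minimal Linux sketch of the idea. It assumes POSIX `dlopen`/`dlsym`; the PR later switched to `ur_loader::LibLoader` for portability. CUPTI is opened only when tracing is requested, and a missing library leaves tracing disabled instead of preventing the adapter from loading. `CuptiLibrary`, `tryLoadCupti` and `unloadCupti` are hypothetical names, and `cuptiGetVersion` serves only as an example symbol to resolve.

```cpp
#include <dlfcn.h>

// Hypothetical holder for a dynamically loaded CUPTI library.
struct CuptiLibrary {
  void *Handle = nullptr;       // dlopen handle, or nullptr if not loaded
  void *GetVersionFn = nullptr; // address of cuptiGetVersion, if resolved
};

// Tries to load CUPTI from Path. On any failure it returns false and leaves
// Lib empty, so the caller simply runs without tracing instead of failing.
// Assumes Lib does not already hold an open handle.
bool tryLoadCupti(const char *Path, CuptiLibrary &Lib) {
  Lib = CuptiLibrary{};
  Lib.Handle = dlopen(Path, RTLD_NOW | RTLD_LOCAL);
  if (!Lib.Handle)
    return false; // library missing: tracing stays disabled
  Lib.GetVersionFn = dlsym(Lib.Handle, "cuptiGetVersion");
  if (!Lib.GetVersionFn) {
    dlclose(Lib.Handle); // unexpected library: give it back
    Lib = CuptiLibrary{};
    return false;
  }
  return true;
}

// Closes the library if it was opened and resets the holder.
void unloadCupti(CuptiLibrary &Lib) {
  if (Lib.Handle)
    dlclose(Lib.Handle);
  Lib = CuptiLibrary{};
}
```

On glibc versions older than 2.34, linking with `-ldl` may be required.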

The enableCUDATracing and disableCUDATracing functions have been changed to take a context pointer rather than using global variables for the tracing state.

There is a temporary enableCUDATracing variant with no parameter for compatibility until the relevant changes have been merged to https://github.com/intel/llvm.
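Here is a sketch of the context-pointer style, including a temporary parameterless overload that forwards to a single process-wide context. The function names `enableCUDATracing` and `disableCUDATracing` come from this PR. `TracingContext`, `defaultTracingContext` and the exact signatures are hypothetical and only illustrate the pattern. They are not the adapter's actual declarations.

```cpp
// Hypothetical per-context tracing state that replaces global variables.
struct TracingContext {
  bool Enabled = false;
};

// New-style API: all state lives in the context supplied by the caller.
void enableCUDATracing(TracingContext *Ctx) {
  if (Ctx)
    Ctx->Enabled = true;
}

void disableCUDATracing(TracingContext *Ctx) {
  if (Ctx)
    Ctx->Enabled = false;
}

// Hypothetical single context used only by the compatibility overload.
TracingContext *defaultTracingContext() {
  static TracingContext Ctx;
  return &Ctx;
}

// Temporary parameterless variant kept so existing callers still compile.
void enableCUDATracing() { enableCUDATracing(defaultTracingContext()); }
```

With caller-owned contexts, separate users can enable and disable tracing independently. The parameterless overload exists only until every caller passes a context.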

pbalcer

Does this work (or do anything) when built outside the SYCL tree? As far as I can tell, we don't set XPTI_ENABLE_INSTRUMENTATION anywhere, and the CUDA adapter never links with XPTI.

It might be useful to set up a CI job that builds the CUDA adapter with tracing enabled.

…or XPTI tracing (#11866)

This is a prerequisite for implementing dynamic loading of the CUPTI library when XPTI tracing is enabled. See oneapi-src/unified-runtime#1070

pasaulais · 2023-11-15T12:35:27Z

I haven't built UR outside the SYCL tree, but my understanding is that the CUDA tracing support is #ifdef'd out, as you mention. I don't know whether XPTI should be a required dependency of UR, but this work is about enabling XPTI_ENABLE_INSTRUMENTATION by default for SYCL builds.
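This hypothetical sketch shows what "#ifdef'd out" means in practice. When XPTI_ENABLE_INSTRUMENTATION is not defined at build time, the tracing entry point compiles to a no-op and references no CUPTI or XPTI symbols. `TracingContext` and `tryEnableTracing` are illustrative names only.

```cpp
// Hypothetical tracing state.
struct TracingContext {
  bool Enabled = false;
};

// Returns true if tracing was enabled. Without XPTI_ENABLE_INSTRUMENTATION
// the body is compiled out and the call always reports "not enabled".
bool tryEnableTracing(TracingContext *Ctx) {
#ifdef XPTI_ENABLE_INSTRUMENTATION
  if (!Ctx)
    return false;
  // A real build would load CUPTI and register callbacks here.
  Ctx->Enabled = true;
  return true;
#else
  (void)Ctx; // tracing support not built in
  return false;
#endif
}
```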

That's a good point, I will ask internally about adding CUDA tracing to a CI job.

pasaulais · 2023-11-20T15:14:25Z


I've asked internally and the consensus is that tracing should be enabled in CI. This is better done in a separate PR due to the dependencies between this repo and https://github.com/intel/llvm (e.g. changing the signature for enableCUDATracing).

pasaulais · 2023-11-20T15:17:33Z

I have also added some changes to use ur_loader::LibLoader for loading the CUPTI library instead of dlopen, which is not available on Windows. This won't affect the CI testing yet as tracing is not enabled, but these changes will be required when it gets enabled by default.
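The sketch below puts `dlopen`/`dlsym` on POSIX and `LoadLibraryA`/`GetProcAddress` on Windows behind one interface, and uses a `std::unique_ptr` deleter to close the library automatically. It is not the actual `ur_loader::LibLoader` API. `openLibrary`, `findSymbol`, `LibraryCloser` and `loadLibrary` are hypothetical names.

```cpp
#include <memory>
#include <type_traits>

#ifdef _WIN32
#include <windows.h>
using LibHandle = HMODULE;
#else
#include <dlfcn.h>
using LibHandle = void *;
#endif

// Opens a shared library; returns nullptr on failure.
inline LibHandle openLibrary(const char *Path) {
#ifdef _WIN32
  return LoadLibraryA(Path);
#else
  return dlopen(Path, RTLD_NOW | RTLD_LOCAL);
#endif
}

// Looks up a symbol; returns nullptr if it is missing.
inline void *findSymbol(LibHandle Handle, const char *Name) {
#ifdef _WIN32
  return reinterpret_cast<void *>(GetProcAddress(Handle, Name));
#else
  return dlsym(Handle, Name);
#endif
}

// Deleter so that std::unique_ptr closes the library (only called if non-null).
struct LibraryCloser {
  void operator()(LibHandle Handle) const {
#ifdef _WIN32
    FreeLibrary(Handle);
#else
    dlclose(Handle);
#endif
  }
};

using LibraryPtr =
    std::unique_ptr<std::remove_pointer_t<LibHandle>, LibraryCloser>;

// Owning load: an empty LibraryPtr means the library was not found.
inline LibraryPtr loadLibrary(const char *Path) {
  return LibraryPtr(openLibrary(Path));
}
```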

kbenzie · 2023-11-20T16:24:56Z


Please create an issue for tracking this https://github.com/oneapi-src/unified-runtime/issues/new

pasaulais · 2023-11-20T17:47:48Z


I have created #1098 for this.

pasaulais · 2023-11-20T19:47:18Z

Created CI testing PR: intel/llvm#11952

fabiomestre · 2023-12-05T16:50:26Z

I have updated the target branch of this PR from the adapters branch to the main branch.
Development in UR is moving back to main. The adapters branch will soon be deleted.

codecov-commenter · 2023-12-13T16:27:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (95f9092) 15.73% compared to head (7afc5b8) 15.73%.



@@           Coverage Diff           @@
##             main    #1070   +/-   ##
=======================================
  Coverage   15.73%   15.73%           
=======================================
  Files         223      223           
  Lines       31466    31465    -1     
  Branches     3556     3556           
=======================================
  Hits         4952     4952           
+ Misses      26463    26462    -1     
  Partials       51       51


pasaulais · 2023-12-13T16:33:08Z


Thanks @fabiomestre, I have rebased these changes on top of latest main.

source/adapters/cuda/CMakeLists.txt

alexbatashev · 2023-12-14T09:49:55Z

I wonder what ${CUDA_cupti_LIBRARY} contains. If it's a full path, then this does not solve the problem; instead, it makes it silent and hard to debug for users who really need tracing and profiling. And if it's just the library name, I'm afraid that's a potential security issue: security guidelines typically require loading dynamic libraries by full path.

pasaulais · 2023-12-14T11:20:37Z


On my system, ${CUDA_cupti_LIBRARY} is set to the following path: /usr/local/cuda/lib64/libcupti.so

$ ls -l /usr/local/cuda/lib64/libcupti.so
lrwxrwxrwx 1 root root 14 Sep  9 04:28 /usr/local/cuda/lib64/libcupti.so -> libcupti.so.12
$ ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 22 Oct 30 15:35 /usr/local/cuda -> /etc/alternatives/cuda
$ ls -l /etc/alternatives/cuda
lrwxrwxrwx 1 root root 20 Oct 30 15:35 /etc/alternatives/cuda -> /usr/local/cuda-12.3

I agree that the mechanism for finding the CUPTI library currently gives no feedback when the library cannot be found, which might confuse the user. However, the previous situation is even worse: failing to find the CUPTI library prevented the CUDA adapter from loading, so the DPC++ application could not run at all. The goal of this PR is to fix that particular problem. Improving the loading process (e.g. with a user-configurable path that overrides the build-time path) and the feedback (e.g. run-time warnings) is better done in a separate PR.
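Here is a sketch of the two follow-up improvements mentioned above, under stated assumptions. An environment variable overrides the build-time path; the name `UR_CUPTI_LIBRARY_PATH` is hypothetical. A failed load prints a warning, and the application keeps running with tracing disabled. The `CUPTI_LIB_PATH` macro stands in for however the build-time path (e.g. from ${CUDA_cupti_LIBRARY}) is injected, and is also hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical build-time default, e.g. passed in by CMake.
#ifndef CUPTI_LIB_PATH
#define CUPTI_LIB_PATH "/usr/local/cuda/lib64/libcupti.so"
#endif

// Picks the library path: the (hypothetical) environment variable wins
// when it is set and non-empty, otherwise the build-time path is used.
std::string cuptiLibraryPath() {
  const char *Override = std::getenv("UR_CUPTI_LIBRARY_PATH");
  if (Override && *Override)
    return Override;
  return CUPTI_LIB_PATH;
}

// Reports a failed load instead of failing silently, and returns the
// message so callers (or tests) can inspect it.
std::string warnCuptiNotLoaded(const std::string &Path, const char *Reason) {
  std::string Msg = "<CUDA> warning: could not load CUPTI from '" + Path +
                    "' (" + (Reason ? Reason : "unknown error") +
                    "); tracing is disabled";
  std::fprintf(stderr, "%s\n", Msg.c_str());
  return Msg;
}
```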

kbenzie · 2023-12-14T12:35:25Z


If you could create a new issue to track this future work @pasaulais, that would be much appreciated.

pasaulais · 2023-12-14T13:18:41Z


Sure, here are issues for both parts: #1188, #1189

kbenzie · 2023-12-14T13:24:16Z

Thanks!

pasaulais · 2023-12-14T13:33:21Z


Regarding your point about 'just the library name' (relative path), I am curious as to why that would be less secure than explicitly linking against libcupti.so when building the CUDA adapter. AFAIU the search path used to locate the library is the same in both cases (at least on Linux). Do you have a specific example where using dlopen is less secure?

The reason I am asking is that loading libcupti.so by relative path, when loading it by absolute path fails, could be a way to implement #1189.
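This sketch implements the fallback order proposed here. It tries the absolute build-time path first, then the bare library name, so that the dynamic loader's search path is used. The loader is passed in as a function, which lets the ordering be tested without CUPTI installed. `loadFirstAvailable` and `cuptiCandidates` are hypothetical names.

```cpp
#include <functional>
#include <string>
#include <vector>

// Loader abstraction: returns a handle, or nullptr on failure.
using LoadFn = std::function<void *(const std::string &)>;

// Tries each candidate in order and returns the first handle that loads.
// If Chosen is non-null, it receives the path that succeeded.
void *loadFirstAvailable(const std::vector<std::string> &Candidates,
                         const LoadFn &Load, std::string *Chosen = nullptr) {
  for (const std::string &Path : Candidates) {
    if (void *Handle = Load(Path)) {
      if (Chosen)
        *Chosen = Path;
      return Handle;
    }
  }
  return nullptr;
}

// Absolute build-time path first, then the bare name as a fallback.
std::vector<std::string> cuptiCandidates(const std::string &BuildTimePath) {
  return {BuildTimePath, "libcupti.so"};
}
```

Whether the bare-name fallback is acceptable depends on the platform's library search order, which differs between Linux and Windows.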

alexbatashev · 2023-12-14T14:07:19Z


I've checked the ld.so man page, and it seems the only way to load a library from a user-writable path is if that path is part of LD_LIBRARY_PATH, which is probably fine (?). But on Windows, the first entry in the library search order is either the current working directory or the binary's directory. Either can be user-writable and could contain a malicious DLL. I remember that the SYCL runtime stopped loading libraries by name alone for the same reason. Anyway, I'm not an expert on the topic, and you should consult the security champions.

Two alternatives would be to either link CUPTI statically, or build a separate tracing library and load it from the same location as the adapter library. Either approach would eliminate loading libraries by name alone.
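For the second alternative, here is a hypothetical Linux sketch of locating a library next to the adapter. `dladdr` (a glibc/BSD extension) reports which shared object contains a given address, and the sibling path is built from that object's directory. `libur_cuda_tracing.so`, `siblingLibraryPath` and `currentModulePath` are illustrative names, not part of UR.

```cpp
#include <dlfcn.h>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Places LibName in the same directory as AdapterPath.
fs::path siblingLibraryPath(const fs::path &AdapterPath,
                            const std::string &LibName) {
  return AdapterPath.parent_path() / LibName;
}

// Linux-only: returns the file name of the shared object (or executable)
// that contains this function, or an empty string if dladdr fails.
std::string currentModulePath() {
  Dl_info Info{};
  if (dladdr(reinterpret_cast<void *>(&currentModulePath), &Info) &&
      Info.dli_fname)
    return Info.dli_fname;
  return {};
}
```

In a real adapter, `siblingLibraryPath(currentModulePath(), "libur_cuda_tracing.so")` would give a fully qualified path to pass to the loader.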

pbalcer · 2023-12-14T14:15:45Z


AFAIK, on Linux it's OK to let the loader find the appropriate library file, but on Windows, as you say, we need to provide a fully qualified path for it to be safe. That's what we do in the UR loader.
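You can capture this difference as a simple check applied before handing a path to the loader. In this hypothetical sketch, Windows accepts only fully qualified paths (drive letter or UNC). Linux accepts either an absolute path or a bare name that ld.so resolves through its search path. A relative path containing `/` is rejected on Linux because `dlopen` resolves such a path against the current directory. `TargetOS` and `isSafeLibraryPath` are illustrative names.

```cpp
#include <cctype>
#include <string>

enum class TargetOS { Linux, Windows };

// Hypothetical policy check; returns true if Path may be passed to the loader.
bool isSafeLibraryPath(const std::string &Path, TargetOS OS) {
  if (Path.empty())
    return false;
  if (OS == TargetOS::Windows) {
    // Fully qualified: "C:\..." / "C:/..." or a UNC path "\\server\...".
    bool Drive = Path.size() >= 3 &&
                 std::isalpha(static_cast<unsigned char>(Path[0])) &&
                 Path[1] == ':' && (Path[2] == '\\' || Path[2] == '/');
    bool Unc = Path.size() >= 2 && Path[0] == '\\' && Path[1] == '\\';
    return Drive || Unc;
  }
  // Linux: an absolute path, or a bare name searched by ld.so (which does
  // not look in the current directory unless LD_LIBRARY_PATH points there).
  return Path[0] == '/' || Path.find('/') == std::string::npos;
}
```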

pasaulais · 2023-12-18T10:23:08Z


That makes sense, thanks for clarifying why it's a security issue on Windows. One thing I'm unsure about with linking CUPTI statically is version mismatches. Do you know whether it is supported to link against one version of the CUDA toolkit and run on a system with a different toolkit version? With shared libraries, the CUPTI library has the same version as the CUDA runtime library (even if a different version was used to build the CUDA adapter).

alexbatashev · 2023-12-18T10:38:53Z

@pasaulais my understanding is that it does not matter whether you link statically or dynamically. NVIDIA guarantees backwards compatibility within one major version. That is, if you link against CUDA 12.3, your build will fail on CUDA 12.1 machines no matter what, because the driver version is too low. But the other direction should work just fine: a 12.1 library should work in a 12.3 environment. In fact, nvcc links the CUDA runtime statically by default: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=static#cudart-none-shared-static-cudart. That was done specifically to resolve the issue you describe: making sure the application can work even if the CUDA runtime is not installed on the system.
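The version rule described here can be written as a small check. This sketch assumes CUDA's integer version encoding (1000 * major + 10 * minor, e.g. 12030 for 12.3, as returned by `cudaRuntimeGetVersion`). `compatibleWithinMajor` is a hypothetical helper. It covers only the same-major case discussed in this comment, not NVIDIA's full compatibility policy.

```cpp
// CUDA-style version integers: 1000 * major + 10 * minor.
constexpr int cudaMajor(int Version) { return Version / 1000; }
constexpr int cudaMinor(int Version) { return (Version % 1000) / 10; }

// True when code built against BuiltWith is expected to run on Installed
// under the same-major rule: same major version, installed not older.
// Different major versions fall outside this rule and return false here;
// that does not by itself mean they are incompatible.
constexpr bool compatibleWithinMajor(int BuiltWith, int Installed) {
  return cudaMajor(BuiltWith) == cudaMajor(Installed) &&
         Installed >= BuiltWith;
}
```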

pasaulais · 2024-01-15T15:28:12Z


How these libraries should be linked could use revisiting in order to make a good decision, so I have created an issue for this: #1251

CI testing for oneapi-src/unified-runtime#1070

Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

oneapi-src/unified-runtime#1070 and #11952 introduced a new variant of the `enableCUDATracing` function that takes a context pointer parameter, replacing the parameterless variant of that function. The older variant will be removed from UR once this PR is merged.

pasaulais requested a review from a team as a code owner November 13, 2023 15:01

pasaulais changed the title from "[CUDA} Dynamically load the CUPTI library when tracing" to "[CUDA] Dynamically load the CUPTI library when tracing" Nov 13, 2023

pasaulais mentioned this pull request Nov 13, 2023

[SYCL][PI] Provide a preprocessor macro to locate the CUPTI library for XPTI tracing intel/llvm#11866

Merged

pbalcer reviewed Nov 14, 2023


pasaulais force-pushed the pa/dlopen-cupti branch from c953ee6 to ae6905d November 20, 2023 15:09

pasaulais requested a review from a team as a code owner November 20, 2023 15:09

pasaulais mentioned this pull request Nov 20, 2023

[CUDA] Enable XPTI tracing in CI builds #1098

Open

pasaulais mentioned this pull request Nov 20, 2023

[UR][CUDA] Dynamically load the CUPTI library when tracing intel/llvm#11952

Merged

fabiomestre changed the base branch from adapters to main December 5, 2023 16:50

pbalcer mentioned this pull request Dec 11, 2023

[CUDA][XPTI] Fix XPTI-based CUDA tracing capabilities #1173

Closed

pasaulais force-pushed the pa/dlopen-cupti branch from ae6905d to c990788 December 13, 2023 16:14

kbenzie reviewed Dec 13, 2023


source/adapters/cuda/CMakeLists.txt (outdated)

pasaulais force-pushed the pa/dlopen-cupti branch 2 times, most recently from e8076c4 to 7afc5b8 December 14, 2023 11:18

This was referenced Dec 14, 2023

[CUDA] Print a warning when loading the CUPTI library fails #1188

Open

[CUDA] Improve the loading process of the CUPTI library when tracing #1189

Open

pasaulais force-pushed the pa/dlopen-cupti branch 2 times, most recently from f79902a to f0ea491 January 11, 2024 16:32

pasaulais mentioned this pull request Jan 15, 2024

[CUDA] Decide on how to link CUDA libraries #1251

Open

kbenzie added the ready to merge Added to PR's which are ready to merge label Jan 15, 2024

pasaulais added 3 commits January 23, 2024 14:44

[CUDA} Dynamically load the CUPTI library when tracing

311afb9

[CUDA] Use LibLoader for cross-platform loading of libraries

3be1c4b

[CUDA] Move CUPTI function pointers to a separate struct

4b2ac71

kbenzie force-pushed the pa/dlopen-cupti branch from f0ea491 to 4b2ac71 January 23, 2024 14:44

kbenzie merged commit 5b3750d into oneapi-src:main Jan 24, 2024
51 checks passed

dm-vodopyanov pushed a commit to intel/llvm that referenced this pull request Jan 24, 2024

[UR][CUDA] Dynamically load the CUPTI library when tracing (#11952)

acf89a6


pasaulais mentioned this pull request Jan 29, 2024

[UR][CUDA] Use new variant of the enableCUDATracing function intel/llvm#12521

Merged

pasaulais mentioned this pull request Feb 1, 2024

[CUDA] Remove unused function variant #1310

Merged
