Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA] Dynamically load the CUPTI library when tracing #1070

Merged
merged 3 commits into from
Jan 24, 2024

Conversation

pasaulais
Copy link
Contributor

With these changes, libcupti.so is loaded dynamically when CUDA tracing is enabled. This enables XPTI tracing-enabled builds to work on systems that do not have libcupti.so or where that library cannot be located on the system.

The enableCUDATracing and disableCUDATracing functions have been changed to take a context pointer, rather than use global variables for tracing state.

There is a temporary enableCUDATracing variant with no parameter for compatibility until the relevant changes have been merged to https://github.com/intel/llvm.

@pasaulais pasaulais requested a review from a team as a code owner November 13, 2023 15:01
@pasaulais pasaulais changed the title [CUDA} Dynamically load the CUPTI library when tracing [CUDA] Dynamically load the CUPTI library when tracing Nov 13, 2023
Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work/do anything when built out of sycl tree? As far as I can tell, we don't set XPTI_ENABLE_INSTRUMENTATION anywhere, and the cuda adapter never links with xpti.

Might be useful to setup a CI job that builds cuda with tracing enabled.

againull pushed a commit to intel/llvm that referenced this pull request Nov 15, 2023
…or XPTI tracing (#11866)

This is a prerequisite for implementing dynamic loading of the CUPTI
library when XPTI tracing is enabled.

See oneapi-src/unified-runtime#1070
@pasaulais
Copy link
Contributor Author

I haven't built UR out of the SYCL tree, but my understanding is that the CUDA tracing support is #ifdef'd out as you mention. I don't know if xpti should be a required dependency of UR, but this work is about making XPTI_ENABLE_INSTRUMENTATION enabled by default for SYCL builds.

That's a good point, I will ask internally about adding CUDA tracing to a CI job.

@pasaulais
Copy link
Contributor Author

Does this work/do anything when built out of sycl tree? As far as I can tell, we don't set XPTI_ENABLE_INSTRUMENTATION anywhere, and the cuda adapter never links with xpti.

Might be useful to setup a CI job that builds cuda with tracing enabled.

I've asked internally and the consensus is that tracing should be enabled in CI. This is better done in a separate PR due to the dependencies between this repo and https://github.com/intel/llvm (e.g. changing the signature for enableCUDATracing).

@pasaulais
Copy link
Contributor Author

I have also added some changes to use ur_loader::LibLoader for loading the CUPTI library instead of dlopen, which is not available on Windows. This won't affect the CI testing yet as tracing is not enabled, but these changes will be required when it gets enabled by default.

@kbenzie
Copy link
Contributor

kbenzie commented Nov 20, 2023

I've asked internally and the consensus is that tracing should be enabled in CI. This is better done in a separate PR due to the dependencies between this repo and https://github.com/intel/llvm (e.g. changing the signature for enableCUDATracing).

Please create an issue for tracking this https://github.com/oneapi-src/unified-runtime/issues/new

@pasaulais
Copy link
Contributor Author

I've asked internally and the consensus is that tracing should be enabled in CI. This is better done in a separate PR due to the dependencies between this repo and https://github.com/intel/llvm (e.g. changing the signature for enableCUDATracing).

Please create an issue for tracking this https://github.com/oneapi-src/unified-runtime/issues/new

I have created #1098 for this.

@pasaulais
Copy link
Contributor Author

Created CI testing PR: intel/llvm#11952

@fabiomestre fabiomestre changed the base branch from adapters to main December 5, 2023 16:50
@fabiomestre
Copy link
Contributor

I have updated the target branch of this PR from the adapters branch to the main branch.
Development in UR is moving back to main. The adapters branch will soon be deleted.

@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (95f9092) 15.73% compared to head (7afc5b8) 15.73%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1070   +/-   ##
=======================================
  Coverage   15.73%   15.73%           
=======================================
  Files         223      223           
  Lines       31466    31465    -1     
  Branches     3556     3556           
=======================================
  Hits         4952     4952           
+ Misses      26463    26462    -1     
  Partials       51       51           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pasaulais
Copy link
Contributor Author

I have updated the target branch of this PR from the adapters branch to the main branch. Development in UR is moving back to main. The adapters branch will soon be deleted.

Thanks @fabiomestre, I have rebased these changes on top of latest main.

@alexbatashev
Copy link
Contributor

I wonder, what's the content of ${CUDA_cupti_LIBRARY}? If it's a full path, then it does not solve the problem, and instead makes it silent and hard to debug for users, who really need tracing and profiling. And if it's just the library name, I'm afraid that's a potential security issue: typically, security guidelines require dynamic library loading with full paths.

@pasaulais pasaulais force-pushed the pa/dlopen-cupti branch 2 times, most recently from e8076c4 to 7afc5b8 Compare December 14, 2023 11:18
@pasaulais
Copy link
Contributor Author

pasaulais commented Dec 14, 2023

I wonder, what's the content of ${CUDA_cupti_LIBRARY}? If it's a full path, then it does not solve the problem, and instead makes it silent and hard to debug for users, who really need tracing and profiling. And if it's just the library name, I'm afraid that's a potential security issue: typically, security guidelines require dynamic library loading with full paths.

On my system, this is the following path: /usr/local/cuda/lib64/libcupti.so

$ ls -l /usr/local/cuda/lib64/libcupti.so
lrwxrwxrwx 1 root root 14 Sep  9 04:28 /usr/local/cuda/lib64/libcupti.so -> libcupti.so.12
$ ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 22 Oct 30 15:35 /usr/local/cuda -> /etc/alternatives/cuda
$ ls -l /etc/alternatives/cuda
lrwxrwxrwx 1 root root 20 Oct 30 15:35 /etc/alternatives/cuda -> /usr/local/cuda-12.3

I agree that the mechanism to find the CUPTI library currently does not give any feedback when the library cannot be found and that might be confusing for the user. However, the previous situation where the CUPTI library cannot be found and this prevents loading the CUDA adapter is even worse as the DPC++ application cannot be run at all. The goal of this PR is to solve that particular situation, improving the loading process (e.g. with a user-configurable path to override the build-time path) and feedback (e.g. run-time warnings) is better done in a separate PR.

@kbenzie
Copy link
Contributor

kbenzie commented Dec 14, 2023

The goal of this PR is to solve that particular situation, improving the loading process (e.g. with a user-configurable path to override the build-time path) and feedback (e.g. run-time warnings) is better done in a separate PR.

If you could create a new issue to track this future work @pasaulais that would be much appriciated.

@pasaulais
Copy link
Contributor Author

The goal of this PR is to solve that particular situation, improving the loading process (e.g. with a user-configurable path to override the build-time path) and feedback (e.g. run-time warnings) is better done in a separate PR.

If you could create a new issue to track this future work @pasaulais that would be much appriciated.

Sure, here are issues for both parts: #1188, #1189

@kbenzie
Copy link
Contributor

kbenzie commented Dec 14, 2023

Thanks!

@pasaulais
Copy link
Contributor Author

I wonder, what's the content of ${CUDA_cupti_LIBRARY}? If it's a full path, then it does not solve the problem, and instead makes it silent and hard to debug for users, who really need tracing and profiling. And if it's just the library name, I'm afraid that's a potential security issue: typically, security guidelines require dynamic library loading with full paths.

Regarding your point about 'just the library name' (relative path), I am curious as to why that would be less secure than explicitly linking against libcupti.so when building the CUDA adapter. AFAIU the search path used to locate the library is the same in both cases (at least on Linux). Do you have a specific example where using dlopen is less secure?

The reason I am asking is loading libcupti.so using a relative path after loading with an absolute path fails could be a way to implement #1189

@alexbatashev
Copy link
Contributor

alexbatashev commented Dec 14, 2023

I wonder, what's the content of ${CUDA_cupti_LIBRARY}? If it's a full path, then it does not solve the problem, and instead makes it silent and hard to debug for users, who really need tracing and profiling. And if it's just the library name, I'm afraid that's a potential security issue: typically, security guidelines require dynamic library loading with full paths.

Regarding your point about 'just the library name' (relative path), I am curious as to why that would be less secure than explicitly linking against libcupti.so when building the CUDA adapter. AFAIU the search path used to locate the library is the same in both cases (at least on Linux). Do you have a specific example where using dlopen is less secure?

The reason I am asking is loading libcupti.so using a relative path after loading with an absolute path fails could be a way to implement #1189

I've checked with ld.so man page, and it seems the only way to load a library from user-writable path is if that path is part of LD_LIBRARY_PATH, which is probably fine (?). But for Windows the first thing in the library search order is either CWD or binary directory, which can be user-writable and can potentially contain a malicious DLL. I remember that SYCL runtime stopped loading libraries with just the name for the same reason. Anyway, I'm not the expert on the topic and you should consult with security champions.

Two alternative ways would be to either link cupti statically or to make a separate tracing library and to load it from the same location as the adapter library, which would eliminate a case of loading libraries with just the name.

@pbalcer
Copy link
Contributor

pbalcer commented Dec 14, 2023

I've checked with ld.so man page, and it seems the only way to load a library from user-writable path is if that path is part of LD_LIBRARY_PATH, which is probably fine (?). But for Windows the first thing in the library search order is either CWD or binary directory, which can be user-writable and can potentially contain a malicious DLL. I remember that SYCL runtime stopped loading libraries with just the name for the same reason. Anyway, I'm not the expert on the topic and you should consult with security champions.

AFAIK, on Linux it's ok to let the loader find the appropriate library file, but Windows, as you say, we need to provide a fully qualified name for it to be safe. That's what we do in the UR loader.

@pasaulais
Copy link
Contributor Author

I wonder, what's the content of ${CUDA_cupti_LIBRARY}? If it's a full path, then it does not solve the problem, and instead makes it silent and hard to debug for users, who really need tracing and profiling. And if it's just the library name, I'm afraid that's a potential security issue: typically, security guidelines require dynamic library loading with full paths.

Regarding your point about 'just the library name' (relative path), I am curious as to why that would be less secure than explicitly linking against libcupti.so when building the CUDA adapter. AFAIU the search path used to locate the library is the same in both cases (at least on Linux). Do you have a specific example where using dlopen is less secure?

The reason I am asking is loading libcupti.so using a relative path after loading with an absolute path fails could be a way to implement #1189

I've checked with ld.so man page, and it seems the only way to load a library from user-writable path is if that path is part of LD_LIBRARY_PATH, which is probably fine (?). But for Windows the first thing in the library search order is either CWD or binary directory, which can be user-writable and can potentially contain a malicious DLL. I remember that SYCL runtime stopped loading libraries with just the name for the same reason. Anyway, I'm not the expert on the topic and you should consult with security champions.

Two alternative ways would be to either link cupti statically or to make a separate tracing library and to load it from the same location as the adapter library, which would eliminate a case of loading libraries with just the name.

That makes sense, thanks for the clarification about why it's a security issue on Windows. One thing I'm not sure about linking cupti statically is version mismatches. Do you know if there is support for linking against one version of the CUDA toolkit and running on a system with a different toolkit version? With shared libraries, the cupti library will have the same version as the CUDA runtime library (even if a different version was used to build the CUDA adapter).

@alexbatashev
Copy link
Contributor

@pasaulais my understanding is that it does not matter, whether you link statically or dynamically. NVIDIA guarantees backwards compatibility inside one major version. That is, if you link against CUDA 12.3, your build will fail on CUDA 12.1 machines no matter what, as the driver version is too low. But other direction should work just fine: 12.1 library should work in 12.3 environment. In fact, nvcc links statically by default: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=static#cudart-none-shared-static-cudart. And that was done to specifically resolve the issue you describe: make sure the application can work even if CUDA runtime is not installed on the system.

@pasaulais
Copy link
Contributor Author

@pasaulais my understanding is that it does not matter, whether you link statically or dynamically. NVIDIA guarantees backwards compatibility inside one major version. That is, if you link against CUDA 12.3, your build will fail on CUDA 12.1 machines no matter what, as the driver version is too low. But other direction should work just fine: 12.1 library should work in 12.3 environment. In fact, nvcc links statically by default: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=static#cudart-none-shared-static-cudart. And that was done to specifically resolve the issue you describe: make sure the application can work even if CUDA runtime is not installed on the system.

This is something that could use revisiting in order to make a good decision for how to link these libraries, so I have created an issue for this: #1251

@kbenzie kbenzie added the ready to merge Added to PR's which are ready to merge label Jan 15, 2024
@kbenzie kbenzie merged commit 5b3750d into oneapi-src:main Jan 24, 2024
51 checks passed
dm-vodopyanov pushed a commit to intel/llvm that referenced this pull request Jan 24, 2024
CI testing for oneapi-src/unified-runtime#1070

---------

Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
ldrumm pushed a commit to intel/llvm that referenced this pull request Feb 1, 2024
oneapi-src/unified-runtime#1070 and
#11952 introduced a new variant of the
`enableCUDATracing` function that takes a context pointer parameter,
replacing the parameterless variant of that function. The older variant
will be removed from UR once this PR is merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Added to PR's which are ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants