L0 provider cannot find L0 symbols when dlopen is used. #926
Comments
I was able to reproduce it. Investigating.
I have found the root cause of the issue. The reproducer does the following:
2499740: symbol=_dl_find_dso_for_object; lookup in file=/user/svinogra/experiments/umf_repro/build/test [0]
2499740: symbol=_dl_find_dso_for_object; lookup in file=/user/svinogra/unified-memory-framework/build/lib/libumf.so.0 [0]
2499740: symbol=_dl_find_dso_for_object; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
2499740: symbol=_dl_find_dso_for_object; lookup in file=/opt/intel/oneapi/tbb/2021.13/env/../lib/intel64/gcc4.8/libhwloc.so.15 [0]
2499740: symbol=_dl_find_dso_for_object; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
2499740: symbol=zeMemAllocHost; lookup in file=/user/svinogra/experiments/umf_repro/build/test [0]
2499740: symbol=zeMemAllocHost; lookup in file=/user/svinogra/unified-memory-framework/build/lib/libumf.so.0 [0]
2499740: symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
2499740: symbol=zeMemAllocHost; lookup in file=/opt/intel/oneapi/tbb/2021.13/env/../lib/intel64/gcc4.8/libhwloc.so.15 [0]
2499740: symbol=zeMemAllocHost; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
2499740: symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
2499740: /user/svinogra/unified-memory-framework/build/lib/libumf.so.0: error: symbol lookup error: undefined symbol: zeMemAllocHost (fatal)
If we change the
2500341: symbol=_dl_find_dso_for_object; lookup in file=/user/svinogra/experiments/umf_repro/build/test [0]
2500341: symbol=_dl_find_dso_for_object; lookup in file=/user/svinogra/unified-memory-framework/build/lib/libumf.so.0 [0]
2500341: symbol=_dl_find_dso_for_object; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
2500341: symbol=_dl_find_dso_for_object; lookup in file=/opt/intel/oneapi/tbb/2021.13/env/../lib/intel64/gcc4.8/libhwloc.so.15 [0]
2500341: symbol=_dl_find_dso_for_object; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
2500341: symbol=zeMemAllocHost; lookup in file=/user/svinogra/experiments/umf_repro/build/test [0]
2500341: symbol=zeMemAllocHost; lookup in file=/user/svinogra/unified-memory-framework/build/lib/libumf.so.0 [0]
2500341: symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
2500341: symbol=zeMemAllocHost; lookup in file=/opt/intel/oneapi/tbb/2021.13/env/../lib/intel64/gcc4.8/libhwloc.so.15 [0]
2500341: symbol=zeMemAllocHost; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
2500341: symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
2500341: symbol=zeMemAllocHost; lookup in file=./libipc.so [0]
2500341: symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libze_loader.so.1 [0]
This is very strange. The call to init_level_zero from lib.c certainly loads ze_loader.so; we would not be able to create the L0 context otherwise. So we know that ze_loader.so is loaded and used before umfMemoryProviderCreate is called. What is the difference in symbol initialization between init_level_zero and umfMemoryProviderCreate?
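One way to confirm from inside the process that a library is already resident before a given call is to probe it with RTLD_NOLOAD. The following is only a minimal sketch under assumptions: `is_already_loaded` is a hypothetical helper, not part of UMF or the reproducer, and the soname to probe is supplied by the caller.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `soname` is already loaded into this
 * process, 0 otherwise. With RTLD_NOLOAD, dlopen() never loads anything
 * new; it only returns a handle if the object is already resident. */
int is_already_loaded(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL) {
        return 0;
    }
    dlclose(handle); /* drop the extra reference taken by the probe */
    return 1;
}
```

Note that a positive answer only says the library is loaded. It says nothing about whether its symbols are in the global lookup scope, which is a separate question.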
My understanding is the following:
Since the
So as we can see in the case of the
@igchor Could you please clarify how this maps to the Level Zero adapter implementation and its v2 version?
@vinser52 all adapters are currently loaded with RTLD_LOCAL: https://github.com/oneapi-src/unified-runtime/blob/d3b81bfc88cc896b16634a5c602422a3aff5f4d1/source/common/linux/ur_lib_loader.cpp#L38
We could discuss changing this, but I don't know exactly what the impact would be. Also, @vinser52, do you know why the reproducer only fails if there is a call to a UMF function in main.c? If I remove the umfPoolByPtr call, everything works fine, even with RTLD_LOCAL.
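For readers less familiar with the flag, here is a minimal C sketch of what RTLD_LOCAL changes. It is an illustration under assumptions, not the adapter loader's code: `probe_symbol` and `struct scope_probe` are hypothetical names, and the library and symbol are whatever the caller passes in.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical result type: where a symbol could be found. */
struct scope_probe {
    int via_handle; /* 1 if found through the handle returned by dlopen() */
    int via_global; /* 1 if found through the global lookup scope */
};

/* Opens `soname` with the given extra flags (RTLD_LOCAL or RTLD_GLOBAL)
 * and checks whether `symbol` resolves through the library's own handle
 * and through the global scope. */
struct scope_probe probe_symbol(const char *soname, const char *symbol,
                                int flags)
{
    struct scope_probe result = {0, 0};

    void *lib = dlopen(soname, RTLD_NOW | flags);
    if (lib == NULL) {
        return result; /* library could not be opened */
    }
    result.via_handle = dlsym(lib, symbol) != NULL;

    /* dlopen(NULL) returns a handle for the main program; dlsym() on it
     * searches the main program, the libraries loaded at startup, and
     * anything opened with RTLD_GLOBAL. */
    void *global = dlopen(NULL, RTLD_NOW);
    if (global != NULL) {
        result.via_global = dlsym(global, symbol) != NULL;
        dlclose(global);
    }

    dlclose(lib);
    return result;
}
```

For a library that is not already part of the global scope, one would expect `via_handle` to be 1 and `via_global` to be 0 with RTLD_LOCAL, and both to be 1 with RTLD_GLOBAL. If the library is also a startup dependency of the program, `via_global` will be 1 regardless of the flag.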
@igchor But you said that the issue is only relevant for v2 of the L0 adapter. Do you have any idea what the difference is between v1 and v2 of the L0 adapter in that regard?
hmm, I forgot about that case. This is what I see in the
symbol=zeMemAllocHost; lookup in file=./test [0]
symbol=zeMemAllocHost; lookup in file=/home/vinser52/repos/unified-memory-framework/build/umf_install/lib/libumf.so.0 [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
symbol=zeMemAllocHost; lookup in file=/opt/intel/oneapi/tcm/1.2/lib/libhwloc.so.15 [0]
symbol=zeMemAllocHost; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
symbol=zeMemAllocHost; lookup in file=./test [0]
symbol=zeMemAllocHost; lookup in file=/home/vinser52/repos/unified-memory-framework/build/umf_install/lib/libumf.so.0 [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
symbol=zeMemAllocHost; lookup in file=/opt/intel/oneapi/tcm/1.2/lib/libhwloc.so.15 [0]
symbol=zeMemAllocHost; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
symbol=zeMemAllocHost; lookup in file=./libipc.so [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libze_loader.so.1 [0]
symbol=zeMemAllocHost; lookup in file=./test [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
symbol=zeMemAllocHost; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
symbol=zeMemAllocHost; lookup in file=./libipc.so [0]
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libze_loader.so.1 [0]
file=/lib/x86_64-linux-gnu/libze_loader.so.1 [0]; needed by /home/vinser52/repos/unified-memory-framework/build/umf_install/lib/libumf.so.0 [0] (relocation dependency)
So when the
ldd ./test
linux-vdso.so.1 (0x00007ffc3db8f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000741877600000)
/lib64/ld-linux-x86-64.so.2 (0x0000741877932000)
In the 1st and 2nd cases, the
But I do not understand these lines in the
symbol=zeMemAllocHost; lookup in file=/lib/x86_64-linux-gnu/libze_loader.so.1 [0]
file=/lib/x86_64-linux-gnu/libze_loader.so.1 [0]; needed by /home/vinser52/repos/unified-memory-framework/build/umf_install/lib/libumf.so.0 [0] (relocation dependency)
Will continue investigation.
The reasons we load symbols in UMF this way are:
Here the 1st is correct, but the 2nd is not. What I would suggest is to look at /proc/self/maps and check the path to ze_loader.
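A rough C sketch of that check, assuming a Linux /proc filesystem: `find_mapped_path` is a hypothetical helper (not a UMF function) that scans /proc/self/maps for the first file-backed mapping whose path contains a given substring, for example "ze_loader".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies into `out` the path of the first mapping in
 * /proc/self/maps whose pathname contains `needle`.
 * Returns 1 if a match was found, 0 otherwise. */
int find_mapped_path(const char *needle, char *out, size_t out_size)
{
    if (out_size == 0) {
        return 0;
    }
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return 0;
    }

    char line[4096];
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Line format: address perms offset dev inode pathname.
         * For file-backed mappings the pathname starts at the first '/'. */
        char *path = strchr(line, '/');
        if (path == NULL || strstr(path, needle) == NULL) {
            continue;
        }
        path[strcspn(path, "\n")] = '\0'; /* strip the trailing newline */
        snprintf(out, out_size, "%s", path);
        found = 1;
        break;
    }

    fclose(maps);
    return found;
}
```

Called as `find_mapped_path("ze_loader", buf, sizeof buf)` right before the failing lookup, this would show whether the loader is mapped and from which path.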
I had a similar idea in mind yesterday, but it was too late. Will check it today.
I just checked the following: Changed the
Problematic scenario:
The L0 UR adapter uses the L0 provider from UMF. The L0 UR adapter is dlopened by the loader (which the application links to). When the application itself also links with UMF and uses umfPoolGetMemoryProvider (or perhaps any UMF symbol?), the L0 provider cannot find symbols.
How to reproduce:
Output:
If I remove the call to umfPoolByPtr from main.c, then the binary works (allocates memory).