Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DeviceSanitizer] Support detecting out-of-bounds error on CPU Device & Static Local Memory #1210

Merged
merged 185 commits into from
Feb 1, 2024

Conversation

AllanZyne
Copy link
Contributor

@AllanZyne AllanZyne commented Dec 26, 2023

intel/llvm part: intel/llvm#12248

AllanZyne and others added 30 commits November 13, 2023 17:44
This commit adds the adapter implementations of the virtual memory
extension functionality to be used in
intel/llvm#8954.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
This reverts commit 84a3afa.
@AllanZyne
Copy link
Contributor Author

Hi @oneapi-src/unified-runtime-maintain, we had already internal reviewed this PR, can UR team help to review it?
Thank you very much!

@kbenzie
Copy link
Contributor

kbenzie commented Jan 25, 2024

Looks like this fails to compile on macOS. I think the easiest solution would be to disable the sanitizer layer in CMake when macOS is detected.

@AllanZyne
Copy link
Contributor Author

Looks like this fails to compile on macOS. I think the easiest solution would be to disable the sanitizer layer in CMake when macOS is detected.

Yep, thanks for your suggestion.
We haven't supported Windows either.
I've added a warning message for Windows and macOS building.

return Result;
// FIXME: Currently, Level-Zero doesn't create independent VAs for each contexts,
// which will cause out-of-resource error when users use multiple contexts
static uptr ShadowOffset, ShadowOffsetEnd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use static here?

Copy link
Contributor Author

@AllanZyne AllanZyne Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on comment, Level-Zero uses one VA for multiple contexts, so if user creates multiple contexts, we have to reserve multiple huge shadow memories, which will cause out-of-resource error. So I added "static" here to let one shadow memory share with all contexts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can call_once be used instead? it might be more intuitive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code will also suffer multi thread issue (CPU as well). I'm going to redesign DeviceInfo in next patch, and let each ur_device_handle_t correspond to only one unique instance of it, so that there will be only one shadow memory offset in each DeviceInfo.

For now, let's just leave it as this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean this 'static' code will be removed after your redesign so that the multi-thread issue will be resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, because DeviceInfo will be initialized only once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

Copy link
Contributor

@pbalcer pbalcer Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a race now, right? Once all this is resolved, might be worth adding multi-threaded tests that exercise this code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a race now, right? Once all this is resolved, might be worth adding multi-threaded tests that exercise this code path.

Yes, I will.

return Result;
// FIXME: Currently, Level-Zero doesn't create independent VAs for each contexts,
// which will cause out-of-resource error when users use multiple contexts
static uptr ShadowOffset, ShadowOffsetEnd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean this 'static' code will be removed after your redesign so that the multi-thread issue will be resolved?

@AllanZyne
Copy link
Contributor Author

Hi, do I need to get more reviewers? Or can this PR be merged?
Thanks!

// Poison shadow memory outside of asan runtime is not allowed, so we
// need to avoid memset's call from being intercepted.
static void *memset_ptr = []() {
void *handle = dlopen("libc.so.6", RTLD_LAZY);
Copy link
Contributor

@wenju-he wenju-he Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we hardcode the library version number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I changed it to LIBC_SO macro.

@pbalcer pbalcer merged commit c6b0a24 into oneapi-src:main Feb 1, 2024
52 checks passed
@AllanZyne AllanZyne deleted the sanitizer-pr-cpu-local branch February 1, 2024 13:22
steffenlarsen pushed a commit to intel/llvm that referenced this pull request Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants