Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][DeviceSanitizer] Checking "sycl::free" related errors #12882

Merged
merged 109 commits into from
Apr 19, 2024

Conversation

AllanZyne
Copy link
Contributor

@AllanZyne AllanZyne commented Mar 1, 2024

UR: oneapi-src/unified-runtime#1402

This PR added supports for checking the following types of error in "UR_LAYER_ASAN":

  • bad-free: the memory address to be freed is not allocated by UR
  • bad-context: the memory address to be freed uses a wrong "context"
  • double-free: the memory address to be freed is already freed
  • use-after-free: the freed memory is used in kernel

I added the environment variable "UR_LAYER_ASAN_OPTIONS" to have additional control over "UR_LAYER_ASAN", which is similar to "ASAN_OPTIONS" in ASan. Currently, it supports:

  • "quarantine_size_mb" (default = 0)
    • Size (in MB) of quarantine per device. The pointers passed to urUSMFree are not freed immediately, but saved into QuarantineCache (per device cache), and when the cached chunk size (only counts the size of USM buffer, not shadow memory) is more than "quarantine_size_mb", the first enqueued chunk will be freed (aka., FIFO). Lower value may reduce memory usage but increase the chance of false negatives
    • This option must be enabled for checking "double-free" and "use-after-free"
  • "debug" (default = 0)
    • Print extra debug messages in kernel ("__AsanDebug” in “libdevice/sanitizer_utils.cpp”), which is helpful for DeviceSanitizer developers.

For example, to enable "use-after-free" with 5MB quarantine cache and debug message in kernel, you need to

UR_LAYER_ASAN_OPTIONS="quarantine_size_mb:5;debug:1" ./sycl_app

@xtian-github
Copy link

@intel/unified-runtime-reviewers who is the right person to do the merge? We need to merge this PR for Aurora deliverable asap. Thanks.

@dm-vodopyanov dm-vodopyanov changed the title [DeviceSanitizer] Checking "sycl::free" related errors [SYCL][DeviceSanitizer] Checking "sycl::free" related errors Apr 17, 2024
@dm-vodopyanov
Copy link
Contributor

@intel/unified-runtime-reviewers who is the right person to do the merge? We need to merge this PR for Aurora deliverable asap. Thanks.

@xtian-github the process is described here: https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md#merge

We still need an approval from @intel/unified-runtime-reviewers.

sycl/test-e2e/AddressSanitizer/bad-free/bad-free-host.cpp Outdated Show resolved Hide resolved
sycl/test-e2e/AddressSanitizer/bad-free/bad-free-host.cpp Outdated Show resolved Hide resolved
h.single_task<class MyKernel>([=]() { *array = 0; });
});
Q.wait();
// CHECK-DEBUG: [kernel]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is used to test UR_LAYER_ASAN_OPTIONS=debug:1 runtime flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what is it supposed to be, exactly? Imagine someone not familiar with sanitizer implementation reading this test, what could they find?

Are you even sure this single line should be tested at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, array leaks.

Copy link
Contributor Author

@AllanZyne AllanZyne Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what is it supposed to be, exactly? Imagine someone not familiar with sanitizer implementation reading this test, what could they find?

In fact, it's used to print debug message in libdevice (you can see "__AsanDebug" in "libdevice/sanitizer_utils.cpp").

Are you even sure this single line should be tested at all?

Makes sense.
Currently, UR_LAYER_ASAN_OPTIONS=debug:1 is more useful for us (sanitizer developers).
Besides, we're going to remove some of messages in release build for performance.

Okay, I'll remove this test!

@@ -0,0 +1,17 @@
// REQUIRES: linux, cpu
// RUN: %{build} %device_sanitizer_flags -O2 -g -o %t
// RUN: env SYCL_PREFER_UR=1 UR_LAYER_ASAN_OPTIONS=debug:1 %{run} %t 2>&1 | FileCheck --check-prefixes CHECK-DEBUG %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work without UR_ENABLE_LAYERS=UR_LAYER_ASAN?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing this PR.

Copy link
Contributor Author

@AllanZyne AllanZyne Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UR_ENABLE_LAYERS=UR_LAYER_ASAN is used to forcedly enable ASan Layer in UR loader, which is the runtime support for device sanitizer.
We enable UR_LAYER_ASAN automatically by checking kernel image property.
But some tests don't have any kernel, so I have to enable UR_LAYER_ASAN manually.
Our users are unlikely to need enable device ASan without any kernel, I think it's fine to just use this flag for tests.

@kbenzie
Copy link
Contributor

kbenzie commented Apr 17, 2024

@intel/unified-runtime-reviewers who is the right person to do the merge? We need to merge this PR for Aurora deliverable asap. Thanks.

@xtian-github the process is described here: https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md#merge

We still need an approval from @intel/unified-runtime-reviewers.

We need to merge oneapi-src/unified-runtime#1402 first, its next in line. Once that's in the UNIFIED_RUNTIME_TAG will need to be updated, then we'll approve and it can ben merged.

@xtian-github
Copy link

@cdai2 @wenju-he Would you address the review comments? Thanks.

Copy link
Contributor

@kbenzie kbenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UR LGTM

Copy link

@xtian-github xtian-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AllanZyne
Copy link
Contributor Author

bad-free-host.cpp

I'll address them asap. Thanks!

@AllanZyne
Copy link
Contributor Author

It seems like that the CI test failure is not related to this PR.

@kbenzie
Copy link
Contributor

kbenzie commented Apr 18, 2024

@intel/llvm-gatekeepers please merge

@AllanZyne
Copy link
Contributor Author

Hi @intel/llvm-gatekeepers, all comments have been addressed, please merge. Thanks!

@steffenlarsen steffenlarsen merged commit 4723efc into intel:sycl Apr 19, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.