-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UR][Layer] Add Sanitizer Layer #1074
Conversation
This commit adds the adapter implementations of the virtual memory extension functionality to be used in intel/llvm#8954. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
This reverts commit 84a3afa.
This reverts commit 336ce89.
This reverts commit 9f90e43.
This reverts commit 9ae8a99.
This reverts commit 6ba94ac.
This reverts commit c36bd72.
…ase" This reverts commit 1096b0f.
This reverts commit 67fcd5e.
@@ -34,6 +34,7 @@ option(UR_USE_UBSAN "enable UndefinedBehaviorSanitizer" OFF) | |||
option(UR_USE_MSAN "enable MemorySanitizer" OFF) | |||
option(UR_USE_TSAN "enable ThreadSanitizer" OFF) | |||
option(UR_ENABLE_TRACING "enable api tracing through xpti" OFF) | |||
option(UR_ENABLE_SANITIZER "enable device sanitizer" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the names UR_ENABLE_SANITIZER and UR_USE_ASAN might be a little confusing in the sense that former is for device only and the latter is for host only. The difference isn't shown in the names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I follow the same naming style as the tracing layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I follow the same naming style as the tracing layer.
I think the confusing still exists even if it is the same naming style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. IMO, the flag to enable layers needs a new naming style. @pbalcer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, UR_ENABLE_LAYER_XYZ
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm,
UR_ENABLE_LAYER_XYZ
?
Yes, this style is very similar to "UR_BUILD_ADAPTER_XYZ"
ur_event_handle_t ReadEvent{}; | ||
|
||
// If kernel has defined SPIR_DeviceSanitizerReportMem, then we try to read it | ||
// to host, but it's okay that it isn't defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case is SPIR_DeviceSanitizerReportMem not defined?
The below code isn't handling the case when DeviceGlobalVariableRead fails even if SPIR_DeviceSanitizerReportMem exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case is SPIR_DeviceSanitizerReportMem not defined?
When there are no memory read/write in the kernel, for example, an empty kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better approach is to query if SPIR_DeviceSanitizerReportMem exists and then read. This would cover all fail cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no API to query if the device global variables exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both OpenCL and LevelZero have API to get a pointer to a global variable. Do you mean unified runtime doesn't have this API? Is this a gap in unified runtime?
I don't think it is elegant to write to a global variable without querying if it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed an issue to track this. #1214
|
||
if (IsNeedMapPhysicalMem) { | ||
// We use fixed GPU PageSize: 64KB | ||
const size_t PageSize = 64 * 1024u; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the page size be queried from device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's no api doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can zeVirtualMemQueryPageSize used?
If there is no such API, what is the rational of choosing 64K size here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ur_result_t SanitizerInterceptor::addContext(ur_context_handle_t Context) { | ||
auto ContextInfo = std::make_shared<ur_sanitizer_layer::ContextInfo>(); | ||
|
||
// Host Device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment accurate? SYCL doesn't have a host device.
I suppose you meant just Host?
However, the code below DeviceInfo->Type = DeviceType::CPU
is showing a CPU device is initialized.
If the intent is to initialize information for the host, do we currently have logic to handle the coordination between host and a device? I suggest to to remove below code since it is for the future. When the coordination is in place, we can add below code back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you meant just Host?
Yes. I don't think host device is a wrong term. Host code runs on the host device.
If the intent is to initialize information for the host, do we currently have logic to handle the coordination between host and a device?
Yes, we have. I have saved all host USM allocation information in the host device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL doesn't have host device. What you meant is just "host". Please don't use "host device".
Yes, we have. I have saved all host USM allocation information in the host device.
Could you please point me to the code where you saved the info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unified-runtime/source/loader/layers/sanitizer/asan_interceptor.cpp
Lines 91 to 92 in 79998b9
// Device is nullptr if Type == USMMemoryType::HOST | |
auto DeviceInfo = ContextInfo->getDeviceInfo(Device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to offline sync, @AllanZyne would you remove this part of code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
auto DeviceInfo = ContextInfo->getDeviceInfo(Device); | ||
std::shared_ptr<DeviceInfo> DeviceInfo; | ||
if (Device) { | ||
DeviceInfo = ContextInfo->getDeviceInfo(Device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: brace not needed for single line code according to llvm coding style. What's the coding style using in UR? Please also see other changes in this commit.
Is the 4-space indent aligning with UR coding style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the coding style using in UR? Please also see other changes in this commit.
I searched some of UR codes, they add braces for single line code as well.
Is the 4-space indent aligning with UR coding style?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Hi, I think I've got 2 approves, is this PR ready to merge? |
Not until it passes all of the GitHub Actions. |
Okay, I've noticed them, and I submitted a fix for them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your PR is now failing due to the use of getenv on Windows in a third-party component (xpti). This is a known issue, #1206 should suppress the problem.
Is there a scheduling to fix this #1206 issue? |
#1206 has now been merged, please update this branch with the latest changes on the main branch. |
Thanks! Done. |
When will this PR be ready to merge? I'm willing to follow the process to make it land ASAP. |
Once the build passes it should be ready to merge.
|
This PR added a new loader layer, named "sanitizer", which implemented the runtime part of device-side sanitizers: AddressSanitizer (
UR_LAYER_ASAN
), MemorySanitizer (UR_LAYER_MSAN
, planned), and ThreadSanitizer (UR_LAYER_TSAN
, planned).Currently, it implemented some basic functions for PVC on the level-zero adapter, which can detect and report out-of-bounds errors on PVC.