Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UR][Layer] Add Sanitizer Layer #1074

Merged
merged 78 commits into from
Jan 17, 2024
Merged

Conversation

AllanZyne
Copy link
Contributor

@AllanZyne AllanZyne commented Nov 14, 2023

This PR added a new loader layer, named "sanitizer", which implemented the runtime part of device-side sanitizers: AddressSanitizer (UR_LAYER_ASAN), MemorySanitizer (UR_LAYER_MSAN, planned), and ThreadSanitizer (UR_LAYER_TSAN, planned).

Currently, it implemented some basic functions for PVC on the level-zero adapter, which can detect and report out-of-bounds errors on PVC.

AllanZyne and others added 30 commits November 13, 2023 17:44
This commit adds the adapter implementations of the virtual memory
extension functionality to be used in
intel/llvm#8954.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
This reverts commit 84a3afa.
.gitignore Show resolved Hide resolved
@@ -34,6 +34,7 @@ option(UR_USE_UBSAN "enable UndefinedBehaviorSanitizer" OFF)
option(UR_USE_MSAN "enable MemorySanitizer" OFF)
option(UR_USE_TSAN "enable ThreadSanitizer" OFF)
option(UR_ENABLE_TRACING "enable api tracing through xpti" OFF)
option(UR_ENABLE_SANITIZER "enable device sanitizer" ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the names UR_ENABLE_SANITIZER and UR_USE_ASAN might be a little confusing in the sense that former is for device only and the latter is for host only. The difference isn't shown in the names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the same naming style as the tracing layer.

Copy link
Contributor

@wenju-he wenju-he Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the same naming style as the tracing layer.

I think the confusing still exists even if it is the same naming style.

Copy link
Contributor Author

@AllanZyne AllanZyne Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. IMO, the flag to enable layers needs a new naming style. @pbalcer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, UR_ENABLE_LAYER_XYZ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, UR_ENABLE_LAYER_XYZ ?

Yes, this style is very similar to "UR_BUILD_ADAPTER_XYZ"

source/loader/layers/sanitizer/asan_interceptor.hpp Outdated Show resolved Hide resolved
source/loader/layers/sanitizer/common.hpp Outdated Show resolved Hide resolved
source/loader/layers/sanitizer/common.hpp Outdated Show resolved Hide resolved
ur_event_handle_t ReadEvent{};

// If kernel has defined SPIR_DeviceSanitizerReportMem, then we try to read it
// to host, but it's okay that it isn't defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case is SPIR_DeviceSanitizerReportMem not defined?

The below code isn't handling the case when DeviceGlobalVariableRead fails even if SPIR_DeviceSanitizerReportMem exists.

Copy link
Contributor Author

@AllanZyne AllanZyne Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case is SPIR_DeviceSanitizerReportMem not defined?

When there are no memory read/write in the kernel, for example, an empty kernel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better approach is to query if SPIR_DeviceSanitizerReportMem exists and then read. This would cover all fail cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no API to query if the device global variables exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both OpenCL and LevelZero have API to get a pointer to a global variable. Do you mean unified runtime doesn't have this API? Is this a gap in unified runtime?
I don't think it is elegant to write to a global variable without querying if it exists.

Copy link
Contributor Author

@AllanZyne AllanZyne Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed an issue to track this. #1214

source/loader/layers/sanitizer/asan_interceptor.cpp Outdated Show resolved Hide resolved

if (IsNeedMapPhysicalMem) {
// We use fixed GPU PageSize: 64KB
const size_t PageSize = 64 * 1024u;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the page size be queried from device?

Copy link
Contributor Author

@AllanZyne AllanZyne Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's no api doing this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can zeVirtualMemQueryPageSize used?

If there is no such API, what is the rational of choosing 64K size here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ur_result_t SanitizerInterceptor::addContext(ur_context_handle_t Context) {
auto ContextInfo = std::make_shared<ur_sanitizer_layer::ContextInfo>();

// Host Device
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment accurate? SYCL doesn't have a host device.
I suppose you meant just Host?
However, the code below DeviceInfo->Type = DeviceType::CPU is showing a CPU device is initialized.

If the intent is to initialize information for the host, do we currently have logic to handle the coordination between host and a device? I suggest to to remove below code since it is for the future. When the coordination is in place, we can add below code back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you meant just Host?

Yes. I don't think host device is a wrong term. Host code runs on the host device.

If the intent is to initialize information for the host, do we currently have logic to handle the coordination between host and a device?

Yes, we have. I have saved all host USM allocation information in the host device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYCL doesn't have host device. What you meant is just "host". Please don't use "host device".

Yes, we have. I have saved all host USM allocation information in the host device.

Could you please point me to the code where you saved the info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Device is nullptr if Type == USMMemoryType::HOST
auto DeviceInfo = ContextInfo->getDeviceInfo(Device);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to offline sync, @AllanZyne would you remove this part of code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto DeviceInfo = ContextInfo->getDeviceInfo(Device);
std::shared_ptr<DeviceInfo> DeviceInfo;
if (Device) {
DeviceInfo = ContextInfo->getDeviceInfo(Device);
Copy link
Contributor

@wenju-he wenju-he Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: brace not needed for single line code according to llvm coding style. What's the coding style using in UR? Please also see other changes in this commit.

Is the 4-space indent aligning with UR coding style?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the coding style using in UR? Please also see other changes in this commit.

I searched some of UR codes, they add braces for single line code as well.

Is the 4-space indent aligning with UR coding style?

Yes.

Copy link
Contributor

@wenju-he wenju-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@AllanZyne AllanZyne requested a review from kbenzie January 5, 2024 01:56
@AllanZyne
Copy link
Contributor Author

Hi, I think I've got 2 approves, is this PR ready to merge?
Thanks!

@kbenzie
Copy link
Contributor

kbenzie commented Jan 8, 2024

Hi, I think I've got 2 approves, is this PR ready to merge?

Not until it passes all of the GitHub Actions.

@AllanZyne
Copy link
Contributor Author

Hi, I think I've got 2 approves, is this PR ready to merge?

Not until it passes all of the GitHub Actions.

Okay, I've noticed them, and I submitted a fix for them.

Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your PR is now failing due to the use of getenv on Windows in a third-party component (xpti). This is a known issue, #1206 should suppress the problem.

@cdai2
Copy link

cdai2 commented Jan 15, 2024

Your PR is now failing due to the use of getenv on Windows in a third-party component (xpti). This is a known issue, #1206 should suppress the problem.

Is there a scheduling to fix this #1206 issue?
As this PR will not introduce other known issues, Could you please land this PR first? It blocks our following work.

@kbenzie
Copy link
Contributor

kbenzie commented Jan 16, 2024

Your PR is now failing due to the use of getenv on Windows in a third-party component (xpti). This is a known issue, #1206 should suppress the problem.

Is there a scheduling to fix this #1206 issue? As this PR will not introduce other known issues, Could you please land this PR first? It blocks our following work.

#1206 has now been merged, please update this branch with the latest changes on the main branch.

@AllanZyne
Copy link
Contributor Author

Your PR is now failing due to the use of getenv on Windows in a third-party component (xpti). This is a known issue, #1206 should suppress the problem.

Is there a scheduling to fix this #1206 issue? As this PR will not introduce other known issues, Could you please land this PR first? It blocks our following work.

#1206 has not been merged, please update this branch with the latest changes on the main branch.

Thanks! Done.

@AllanZyne
Copy link
Contributor Author

When will this PR be ready to merge? I'm willing to follow the process to make it land ASAP.

@pbalcer
Copy link
Contributor

pbalcer commented Jan 16, 2024

Once the build passes it should be ready to merge.
Windows CI failed again:

  ur_testing.vcxproj -> D:\a\unified-runtime\unified-runtime\build\lib\Debug\ur_testing.lib
D:\a\unified-runtime\unified-runtime\source\loader\layers\sanitizer\asan_interceptor.cpp(102,30): error C2220: the following warning is treated as an error [D:\a\unified-runtime\unified-runtime\build\source\loader\ur_loader.vcxproj]
  test-params.vcxproj -> D:\a\unified-runtime\unified-runtime\build\bin\Debug\test-params.exe
D:\a\unified-runtime\unified-runtime\source\loader\layers\sanitizer\asan_interceptor.cpp(102,30): warning C4244: 'argument': conversion from 'ur_sanitizer_layer::uptr' to 'ur_sanitizer_layer::u32', possible loss of data [D:\a\unified-runtime\unified-runtime\build\source\loader\ur_loader.vcxproj]
D:\a\unified-runtime\unified-runtime\source\loader\layers\sanitizer\asan_interceptor.cpp(422,9): warning C4244: 'argument': conversion from 'ur_sanitizer_layer::uptr' to 'ur_sanitizer_layer::u8', possible loss of data [D:\a\unified-runtime\unified-runtime\build\source\loader\ur_loader.vcxproj]

@pbalcer pbalcer merged commit b51dec0 into oneapi-src:main Jan 17, 2024
51 checks passed
@AllanZyne AllanZyne deleted the sanitizer-pr branch April 3, 2024 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants