[HIP] Implement workaround for hipMemset2D #1395

Merged: 1 commit into oneapi-src:main on Apr 12, 2024

Conversation

@konradkusiak97 (Contributor) commented on Feb 28, 2024:

There is an issue with hipMemset2D in ROCm versions prior to 6.0.0, and this PR adds a workaround for it in commonMemSetLargePattern.

The issue appears only when using a pointer to host pinned memory obtained from hipHostMalloc. I believe this case was not exercised until the USM fill refactor (intel/llvm#12702).

Testing in intel/llvm: intel/llvm#12898
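For illustration only, here is a minimal sketch of the kind of row-by-row fallback such a workaround can take. This is not the code from this PR; the helper name and the compile-time version check are assumptions, and the real commonMemSetLargePattern also has to handle multi-byte patterns, which this sketch glosses over.

```cpp
// Hypothetical sketch (not the PR's code): fall back to one 1D memset per row
// when hipMemset2D is known to misbehave with hipHostMalloc'd pinned memory
// on ROCm releases before 6.0.0. Names are illustrative only.
#include <hip/hip_runtime.h>

static hipError_t memset2DWithWorkaround(void *Ptr, size_t Pitch, int Value,
                                          size_t Width, size_t Height,
                                          hipStream_t Stream) {
#if HIP_VERSION_MAJOR < 6
  // Workaround path: issue Height independent 1D async memsets, one per row.
  for (size_t Row = 0; Row < Height; ++Row) {
    void *RowPtr = static_cast<char *>(Ptr) + Row * Pitch;
    if (hipError_t Err = hipMemsetAsync(RowPtr, Value, Width, Stream);
        Err != hipSuccess)
      return Err;
  }
  return hipSuccess;
#else
  // Newer ROCm handles pinned host memory correctly, so use the 2D call.
  return hipMemset2DAsync(Ptr, Pitch, Value, Width, Height, Stream);
#endif
}
```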

@codecov-commenter commented on Feb 28, 2024:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 12.51%. Comparing base (78ef1ca) to head (19df225).
Report is 96 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   14.82%   12.51%   -2.32%     
==========================================
  Files         250      239      -11     
  Lines       36220    35949     -271     
  Branches     4094     4076      -18     
==========================================
- Hits         5369     4498     -871     
- Misses      30800    31447     +647     
+ Partials       51        4      -47     


@GeorgeWeb (Contributor) left a comment:

Generally looks good to me. Would just prefer the lambda be a free function.

4 review threads on source/adapters/hip/enqueue.cpp (outdated, resolved)
@MartinWehking (Contributor) left a comment:
LGTM!

@konradkusiak97 added the ready to merge label on Mar 5, 2024
@kbenzie added the hip (HIP adapter specific issues) label on Apr 3, 2024
@kbenzie kbenzie merged commit 1473ed8 into oneapi-src:main Apr 12, 2024
51 checks passed
sarnex pushed a commit to intel/llvm that referenced this pull request on Apr 12, 2024
sarnex pushed a commit to intel/llvm that referenced this pull request on May 2, 2024:
This PR changes the `queue.fill()` implementation to make use of the
native functions for a specific backend. It also unifies that
implementation with the one for memset, since memset is just the 8-bit
special case of fill.

In the CUDA case, both memset and fill currently call
`urEnqueueUSMFill`, which, depending on the size of the fill pattern,
calls either `cuMemsetD8Async`, `cuMemsetD16Async`, `cuMemsetD32Async`,
or `commonMemSetLargePattern`. Before this patch, memset went through
the same path but always set patternSize to 1 byte, which resulted in
calling `cuMemsetD8Async`. The behaviour in other backends is analogous.

The fill method previously just invoked a `parallel_for` to fill the
memory with the pattern, which made the operation quite slow.
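As a rough illustration of the pattern-size dispatch described above, the sketch below shows how a USM fill might select between the CUDA driver memset entry points. `usmFill` and the forward-declared `fillLargePattern` helper are illustrative stand-ins for the adapter's actual symbols, not its real code.

```cpp
// Hypothetical sketch of dispatching on pattern size: 1/2/4-byte patterns map
// onto the native cuMemsetD{8,16,32}Async calls, larger patterns fall through
// to a dedicated large-pattern helper (assumed to exist, mirroring
// commonMemSetLargePattern).
#include <cuda.h>
#include <cstring>

CUresult fillLargePattern(CUdeviceptr Dst, const void *Pattern,
                          size_t PatternSize, size_t Size, CUstream Stream);

CUresult usmFill(CUdeviceptr Dst, const void *Pattern, size_t PatternSize,
                 size_t Size, CUstream Stream) {
  switch (PatternSize) {
  case 1: {
    unsigned char Value;
    std::memcpy(&Value, Pattern, 1);
    return cuMemsetD8Async(Dst, Value, Size, Stream);
  }
  case 2: {
    unsigned short Value;
    std::memcpy(&Value, Pattern, 2);
    return cuMemsetD16Async(Dst, Value, Size / 2, Stream);
  }
  case 4: {
    unsigned int Value;
    std::memcpy(&Value, Pattern, 4);
    return cuMemsetD32Async(Dst, Value, Size / 4, Stream);
  }
  default:
    // Patterns wider than 4 bytes have no single native memset call.
    return fillLargePattern(Dst, Pattern, PatternSize, Size, Stream);
  }
}
```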

This PR depends on:
- oneapi-src/unified-runtime#1395
- oneapi-src/unified-runtime#1412
Labels: hip (HIP adapter specific issues), ready to merge (Added to PRs which are ready to merge)
5 participants