Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIP profiling submission time query returns weird values #12904

Open
steffenlarsen opened this issue Mar 5, 2024 · 5 comments · May be fixed by #14269
Open

HIP profiling submission time query returns weird values #12904

steffenlarsen opened this issue Mar 5, 2024 · 5 comments · May be fixed by #14269
Labels
bug Something isn't working hip Issues related to execution on HIP backend.

Comments

@steffenlarsen
Copy link
Contributor

steffenlarsen commented Mar 5, 2024

Describe the bug

In #12838 it seems like the submission time on HIP is giving weird values. I did a bit of digging and it seems to me like HIP is a little different from CUDA when checking timing-differences between events. Of particular interest here is the following line for hipEventElapsedTime():

Events which are recorded in a NULL stream will block until all commands on all other streams complete execution, and then record the timestamp.

While what we expect here is to get an event with the current time, hence using an otherwise unused stream. Fixing it might be outside the scope of this PR, but a possible solution could be to lazily have a stream specifically for recording submission time of events, tied to the context. Similar could be used in the CUDA backend to avoid the assumption noted above.

Originally posted by @steffenlarsen in oneapi-src/unified-runtime#1400 (comment)

To reproduce

No response

Environment

No response

Additional context

This affects #12838, but should be reproducible on normal profiling queues. When this is fixed, sycl/test-e2e/ProfilingTag/ tests should be enabled for HIP.

@steffenlarsen steffenlarsen added bug Something isn't working hip Issues related to execution on HIP backend. labels Mar 5, 2024
@konradkusiak97
Copy link
Contributor

konradkusiak97 commented Mar 22, 2024

Hi @steffenlarsen, I've been trying to reproduce this issue and have been playing with the fix for it but now I concluded that we're already doing what you've suggested. I think the relevant line in the hip adapter is here: https://github.com/oneapi-src/unified-runtime/blob/a504ead8b9fa3b70ca90d40e21bd417eee2f204b/source/adapters/hip/event.cpp#L64

We're calling hipEventRecord with Queue->get() which invokes Queue->getNextComputeStream and that provides a new stream for the recording. However, we do get reports about bad event recordings in hip so I'm still questioning if there is more to it. Do you have any reproducible for your issue that I could try out?

So far I've been trying to check if the following two asynchronous events would give the same timings, even if we change their order:

  event e = q.submit([&](handler &h) {
    h.parallel_for(array_size_small, [=](id<1> i) {
      sum[i] = a[i] + b[i]; 
   });
  });

  event e2 = q.submit([&](handler &h) {
    h.parallel_for(array_size_big, [=](id<1> i) {
      sumTwo[i] = c[i] + d[i];
    });
  });

Assuming that the second event runs much longer than the first one if we're using the NULL stream, the first event will wait until the second is finished (according to what you've found above). But that's not what I've seen, the kernel with array_size_small gives the same timing results no matter if it runs in the first or the second event. So the root cause might be somewhat more complicated.

@steffenlarsen
Copy link
Contributor Author

@konradkusiak97 - I do not know for certain if the exact issue is as I described, but I would say the quote is a glaring issue compared to the semantics of the corresponding CUDA interfaces. The intention of evQueued is to record the "device event" representing the time of the call to the given command. In CUDA that is okay to do with the NULL-stream, assuming it is unused (which itself can be a pretty volatile assumption) as it should just record the "now". However, based on the quote above from the HIP documentation, events recorded to the NULL-stream in HIP does not work that way and so even if there are other issues, this is definitely an issue as well.

Arguably evBase could also be an issue (https://github.com/oneapi-src/unified-runtime/blob/a504ead8b9fa3b70ca90d40e21bd417eee2f204b/source/adapters/hip/context.cpp#L58) if there are non-SYCL related work on HIP in the program, but I suspect that's much less likely.

For tests showing weird behavior, you could try enabling sycl/test-e2e/ProfilingTag tests for HIP.

@konradkusiak97
Copy link
Contributor

konradkusiak97 commented May 20, 2024

You're right, for EvQueued it definitely sounds like it shouldn't be on the NULL stream for HIP. I`ve finally got what you've meant 🙂 I made the changes to it here: oneapi-src/unified-runtime#1634

@steffenlarsen
Copy link
Contributor Author

Thank you, @konradkusiak97! Would you mind checking if we can reenable the profiling tag tests on HIP?

@konradkusiak97
Copy link
Contributor

Checked it a while ago actually, they all seem to pass on gfx90a 👍

steffenlarsen added a commit to steffenlarsen/llvm that referenced this issue Jun 24, 2024
oneapi-src/unified-runtime#1634 is believed to
have fixed the issues for HIP in the profiling tag extension. This
commit reenables the tests for HIP.

Closes intel#12904.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hip Issues related to execution on HIP backend.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants