[UR][L0][CUDA][HIP] Add enqueue timestamp recording extension #1400

steffenlarsen · 2024-02-29T16:15:19Z

This commit adds a new extension feature for recording timestamps into events, the information from which can be queried using the existing profiling queries.

This new functionality is currently implemented for L0, CUDA and HIP.

Corresponding SYCL extension implementation: intel/llvm#12838

source/adapters/cuda/enqueue.cpp

source/adapters/hip/enqueue.cpp

JackAKirk · 2024-02-29T17:19:03Z

include/ur_api.h

@@ -1647,7 +1649,7 @@ typedef enum ur_device_info_t {
 ///     - ::UR_RESULT_ERROR_INVALID_NULL_HANDLE
 ///         + `NULL == hDevice`
 ///     - ::UR_RESULT_ERROR_INVALID_ENUMERATION
-///         + `::UR_DEVICE_INFO_INTEROP_SEMAPHORE_EXPORT_SUPPORT_EXP < propName`


Just checking: did you mean to delete this line?

steffenlarsen · 2024-02-29

It seems this check needs to name the last enumerator, since it says that any value after it will result in UR_RESULT_ERROR_INVALID_ENUMERATION. I was skeptical at first too, but the generated-script diff check insisted.

kbenzie · 2024-02-29

Yeah, this is the generator making the change. It's expected.

kbenzie · 2024-02-29

The generator could probably be a bit more clever and do proper validation of the enum ranges.
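
The check discussed in this thread rejects only values greater than the last enumerator, which is why appending an enumerator changes the documented bound. The following is a minimal sketch of that behaviour and of a stricter alternative. The enum `device_info_t`, its values, `result_t`, and both functions are hypothetical stand-ins, not the real `ur_device_info_t` or the generator's output; the gap between the core and experimental values is an assumption made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a generated info enum. Names and values are
// illustrative only and do not match ur_device_info_t.
enum device_info_t : uint32_t {
  DEVICE_INFO_TYPE = 0,
  DEVICE_INFO_NAME = 1,
  // Assumption: experimental enumerators sit in a separate value range,
  // leaving a gap of unused values below them.
  DEVICE_INFO_SEMAPHORE_EXPORT_SUPPORT_EXP = 0x2000,
  DEVICE_INFO_TIMESTAMP_RECORDING_SUPPORT_EXP = 0x2001, // newly appended
};

enum result_t { RESULT_SUCCESS, RESULT_ERROR_INVALID_ENUMERATION };

// Upper-bound check in the style of the documented condition
// `LAST_ENUMERATOR < propName`: only values past the last enumerator fail.
inline result_t validateUpperBound(uint32_t propName) {
  if (DEVICE_INFO_TIMESTAMP_RECORDING_SUPPORT_EXP < propName)
    return RESULT_ERROR_INVALID_ENUMERATION;
  return RESULT_SUCCESS;
}

// Stricter alternative: accept only values that name an enumerator.
inline result_t validateExact(uint32_t propName) {
  static constexpr std::array<uint32_t, 4> Known = {
      DEVICE_INFO_TYPE, DEVICE_INFO_NAME,
      DEVICE_INFO_SEMAPHORE_EXPORT_SUPPORT_EXP,
      DEVICE_INFO_TIMESTAMP_RECORDING_SUPPORT_EXP};
  return std::find(Known.begin(), Known.end(), propName) != Known.end()
             ? RESULT_SUCCESS
             : RESULT_ERROR_INVALID_ENUMERATION;
}
```

With these definitions, a value inside the gap (for example `0x1000`) passes the upper-bound check but is rejected by the exact check, which is the kind of case "proper validation of the enum ranges" would catch.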

codecov-commenter · 2024-02-29T18:01:13Z

Codecov Report

Attention: Patch coverage is 3.86740%, with 174 lines in your changes missing coverage. Please review.

Project coverage is 12.46%. Comparing base (78ef1ca) to head (5db5ba8).
Report is 96 commits behind head on main.

Files	Patch %	Lines
...onformance/enqueue/urEnqueueTimestampRecording.cpp	0.00%	57 Missing ⚠️
include/ur_print.hpp	0.00%	43 Missing ⚠️
source/loader/ur_ldrddi.cpp	4.54%	21 Missing ⚠️
source/loader/layers/validation/ur_valddi.cpp	14.28%	18 Missing ⚠️
source/loader/layers/tracing/ur_trcddi.cpp	16.66%	10 Missing ⚠️
source/adapters/null/ur_nullddi.cpp	11.11%	8 Missing ⚠️
source/loader/ur_libapi.cpp	0.00%	8 Missing ⚠️
source/loader/ur_print.cpp	0.00%	4 Missing ⚠️
test/conformance/testing/source/utils.cpp	0.00%	3 Missing ⚠️
tools/urinfo/urinfo.hpp	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1400      +/-   ##
==========================================
- Coverage   14.82%   12.46%   -2.36%     
==========================================
  Files         250      240      -10     
  Lines       36220    36129      -91     
  Branches     4094     4094              
==========================================
- Hits         5369     4504     -865     
- Misses      30800    31621     +821     
+ Partials       51        4      -47

JackAKirk

hip/cuda impl and tests LGTM.

steffenlarsen · 2024-03-01T09:08:15Z

source/adapters/hip/event.cpp

@@ -58,7 +59,7 @@ ur_result_t ur_event_handle_t_::start() {
  ur_result_t Result = UR_RESULT_SUCCESS;

  try {
-    if (Queue->URFlags & UR_QUEUE_FLAG_PROFILING_ENABLE) {
+    if (Queue->URFlags & UR_QUEUE_FLAG_PROFILING_ENABLE || isTimestampEvent()) {
      // NOTE: This relies on the default stream to be unused.
      UR_CHECK_ERROR(hipEventRecord(EvQueued, 0));


@JackAKirk - In intel/llvm#12838 the submission time on HIP seems to give weird values. I did a bit of digging, and HIP appears to differ a little from CUDA when checking timing differences between events. Of particular interest here is the following line for hipEventElapsedTime():

"Events which are recorded in a NULL stream will block until all commands on all other streams complete execution, and then record the timestamp."

What we expect here, however, is an event carrying the current time, which is why the code relies on an otherwise unused stream. Fixing this might be outside the scope of this PR, but a possible solution would be to lazily create a stream dedicated to recording the submission time of events, tied to the context. A similar approach could be used in the CUDA backend to avoid the assumption noted above.

JackAKirk · 2024-03-01

@konradkusiak97 this sounds like the same thing as your HIP event timings issue?

steffenlarsen · 2024-03-05

I have opened an issue about it here: intel/llvm#12904.
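
The idea raised in this thread, a lazily created stream tied to the context and used only for recording submission times, could be sketched roughly as below. This is not the adapter's code: `fake_stream_t`, `context_t`, `getTimestampStream`, and the factory callback are hypothetical names, and the stream is modelled as a plain id so the sketch compiles without the HIP or CUDA headers. In a real adapter the factory would presumably wrap the native stream-creation call.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a native stream handle (hipStream_t or CUstream
// in the real adapters). An id keeps the sketch self-contained.
struct fake_stream_t {
  int Id;
};

// Hypothetical context owning one stream reserved for recording the
// submission ("queued") timestamp of events. Nothing else is enqueued on it,
// so a recording there does not wait behind unrelated commands.
class context_t {
public:
  explicit context_t(std::function<fake_stream_t()> Factory)
      : CreateStream(std::move(Factory)) {}

  // Creates the stream on first use only; later calls return the same one.
  fake_stream_t &getTimestampStream() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (!Created) {
      TimestampStream = CreateStream();
      Created = true;
    }
    return TimestampStream;
  }

private:
  std::function<fake_stream_t()> CreateStream;
  std::mutex Mutex;
  bool Created = false;
  fake_stream_t TimestampStream{-1};
};
```

A context that never records a timestamp never pays for the extra stream, and every event in the same context shares the one that is eventually created.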

scripts/core/EXP-ENQUEUE-TIMESTAMP-RECORDING.rst

scripts/core/exp-enqueue-timestamp-recording.yml

steffenlarsen · 2024-03-12T16:01:15Z

@oneapi-src/unified-runtime-native-cpu-write & @oneapi-src/unified-runtime-level-zero-write - Friendly ping.

PietroGhg

Native CPU lgtm, thank you

steffenlarsen · 2024-04-12T06:27:47Z

@oneapi-src/unified-runtime-level-zero-write Ping.

steffenlarsen · 2024-04-16T07:07:03Z

@oneapi-src/unified-runtime-level-zero-write ping.

pbalcer

lgtm, just a few nits.

source/adapters/level_zero/event.cpp

source/adapters/level_zero/queue.cpp

kbenzie · 2024-05-06T10:42:14Z

@steffenlarsen if you could pull in the latest changes from main and resolve the conflict I can get this merged.

steffenlarsen · 2024-05-06T11:19:45Z

> @steffenlarsen if you could pull in the latest changes from main and resolve the conflict I can get this merged.

Thank you, @kbenzie! A merge commit has been pushed, and it should hopefully pass CI again.

source/adapters/level_zero/event.cpp

oneapi-src#1400 incorrectly removed some device info enumerator cases due to a faulty merge conflict resolution. This commit adds them back. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

hdelan · 2024-05-16T09:01:38Z

source/adapters/cuda/enqueue.cpp

+        std::unique_ptr<ur_event_handle_t_>(ur_event_handle_t_::makeNative(
+            UR_COMMAND_TIMESTAMP_RECORDING_EXP, hQueue, CuStream));
+    UR_CHECK_ERROR(RetImplEvent->start());
+    UR_CHECK_ERROR(RetImplEvent->record());


Sorry for the late review, but start() and record() together record three separate native events. Perhaps it is useful to have separate EvQueued and EvStart events when using this extension, but EvEnd should be unnecessary. Do you think it would be worth having an event::checkpoint member function that minimizes the number of native events recorded?

steffenlarsen · 2024-05-16

I am not sure I understand your suggestion. With this extension, the timestamp event should behave like a normal event, where the native events have the following behavior:

EvQueued: Should finish immediately, as this is when the timestamp event was enqueued.

EvStart: Should be when the queue finishes.

EvEnd: Same as above, as there is no work related to the actual enqueued "command".

Since EvStart and EvEnd are mostly the same, we could use just one, but I am not convinced that recording an additional event adds enough overhead to justify the extra complexity of having to ignore EvStart or EvEnd for timestamp events.

hdelan · 2024-05-16

Thanks for the response. Perhaps we can continue the discussion in #1613.
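
The trade-off debated in this thread can be made concrete with a toy model. Everything below is hypothetical: `fake_clock_t` stands in for native event recording, `event_model_t` is not the adapter's `ur_event_handle_t_`, and `checkpoint` is only one possible reading of the suggested member function. The sketch assumes that start() records EvQueued and EvStart and that record() records EvEnd, consistent with the count of three native events mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical monotonic clock standing in for native event recording; each
// call to now() returns the next tick.
struct fake_clock_t {
  uint64_t Tick = 0;
  uint64_t now() { return ++Tick; }
};

// Hypothetical model of an adapter event holding three native timestamps.
struct event_model_t {
  uint64_t Queued = 0, Start = 0, End = 0;
  int NativeRecords = 0; // number of native events recorded so far

  // Models start(): records the enqueue time and the start time.
  void start(fake_clock_t &Clock) {
    Queued = Clock.now();
    Start = Clock.now();
    NativeRecords += 2;
  }

  // Models record(): records the end time.
  void record(fake_clock_t &Clock) {
    End = Clock.now();
    ++NativeRecords;
  }

  // One reading of the proposed checkpoint(): record Queued and Start only,
  // and let End alias Start because a timestamp event carries no work.
  void checkpoint(fake_clock_t &Clock) {
    Queued = Clock.now();
    Start = Clock.now();
    End = Start;
    NativeRecords += 2;
  }
};
```

In this model the start()/record() path costs three recordings and the checkpoint() path costs two, at the price of a special case in which the end time is read from the start event.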

Add enqueue timestamp recording extension

9aceab4

This commit adds a new extension feature for recording timestamps into events, the information from which can be queried using the existing profiling queries. Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

steffenlarsen requested review from a team as code owners February 29, 2024 16:15

steffenlarsen requested a review from JackAKirk February 29, 2024 16:15

steffenlarsen mentioned this pull request Feb 29, 2024

Add enqueue timestamp recording extension #1390

Closed

JackAKirk reviewed Feb 29, 2024

source/adapters/cuda/enqueue.cpp (resolved)

JackAKirk reviewed Feb 29, 2024

source/adapters/hip/enqueue.cpp (resolved)

JackAKirk reviewed Feb 29, 2024

Address comments

578f347

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Revert CUDA change

b7b0a6f

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

JackAKirk approved these changes Feb 29, 2024

steffenlarsen added 4 commits February 29, 2024 22:50

Correctly enable recordings in HIP and CUDA for timestamp events

e08bf52

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Fix use of event handle

bf5ea14

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Fix faulty disjunction

206c4b1

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Fix faulty disjunction 2

b690920

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

steffenlarsen commented Mar 1, 2024

Allow event creation to record timing for events

5db5ba8

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

aarongreig approved these changes Mar 4, 2024

scripts/core/EXP-ENQUEUE-TIMESTAMP-RECORDING.rst (outdated, resolved)

scripts/core/exp-enqueue-timestamp-recording.yml (outdated, resolved)

steffenlarsen mentioned this pull request Mar 5, 2024

HIP profiling submission time query returns weird values intel/llvm#12904

Open

steffenlarsen added 4 commits March 5, 2024 01:11

Address comments

2f0e050

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Merge branch 'main' into steffen/record_event

32abd74

Merge branch 'main' into steffen/record_event

4261d04

Amend comments

5caceaf

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

PietroGhg approved these changes Mar 12, 2024

github-actions bot added the "experimental" label (Experimental feature additions/changes/specification) Apr 12, 2024

Merge branch 'main' into steffen/record_event

e7f496d

pbalcer approved these changes Apr 16, 2024

source/adapters/level_zero/event.cpp (resolved)

source/adapters/level_zero/event.cpp (outdated, resolved)

source/adapters/level_zero/queue.cpp (resolved)

steffenlarsen added 2 commits April 16, 2024 02:16

Move timestamp query to after commandlist get

5fd441e

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Stop making new heap allocations for each recording

2404fe6

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

pbalcer approved these changes Apr 16, 2024

nrspruit approved these changes Apr 16, 2024

steffenlarsen added 2 commits April 17, 2024 04:16

Merge remote-tracking branch 'intel/main' into steffen/record_event

be01218

Fix diff

73de142

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

steffenlarsen added the "ready to merge" label (Added to PR's which are ready to merge) Apr 17, 2024

steffenlarsen added 6 commits April 18, 2024 05:48

Merge remote-tracking branch 'intel/main' into steffen/record_event

4a855ca

Fix diff

4f0bf8c

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Merge branch 'main' into steffen/record_event

f7fe03e

Remove old use of urPrint

ecb6a82

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Merge branch 'main' into steffen/record_event

cf13442

Merge remote-tracking branch 'intel/main' into steffen/record_event

c804856

Merge remote-tracking branch 'intel/main' into steffen/record_event

308f1dd

kbenzie requested changes May 7, 2024

source/adapters/level_zero/event.cpp (resolved)

steffenlarsen added 2 commits May 7, 2024 02:56

Remove trailing ws

06432bf

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

Add wait-list to get-command-list

84bad6c

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

kbenzie approved these changes May 7, 2024

kbenzie merged commit 7ce68e0 into oneapi-src:main May 8, 2024
51 checks passed

kbenzie mentioned this pull request May 8, 2024

[SYCL] Implement sycl_ext_oneapi_profiling_tag extension intel/llvm#12838

Merged

steffenlarsen mentioned this pull request May 8, 2024

[CUDA] Add back device info enums after merge mistake #1588

Merged

hdelan reviewed May 16, 2024
