Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UR][L0][CUDA][HIP] Add enqueue timestamp recording extension #1400

Merged
merged 30 commits into from
May 8, 2024

Conversation

steffenlarsen
Copy link
Contributor

@steffenlarsen steffenlarsen commented Feb 29, 2024

This commit adds a new extension feature for recording timestamps into events, the information from which can be queried using the existing profiling queries.

This new functionality is currently implemented for L0, CUDA and HIP.

Corresponding SYCL extension implementation: intel/llvm#12838

This commit adds a new extension feature for recording timestamps into
events, the information from which can be queried using the existing
profiling queries.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
include/ur_api.h Outdated
@@ -1647,7 +1649,7 @@ typedef enum ur_device_info_t {
/// - ::UR_RESULT_ERROR_INVALID_NULL_HANDLE
/// + `NULL == hDevice`
/// - ::UR_RESULT_ERROR_INVALID_ENUMERATION
/// + `::UR_DEVICE_INFO_INTEROP_SEMAPHORE_EXPORT_SUPPORT_EXP < propName`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking you meant to delete this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it needs to be the last enumerator, as it's saying any value after that will result in UR_RESULT_ERROR_INVALID_ENUMERATION. I was skeptical at first too, but the generated script diff check insisted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the generator making the change. It's expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could probably be a bit more clever to do proper validation of the enum ranges.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@codecov-commenter
Copy link

codecov-commenter commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 3.86740% with 174 lines in your changes are missing coverage. Please review.

Project coverage is 12.46%. Comparing base (78ef1ca) to head (5db5ba8).
Report is 96 commits behind head on main.

Files Patch % Lines
...onformance/enqueue/urEnqueueTimestampRecording.cpp 0.00% 57 Missing ⚠️
include/ur_print.hpp 0.00% 43 Missing ⚠️
source/loader/ur_ldrddi.cpp 4.54% 21 Missing ⚠️
source/loader/layers/validation/ur_valddi.cpp 14.28% 18 Missing ⚠️
source/loader/layers/tracing/ur_trcddi.cpp 16.66% 10 Missing ⚠️
source/adapters/null/ur_nullddi.cpp 11.11% 8 Missing ⚠️
source/loader/ur_libapi.cpp 0.00% 8 Missing ⚠️
source/loader/ur_print.cpp 0.00% 4 Missing ⚠️
test/conformance/testing/source/utils.cpp 0.00% 3 Missing ⚠️
tools/urinfo/urinfo.hpp 0.00% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1400      +/-   ##
==========================================
- Coverage   14.82%   12.46%   -2.36%     
==========================================
  Files         250      240      -10     
  Lines       36220    36129      -91     
  Branches     4094     4094              
==========================================
- Hits         5369     4504     -865     
- Misses      30800    31621     +821     
+ Partials       51        4      -47     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Copy link
Contributor

@JackAKirk JackAKirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hip/cuda impl and tests LGTM.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@@ -58,7 +59,7 @@ ur_result_t ur_event_handle_t_::start() {
ur_result_t Result = UR_RESULT_SUCCESS;

try {
if (Queue->URFlags & UR_QUEUE_FLAG_PROFILING_ENABLE) {
if (Queue->URFlags & UR_QUEUE_FLAG_PROFILING_ENABLE || isTimestampEvent()) {
// NOTE: This relies on the default stream to be unused.
UR_CHECK_ERROR(hipEventRecord(EvQueued, 0));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JackAKirk - In intel/llvm#12838 it seems like the submission time on HIP is giving weird values. I did a bit of digging and it seems to me like HIP is a little different from CUDA when checking timing-differences between events. Of particular interest here is the following line for hipEventElapsedTime():

Events which are recorded in a NULL stream will block until all commands on all other streams complete execution, and then record the timestamp.

While what we expect here is to get an event with the current time, hence using an otherwise unused stream. Fixing it might be outside the scope of this PR, but a possible solution could be to lazily have a stream specifically for recording submission time of events, tied to the context. Similar could be used in the CUDA backend to avoid the assumption noted above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@konradkusiak97 this sounds like the same thing as your hip event timings issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened an issue about it here: intel/llvm#12904.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
scripts/core/EXP-ENQUEUE-TIMESTAMP-RECORDING.rst Outdated Show resolved Hide resolved
scripts/core/exp-enqueue-timestamp-recording.yml Outdated Show resolved Hide resolved
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen
Copy link
Contributor Author

@oneapi-src/unified-runtime-native-cpu-write & @oneapi-src/unified-runtime-level-zero-write - Friendly ping.

Copy link
Contributor

@PietroGhg PietroGhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native CPU lgtm, thank you

@github-actions github-actions bot added the experimental Experimental feature additions/changes/specification label Apr 12, 2024
@steffenlarsen
Copy link
Contributor Author

@oneapi-src/unified-runtime-level-zero-write Ping.

@steffenlarsen
Copy link
Contributor Author

@oneapi-src/unified-runtime-level-zero-write ping.

Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a few nits.

source/adapters/level_zero/event.cpp Show resolved Hide resolved
source/adapters/level_zero/event.cpp Outdated Show resolved Hide resolved
source/adapters/level_zero/queue.cpp Show resolved Hide resolved
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen steffenlarsen added the ready to merge Added to PR's which are ready to merge label Apr 17, 2024
@kbenzie
Copy link
Contributor

kbenzie commented May 6, 2024

@steffenlarsen if you could pull in the latest changes from main and resolve the conflict I can get this merged.

@steffenlarsen
Copy link
Contributor Author

@steffenlarsen if you could pull in the latest changes from main and resolve the conflict I can get this merged.

Thank you, @kbenzie ! Merge commit has been pushed and it should hopefully pass CI again.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@kbenzie kbenzie merged commit 7ce68e0 into oneapi-src:main May 8, 2024
51 checks passed
steffenlarsen added a commit to steffenlarsen/unified-runtime that referenced this pull request May 8, 2024
oneapi-src#1400 incorrectly
removed some device info enumerator cases due to a fault merge conflict
resolution. This commit adds them back.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
std::unique_ptr<ur_event_handle_t_>(ur_event_handle_t_::makeNative(
UR_COMMAND_TIMESTAMP_RECORDING_EXP, hQueue, CuStream));
UR_CHECK_ERROR(RetImplEvent->start());
UR_CHECK_ERROR(RetImplEvent->record());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late review, but start and record record 3 separate native events. Perhaps it is useful to have a separate EvQueued and EvStart using this extension, but EvEnd should be unnecessary. Do you think it'd be worth having a event::checkpoint member func that minimizes the number of native events recorded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand your suggestion. Using this extension, the timestamp event should behave like a normal event, where the native events have the following behavior:

  • EvQueued: Should finish immediately as this is when the timestamp event was enqueued.
  • EvStart: Should be when the queue finishes.
  • EvEnd: Same as above as there is no work related to the actual enqueued "command".

Since EvStart and EvEnd are mostly the same, we could use just one, but I am not convinced there's so much overhead from recording an additional event that it's worth the additional complexity of having to ignore EvStart or EvEnd if it is a timestamp event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for response. Perhaps we can continue discussion here #1613

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conformance Conformance test suite issues. cuda CUDA adapter specific issues experimental Experimental feature additions/changes/specification hip HIP adapter specific issues level-zero L0 adapter specific issues loader Loader related feature/bug native-cpu Native CPU adapter specific issues opencl OpenCL adapter specific issues ready to merge Added to PR's which are ready to merge specification Changes or additions to the specification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants