[L0] Support for counter-based events using L0 driver #1370

Merged
merged 3 commits into from
Apr 26, 2024

@winstonzhang-intel winstonzhang-intel commented Feb 22, 2024

Counter-based events implementation.
Counter-based events can be enabled via the flag UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
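A minimal sketch of how such an opt-in switch could be read, assuming the flag is delivered as an environment variable and that any non-zero integer enables it. The helper names `parseFlagValue` and `useDriverCounterBasedEvents` are hypothetical and are not taken from the adapter.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: a missing or empty value falls back to DefaultValue,
// otherwise any non-zero integer enables the feature.
inline bool parseFlagValue(const char *Value, bool DefaultValue) {
  if (Value == nullptr || *Value == '\0')
    return DefaultValue;
  return std::atoi(Value) != 0;
}

// Assumption: the switch is an environment variable that is read only once.
inline bool useDriverCounterBasedEvents() {
  static const bool Enabled = parseFlagValue(
      std::getenv("UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS"), false);
  return Enabled;
}
```

Reading the value once into a static keeps later checks cheap, which matters for code on the event-creation path.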

Below are the three conformance tests that ensure this implementation works as expected:

PASS: SYCL :: InorderQueue/in_order_buffs.cpp (1 of 1)

Testing Time: 6.09s

Total Discovered Tests: 1
  Passed: 1 (100.00%)

PASS: SYCL :: InorderQueue/in_order_kernels.cpp (1 of 1)

Testing Time: 6.29s

Total Discovered Tests: 1
  Passed: 1 (100.00%)


PASS: SYCL :: Basic/in_order_queue_status.cpp (1 of 1)

Testing Time: 7.55s

Total Discovered Tests: 1
  Passed: 1 (100.00%)

LLVM Draft with CI passing: intel/llvm#12848
Rebased against Raiyan's in-order list implementation

codecov-commenter commented Feb 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 12.43%. Comparing base (78ef1ca) to head (c87300d).
Report is 199 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1370      +/-   ##
==========================================
- Coverage   14.82%   12.43%   -2.40%     
==========================================
  Files         250      241       -9     
  Lines       36220    36242      +22     
  Branches     4094     4111      +17     
==========================================
- Hits         5369     4506     -863     
- Misses      30800    31732     +932     
+ Partials       51        4      -47     


@winstonzhang-intel winstonzhang-intel force-pushed the counter-based-events branch 2 times, most recently from 9e3024c to de6f32e Compare February 28, 2024 08:41
mabraham commented Feb 28, 2024

It's great to see things starting!

@@ -1498,12 +1522,11 @@ ur_event_handle_t ur_queue_handle_t_::getEventFromQueueCache(bool IsMultiDevice,
// visible pool.
// \param HostVisible tells if the event must be created in the
// host-visible pool. If not set then this function will decide.

doc missing

ur_queue_handle_t Queue, ur_event_handle_t *Event, ur_command_t CommandType,
ur_command_list_ptr_t CommandList, bool IsInternal, bool IsMultiDevice,
std::optional<bool> HostVisible,
std::optional<bool> usingCounterBasedEvents) {

What's the advantage of optional<bool> compared to a plain bool, or to always looking it up from Queue internally?

If we keep optional<bool>, then the several places that call .value() on it become simpler if we write .value_or(false) instead.

Contributor Author

Good catch, let me fix that.
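The difference the reviewer points at can be sketched as follows. `wantsCounterBasedEvent` is a hypothetical stand-in for any call site that receives the optional flag; only `std::optional::value_or` is standard.

```cpp
#include <cassert>
#include <optional>

// Hypothetical call site that receives the optional flag.
inline bool wantsCounterBasedEvent(std::optional<bool> UsingCounterBasedEvents) {
  // value() throws std::bad_optional_access on an empty optional;
  // value_or(false) treats "not specified" as "disabled" without a branch.
  return UsingCounterBasedEvents.value_or(false);
}
```

With a plain `bool` parameter defaulting to `false`, the "not specified" state disappears entirely, which is the other option raised in the comment.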

Contributor

@MichalMrozek MichalMrozek left a comment

  1. Counter-based events can only be used with in-order queues; they do not work for out-of-order queues. Check the SYCL queue parameters instead of parsing the ze queues.
  2. You cannot call reset on those events.
  3. Why do you disable fences?
  4. Why do you create all events with both the immediate and non-immediate flags? The driver would not be able to optimize this. Events should be selected based on usage.
  5. Why do you add synchronous mode when counter-based events are present?
  6. Event pools are per context, and a context can have both in-order and out-of-order queues. You need to differentiate when obtaining events so that an out-of-order queue does not get a counter-based event.
  7. You cannot evaluate ur_queue_handle_t_::usingCounterBasedEvents() each time. This needs to be a const flag set at queue creation; an in-order queue cannot become an out-of-order queue, and browsing the map would kill performance.

@winstonzhang-intel
Contributor Author

@MichalMrozek Thanks for the feedback.

  1. Counter-based events can only be used with in-order queues; they do not work for out-of-order queues. Check the SYCL queue parameters instead of parsing the ze queues.
    I do make sure that we only use counter-based events for an in-order queue. Additionally, I have to make sure that the ZeQueues being used have the ZE_COMMAND_QUEUE_FLAG_IN_ORDER flag.
  2. You cannot call reset on those events.
    Good catch, fixed.
  3. Why do you disable fences?
    My understanding was that with counter-based events we no longer need barriers between the different lists, so I removed the fences when counter-based events are in use.
  4. Why do you create all events with both the immediate and non-immediate flags? The driver would not be able to optimize this. Events should be selected based on usage.
    I agree. At first I was following the L0 CTS as an example, but on reflection your point makes sense.
  5. Why do you add synchronous mode when counter-based events are present?
    Same as above.
  6. Event pools are per context, and a context can have both in-order and out-of-order queues. You need to differentiate when obtaining events so that an out-of-order queue does not get a counter-based event.
  7. You cannot evaluate ur_queue_handle_t_::usingCounterBasedEvents() each time. This needs to be a const flag set at queue creation; an in-order queue cannot become an out-of-order queue, and browsing the map would kill performance.
    Agree, fixed.
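Point 7 can be illustrated with a small sketch in which the decision is made once, in the constructor, and later reads are a plain member access. `SketchQueue` and its constructor parameters are hypothetical and much simpler than the adapter's `ur_queue_handle_t_`.

```cpp
#include <cassert>

// Hypothetical, simplified queue type; not the adapter's ur_queue_handle_t_.
struct SketchQueue {
  SketchQueue(bool InOrder, bool DriverSupportsCounterBased)
      : IsInOrder(InOrder),
        // Counter-based events are only valid for in-order queues.
        CounterBasedEvents(InOrder && DriverSupportsCounterBased) {}

  bool isInOrderQueue() const { return IsInOrder; }

  // Constant-time read of a value fixed at construction; no map browsing.
  bool usingCounterBasedEvents() const { return CounterBasedEvents; }

private:
  const bool IsInOrder;
  const bool CounterBasedEvents;
};
```

Because both members are `const`, the queue cannot silently change its ordering mode after creation, which matches the reviewer's observation that an in-order queue never becomes out-of-order.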

Contributor

@MichalMrozek MichalMrozek left a comment

It looks like this change depends on the in-order support.

source/adapters/level_zero/event.cpp
@@ -1124,8 +1129,8 @@ ur_result_t ur_event_handle_t_::reset() {

if (!isHostVisible())
HostVisibleEvent = nullptr;

ZE2UR_CALL(zeEventHostReset, (ZeEvent));
if (!UrQueue->usingCounterBasedEvents())
Contributor

I think the event should know this itself instead of going to the queue.

Contributor Author

Agreed, I can keep track of it when the event is created. However, if the event is passed in, I would have no way of querying eventpool_desc.pnext from the event pool.
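A sketch of the idea that the event records its own kind at creation, so `reset()` does not need to consult the queue. `SketchEvent` and the `HostResets` counter are hypothetical; the counter only stands in for the call that would go to `zeEventHostReset`.

```cpp
#include <cassert>

// Hypothetical, simplified event type. HostResets counts the calls that
// would be forwarded to zeEventHostReset in a real adapter.
struct SketchEvent {
  explicit SketchEvent(bool CounterBased)
      : CounterBasedEventsEnabled(CounterBased) {}

  void reset() {
    // Counter-based events must not be reset, so skip the host reset.
    if (!CounterBasedEventsEnabled)
      ++HostResets;
  }

  const bool CounterBasedEventsEnabled; // fixed when the event is created
  int HostResets = 0;
};
```

Storing the flag on the event also sidesteps the limitation mentioned in the reply: nothing has to be recovered from the pool descriptor later.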

source/adapters/level_zero/queue.cpp
@@ -1818,6 +1850,9 @@ ur_queue_handle_t_::ur_queue_group_t::getZeQueue(uint32_t *QueueGroupOrdinal) {
if (QueueIndex != 0) {
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_EXPLICIT_ONLY;
}
if (Queue->usingCounterBasedEvents()) {
Contributor

This should be the other way around: if the queue is in-order, then you can create counter-based events.

Contributor Author

ZeQueue hasn't been created at this point. If we are using counter-based events, then the ZeQueue needs to reflect that.

Contributor

@raiyanla raiyanla Mar 2, 2024

I believe @MichalMrozek is referring to the application Queue being passed in AKA the UR Queue object in this line. You can check if this Queue was created to be in-order using the Queue->isInOrderQueue(), since this is set on ur_queue creation.

It should also be noted that, as far as I can tell, the ZE_COMMAND_QUEUE_FLAG_IN_ORDER flag should only be used to create immediate command lists via zeCommandListCreateImmediate(), whereas this function creates an ordinary queue for regular command lists via zeCommandQueueCreate(). Please correct me if I'm wrong @MichalMrozek

Reference at: https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=ze_command_queue_flag_in_order#_CPPv4N23ze_command_queue_flag_t30ZE_COMMAND_QUEUE_FLAG_IN_ORDERE

Anyhow, the PR I have up for merging soon related to in-order at #1372 should cover the setting of these flags (ZE_COMMAND_QUEUE_FLAG_IN_ORDER for immediate command lists & ZE_COMMAND_LIST_FLAG_IN_ORDER for regular command lists) appropriately.

Contributor Author

@raiyanla Thanks for your input. Queue->usingCounterBasedEvents() does check whether the Queue is in-order via Queue->isInOrderQueue(). It also checks for ZE_COMMAND_QUEUE_FLAG_IN_ORDER, which I believe is also required for zeCommandQueueCreate(). See the example in the L0 conformance test here

Contributor

@raiyanla raiyanla Mar 4, 2024

@winstonzhang-intel Gotcha, I just saw that your usingCounterBasedEvents() now checks isInOrderQueue() internally. Thanks for linking that L0 conformance test. Interestingly, it seems to conflict with the specification linked above, which indicates that ZE_COMMAND_QUEUE_FLAG_IN_ORDER is for immediate command lists only. @MichalMrozek Can you confirm which is appropriate, since the spec and the conformance tests conflict?

Contributor

For regular command lists the flag is ZE_COMMAND_LIST_FLAG_IN_ORDER, passed in the command list descriptor.
For immediate command lists the flag is ZE_COMMAND_QUEUE_FLAG_IN_ORDER, passed in the queue descriptor.
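The split described above can be sketched with stand-in types. Everything here is hypothetical: the `Sketch*` structs and bit values only model where each flag goes and do not reproduce the Level Zero headers, where the real descriptors and flag values are defined.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in bit values for illustration only; they are NOT the values from
// the Level Zero headers.
constexpr uint32_t SketchListFlagInOrder = 1u << 0;  // models ZE_COMMAND_LIST_FLAG_IN_ORDER
constexpr uint32_t SketchQueueFlagInOrder = 1u << 1; // models ZE_COMMAND_QUEUE_FLAG_IN_ORDER

struct SketchListDesc { uint32_t flags = 0; };  // models the command list descriptor
struct SketchQueueDesc { uint32_t flags = 0; }; // models the command queue descriptor

// Regular command list: the in-order bit goes into the list descriptor.
inline SketchListDesc makeRegularListDesc(bool InOrder) {
  SketchListDesc Desc;
  if (InOrder)
    Desc.flags |= SketchListFlagInOrder;
  return Desc;
}

// Immediate command list: the in-order bit goes into the queue descriptor
// that is handed to the immediate-list creation call.
inline SketchQueueDesc makeImmediateListQueueDesc(bool InOrder) {
  SketchQueueDesc Desc;
  if (InOrder)
    Desc.flags |= SketchQueueFlagInOrder;
  return Desc;
}
```

Keeping the two builders separate makes it harder to put the queue flag on a queue that only serves regular command lists, which is the confusion discussed in this thread.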

ZE2UR_CALL(zeCommandListCreate, (Context->ZeContext, Device->ZeDevice,
&ZeCommandListDesc, &ZeCommandList));

ZE2UR_CALL(zeFenceCreate, (ZeCommandQueue, &ZeFenceDesc, &ZeFence));
if (!usingCounterBasedEvents()) {
ZE2UR_CALL(zeFenceCreate, (ZeCommandQueue, &ZeFenceDesc, &ZeFence));
Contributor

do not disable fences, they are still needed to recycle command lists.

@winstonzhang-intel winstonzhang-intel force-pushed the counter-based-events branch 2 times, most recently from e7901bc to b5cc052 Compare March 5, 2024 01:34
@winstonzhang-intel winstonzhang-intel changed the title [UR] Draft for adding support for counter-based events [L0] Support for counter-based events using L0 driver Mar 5, 2024
@winstonzhang-intel winstonzhang-intel marked this pull request as ready for review March 5, 2024 01:52
@winstonzhang-intel winstonzhang-intel requested a review from a team as a code owner March 5, 2024 01:52
ur_result_t
EventCreate(ur_context_handle_t Context, ur_queue_handle_t Queue,
bool IsMultiDevice, bool HostVisible, ur_event_handle_t *RetEvent,
std::optional<bool> CounterBasedEventEnabled = std::nullopt);
Contributor

Suggested change
std::optional<bool> CounterBasedEventEnabled = std::nullopt);
bool CounterBasedEventEnabled = false);

?

@@ -468,7 +468,8 @@ static const uint32_t MaxNumEventsPerPool = [] {

ur_result_t ur_context_handle_t_::getFreeSlotInExistingOrNewPool(
Contributor

There ought to be a better way of allocating an event that doesn't require passing around half a dozen arguments to multiple functions. Maybe some sort of flag?

auto event = Pool->allocateEvent(HOST_VISIBLE | ENABLE_PROFILER | COUNTER_BASED);

Having functions that accept multiple boolean values is error-prone.

Contributor Author

@pbalcer Thanks for the feedback! We could create a new enum just for this case. However, that seems like overkill for this one function, which is only used once, in event.cpp during event creation.

Contributor

It's not just this function. All the various boolean event parameters are being passed around in multiple functions. Just from a quick search:
createEventAndAssociateQueue, getEventFromQueueCache, EventCreate, getEventFromContextCache, getFreeSlotInExistingOrNewPool, getZeEventPoolCache

Contributor Author

@pbalcer This is quite a large refactoring. Functions with these booleans are abundant. Additionally, it would make sense to add this in ur_api, however, we would need to add this in spec before I can do so. Perhaps I can file a new ticket and have this be its own patch?

Contributor

Sure, this was just a suggestion.
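The flag-based interface suggested in this thread could look roughly like the sketch below. The enumerator names mirror the reviewer's one-line example; `EventFlags`, `SketchEventDesc`, and the free function `allocateEvent` are hypothetical and not part of the adapter or the UR API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag set; the names follow the review suggestion only.
enum EventFlags : uint32_t {
  EVENT_NONE = 0,
  HOST_VISIBLE = 1u << 0,
  ENABLE_PROFILER = 1u << 1,
  COUNTER_BASED = 1u << 2,
};

// Hypothetical result type that records what was requested.
struct SketchEventDesc {
  bool HostVisible;
  bool Profiling;
  bool CounterBased;
};

// One self-describing parameter replaces several positional booleans, so a
// call site cannot swap two arguments by accident.
inline SketchEventDesc allocateEvent(uint32_t Flags) {
  return {(Flags & HOST_VISIBLE) != 0, (Flags & ENABLE_PROFILER) != 0,
          (Flags & COUNTER_BASED) != 0};
}
```

A call such as `allocateEvent(HOST_VISIBLE | COUNTER_BASED)` reads the same at every layer, so the same value could be forwarded unchanged through the functions listed above.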

@@ -510,6 +511,12 @@ ur_result_t ur_context_handle_t_::getFreeSlotInExistingOrNewPool(
ZeEventPoolDesc.flags |= ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
if (ProfilingEnabled)
ZeEventPoolDesc.flags |= ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP;
if (CounterBasedEventEnabled) {
ZeEventPoolDesc.flags |= ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
ze_event_pool_counter_based_exp_desc_t counterBasedExt = {
Contributor

I think you might need to update the getZeEventPoolCache(HostVisible, ProfilingEnabled, ZeDevice); call above. Otherwise, based on what the queue was that happened to call this function first for this combination of parameters, the event pool might or might not have this flag set.

And, again, I suggest a small refactoring here to use flags (or some other mechanism) instead of passing yet another boolean.

Contributor Author

The Queue should have the flag set on creation. Because of that, all the subsequent eventpools should be cached with counter-based events enabled.

Contributor

But this cache is in the context, so if you have two queues, one with counter-based events enabled and one without, the one that was first in this function will create the event pool.

Contributor

event pools are stored in context not in a queue
There may be different types of queues coming in and they would require different type of events.
Pool cannot return incompatible event.

Contributor Author

Using the same mechanism as previously implemented, the event pool cache that is returned is guaranteed to have counter-based events enabled when that is requested, because a portion of the event pool cache is now reserved just for counter-based event pools.

Contributor

👍 this desperately needs a refactor though :)

Contributor Author

I'll file a ticket tomorrow for all the refactoring; it should be pretty straightforward 👍
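The concern in this thread, that a context-level cache must never hand a pool of the wrong kind to a queue, can be sketched by making every distinguishing property part of the cache key. `PoolKey`, `SketchPool`, and `SketchContext` are hypothetical; the adapter itself uses indexed vectors rather than a map.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical key: every property that makes pools incompatible is in it.
struct PoolKey {
  bool HostVisible;
  bool Profiling;
  bool CounterBased;
  bool operator<(const PoolKey &Other) const {
    return std::tie(HostVisible, Profiling, CounterBased) <
           std::tie(Other.HostVisible, Other.Profiling, Other.CounterBased);
  }
};

struct SketchPool {
  bool CounterBased;
};

// Hypothetical context-level cache shared by all queues of the context.
struct SketchContext {
  SketchPool &getPool(const PoolKey &Key) {
    auto It = Pools.find(Key);
    if (It == Pools.end())
      It = Pools.emplace(Key, SketchPool{Key.CounterBased}).first;
    return It->second;
  }
  std::map<PoolKey, SketchPool> Pools;
};
```

Whichever queue asks first, a request with `CounterBased` set and one without it land in different slots, so the order of calls no longer decides what kind of pool is created.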

@winstonzhang-intel winstonzhang-intel force-pushed the counter-based-events branch 4 times, most recently from 8050b66 to 54d81b8 Compare March 7, 2024 10:20
source/adapters/level_zero/context.cpp
} else {
return WithProfiling ? &ZeEventPoolCache[0] : &ZeEventPoolCache[1];
}
profiling_index_a = 0;
Contributor

What is this logic doing?
Is it choosing the cache index?

Contributor Author

Yes, it chooses the cache index depending on hostVisible and CounterBasedEventEnabled.

profiling_index_b = 3;
}
if (CounterBasedEventEnabled) {
profiling_index_a += 4;
Contributor

please avoid magic numbers, please move this logic to separate helper function

std::vector<std::unordered_map<ze_device_handle_t, size_t>>
ZeEventPoolCacheDeviceMap{4};
ZeEventPoolCacheDeviceMap{8};
Contributor

why 8 ?

Contributor Author

Two EventPoolCacheDeviceMap entries are reserved for each of these cases: hostVisible, not hostVisible, device and hostVisible, and device and not hostVisible. Since we can't query the extension flags of the event pool descriptor from an event pool handle, we have to reserve space for each of these situations.

Contributor

What if you need profiling on top of that?
Which pool would you choose?

If you want to permute all combinations, this would quickly get out of control.

  • host visible
  • counter based for immediate
  • counter based for non immediate
  • profiling

we are already at 16
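The count of 16 follows from treating the four listed properties as independent on/off choices, which gives 2^4 pools. A small sketch of that arithmetic, with hypothetical enumerator names taken from the bullet list:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list of the independent pool properties named above.
enum PoolProperty : std::size_t {
  HostVisibleProp,
  CounterBasedImmediateProp,
  CounterBasedRegularProp,
  ProfilingProp,
  NumPoolProperties // number of properties listed before this entry
};

// Each independent boolean property doubles the number of distinct pools.
constexpr std::size_t numPoolCombinations(std::size_t NumProperties) {
  return std::size_t{1} << NumProperties;
}

static_assert(numPoolCombinations(NumPoolProperties) == 16,
              "four independent properties give sixteen pool kinds");
```

Adding one more property would double the count again, which is the growth the comment warns about.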

if (QueueIndex != 0) {
if (Queue->Device->useDriverInOrderLists() && Queue->isInOrderQueue()) {
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_IN_ORDER;
} else if (QueueIndex != 0) {
Contributor

Why is this logic in the else branch?
It applies to in-order queues as well.

Contributor Author

@winstonzhang-intel winstonzhang-intel Mar 12, 2024

This is part of the in-order list patch and is not related to counter-based events.

Contributor

Can you rebase?
The in-order patch is merged and has a different change here.

@@ -1820,6 +1836,11 @@ ur_queue_handle_t_::ur_queue_group_t::getZeQueue(uint32_t *QueueGroupOrdinal) {
if (QueueIndex != 0) {
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_EXPLICIT_ONLY;
}
/*
Contributor

Why is this code commented out?
Please remove it if it is not needed.

@winstonzhang-intel winstonzhang-intel force-pushed the counter-based-events branch 2 times, most recently from 41df918 to 0c9b051 Compare March 13, 2024 17:49
Contributor

@MichalMrozek MichalMrozek left a comment

Please rebase the patch onto the in-order command list change; right now it is hard to distinguish the new additions from what comes from the in-order lists patch.

int *profiling_index_a,
int *profiling_index_b) {
if (HostVisible) {
*profiling_index_a = 0;
Contributor

Can you make this code more maintainable?
Assign an enum for each pool type and shift by the enum sizes?

Right now it is very difficult to understand what is happening here and why. Also, the names profiling_index_a and profiling_index_b say little about what happens here.
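One way to read the enum-and-shift suggestion is to give each pool property a named bit position and build the cache index from those bits. The sketch below is an assumption about what such a helper might look like; the names and the resulting index layout are hypothetical and do not match the adapter's actual cache layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bit positions, one per pool property.
enum PoolTypeBit : std::size_t {
  PoolProfilingBit = 0,
  PoolHostVisibleBit = 1,
  PoolCounterBasedBit = 2,
  PoolTypeBitCount = 3
};

// The cache size follows from the number of bits, so no literal 4 or 8.
constexpr std::size_t PoolCacheSize = std::size_t{1} << PoolTypeBitCount;

// Maps a combination of properties to a unique slot in [0, PoolCacheSize).
constexpr std::size_t poolCacheIndex(bool Profiling, bool HostVisible,
                                     bool CounterBased) {
  return (static_cast<std::size_t>(Profiling) << PoolProfilingBit) |
         (static_cast<std::size_t>(HostVisible) << PoolHostVisibleBit) |
         (static_cast<std::size_t>(CounterBased) << PoolCounterBasedBit);
}
```

With this shape, adding a property means adding one enumerator before `PoolTypeBitCount`; the cache size and every index adjust without touching magic numbers.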

if (QueueIndex != 0) {
if (Queue->Device->useDriverInOrderLists() && Queue->isInOrderQueue()) {
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_IN_ORDER;
} else if (QueueIndex != 0) {
Contributor

Can you rebase?
The in-order patch is merged and has a different change here.

@winstonzhang-intel
Contributor Author

  1. Counter-based events can only be used with in-order queues; they do not work for out-of-order queues. Check the SYCL queue parameters instead of parsing the ze queues.
  2. You cannot call reset on those events.
  3. Why do you disable fences?
  4. Why do you create all events with both the immediate and non-immediate flags? The driver would not be able to optimize this. Events should be selected based on usage.
  5. Why do you add synchronous mode when counter-based events are present?
  6. Event pools are per context, and a context can have both in-order and out-of-order queues. You need to differentiate when obtaining events so that an out-of-order queue does not get a counter-based event.
  7. You cannot evaluate ur_queue_handle_t_::usingCounterBasedEvents() each time. This needs to be a const flag set at queue creation; an in-order queue cannot become an out-of-order queue, and browsing the map would kill performance.

So, the CI failure in urUSMHostAllocTest.Success/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___Intel_R__Arc_TM__A750_Graphics
occurs because the reused counting event comes up as already signaled, so it completes immediately. Either this is a bug in the L0 GPU driver, if the spec says we cannot reset, or we are missing something in this implementation.

The issue is fixed in the latest patch; the problem was that the desc was not being used.

Thank you @nrspruit for looking into this. Once this patch is merged I will post another for the refactoring that @pbalcer requested.

@nrspruit nrspruit added the level-zero L0 adapter specific issues label Apr 9, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Apr 11, 2024
commit tag: 9b82c1d0a6df26d96797218d6964faf0e6dbc0ce
URT PR: oneapi-src/unified-runtime#1370

Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
@github-actions github-actions bot added the command-buffer Command Buffer feature addition/changes/specification label Apr 12, 2024
Contributor

@nrspruit nrspruit left a comment

LGTM with the updated changes.

@nrspruit nrspruit added the ready to merge Added to PR's which are ready to merge label Apr 18, 2024
@pbalcer pbalcer added v0.9.x Include in the v0.9.x release and removed v0.9.x Include in the v0.9.x release labels Apr 25, 2024
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
Now counter based events will be enabled by default only on PVC+, and
only enabled for immCmdLists.

Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
@kbenzie kbenzie merged commit f4a9497 into oneapi-src:main Apr 26, 2024
51 checks passed
martygrant pushed a commit to intel/llvm that referenced this pull request Apr 26, 2024
…#12848)

commit tag: 4134bfce72d33e89eebcad11186bdf00310bba83
URT PR: oneapi-src/unified-runtime#1370

---------

Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Labels
command-buffer Command Buffer feature addition/changes/specification level-zero L0 adapter specific issues ready to merge Added to PR's which are ready to merge