-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[L0] Support for counter-based events using L0 driver #1370
[L0] Support for counter-based events using L0 driver #1370
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1370 +/- ##
==========================================
- Coverage 14.82% 12.43% -2.40%
==========================================
Files 250 241 -9
Lines 36220 36242 +22
Branches 4094 4111 +17
==========================================
- Hits 5369 4506 -863
- Misses 30800 31732 +932
+ Partials 51 4 -47 ☔ View full report in Codecov by Sentry. |
9e3024c
to
de6f32e
Compare
It's great to see things starting! |
source/adapters/level_zero/queue.cpp
Outdated
@@ -1498,12 +1522,11 @@ ur_event_handle_t ur_queue_handle_t_::getEventFromQueueCache(bool IsMultiDevice, | |||
// visible pool. | |||
// \param HostVisible tells if the event must be created in the | |||
// host-visible pool. If not set then this function will decide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc missing
source/adapters/level_zero/queue.cpp
Outdated
ur_queue_handle_t Queue, ur_event_handle_t *Event, ur_command_t CommandType, | ||
ur_command_list_ptr_t CommandList, bool IsInternal, bool IsMultiDevice, | ||
std::optional<bool> HostVisible, | ||
std::optional<bool> usingCounterBasedEvents) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage for optional<bool>
compared to plain bool
or always looking it up from Queue
internally?
If we keep optional<bool>
, then in several places where we are calling .value()
on it the code is simpler if we write it like .value_or(false)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, let me fix that.
8681736
to
2c362a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you can only use counter based events for in order queues, they do not work for out of order queues, you need to check sycl queue parameters instead of parsing over ze queues.
- you cannot call reset on those events
- why you disable fences ?
- why you create all events with immediate and non immediate flags? Driver wouldn't be able to optimize this. Events should be selected basing on usage
- why you add synchronous mode when counter based events are present ?
- event pools are per context which can have in order and out of order queues, you need to differentiate when you obtain events to not get counter-based event for ooq.
- you cannot evaluate each time ur_queue_handle_t_::usingCounterBasedEvents() , this needs to be const flag set at queue creation, in order queue cannot be made out of order queue, map browsing would kill perf
2c362a6
to
611f2c4
Compare
@MichalMrozek Thanks for the feedback.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this change has dependency on in order support
source/adapters/level_zero/event.cpp
Outdated
@@ -1124,8 +1129,8 @@ ur_result_t ur_event_handle_t_::reset() { | |||
|
|||
if (!isHostVisible()) | |||
HostVisibleEvent = nullptr; | |||
|
|||
ZE2UR_CALL(zeEventHostReset, (ZeEvent)); | |||
if (!UrQueue->usingCounterBasedEvents()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that event should know this instead of going to queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I can keep track of it during the creation of event. However, if it is passed in I would have no way of querying eventpool_desc.pnext from eventpool
source/adapters/level_zero/queue.cpp
Outdated
@@ -1818,6 +1850,9 @@ ur_queue_handle_t_::ur_queue_group_t::getZeQueue(uint32_t *QueueGroupOrdinal) { | |||
if (QueueIndex != 0) { | |||
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_EXPLICIT_ONLY; | |||
} | |||
if (Queue->usingCounterBasedEvents()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be other way around, if queue is in order , then you can create counter based events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZeQueue hasn't been created at this point. If we are using counter-based event then the zequeue needs to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @MichalMrozek is referring to the application Queue being passed in AKA the UR Queue object in this line. You can check if this Queue was created to be in-order using the Queue->isInOrderQueue(), since this is set on ur_queue creation.
Also I believe it should also be noted that ZE_COMMAND_QUEUE_FLAG_IN_ORDER flag should only used to create immediate command lists specifically using zeCommandListCreateImmediate(), while I see this function is used to create an ordinary queue for regular command lists using zeCommandQueueCreate(). Please correct me if I'm wrong @MichalMrozek
Anyhow, the PR I have up for merging soon related to in-order at #1372 should cover the setting of these flags (ZE_COMMAND_QUEUE_FLAG_IN_ORDER for immediate command lists & ZE_COMMAND_LIST_FLAG_IN_ORDER for regular command lists) appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@winstonzhang-intel Gotcha, yeah I just saw your usingCounterBasedEvents() already checks for isInOrderQueue() inside of it now. Thanks for linking that L0 conformance test, that is interesting, it seems to conflict with the specification definition of it above, since the spec link indicates its ZE_COMMAND_QUEUE_FLAG_IN_ORDER should be for immediate command lists only. @MichalMrozek Can you confirm which is is appropriate? Since the spec and conformance tests are conflicting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For regular command lists the flag is ZE_COMMAND_LIST_FLAG_IN_ORDER , passed in command list descriptor
For immediate command lists the flag is ZE_COMMAND_QUEUE_FLAG_IN_ORDER , passed in queue descriptor
source/adapters/level_zero/queue.cpp
Outdated
ZE2UR_CALL(zeCommandListCreate, (Context->ZeContext, Device->ZeDevice, | ||
&ZeCommandListDesc, &ZeCommandList)); | ||
|
||
ZE2UR_CALL(zeFenceCreate, (ZeCommandQueue, &ZeFenceDesc, &ZeFence)); | ||
if (!usingCounterBasedEvents()) { | ||
ZE2UR_CALL(zeFenceCreate, (ZeCommandQueue, &ZeFenceDesc, &ZeFence)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not disable fences, they are still needed to recycle command lists.
e7901bc
to
b5cc052
Compare
source/adapters/level_zero/event.hpp
Outdated
ur_result_t | ||
EventCreate(ur_context_handle_t Context, ur_queue_handle_t Queue, | ||
bool IsMultiDevice, bool HostVisible, ur_event_handle_t *RetEvent, | ||
std::optional<bool> CounterBasedEventEnabled = std::nullopt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::optional<bool> CounterBasedEventEnabled = std::nullopt); | |
bool CounterBasedEventEnabled = false); |
?
@@ -468,7 +468,8 @@ static const uint32_t MaxNumEventsPerPool = [] { | |||
|
|||
ur_result_t ur_context_handle_t_::getFreeSlotInExistingOrNewPool( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's ought to be a better way of allocating an event that doesn't require passing around half a dozen arguments to multiple functions. Maybe some sort of flag?
auto event = Pool->allocateEvent(HOST_VISIBLE | ENABLE_PROFILER | COUNTER_BASED);
having functions that accept multiple boolean values is error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbalcer Thanks for the feedback! We could create a new enum just for this case. However, this seems to be a bit overkill just for this one function. It is only used once event.cpp during event creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just this function. All the various boolean event parameters are being passed around in multiple functions. Just from a quick search:
createEventAndAssociateQueue, getEventFromQueueCache, EventCreate, getEventFromContextCache, getFreeSlotInExistingOrNewPool, getZeEventPoolCache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbalcer This is quite a large refactoring. Functions with these booleans are abundant. Additionally, it would make sense to add this in ur_api, however, we would need to add this in spec before I can do so. Perhaps I can file a new ticket and have this be its own patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this was just a suggestion.
@@ -510,6 +511,12 @@ ur_result_t ur_context_handle_t_::getFreeSlotInExistingOrNewPool( | |||
ZeEventPoolDesc.flags |= ZE_EVENT_POOL_FLAG_HOST_VISIBLE; | |||
if (ProfilingEnabled) | |||
ZeEventPoolDesc.flags |= ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP; | |||
if (CounterBasedEventEnabled) { | |||
ZeEventPoolDesc.flags |= ZE_EVENT_POOL_FLAG_HOST_VISIBLE; | |||
ze_event_pool_counter_based_exp_desc_t counterBasedExt = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might need to update the getZeEventPoolCache(HostVisible, ProfilingEnabled, ZeDevice);
call above. Otherwise, based on what the queue was that happened to call this function first for this combination of parameters, the event pool might or might not have this flag set.
And, again, I suggest a small refactoring here to use flags (or some other mechanism) instead of passing yet another boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Queue should have the flag set on creation. Because of that, all the subsequent eventpools should be cached with counter-based events enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this cache is in the context, so if you have two queues, one with counter-based events enabled and one without, the one that was first in this function will create the event pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event pools are stored in context not in a queue
There may be different types of queues coming in and they would require different type of events.
Pool cannot return incompatible event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the same mechanism as previously implemented, the eventpool cache returned will be guaranteed to be with counterbased enabled if it is selected, since now there is a portion of the eventpoolcache that is reserved just for counterbased event pools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this desperately needs a refactor though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll file a ticket tomorrow for all the refactoring, should be pretty straight forward 👍
8050b66
to
54d81b8
Compare
} else { | ||
return WithProfiling ? &ZeEventPoolCache[0] : &ZeEventPoolCache[1]; | ||
} | ||
profiling_index_a = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what this logic is doing ?
is it to choose cache index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes to choose the cache index depending on hostVisible and CounterBasedEventEnabled.
profiling_index_b = 3; | ||
} | ||
if (CounterBasedEventEnabled) { | ||
profiling_index_a += 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid magic numbers, please move this logic to separate helper function
std::vector<std::unordered_map<ze_device_handle_t, size_t>> | ||
ZeEventPoolCacheDeviceMap{4}; | ||
ZeEventPoolCacheDeviceMap{8}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 8 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 EventPoolCacheDeviceMap reserved for each: hostVisible, not hostvisible, device and hostvisible and device and not hostvisible. Since we can't query the ext flags of eventpooldesc from eventpoolhandle, we have to reserve the space in case of each of these situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you need profiling on top of that?
Which pool would you choose?
If you want to permute all combination this would quickly be out of control.
- host visible
- counter based for immediate
- counter based for non immediate
- profiling
we are already at 16
source/adapters/level_zero/queue.cpp
Outdated
if (QueueIndex != 0) { | ||
if (Queue->Device->useDriverInOrderLists() && Queue->isInOrderQueue()) { | ||
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_IN_ORDER; | ||
} else if (QueueIndex != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this logic is in else?
it applies to in order queues as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the in-order list patch Not related to counter-based events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you re-base ?
the in order patch is merged and has different change here
source/adapters/level_zero/queue.cpp
Outdated
@@ -1820,6 +1836,11 @@ ur_queue_handle_t_::ur_queue_group_t::getZeQueue(uint32_t *QueueGroupOrdinal) { | |||
if (QueueIndex != 0) { | |||
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_EXPLICIT_ONLY; | |||
} | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why code is commented?
please remove if not needed
41df918
to
0c9b051
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please re-base patch after in order command list change, right now it is hard to distinguish what are new additions and what comes from in order lists patch.
int *profiling_index_a, | ||
int *profiling_index_b) { | ||
if (HostVisible) { | ||
*profiling_index_a = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this code more maintainable ?
assign some enum for each pool type, shift by enum sizes ?
Right now it would be very difficult to understand what is happening here and why, also profiling_index_a and profiling_index_b doesn't tell a lot what happens here.
source/adapters/level_zero/queue.cpp
Outdated
if (QueueIndex != 0) { | ||
if (Queue->Device->useDriverInOrderLists() && Queue->isInOrderQueue()) { | ||
ZeCommandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_IN_ORDER; | ||
} else if (QueueIndex != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you re-base ?
the in order patch is merged and has different change here
0c9b051
to
a504ead
Compare
Thank you @nrspruit for looking into this. Once this patch is merged I will post another for the refactoring that @pbalcer requested. |
4134bfc
to
4ef258c
Compare
f6c1083
to
6e3be6f
Compare
6e3be6f
to
19848ae
Compare
19848ae
to
b492023
Compare
b492023
to
9b82c1d
Compare
commit tag: 9b82c1d0a6df26d96797218d6964faf0e6dbc0ce URT PR: oneapi-src/unified-runtime#1370 Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the updated changes.
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
Now counter based events will be enabled by default only on PVC+, and only enabled for immCmdLists. Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
cfde409
to
39fcb2b
Compare
…#12848) commit tag: 4134bfce72d33e89eebcad11186bdf00310bba83 URT PR: oneapi-src/unified-runtime#1370 --------- Signed-off-by: Zhang, Winston <winston.zhang@intel.com> Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Counter-based events implementation.
Counter-based events can be enabled via the flag UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
Below are the three conformance tests that ensures that this implementation is working as expected:
LLVM Draft with CI passing: intel/llvm#12848
Rebased against Raiyan's in-order list implementation