[SYCL] remove dependencies from memcpy, memset and kernel launch #2075
base: develop
  // Execute task
  if constexpr(is_sycl_task<TTask> && !is_sycl_kernel<TTask>) // Copy / Fill
  {
-     m_last_event = task(m_queue, m_dependencies); // Will call queue.{copy, fill} internally
+     m_last_event = task(m_queue); // Will call queue.{copy, fill} internally
  }
  else
  {
      m_last_event = m_queue.submit(
This looks dangerously like a race condition.
Do you mean between the call to enqueue() and a possible call to get_last_event() or empty()?
I'm worried about the assignment here. What happens if two threads enqueue work concurrently? Do we have a guarantee that m_last_event will actually refer to the last event?
Ah, I see.
You are right: if multiple threads enqueue a task to the same queue, there is no guarantee about which of the two events would be stored in m_last_event.
Possible solutions:
- add back the lock, and use it only in enqueue(), empty(), get_last_event() (and maybe wait()? probably not)
- remove the m_last_event member and
  - implement get_last_event() with a call to m_queue.ext_oneapi_submit_barrier()
  - implement empty() with a call to m_queue.ext_oneapi_empty()

Unfortunately, according to the extension itself, ext_oneapi_empty() is not supported by the OpenCL back-end :-(
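For illustration, the first option (keeping a lock around the submission and the m_last_event assignment) could look roughly like the sketch below. It is plain standard C++ rather than SYCL: MockEvent, LockedQueue and run_concurrent_enqueues are hypothetical names made up for this example, and the "event" is just a sequence number, so this only shows the locking pattern and is not the actual alpaka code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a SYCL event: just a sequence number.
struct MockEvent
{
    std::uint64_t id = 0;
};

// Hypothetical stand-in for the queue wrapper (not the real alpaka class).
class LockedQueue
{
public:
    // Run the task and store its event while holding the lock, so that the
    // stored event always belongs to the most recently submitted task.
    template<typename TTask>
    void enqueue(TTask&& task)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        task(); // stands in for the submission to the underlying queue
        m_last_event = MockEvent{++m_submitted};
    }

    // Read the last event under the same lock as enqueue().
    auto get_last_event() const -> MockEvent
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        return m_last_event;
    }

private:
    mutable std::mutex m_mutex;
    std::uint64_t m_submitted = 0;
    MockEvent m_last_event{};
};

// Enqueue empty tasks from several threads and return the id of the event
// that ends up stored as the last one.
inline auto run_concurrent_enqueues(unsigned threads, unsigned tasks_per_thread) -> std::uint64_t
{
    LockedQueue queue;
    std::vector<std::thread> workers;
    for(unsigned t = 0; t < threads; ++t)
    {
        workers.emplace_back(
            [&queue, tasks_per_thread]
            {
                for(unsigned i = 0; i < tasks_per_thread; ++i)
                    queue.enqueue([] {});
            });
    }
    for(auto& worker : workers)
        worker.join();
    return queue.get_last_event().id;
}
```

Because the submission and the assignment happen under one lock, the id returned by run_concurrent_enqueues(4, 1000) equals the total number of submitted tasks. Without the lock, two threads could obtain their events in one order and store them in the other, which is the race described above.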
From the extension link:
Currently support for OpenCL backend is limited, API introduced by this extension can be called only for in-order queues which doesn’t have discard_events property. Exception is thrown if new API is called on other type of queue. OpenCL currently doesn’t have an API to get queue status.
We are using in-order SYCL queues internally, so I think we should be fine, right?
Good point.
I guess we can try and see if it works :-)
34f2463 to 1ff18f4
Okay, the checks are looking good. Have you executed the runtime tests?
Yes, I've built and run the tests for CPU and GPU with the usual result (all pass on GPU, 4 out of resources on CPU).
(I'll also check the impact on the performance.)
From my point of view this is good to go. @fwyzard feel free to merge once the performance analysis is done.
Converting to draft until the performance analysis is ready.
The situation is worse than expected: apparently many more synchronizations are introduced with these changes.
Doing some profiling, these are the main differences:
What are those numbers? Runtime?
Yes, sorry.
Ah, I see. So lower numbers are worse. That is indeed a change for the worse then. How did you ensure that the requirements handling is correct in your third approach (only changes to
Not sure the handling is correct; it was done mostly to remove the
Hm, maybe open a bug report for Intel? I'm sure they would be interested to hear about this :-)
The requirements / dependencies were needed for the buffers, so they can now be removed.
SYCL queues and events are thread-safe, so the mutex in QueueGenericSyclBase.hpp has been removed as well.
When there is an event to wait for, instead of registering it as a dependency, the method ext_oneapi_submit_barrier(const std::vector<event> &) is used.