-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Executor 2.0 #5528
Executor 2.0 #5528
Conversation
4e618df
to
3dd85e6
Compare
e2f392c
to
bb9b2d5
Compare
c1ecd90
to
fa82135
Compare
CI MESSAGE: [16632816]: BUILD STARTED |
CI MESSAGE: [16632816]: BUILD FAILED |
CI MESSAGE: [16675503]: BUILD FAILED |
CI MESSAGE: [16675793]: BUILD STARTED |
CI MESSAGE: [16675793]: BUILD FAILED |
51e4976
to
ecdbb55
Compare
CI MESSAGE: [16715762]: BUILD STARTED |
CI MESSAGE: [16715762]: BUILD FAILED |
fcc9d27
to
803da65
Compare
aa197c2
to
97dfdcb
Compare
CI MESSAGE: [18167691]: BUILD FAILED |
CI MESSAGE: [18186935]: BUILD STARTED |
daliCreatePipeline2(&handle, serialized.c_str(), serialized.size(), batch_size, num_thread, | ||
this->device_id_, false, false, false, | ||
prefetch_queue_depth, prefetch_queue_depth, prefetch_queue_depth, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Executor 2.0 doesn't provide ExecutorMeta and doesn't support buffer preallocation (because it's against it's very design) - and it can forcibly replace the async-pipelined one if the environment variable is specified. Hence, use non-async, non-pipelined executor in meta and preallocation tests.
const bool pipelined = false; | ||
const bool async = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presizing is not supported for Executor 2.0, which can be foricbly enabled with env. DALI_USE_EXEC2=1
.
23feefc
to
810ad5c
Compare
CI MESSAGE: [18186935]: BUILD PASSED |
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
…ice. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
CI MESSAGE: [18251041]: BUILD STARTED |
CI MESSAGE: [18251041]: BUILD PASSED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments for possible enhancements
template <typename... T> | ||
std::unique_ptr<ExecutorBase> GetExecutorImpl(bool pipelined, bool separated, bool async, | ||
T&&... args) { | ||
if (async && separated && pipelined) { | ||
return std::make_unique<AsyncSeparatedPipelinedExecutor>(std::forward<T>(args)...); | ||
} else if (async && !separated && pipelined) { | ||
return std::make_unique<AsyncPipelinedExecutor>(std::forward<T>(args)...); | ||
if (ForceExec2()) { | ||
std::cerr << "\n!!! Forced use of Executor 2.0 !!!" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nitpick: I'm wondering whether this message fits in cerr
stream - as it's not really an error, more like info. Maybe simple cout
should be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit fuzzy, but I see it as a warning rather than output.
return cfg; | ||
} | ||
|
||
bool ForceExec2() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe a documentation saying, what does the return value mean would be good here. From the name I'd say this is more like a void
function.
enum class QueueDepthPolicy : int { | ||
FullyBuffered, //< All operators maintain a queue | ||
BackendChange, //< Only operators followed by one with a different backend have a queue | ||
OutputOnly, //< Only the pipeline output has multiple buffers | ||
Legacy = BackendChange, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it hard to understand from this documentation (only this one, the other enums are clear). Could you extend the docs slightly and maybe back it up with some example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll try to clarify.
Single, //< There's just one stream that's used by all operators | ||
PerBackend, //< Operators are scheduled on a stream specific to their backend (mixed or GPU) | ||
PerOperator //< Independent operators are executed on separate streams. | ||
|
||
// TODO(michalz): Check if this is legal with existing operator implementations - likely not | ||
// PerIteration, //< Streams are cycled on a per-iteration basis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add that it's the CUDA stream that we're referring here to. Just to be perfectly clear.
enum class OperatorConcurrency : int { | ||
None, //< at no time can mutliple operators run | ||
Backend, //< operators from different backends can execute in parallel | ||
Full, //< independent operators can run in parallel | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a concurrency limit on operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of operators that can be run in parallel is limited by the number of threads in the executor.
Otherwise the concurrency is limited by the policy (as described in this enum), the graph topology and the operators being implicitly non-reentrant (you can't run the next iteration of an operator before the previous one is finished).
// Must be 1st member to be destroyed last. | ||
std::optional<DeviceGuard> dtor_guard_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add some test/static_assert that will check this condition? I'm asking, because somebody may by accident add a member above (e.g. in unlabelled private
section right after class
keyword)
Also, how this would relate if somebody adds a static variable e.g in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope the comment will serve to address the former. I could add a base class, but there's nothing that could stop a careless contributor from adding another one before that.
Unfortunately, I don't think there's a way to have such a check. It's possible only if the enclosing class has no virtual functions (which may be true at the time, but it'd be limiting).
I'll move it to the top of the class.
As for the second - well, that would be extremely bad programming - and adding a static variable that depends on the thread-local state (current device) is asking for trouble to say the least.
dali/python/backend_impl.cc
Outdated
py::gil_scoped_acquire aqr; | ||
{ | ||
auto tmp = std::move(obj_ref); | ||
(void)tmp; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lambda is quite unclear - e.g. why the (void)tmp;
. Could you add a description what happens there and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "trick" stores the object whose lifetime we want to prolong in the lambda closure. When the shared pointer is destroyed, the deleter (the lambda) is invoked. In the lambda, we acquire GIL and destroy the contents of the closure while GIL is held. I've described it in the comments, too.
…plain closure destruction / GIL trick. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [18263317]: BUILD STARTED |
} | ||
|
||
void Executor2::Init() { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it on purpose that impl_->Init is not called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing to call, so yes. I can mark it as no-op, as suggested below.
} | ||
|
||
void Executor2::ReleaseOutputs() { | ||
// no-op |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that it directly says no-op, can you mark other no-op calls as well (Init?, EnableMemoryStats?)
return prefetch_depth_; | ||
} | ||
|
||
OperatorBase *GetOperator(std::string_view input_name) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How likely is that the GetOperator will fail to find a given input_name?
If it is highly unlikely, maybe throwing an exception would be better than notifying about an error by returning a special value?
I understand that this would probably induce more significant refactoring, so this may be a suggestion for future improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent suggestion, but as you rightfully pointed out, it calls for its own PR. I've looked at usages and in many places the caller simply assumes that the function returns non-null. There are a few places where the return value is checked - if the need to have those checks, then I'd rename this function to GetOperatorPtr
and add an inline OperarorBase &GetOperator(sring_view name)
which would check the result of GetOperatorPtr and throw if it's null.
} | ||
|
||
enum class State { | ||
New = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you expect to have more states in the future (if not ignore the comment)? If so, I would ask myself if any transitions could be in invalid, e.g. (Building->Running)? I guess I would be happy to see a state transition method, that would throw an exception if advancing between specific states is not correct. This would give an opportunity to find such bugs.
CI MESSAGE: [18263317]: BUILD FAILED |
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
CI MESSAGE: [18268756]: BUILD STARTED |
CI MESSAGE: [18268756]: BUILD FAILED |
CI MESSAGE: [18268756]: BUILD PASSED |
Category:
New feature (non-breaking change which adds functionality)
Description:
This PR adds the Executor 2.0 facade.
It does not expose it in Python yet nor does it allow the Python user to construct D2H transitions.
The PR consists of 3 parts:
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Existing tests can be reused by specifying DALI_USE_EXEC2=1 in CI arguments (or locally, as an environment variable).
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4030