Executor 2.0 #5528

mzient · 2024-06-17T10:55:43Z

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR adds the Executor 2.0 facade.
It does not yet expose the new executor in Python, nor does it allow the Python user to construct D2H transitions.

The PR consists of 5 parts:

the implementation of the ExecutorBase interface
changes in ExecutorFactory that substitute the new executor wherever the AsyncPipelined executor would be used, controlled by the DALI_USE_EXEC2 environment variable
changes in the buffer presizing and ExecutorMeta tests so that they use a different executor type - Executor 2.0 provides neither feature (presizing is incompatible with its dynamic allocation approach)
changes in Python backend related to memory destruction timing (mostly GIL related)
new CI tests that explicitly enable or disable DALI_USE_EXEC2 flag

Additional information:

Affected modules and functionalities:

ExecutorFactory
Buffer presizing tests
ExecutorMeta tests
CI jobs

Key points relevant for the review:

Tests:

Existing tests can be reused by specifying DALI_USE_EXEC2=1 in CI arguments (or locally, as an environment variable).

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: DALI-4030

dali-automaton · 2024-07-16T08:36:54Z

CI MESSAGE: [16632816]: BUILD STARTED

dali-automaton · 2024-07-16T09:20:54Z

CI MESSAGE: [16632816]: BUILD FAILED

dali-automaton · 2024-07-17T15:21:45Z

CI MESSAGE: [16675503]: BUILD FAILED

dali-automaton · 2024-07-17T15:33:34Z

CI MESSAGE: [16675793]: BUILD STARTED

dali-automaton · 2024-07-17T23:19:07Z

CI MESSAGE: [16675793]: BUILD FAILED

dali-automaton · 2024-07-18T18:59:03Z

CI MESSAGE: [16715762]: BUILD STARTED

dali-automaton · 2024-07-19T00:31:40Z

CI MESSAGE: [16715762]: BUILD FAILED

dali/test/python/test_external_source_cupy.py

dali-automaton · 2024-09-05T20:55:52Z

CI MESSAGE: [18167691]: BUILD FAILED

dali-automaton · 2024-09-06T09:20:49Z

CI MESSAGE: [18186935]: BUILD STARTED

mzient · 2024-09-06T09:23:09Z

dali/c_api/c_api_test.cc

+  daliCreatePipeline2(&handle, serialized.c_str(), serialized.size(), batch_size, num_thread,
+                      this->device_id_, false, false, false,
+                      prefetch_queue_depth, prefetch_queue_depth, prefetch_queue_depth, true);


Executor 2.0 doesn't provide ExecutorMeta and doesn't support buffer preallocation (because that goes against its very design) - and it can forcibly replace the async-pipelined executor if the environment variable is specified. Hence, use the non-async, non-pipelined executor in the meta and preallocation tests.

mzient · 2024-09-06T09:25:25Z

dali/pipeline/pipeline_test.cc

+  const bool pipelined = false;
+  const bool async =  false;


Presizing is not supported by Executor 2.0, which can be forcibly enabled with the environment variable DALI_USE_EXEC2=1.

dali-automaton · 2024-09-06T19:18:14Z

CI MESSAGE: [18186935]: BUILD PASSED

dali-automaton · 2024-09-09T05:37:32Z

CI MESSAGE: [18251041]: BUILD STARTED

dali-automaton · 2024-09-09T08:48:57Z

CI MESSAGE: [18251041]: BUILD PASSED

szalpal

Added some comments for possible enhancements

szalpal · 2024-09-09T10:24:25Z

dali/pipeline/executor/executor_factory.cc

 template <typename... T>
 std::unique_ptr<ExecutorBase> GetExecutorImpl(bool pipelined, bool separated, bool async,
                                              T&&... args) {
  if (async && separated && pipelined) {
    return std::make_unique<AsyncSeparatedPipelinedExecutor>(std::forward<T>(args)...);
  } else if (async && !separated && pipelined) {
-    return std::make_unique<AsyncPipelinedExecutor>(std::forward<T>(args)...);
+    if (ForceExec2()) {
+      std::cerr << "\n!!! Forced use of Executor 2.0 !!!" << std::endl;


That's a nitpick: I'm wondering whether this message belongs in the cerr stream, as it's not really an error, more like info. Maybe a simple cout would be sufficient?

It's a bit fuzzy, but I see it as a warning rather than output.

szalpal · 2024-09-09T10:25:58Z

dali/pipeline/executor/executor_factory.cc

+  return cfg;
+}
+
+bool ForceExec2() {


I believe documentation stating what the return value means would be good here. From the name, I'd have guessed this is a void function.
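As an illustration of the kind of documented predicate being asked for, here is a minimal sketch. It is not the implementation from the PR: the helper names `IsExec2Requested` and `ForceExec2Sketch` are hypothetical, and the rule that any non-empty value other than "0" enables the flag is an assumption. Only the variable name DALI_USE_EXEC2 comes from the PR description.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interprets the raw value of an environment variable.
// ASSUMPTION: unset, empty and "0" mean "off"; anything else means "on".
// The real parsing rules in DALI may differ.
inline bool IsExec2Requested(const char *value) {
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

/// Returns true if DALI_USE_EXEC2 requests that Executor 2.0 be used
/// in place of the async pipelined executor, false otherwise.
inline bool ForceExec2Sketch() {
  return IsExec2Requested(std::getenv("DALI_USE_EXEC2"));
}
```

Splitting the parsing from the `std::getenv` call keeps the rule testable without modifying the process environment.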

szalpal · 2024-09-09T10:37:54Z

dali/pipeline/executor/executor2/exec2.h

+enum class QueueDepthPolicy : int {
+  FullyBuffered,  //< All operators maintain a queue
+  BackendChange,  //< Only operators followed by one with a different backend have a queue
+  OutputOnly,     //< Only the pipeline output has multiple buffers
+  Legacy = BackendChange,
+};


I found it hard to understand from this documentation (only this one, the other enums are clear). Could you extend the docs slightly and maybe back it up with some example?

OK, I'll try to clarify.

szalpal · 2024-09-09T10:38:35Z

dali/pipeline/executor/executor2/exec2.h

+  Single,       //< There's just one stream that's used by all operators
+  PerBackend,   //< Operators are scheduled on a stream specific to their backend (mixed or GPU)
+  PerOperator   //< Independent operators are executed on separate streams.
+
+  // TODO(michalz): Check if this is legal with existing operator implementations - likely not
+  // PerIteration, //< Streams are cycled on a per-iteration basis


I'd add that it's the CUDA stream that we're referring to here. Just to be perfectly clear.

szalpal · 2024-09-09T10:46:40Z

dali/pipeline/executor/executor2/exec2.h

+enum class OperatorConcurrency : int {
+  None,      //< at no time can mutliple operators run
+  Backend,   //< operators from different backends can execute in parallel
+  Full,      //< independent operators can run in parallel
+};


Is there a concurrency limit on operators?

The number of operators that can be run in parallel is limited by the number of threads in the executor.
Otherwise the concurrency is limited by the policy (as described in this enum), the graph topology and the operators being implicitly non-reentrant (you can't run the next iteration of an operator before the previous one is finished).

szalpal · 2024-09-09T10:48:19Z

dali/pipeline/executor/executor2/exec2.cc

+  // Must be 1st member to be destroyed last.
+  std::optional<DeviceGuard> dtor_guard_;


Is it possible to add some test/static_assert that will check this condition? I'm asking because somebody may accidentally add a member above it (e.g. in an unlabelled private section right after the class keyword).

Also, how would this behave if somebody adds a static variable, e.g. in the constructor?

I hope the comment will serve to address the former. I could add a base class, but there's nothing that could stop a careless contributor from adding another one before that.
Unfortunately, I don't think there's a way to have such a check. It's possible only if the enclosing class has no virtual functions (which may be true at the time, but it'd be limiting).
I'll move it to the top of the class.

As for the second - well, that would be extremely bad programming - and adding a static variable that depends on the thread-local state (current device) is asking for trouble to say the least.
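The rule the comment relies on is that non-static data members are destroyed in reverse order of declaration, so the first member outlives all the others. A small self-contained sketch of that rule follows; `Tracer` and `ExecutorLike` are illustrative names and are not taken from the PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Appends its name to a log when destroyed, so the order can be observed.
struct Tracer {
  Tracer(std::string name, std::vector<std::string> *log)
      : name_(std::move(name)), log_(log) {}
  ~Tracer() { log_->push_back(name_); }
  std::string name_;
  std::vector<std::string> *log_;
};

// Illustrative stand-in for a class with a guard declared as its first member.
struct ExecutorLike {
  explicit ExecutorLike(std::vector<std::string> *log)
      : guard_("guard", log), worker_("worker", log) {}
  Tracer guard_;   // declared first, therefore destroyed last
  Tracer worker_;  // declared second, therefore destroyed first
};

// Constructs and destroys an ExecutorLike, returning the destruction order.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  { ExecutorLike e(&log); }
  return log;
}
```

Here `worker_` is logged before `guard_`, which is why a member added above the guard would be destroyed after it.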

szalpal · 2024-09-09T10:58:01Z

dali/python/backend_impl.cc

+    py::gil_scoped_acquire aqr;
+    {
+      auto tmp = std::move(obj_ref);
+      (void)tmp;
+    }


This lambda is quite unclear, e.g. why the (void)tmp;? Could you add a description of what happens there and why?

This "trick" stores the object whose lifetime we want to prolong in the lambda closure. When the shared pointer is destroyed, the deleter (the lambda) is invoked. In the lambda, we acquire GIL and destroy the contents of the closure while GIL is held. I've described it in the comments, too.
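The pattern described above can be sketched without pybind11. In this sketch the GIL is replaced by a mutex plus a flag (`FakeGilAcquire` stands in for `py::gil_scoped_acquire`), and `Probe`, `ObjRef` and `WrapWithOwner` are hypothetical names introduced only for illustration. It shows the mechanism, not the code in backend_impl.cc.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Stand-in for the Python GIL (assumption: a mutex plus a flag is enough to illustrate).
static std::mutex fake_gil;
static bool fake_gil_held = false;

struct FakeGilAcquire {
  FakeGilAcquire() : lock_(fake_gil) { fake_gil_held = true; }
  ~FakeGilAcquire() { fake_gil_held = false; }  // flag is cleared before lock_ is released
  std::unique_lock<std::mutex> lock_;
};

// Stand-in for a Python object: records whether the "GIL" was held when it died.
struct Probe {
  explicit Probe(bool *out) : destroyed_under_gil(out) {}
  ~Probe() { *destroyed_under_gil = fake_gil_held; }
  bool *destroyed_under_gil;
};
using ObjRef = std::shared_ptr<Probe>;

// Returns a pointer to `data` whose deleter keeps `obj_ref` alive in its closure.
// The memory behind `data` is not freed here; it is owned by `obj_ref`.
std::shared_ptr<void> WrapWithOwner(void *data, ObjRef obj_ref) {
  return std::shared_ptr<void>(data, [obj_ref = std::move(obj_ref)](void *) mutable {
    FakeGilAcquire acquire;
    {
      // Move the reference out of the closure so that it is destroyed here,
      // while the lock is held, and not later when the closure itself dies.
      auto tmp = std::move(obj_ref);
      (void)tmp;  // silences the unused-variable warning
    }
  });
}
```

Without the inner move, the captured reference would be released only when the closure is destroyed together with the control block, which happens outside the scope of the lock.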

dali-automaton · 2024-09-09T15:09:28Z

CI MESSAGE: [18263317]: BUILD STARTED

mdabek-nvidia · 2024-09-09T14:14:10Z

dali/pipeline/executor/executor2/exec2.cc

+}
+
+void Executor2::Init() {
+}


Is it on purpose that impl_->Init is not called?

There's nothing to call, so yes. I can mark it as no-op, as suggested below.

mdabek-nvidia · 2024-09-09T14:15:57Z

dali/pipeline/executor/executor2/exec2.cc

+}
+
+void Executor2::ReleaseOutputs() {
+  // no-op


I like that it directly says no-op, can you mark other no-op calls as well (Init?, EnableMemoryStats?)

mdabek-nvidia · 2024-09-09T16:28:45Z

dali/pipeline/executor/executor2/exec2.cc

+    return prefetch_depth_;
+  }
+
+  OperatorBase *GetOperator(std::string_view input_name) const {


How likely is it that GetOperator will fail to find a given input_name?
If it is highly unlikely, maybe throwing an exception would be better than notifying about an error by returning a special value?
I understand that this would probably induce more significant refactoring, so this may be a suggestion for future improvement.

That's an excellent suggestion, but as you rightfully pointed out, it calls for its own PR. I've looked at the usages, and in many places the caller simply assumes that the function returns non-null. There are a few places where the return value is checked - if they need to keep those checks, then I'd rename this function to GetOperatorPtr and add an inline OperatorBase &GetOperator(std::string_view name) which would check the result of GetOperatorPtr and throw if it's null.
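A minimal sketch of the proposed split is shown below. `OperatorLookup`, its `Add` method, the map-based storage and the exception type are assumptions made for illustration; only the names GetOperatorPtr and GetOperator come from the comment above.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

struct OperatorBase {};  // placeholder for the real operator type

// Hypothetical container illustrating the pointer/reference accessor pair.
class OperatorLookup {
 public:
  void Add(std::string name, OperatorBase *op) { ops_[std::move(name)] = op; }

  // Returns nullptr when the operator is not found; callers that can cope
  // with a missing operator use this overload and check the result.
  OperatorBase *GetOperatorPtr(std::string_view name) const {
    auto it = ops_.find(name);  // std::less<> enables lookup by string_view
    return it != ops_.end() ? it->second : nullptr;
  }

  // Throws when the operator is not found; callers that assume success use this.
  OperatorBase &GetOperator(std::string_view name) const {
    if (OperatorBase *op = GetOperatorPtr(name))
      return *op;
    throw std::invalid_argument("Operator not found: " + std::string(name));
  }

 private:
  std::map<std::string, OperatorBase *, std::less<>> ops_;
};
```

Callers that previously assumed a non-null result would switch to the reference-returning overload, so a missing operator surfaces as an exception instead of a null dereference.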

mdabek-nvidia · 2024-09-09T17:11:05Z

dali/pipeline/executor/executor2/exec2.cc

+  }
+
+  enum class State {
+    New = 0,


Do you expect to have more states in the future (if not, ignore this comment)? If so, I would ask whether any transitions could be invalid, e.g. Building->Running. I would be happy to see a state transition method that throws an exception if advancing between specific states is not allowed. This would give us an opportunity to catch such bugs.
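A checked transition method of the kind suggested here could look like the sketch below. Only `New` appears in the quoted diff; `Building` and `Running` are taken from the reviewer's example, while `Built`, the set of allowed transitions and the class name `StateTracker` are assumptions made purely for illustration.

```cpp
#include <stdexcept>

// ASSUMPTION: a linear New -> Building -> Built -> Running lifecycle.
enum class State { New = 0, Building, Built, Running };

// Returns true if moving from `from` to `to` is allowed in this sketch.
inline bool IsValidTransition(State from, State to) {
  switch (from) {
    case State::New:      return to == State::Building;
    case State::Building: return to == State::Built;
    case State::Built:    return to == State::Running;
    case State::Running:  return false;
  }
  return false;
}

// Hypothetical holder that refuses invalid transitions.
class StateTracker {
 public:
  State state() const { return state_; }

  // Advances to `next`, or throws and leaves the state unchanged.
  void Advance(State next) {
    if (!IsValidTransition(state_, next))
      throw std::logic_error("Invalid executor state transition");
    state_ = next;
  }

 private:
  State state_ = State::New;
};
```

With this in place, an attempt to go from Building directly to Running fails at the point of the call instead of producing a misbehaving executor later.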

dali-automaton · 2024-09-09T17:32:22Z

CI MESSAGE: [18263317]: BUILD FAILED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

dali-automaton · 2024-09-09T19:01:08Z

CI MESSAGE: [18268756]: BUILD STARTED

dali-automaton · 2024-09-09T19:25:19Z

CI MESSAGE: [18268756]: BUILD FAILED

dali-automaton · 2024-09-09T21:31:33Z

CI MESSAGE: [18268756]: BUILD PASSED

mzient force-pushed the executor3 branch 3 times, most recently from 4e618df to 3dd85e6 Compare June 24, 2024 09:07

mzient force-pushed the executor3 branch 3 times, most recently from e2f392c to bb9b2d5 Compare July 4, 2024 12:54

mzient force-pushed the executor3 branch 5 times, most recently from c1ecd90 to fa82135 Compare July 10, 2024 17:45

mzient force-pushed the executor3 branch from 7edcfab to 21a1937 Compare July 16, 2024 08:35

NVIDIA deleted a comment from dali-automaton Jul 16, 2024

mzient force-pushed the executor3 branch from 21a1937 to ac3b4b8 Compare July 16, 2024 10:02

mzient force-pushed the executor3 branch from b8da6ab to ae06b04 Compare July 17, 2024 15:32

mzient force-pushed the executor3 branch 2 times, most recently from 51e4976 to ecdbb55 Compare July 18, 2024 18:42

mzient force-pushed the executor3 branch 2 times, most recently from fcc9d27 to 803da65 Compare July 19, 2024 15:55

github-advanced-security bot found potential problems Jul 19, 2024

dali/test/python/test_external_source_cupy.py Fixed

mzient force-pushed the executor3 branch 2 times, most recently from aa197c2 to 97dfdcb Compare July 23, 2024 08:01

mzient changed the title ~~[WIP] Executor2~~ Executor 2.0 Sep 5, 2024

mzient mentioned this pull request Sep 6, 2024

Fix multiple initialization attempts in optical flow operator. #5624

Merged

18 tasks

mzient commented Sep 6, 2024

mzient force-pushed the executor3 branch 2 times, most recently from 23feefc to 810ad5c Compare September 6, 2024 12:54

mzient and others added 3 commits September 9, 2024 00:28

Executor 2.0 pipeline integration.

67bb59a

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

Use environment variable in some L0 tests to control the executor cho…

1c4545d

…ice. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Add -exec2 tests for Xavier.

4bc1d79

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

mzient force-pushed the executor3 branch from 810ad5c to 4bc1d79 Compare September 8, 2024 22:28

Remove test file duplication.

b283738

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

mzient marked this pull request as ready for review September 8, 2024 22:36

dali-automaton assigned mdabek-nvidia and szalpal Sep 9, 2024

szalpal approved these changes Sep 9, 2024

Clarify the policies. Move the dtor_guard to the top of the class. Ex…

84ef7f1

…plain closure destruction / GIL trick. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

mdabek-nvidia reviewed Sep 9, 2024

Add comments; explicitly mark no-op functions.

1c40f54

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

mdabek-nvidia approved these changes Sep 10, 2024

mzient merged commit f1a9a4d into NVIDIA:main Sep 10, 2024
6 checks passed
