Add warnings against resetting pipeline before end of epoch and test parallel ES with fw iterator #4023

stiepan · 2022-06-29T09:27:42Z

Category:

Other (e.g. Documentation, Tests, Configuration)

Description:

If someone uses fw iterators and passes an incorrect size (i.e., different from what follows from the external source raising StopIteration), or one of the pipelines raises StopIteration while the others do not, the pipelines may be reset prematurely. This, in turn, can lead to various issues, from empty epochs to corrupted data.
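The failure mode can be illustrated with a toy model in plain Python. This is not DALI's API: `ToySource`, `ToyPipeline` and `run_epoch` are hypothetical names. The model assumes, as the pipeline.py change discussed below notes, that resetting before an external source has raised StopIteration is a no-op. An iterator whose explicit size is smaller than the real epoch therefore leaves the source mid-epoch, and the next "epoch" only sees the leftover batches.

```python
class ToySource:
    """Hypothetical external source: yields `length` batches, then raises StopIteration."""

    def __init__(self, length):
        self.length = length
        self.pos = 0

    def __call__(self):
        if self.pos >= self.length:
            raise StopIteration
        batch = self.pos
        self.pos += 1
        return batch


class ToyPipeline:
    """Hypothetical pipeline wrapping a single external source."""

    def __init__(self, source):
        self.source = source
        self.last_iter = False  # set once the source raised StopIteration

    def run(self):
        try:
            return self.source()
        except StopIteration:
            self.last_iter = True
            raise

    def reset(self):
        # Assumed behavior: reset is a no-op unless the epoch really ended
        if not self.last_iter:
            return
        self.source.pos = 0
        self.last_iter = False


def run_epoch(pipe, size=-1):
    """Hypothetical iterator: stops after `size` batches, or at StopIteration if size=-1."""
    batches = []
    try:
        while size < 0 or len(batches) < size:
            batches.append(pipe.run())
    except StopIteration:
        pass
    pipe.reset()
    return batches


pipe = ToyPipeline(ToySource(length=5))
print(run_epoch(pipe, size=3))  # stops early; the reset is silently a no-op
print(run_epoch(pipe, size=3))  # only the leftover batches, then the source raises
print(run_epoch(pipe, size=3))  # the reset after the short epoch took effect
```

With the default size=-1 the iterator always runs into StopIteration, so every reset is a real one and each epoch sees all five batches.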

The warnings are added in the hope that having them in the logs will help users, as well as us when handling GitHub issues.
Parallel external source (PES) tests are added to the iterator tests, as the existing tests focused on the pipeline's run API, which is not the one used by fw iterators.
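Checking in a test that such a warning is (or is not) emitted only needs the standard library `warnings` module. The helpers below, `assert_no_warnings` and `assert_warns_with`, are a hypothetical sketch: the PR adds an `assert_no_warnings` utility, but its actual implementation and signature are not shown here.

```python
import contextlib
import warnings


@contextlib.contextmanager
def assert_no_warnings():
    """Fail if any warning is emitted inside the `with` block (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide repeated warnings
        yield
    assert not caught, f"Unexpected warnings: {[str(w.message) for w in caught]}"


@contextlib.contextmanager
def assert_warns_with(fragment):
    """Fail unless some warning message contains `fragment` (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    assert any(fragment in str(w.message) for w in caught), (
        f"No warning containing {fragment!r}"
    )


with assert_warns_with("before the end of epoch"):
    warnings.warn("Pipeline reset before the end of epoch")

with assert_no_warnings():
    pass  # e.g. a correctly sized iteration over a pipeline
```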

Additional information:

Affected modules and functionalities:

Python pipeline
Python fw iterator (plugins)

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: DALI-2864

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

stiepan · 2022-06-29T10:58:03Z

!build

dali-automaton · 2022-06-29T11:00:07Z

CI MESSAGE: [5209598]: BUILD STARTED

stiepan · 2022-06-29T12:06:22Z

dali/python/nvidia/dali/pipeline.py

-        if self._last_iter:
+        if not self._last_iter:
+            # resetting before some external source raised StopIteration is a no-op
+            if self._input_callbacks:
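In plain terms, the changed condition could look like the sketch below. Only `_last_iter` and `_input_callbacks` come from the diff; the class name, the rest of the `reset` body, `_epoch_idx` handling and the warning text are assumptions for illustration, not the actual pipeline.py code.

```python
import warnings


class PipelineSketch:
    """Hypothetical stand-in for the reset logic discussed in the diff."""

    def __init__(self, input_callbacks):
        self._input_callbacks = input_callbacks  # external sources driven by callbacks
        self._last_iter = False  # True once some external source raised StopIteration
        self._epoch_idx = 0

    def reset(self):
        if not self._last_iter:
            # Resetting before some external source raised StopIteration is a no-op
            if self._input_callbacks:
                warnings.warn(
                    "Pipeline reset before the end of epoch was reached; the call is a no-op.",
                    stacklevel=2,
                )
            return
        # End of epoch reached: start the next one
        self._last_iter = False
        self._epoch_idx += 1
```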


How legitimate is the use case of a pipeline with an infinite external source combined with an FW iterator that has the iterator size passed explicitly? In that case, the warning will be triggered too.

🤷 I guess it still can happen and we should behave consistently.

😟 I thought this one would be useful if one has two or more pipelines with (P)ES that raise StopIteration and should be reset, but for some reason their epochs diverge: either because the number of iterations really (but unintentionally) differs between them, or because one uses the .run API with prefetch_queue_depth 1 and resets all pipelines when the first one raises, not letting the others actually reach the end of the epoch.
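A check for that divergence could compare, at each step, which pipelines hit the end of epoch. `check_epoch_end` is a hypothetical helper, not a DALI function, and the policy of ending the epoch as soon as any pipeline runs out is an assumption of this sketch. It warns when only some of the pipelines raised StopIteration, which is exactly when resetting all of them would cut the others' epochs short.

```python
import warnings


def check_epoch_end(stopped):
    """Return True if the epoch should end; warn if only some pipelines ended it.

    `stopped` holds one bool per pipeline: whether it raised StopIteration
    at the current step (hypothetical helper).
    """
    if not any(stopped):
        return False  # every pipeline produced data; the epoch continues
    if not all(stopped):
        ended = [i for i, s in enumerate(stopped) if s]
        warnings.warn(
            f"Pipelines {ended} reached the end of epoch while others did not; "
            "their epochs have diverged.",
            stacklevel=2,
        )
    # Assumed policy: the epoch ends as soon as any pipeline ran out of data
    return True
```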

Well, I could safeguard this check with pipeline._epoch_idx > 0. It seems that if it has ever been incremented, then there must be an ES that raises StopIteration.
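The proposed safeguard can be sketched as a predicate: warn about a no-op reset only if the pipeline has completed at least one epoch, since a nonzero epoch counter implies that some external source raised StopIteration. `should_warn_on_reset` and its parameters are hypothetical; `_epoch_idx` is the attribute mentioned in the comment.

```python
def should_warn_on_reset(last_iter, has_input_callbacks, epoch_idx):
    """Hypothetical predicate: should a reset that turns out to be a no-op warn?"""
    if last_iter:
        return False  # the epoch really ended, so the reset is legitimate
    if not has_input_callbacks:
        return False  # no external source that could signal the end of epoch
    # An infinite source never raises, so epoch_idx stays 0 and no warning is issued
    return epoch_idx > 0
```

This keeps the infinite external source case silent, at the cost of missing a premature reset during the very first epoch.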

> 🤷 I guess it still can happen and we should behave consistently.

I think that we should warn that something may not be right, and if the user provides -1 as the size, it should work silently.
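The suggestion can be sketched as an iterator-side rule: with size=-1 the iterator runs until StopIteration, so its reset always follows a real end of epoch and nothing is reported; only an explicit size can stop it before that, and then the mismatch is reported. `ToyIterator` and `run_step` are hypothetical; `run_step` stands for one pipeline step that raises StopIteration at the end of epoch.

```python
import warnings


class ToyIterator:
    """Hypothetical fw-iterator model: size=-1 means 'iterate until StopIteration'."""

    def __init__(self, run_step, size=-1):
        self.run_step = run_step
        self.size = size

    def epoch(self):
        out = []
        try:
            while self.size < 0 or len(out) < self.size:
                out.append(self.run_step())
        except StopIteration:
            return out  # real end of epoch: resetting now is safe and silent
        # Stopped by the explicit size before any StopIteration was seen
        warnings.warn(
            f"Iterator stopped after {len(out)} batches (size={self.size}) "
            "before the pipeline reached the end of epoch.",
            stacklevel=2,
        )
        return out
```

Because this toy does no prefetching, even a size equal to the epoch length warns here; a pipeline that prefetches would already have seen StopIteration by then.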

dali-automaton · 2022-06-29T12:12:53Z

CI MESSAGE: [5209598]: BUILD PASSED

JanuszL · 2022-06-29T12:56:39Z

dali/python/nvidia/dali/pipeline.py

+                # For one, when prefetching, parallel external source will reuse buffers
+                # that might be still referenced by no_copy input fed to pipeline
+                if not self.empty():
+                    warnings.warn(


I think it is possible: with a prefetch queue depth of 2 and 1 batch left to consume, we can still schedule one more run. The native part can overschedule (it will just wait for an empty output buffer), but the ES may fail in parallel mode with no_copy, whereas the regular ES copies the data and has an internal queue.
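Why overscheduling is harmless when the data is copied but not with no_copy can be shown with a toy ring of reusable buffers. This is not DALI code: `BufferRing` is hypothetical, and its two slots only stand in for the buffers that the parallel external source is assumed to recycle, one per prefetched iteration.

```python
class BufferRing:
    """Hypothetical producer that refills a fixed ring of buffers in turn."""

    def __init__(self, num_buffers, size):
        self.buffers = [bytearray(size) for _ in range(num_buffers)]
        self.next_idx = 0

    def produce(self, value, copy):
        buf = self.buffers[self.next_idx]
        self.next_idx = (self.next_idx + 1) % len(self.buffers)
        for i in range(len(buf)):  # refill the slot in place
            buf[i] = value
        # no_copy hands out a view of the slot; the copying variant returns its own bytes
        return bytes(buf) if copy else memoryview(buf)


for copy in (True, False):
    ring = BufferRing(num_buffers=2, size=4)
    fed = ring.produce(1, copy)  # input fed to the pipeline, not consumed yet
    ring.produce(2, copy)        # the second prefetched iteration
    ring.produce(3, copy)        # an extra, overscheduled iteration reuses slot 0
    print("copy" if copy else "no_copy", bytes(fed))
```

In the no_copy case the not-yet-consumed input silently changes to the third batch's data, which is the kind of corruption the warning is meant to flag.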

Do you mean that it should be supported? By "possible", do you mean that even when using fw iterators correctly you may still end up in such a situation?

If that should be supported:

Should we warn only if we have PES, and treat it as an extra limitation of the schedule API that is not there otherwise?

Or does the PES need to be adjusted to handle that as well?

> Should we warn only if we have PES, and treat it as an extra limitation of the schedule API that is not there otherwise?

I think so. You can still use this API without the FW iterator.
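Following that conclusion, the check could be narrowed to pipelines that use a parallel external source. The sketch assumes a hypothetical `has_parallel_es` flag and `pending_iterations` counter; in the diff, the "not empty" condition comes from `self.empty()`.

```python
import warnings


def check_schedule_on_reset(has_parallel_es, pending_iterations):
    """Hypothetical check run when the pipeline is reset; returns True if it is clean.

    Outstanding prefetched iterations are only a problem for the parallel
    external source, which may reuse buffers still referenced by no_copy inputs;
    other pipelines may keep using the schedule API this way.
    """
    if has_parallel_es and pending_iterations > 0:
        warnings.warn(
            "Resetting a pipeline with parallel external source while "
            f"{pending_iterations} scheduled iteration(s) were not consumed.",
            stacklevel=2,
        )
        return False
    return True
```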

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

dali-automaton · 2023-05-09T06:34:45Z

CI MESSAGE: [8224548]: BUILD FAILED

stiepan added 5 commits June 28, 2022 15:35

Add warnings against suspicious or erroneous end of epoch conditions

b4e47b7

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Add assert_no_warnings utility

2ed59da

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Add test for .run API end of epoch warnings

c58c699

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Test warnings against unexpected StopIteration in iterators

1d1bd8b

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Add parallel external source test to fw iterators

f2ea23b

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

jantonguirao assigned mzient and prak-nv Jun 29, 2022

Check actual data in the returned samples

cd233bb

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

stiepan commented Jun 29, 2022


JanuszL reviewed Jun 29, 2022


Fix pes with generator test resetting one of the pipelines too early

b682272

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

stiepan marked this pull request as draft August 26, 2022 15:28

mzient unassigned prak-nv and mzient Nov 10, 2022
