-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add warnings against resetting pipeline before end of epoch and test parallel ES with fw iterator #4023
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build |
CI MESSAGE: [5209598]: BUILD STARTED |
if self._last_iter: | ||
if not self._last_iter: | ||
# resetting before some external source raised StopIteration is a no-op | ||
if self._input_callbacks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How legitimate use case is pipeline with external source that is infinite + FW iterator with iterator size passed explicitly? Because in that case, the warning will be triggered too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 I guess it still can happen and we should behave consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😟I thought that one would be useful if one has two or more pipelines with (P)ES that raise StopIteration and should be reset but for some reason the epochs diverge. Either because the number of iterations is really (but unintentionally) different in those two or because one uses .run API, with prefetch_queue_depth 1 and resets all pipelines when the first one raises, not letting the others actually reach end of epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I could safegaurd this check with pipeline._epoch_idx> 0. It seems that if it has ever been incremented, then there must be ES that raises StopIteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 I guess it still can happen and we should behave consistently.
I think that we should warn that something may be not right and if te user provides -1
as the size it should work silently.
CI MESSAGE: [5209598]: BUILD PASSED |
# For one, when prefetching, parallel external source will reuse buffers | ||
# that might be still referenced by no_copy input fed to pipeline | ||
if not self.empty(): | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is possible, prefetch queue depth is 2, batches to consume is 1, we can still schedule one more run. The native part can overschedule - it will just wait for the empty output buffer, but the ES may fail (parallel mode with nocopy, but the regular ES does copy and have an internal queue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that it should be supported? By possible you mean that using fw iterators correctly you may still end up in such situation?
If that should be supported:
- Should we warn only if we have PES and treat it as extra limitation of schedule api that is not there otherwise.
- Or the PES needs to be adjusted to handle that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we warn only if we have PES and treat it as extra limitation of schedule api that is not there otherwise.
I think so. You can still use this API without the FW iterator.
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
CI MESSAGE: [8224548]: BUILD FAILED |
Other (e.g. Documentation, Tests, Configuration)
Category:
Description:
size
that is incorrect (i.e. different than what follows from external source raising StopIteration or one of the pipelines raises while the other do not) it may result in premature resetting of the pipelines, which further can lead to different issues: from empty epochs to corrupted data.Adding the warnings in hope that it will be helpful for users or us handling gh issues to have those warnings in logs.
Adding the PES tests to iterators, as the existing tests focused on a pipeline's run API which is not the one used by fw iterators.
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2864