Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inverted mmap inside webdataset reader #5683

Merged
merged 2 commits into from
Oct 23, 2024

Conversation

JanuszL
Copy link
Contributor

@JanuszL JanuszL commented Oct 18, 2024

  • fixes the wrong usage of mmap when the user is not asking for it inside
    webdataset

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

  • fixes the wrong usage of mmap when the user is not asking for it inside
    webdataset

Additional information:

Affected modules and functionalities:

  • webdataset reader
  • loaders

Key points relevant for the review:

  • logic

Tests:

  • Existing tests apply
    • python/reader tests
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

- fix the wrong ussage of mmap when the user is not asking for it inside
  webdataset
- cleans up usage of copy_read_data_

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Copy link
Contributor Author

JanuszL commented Oct 18, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19486724]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19486724]: BUILD PASSED

@@ -69,7 +69,7 @@ void FileLabelLoaderBase<checkpointing_supported>::ReadSample(ImageLabelWrapper
});
Index file_size = current_file->Size();

if (copy_read_data_ || !current_file->CanMemoryMap()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't CanMemoryMap return false even uf copy_read_data_ is true? What about s3:// storage? Quick scan of the code didn't reveal any tests for it, so we might silently break it, can't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_read_data_ = dont_use_mmap_ || !mmap_reserver_.CanShareMappedData() - https://github.com/NVIDIA/DALI/blob/main/dali/operators/reader/loader/file_label_loader.h#L137

The idea is that copy_read_data_ should reflect CanShareMappedData value I don't think we can break s3 as it doesn't use/shouldn't use mmap. @jantonguirao ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that obtaining CanMemoryMap on a per-file basis is safer than inferring it somehow from CanShareMappedData.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CanShareMappedData is a stronger check as it verifies if we can actually mmap all the files we want at the same time. This is important for single files as we can have many of them opened at a time. CanMemoryMap is whether a given object class can do it, but not necessarily if the OS has enough free FDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Michal, it is required to check on a per-file basis, because even if your system allows mmaping, if you get an S3 location, you can't call Get on it (it throws)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. If we can mix plain files and S3 objects inside one reader this makes a lot of sense. Removed.

@JanuszL JanuszL added the important-fix Fixes an important issue in the software or development environment. label Oct 22, 2024
@@ -112,7 +112,7 @@ class IndexedFileLoader : public Loader<CPUBackend, IndexedFileLoaderSample, tru
}
next_seek_pos_ = seek_pos + size;

if (opts.use_mmap && current_file_->CanMemoryMap()) {
if (!copy_read_data_) {
auto p = current_file_->Get(size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the current_file is an S3 location, then you will end up trying to call Get, which will throw (can't be done for for S3). I believe you should keep the current_file_->CanMemoryMap() check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. If we can mix plain files and S3 objects inside one reader this makes a lot of sense. Removed.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL changed the title Fix inverted mmap inside webdataset reader, cleans copy_read_data_ Fix inverted mmap inside webdataset reader Oct 22, 2024
@JanuszL
Copy link
Contributor Author

JanuszL commented Oct 22, 2024

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19618233]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19618233]: BUILD PASSED

@JanuszL JanuszL merged commit 9a80050 into NVIDIA:main Oct 23, 2024
7 checks passed
@JanuszL JanuszL deleted the fix_mmap_webdataset branch October 23, 2024 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
important-fix Fixes an important issue in the software or development environment.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants