Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve metadata speed for big directories #272

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

dmga44
Copy link
Collaborator

@dmga44 dmga44 commented Jan 6, 2025

[Must be merged after #268 ]

This commit changes the old flat_map that was advised to use for an
ordered_statistics structure that ensures good performance in every
case. The custom flat_map structure previously used had O(n) insertion
and elimination costs, the new one has O(logn) cost for both of those
operations. The goal of this change is to speedup the metadata
operations for directories with lots of files.

rolysr
rolysr previously approved these changes Jan 6, 2025
Copy link
Collaborator

@rolysr rolysr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good improvement !! Just few comments/doubts below.

utils/big_session_metadata_benchmark.cc Outdated Show resolved Hide resolved
src/master/filesystem_node.cc Show resolved Hide resolved
src/master/filesystem_checksum.cc Outdated Show resolved Hide resolved
Copy link
Collaborator

@uristdwarf uristdwarf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP review, some of my comments apply to #268

src/master/filesystem_store.cc Outdated Show resolved Hide resolved
src/master/filesystem_store.cc Outdated Show resolved Hide resolved
src/master/filesystem_store.cc Outdated Show resolved Hide resolved
src/master/filesystem_store.cc Outdated Show resolved Hide resolved
src/master/filesystem_store.cc Outdated Show resolved Hide resolved
@dmga44 dmga44 force-pushed the big-directories-metadata-speedup branch from 0ee4fab to 568ae81 Compare January 6, 2025 18:53
@dmga44 dmga44 force-pushed the big-directories-metadata-speedup branch from 568ae81 to b104bff Compare January 7, 2025 05:28
Copy link
Collaborator

@uristdwarf uristdwarf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP review 2/3

src/master/chunks.cc Outdated Show resolved Hide resolved
src/master/filesystem_node.cc Outdated Show resolved Hide resolved
src/master/filesystem_node_types.h Show resolved Hide resolved
@dmga44 dmga44 force-pushed the big-directories-metadata-speedup branch from b104bff to f6b6cf0 Compare January 8, 2025 06:42
Copy link
Collaborator

@uristdwarf uristdwarf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks good to me.

@dmga44 dmga44 force-pushed the big-directories-metadata-speedup branch from f6b6cf0 to fdacc58 Compare January 21, 2025 10:49
This commit changes the old flat_map that was advised to use for an
ordered_statistics structure that ensures good performance in every
case. The custom flat_map structure previously used had O(n) insertion
and elimination costs, the new one has O(logn) cost for both of those
operations. The goal of this change is to speedup the metadata
operations for directories with lots of files.

Signed-off-by: Dave <dave@leil.io>
@dmga44 dmga44 force-pushed the big-directories-metadata-speedup branch from fdacc58 to 5280a71 Compare January 21, 2025 18:02
Copy link
Contributor

@ralcolea ralcolea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @dmga44 👍 🔥 💪

typedef flat_map<hstorage::Handle *, FSNode *,
std::vector<std::pair<hstorage::Handle *, FSNode *>>,
HandleCompare>
typedef __gnu_pbds::tree<std::pair<hstorage::Handle *, FSNode *>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to use modern C++ using instruction instead of typedef in the definition of EntriesContainer.

Consider also to apply the same change in the current definition of iterator and const_iterator that also use typedef.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants