
DAOS-15682 dfuse: Perform reads in larger chunks. #14212

Merged
27 commits merged into master on Sep 26, 2024

Conversation

ashleypittman
Contributor

When dfuse sees I/O arriving as well-aligned 128k reads, read a megabyte
at a time and cache the result, allowing for faster read bandwidth
for well-behaved applications and large files.

Create a new in-memory descriptor for file contents, pull in a
whole descriptor on the first read, and serve all other reads from
the same result.

This should give much higher bandwidth for well-behaved applications
and should be easy to extend to proper readahead in the future.

Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman ashley.m.pittman@intel.com
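
To illustrate the scheme described above, here is a minimal C sketch (not the actual dfuse code; all names and the fetch callback are hypothetical): one in-memory descriptor per 1MiB region, filled in full on the first well-aligned 128k read and serving the remaining slots from memory.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE      (1024 * 1024)            /* read this much on first access */
#define SLOT_SIZE       (128 * 1024)             /* size of each well-aligned read */
#define SLOTS_PER_CHUNK (CHUNK_SIZE / SLOT_SIZE) /* 8 slots per chunk              */

/* Hypothetical in-memory descriptor for 1MiB of file contents. */
struct chunk_desc {
	uint64_t offset;                      /* chunk-aligned file offset      */
	bool     filled;                      /* whole chunk fetched already?   */
	bool     slot_read[SLOTS_PER_CHUNK];  /* which 128k slots were served   */
	char     data[CHUNK_SIZE];
};

/* Serve one well-aligned 128k read: fetch the whole chunk once, then
 * satisfy this and later reads from the cached buffer. Returns true
 * once every slot has been consumed, so the caller can retire it. */
static bool
chunk_read(struct chunk_desc *cd, uint64_t off, char *buf,
	   int (*fetch)(uint64_t off, char *dst, size_t len))
{
	size_t slot, i;

	if (!cd->filled) {
		fetch(cd->offset, cd->data, CHUNK_SIZE); /* one large read */
		cd->filled = true;
	}
	slot = (off - cd->offset) / SLOT_SIZE;
	memcpy(buf, cd->data + slot * SLOT_SIZE, SLOT_SIZE);
	cd->slot_read[slot] = true;

	for (i = 0; i < SLOTS_PER_CHUNK; i++)
		if (!cd->slot_read[i])
			return false;
	return true; /* all slots read, descriptor can be dropped */
}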


github-actions bot commented Apr 22, 2024

Ticket title is 'dfuse read bandwidth greater without caching or with direct IO.'
Status is 'In Review'
Labels: 'scrubbed_2.8,triaged'
https://daosio.atlassian.net/browse/DAOS-15682

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/2/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/4/execution/node/1317/log

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/5/testReport/

Test-tag: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/6/testReport/

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/7/testReport/

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/11/execution/node/1316/log

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/13/execution/node/902/log

@daosbuild1
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/383/log

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/353/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/274/log

Features: dfuse,-test_dfuse_find,-test_dfuse_daos_build_wt_pil4dfs
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
src/client/dfuse/ops/read.c: two resolved review threads
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/27/testReport/

Features: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Contributor

@knard-intel knard-intel left a comment


Mostly looks good to me, from what I understand.
Regarding the limitations noted by @mchaarawi, I have the following newbie questions:

  • if a user reads < 128k, will the kernel turn it into a 128k read?
  • if a user reads > 128k but < 1MB, will the kernel serialize it into n 128k reads?

src/client/dfuse/ops/read.c: resolved review thread
}
cc = ie->ie_chunk;

d_list_for_each_entry(cd, &cc->entries, list)
Contributor

@knard-intel knard-intel May 2, 2024


Storing the buckets in an LRU list could have a high search cost when it contains a lot of entries.
Maybe a structure such as a binary search tree would be more efficient?

Contributor Author


It could - entry count is key here. Entries are removed from the list when all slots in a bucket have been read, so there should not really be more than one entry per client process outside of some very contrived I/O patterns, and it will only be that high if the app is using single-shared-file and clients are reading different offsets in it.
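
As a hypothetical sketch of why the linear search stays cheap (invented names, plain C rather than the gurt list API used in the actual code): each entry is unlinked and freed as soon as its last slot is consumed, so a well-behaved sequential reader keeps at most one live entry in the list.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache entry: one per 1MiB chunk still being consumed. */
struct chunk_entry {
	struct chunk_entry *next;
	uint64_t            offset;     /* chunk-aligned file offset */
	int                 slots_left; /* unread 128k slots         */
};

/* Consume one 128k slot from the entry covering chunk_off. Returns
 * false if no entry covers that chunk (the caller then fetches a new
 * one). Entries are retired as soon as slots_left reaches zero, which
 * keeps the list short for sequential readers. */
static bool
chunk_consume_slot(struct chunk_entry **head, uint64_t chunk_off)
{
	struct chunk_entry **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		struct chunk_entry *ce = *pp;

		if (ce->offset != chunk_off)
			continue;
		if (--ce->slots_left == 0) {
			*pp = ce->next; /* last slot read: unlink and free */
			free(ce);
		}
		return true;
	}
	return false;
}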

Contributor

@knard-intel knard-intel May 2, 2024


I have to admit that I am not familiar enough with HPC and AI use cases to know whether this unfavorable use case is common.

Just to be sure I understand: if a client does a first aligned 128k read and never reads the following seven 128k slots, will the entry stay in the cache as long as the client does not close the inode?

Contributor Author

@ashleypittman ashleypittman May 2, 2024


Just to be sure I understand: if a client does a first aligned 128k read and never reads the following seven 128k slots, will the entry stay in the cache as long as the client does not close the inode?

Almost - file handle rather than inode but yes. The cache is tied to the inode but is dropped when the number of open file handles on that inode drops to zero.

Edit: Re-reading this I've just realised that close and close are homonyms so I've re-worded my reply to avoid ambiguity.
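
A hypothetical sketch of that lifetime rule (invented names, not the actual dfuse code): the chunk cache hangs off the inode but is freed only when the inode's count of open file handles reaches zero.

#include <stdlib.h>

struct chunk_cache {
	char *buf; /* cached chunk data */
};

static void
chunk_cache_free(struct chunk_cache *cc)
{
	free(cc->buf);
	free(cc);
}

struct inode_entry {
	int                 open_handles; /* open file handles on this inode */
	struct chunk_cache *chunk;        /* cached reads, may be NULL       */
};

/* Called on file-handle release: the cache survives individual reads
 * but is dropped once no handle keeps the inode open. */
static void
inode_handle_release(struct inode_entry *ie)
{
	if (--ie->open_handles == 0 && ie->chunk != NULL) {
		chunk_cache_free(ie->chunk);
		ie->chunk = NULL;
	}
}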

Contributor


Thanks for the explanation.

Contributor

@knard-intel knard-intel left a comment


LGTM, as far as I understand it.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/28/execution/node/1451/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/28/testReport/

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/29/testReport/

Allow-unstable-test: true

Required-githooks: true

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/30/execution/node/1462/log

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/31/testReport/

@ashleypittman ashleypittman requested a review from a team September 26, 2024 12:33
@daltonbohning daltonbohning merged commit c757af5 into master Sep 26, 2024
55 checks passed
@daltonbohning daltonbohning deleted the amd/dfuse-chunk-read branch September 26, 2024 13:59
mchaarawi added a commit that referenced this pull request Sep 27, 2024
Quick-Functional: true
Test-tag: test_ior_small
Test-repeat: 10

This reverts commit c757af5.

Required-githooks: true

Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>