-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-15682 dfuse: Perform reads in larger chunks. #14212
Conversation
Ticket title is 'dfuse read bandwidth greater without caching or with direct IO.' |
9886fd4
to
08120b0
Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/2/testReport/ |
08120b0
to
9580c48
Compare
When dfuse sees I/O as well-aligned 128k reads then read MB at a time and cache the result allowing for faster read bandwidth for well behaved applicaions and large files. Create a new in-memory descriptor for file contents, pull in a whole descriptor on first read and perform all other reads from the same result. This should give much higher bandwidth for well behaved applications and should be easy to extend to proper readahead in future. Test-tag: test_dfuse_caching_check Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
9580c48
to
5a2ca11
Compare
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/4/execution/node/1317/log |
Test-tag: dfuse
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/5/testReport/ |
Test-tag: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/6/testReport/ |
Test-tag: dfuse
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/7/testReport/ |
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
44497f6
to
9f806da
Compare
Test-tag: test_dfuse_caching_check Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
9f806da
to
be55b28
Compare
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/11/execution/node/1316/log |
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_dfuse_caching_check Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/13/execution/node/902/log |
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/380/log |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/383/log |
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_dfuse_caching_check Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
a3ed79d
to
1dd78ea
Compare
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/353/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/274/log |
Features: dfuse,-test_dfuse_find,-test_dfuse_daos_build_wt_pil4dfs Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
ec29bae
to
a43df49
Compare
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/27/testReport/ |
Features: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good to me, from what I understand.
Regarding the limitations notified by @mchaarawi , I have the following newbie questions:
- does user read < 128k, will be done by a 128k read by the kernel ?
- does user read > 128k but <1MB will be serialized with n 128k read by the kernel ?
} | ||
cc = ie->ie_chunk; | ||
|
||
d_list_for_each_entry(cd, &cc->entries, list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing the buckets in a LRU list could have a high search cost when it will contains a lot of entries.
Maybe a structure such as a binary search tree could be more efficient ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could - entry count is key here. Entries are removed from the list when all slots in a bucket have been read so there should not really be more than one entry per client process outside of some very contrived I/O patterns and it'll only be that high if the app is using single-shard-file and clients are reading different offsets in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit, that I am not familiar enough with HPC and AI use cases to know if this unfavorable use case is common.
Just to be sure to understand: if a client do a first read of an aligned 128k, and never read the following 7 128K. Then the entry will stay in the cache as long as the client do not close the inode ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure to understand: if a client read do first aligned read of 128k, and never read the following 7 128K. Then the entry will stay in the cache as long as the client do not close the inode ?
Almost - file handle rather than inode but yes. The cache is tied to the inode but is dropped when the number of open file handles on that inode drops to zero.
Edit: Re-reading this I've just realised that close and close are homonyms so I've re-worded my reply to avoid ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for what I understand.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/28/execution/node/1451/log |
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/28/testReport/ |
Required-githooks: true Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/29/testReport/ |
Allow-unstable-test: true Required-githooks: true Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/30/execution/node/1462/log |
Features: dfuse
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/31/testReport/ |
Quick-Functional: true Test-tag: test_ior_small Test-repeat: 10 This reverts commit c757af5. Required-githooks: true Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
When dfuse sees I/O as well-aligned 128k reads then read MB at
a time and cache the result allowing for faster read bandwidth
for well behaved applicaions and large files.
Create a new in-memory descriptor for file contents, pull in a
whole descriptor on first read and perform all other reads from
the same result.
This should give much higher bandwidth for well behaved applications
and should be easy to extend to proper readahead in future.
Test-tag: test_dfuse_caching_check
Signed-off-by: Ashley Pittman ashley.m.pittman@intel.com