
DAOS-15682 dfuse: Perform reads in larger chunks. #14212

Merged
27 commits merged into master on Sep 26, 2024

Conversation

ashleypittman
Contributor

When dfuse sees I/O arriving as well-aligned 128k reads, read a megabyte
at a time and cache the result, allowing for faster read bandwidth
for well-behaved applications and large files.

Create a new in-memory descriptor for file contents, pull in a
whole descriptor on the first read, and serve all other reads from
the same result.

This should give much higher bandwidth for well-behaved applications
and should be easy to extend to proper readahead in the future.

Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman ashley.m.pittman@intel.com
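
To illustrate the scheme described above, here is a minimal C sketch (not the actual dfuse code; all names and the fetch callback are hypothetical): one in-memory descriptor per 1MiB region, filled in full on the first well-aligned 128k read and serving the remaining slots from memory.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE      (1024 * 1024)            /* read this much on first access */
#define SLOT_SIZE       (128 * 1024)             /* size of each well-aligned read */
#define SLOTS_PER_CHUNK (CHUNK_SIZE / SLOT_SIZE) /* 8 slots per chunk              */

/* Hypothetical in-memory descriptor for 1MiB of file contents. */
struct chunk_desc {
	uint64_t offset;                      /* chunk-aligned file offset      */
	bool     filled;                      /* whole chunk fetched already?   */
	bool     slot_read[SLOTS_PER_CHUNK];  /* which 128k slots were served   */
	char     data[CHUNK_SIZE];
};

/* Serve one well-aligned 128k read: fetch the whole chunk once, then
 * satisfy this and later reads from the cached buffer. Returns true
 * once every slot has been consumed, so the caller can retire it. */
static bool
chunk_read(struct chunk_desc *cd, uint64_t off, char *buf,
	   int (*fetch)(uint64_t off, char *dst, size_t len))
{
	size_t slot, i;

	if (!cd->filled) {
		fetch(cd->offset, cd->data, CHUNK_SIZE); /* one large read */
		cd->filled = true;
	}
	slot = (off - cd->offset) / SLOT_SIZE;
	memcpy(buf, cd->data + slot * SLOT_SIZE, SLOT_SIZE);
	cd->slot_read[slot] = true;

	for (i = 0; i < SLOTS_PER_CHUNK; i++)
		if (!cd->slot_read[i])
			return false;
	return true; /* all slots read, descriptor can be dropped */
}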


github-actions bot commented Apr 22, 2024

Ticket title is 'dfuse read bandwidth greater without caching or with direct IO.'
Status is 'In Review'
Labels: 'scrubbed_2.8,triaged'
https://daosio.atlassian.net/browse/DAOS-15682

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/2/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/4/execution/node/1317/log

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/5/testReport/

Test-tag: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/6/testReport/

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/7/testReport/

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/11/execution/node/1316/log

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/13/execution/node/902/log

@daosbuild1
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/383/log

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: test_dfuse_caching_check

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/353/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/14/execution/node/274/log

Features: dfuse,-test_dfuse_find,-test_dfuse_daos_build_wt_pil4dfs
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
src/client/dfuse/ops/read.c: two resolved review threads
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/27/testReport/

Features: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Contributor

@knard-intel knard-intel left a comment


Mostly looks good to me, from what I understand.
Regarding the limitations noted by @mchaarawi, I have the following newbie questions:

  • if a user reads < 128k, will the kernel turn it into a 128k read?
  • if a user reads > 128k but < 1MB, will the kernel serialize it into n 128k reads?

src/client/dfuse/ops/read.c: resolved review thread
}
cc = ie->ie_chunk;

d_list_for_each_entry(cd, &cc->entries, list)
Contributor

@knard-intel knard-intel May 2, 2024


Storing the buckets in an LRU list could have a high search cost when it contains a lot of entries.
Maybe a structure such as a binary search tree would be more efficient?

Contributor Author


It could - entry count is key here. Entries are removed from the list when all slots in a bucket have been read, so there should not really be more than one entry per client process outside of some very contrived I/O patterns, and it will only be that high if the app is using single-shared-file and clients are reading different offsets in it.
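
As a hypothetical sketch of why the linear search stays cheap (invented names, plain C rather than the gurt list API used in the actual code): each entry is unlinked and freed as soon as its last slot is consumed, so a well-behaved sequential reader keeps at most one live entry in the list.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache entry: one per 1MiB chunk still being consumed. */
struct chunk_entry {
	struct chunk_entry *next;
	uint64_t            offset;     /* chunk-aligned file offset */
	int                 slots_left; /* unread 128k slots         */
};

/* Consume one 128k slot from the entry covering chunk_off. Returns
 * false if no entry covers that chunk (the caller then fetches a new
 * one). Entries are retired as soon as slots_left reaches zero, which
 * keeps the list short for sequential readers. */
static bool
chunk_consume_slot(struct chunk_entry **head, uint64_t chunk_off)
{
	struct chunk_entry **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		struct chunk_entry *ce = *pp;

		if (ce->offset != chunk_off)
			continue;
		if (--ce->slots_left == 0) {
			*pp = ce->next; /* last slot read: unlink and free */
			free(ce);
		}
		return true;
	}
	return false;
}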

Contributor

@knard-intel knard-intel May 2, 2024


I have to admit that I am not familiar enough with HPC and AI use cases to know whether this unfavorable use case is common.

Just to be sure I understand: if a client does a first aligned 128k read and never reads the following seven 128k slots, will the entry stay in the cache as long as the client does not close the inode?

Contributor Author

@ashleypittman ashleypittman May 2, 2024


Just to be sure I understand: if a client does a first aligned 128k read and never reads the following seven 128k slots, will the entry stay in the cache as long as the client does not close the inode?

Almost - file handle rather than inode but yes. The cache is tied to the inode but is dropped when the number of open file handles on that inode drops to zero.

Edit: Re-reading this I've just realised that close and close are homonyms so I've re-worded my reply to avoid ambiguity.
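
A hypothetical sketch of that lifetime rule (invented names, not the actual dfuse code): the chunk cache hangs off the inode but is freed only when the inode's count of open file handles reaches zero.

#include <stdlib.h>

struct chunk_cache {
	char *buf; /* cached chunk data */
};

static void
chunk_cache_free(struct chunk_cache *cc)
{
	free(cc->buf);
	free(cc);
}

struct inode_entry {
	int                 open_handles; /* open file handles on this inode */
	struct chunk_cache *chunk;        /* cached reads, may be NULL       */
};

/* Called on file-handle release: the cache survives individual reads
 * but is dropped once no handle keeps the inode open. */
static void
inode_handle_release(struct inode_entry *ie)
{
	if (--ie->open_handles == 0 && ie->chunk != NULL) {
		chunk_cache_free(ie->chunk);
		ie->chunk = NULL;
	}
}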

Contributor


Thanks for the explanation.

Contributor

@knard-intel knard-intel left a comment


LGTM, as far as I understand it.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/28/execution/node/1451/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/28/testReport/

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/29/testReport/

Allow-unstable-test: true

Required-githooks: true

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14212/30/execution/node/1462/log

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14212/31/testReport/

@ashleypittman ashleypittman requested a review from a team September 26, 2024 12:33
@daltonbohning daltonbohning merged commit c757af5 into master Sep 26, 2024
55 checks passed
@daltonbohning daltonbohning deleted the amd/dfuse-chunk-read branch September 26, 2024 13:59
mchaarawi added a commit that referenced this pull request Sep 27, 2024
Quick-Functional: true
Test-tag: test_ior_small
Test-repeat: 10

This reverts commit c757af5.

Required-githooks: true

Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>