DAOS-16686 dfuse: Improve concurrent overlapping read handling #15298

Draft
wants to merge 25 commits into
base: master


ashleypittman
Contributor

@ashleypittman ashleypittman commented Oct 11, 2024

Handle concurrent reads in the chunk_read code. Rather than assuming
each slot is only requested once, save the slot number as part of the
request and handle multiple requests.

This corrects the behaviour and avoids a crash when multiple readers read
the same file concurrently, and improves performance in that case.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
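
The idea in the description above can be modelled roughly as follows. This is a simplified Python sketch, not the dfuse C code: the names `Chunk`, `ReadRequest`, `CHUNK_SIZE` and `SLOT_SIZE`, and the sizes themselves, are assumptions made for illustration. It shows each request carrying its own slot number, so that several readers asking for the same slot are all answered from a single fetch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CHUNK_SIZE = 1024 * 1024  # assumed chunk size, for illustration only
SLOT_SIZE = 128 * 1024    # assumed slot size, for illustration only


@dataclass
class ReadRequest:
    """One reader's request; the slot number is stored with the request."""
    reader: str
    slot: int
    data: Optional[bytes] = None


@dataclass
class Chunk:
    """A chunk fetched once on behalf of any number of readers."""
    pending: List[ReadRequest] = field(default_factory=list)
    buffer: Optional[bytes] = None
    fetches: int = 0

    def read(self, reader: str, offset: int) -> ReadRequest:
        """Request the slot containing offset (offset is within this chunk)."""
        req = ReadRequest(reader, offset // SLOT_SIZE)
        if self.buffer is not None:
            # Chunk already in memory: reply immediately.
            self._reply(req)
        else:
            # Fetch not complete yet: queue the request. Nothing here
            # assumes a slot is requested only once.
            self.pending.append(req)
        return req

    def complete(self, data: bytes) -> None:
        """Called once when the single backend fetch finishes."""
        self.buffer = data
        self.fetches += 1
        for req in self.pending:
            self._reply(req)
        self.pending.clear()

    def _reply(self, req: ReadRequest) -> None:
        start = req.slot * SLOT_SIZE
        req.data = self.buffer[start:start + SLOT_SIZE]
```

In this model two readers requesting offset 0 both receive the first slot after one call to `complete()`, and a later reader is served directly from the buffer without a second fetch.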
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>

github-actions bot commented Oct 11, 2024

Ticket title is 'Concurrent reads hit the network even when caching enabled in dfuse'
Status is 'In Progress'
Labels: 'google-cloud-daos'
https://daosio.atlassian.net/browse/DAOS-16686

@ashleypittman ashleypittman changed the title from "amd/dfuse concurrent read" to "DAOS-16686 dfuse: Improve concurrent overlapping read handling" Oct 11, 2024
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/4/testReport/

… leak

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/357/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/356/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/351/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/348/log

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/8/testReport/

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test-tag: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/9/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/9/execution/node/1479/log

Test-tag: dfuse
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@ashleypittman ashleypittman marked this pull request as ready for review November 12, 2024 13:04
@ashleypittman ashleypittman requested review from a team as code owners November 12, 2024 13:04
Test-tag: dfuse
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15298/11/display/redirect

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/12/testReport/

Comment on lines 484 to 487
ACTION(READ_EOF_M) \
ACTION(READ_CON) \
ACTION(READ_BUCKET) \
ACTION(READ_BUCKET_M) \
Contributor Author

Extra metrics in dfuse are a good thing, but I'm thinking of removing these from this PR. They were certainly useful for detecting the different code paths, but I think I'd rather extend these outside of a bug-fix PR.

@@ -23,7 +23,9 @@ def test_dfuse_pre_read(self):
Read one large file entirely using pre-read. Read a second smaller file to ensure that
the first file leaves the flag enabled.

:avocado: tags=all,full_regression
TODO: Check this with 128k I/O size and non-128K I/O size.
Contributor

I'm surprised this isn't being called out by the linting check. We should either do this or remove the comment before landing and add a ticket as a reminder.
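
For reference, one way to act on the TODO might look like the sketch below. It is only an illustration under stated assumptions: the helper names (`make_file`, `read_with_io_size`, `check_io_sizes`) are hypothetical, the file is created in a local temporary directory rather than on a dfuse mount, and the non-128K size of 100,000 bytes is an arbitrary choice. It reads the same file back with a 128 KiB I/O size and with a size that is not a multiple of 128 KiB, then compares both results against the known contents.

```python
import os
import tempfile

K128 = 128 * 1024  # the 128k I/O size mentioned in the TODO


def make_file(size):
    """Create a temporary file of random bytes; return (path, contents)."""
    data = os.urandom(size)
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(data)
    return path, data


def read_with_io_size(path, io_size):
    """Read the whole file using unbuffered reads of at most io_size bytes."""
    parts = []
    with open(path, "rb", buffering=0) as fd:
        while True:
            block = fd.read(io_size)
            if not block:
                break  # end of file
            parts.append(block)
    return b"".join(parts)


def check_io_sizes(path, expected, io_sizes=(K128, 100 * 1000)):
    """Return True if every I/O size reads back the expected contents."""
    return all(read_with_io_size(path, size) == expected for size in io_sizes)
```

In a real dfuse test the path would presumably sit under the dfuse mount point, and the check could be paired with the read statistics to confirm which code path served each read.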

:avocado: tags=all,full_regression
TODO: Check this with 128k I/O size and non-128K I/O size.

:avocado: tags=all,daily_regression
Contributor

Just confirming that moving this from weekly to daily is intentional (I thought I had asked this before, but couldn't find the question in the ticket). According to the last weekly it only takes a minute to run, so the move should be fine.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/12/execution/node/1480/log

Test-tag: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/13/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/13/execution/node/1463/log

@ashleypittman ashleypittman requested review from a team as code owners November 14, 2024 09:36
@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/14/execution/node/331/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/14/execution/node/328/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/14/execution/node/384/log

Test-tag: DaosBuild
Skip-fault-injection-test: true

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/15/execution/node/1372/log

@ashleypittman ashleypittman marked this pull request as draft November 15, 2024 08:30
@ashleypittman
Contributor Author

This is consistently failing one of the tests, so I'm taking it back to draft whilst I investigate.

Skip-func-test-vm: true

Test-tag: DaosBuild
Skip-fault-injection-test: true
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test-tag: dfuse

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>