DAOS-16686 dfuse: Improve concurrent overlapping read handling #15298
base: master
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Ticket title is 'Concurrent reads hit the network even when caching enabled in dfuse'
Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/4/testReport/
… leak Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/357/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/356/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/351/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/6/execution/node/348/log
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/8/testReport/
Test-tag: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/9/testReport/
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/9/execution/node/1479/log
Test-tag: dfuse
Test-tag: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test-tag: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15298/11/display/redirect
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/12/testReport/
src/client/dfuse/dfuse.h
Outdated
ACTION(READ_EOF_M) \
ACTION(READ_CON) \
ACTION(READ_BUCKET) \
ACTION(READ_BUCKET_M) \
Extra metrics in dfuse are a good thing, but I'm thinking of removing these from this PR. They were certainly useful to detect the different code paths, but I'd rather extend the metrics outside of a bug-fix PR.
src/tests/ftest/dfuse/read.py
Outdated
@@ -23,7 +23,9 @@ def test_dfuse_pre_read(self):
Read one large file entirely using pre-read. Read a second smaller file to ensure that
the first file leaves the flag enabled.

:avocado: tags=all,full_regression
TODO: Check this with 128k I/O size and non-128K I/O size.
I'm surprised this isn't being called out by the linting check. We should either do this or remove the comment before landing and add a ticket as a reminder.
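The TODO under discussion asks for a check at a 128k I/O size and at a non-128k I/O size. As a rough illustration only, the sketch below reads one file from several threads at both sizes and compares every result with the data written. It uses only the Python standard library against an ordinary temporary file, not a dfuse mount and not the avocado test framework; `read_whole_file`, `concurrent_read_check`, `demo`, and the 100 KiB size are hypothetical choices made for this example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path, io_size):
    """Read a file front to back with pread() calls of io_size bytes."""
    parts = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            buf = os.pread(fd, io_size, offset)
            if not buf:
                break  # end of file
            parts.append(buf)
            offset += len(buf)  # tolerate short reads
    finally:
        os.close(fd)
    return b"".join(parts)


def concurrent_read_check(path, expected, io_sizes, readers_per_size=4):
    """Run several readers per I/O size at once; True if all saw `expected`."""
    jobs = [size for size in io_sizes for _ in range(readers_per_size)]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        results = list(pool.map(lambda size: read_whole_file(path, size), jobs))
    return all(result == expected for result in results)


def demo():
    """Hypothetical driver: a 1 MiB temp file read at 128 KiB and 100 KiB."""
    payload = os.urandom(1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        return concurrent_read_check(path, payload, [128 * 1024, 100 * 1024])
    finally:
        os.unlink(path)
```

Pointing `path` at a file under a dfuse mount would exercise the overlapping-read path this PR touches, but that setup is outside what the sketch covers.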
src/tests/ftest/dfuse/read.py
Outdated
:avocado: tags=all,full_regression
TODO: Check this with 128k I/O size and non-128K I/O size.

:avocado: tags=all,daily_regression
Just confirming that moving this from weekly to daily is intentional (I thought I had asked this before, but couldn't find the question in the ticket). According to the last weekly it only takes a minute to run, so the move should be fine.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/12/execution/node/1480/log
Test-tag: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15298/13/testReport/
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/13/execution/node/1463/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/14/execution/node/331/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/14/execution/node/328/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/14/execution/node/384/log
Test-tag: DaosBuild Skip-fault-injection-test: true Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
f4fa810 to 3a2bbd1
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15298/15/execution/node/1372/log
This is consistently failing one of the tests, taking back to draft whilst I investigate.
Skip-func-test-vm: true Test-tag: DaosBuild Skip-fault-injection-test: true Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Test-tag: dfuse Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Handle concurrent reads in the chunk_read code. Rather than assuming
each slot is only requested once, save the slot number as part of the
request and handle multiple requests for the same slot.
This corrects the behaviour, avoids a crash when multiple readers read
the same file concurrently, and improves performance in that case.
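
The idea in the description can be modelled in a few lines. The sketch below is a toy Python model, not the dfuse C implementation: the `ChunkRead` class, its methods, and the byte sizes are all hypothetical. It only illustrates the change in bookkeeping: each request carries its slot number, and a slot may hold any number of waiting requests, all answered from one backend read.

```python
from collections import defaultdict


class ChunkRead:
    """Toy model of one in-flight chunk read shared by several requests.

    Hypothetical names; this is not the dfuse data structure.
    """

    def __init__(self, chunk_size, slot_size):
        self.slot_size = slot_size
        self.slot_count = chunk_size // slot_size
        self.data = None  # filled in once the single backend read completes
        # Map slot number -> list of request ids waiting on that slot.
        # A list (not a single entry) lets the same slot be requested
        # more than once.
        self.waiters = defaultdict(list)

    def request(self, req_id, slot):
        """Register a request for a slot; reply at once if data is present."""
        if not 0 <= slot < self.slot_count:
            raise ValueError("slot out of range")
        if self.data is not None:
            return [(req_id, self._slice(slot))]
        self.waiters[slot].append(req_id)
        return []

    def complete(self, data):
        """Backend read finished: answer every waiter, in slot order."""
        self.data = data
        replies = []
        for slot in sorted(self.waiters):
            for req_id in self.waiters[slot]:
                replies.append((req_id, self._slice(slot)))
        self.waiters.clear()
        return replies

    def _slice(self, slot):
        start = slot * self.slot_size
        return self.data[start:start + self.slot_size]
```

In this model two readers asking for slot 0 before the data arrives both receive a reply from `complete()`, whereas a design that stored one request per slot would lose or overwrite the first.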