DAOS-16469 dtx: properly handle DTX partial commit #15335
base: master
Ticket title is 'IOR Easy performance low with EC_16P2GX'
77246bd to 38f3d76
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15335/7/execution/node/1565/log
38f3d76 to 9ec1692
d0a8ee1 to 742ad75
rc = dtx_refresh(dth, ioc->ioc_coc);
if (rc == -DER_AGAIN)
	goto again;
if (!obj_rpc_is_fetch(rpc) || retry < 30) {
If a fetch retries more than 30 times, will -DER_INPROGRESS be returned to the client?
The client will then retry again, right? Do you think that is better than always calling refresh here?
Yes. If dtx_refresh fails that frequently, we stop the server-side retry and return -DER_INPROGRESS to the client. I am not sure whether that is better, but it at least reduces the chance of too many ULTs keeping the system busy.
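The trade-off discussed in this thread can be illustrated with a small, self-contained sketch. It is not the DAOS code: the error codes, `struct sk_refresh`, `sk_refresh_once()` and `sk_handle()` are hypothetical stand-ins, and it assumes the budget of 30 counts refresh attempts for one fetch request.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder codes for this sketch only; the real DAOS error values differ. */
enum { SK_OK = 0, SK_AGAIN = 1, SK_INPROGRESS = 2 };

/* Server-side retry budget for fetch, mirroring the "retry < 30" check. */
#define SK_FETCH_RETRY_MAX 30

/* Hypothetical stand-in for the refresh step. */
struct sk_refresh {
	int calls;         /* how many times refresh was attempted */
	int succeed_after; /* succeed on this call; negative means never */
};

static int sk_refresh_once(struct sk_refresh *r)
{
	r->calls++;
	if (r->succeed_after >= 0 && r->calls >= r->succeed_after)
		return SK_OK;
	return SK_AGAIN;
}

/*
 * Retry the refresh on the server. A non-fetch request keeps retrying;
 * a fetch gives up once the budget is spent and reports SK_INPROGRESS,
 * so the client retries instead of the server holding on to the request.
 */
static int sk_handle(struct sk_refresh *r, bool is_fetch)
{
	int retry = 0;

	/* In this sketch an unbounded retry must eventually succeed. */
	assert(is_fetch || r->succeed_after >= 0);

	for (;;) {
		if (is_fetch && retry >= SK_FETCH_RETRY_MAX)
			return SK_INPROGRESS;
		retry++;
		if (sk_refresh_once(r) == SK_OK)
			return SK_OK;
	}
}
```

With this shape, a fetch whose refresh never succeeds costs the server at most 30 attempts before the decision moves back to the client, which is the load reduction the reply describes.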
e3898ea to f64779a
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/12/testReport/
f64779a to bf15b4f
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/13/testReport/
bf15b4f to b612777
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/14/testReport/
b612777 to 0657269
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/15/testReport/
0657269 to 9a17406
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/16/testReport/
9a17406 to 88358a2
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/17/testReport/
88358a2 to 778aa52
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/18/testReport/
778aa52 to cfb1401
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/19/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15335/19/execution/node/1482/log
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15335/20/testReport/
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15335/20/execution/node/1462/log
When a DTX leader globally commits a DTX, some DTX participant(s) may be unable to commit that DTX entry because of problems such as network or space trouble. In that case, the DTX leader needs to keep the active DTX entry persistently for further commit/resync. That does not mean the modifications attached to that DTX entry on the leader target cannot be committed; instead, we can commit those modifications while keeping only the DTX header. That is enough for the DTX leader to do further DTX commit/resync to handle the DTX participant(s) that failed earlier.
The benefit is that VOS aggregation on the leader target will not be affected by remote DTX commit failure.
Allow-unstable-test: true
Signed-off-by: Fan Yong <fan.yong@intel.com>
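The idea in this description can be sketched as a tiny state model. This is only an illustration under stated assumptions, not the patch itself: `struct sk_dtx_entry`, the state names, `sk_leader_commit()` and `sk_blocks_aggregation()` are all hypothetical, and the sketch assumes that only uncommitted local modifications hold back aggregation on the leader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for a leader-side DTX entry in this sketch. */
enum sk_dtx_state {
	SK_DTX_ACTIVE,            /* nothing committed yet */
	SK_DTX_PARTIAL_COMMITTED, /* local commit done, header kept */
	SK_DTX_COMMITTED          /* committed on every participant */
};

struct sk_dtx_entry {
	enum sk_dtx_state state;
	bool   header_kept;    /* header retained for later commit/resync */
	bool   mods_committed; /* local modifications are committed */
	size_t nfailed;        /* participants that still need the commit */
};

/*
 * Leader-side commit. part_ok[i] says whether participant i committed.
 * Local modifications are committed either way; the header is kept only
 * when some participant failed, so a later commit/resync can finish the job.
 */
static void sk_leader_commit(struct sk_dtx_entry *e, const bool *part_ok, size_t n)
{
	size_t failed = 0;
	size_t i;

	assert(e != NULL && (n == 0 || part_ok != NULL));

	for (i = 0; i < n; i++)
		if (!part_ok[i])
			failed++;

	e->mods_committed = true;
	e->nfailed = failed;
	e->header_kept = failed > 0;
	e->state = failed > 0 ? SK_DTX_PARTIAL_COMMITTED : SK_DTX_COMMITTED;
}

/* Assumption of this sketch: only uncommitted modifications block aggregation. */
static bool sk_blocks_aggregation(const struct sk_dtx_entry *e)
{
	return !e->mods_committed;
}
```

In this model a remote failure leaves the entry in the partial state with its header kept, while the leader's own data no longer counts as uncommitted.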
cfb1401 to 8b825ba
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: