DAOS-11955 pool: Ensure a PS is inside pool (#13046) #14448
* DAOS-11955 pool: Ensure a PS is inside its pool

It was found that a PS leader may enter ds_pool_plan_svc_reconfs with itself being an undesirable replica. This may lead to an assertion failure at "move n replicas from undesired to to_remove" in ds_pool_plan_svc_reconfs. Moreover, such a PS leader may be outside of the pool group, making it incapable of performing many duties that involve collective communication.

This patch therefore ensures that a PS leader will remove undesirable PS replicas synchronously before committing a pool map modification that introduces new undesirable PS replicas. (If we were to keep an undesirable PS replica, it might become a PS leader.)

- Extend and clean up pool_svc_sched.
  * Allow pool_svc_reconf_ult to return an error, so that we can fail a pool map modification if its synchronous PS replica removal fails.
  * Allow pool_svc_reconf_ult to get an argument, so that we can tell pool_svc_reconf_ult whether we want a synchronous remove-only run or an asynchronous add-remove run.
  * Move pool_svc_sched.{psc_svc_rf,psc_force_notify} up to pool_svc.
- Prevent pool_svc_step_up_cb from canceling in-progress reconfigurations by comparing pool map versions for which the reconfigurations are scheduled.
- Rename POOL_GROUP_MAP_STATUS to POOL_GROUP_MAP_STATES so that we are consistent with the pool_map module.

Signed-off-by: Li Wei <wei.g.li@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
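To illustrate the pool_svc_step_up_cb item, here is a minimal sketch of one way a scheduler could avoid canceling an in-progress reconfiguration by comparing map versions. It is not the DAOS code: `toy_sched`, `toy_sched_request`, and the "keep the run if it was scheduled for the same or a newer version" rule are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheduler state; not the real pool_svc_sched. */
struct toy_sched {
	bool     in_progress; /* a reconfiguration is currently running */
	uint32_t map_version; /* pool map version that run was scheduled for */
	unsigned cancels;     /* how many runs have been canceled so far */
};

/*
 * Request a reconfiguration for map_version.
 * Returns true if a new run was scheduled, false if an in-progress run
 * scheduled for the same or a newer version was left alone.
 */
static bool
toy_sched_request(struct toy_sched *s, uint32_t map_version)
{
	assert(s != NULL);
	if (s->in_progress && s->map_version >= map_version)
		return false; /* existing run already covers this version */
	if (s->in_progress)
		s->cancels++; /* assumed: a newer version supersedes the old run */
	s->in_progress = true;
	s->map_version = map_version;
	return true;
}
```

Under these assumptions, a repeated request for the version already being handled (as might happen on step-up) is a no-op, while a request for a newer version replaces the older run.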
@liw it wasn't a perfectly clean backport. Can you please take a look and see if anything looks odd?
LGTM. No errors found by checkpatch.
This patch cleans up the pool map update logging on the client side and the engine side. A few notable changes:

- In dc_pool_map_update, if the incoming map is of the same version as the one we already have, do not perform the update.

Signed-off-by: Li Wei <wei.g.li@intel.com>
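The same-version check can be sketched as follows. This is a simplified stand-in, not the actual dc_pool_map_update: `cached_map` and `cached_map_update` are hypothetical names, and also skipping maps older than the cached one is an assumption that goes beyond what the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a client's cached pool map. */
struct cached_map {
	uint32_t version; /* 0 means no map has been cached yet */
};

/*
 * Offer an incoming map version to the cache.
 * Returns true if the cache was updated, false if the incoming map was
 * skipped because it has the same version as the cached one (or, by
 * assumption, an older one).
 */
static bool
cached_map_update(struct cached_map *cache, uint32_t incoming_version)
{
	assert(cache != NULL);
	if (incoming_version <= cache->version)
		return false; /* nothing new: do not perform the update */
	cache->version = incoming_version;
	return true;
}
```

Returning a flag rather than silently succeeding lets a caller log "updated" and "skipped" differently, which fits a patch whose focus is cleaner update logging.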
LGTM. No errors found by checkpatch.
It was found that a PS leader may enter ds_pool_plan_svc_reconfs with itself being an undesirable replica. This may lead to an assertion failure at "move n replicas from undesired to to_remove" in ds_pool_plan_svc_reconfs. Moreover, such a PS leader may be outside of the pool group, making it incapable of performing many duties that involve collective communication.
This patch therefore ensures that a PS leader will remove undesirable PS replicas synchronously before committing a pool map modification that introduces new undesirable PS replicas. (If we were to keep an undesirable PS replica, it might become a PS leader.)
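The ordering described above (remove first, then commit, and fail the modification if the removal fails) can be sketched with a toy model. Nothing here is DAOS API: `toy_ps`, `toy_remove_replicas_on`, and `toy_exclude_rank` are hypothetical, and "excluding a rank makes the replica on that rank undesirable" is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REPLICAS 8

/* Hypothetical toy model of a pool service (PS). */
struct toy_ps {
	int    replicas[MAX_REPLICAS]; /* ranks hosting PS replicas */
	size_t nreplicas;
	bool   remove_fails;           /* simulate a failed removal */
	int    committed_version;      /* last committed pool map version */
};

/* Synchronously drop any replica on rank; returns 0, or -1 on failure. */
static int
toy_remove_replicas_on(struct toy_ps *ps, int rank)
{
	size_t i, kept = 0;

	if (ps->remove_fails)
		return -1;
	for (i = 0; i < ps->nreplicas; i++)
		if (ps->replicas[i] != rank)
			ps->replicas[kept++] = ps->replicas[i];
	ps->nreplicas = kept;
	return 0;
}

/*
 * Commit a map change that excludes rank from the pool. A PS replica on
 * that rank would become undesirable, so it is removed first; if the
 * removal fails, the map modification fails and nothing is committed.
 */
static int
toy_exclude_rank(struct toy_ps *ps, int rank)
{
	int rc;

	assert(ps != NULL);
	rc = toy_remove_replicas_on(ps, rank);
	if (rc != 0)
		return rc; /* committed_version stays unchanged */
	ps->committed_version++;
	return 0;
}
```

Because the removal happens before the commit, no committed map version in this model ever coexists with a replica it has made undesirable, so such a replica cannot later be elected leader.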