Feature Request: allow current/before PRIMARY to be specified in PRS #16430

Open
timvaillancourt opened this issue Jul 18, 2024 · 3 comments
Assignees
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Feature Request

Comments

@timvaillancourt
Contributor

timvaillancourt commented Jul 18, 2024

Feature Description

This issue proposes that the current/before shard PRIMARY can be specified in PlannedReparentShard (and EmergencyReparentShard 🤔?) RPC requests in order to support a more idempotent(?) operation: if the provided "current" PRIMARY is no longer correct, the operation fails or no-ops.

This could be implemented by adding an optional request field, CurrentPrimary *topodatapb.TabletAlias, containing the alias of the tablet you think is the PRIMARY "now". In ElectNewPrimary, the shard's actual primary can be compared to this alias, and a mismatch handled by failing or no-oping.
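To make the proposed comparison concrete, here is a minimal sketch of the check. It is not Vitess code: `TabletAlias` is a simplified stand-in for `topodatapb.TabletAlias`, and `checkCurrentPrimary`, `aliasString` and `ErrPrimaryMismatch` are hypothetical names used only for illustration. It assumes the mismatch case returns an error; a no-op would be the other option mentioned above.

```go
package main

import (
	"errors"
	"fmt"
)

// TabletAlias is a simplified stand-in for topodatapb.TabletAlias.
type TabletAlias struct {
	Cell string
	UID  uint32
}

// ErrPrimaryMismatch is a hypothetical error for the proposed precondition.
var ErrPrimaryMismatch = errors.New("current primary does not match expected primary")

// aliasString renders an alias for error messages; nil means "no primary".
func aliasString(a *TabletAlias) string {
	if a == nil {
		return "<none>"
	}
	return fmt.Sprintf("%s-%010d", a.Cell, a.UID)
}

// checkCurrentPrimary sketches the proposed check. The expected alias is
// optional: when it is nil, the request behaves as it does today.
func checkCurrentPrimary(expected, actual *TabletAlias) error {
	if expected == nil {
		return nil
	}
	if actual == nil || *expected != *actual {
		return fmt.Errorf("%w: expected %s, shard has %s",
			ErrPrimaryMismatch, aliasString(expected), aliasString(actual))
	}
	return nil
}

func main() {
	expected := &TabletAlias{Cell: "zone1", UID: 100}
	actual := &TabletAlias{Cell: "zone1", UID: 101}
	if err := checkCurrentPrimary(expected, actual); err != nil {
		fmt.Println("refusing to reparent:", err)
	}
}
```

Leaving the field unset keeps existing callers unaffected, which is why the sketch treats a nil expectation as "no precondition".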

// AvoidPrimary is the alias of the tablet to demote. In other words,
// specifying an AvoidPrimary alias tells the vtctld to promote any replica
// other than this one. A shard whose current primary is not this one is then
// a no-op.
//
// It is an error to set NewPrimary and AvoidPrimary to the same alias.
AvoidPrimary *topodata.TabletAlias `protobuf:"bytes,4,opt,name=avoid_primary,json=avoidPrimary,proto3" json:"avoid_primary,omitempty"`

From its description, the existing AvoidPrimary field sounded like it would do this, but my understanding is that it is only used to disqualify a tablet from being the promotion candidate. Please correct me if I'm wrong!

Your feedback is much appreciated!

Use Case(s)

This support can prevent situations where VTOrc fixes an issue while external automation is also issuing PlannedReparentShard operations.

It would be great if the external automation could say "PRS only if alias X is still the PRIMARY", and if something changed the world in the meantime, the operation does not go through.

@timvaillancourt timvaillancourt added Type: Feature Request Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTorc Vitess Orchestrator integration Component: vtctl Needs Triage This issue needs to be correctly labelled and triaged labels Jul 18, 2024
@timvaillancourt timvaillancourt self-assigned this Jul 18, 2024
@shlomi-noach shlomi-noach removed the Needs Triage This issue needs to be correctly labelled and triaged label Jul 21, 2024
@GuptaManan100 GuptaManan100 added Component: Cluster management Type: Feature Request Type: Enhancement Logical improvement (somewhere between a bug and feature) and removed Type: Feature Request Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTorc Vitess Orchestrator integration Component: vtctl labels Jul 26, 2024
@GuptaManan100
Member

Your understanding of AvoidPrimaryAlias is correct.

PRS is already idempotent as long as you provide the NewPrimary. If you run the PRS over and over with the new primary alias, after the first PRS that promotes the primary, it will only check that all the replicas are pointing to the correct primary in the subsequent runs.
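As an illustration of the idempotency described above, here is a toy model, not the actual PRS implementation. The `shard` type and `reparent` function are hypothetical: the first call promotes the new primary, and later calls with the same alias only re-point replicas.

```go
package main

import "fmt"

// shard is a toy model. Primary is the alias of the current primary, and
// Sources maps each replica alias to the primary it replicates from.
// Sources must be a non-nil map.
type shard struct {
	Primary string
	Sources map[string]string
}

// reparent models a PRS call with NewPrimary set. It reports whether a
// promotion actually happened on this call.
func reparent(s *shard, newPrimary string) bool {
	promoted := false
	if s.Primary != newPrimary {
		old := s.Primary
		delete(s.Sources, newPrimary) // the new primary stops replicating
		s.Primary = newPrimary
		if old != "" {
			s.Sources[old] = newPrimary // the old primary becomes a replica
		}
		promoted = true
	}
	// On every run, make sure all replicas point at the requested primary.
	for replica := range s.Sources {
		s.Sources[replica] = newPrimary
	}
	return promoted
}

func main() {
	s := &shard{
		Primary: "zone1-100",
		Sources: map[string]string{"zone1-101": "zone1-100", "zone1-102": "zone1-100"},
	}
	fmt.Println("first run promoted:", reparent(s, "zone1-101"))
	fmt.Println("second run promoted:", reparent(s, "zone1-101"))
}
```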

@GuptaManan100
Member

VTOrc only runs PRS for a new uninitialized cluster. After this point, VTOrc never runs PRS. So I'm not sure if there are going to be that many collisions in external automation vs VTOrc running PRS.

@deepthi
Member

deepthi commented Aug 5, 2024

There are multiple ways of solving this without adding another flag to PRS:

  • external automation checks for the desired condition before issuing a PRS
  • use AvoidPrimaryAlias as Manan suggested

Manan's point about such collisions being rare in practice is also valid. Perhaps you could elaborate on what situation you are running into and trying to solve.
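The first option, checking before issuing the PRS, could be sketched roughly as follows. Everything here is hypothetical: the `cluster` interface, its methods and `fakeCluster` are placeholders for whatever client the automation actually uses, not a Vitess API. Note that the check and the reparent are two separate calls, so the primary can in principle still change in between.

```go
package main

import "fmt"

// cluster is a hypothetical view of what the automation needs: a way to read
// the shard's current primary and a way to issue a planned reparent.
type cluster interface {
	CurrentPrimary(shard string) (string, error)
	PlannedReparent(shard, newPrimary string) error
}

// fakeCluster is an in-memory stand-in used only for this example.
type fakeCluster struct {
	primaries map[string]string
	reparents int
}

func (f *fakeCluster) CurrentPrimary(shard string) (string, error) {
	p, ok := f.primaries[shard]
	if !ok {
		return "", fmt.Errorf("unknown shard %q", shard)
	}
	return p, nil
}

func (f *fakeCluster) PlannedReparent(shard, newPrimary string) error {
	if _, ok := f.primaries[shard]; !ok {
		return fmt.Errorf("unknown shard %q", shard)
	}
	f.primaries[shard] = newPrimary
	f.reparents++
	return nil
}

// reparentIfPrimaryIs issues the reparent only when the shard's primary still
// matches expected. It reports whether a reparent was issued.
func reparentIfPrimaryIs(c cluster, shard, expected, newPrimary string) (bool, error) {
	actual, err := c.CurrentPrimary(shard)
	if err != nil {
		return false, err
	}
	if actual != expected {
		return false, nil // something else already changed the primary
	}
	// The primary may still change between the check above and this call.
	if err := c.PlannedReparent(shard, newPrimary); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	c := &fakeCluster{primaries: map[string]string{"ks/-80": "zone1-100"}}
	issued, err := reparentIfPrimaryIs(c, "ks/-80", "zone1-100", "zone1-101")
	fmt.Println("issued:", issued, "err:", err)
}
```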


4 participants