-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: allow REPLICA
using hot-backup engine to serve traffic + avoid promotion
#16558
Comments
The feature request of avoiding promotion seems reasonable. The implementation as outlined will be quite complex and have to touch a lot of places, and it will violate some core assumptions, so I don't think it is workable. vtgate strictly routes queries by tablet type. Once you change the tablet type, it is not allowed to route queries that are intended for REPLICA type to BACKUP type. Allowing this is not a good idea, we don't know what side effects it might have. One of the complications at the code level is that query target always includes a tablet type and that is plumbed through various call paths. Having said that, if you still want to try to PR this, we can see exactly how ugly and complicated it gets with no guarantee that the PR will be accepted. Have you explored it in the other direction? Temporarily changing the promotion rule? Or is that equally hard-coded and complicated? I'm also somewhat unclear on exactly where you are seeing |
@deepthi thanks for this useful feedback! I don't think we'd want to create a mess supporting this and based on your gauging of effort I agree that exploring a different direction first would be best 👍
In a default In #16377 (implementation PR #16344) I proposed a way to do this with a flag per-tablet, but this approach has the drawback of requiring a tablet restart, which makes a temporary change difficult still. If there was a way to dynamically change the promotion rule of a tablet, that would be awesome, but Durability Polices are passed a single/pair-of EDIT: what might work is adding a I'll continue to think and appreciate any ideas people have 🙇 |
@deepthi if we focus on promotion rule only, is it safe to say a If we could automate the setting of |
REPLICA
using hot-backup engine to serve traffic + avoid promotion
We do keep track in tablet manager whether we are currently in the process of taking a backup. @GuptaManan100 is going to look at the code and propose how that can be used to meet this feature request. |
That's great. The methods of the
@deepthi / @GuptaManan100 thanks! |
For PRS, we use |
Technically, at least for now, this doesn't really matter because the builtin backup engine changes the tablet type to BACKUP and those tablets are not eligible for promotion anyway. What Manan is proposing will have this effect for any future online backup methods too. |
@GuptaManan100 great idea, thanks! I'm happy to implement this in the near future
Do you have a preference where we would add the ignore logic? I think both VTCtld and VTOrc should respect this consistently so perhaps |
I don't think we should put it into |
@GuptaManan100 makes sense, thanks. I'll make a draft PR and see what you both think 👍 |
Feature Description
As a user of the
xtrabackup
backup-engine (a non-blocking/"online" backup method), I would like to have the ability to continue to serve query traffic while ensuring the tablet performing the backup is avoided forPRIMARY
promotion while the backup occursOn the topic of avoiding
PRIMARY
promotion, thextrabackup
engine hardcodesfalse
in theShouldDrainForBackup()
function meaning it can remain aREPLICA
at backup-time (instead of "draining" toBACKUP
, as I understand it). Continuing to be aREPLICA
makes sense becausextrabackup
is non-blocking, however under thesemi_sync
andcross_cell
Durability Policies, being aREPLICA
causes a tablet to have aneutral
Promotion Rule, meaning the backing-up tablet could become a candidate forPRIMARY
, which isn't ideal.xtrabackup
is cheap but not "free", which is why it's safer to run on a non-PRIMARY
An option to address this is to make the
xtrabackup
engine'sShouldDrainForBackup()
function returntrue
, causing the tablet to becomeBACKUP
during the backup operation, but this will cause a loss of query-serving capacity that could also be avoided, asxtrabackup
-ing nodes don't really need to be taken out of service, at least in our experience in production. Adding additional capacity just to allow a backup to occur incurs delays and expenses in our deployment.This issue is a request for comment for a solution that will allow users of hot/online/non-blocking style backup engines to both:
PRIMARY
One solution is for the
xtrabackup
backup-engine (and other online-style engines) to trigger a transition toBACKUP
(by returningtrue
inShouldDrainForBackup()
) AND also continue to serve replica reads somehow, ie:serving: true
during backup-time + I'll assumevtgate
ignores theBACKUP
type too and may need some tweaks. It may be wise to carry aserving: true
state throughout the backup only if the tablet wasserving: true
at the start. This kind of behaviour change is somewhat breaking so it may need to go behind a flags (dare I say it):--xtrabackup-drain
(to triggerShouldDrainForBackup()
==true
unless we want this to be breaking)--xtrabackup-drain-keep-serving
/--xtrabackup-drain-preserve-serving
or similar, to indicate the tablet should keep serving trafficPerhaps this could be passed in the backup request too. cc @rvrangel
Your ideas are appreciated 🙇
Use Case(s)
A user of a "hot"/online/non-blocking Vitess backup engine that:
PRIMARY
The text was updated successfully, but these errors were encountered: