-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vreplication timeout query optimizer hints #13840
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
3a32cad
to
12dd91b
Compare
c68c15b
to
acad283
Compare
I got feedback to not move the flag to another package, but I get circular dependencies otherwise, not sure how to proceed here |
Sorry, meant to take a look at this earlier. My thought was to move the flag to tablet-level. I will take a look first thing tomorrow and push to this PR if that is OK. I can take a look at the failing unit tests too then. |
This commit moves the flag to the vttablet level.
Didn't look at the failing tests since I saw you had some recent commits to try to address them. Let me know if you want me to take a look at them. |
a4f7152
to
cabe21f
Compare
Thanks for that commit, I've pulled it in! I took a stab at the tests, but I would love some help with the remaining ones |
Only one other test related to Can you pull that and let's see if all tests pass? Thanks. |
Tests are all now passing! I'm also trying to find someone with permissions to update the hubspot repo settings so others can actually push to my branches 😅 |
@olyazavr, great, good to see that all tests are passing. Following up on the slack thread regarding this issue: did this PR fix your existing issue? Especially the fact that we are adding the timeout hints on the target for the insert/updates. Usually we have been seeing issues on the source while selecting rows in the copy phase. So just wanted to confirm that it does work. Can you also add your specific findings to the description in the issue, just so that we have it all in one place? Thanks! |
Yes, it did fix the issue, and we are now able to run migrations on larger/busier keyspaceshards that previously could not run schema migrations on ghost or vitess (: If the question is do we need the timeout hints for anything other than the selects - I confess I didn't try this out, and added the timeout hints in all places to be consistent, and because it seemed like there was no downside to having all vreplication queries be allowed to run longer I did not test this with any other usage of vreplication other than online ddl schema migrations |
@@ -726,6 +727,7 @@ func (tpb *tablePlanBuilder) generateValuesPart(buf *sqlparser.TrackedBuffer, bv | |||
func (tpb *tablePlanBuilder) generateSelectPart(buf *sqlparser.TrackedBuffer, bvf *bindvarFormatter) *sqlparser.ParsedQuery { | |||
bvf.mode = bvAfter | |||
buf.WriteString(" select ") | |||
buf.WriteString(vstreamer.GetVReplicationMaxExecutionTimeQueryHint()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olyazavr we don't need this as well. We are essentially selecting static values which are used in inserts. So the tabletmanager\vreplication
package doesn't need the optimizer hints anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks for this @olyazavr, this should be very useful to resolve issues that we have seen occassionally with users with large clusters.
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
…r and tabletserver use it. Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
12b53eb
to
507cfd9
Compare
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Rebased and dealt with merge conflicts |
Looks like all checks are passing, anything else for me to do here? |
…n timeout query optimizer hints (vitessio#3410) * backport of 3391 * resolved conflict Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --------- Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Co-authored-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…n timeout query optimizer hints (vitessio#3409) * backport of 3391 * resolved conflict Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> * fix endtoend Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> * fix endtoend Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> * fix vtcombo endtoend test Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --------- Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Co-authored-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
I will open a PR to update the docs here: https://vitess.io/docs/19.0/reference/vreplication/flags/#vreplication_copy_phase_duration After this PR the flag is applicable on BOTH the source and target keyspaces:
|
(#1797) Signed-off-by: Matt Lord <mattalord@gmail.com>
Description
When we try to run online ddl operations using the Vitess strategy, we see certain migrations on specific keyspace/tables fail due to timeout. The suggestion in slack was to increase mysql's
max_execution_time
, which did the trick, but we would like to limit this to only vreplication, not affecting any other query.To do so, I have used the
vreplication_copy_phase_duration
flag and added/*+ MAX_EXECUTION_TIME(x) */
wherex
is the value ofvreplication_copy_phase_duration
to the vreplication queries. This involved moving thevreplication_copy_phase_duration
duration flag to avoid circular dependenciesI tested this in our own environment with the previously failing schema migration and saw it make progress and complete with a 1hr setting for this flag.
Update: This has now been running in production and we no longer see online ddl schema migrations fail because of query timeouts. We see large/busy keyspaceshards able to run migrations where they previously were unable with ghost or vitess
Related Issue(s)
Fixes #14081
Checklist
Deployment Notes