-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI upgrade/downgrade tests for Online DDL / throttler / vreplication flow #16017
CI upgrade/downgrade tests for Online DDL / throttler / vreplication flow #16017
Conversation
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ctionality) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ve more chance, terminate the workload and expect it to compelete Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…ironment variable values to vttablet and mysqlctl binary paths Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #16017 +/- ##
==========================================
- Coverage 68.23% 68.22% -0.02%
==========================================
Files 1541 1541
Lines 197234 197330 +96
==========================================
+ Hits 134592 134634 +42
- Misses 62642 62696 +54 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
So weird. The workflow |
OK its just reappeared. Fine. |
Sample execution flow: https://github.com/vitessio/vitess/actions/runs/9348746367/job/25728525330?pr=16017 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, otherwise it looks good to me. We will have to remember to make this new workflow required
on main
once we merge the PR.
- name: Check for changes in relevant files | ||
if: steps.skip-workflow.outputs.skip-workflow == 'false' | ||
uses: dorny/paths-filter@v3.0.1 | ||
id: changes | ||
with: | ||
token: '' | ||
filters: | | ||
end_to_end: | ||
- 'go/**' | ||
- 'go/**/*.go' | ||
- 'test.go' | ||
- 'Makefile' | ||
- 'build.env' | ||
- 'go.sum' | ||
- 'go.mod' | ||
- 'proto/*.proto' | ||
- 'tools/**' | ||
- 'config/**' | ||
- 'bootstrap.sh' | ||
- '.github/workflows/upgrade_downgrade_test_onlineddl_flow.yml' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be moved two steps up, after checking out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved.
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Co-authored-by: Florent Poinsard <35779988+frouioui@users.noreply.github.com> Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! ❤️
mkdir -p /tmp/vitess-build-current/ | ||
cp -R bin /tmp/vitess-build-current/ | ||
|
||
# Swap the binaries. Use vtctl version n and keep vttablet at version n-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be checking the vtctldclient
version rather than vtctl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh we don't use vtctl
and that comment is leftover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
cp /tmp/vitess-build-next/bin/vttablet $PWD/bin/vttablet-next | ||
cp /tmp/vitess-build-next/bin/mysqlctl $PWD/bin/mysqlctl-next | ||
cp /tmp/vitess-build-next/bin/mysqlctld $PWD/bin/mysqlctld-next | ||
vtctl --version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, should be vtctldclient --version
. The tests are using vtctldclient
(cluster.VtctldClientProcess
) so not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should actually be vttablet
, this is again an oversight. Fixing in next push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
t.Logf("Using REPLICA_TABLET_BINARY_SUFFIX: %s", binarySuffix) | ||
} | ||
|
||
shards = clusterInstance.Keyspaces[0].Shards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth a require call before here too:
require.Greater(t, len(clusterInstance.Keyspaces), 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
t.Run("additional wait", func(t *testing.T) { | ||
select { | ||
case <-time.After(3 * time.Second): | ||
case <-ctx.Done(): | ||
require.Fail(t, "context cancelled") | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what we're waiting for here? If nothing else, it will give us a clue if this becomes flaky. Are we waiting for the schema migration to reach the ready-to-complete stage? If we're just waiting for load generation then it's fine. Although we could also wait for the rows in a table to be greater than X. We have helpers for that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting just so that we generate more DMLs, and give migration/vreplication more "opportunities" to throttle or to make progress. Adding clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
select { | ||
case <-time.After(10 * time.Second): | ||
case <-ctx.Done(): | ||
require.Fail(t, "context cancelled") | ||
} | ||
|
||
status := onlineddl.WaitForMigrationStatus(t, &vtParams, shards, uuid, migrationWaitTimeout, schema.OnlineDDLStatusRunning, schema.OnlineDDLStatusComplete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why in this spot and similar ones we're not instead just adding 10 seconds to the migrationWaitTimeout
we pass in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. You're right and this is not needed. Removing altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
row := onlineddl.VtgateExecDDL(t, &vtParams, ddlStrategy, alterStatement, "").Named().Row() | ||
if row != nil { | ||
uuid = row.AsString("uuid", "") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not use require/assert to fail here if we don't get a row and/or uuid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is copy+paste from other tests where a nil
result is in fact possible (where we expect an error in submission).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if expected { | ||
// if migration is ready to complete, the timestamp should be non-null | ||
assert.False(t, row["ready_to_complete_timestamp"].IsNull()) | ||
} else { | ||
assert.True(t, row["ready_to_complete_timestamp"].IsNull()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we could condense this to:
assert.Equal(t, expected, row["ready_to_complete_timestamp"].IsNull())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the negation actually (IsNull()
should be !expected
) which is why I broke it into explicit if-else, as I find negation so confusing sometimes.
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
We wanted this to be in time for v20 RC1 code freeze, but did not make it. Seeing that this is a tests-only PR, it complies with code freeze guidelines. There is an advantage to having this in |
…flow (vitessio#16017) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Co-authored-by: Florent Poinsard <35779988+frouioui@users.noreply.github.com>
Description
Adding upgrade/downgrade tests as follows:
X
version VTTablet primary, withY
VTTablet replica. See breakdown below.vitess
strategy, thereby running VReplication that is also using the throttler.See more details in https://github.com/vitessio/vitess/pull/16017/files#diff-dc6e3889488266071984d6da1f44795b7b06bf5ff6504c8e9aa359598918230aR17-R37
This will test cross version compatibility of the throttler, of vreplication, of Online DDL. It runs
onlineddl_flow
in the following setups:Each such flow runs at about 1 minute, which is why I chose to aggregate them all in the same workflow test (as opposed to splitting to "new" and "old")
EDIT: sample execution flow: https://github.com/vitessio/vitess/actions/runs/9348746367/job/25728525330?pr=16017
Related Issue(s)
No specific issue, but with #15988 in mind.
Checklist
Deployment Notes