Online DDL: fix flaky `onlineddl_scheduler` CI test
#16011
Merged
Description
This PR fixes flakiness in the `onlineddl_scheduler` CI test. It seems to have been lurking for quite a while, but I only hit it severely in #15988. After a long investigation, I found it is unrelated to the throttler.
It's a really simple race condition between VReplication and a `SELECT FOR UPDATE` query the test runs (to validate the behavior of `FORCE_CUTOVER`). The test can issue that query before VReplication has even had the chance to start running. In that case VReplication never reaches the point of cutting over, and so the migration cannot cut over. This is unrealistic (or rather, not interesting) in production, and it is not the purpose of the test.
The fix is to ensure VReplication has had a chance to run before validating the cut-over behavior.
On top of that change, we also use `"--migration_check_interval", "5s"`, which is standard across all Online DDL CI jobs and reduces the overall runtime of this CI job.
Related Issue(s)
No related issue. See for example this CI failure.
Checklist
Deployment Notes