Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VReplication: Properly ignore errors from trying to drop tables that don't exist #16505

Merged
merged 10 commits into from
Aug 8, 2024

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Jul 30, 2024

Description

This is a follow-up to: #15977 We should backport the fix to v20, where that work was first added.

There were two problems:

  1. The concrete error type returned from the ExecuteFetch RPC call was a gRPC status.Error and NOT the sqlerror.SQLError type returned from the database failure so we needed to first convert it
  2. The error code returned when the table did not exist was actually Error number: 1051; Symbol: [ER_BAD_TABLE_ERROR]

This was missed due to inadequate testing. I've added an endtoend test here — along with updating the unit test (and framework) to test this behavior — which fails on main as expected:

❯ go test -v -timeout 10m -run ^TestMaterialize$ vitess.io/vitess/go/test/endtoend/vreplication
VTDATAROOT is /opt/vtdataroot/vreple2e_513606
=== RUN   TestMaterialize
=== RUN   TestMaterialize/Materialize
E0730 13:37:41.449283   10971 vtorc_process.go:103] configuration - {
	"Debug": true,
	"ListenAddress": ":0",
	"MySQLTopologyUser": "orc_client_user",
	"MySQLTopologyPassword": "orc_client_user_password",
	"MySQLReplicaUser": "vt_repl",
	"MySQLReplicaPassword": "",
	"RecoveryPeriodBlockSeconds": 1
}
=== RUN   TestMaterialize/Materialize/vtctldclient_materialize_with_nonexistent_table
    vreplication_test.go:1194:
        	Error Trace:	/Users/matt/git/vitess/go/test/endtoend/vreplication/vreplication_test.go:1194
        	Error:      	Received unexpected error:
        	            	exit status 1
        	Test:       	TestMaterialize/Materialize/vtctldclient_materialize_with_nonexistent_table
        	Messages:   	Materialize cancel failed, err: exit status 1, output: E0730 13:37:57.945022   12107 main.go:56] rpc error: code = Unknown desc = TabletManager.ExecuteFetchAsDba on zone1-0000000300: rpc error: code = Unknown desc = Unknown table 'vt_source.table_that_doesnt_exist' (errno 1051) (sqlstate 42S02) during query: drop table `vt_source`.`table_that_doesnt_exist`

The updated unit test demonstrating that we are properly handling the case where 1 of N tables doesn't exist:

❯ go test -count 1 -v -race -timeout 30s -run ^TestWorkflowDelete$ vitess.io/vitess/go/vt/vtctl/workflow
# vitess.io/vitess/go/vt/vtctl/workflow.test
=== RUN   TestWorkflowDelete
=== RUN   TestWorkflowDelete/missing_table
W0801 09:59:41.614732   13583 traffic_switcher.go:1183] cell-0000000210: Table `t1_2` did not exist when attempting to remove it
W0801 09:59:41.614759   13583 traffic_switcher.go:1183] cell-0000000200: Table `t1_2` did not exist when attempting to remove it
W0801 09:59:41.615560   13583 shard.go:404] Trying to remove TabletControl.DeniedTables for missing type PRIMARY in shard sourceks/0
W0801 09:59:41.615904   13583 shard.go:404] Trying to remove TabletControl.DeniedTables for missing type PRIMARY in shard targetks/80-
W0801 09:59:41.615951   13583 shard.go:404] Trying to remove TabletControl.DeniedTables for missing type PRIMARY in shard targetks/-80
=== RUN   TestWorkflowDelete/missing_denied_table_entries
W0801 09:59:41.619304   13583 shard.go:404] Trying to remove TabletControl.DeniedTables for missing type PRIMARY in shard sourceks/0
W0801 09:59:41.619547   13583 shard.go:454] one or more tables did not exist in the denylist:t1_2,t1_3
W0801 09:59:41.619553   13583 shard.go:454] one or more tables did not exist in the denylist:t1_2,t1_3
--- PASS: TestWorkflowDelete (0.01s)
    --- PASS: TestWorkflowDelete/missing_table (0.01s)
    --- PASS: TestWorkflowDelete/missing_denied_table_entries (0.00s)
PASS
ok  	vitess.io/vitess/go/vt/vtctl/workflow	2.141s

And which fails, as expected, on main:

❯ go test -count 1 -v -race -timeout 30s -run ^TestWorkflowDelete$ vitess.io/vitess/go/vt/vtctl/workflow
# vitess.io/vitess/go/vt/vtctl/workflow.test
ld: warning: '/private/var/folders/wv/yn5161kd4r5gx_p046xvsr0w0000gn/T/go-link-409845145/000014.o' has malformed LC_DYSYMTAB, expected 98 undefined symbols to start at index 1626, found 95 undefined symbols starting at index 1626
=== RUN   TestWorkflowDelete
=== RUN   TestWorkflowDelete/missing_table
E0801 10:05:55.179911   17216 traffic_switcher.go:1187] cell-0000000210: Error removing table `t1_2`: rpc error: code = Unknown desc = TabletManager.ExecuteFetchAsDba on cell-01: rpc error: code = Unknown desc = Unknown table 'vt_targetks.t1_2' (errno 1051) (sqlstate 42S02) during query: drop table `vt_targetks`.`t1_2`
E0801 10:05:55.181247   17216 traffic_switcher.go:1187] cell-0000000200: Error removing table `t1_2`: rpc error: code = Unknown desc = TabletManager.ExecuteFetchAsDba on cell-01: rpc error: code = Unknown desc = Unknown table 'vt_targetks.t1_2' (errno 1051) (sqlstate 42S02) during query: drop table `vt_targetks`.`t1_2`
    server_test.go:418:
        	Error Trace:	/Users/matt/git/vitess/go/vt/vtctl/workflow/server_test.go:418
        	Error:      	unexpected error value
        	Test:       	TestWorkflowDelete/missing_table
        	Messages:   	Server.WorkflowDelete() error = Code: UNKNOWN
        	            	rpc error: code = Unknown desc = TabletManager.ExecuteFetchAsDba on cell-01: rpc error: code = Unknown desc = Unknown table 'vt_targetks.t1_2' (errno 1051) (sqlstate 42S02) during query: drop table `vt_targetks`.`t1_2`
        	            	rpc error: code = Unknown desc = TabletManager.ExecuteFetchAsDba on cell-01: rpc error: code = Unknown desc = Unknown table 'vt_targetks.t1_2' (errno 1051) (sqlstate 42S02) during query: drop table `vt_targetks`.`t1_2`
        	            	, wantErr false
=== RUN   TestWorkflowDelete/missing_denied_table_entries
W0801 10:05:55.184561   17216 shard.go:404] Trying to remove TabletControl.DeniedTables for missing type PRIMARY in shard sourceks/0
W0801 10:05:55.184880   17216 shard.go:454] one or more tables did not exist in the denylist:t1_2,t1_3
W0801 10:05:55.184886   17216 shard.go:454] one or more tables did not exist in the denylist:t1_2,t1_3
--- FAIL: TestWorkflowDelete (0.01s)
    --- FAIL: TestWorkflowDelete/missing_table (0.01s)
    --- PASS: TestWorkflowDelete/missing_denied_table_entries (0.00s)
FAIL
FAIL	vitess.io/vitess/go/vt/vtctl/workflow	1.191s
FAIL

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Copy link
Contributor

vitess-bot bot commented Jul 30, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jul 30, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Jul 30, 2024
@mattlord mattlord added Backport to: release-20.0 Needs to be backport to release-20.0 and removed NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jul 30, 2024
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord added Type: Bug Component: VReplication and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work labels Jul 30, 2024
@mattlord mattlord changed the title VReplication: Convert grpc status.Error to sqlerror.SQLError VReplication: Properly ignore errors from trying to drop tables that don't exist Jul 30, 2024
Copy link

codecov bot commented Jul 30, 2024

Codecov Report

Attention: Patch coverage is 38.46154% with 8 lines in your changes missing coverage. Please review.

Project coverage is 68.73%. Comparing base (3d104d0) to head (83cbdae).
Report is 12 commits behind head on main.

Files Patch % Lines
go/vt/vtctl/workflow/traffic_switcher.go 25.00% 6 Missing ⚠️
go/vt/vtctl/workflow/server.go 0.00% 1 Missing ⚠️
go/vt/vtctl/workflow/utils.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16505      +/-   ##
==========================================
+ Coverage   68.64%   68.73%   +0.09%     
==========================================
  Files        1552     1556       +4     
  Lines      199544   199707     +163     
==========================================
+ Hits       136976   137275     +299     
+ Misses      62568    62432     -136     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord marked this pull request as ready for review July 30, 2024 17:49
go/vt/vtctl/workflow/traffic_switcher.go Outdated Show resolved Hide resolved
go/vt/vtctl/workflow/traffic_switcher.go Outdated Show resolved Hide resolved
Signed-off-by: Matt Lord <mattalord@gmail.com>
ts.Logger().Warningf("%s: Table %s did not exist when attempting to remove it", topoproto.TabletAliasString(source.GetPrimary().GetAlias()), tableName)
return nil
Copy link
Contributor Author

@mattlord mattlord Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a separate bug fixed here that would have caused the Delete/Cancel to leave any yet to be removed tables orphaned on the source/target. We never got into this block due to the noted issues, however, so this bug could not be triggered before this PR and thus in effect did not exist. The unit test (and framework used) was also updated to test for this.

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord
Copy link
Contributor Author

mattlord commented Jul 30, 2024

Thanks for the helpful review, @rohit-nayak-ps ! Just FYI, as I know that you're working on more unit tests for the workflow package, I added support for testing query error handling w/o a real mysqld instance here: 7ce084b

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord requested a review from a team August 7, 2024 23:06
@mattlord mattlord merged commit 8f0d2d4 into vitessio:main Aug 8, 2024
129 checks passed
@mattlord mattlord deleted the vrepl_cancel_no_table branch August 8, 2024 13:25
vitess-bot pushed a commit that referenced this pull request Aug 8, 2024
…don't exist (#16505)

Signed-off-by: Matt Lord <mattalord@gmail.com>
mattlord pushed a commit that referenced this pull request Aug 8, 2024
…op tables that don't exist (#16505) (#16561)

Signed-off-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
venkatraju pushed a commit to slackhq/vitess that referenced this pull request Aug 29, 2024
…don't exist (vitessio#16505)

Signed-off-by: Matt Lord <mattalord@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug Report: Can't cancel Materialize workflow on non-existent table
3 participants