Fix the race condition during vttablet startup #15731
This avoids poisoning the connection pool when we check for the MySQL port, by not using the pool in the first place. We only ever run this once at startup, so we can create a new connection here and dispose of it once we've retrieved the port. That way we know the connection pool is still clean and doesn't have any problems.
Fixes vitessio#15730
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
 	timer := time.NewTimer(waitTime)
 	ctx, cancel := context.WithTimeout(ctx, waitTime)
 	defer cancel()
 	for {
-		conn, connErr := dbconnpool.NewDBConnection(ctx, mysqld.dbcfgs.DbaConnector())
+		conn, connErr := mysql.Connect(ctx, params)
I also opted to refactor this case slightly so that it avoids the pool entirely, which makes it more obvious to future readers that no pooling is used here.
This is good. When I was looking at this yesterday, I had to navigate through the code to see that NewDBConnection returns a non-pooled connection.
	// during MySQL startup when we still might be loading things like grants.
	// This means we need to use an isolated connection to avoid poisoning the
	// DBA connection pool for further queries.
	params, err := mysqld.dbcfgs.DbaConnector().MysqlParams()
Changed this to use a connection without pooling, which avoids the poisoning. While in here, I also fixed the context.TODO() so that the context is passed in explicitly.
Can this be backported to v19?
@GrahamCampbell Are you affected by this bug? We haven't backported the previous fixes mentioned in the linked issue either. It only happens when you first initialize a Vitess cluster, often when using k8s, where things spin up in a fairly random order.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15731   +/-   ##
=======================================
  Coverage   68.38%   68.39%
=======================================
  Files        1556     1556
  Lines      195347   195361   +14
=======================================
+ Hits       133593   133608   +15
+ Misses      61754    61753    -1
@@ -730,10 +730,18 @@ func (tm *TabletManager) checkMysql() error {
	return nil
}

const portCheckTimeout = 5 * time.Second

func (tm *TabletManager) getMysqlPort() (int32, error) {
It seems overkill to define a function that is used exactly once. Why can't all this be folded into findMysqlPort? It will grow from 10 lines to 15 lines.
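@deepthi Because it sets up a context, and you can't defer the cancel properly inside a for loop: the deferred call would only run when the enclosing function returns, not at the end of each iteration. Hence the separate function, so there's no need to cancel manually, which is more error-prone.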
		return 0, err
	}
	defer conn.Close()
	qr, err := conn.ExecuteFetch("SHOW VARIABLES LIKE 'port'", 1, false)
I know this is not super important here, but should the context be passed down to the actual query? We've seen various cases where MySQL ends up being "stuck" for whatever reason, not replying to incoming queries, and this would cause the GetMysqlPort function to hang indefinitely (instead of returning an error once ctx expires).
@arthurschreiber Do you mean as a general feature / refactor? It's not possible to pass in a context at the moment in the MySQL connection handling that Vitess does.
That might be useful as a separate feature / change, but I think that's independent of what we're doing here?
LGTM!
Related Issue(s)
Fixes #15730