Backport 14693 - Use GetTabletsByCell in healthcheck #514
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This backports upstream PR vitessio#14693, with a few minor changes to make it work with the Go
version we are using and a small change to topology_watcher.go so that test cases reflect
and test for the same behavior as the upstream code. The description of the original PR
follows:
VTGate's healthcheck module currently calls GetTablet for each tablet alias that it discovers
in a cell. Instead we can use GetTabletsForCell to fetch all tablets for a cell at once.
This PR does a few more things:
falling back to one tablet at a time in case of error.
introduce a configuration option for concurrency.
read those tablets at all, now we do read them, but ignore any changes if they are
already known.
The basic fix has already been tried in production and shown to reduce the number of Get
calls from vtgate -> topo from O(n) to O(1).
We can consider deprecating and deleting --refresh_known_tablets in a future release.
The concerns that originally motivated adding that flag in vitessio#3965 are alleviated by fetching
all tablets in one call to the topo.