Prefer replicas that have innodb buffer pool populated in PRS #16374

Merged
merged 6 commits into from
Jul 24, 2024

Conversation

GuptaManan100
Member

@GuptaManan100 GuptaManan100 commented Jul 12, 2024

Description

This PR addresses the feature requested in #15946.

In the previous release, #16022 added the ability to query global status variables. We can now stop using PrimaryStatus to check the reachability of vttablets and use the new RPC instead, which also gives us the InnoDB buffer pool data value.

We use this value when sorting the vttablets to select the new primary.
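
As a rough illustration of the idea, here is a minimal Go sketch, not the code from this PR. The `candidate` type, its fields, and `sortCandidates` are hypothetical names, and the sketch assumes the buffer pool value is the only sort key. The real sorting in Vitess may weigh other criteria as well.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical stand-in for a replica considered during PRS.
// It is not a Vitess type.
type candidate struct {
	Alias          string
	Reachable      bool // whether the status RPC succeeded (assumption)
	BufferPoolData int  // InnoDB buffer pool data value reported by the tablet (assumption)
}

// sortCandidates drops unreachable tablets and orders the rest so that the
// tablet with the most populated buffer pool comes first.
func sortCandidates(cs []candidate) []candidate {
	reachable := make([]candidate, 0, len(cs))
	for _, c := range cs {
		if c.Reachable {
			reachable = append(reachable, c)
		}
	}
	// Stable sort keeps the original order for tablets with equal values.
	sort.SliceStable(reachable, func(i, j int) bool {
		return reachable[i].BufferPoolData > reachable[j].BufferPoolData
	})
	return reachable
}

func main() {
	cs := []candidate{
		{Alias: "zone1-101", Reachable: true, BufferPoolData: 120},
		{Alias: "zone1-102", Reachable: false, BufferPoolData: 0},
		{Alias: "zone1-103", Reachable: true, BufferPoolData: 4096},
	}
	for _, c := range sortCandidates(cs) {
		fmt.Println(c.Alias, c.BufferPoolData)
	}
}
```

A single RPC per tablet serves both purposes here: a failed call marks the tablet as unreachable, and a successful one supplies the value used for ordering.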

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

… check for reachability of tablets

Signed-off-by: Manan Gupta <manan@planetscale.com>
Contributor

vitess-bot bot commented Jul 12, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jul 12, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Jul 12, 2024
@GuptaManan100 GuptaManan100 added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Cluster management and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jul 18, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100
Member Author

I was looking into the failure in vreplication_mariadb_to_mysql and found something very interesting. MySQL stores the global_status table in performance_schema, but MariaDB stores it in information_schema (https://mariadb.com/kb/en/information-schema-global_status-and-session_status-tables/)! Starting with 10.5.2, MariaDB has added the table to performance_schema too (https://mariadb.com/kb/en/performance-schema-global_status-table/), but it is empty.

I ran MariaDB and verified ☝️:

mysql> select version();
+-----------------------------------------+
| version()                               |
+-----------------------------------------+
| 10.10.7-MariaDB-1:10.10.7+maria~ubu2204 |
+-----------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from performance_schema.global_status;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from information_schema.global_status;
+----------+
| count(*) |
+----------+
|      558 |
+----------+
1 row in set (0.01 sec)

mysql>
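
One consequence of the output above is that a caller reading from performance_schema.global_status has to tolerate getting no rows back. The Go sketch below shows one way to do that under stated assumptions: `bufferPoolData` is a hypothetical helper, the status variable name `Innodb_buffer_pool_pages_data` is an assumption about which variable is read, and falling back to 0 is an illustrative choice, not necessarily what this PR does.

```go
package main

import (
	"fmt"
	"strconv"
)

// bufferPoolData is a hypothetical helper. It looks up a status variable in a
// name -> value map and returns 0 when the variable is missing or not a number.
func bufferPoolData(status map[string]string, name string) int {
	raw, ok := status[name]
	if !ok {
		// No row for this variable, as with an empty global_status table.
		return 0
	}
	v, err := strconv.Atoi(raw)
	if err != nil {
		return 0
	}
	return v
}

func main() {
	// Assumed variable name, used for illustration only.
	const name = "Innodb_buffer_pool_pages_data"

	mysqlRows := map[string]string{name: "4096"}
	mariadbRows := map[string]string{} // the query returned no rows

	fmt.Println(bufferPoolData(mysqlRows, name))
	fmt.Println(bufferPoolData(mariadbRows, name))
}
```

With a fallback like this, a tablet whose status query returns nothing still counts as reachable and simply sorts as if its buffer pool were empty.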

Signed-off-by: Manan Gupta <manan@planetscale.com>

codecov bot commented Jul 19, 2024

Codecov Report

Attention: Patch coverage is 95.23810% with 2 lines in your changes missing coverage. Please review.

Project coverage is 68.68%. Comparing base (0f242e9) to head (d69805b).
Report is 28 commits behind head on main.

Files Patch % Lines
go/vt/mysqlctl/fakemysqldaemon.go 0.00% 1 Missing ⚠️
...t/vtctl/grpcvtctldserver/testutil/test_tmclient.go 83.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16374   +/-   ##
=======================================
  Coverage   68.67%   68.68%           
=======================================
  Files        1548     1548           
  Lines      199083   199146   +63     
=======================================
+ Hits       136727   136780   +53     
- Misses      62356    62366   +10     


@deepthi
Member

deepthi commented Jul 19, 2024

Is the test performing PRS on a MariaDB cluster? Given that we support only imports from MariaDB, all those tablets should be unmanaged. If we have a test that is running PRS on MariaDB we should delete that.

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100
Member Author

I looked at the code. The test sets up a Vitess cluster running MariaDB and then runs MoveTables from that position. To initialize the MariaDB cluster, it runs PlannedReparentShard, which is working fine now too. To remove it, we would have to run the replication setup queries manually (this is what we expect users to do), but running PRS makes the testing easier.

@deepthi
Member

deepthi commented Jul 23, 2024

As long as we are not adding MariaDB specific code, it's fine to leave it for now. We just need to be aware that this could break without warning at some future point and then we'll be forced to fix the tests.

@harshit-gangal harshit-gangal merged commit 7f639d3 into vitessio:main Jul 24, 2024
127 checks passed
@harshit-gangal harshit-gangal deleted the innodb-bufferpool-prs branch July 24, 2024 06:56
venkatraju pushed a commit to slackhq/vitess that referenced this pull request Aug 29, 2024
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature)
Development

Successfully merging this pull request may close these issues.

Feature Request: Prefer longer running replicas for PRS
3 participants