
adding new mysql shell backup engine #16295

Open
wants to merge 38 commits into main

Conversation


@rvrangel rvrangel commented Jun 28, 2024

Description

This is a PR that implements a new backup engine for use with MySQL Shell, which is mentioned in the feature request here: #16294

It works a bit differently from the existing engines in vitess, in that it only stores metadata about how the backup was created (location + parameters used), and during the restore it uses that location plus other parameters (the MySQL Shell options differ between a dump and a restore, so we can't use exactly the same ones).
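To illustrate the idea (a sketch only; the field names here are illustrative rather than the exact ones in this PR), the MANIFEST written by this engine mainly needs to record where the dump lives and which options produced it, so the restore path can hand them back to MySQL Shell:

// Sketch of the engine-specific metadata kept in the MANIFEST.
type MySQLShellBackupManifest struct {
    // BackupLocation is where mysqlsh wrote the dump (a directory or an object-store URL).
    BackupLocation string
    // Params are the JSON flags passed to the dump utility, kept for reference;
    // the load flags used during restore are supplied separately since they differ.
    Params string
}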

Related Issue(s)

Fixes #16294

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Renan Rangel <rrangel@slack-corp.com>

vitess-bot bot commented Jun 28, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jun 28, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Jun 28, 2024

codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 25.30864% with 242 lines in your changes missing coverage. Please review.

Project coverage is 69.43%. Comparing base (56c39b2) to head (1fa6191).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
go/vt/mysqlctl/mysqlshellbackupengine.go 24.37% 211 Missing ⚠️
go/vt/mysqlctl/query.go 0.00% 21 Missing ⚠️
go/vt/mysqlctl/fakemysqldaemon.go 33.33% 4 Missing ⚠️
go/vt/mysqlctl/backup.go 83.33% 2 Missing ⚠️
go/vt/mysqlctl/builtinbackupengine.go 0.00% 2 Missing ⚠️
go/vt/mysqlctl/xtrabackupengine.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16295      +/-   ##
==========================================
- Coverage   69.53%   69.43%   -0.10%     
==========================================
  Files        1567     1568       +1     
  Lines      202388   202714     +326     
==========================================
+ Hits       140723   140758      +35     
- Misses      61665    61956     +291     


@rvrangel rvrangel marked this pull request as ready for review July 11, 2024 14:17
@deepthi deepthi added Component: Backup and Restore Type: Feature Request and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jul 11, 2024
Contributor

@shlomi-noach shlomi-noach left a comment

Thank you for this submission! Some initial general thoughts, before a deeper code review. Perhaps these questions are more appropriate on #16294, but I did not want to split the discussion, so let's keep it here.

I have not used MySQL Shell backups before. Some questions and notes:

  • This PR adds dependencies on mysqlsh and mysqlshell binaries. This is just an observation, but points for consideration are:

    • Neither are included in a standard MySQL build. What are version dependencies between mysqlsh/mysqlshell and the MySQL server?
    • Neither are included in the MySQL docker images, to the best of my understanding. This means this backup method will not be available on kubernetes deployments via vitess-operator.
  • Re: GTID not being available in the manifest file, this means we will not be able to run point in time recoveries with a mysqlshell-based full backup. Point in time recoveries require GTID information. As mentioned in Feature Request: MySQL Shell Logical Backups #16294 (comment), the mysqlshell method is the first and only (thus far) logical backup solution, so it's unfortunate that this solution will not support logical point in time recoveries.
    Is it not possible to read the gtidExecuted field from the @.json dump file immediately after the backup is complete, and update the manifest file? E.g. if the dump is into a directory, isn't that directory available for us to read?
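If the dump directory is readable, a small helper along these lines could pull that field out right after the dump finishes and store it in the MANIFEST (a sketch only, assuming a directory-based dump and the gtidExecuted field mentioned above; GTID validation and the object-store case are left out):

// readGtidExecuted reads the gtidExecuted field from the @.json file that
// mysqlsh writes at the root of a directory dump.
func readGtidExecuted(dumpDir string) (string, error) {
    data, err := os.ReadFile(filepath.Join(dumpDir, "@.json"))
    if err != nil {
        return "", err
    }
    var meta struct {
        GtidExecuted string `json:"gtidExecuted"`
    }
    if err := json.Unmarshal(data, &meta); err != nil {
        return "", err
    }
    return meta.GtidExecuted, nil
}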

// This is usually run in a background goroutine, so there's no point
// returning an error. Just log it.
logger.Warningf("error scanning lines from %s: %v", prefix, err)
}
Contributor

I notice we do not have a unit test for this function. Having moved it around, perhaps now is a good opportunity?

Contributor Author

yeah, I can probably add it there 👍

Contributor Author

unit test added for this function

@rvrangel
Contributor Author

These are good questions, thanks Shlomi!

  • I wasn't sure, but checking the official mysql docker images, it seems to be included actually:

    $ docker run -it mysql:8.0 mysqlsh --version
    Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
    mysqlsh   Ver 8.0.38 for Linux on x86_64 - for MySQL 8.0.38 (MySQL Community Server (GPL))
    

    In relation to the version dependency, my understanding is that MySQL Shell needs to be at least the same version as the MySQL Server, but it can be newer. We have successfully been using MySQL Shell 8.4 with Percona Server 8.0 releases.

    We don't use vitess-operator so for us it would mean we need to make sure required binaries are installed anyway (like mysqld, xtrabackup). But I imagine it being included in the official docker images means it will be less of an issue?

  • That's a good point I didn't realise. While it is possible to read the @.json file from a directory when it is completed (if we are writing to disk), it is less straightforward when we are storing the backups on an object store. Because mysqlsh doesn't work the same way (it doesn't provide a single stream that can be cleanly uploaded to whatever storage engine we are using), the thought was to bypass the storage engine in vitess (except for the MANIFEST, which we still write through it) and just use this metadata to help the engine locate and restore the backup instead. If this only saved to disk it would be much easier, but also very limiting.

    If we were to do this, we would need to write code to fetch the @.json from the object stores where there is support overlap between mysqlsh and vitess (S3, Azure, GCP, etc.), and some might be missing. Perhaps a better idea would be to include this backup engine without PITR support in the beginning and file an upstream feature request to print or save a copy of the executed GTID once the backup is done, which we could capture in an easier way (similar to the xtrabackup engine)?

    For additional context, as proposed on select backup engine in Backup() and ignore engines in RestoreFromBackup() #16428 and described in the use case, we plan to use this mostly to keep two backup types around for each shard, but always restoring by default from xtrabackup unless we require the logical backups for a specific reason.

    We also considered mysqldump, which to be honest would fit the vitess backup engine workflow a lot better, but it was just too slow. This benchmark from Percona also highlights the same thing, and for us backing up/restoring was so slow it didn't make sense.

Member

@frouioui frouioui left a comment

Hello, thank you for this contribution.

Since we have not fully finished the deletion of mysqld in the vitess/lite Docker Images, the mysqlsh binary will have to be included in the vitess/lite image regardless of whether it's included in the official MySQL Docker Images. Since we are letting people choose between using an official MySQL image or the vitess/lite image for their Docker/K8S deployment, we must have the binary in both.

Regarding vitess-operator, a follow-up PR is needed on the vitess-operator to allow this new backup engine. In our CRDs we have an enumeration that restricts what backup engines are allowed; we just need to add a new entry to the enumeration. This can be done here.

FYI, I can handle the vitess-operator changes.

@shlomi-noach
Contributor

shlomi-noach commented Jul 22, 2024

We also considered mysqldump which to be honest would fit the vitess backup engine workflow a lot better, but it was just too slow.

Have you looked at mysqlpump? (Next gen mysqldump, included in standard builds).

@shlomi-noach
Contributor

shlomi-noach commented Jul 22, 2024

I wasn't sure, but checking the official mysql docker images, it seems to be included actually:

Oh, that's nice! The reason I thought it wasn't included is that mysqlsh/mysqlshell is not included in the standard MySQL build.

That's a good point I didn't realise. While it is possible to read the @.json file from a directory when it is completed (if we are writing to disk), it is less straightforward when we are storing the backups on an object store.

I feel like it's OK to have some solution "with limitations". We should strive to support as much functionality as possible though. So IMHO we should strive to include the GTID when the backup goes into a directory. This should be possible to do, which then means the backup should fail if for some reason we can't fetch the GTID or validate it (correct GTID form). i.e. return BackupUnusable if unable to fetch and validate the GTID entry.

I'd like @deepthi to weigh in her opinion.

Assuming we do decide to move forward, then I'd next expect a CI/endtoend test please, as follows:

When these are all added, a new CI job will run to test mysqlshell-based backup, restores, and point-in-time recoveries. These can (and should) use the directory-based backup configuration, one which does make the GTID available.

If this test passes, then you will have validated the full cycle of backup and restore, as well as correctness of the captured GTID.

Edit: since mysqlshell does not come bundled in the mysql distribution, we'd need to further download/install mysqlshell in the GitHub workflow file.

S3, Azure, GCP can be left without GTID support for now.

We'd need a documentation PR that clearly indicates the limitations of this method.

Comment on lines 23 to 38
var (
// location to store the mysql shell backup
mysqlShellBackupLocation = ""
// flags passed to the mysql shell utility, used both on dump/restore
mysqlShellFlags = "--defaults-file=/dev/null --js -h localhost"
// flags passed to the Dump command, as a JSON string
mysqlShellDumpFlags = `{"threads": 2}`
// flags passed to the Load command, as a JSON string
mysqlShellLoadFlags = `{"threads": 4, "updateGtidSet": "replace", "skipBinlog": true, "progressFile": ""}`
// drain a tablet when taking a backup
mysqlShellBackupShouldDrain = false
// disable redo logging and double write buffer
mysqlShellSpeedUpRestore = false

MySQLShellPreCheckError = errors.New("MySQLShellPreCheckError")
)
Contributor

You have correctly followed the existing design. I'm just taking the opportunity to say at some point we will want to move away from these global variables.

// location to store the mysql shell backup
mysqlShellBackupLocation = ""
// flags passed to the mysql shell utility, used both on dump/restore
mysqlShellFlags = "--defaults-file=/dev/null --js -h localhost"
Contributor

I'm not sure -h localhost will work well in a k8s deployment. @frouioui / @mattlord for review.

Contributor Author

this is left for the operator to decide, depending on their environment and how they connect to mysqld. I am happy to change the defaults to something else (e.g. unix socket path)

Contributor

I think it's probably best NOT to specify a default here (and -h localhost is already the compiled default I believe). I'm assuming these are easily overridden no matter what we do here though?

If we do add a default here, we might want to use the default socket file path in the operator: https://github.com/planetscale/vitess-operator/blob/58a8d79bb44e4238a274598817e910c57be60950/pkg/operator/vttablet/constants.go#L71

That being: /vt/socket/mysql.sock

Member

@frouioui frouioui Aug 27, 2024

I can indeed confirm that using localhost is not working: planetscale/vitess-operator#586 (comment)

Contributor Author

this is the default value and should be adjusted to match the running environment (for example, if running on kube, provide the host/port or unix socket file), similar to what is done in the end-to-end tests. Would not providing the -h flag be better in this case? AFAIK mysql shell will still use localhost as the default value

Member

@frouioui frouioui Aug 29, 2024

I am fine keeping this default value in vitess; however, on the vitess-operator we should set another value that overrides the default of --mysql-shell-flags. That way, when running with vitess-operator, localhost will be replaced by the socket. I am not sure how much the user can configure the backup using the mysql shell flags, but we might even want to let the user configure the value of that flag in the vitess-operator's CRDs - with the default being what we have in vitess, but using a socket instead of a hostname.
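For example (illustrative only; mysqlsh accepts a --socket/-S option for local connections), the operator could override the default with something along the lines of:

    --mysql_shell_flags="--defaults-file=/dev/null --js -S /vt/socket/mysql.sock"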

// flags passed to the mysql shell utility, used both on dump/restore
mysqlShellFlags = "--defaults-file=/dev/null --js -h localhost"
// flags passed to the Dump command, as a JSON string
mysqlShellDumpFlags = `{"threads": 2}`
Contributor

Any particular reason to choose 2 rather than something based on runtime.NumCPU()?

Contributor Author

I would say because MySQL Shell backups can be a bit more intensive - we are requesting a bunch of data off MySQL which needs to be fetched, parsed and compressed - and in our particular use case we were taking backups online, so we didn't want it to cause that much disruption. It is also part of the reason why I made sure ShouldDrainForBackup() was configurable, in case draining is more suitable for the use case.

I am fine with changing the default to runtime.NumCPU() though, since it is configurable, and leaving this up to the user to decide based on their environment requirements, although I am also conscious that it might cause some issues in Kube, where it will report the number of CPUs of the underlying node even though the pod may be limited in how much CPU it can use.

Contributor

Seems like we already have a --concurrency flag for the Backup command. Should we reuse that flag?

Contributor Author

so looking this up again today, the default in MySQL Shell if not passed is 4. What I have been considering is that if we do something like:

mysqlShellDumpFlags = fmt.Sprintf("{\"threads\": %d}", runtime.NumCPU())

it might give the user the wrong impression that this is the default for MySQL Shell, and that if they use the flag to change other settings and don't add threads to it, it will use all CPUs by default.

so perhaps we should just set it to {"threads": 4} instead?

Contributor

it might give the user the wrong impression that this is the default for MySQL Shell, and that if they use the flag to change other settings and don't add threads to it, it will use all CPUs by default.

This is a documentation concern. We can't account for every user expectation. I can say that I would not assume as much.

Contributor Author

still, we can try to make it more difficult for users to do the wrong thing. Perhaps we can remove threads from the default value and leave it up to the user to modify it if they so desire?

Contributor

perhaps we can remove threads from the default value and leave it up to the user to modify it if they so desire?

Would that mean mysql shell would then use 4? I think this is fine. So you'd give the user the option for setting "threads", but that option is by default empty? I like it.

Contributor Author

yes, that's correct. the docs mention the default is 4
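With that default, a user who does want explicit parallelism can still set it themselves, e.g. (illustrative value):

    --mysql_shell_dump_flags='{"threads": 8}'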

fs.StringVar(&mysqlShellFlags, "mysql_shell_flags", mysqlShellFlags, "execution flags to pass to mysqlsh binary to be used during dump/load")
fs.StringVar(&mysqlShellDumpFlags, "mysql_shell_dump_flags", mysqlShellDumpFlags, "flags to pass to mysql shell dump utility. This should be a JSON string and will be saved in the MANIFEST")
fs.StringVar(&mysqlShellLoadFlags, "mysql_shell_load_flags", mysqlShellLoadFlags, "flags to pass to mysql shell load utility. This should be a JSON string")
fs.BoolVar(&mysqlShellBackupShouldDrain, "mysql_shell_should_drain", mysqlShellBackupShouldDrain, "decide if we should drain while taking a backup or continue to serving traffic")
Contributor

I'm guessing the choice of draining vs not draining is due to the increased workload on the server?

Contributor Author

yeah, exactly. In fact I have been meaning to propose making this configurable for the xtrabackup engine as well, so a tablet won't be serving traffic when it is taking a backup

fs.StringVar(&mysqlShellDumpFlags, "mysql_shell_dump_flags", mysqlShellDumpFlags, "flags to pass to mysql shell dump utility. This should be a JSON string and will be saved in the MANIFEST")
fs.StringVar(&mysqlShellLoadFlags, "mysql_shell_load_flags", mysqlShellLoadFlags, "flags to pass to mysql shell load utility. This should be a JSON string")
fs.BoolVar(&mysqlShellBackupShouldDrain, "mysql_shell_should_drain", mysqlShellBackupShouldDrain, "decide if we should drain while taking a backup or continue to serving traffic")
fs.BoolVar(&mysqlShellSpeedUpRestore, "mysql_shell_speedup_restore", mysqlShellSpeedUpRestore, "speed up restore by disabling redo logging and double write buffer during the restore process")
Contributor

This feels risky. Please indicate caveats in this flag's description. Otherwise this looks "too good", why wouldn't anyone want to speed up the restore?

Contributor Author

@rvrangel rvrangel Jul 22, 2024

unless you need the redo log/double write buffer to stay disabled once the instance has completed the restore, there shouldn't be much risk in enabling this. For some setups there might be an interest in keeping them disabled (I can see a possible case: somebody running on zfs and wanting to keep the double write buffer disabled), so I didn't want to force it if the user has a similar scenario

Contributor

OK, so for the duration of the restore this is fine, because who cares, the server is not serving, it's just being restored, right? But then, at the end of the process, as we mentioned elsewhere, these settings must return to "persistent" values, or else the backup should fail.

Contributor Author

exactly, most users will only want to run with these settings during the restore and go back to having the redo log/doublewrite buffer enabled before the tablet starts serving. This is the use case described in the MySQL docs:

As of MySQL 8.0.21, you can disable redo logging using the ALTER INSTANCE DISABLE INNODB REDO_LOG statement. This functionality is intended for loading data into a new MySQL instance. Disabling redo logging speeds up data loading by avoiding redo log writes and doublewrite buffering.
Warning

This feature is intended only for loading data into a new MySQL instance. Do not disable redo logging on a production system.

One thing that I realise now is that this will likely fail if the user is running <8.0.21. Is this something we should validate in case this flag is set, or do we already have a minimum MySQL version for Vitess? In any case, it would just cause the backup to fail and the user would have to stop passing this flag.

Contributor

One thing that I realise now is that this will likely fail if the user is running <8.0.21. Is this something we should validate in case this flag is set or do we already have a minimum version for MySQL with Vitess?

We should validate that based on the actual MySQL connection. This is an already recognized "capability" in

DisableRedoLogFlavorCapability // supported in MySQL 8.0.21 and above: https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-21.html

Here is an example usage:

capableOf := mysql.ServerVersionCapableOf(conn.ServerVersion)
capable, err := capableOf(capabilities.PerformanceSchemaDataLocksTableCapability)

We should fail the operation if the flags are set and the MySQL server version is "incapable of" the functionality. Ideally as close as possible to the flag parsing, but it's also fine if it happens as part of the restore process; do try to move that up as much as possible, and before doing actual substantial restore work (e.g. before destroying the existing data...)
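A sketch of that gate, reusing the capability helpers quoted above (the function name and exact call site are illustrative; the idea is to run it before any destructive restore work):

// validateSpeedUpRestoreCapability fails early when --mysql_shell_speedup_restore
// is set but the connected server cannot disable redo logging (MySQL < 8.0.21).
func validateSpeedUpRestoreCapability(serverVersion string) error {
    if !mysqlShellSpeedUpRestore {
        return nil
    }
    capableOf := mysql.ServerVersionCapableOf(serverVersion)
    capable, err := capableOf(capabilities.DisableRedoLogFlavorCapability)
    if err != nil {
        return err
    }
    if !capable {
        return fmt.Errorf("--mysql_shell_speedup_restore requires MySQL 8.0.21 or newer, got %s", serverVersion)
    }
    return nil
}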

Contributor Author

this has been added in 6e59215

args = append(args, strings.Fields(mysqlShellFlags)...)
}

args = append(args, "-e", fmt.Sprintf("util.dumpSchemas([\"vt_%s\"], %q, %s)",
Contributor

should the keyspace/schema names be escaped here?

Contributor Author

you mean escape like in MySQL as `vt_keyspace`? I am not sure MySQL Shell supports or expects it, I will verify that

Member

You can't assume that the database name is vt_keyspace. It is determined by the --init_db_name_override flag.

Member

This brings up another point. Typically when we do the regular kind of backup we also backup the sidecar db, which defaults to _vt. Does the mysqlshell backup specifically backup only the actual keyspace/database? What are the implications of this?

Contributor Author

oh, it is possible to back up multiple db/schemas, we just need to specify them. Is this the right place to get the name of the sidecar db?

const (
DefaultName = "_vt"
)

also, I am curious what the consequences are if _vt is not in place; we tried some of these restores without any issues, but maybe we are losing some metadata?
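If we do include it, the dump call could pass both schemas in one go (a sketch; dbName should come from the tablet's configured database rather than assuming vt_<keyspace>, and the variable names here are illustrative):

// Dump both the tablet's database and the sidecar database in a single run.
args = append(args, "-e", fmt.Sprintf("util.dumpSchemas([%q, %q], %q, %s)",
    dbName, sidecarDBName, backupLocation, mysqlShellDumpFlags))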

Contributor Author

@shlomi-noach would it make sense to do it in a separate PR though, so it is easier to revert if needed? I would hate for us to have to revert this PR once merged in case it causes any side effects

Contributor Author

DeleteBeforeRestore aside, would you all be comfortable with the idea of adding a flag that lets us run DELETE USER on all users except the current one, and then letting the user define loadUsers in the mysql shell restore flags on their own, so these are reloaded as appropriate if they rely on replication for propagating the grants?

Contributor

I would prefer to just get rid of DeleteBeforeRestore if not being used, rather than having the backup engines ignore the intended value.

I interpreted that in the sense that you prefer to remove the variable in this PR. It's fine if we did that on another PR.

Member

@frouioui frouioui Sep 3, 2024

would it make sense to do it in a separate PR though

One thing to take into account is that if we do it in a separate PR, after merging this current PR the new engine won't work on vitess-operator (the restore part only). I don't mind waiting for the second PR to be ready/merged in order to merge the vitess-operator one. But I think it would be nice to flag it somewhere, either in this PR description or elsewhere, that this engine is only available if not using vitess-operator, in the event we do two PRs.

Contributor Author

okay, let's do it in this PR then to avoid the admin work of having it separate and putting up notices; I will push a change later today

defer func() { // re-enable once we are done with the restore.
err := params.Mysqld.ExecuteSuperQueryList(ctx, []string{"ALTER INSTANCE ENABLE INNODB REDO_LOG"})
if err != nil {
params.Logger.Errorf("unable to re-enable REDO_LOG: %v", err)
Contributor

Should we fail the restore process? I'm not sure if I have a good answer here.

Contributor Author

that's a good question. The original intention here is: since we were able to successfully disable the redo log/double write buffer, it would be better to fail the restore than to put the instance in service with a potentially dangerous configuration without the user realising it.

if the user wishes to run with redo log/double write disabled buffer they can avoid setting --mysql_shell_speedup_restore and handle it outside of vitess.

Contributor

Agreed, we should fail the restore when these flags are provided and if the above errors.

Contributor

Just verifying that this is still to be addressed.

Contributor Author

is there something you still wanted me to address? Currently we will only attempt to disable/enable if the flags are passed; otherwise there is no change in behaviour, and if we can't run the ALTER INSTANCE we will fail the backup

Contributor

currently we will only attempt to disable/enable if the flags are passed, otherwise there is no change in behaviour and if we can't run the ALTER INSTANCE we will fail the backup

All right! Then there is nothing else to be addressed.

@rvrangel
Contributor Author

rvrangel commented Jul 22, 2024

@shlomi-noach yeah, we looked into mysqlpump, but it has been deprecated (the page you linked also has a notice) and it is still slower than MySQL Shell. But since it is likely going to be removed in a future MySQL version, we thought it would be better not to introduce a new feature using it :)

I think the proposal to read the GTID when backing up to a directory but not when using an object store is fair; let me look into it and make the necessary changes. If all looks good I will proceed with working on the CI/endtoend tests; I just wanted to get some initial feedback before doing so. Also curious what @deepthi thinks about this approach.

Edit: since mysqlshell does not come bundled in the mysql distribution, we'd need to further download/install mysqlshell in the GitHub workflow file.

Is that something that needs to happen as part of this PR or something separate?

@rvrangel
Contributor Author

okay, so I have made the updates to remove the other databases and users; a note on this:

  • we ignore the same databases excluded by MySQL Shell by default (see the sketch after this list)
  • for users, we ignore the reserved MySQL users defined in the docs
  • it seems we can't create the root user after dropping it, as vt_dba (at least in the endtoend environment) doesn't have the permissions needed to grant everything back (it was failing trying to GRANT PROXY in my tests), so I added it to the excluded list for now; let me know if this is a problem
  • we talked about getting rid of DeleteBeforeRestore; for now I will just ignore that flag (which is what the other engines are doing already) and it can be removed in a separate PR. I imagine doing this should unblock vitess-operator, so @frouioui I would appreciate it if you could run the tests with the latest to see if that works
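A sketch of the two exclusion lists mentioned above (the entries shown are the usual MySQL system schemas and reserved accounts, plus root per the note above; the exact lists in the PR may differ):

var (
    // schemas that MySQL Shell's dump utilities already skip by default
    excludedDBs = []string{"information_schema", "mysql", "performance_schema", "sys"}
    // reserved accounts (plus root) that we never drop or recreate during a restore
    excludedUsers = []string{"mysql.infoschema", "mysql.session", "mysql.sys", "root"}
)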

@frouioui
Member

  • we talked about getting rid of the DeleteBeforeRestore, for now I will just ignore that flag (which is what the other engines are doing already) and it can be removed on a separate PR. I imagine doing this should unblock vitess-operator, so @frouioui I would appreciate if you could run the tests with the latest to see if that works

@rvrangel, awesome! I will do that this week. Where is the best place to update you on the status, here or the vitess-operator PR?

@rvrangel
Contributor Author

I would say you can update on the other PR since the plan is for us to merge this once the other things have been addressed, but up to you really :)

@rvrangel
Contributor Author

@shlomi-noach I am checking with @frouioui if we can fix that issue with vitess-operator in this PR, but other than that do you have anything else left to be addressed here besides the replication test? Did you want that to be part of this PR too?

@shlomi-noach
Contributor

@rvrangel - awesome, thank you -- yes, the replication test. As mentioned, I can help with that, if you like.

@rvrangel
Contributor Author

yeah, that would be great, thanks :)

@shlomi-noach
Contributor

All right. Give me some time and I'll be pushing into this branch!

@shlomi-noach
Contributor

Merged and pushed #16807 to be part of this PR, validating rejoining the replication stream.

@shlomi-noach
Contributor

Ignore TestMoveTablesSharded unit test errors, which are a known issue. I had to merge main and this PR inherited those errors. To be fixed.

@shlomi-noach
Contributor

backup_pitr_mysqlshell is passing the newly introduced tests, I'm happy.

@shlomi-noach
Contributor

Cross referencing the discussion around root account: planetscale/vitess-operator#586 (comment), which we think is still important to solve here.

@rvrangel
Contributor Author

sounds good, thanks for adding that test! I think the only way to fix this issue so that we can create the root user again would be to give all necessary permissions to vt_dba when it gets created. where do we do this today?

@frouioui
Member

frouioui commented Sep 19, 2024

sounds good, thanks for adding that test! I think the only way to fix this issue so that we can create the root user again would be to give all necessary permissions to vt_dba when it gets created. where do we do this today?

I think init_db.sql would be the right place, but correct me if I am wrong @shlomi-noach. We would also need to change the embedding of init_db.sql in the examples (like in examples/operator/101_initial_cluster.yaml). The test files in the vitess-operator repo will have to be changed to include the new GRANT; this can be done in test/endtoend/operator of the vtop repo.

Labels
Component: Backup and Restore NeedsWebsiteDocsUpdate What it says release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) Type: Feature Request
Development

Successfully merging this pull request may close these issues.

Feature Request: MySQL Shell Logical Backups
5 participants