Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(restore): adds exclusive cluster lock for restore task #4122

Merged
merged 2 commits into from
Nov 26, 2024

Conversation

VAveryanov8
Copy link
Collaborator

@VAveryanov8 VAveryanov8 commented Nov 25, 2024

This makes the scheduler run the restore task only if no other tasks (backup or repair) are running in the cluster. Conversely, other tasks (backup or repair) won't start if the restore task is running in the cluster at that time.

Fixes: #4045.


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

This makes the scheduler run the restore task only if no other tasks (backup
or repair) are running in the cluster. Conversely, other tasks (backup or
repair) won't start if the restore task is running in the cluster at that
time.

Refs: #4045.
@VAveryanov8
Copy link
Collaborator Author

VAveryanov8 commented Nov 25, 2024

This is how output of a ./sctool task looks like, when either backup/repair task can't start due to exclusive lock:

╭──────────────────────────────────────────────┬────────┬──────────────┬────────┬──────────────────┬─────────┬───────┬────────────────────────┬────────────────────────┬─────────────┬────────────────────────╮
│ Task                                         │ Labels │ Schedule     │ Window │ Timezone         │ Success │ Error │ Last Success           │ Last Error             │ Status      │ Next                   │
├──────────────────────────────────────────────┼────────┼──────────────┼────────┼──────────────────┼─────────┼───────┼────────────────────────┼────────────────────────┼─────────────┼────────────────────────┤
│ backup/06cd5d90-6daa-4215-acd3-d19f6782b5b6  │        │              │        │ Europe/Warsaw    │ 0       │ 1     │                        │ 25 Nov 24 09:13:52 CET │ ERROR (0/3) │ 25 Nov 24 09:23:52 CET │
│ restore/f514523c-5015-486f-b25f-22e59c5b5962 │        │              │        │ Europe/Warsaw    │ 0       │ 0     │                        │                        │ RUNNING     │                        │
╰──────────────────────────────────────────────┴────────┴──────────────┴────────┴──────────────────┴─────────┴───────┴────────────────────────┴────────────────────────┴─────────────┴────────────────────────╯


and this is how output of a ./sctool progress ... looks like:

$  ./sctool.dev progress -c my-cluster backup/06cd5d90-6daa-4215-acd3-d19f6782b5b6
Run:            351378ee-ab05-11ef-aa2d-0242c0a8c802
Status:         ERROR (initialising)
Cause:          exclusive task (restore) is running: another task is running
Start time:     25 Nov 24 09:13:52 CET
End time:       25 Nov 24 09:13:52 CET
Duration:       0s
Progress:       -

@VAveryanov8 VAveryanov8 marked this pull request as ready for review November 25, 2024 09:51
Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice tests!
Just left a small refactor oriented comments.

pkg/service/scheduler/policy.go Outdated Show resolved Hide resolved
pkg/service/scheduler/policy.go Show resolved Hide resolved
pkg/service/scheduler/policy_test.go Outdated Show resolved Hide resolved
pkg/service/scheduler/policy.go Outdated Show resolved Hide resolved
pkg/service/scheduler/policy_test.go Outdated Show resolved Hide resolved
@Michal-Leszczynski
Copy link
Collaborator

This is how output of a ./sctool task looks like, when either backup/repair task can't start due to exclusive lock:

The strange thing is that we report it as an error, but that's not connected to this PR at all.
On the other hand, it encourages user to look at the sctool progress output, so that's a good thing.
Anyway, the sctool progress error message looks good!

@Michal-Leszczynski
Copy link
Collaborator

Refs: 4045.

Why not Fixes?

This moves part of the logic to separate method to improve readability
and fixes how PostRun cleans up the resources.
@VAveryanov8
Copy link
Collaborator Author

@Michal-Leszczynski I've updated the pr, I'd appreciate if you could take a look one more time.

Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!

@VAveryanov8 VAveryanov8 merged commit b97d1cf into master Nov 26, 2024
51 checks passed
@VAveryanov8 VAveryanov8 deleted the va/restore-exclusive branch November 26, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to pause backups and repairs during restore
2 participants