Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix discard job #74

Merged
merged 6 commits into from
Dec 6, 2024
Merged

Fix discard job #74

merged 6 commits into from
Dec 6, 2024

Conversation

ushitora-anqou
Copy link
Contributor

@ushitora-anqou ushitora-anqou commented Dec 4, 2024

This PR contains the following commits:

  • check sync-mode annot is set before starting to reconcile MB in secondary
    • This commit ensures that sync-mode annotation is correctly set before discard and import Jobs are created. Currently, mantle-controller does NOT wait for a discard Job to finish before an import Job is created, because an empty sync-mode passes this if condition. This behaviour isn't what we expect. This commit fixes this problem.
  • set ImagePullPolicy: PullIfNotPresent to discard Job
    • This commit sets ImagePullPolicy to discard Jobs. Without this setting, this field is set to Always, and it breaks our e2e tests, because there's no controller:latest image on the Internet.
  • check restoring PV finalizer before checking cluster ID in PersisntetVolumeReconciler
    • (This is a kaizen.) getCephClusterIDFromSCName works correctly only for storage class names provisioned by Rook/Ceph. However, the PVs requested to PersistentVolumeReconciler aren't necessarily provisioned by Rook/Ceph. This commit fixes this problem by checking that the PV has the correct finalizer before calling getCephClusterIDFromSCName.
  • use Patch to update .Status.CreatedAt in CreateOrUpdateMantleBackup rpc
    • (This is a kaizen.) In CreateOrUpdateMantleBackup rpc, we first need to create (or update) a MantleBackup and then update its status. This "update-after-create" process is likely to fail due to "the object has been modified" error, unless the cache for kubeapi refreshes quickly after the creation. This commit fixes this problem by using Patch instead of Update for the status.
  • test/e2e: lenghthen timeout to wait for SyncToRemote to be True
    • (This is a kaizen.) According to some tries, the current timeout 5m is too short to wait for SyncToRemote to be true in the e2e test, probably due to the exponential backoff of Requeue: true. This commit lengthens its value to 10m to resolve this problem in an ad hoc way. A fundamental resolution should come in another PR.
  • test: use Eventually to check if PV is deleted in testDeleteRestoringPV
    • This is a kaizen.

@ushitora-anqou ushitora-anqou force-pushed the fix-discard-job branch 7 times, most recently from 440f0dd to 36afb28 Compare December 6, 2024 00:48
…dary

This commit ensures that sync-mode annotation is correctly set before
discard and import Jobs are created. Currently, mantle-controller does
NOT wait for a discard Job to finish before an import Job is created,
because an empty sync-mode passes this if condition:

https://github.com/cybozu-go/mantle/blob/fc095a2c6c9944faf2889aad703ee39e41ccc872/internal/controller/mantlebackup_controller.go#L1532-L1534

This behaviour isn't what we expect. This commit fixes this problem.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
This commit sets ImagePullPolicy to discard Jobs. Without this setting,
this field is set to Always, and it breaks our e2e tests, because
there's no "controller:latest" image on the Internet.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
…VolumeReconciler

`getCephClusterIDFromSCName` works correctly only for storage class
names provisioned by Rook/Ceph. However, the PVs requested to
PersistentVolumeReconciler aren't necessarily provisioned by Rook/Ceph.

This commit resolves the problem above by checking that the PV has the
correct finalizer before calling `getCephClusterIDFromSCName`.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
In CreateOrUpdateMantleBackup rpc, we first need to create (or update) a
MantleBackup and then update its status. This "update-after-create"
process is likely to fail due to "the object has been modified" error,
unless the cache for kubeapi refreshes quickly after the creation. This
commit fixes this problem by using Patch instead of Update for the
status.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
According to some tries, the current timeout 5m is too short to wait for
SyncToRemote to be true in the e2e test, probably due to the exponential
backoff of `Requeue: true`. This commit lengthens its value to 10m to
resolve this problem in an ad hoc way. A fundamental resolution should
come in another PR.

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
@ushitora-anqou ushitora-anqou marked this pull request as ready for review December 6, 2024 04:08
@satoru-takeuchi satoru-takeuchi merged commit 70f7007 into main Dec 6, 2024
3 checks passed
@satoru-takeuchi satoru-takeuchi deleted the fix-discard-job branch December 6, 2024 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants