Fix discard job #74

ushitora-anqou · 2024-12-04T06:33:38Z

This PR contains the following commits:

check sync-mode annot is set before starting to reconcile MB in secondary
- This commit ensures that the sync-mode annotation is set before discard and import Jobs are created. Currently, mantle-controller does NOT wait for a discard Job to finish before an import Job is created, because an empty sync-mode passes the if condition at internal/controller/mantlebackup_controller.go#L1532-L1534. This behaviour isn't what we expect, and this commit fixes it.
set ImagePullPolicy: PullIfNotPresent to discard Job
- This commit sets ImagePullPolicy to PullIfNotPresent on discard Jobs. Without this setting, the field defaults to Always, which breaks our e2e tests because there's no controller:latest image on the Internet.
check restoring PV finalizer before checking cluster ID in PersistentVolumeReconciler
- (This is a kaizen.) getCephClusterIDFromSCName works correctly only for storage class names provisioned by Rook/Ceph. However, the PVs passed to PersistentVolumeReconciler aren't necessarily provisioned by Rook/Ceph. This commit fixes this problem by checking that the PV has the correct finalizer before calling getCephClusterIDFromSCName.
use Patch to update .Status.CreatedAt in CreateOrUpdateMantleBackup rpc
- (This is a kaizen.) In CreateOrUpdateMantleBackup rpc, we first need to create (or update) a MantleBackup and then update its status. This "update-after-create" process is likely to fail due to "the object has been modified" error, unless the cache for kubeapi refreshes quickly after the creation. This commit fixes this problem by using Patch instead of Update for the status.
test/e2e: lengthen timeout to wait for SyncToRemote to be True
- (This is a kaizen.) Based on several trial runs, the current timeout of 5m is too short to wait for SyncToRemote to become true in the e2e test, probably due to the exponential backoff of Requeue: true. This commit lengthens the timeout to 10m as an ad hoc workaround. A fundamental resolution should come in another PR.
test: use Eventually to check if PV is deleted in testDeleteRestoringPV
- This is a kaizen.

…dary This commit ensures that sync-mode annotation is correctly set before discard and import Jobs are created. Currently, mantle-controller does NOT wait for a discard Job to finish before an import Job is created, because an empty sync-mode passes this if condition: https://github.com/cybozu-go/mantle/blob/fc095a2c6c9944faf2889aad703ee39e41ccc872/internal/controller/mantlebackup_controller.go#L1532-L1534 This behaviour isn't what we expect. This commit fixes this problem. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
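The guard described in this commit can be sketched roughly as follows. This is only an illustration, not the code in the PR: the annotation key `example.com/sync-mode` and the values `full` and `incremental` are hypothetical placeholders. The point is that a missing or empty value must be rejected explicitly, so the caller can requeue instead of falling through to Job creation.

```go
package main

import "fmt"

// Hypothetical annotation key and values, used only for this illustration.
const (
	annotSyncMode       = "example.com/sync-mode"
	syncModeFull        = "full"
	syncModeIncremental = "incremental"
)

// syncModeIsSet reports whether the annotations carry a recognised sync mode.
// A missing or empty value returns false, so it can never slip past a check
// that only compares against specific modes.
func syncModeIsSet(annotations map[string]string) bool {
	switch annotations[annotSyncMode] {
	case syncModeFull, syncModeIncremental:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(syncModeIsSet(nil))
	fmt.Println(syncModeIsSet(map[string]string{annotSyncMode: ""}))
	fmt.Println(syncModeIsSet(map[string]string{annotSyncMode: syncModeFull}))
}
```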

This commit sets ImagePullPolicy to discard Jobs. Without this setting, this field is set to Always, and it breaks our e2e tests, because there's no "controller:latest" image on the Internet. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
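For context, the defaulting rule that causes the problem can be sketched like this. When imagePullPolicy is omitted, Kubernetes picks Always for an image with the `latest` tag or no tag, and IfNotPresent for other tags or a digest. The helper name `defaultPullPolicy` is made up for this sketch, and the image-reference parsing is deliberately simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy mimics, in simplified form, how the pull policy is
// defaulted when a container spec leaves imagePullPolicy empty.
func defaultPullPolicy(image string) string {
	// An image pinned by digest defaults to IfNotPresent.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	lastSlash := strings.LastIndex(image, "/")
	lastColon := strings.LastIndex(image, ":")
	// A colon before the last slash belongs to a registry port, not a tag.
	if lastColon <= lastSlash {
		return "Always" // no tag at all
	}
	if image[lastColon+1:] == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{"controller:latest", "controller", "controller:v1.2.3"} {
		fmt.Printf("%s -> %s\n", img, defaultPullPolicy(img))
	}
}
```

Setting the policy explicitly on the Job's container avoids relying on this default, which is what the commit does.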

…VolumeReconciler `getCephClusterIDFromSCName` works correctly only for storage class names provisioned by Rook/Ceph. However, the PVs requested to PersistentVolumeReconciler aren't necessarily provisioned by Rook/Ceph. This commit resolves the problem above by checking that the PV has the correct finalizer before calling `getCephClusterIDFromSCName`. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
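The ordering change can be illustrated with a small stand-alone sketch. Everything here is a simplified stand-in: the `persistentVolume` struct, the finalizer name `example.com/restoring-pv`, the storage class name, and `lookupClusterID` (playing the role of `getCephClusterIDFromSCName`) are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical finalizer name, used only for this illustration.
const restoringPVFinalizer = "example.com/restoring-pv"

// persistentVolume is a minimal stand-in for the real PV object.
type persistentVolume struct {
	Finalizers       []string
	StorageClassName string
}

func hasFinalizer(pv persistentVolume, name string) bool {
	for _, f := range pv.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// lookupClusterID stands in for getCephClusterIDFromSCName: it fails for
// storage classes that are not backed by Rook/Ceph.
func lookupClusterID(scName string) (string, error) {
	if scName != "rook-ceph-block" {
		return "", errors.New("not a Rook/Ceph storage class: " + scName)
	}
	return "cluster-a", nil
}

// reconcile checks the finalizer first, so unrelated PVs are skipped
// without ever reaching the cluster ID lookup.
func reconcile(pv persistentVolume) (string, error) {
	if !hasFinalizer(pv, restoringPVFinalizer) {
		return "", nil // not ours: nothing to do, and no error
	}
	return lookupClusterID(pv.StorageClassName)
}

func main() {
	id, err := reconcile(persistentVolume{StorageClassName: "standard"})
	fmt.Printf("unrelated PV: id=%q err=%v\n", id, err)

	id, err = reconcile(persistentVolume{
		Finalizers:       []string{restoringPVFinalizer},
		StorageClassName: "rook-ceph-block",
	})
	fmt.Printf("restoring PV: id=%q err=%v\n", id, err)
}
```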

In CreateOrUpdateMantleBackup rpc, we first need to create (or update) a MantleBackup and then update its status. This "update-after-create" process is likely to fail due to "the object has been modified" error, unless the cache for kubeapi refreshes quickly after the creation. This commit fixes this problem by using Patch instead of Update for the status. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
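Why Patch helps here can be shown with a toy in-memory model of optimistic concurrency. This is not the Kubernetes client API: the `store` type and its methods are invented for this sketch. It assumes that Update carries the caller's resourceVersion and is rejected when it is stale, while a merge patch that omits resourceVersion is applied to whatever the latest object is.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified")

// object is a minimal stand-in for a resource with a status field.
type object struct {
	ResourceVersion int
	CreatedAt       string
}

// store is a toy API server holding a single object.
type store struct{ obj object }

// update replaces the whole object and fails if the caller's copy is stale.
func (s *store) update(o object) error {
	if o.ResourceVersion != s.obj.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// patchCreatedAt changes a single field on the latest object, without
// comparing resource versions.
func (s *store) patchCreatedAt(v string) {
	s.obj.CreatedAt = v
	s.obj.ResourceVersion++
}

func main() {
	s := &store{obj: object{ResourceVersion: 2}}

	// A copy read from a cache that has not caught up with the creation yet.
	stale := object{ResourceVersion: 1, CreatedAt: "2024-12-04T06:33:38Z"}
	fmt.Println("update:", s.update(stale))

	s.patchCreatedAt("2024-12-04T06:33:38Z")
	fmt.Println("after patch:", s.obj.CreatedAt)
}
```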

Based on several trial runs, the current timeout of 5m is too short to wait for SyncToRemote to become true in the e2e test, probably due to the exponential backoff of `Requeue: true`. This commit lengthens the timeout to 10m as an ad hoc workaround. A fundamental resolution should come in another PR. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>

ushitora-anqou force-pushed the fix-discard-job branch 7 times, most recently from 440f0dd to 36afb28 on December 6, 2024 00:48

ushitora-anqou added 5 commits December 6, 2024 01:14

ushitora-anqou force-pushed the fix-discard-job branch from 32d19ac to 38fbc44 on December 6, 2024 01:16

test: use Eventually to check if PV is deleted in testDeleteRestoringPV

1e017d2

Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>

ushitora-anqou requested a review from satoru-takeuchi December 6, 2024 04:08

ushitora-anqou marked this pull request as ready for review December 6, 2024 04:08

satoru-takeuchi approved these changes Dec 6, 2024

satoru-takeuchi merged commit 70f7007 into main Dec 6, 2024
3 checks passed

satoru-takeuchi deleted the fix-discard-job branch December 6, 2024 06:55
