Releases: libopenstorage/stork
v24.3.2
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PB-4394 | KDMP restore failed when the snapshot size exceeded the PVC size. User Impact: Users experienced failures during KDMP restore for filesystem-based storage provisioning when the PVC content was larger than the PVC size. Resolution: Modified the PVC size to match the snapshot size. Affected Versions: 24.3.1 and earlier | Major |
PB-8316 | Backups were incorrectly marked as successful, not partial, even when some volume backups failed. User Impact: Users were led to assume that all PVCs were successfully backed up, even when some had failed. Resolution: Updated the in-memory value of `failedVolCount` and the backup object to accurately reflect the number of failed backups. Affected Versions: 24.3.0 and 24.3.1 | Major |
PB-8360 | Addition of an IBM COS backup location failed with an `UnsupportedOperation` error when the bucket was unlocked. User Impact: Users could not add an IBM COS backup location if it was unlocked. Resolution: The `UnsupportedOperation` error is now ignored for unlocked IBM COS buckets, since it only indicates that the bucket is not locked. Affected Versions: 24.3.1 | Major |
PB-7726 | VM backup failed while executing auto exec rules if the virt-launcher pod of the VM was not in a running state. User Impact: VM backups failed when auto exec rules were applied. Resolution: Auto exec rules are now executed only on running virt-launcher pods. Affected Versions: 24.3.0 and 24.3.1 | Major |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38905 | If a StorageClass is deleted on the source and an asynchronous DR operation is performed for resources (PV/PVC) that use the deleted StorageClass, the migration fails with the following error: `Error updating StorageClass on PV: StorageClass.storage.k8s.io <storageClassName> not found`. Workaround: You need to recreate the deleted StorageClass to proceed with the DR operation (see the sketch after this table). Affected versions: 24.3.0, 24.3.1, and 24.3.2. | Major |
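The workaround amounts to re-creating, on the source cluster, a StorageClass with the same name (and matching provisioner and parameters) as the one that was deleted, and then retrying the migration. A minimal sketch, assuming a hypothetical StorageClass name `px-db-sc` and the Portworx CSI provisioner; the actual provisioner and parameters must match the original StorageClass:

```sh
# Hypothetical StorageClass; name, provisioner, and parameters are assumptions
# and must match the StorageClass originally referenced by the PV/PVC.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-db-sc
provisioner: pxd.portworx.com
parameters:
  repl: "2"
EOF
```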
v24.3.1
Improvements
Improvement Number | Improvement Description |
---|---|
PWX-39128 | If the AWS driver init gets stuck for a long time and prevents Stork from starting up, you can skip the AWS driver init in Stork by adding the environment variable `SKIP_AWS_DRIVER_INIT="true"` to the Stork pod (see the example after this table). |
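A minimal sketch of adding the variable, assuming a Stork Deployment named `stork` in the `kube-system` namespace (both are assumptions; adjust them for your install, and note that on operator-managed installs the operator may revert direct edits, in which case the variable should be set through the operator's configuration instead):

```sh
# Assumed deployment name and namespace; adjust for your installation.
# Skips the AWS driver init so Stork can start even when the AWS init hangs.
kubectl -n kube-system set env deployment/stork SKIP_AWS_DRIVER_INIT="true"
```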
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38383 | In certain scenarios, Kubernetes etcd was overloaded, and Stork pods went into the CrashLoopBackOff state with the following error: `Controller manager: failed to wait for snapshot-schedule-controller caches to sync: timed out waiting for cache to be synced`. User Impact: Stork failed and restarted multiple times due to the overloading of Kubernetes etcd. Resolution: We've added a `--controller-cache-sync-timeout` flag, which lets you tweak the cache sync timeout based on your requirements. The default value is 2 minutes. Example: `--controller-cache-sync-timeout=10` sets the controller cache sync timeout to 10 minutes instead of the default 2 minutes (see the sketch after this table). Affected Versions: 24.3.0 and earlier | Minor |
PWX-36167 | The Stork health monitor was incorrectly considering stale node entries with an offline status for pod eviction. User Impact: If a node was repaired and returned with a different IP address, pods were inadvertently evicted from this online node due to the presence of stale node entries. Resolution: If a node entry with an online storage status shares the same scheduler ID as an offline node entry, the system now disregards the offline node entry when considering pod evictions. This change ensures that pods are not inadvertently evicted from nodes that have been repaired and are now online. Affected Versions: 24.3.0 and earlier | Minor |
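A minimal sketch of setting the flag, assuming a Stork Deployment named `stork` in the `kube-system` namespace (both are assumptions; on operator-managed installs, set the flag through the operator's configuration instead, as direct edits may be reverted):

```sh
# Assumed deployment name and namespace; adjust for your installation.
# Add the flag to the Stork container's command/args list, for example:
#   - --controller-cache-sync-timeout=10   # 10 minutes instead of the default 2
kubectl -n kube-system edit deployment stork
```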
v24.3.0
Enhancements
- Stork now supports partial backups. If backup fails for any of the PVCs, the successful backups of other PVCs are still saved, and the status is displayed as `partial success`. #1716
  Note: A partial backup requires at least one successful PVC backup.
- Updated `golang`, `aws-iam-authenticator`, `google-cloud-cli`, and `google-cloud-sdk` versions to resolve security vulnerabilities. #1804 #1807
Bug fix
- Issue: In a Synchronous DR setup, when you perform a failover operation using the `storkctl perform failover` command, the witness node might be deactivated instead of the source cluster.
  User Impact: After failover, the source cluster might remain in the active state, and the PX volumes can still be mounted and used from the source cluster.
  Resolution: After failover, the source cluster is now deactivated by default, and the witness node remains unaffected. #1829
24.2.5
Bug fix
- Issue: Strong hyperconvergence for pods is not working when using the `stork.libopenstorage.org/preferLocalNodeOnly` annotation.
  User Impact: Pods remain in a pending state.
  Resolution: When the `stork.libopenstorage.org/preferLocalNodeOnly` annotation is used, the pods are now scheduled on the node where the volume replica lies, and strong hyperconvergence works as expected (see the example below). #1818
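A minimal sketch of applying the annotation, assuming a hypothetical Deployment named `postgres` in the `demo` namespace; the annotation goes on the pod template so that Stork schedules the pods only on nodes that hold a replica of their volumes:

```sh
# Hypothetical Deployment name and namespace; substitute your own workload.
kubectl -n demo patch deployment postgres --type=merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"stork.libopenstorage.org/preferLocalNodeOnly":"true"}}}}}'
```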
24.2.4
Bug Fix
- Issue: During an OCP upgrade in a 3-node cluster, the MutatingWebhookConfiguration `stork-webhooks-cfg` is deleted if the leader Stork pod is evicted.
  User Impact: Applications that require Stork as the scheduler will experience disruptions, and OCP upgrades will get stuck on a 3-node cluster.
  Resolution: The MutatingWebhookConfiguration is now created after the leader election, ensuring `stork-webhooks-cfg` is always available. #1810
  Affected Versions: All
24.2.3
Note: For users currently on Stork versions 24.2.0, 24.2.1, or 24.2.2, Portworx by Pure Storage recommends upgrading to Stork 24.2.3.
Bug Fix
- Issue: If the VolumeSnapshotSchedule has more status entries than the retain policy limit, Stork may continue creating new VolumeSnapshots, ignoring the retain policy. This can happen if the retain limit was lowered or if there was an error during snapshot creation.
User Impact: Users saw more VolumeSnapshots than their retain policy was configured to allow.
Resolution: Upgrade to Stork version 24.2.3. #1800
Note: This fix doesn’t clean up the snapshots that were created before the upgrade. If required, you need to delete the old snapshots manually.
Affected Versions: 24.2.0, 24.2.1, and 24.2.2.
24.2.2
24.2.1
Enhancement
- Stork now supports the Azure China environment for Azure backup locations. For more information, see Add Azure backup location.
Bug Fixes
- Issue: If you were running Portworx Backup version 2.6.0 and upgraded the Stork version to 24.1.0, selecting the default VSC in the Create Backup window resulted in a `VSC Not Found` error.
  User Impact: Users experienced failures during backup operations.
  Resolution: You can now choose the default VSC in the Create Backup window and create successful backups. #1744
- Issue: If you deployed Portworx Enterprise with PX-Security enabled, took a backup to an NFS backup location, and then restored it, the restore used to fail.
  User Impact: Unable to restore backups from the NFS backup location for PX-Security-enabled Portworx volumes.
  Resolution: This issue is now fixed. #1733
24.2.0
Enhancements
- Enhanced Disaster Recovery User Experience
  In this latest Stork release, the user experience has been improved significantly, with a particular focus on performing failover and failback operations. These enhancements provide a smoother and more intuitive user experience by simplifying the process while ensuring efficiency and reliability.
  Now, you can perform a failover or failback operation using the following `storkctl` commands:
  - To perform a failover operation, use the following command:
    `storkctl perform failover -m <migration-schedule> -n <migration-schedule-namespace>`
  - To perform a failback operation, use the following command:
    `storkctl perform failback -m <migration-schedule> -n <migration-schedule-namespace>`
  For more information on the enhanced approach, refer to the below documentation. A usage sketch with hypothetical names follows this list.
- The Portworx driver is updated to optimize the API calls it makes, reducing the time taken to schedule pods and to monitor pods that need to be rescheduled when Portworx is down on nodes.
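A brief usage sketch of the `storkctl perform` commands introduced above, using a hypothetical MigrationSchedule name and namespace:

```sh
# Hypothetical MigrationSchedule name and namespace; substitute your own.
# Fail over the applications covered by the schedule to the destination cluster.
storkctl perform failover -m zone-1-2-migration -n kube-system

# Later, once the source cluster is healthy again, fail back to it.
storkctl perform failback -m zone-1-2-migration -n kube-system
```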
Bug Fixes
- Issue: Migration schedules in the `admin` namespace were updated with true or false for the `applicationActivated` field when activating or deactivating a namespace, even if they did not migrate the particular namespace.
  User Impact: Unrelated migration schedules were getting suspended.
  Resolution: Stork now updates the `applicationActivated` field only for migration schedules that are migrating at least one of the namespaces being activated or deactivated. #1718
- Issue: Updating the VolumeSnapshotSchedule resulted in a version mismatch error from Kubernetes when the update happened on a previous version of the resource.
  User Impact: When the number of VolumeSnapshotSchedules is high, Stork logs are flooded with these warning messages.
  Resolution: Fixed the VolumeSnapshotSchedule update with a patch to avoid the version mismatch error. #1665
- Issue: Similar volume snapshot names were created when the VolumeSnapshotSchedule frequencies matched and trimming produced similar substrings.
  User Impact: For one volume, a snapshot may not be taken but can be marked as successful.
  Resolution: A 4-digit random suffix is now added to the name to avoid name collisions between VolumeSnapshots resulting from different VolumeSnapshotSchedules. #1686
- Issue: Stork relies on Kubernetes DNS to locate services, but it also assumed the `.svc.cluster.local` domain for Kubernetes services.
  User Impact: Clusters with a modified Kubernetes DNS domain were not able to use Stork.
  Resolution: Stork now works on clusters with a modified Kubernetes DNS domain. #1629
- Issue: Resource transformation for CRs was not supported.
  User Impact: This was blocking some of the necessary transformations for resources that were required at the destination site.
  Resolution: Resource transformation for CRs is now supported. #1705
Known Issues
- Issue: If you use the `storkctl perform failover` command to perform a failover operation, Stork might not be able to scale down the KubeVirt pods, which could cause the operation to fail.
  Workaround: Perform the failover operation by following the procedure on the below pages:
24.1.0
Enhancements
- Stork now supports KubeVirt VMs for Portworx backup and restore operations. You can now initiate VM-specific backups by setting the `backupObjectType` to `VirtualMachine`. Stork automatically includes associated resources, such as PVCs, secrets, and ConfigMaps used as volumes and user data in VM backups. Also, Stork applies default freeze/thaw rules during VM backup operations to ensure consistent filesystem backups.
- Cloud Native backups will now automatically default to CSI or KDMP with LocalSnapshot, depending on the type of schedules they create.
- Previously in Stork, for CSI backups, you were limited to selecting a single VSC from the dropdown under the `CSISnapshotClassName` field. Now you can select a VSC for each provisioner via the `CSISnapshotClassMap`.
- Now, the creation of a default VSC from Stork is optional.
Bug Fixes
- Issue: Canceling an ongoing backup initiated by PX-Backup results in the halting of the post-execution rule.
  User Impact: This interruption causes the I/O processes on the application to stop or the post-execution rule execution to cease.
  Resolution: To address this, Stork executes and removes the post-execution rule CR as part of the cleanup procedure for the application backup CR. #1602
- Issue: Generic KDMP backup/restore pods become unresponsive in environments where Istio is enabled.
  User Impact: Generic KDMP backup and restore fails in Istio-enabled environments.
  Resolution: Relaxed the Istio webhook checks for the Stork-created KDMP generic backup/restore pods. Additionally, the underlying issue causing job pod freezes has been resolved in Kubernetes version 1.28 and Istio version 1.19. #1623