
Releases: libopenstorage/stork

23.11.0

22 Jan 04:23

New Features

  • You can now create and delete schedule policies and migration schedules using new storkctl CLI commands, which let you manage SchedulePolicy and MigrationSchedule resources directly, enhancing the DR setup process.
    In addition to the existing support for clusterPairs, you can now manage all necessary DR resources through storkctl. With built-in validations and no need for manual YAML file edits, setup is faster, less error-prone, and more user-friendly for DR resource management in Kubernetes clusters.
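    For example, a schedule policy and a migration schedule can be created and removed entirely from the CLI. The flag names below are illustrative assumptions; run `storkctl create schedulepolicy --help` and `storkctl create migrationschedule --help` for the exact options in your version.

```shell
# Create an interval-based schedule policy (flag names assumed).
storkctl create schedulepolicy my-policy \
    --policy-type Interval \
    --interval-minutes 30

# Create a migration schedule that uses the policy and an existing cluster pair.
storkctl create migrationschedule my-schedule \
    --cluster-pair my-cluster-pair \
    --namespaces app-ns1,app-ns2 \
    --schedule-policy-name my-policy

# Delete them when no longer needed.
storkctl delete migrationschedule my-schedule
storkctl delete schedulepolicy my-policy
```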

  • The new Storage Class parameter preferRemoteNode enhances scheduling flexibility for SharedV4 Service Volumes. By setting this parameter to false, you can now disable anti-hyperconvergence during scheduling, giving you more flexibility to tailor Stork's scheduling behavior to your application's needs.
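    A sketch of a StorageClass that opts out of anti-hyperconvergence for SharedV4 service volumes; all parameters other than preferRemoteNode are illustrative, and the provisioner name assumes the Portworx CSI driver.

```shell
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sharedv4-local-sc
provisioner: pxd.portworx.com        # Portworx CSI provisioner (assumed)
parameters:
  repl: "2"                          # illustrative
  sharedv4: "true"
  sharedv4_svc_type: "ClusterIP"     # makes this a SharedV4 service volume
  preferRemoteNode: "false"          # disable anti-hyperconvergence during scheduling
EOF
```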

Enhancement

  • Updated golang and google-cloud-sdk versions to resolve security vulnerabilities. #1587, #1588

Bug Fixes

  • Issue: Excluding Kubernetes resources such as deployments, statefulsets, and so on was not working during migration.
    User Impact: Using labels to exclude resources proved ineffective when the resource was managed by an operator that reset user-defined labels.
    Resolution: The new excludeResourceTypes feature allows users to exclude entire resource types from migration, a more reliable approach than using labels. #1554
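    A sketch of how the new field might appear in a Migration spec; the field placement is an assumption, and the clusterPair and namespace names are placeholders.

```shell
kubectl apply -f - <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: Migration
metadata:
  name: app-migration
  namespace: app-ns
spec:
  clusterPair: my-cluster-pair
  namespaces:
    - app-ns
  includeResources: true
  excludeResourceTypes:        # skip these resource kinds during migration
    - Deployment
    - StatefulSet
EOF
```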

  • Issue: An applicationrestore created using storkctl always restored to a namespace with the same name as the source.
    User Impact: Users were unable to restore applications to a namespace other than the one matching the source name.
    Resolution: storkctl now accepts a namespace mapping as a parameter, allowing users to restore to a different namespace as needed. #1545
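    For example, a restore into a different namespace might look like the following; the flag names and mapping syntax shown are assumptions, so consult `storkctl create applicationrestore --help` for the exact form.

```shell
# Restore the backup taken from namespace "src-ns" into namespace "dest-ns"
# (flag names assumed for illustration).
storkctl create applicationrestore my-restore \
    --backuplocation my-backup-location \
    --backupname my-backup \
    --namespace-mapping src-ns:dest-ns
```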

  • Issue: The storkctl create clusterpair command was not functioning properly with HTTPS PX endpoints.
    User Impact: Migrations between clusters with SSL-enabled PX endpoints were not successful.
    Resolution: The issue has been addressed, and now both HTTPS and HTTP endpoints are accepted as source (src-ep) and destination (dest-ep) when using storkctl create clusterpair. #1537
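    For example, either scheme now works as an endpoint; the kubeconfig flag names below are assumptions.

```shell
# Pair clusters whose Portworx endpoints are served over HTTPS.
storkctl create clusterpair migration-pair \
    --namespace kube-system \
    --src-kube-file src-cluster.kubeconfig \
    --dest-kube-file dest-cluster.kubeconfig \
    --src-ep https://px-src.example.com:9021 \
    --dest-ep https://px-dest.example.com:9021
```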

  • Issue: The PostgreSQL operator generates an error related to the pre-existence of service account, role, and role bindings following a migration.
    User Impact: Users are unable to scale up a PostgreSQL application installed via OpenShift Operator Hub after completing the migration.
    Resolution: Excluded migration of service account, role, and role bindings if they have owner reference set to allow PostgreSQL pods to come up successfully. #1560

  • Issue: A ResourceTransformation Custom Resource (RT CR) enters a failed state when a transform rule includes either int or bool as a data type.
    User Impact: Migration involving resource transformation will not succeed.
    Resolution: Resolved the issue by addressing the parsing problem associated with int and bool types. #1532

  • Issue: Continuous crashes occur in Stork pods when the cluster contains a RT CR with a rule type set as slice and the operation is add.
    User Impact: Stork service experiences ongoing disruptions.
    Resolution: Implemented a solution by using type assertion to prevent the panic. Additionally, the problematic SetNestedField method is replaced with SetNestedStringSlice to avoid panics in such scenarios. You can also temporarily resolve the problem by removing the RT CR from the application cluster. #1530

  • Issue: Stork crashes when attempting to clone an application with CSI volumes using Portworx.
    User Impact: Users are unable to clone applications if PVCs in the namespaces utilize Portworx CSI volumes.
    Resolution: Now, a patch is included to manage CSI volumes with Portworx, which ensures the stability of application cloning functionality. #1591

  • Issue: When setting up a migration schedule in the admin namespace with pre/post-execution rules, these rules must be established in both the admin namespace and every namespace undergoing migration.
    User Impact: The user experience is less intuitive as it requires creating identical rules across multiple namespaces.
    Resolution: The process is now simplified as rules only require addition within the migration schedule's namespace. #1569
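    A sketch of the simplified setup: the Rule objects and the MigrationSchedule now only need to exist in the same admin namespace. Names are placeholders, and preExecRule/postExecRule placement follows the Migration template spec.

```shell
kubectl apply -f - <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: nightly-migration
  namespace: kube-system           # admin namespace
spec:
  schedulePolicyName: nightly
  template:
    spec:
      clusterPair: my-cluster-pair
      namespaces:
        - app-ns1
        - app-ns2
      preExecRule: flush-rule      # Rule created only in kube-system
      postExecRule: resume-rule    # no copies needed in app-ns1/app-ns2
EOF
```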

  • Issue: Stork was not honoring locator volume labels correctly when scheduling pods.
    User Impact: In cases where preferRemoteNodeOnly was initially set to true, pods sometimes failed to schedule. This issue was particularly noticeable when the Portworx volume setting preferRemoteNodeOnly was later changed to false, and there were no remote nodes available for scheduling.
    Resolution: Now, even in scenarios where remote nodes are not available for scheduling, pods can be successfully scheduled on a node that holds a replica of the volume. #1606

Known Issues

  • Issue: In Portworx version 3.0.4, several migration tests fail when run in an auth-enabled environment.
    User Impact: You may experience failed migrations, which will impact data transfer and management processes.
    Resolution: The issue has been resolved in Portworx version 3.1.0. Users experiencing this problem are advised to upgrade to version 3.1.0 to ensure smooth migration operations and avoid permission-related errors during data migration processes.

  • Issue: When using the storkctl create clusterpair command, the HTTPS endpoints for Portworx were not functioning properly.
    User Impact: This issue affects migrations between clusters whose Portworx endpoints are secured with SSL. As a result, migrations could not be carried out successfully in environments using secure HTTPS connections.
    Resolution: In the upcoming Portworx 3.1.0 release, the storkctl create clusterpair command will be updated to accept both HTTP and HTTPS endpoints, allowing the specification of either src-ep or dest-ep with the appropriate scheme. This update ensures successful cluster pairing and migration in environments with SSL-secured px endpoints.

23.9.1

15 Dec 14:21

Bug Fixes

  • Issue: The generic backup of some PVCs in kdmp was failing due to the inclusion of certain read-only directories and files.
    User Impact: Difficulties restoring the snapshot as the restoration of these read-only directories and files resulted in permission denied errors.
    Resolution: Introduced the --ignore-file option in kdmp backup, enabling you to specify a list of files and directories to be excluded during snapshot creation. This ensures that during restoration, these excluded files and directories will not be restored. #1572

    Format for adding the ignore file list:

    KDMP_EXCLUDE_FILE_LIST: |
        <storageClassName1>=<dir-list>,<file-list1>,....
        <storageClassName2>=<dir-list>,<file-list1>,....

    Sample for adding the ignore file list:

    KDMP_EXCLUDE_FILE_LIST: |
        px-db=dir1,file1,dir2
        mysql=dir1,file1,dir2

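    The list is supplied through a ConfigMap; a complete sketch follows. The ConfigMap name and namespace shown (kdmp-config in kube-system) are assumptions; adjust them to wherever your KDMP configuration lives.

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: kdmp-config          # assumed name
  namespace: kube-system     # assumed namespace
data:
  KDMP_EXCLUDE_FILE_LIST: |
    px-db=dir1,file1,dir2
    mysql=dir1,file1,dir2
EOF
```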
  • Issue: The backup process does not terminate when an invalid post-execution rule is applied, leading to occasional failures in updating the failure status to the application backup CR.
    User Impact: Backups with invalid post-execution rules were not failing as expected.
    Resolution: Implemented a thorough check to ensure that backups with invalid post-execution rules are appropriately marked as failed, accompanied by a clear error message. #1582

23.9.0

07 Dec 08:07

New Feature

Enhanced support for KubeVirt VMs in Portworx Backup

This feature facilitates the backup and restoration of KubeVirt VMs through Portworx Backup. When KubeVirt VMs are included in the backup object, the restore process transforms the VMs, incorporating the DataVolumeTemplate and adjusting masquerade interface configurations, so that the restore operation completes successfully.

Bug Fixes

  • Issue: Occasionally, the restoration process encounters a resourceBackup CR already exists error when the reconciler re-enters.
    User Impact: The restore operation fails with this error.
    Resolution: The already-exists error is now ignored during creation of the ResourceBackup CR. #1482

  • Issue: The attempt to add Tencent Cloud object storage fails during the objectlock support check due to a discrepancy in error reporting when object lock is not supported.
    User Impact: Users are unable to utilize Tencent Cloud object storage as a backup location for backup and restore operations.
    Resolution: A solution has been implemented by incorporating appropriate error handling checks during the verification of objectlock support for buckets in Tencent Cloud object storage. #1478

  • Issue: A warning event is recorded in the applicationbackup CR when the S3 bucket already exists.
    User Impact: The warning event is causing confusion for users.
    Resolution: To address this, the system now refrains from generating a warning event if the S3 object store indicates the ErrCodeBucketAlreadyExists code. #1481

  • Issue: Backing up the kube-system namespace is not a supported feature. However, in the case of all-namespace backups or backups based on namespace labels, the kube-system namespace is inadvertently included.
    User Impact: This inclusion of the kube-system namespace in backups causes complications during the restore process.
    Resolution: This issue has been resolved by excluding the kube-system namespace from all-namespace backups and backups based on namespace labels. #1506

  • Issue: The restore process based on CSI encounters failures in setups with the csi-snapshot-webhook admin webhook. This failure is attributed to a distinct error related to existing resources, specifically when creating the volumesnapshotclass resource.
    User Impact: Users are affected by the inability to perform CSI-based restores on setups featuring the csi-snapshot-webhook admin webhook.
    Resolution: The issue has been addressed by incorporating a pre-check through a get call before the create call. Now, the create call occurs only if the get call fails with a NotFound error, preventing conflicts related to existing resources. #1567

23.8.0

12 Oct 19:25
0968bda

New Features

  • Stork now supports both asynchronous and synchronous DR migration for applications managed by operators. When startApplications is set to false for migrations, Stork ensures that application pods remain inactive in the destination cluster after migration. Additionally, Stork provides the flexibility to scale down applications by modifying Custom Resource (CR) specifications, using the suspend options feature. For applications controlled by clusterwide operators that do not support scaling down via CR spec modifications, Stork offers a "stash strategy" to prevent application pods from becoming active prematurely during migration, ensuring a seamless transition to the destination cluster. #1451

  • Stork now supports import workflows using the DataExport CRs. This means you can now seamlessly transfer data from one PVC to another within the same cluster using rsync.
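    A sketch of a DataExport CR for an in-cluster PVC-to-PVC transfer. The exact schema may differ by version; the field names here are assumptions based on the kdmp API group, and the PVC names are placeholders.

```shell
kubectl apply -f - <<'EOF'
apiVersion: kdmp.portworx.com/v1alpha1
kind: DataExport
metadata:
  name: pvc-import
  namespace: app-ns
spec:
  type: rsync                      # transfer driver
  source:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: source-pvc
    namespace: app-ns
  destination:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: dest-pvc
    namespace: app-ns
EOF
```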

Improvement

  • You can now specify resourceTypes during the application backup process, allowing a more granular selection of resources to include in the backup. When initiating an application restore, you can likewise choose specific resources to restore, providing greater flexibility in the recovery process and more efficient, targeted restoration of applications.
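    For instance, a backup restricted to a few resource kinds might look like the following; the field placement is assumed, and the backup location and namespace names are placeholders.

```shell
kubectl apply -f - <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: ApplicationBackup
metadata:
  name: selective-backup
  namespace: app-ns
spec:
  backupLocation: my-backup-location
  namespaces:
    - app-ns
  resourceTypes:                 # only back up these kinds
    - PersistentVolumeClaim
    - Deployment
    - Service
EOF
```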

Bug Fixes

  • Issue: Migrated ReplicaSets that were not managed by Deployments could not be activated or deactivated using storkctl.
    User Impact: Users had to manually scale the ReplicaSet up or down using kubectl.
    Resolution: As part of the activation or deactivation process, storkctl now scales the migrated ReplicaSet up or down. #1471

  • Issue: When using Stork 23.7, KubeVirt pods encounter a CrashLoopBackoff state, accompanied by the following error messages within the pod logs:

    /usr/bin/virt-launcher-monitor: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /etc/px_statfs.so)
    /usr/bin/virt-launcher-monitor: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /etc/px_statfs.so)
    

    User Impact: KubeVirt VMs become unusable.
    Resolution: This issue has been resolved in Stork version 23.8. For any existing virt-launcher pods experiencing the
    CrashLoopBackOff state due to this bug, follow these steps after upgrading to Stork 23.8:

    1. Stop the KubeVirt VM.
    2. Restart the KubeVirt VM. #1493
  • Issue: Following migration, an ECK application installed using an operator requires all associated Custom Resource Definitions (CRDs) to be present in order to start successfully.
    User Impact: Users experienced difficulties scaling up the Elasticsearch application because essential CRDs were absent after the migration.
    Resolution: The migration process now includes the CRDs associated with a Custom Resource, preventing obstacles to scaling up the ECK application post-migration. #1494

  • Issue: Following migration, applications controlled by Custom Resources (CRs) are automatically initiated if the operator is already operational in a distinct namespace.
    User Impact: This results in applications in the destination cluster starting unexpectedly, contrary to the desired behavior where they should remain inactive with startApplication: false.
    Resolution: To rectify this, a stashing strategy has been implemented for the CR content, storing it in a configmap. This ensures that the CR specification is applied only after activating the migration, allowing the applications to start as intended. #1451

23.7.3

26 Sep 20:25

Bug Fix

Issue: Migrations using a cluster pair created with the --unidirectional option fail due to the absence of object store information in the destination cluster.
User Impact: Users couldn't run migrations with a unidirectional cluster pair.
Resolution: Object store information is now created in the destination cluster so that migrations succeed.
#1501, #1507, #1510

23.7.2

01 Sep 15:26

Bug Fix

Issue: If the status is larger than the maximum etcd request size (1.5 MB), updating the kdmp generic backup status in the Volumebackup CR fails.
User Impact: At times, the failed update of the Volumebackup CR causes the kdmp backup to fail as well.
Resolution: Stork now refrains from updating the actual status in the Volumebackup CR when it is too large; the details are written to the log of the job pod instead. #303

23.7.1

22 Aug 15:17

Bug Fixes

  • Issue: The aws-iam-authenticator binary size was zero in the Stork container.
    User Impact: If a kubeconfig uses aws-iam-authenticator, users encountered failures when creating a cluster pair on Amazon EKS.
    Resolution: Updated the aws-iam-authenticator version and the curl options to make sure the binary is downloaded successfully. #1472

  • Issue: Updates made to the parameter associated with large resources in the stork-controller-config configmap were not preserved when the Stork pod restarted.
    User Impact: After every Stork pod restart, users had to re-apply the large-resource parameter in the stork-controller-config configmap.
    Resolution: The updated value of the large-resource parameter in the stork-controller-config configmap now persists across Stork pod restarts. #1473

  • Issue: The backup object has an empty volume size for Portworx volumes when the Stork version is newer than 23.6.0 but the Portworx version is older than 3.0.0.
    User Impact: Portworx volumes on clusters running versions below 3.0.0 display a volume size of zero.
    Resolution: Volume size retrieval is now handled regardless of the Stork and Portworx versions. #1474

23.7.0

02 Aug 17:03

Improvements

  • Updated golang, aws-iam-authenticator and google-cloud-sdk versions to resolve 20 Critical and 105 High vulnerabilities reported by the JFrog scanner. #1458

  • Added kubelogin utility in Stork container. #1448

Bug Fixes

  • Issue: Due to occasional delays in bound pod deletion, the current timeout setting of 30 seconds proves insufficient, resulting in backup failures.
    User Impact: At times, the backup process for a volume, which is bound with the "WaitForFirstConsumer" mode, encounters timeout errors and fails.
    Resolution: The timeout value has been extended to five minutes to ensure that the deletion of the bound pod, created for binding the volume with "WaitForFirstConsumer," will not encounter timeout errors. #1454

  • Issue: When a KDMP/localsnapshot backup failed, the volumesnapshot/volumesnapshotcontent objects were not removed during cleanup.
    User Impact: Stale volumesnapshot/volumesnapshotcontent objects accumulated unnecessarily.
    Resolution: Volumesnapshot/volumesnapshotcontent cleanup is now performed even for failed KDMP/localsnapshot backups. #295

  • Issue: When the native CSI backup fails with a timeout error, the volumesnapshotcontent is not being deleted.
    User Impact: In the event of native CSI backup failures, the volumesnapshotcontent will accumulate.
    Resolution: Proper handling includes deleting the volumesnapshotcontent in case of failure as well. #1460

23.6.0

27 Jun 07:18

New Features

  • Added support for NFS share backup locations for applicationbackup and applicationrestore in Stork. This is currently supported only with the PX-Backup product. #1434

Bug Fixes

  • Issue: Update calls to the volumesnapshotschedule were made every 10 seconds even when there was no new update.
    User Impact: With many volumesnapshotschedules running, this placed unnecessary load on the API server.
    Resolution: Updates are now skipped when pruning finds no change to the volume snapshot list. #1415

  • Issue: The restore size was taken from the volumesnapshot size, so with some CSI drivers the PVC failed to bind when the volumesnapshot size was less than the source volume size.
    User Impact: CSI restores failed with some storage provisioners.
    Resolution: The restore volume size is now updated from the volumesnapshot only when the volumesnapshot size is greater than the source volume size. #1445

  • Issue: A MySQL app can show inconsistent data after being restored.
    User Impact: The MySQL application may fail to run after a restore operation due to data inconsistency.
    Resolution: Fixed the data inconsistency by holding the table lock for the required interval while the backup is in progress. #1436

23.5.0

31 May 07:47

New Features

  • You can now provide namespace labels in the MigrationSchedule spec. This enables the specification of namespaces to be migrated. #1395
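    A sketch of label-based namespace selection; the field name namespaceSelectors and its placement are assumptions, so check the MigrationSchedule CRD for your version.

```shell
kubectl apply -f - <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: label-based-migration
  namespace: kube-system
spec:
  schedulePolicyName: daily
  template:
    spec:
      clusterPair: my-cluster-pair
      namespaceSelectors:          # migrate all namespaces carrying this label
        zone: us-east-1a
EOF
```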

Improvements

  • Stork now supports the ignoreOwnerReferences parameter in the Migration and MigrationSchedule objects. This parameter enables Stork to skip the owner reference check and migrate all resources, and it removes the ownerReference while applying the resource. This allows migrating all the Kubernetes resources managed and owned by an application Operator’s CR. #1398
    NOTE: You need to update the storkctl binary for this change to take effect.
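    For example, a Migration that skips the owner reference check might look like this; the parameter's placement in the spec is assumed, and the names are placeholders.

```shell
kubectl apply -f - <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: Migration
metadata:
  name: operator-app-migration
  namespace: app-ns
spec:
  clusterPair: my-cluster-pair
  namespaces:
    - app-ns
  includeResources: true
  ignoreOwnerReferences: true    # migrate resources owned by an operator's CR
EOF
```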

Bug Fixes

  • Issue: Restoring from a backup, which was taken from a previously restored PVC, was failing in the CSI system.
    User Impact: Unable to restore a backup that had been taken from a previously restored CSI PVC.
    Resolution: You can now successfully perform CSI restores using backups taken from already restored PVCs. #1409

  • Issue: You may encounter webhook errors when attempting to modify StatefulSets in order to set Stork as the scheduler.
    User Impact: Webhook-related errors may give you the impression that the pod scheduler feature does not function properly.
    Resolution: Removed the webhook for StatefulSets and Deployments, as Stork already contains a webhook for pods that manages setting Stork as the scheduler. #1373

  • Issue: Migration reported an incorrect status for service account deletion on the source cluster.
    User Impact: The expected deletion of the service account on the destination cluster, based on migration status, did not occur.
    Resolution: The purged status is no longer displayed for resources that support merging on the destination cluster. #1368

  • Issue: storkctl create clusterpair did not honor the port provided on the CLI.
    User Impact: You could not create bi-directional clusterpairs.
    Resolution: The port provided on the CLI is now passed in the endpoints when creating the clusterpair. #1383
    NOTE: You need to update the storkctl binary for this change to take effect.

  • Issue: Stork deleted pods running on Degraded or Portworx StorageDown nodes.
    User Impact: Applications were disrupted, even though drivers like Portworx support running applications on StorageDown nodes.
    Resolution: Stork no longer deletes pods running on StorageDown nodes. #1385