
Migrate Trident volumes across the K8s cluster using Velero, PVC remains in pending status #957

JacieChao opened this issue Dec 16, 2024 · 2 comments

@JacieChao

Describe the bug
I want to migrate my Stateful applications from one cluster to another, and I used Velero with the CSI plugin to help migrate with Trident CSI volumes. After restoring the application on another cluster, the PVC remains in pending status.

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: v24.10.0
  • Trident installation flags used: helm install trident helm/trident-operator-100.2410.0.tgz -n trident
  • Container runtime: containerd v1.7.11-k3s2
  • Kubernetes version: v1.26.15
  • Kubernetes orchestrator: Rancher v2.8.7
  • OS: Ubuntu 22.04
  • NetApp backend types: ONTAP with AWS FSx

To Reproduce
Steps to reproduce the behavior:

  • Create a Stateful application on cluster A with Trident CSI.
  • Install Velero with CSI plugin on cluster A
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.10.0 \
--bucket ${s3-bucket-name} \
--backup-location-config region=${bucket-region} \
--snapshot-location-config region=${bucket-region} \
--features=EnableCSI \
--secret-file ./aws-credentials \
--kubeconfig cluster1.yaml
  • Backup this Stateful application under specified namespace by Velero.
  • Install Velero with CSI plugin on Cluster B
  • Restore with the backup on Cluster B by Velero
  • The PVC on cluster B remains in Pending status
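The backup and restore steps above can be sketched as follows. The backup name, restore name, and application namespace here are hypothetical placeholders, not values from the original report:

```shell
# Hypothetical names for illustration: backup "pg-backup",
# application namespace "postgresql".

# On cluster A: back up the namespace, including CSI volume snapshots
velero backup create pg-backup \
  --include-namespaces postgresql \
  --snapshot-volumes \
  --kubeconfig cluster1.yaml

# On cluster B (installed against the same S3 bucket): restore it
velero restore create pg-restore \
  --from-backup pg-backup \
  --kubeconfig cluster2.yaml

# Observe the restored PVC stuck in Pending
kubectl --kubeconfig cluster2.yaml -n postgresql get pvc
```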

Expected behavior
The PVC can be migrated across the clusters.

Additional context
After migration, I checked the VolumeSnapshot YAML that is the dataSource of the migrated PVC and found the error below:

failed to list snapshot for content velero-test-postgresql-1-jmc97-tcb7d: "rpc error: code = NotFound desc = snapshot content snapshot-53b6a703-68b5-4999-87e3-c2307f4ea932 not found;"

This Trident Snapshot CRD is not created automatically, so I migrated the corresponding Trident Snapshot CRD to Cluster B manually. Even after migrating it, tridentctl could not list the snapshot until I restarted the Trident controller pod.
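The manual snapshot-CRD migration looks roughly like this. The snapshot content name is taken from the error message above; the Trident namespace and controller deployment name are assumptions (the deployment may be named `trident-controller` or `trident-csi` depending on the install):

```shell
# Export the TridentSnapshot custom resource from cluster A
kubectl --kubeconfig cluster1.yaml -n trident \
  get tridentsnapshots.trident.netapp.io -o yaml > tsnap.yaml

# Apply it on cluster B
kubectl --kubeconfig cluster2.yaml apply -f tsnap.yaml

# The controller only sees the new object after a restart
kubectl --kubeconfig cluster2.yaml -n trident \
  rollout restart deploy/trident-controller
tridentctl get snapshot -n trident
```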

Then the PVC got the new error message with:

failed to list snapshot for content velero-test-postgresql-1-jmc97-tcb7d: "rpc error: code = Internal desc = volume pvc-ea2b8b61-c3a9-4c0b-bb62-e676f4e53c4d was not found"

The Trident Volume CRD also needs to be migrated to the new cluster, so I migrated the corresponding Trident Volume CRD to Cluster B manually. As with the snapshot, tridentctl could not list the volume until I restarted the Trident controller pod.
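The same pattern applies to the volume CRD; the volume name here is taken from the error message above, while the namespace and deployment name are the same assumptions as before:

```shell
# Export the TridentVolume custom resource from cluster A
kubectl --kubeconfig cluster1.yaml -n trident \
  get tridentvolumes.trident.netapp.io \
  pvc-ea2b8b61-c3a9-4c0b-bb62-e676f4e53c4d -o yaml > tvol.yaml

# Apply it on cluster B
kubectl --kubeconfig cluster2.yaml apply -f tvol.yaml

# Again, only visible to tridentctl after a controller restart
kubectl --kubeconfig cluster2.yaml -n trident \
  rollout restart deploy/trident-controller
tridentctl get volume -n trident
```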

Even with the same Trident backend configured on both clusters, I cannot migrate a stateful application with Trident volumes automatically. Moreover, tridentctl and the controllers only see the manually migrated Trident CRDs after a restart of the Trident controller; they do not pick up the CRD changes on their own.

I have used Velero to migrate stateful applications with other CSI drivers, and those CSI volumes were handled properly without manually migrating any CRDs.

So can Trident support migrating volumes across clusters? Or are there other guidelines I need to follow to make this work?

@JacieChao JacieChao added the bug label Dec 16, 2024
@wonderland

In case you only change K8s clusters but stay in the same region and on the same FSxN system, Trident's volume import capability might be the better choice. It doesn't require any data movement and brings the existing volume into another cluster as a PVC. Roughly like this:

  • Make sure PV has a retain policy so it stays around when removing the PVC
  • Delete the PVC on the source cluster to safely detach it and make sure it is no longer in use
  • Note the internal name of the volume on FSxN (csi.volumeAttributes.internalName of the PV)
  • Use tridentctl import on the destination cluster to import the existing volume as a PVC. Note that this will rename the volume in FSxN, which is another safeguard so the source cluster can no longer access it
  • Clean up the stale PV object on the source cluster by patching it to a Delete policy
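The steps above can be sketched as shell commands. The PV/PVC names and the backend name are placeholders, and the PVC manifest passed to `tridentctl import` is assumed to exist:

```shell
# 1. Retain the PV so deleting the PVC does not delete the data
kubectl --kubeconfig cluster1.yaml patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 2. Delete the PVC on the source cluster to detach the volume
kubectl --kubeconfig cluster1.yaml delete pvc <pvc-name> -n <namespace>

# 3. Note the internal volume name on FSxN
kubectl --kubeconfig cluster1.yaml get pv <pv-name> \
  -o jsonpath='{.spec.csi.volumeAttributes.internalName}'

# 4. On the destination cluster, import the existing volume as a PVC
#    (pvc.yaml is a hypothetical PVC manifest for the imported volume)
tridentctl import volume <backend-name> <internal-volume-name> \
  -f pvc.yaml -n trident

# 5. Clean up the stale PV object on the source cluster
kubectl --kubeconfig cluster1.yaml patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
kubectl --kubeconfig cluster1.yaml delete pv <pv-name>
```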

See https://docs.netapp.com/us-en/trident/trident-use/vol-import.html for details.

You can also take a look at Trident protect, which can be used for migration use cases, see https://docs.netapp.com/us-en/trident/trident-protect/trident-protect-migrate-apps.html

In case you need to migrate between regions/FSxN systems, the Trident protect replication feature might be a good match as well: https://docs.netapp.com/us-en/trident/trident-protect/trident-protect-use-snapmirror-replication.html

@JacieChao
Author

@wonderland Thanks a lot.
I validated migrating the PVC with tridentctl import in my test environment, and it works well. I will try the other scenarios later and provide feedback.
