Here is a short HOWTO on maintaining and troubleshooting GKE clusters and cryptonodes.
You may also use the official documentation to troubleshoot a GKE cluster.
We assume the default `kubectl` context is configured to connect to the required cluster.
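To double-check which cluster `kubectl` points at (the cluster name and zone below are taken from the examples in this HOWTO, substitute your own):

```sh
# Show the currently selected context
kubectl config current-context
# (Re)fetch credentials for the GKE cluster, e.g. the tezos-node-1 cluster used below
gcloud container clusters get-credentials tezos-node-1 --zone us-central1-a
```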
Locate the required pods:

```sh
kubectl get pod
```
Check pod logs (`-f` to follow):

```sh
kubectl logs -f tezos-0 --tail=10
```
Check pod info to troubleshoot startup problems, liveness check problems, etc.:

```sh
kubectl describe pod tezos-0
```
Restart a pod if it hangs:

```sh
kubectl delete pod tezos-0
```
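The pod belongs to a StatefulSet, so it is recreated automatically; you can watch it come back:

```sh
kubectl get pod -w
```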
Check allocated disk size:

```sh
kubectl get pvc
```
Shell into the container to check/troubleshoot from inside:

```sh
kubectl exec -it tezos-0 -- sh
```
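For a quick one-off check without an interactive shell, you can run a single command instead, for example to see mounted filesystems and their usage:

```sh
kubectl exec tezos-0 -- df -h
```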
Let's assume you need to upgrade Tezos from `v7-release` to `v8-release`. Here is what you need to do:
- update the `values-dev.yaml` you used before to deploy Tezos, or create a new `values-dev.yaml` file with the following content:

```yaml
image:
  repository: tezos/tezos
  tag: v8-release
```
- upgrade the Tezos Helm release in the cluster; we use a release named `tezos` in the example below:

```sh
cd tezos-kubernetes
helm upgrade tezos charts/tezos/ --reuse-values --force --atomic --values values-dev.yaml
```
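You can then confirm that the rollout finished and the pod runs the new image:

```sh
# Wait for the StatefulSet rollout to complete
kubectl rollout status statefulset/tezos
# Print the image(s) of the running pod; expect the v8-release tag
kubectl get pod tezos-0 -o jsonpath='{.spec.containers[*].image}'
```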
Let's assume you need to snapshot the disk used by pod `tezos-0`. First we need to find which disk this pod actually uses:

```sh
kubectl describe pod tezos-0
```
Check the output for `Volumes`; in my case it is:

```
Volumes:
  tezos-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  tezos-pvc-tezos-0
    ReadOnly:   false
```
We get `ClaimName: tezos-pvc-tezos-0`; now we need to find the corresponding PersistentVolume:

```sh
kubectl describe pvc tezos-pvc-tezos-0
```
Check the output for `Volume`; in my case it is:

```
Volume:  pvc-b2148859-fb11-4513-afa1-fd9f5c8c4d82
```
And now we need to get the disk name from the PV:

```sh
kubectl describe pv pvc-b2148859-fb11-4513-afa1-fd9f5c8c4d82
```
Check the output for `Source`:

```
Source:
  Type:    GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
  PDName:  gke-tezos-node-1-f9835-pvc-b2148859-fb11-4513-afa1-fd9f5c8c4d82
```
We get `gke-tezos-node-1-f9835-pvc-b2148859-fb11-4513-afa1-fd9f5c8c4d82`; that's the name of the disk we need to snapshot.
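The whole lookup (pod to PVC to PV to disk name) can also be scripted; a sketch that assumes `tezos-0` mounts a single PVC:

```sh
# Pod -> PVC name (recursive descent; assumes exactly one PVC volume)
PVC=$(kubectl get pod tezos-0 -o jsonpath='{..persistentVolumeClaim.claimName}')
# PVC -> name of the bound PV
PV=$(kubectl get pvc "$PVC" -o jsonpath='{.spec.volumeName}')
# PV -> GCE persistent disk name
kubectl get pv "$PV" -o jsonpath='{.spec.gcePersistentDisk.pdName}'
```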
You may use the official documentation to create a snapshot; here is a quick example command:

```sh
gcloud compute disks snapshot gke-tezos-node-1-f9835-pvc-b2148859-fb11-4513-afa1-fd9f5c8c4d82 --snapshot-names=tezos-snapshot
```
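You can check that the snapshot completed (its status should be `READY`):

```sh
gcloud compute snapshots describe tezos-snapshot
```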
To make a consistent snapshot much more likely, stop the blockchain node first. Here is how you can do it, for example, with the Tezos node:

```sh
kubectl scale statefulset tezos --replicas=0
```
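Rather than waiting a fixed time, you can block until the pod is actually gone (the timeout here is an arbitrary safety limit):

```sh
kubectl wait --for=delete pod/tezos-0 --timeout=120s
```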
Once the pod has terminated (usually within a minute), create the snapshot. Use the following command to start the node again:

```sh
kubectl scale statefulset tezos --replicas=1
```
You may need to convert your snapshot to an image, for example to share it:

```sh
gcloud compute images create tezos-2020-06-01 --source-snapshot=tezos-snapshot
```
When someone has shared a pre-synced cryptonode disk image with you, you can create a new disk from this image and use it with your cryptonode. Here is how:
- (optional) copy the disk image to your project:

```sh
gcloud compute images create tezos-2020-06-01 --source-image=tezos-2020-06-01 --source-image-project=<SOURCE-PROJECT>
```
- create an SSD disk from the image; pay attention to the zone, it must be the same as your GKE cluster's:

```sh
gcloud compute disks create tezos --type pd-ssd --zone us-central1-a --image=tezos-2020-06-01 --image-project=<SOURCE-PROJECT>
```
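If you are unsure which zone your cluster runs in, list your clusters and check the location column:

```sh
gcloud container clusters list
```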
Now we need to create a PersistentVolume (PV) in Kubernetes and use this PV with the PersistentVolumeClaim (PVC) we already have.
- adjust `pv.yaml` with your disk name, zone, etc.; a minimal sketch follows this item. In this manual we assume you already have the required storage classes from the cryptonode deployment.
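A minimal `pv.yaml` sketch, assuming a PV named `tezos-0` (the name used in the PVC replacement step below) backed by the `tezos` disk created above; the size and storage class are placeholders and must match your existing PVC:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tezos-0
spec:
  capacity:
    storage: 100Gi                   # placeholder: must match the disk / existing PVC size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ssd              # placeholder: must match the storage class of your PVC
  gcePersistentDisk:
    pdName: tezos                    # the disk created above
    fsType: ext4
```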
- create the PV with the following command:

```sh
kubectl create -f pv.yaml
```
- shut down the cryptonode:

```sh
kubectl scale statefulset tezos --replicas=0
```
Give it some time to shut down; you can monitor progress with `kubectl get pod -w`.
- replace the existing PVC with a copy whose `volumeName` points at the new PV (`tezos-0` in the example below, backed by the `tezos` disk); back up the old PVC to `tezos-pvc-0.yaml` first:
```sh
# backup just in case
kubectl get pvc tezos-pvc-tezos-0 -o yaml > tezos-pvc-0.yaml
kubectl get pvc tezos-pvc-tezos-0 -o json | jq '.spec.volumeName="tezos-0"' | kubectl replace --force -f -
```
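To verify the replacement worked, check that the PVC now references the new PV and that both report `Bound`:

```sh
kubectl get pvc tezos-pvc-tezos-0 -o jsonpath='{.spec.volumeName}'
kubectl get pv tezos-0
```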
- start the cryptonode back up and check the logs:

```sh
kubectl scale statefulset tezos --replicas=1
kubectl get pod -w
kubectl logs -f tezos-0
```