All nats-streaming pods in CrashLoopBackOff state #40
Comments
Do you have a persistent volume for the replicas? Could you share more info about the deployment, for example which cloud it is running on?
Thank you for responding.
Thanks for the info. On GKE, do you have automatic node upgrades enabled? That drains the nodes and restarts all instances in a way that I think could affect the quorum of the cluster if you are only using local disk.
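If node drains during upgrades are indeed the trigger, one common mitigation (not mentioned in this thread, offered here as a sketch) is a PodDisruptionBudget that prevents a voluntary drain from evicting enough pods to break quorum. The label selector `app: nats-streaming` and the pod count are assumptions; match them to the labels the operator actually sets on your pods.

```yaml
# Hypothetical PodDisruptionBudget: for a 3-node cluster, keep at least
# 2 pods running during voluntary disruptions such as node drains.
# On clusters older than Kubernetes 1.21, use apiVersion: policy/v1beta1.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nats-streaming-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nats-streaming   # assumed label; adjust to your deployment
```

Note that a PDB only guards against *voluntary* disruptions (drains, evictions); it does not help with node crashes, and it does not replace per-pod persistent storage.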
I have Automatic node upgrades = Disabled and Automatic node repair = Enabled on that cluster. But I do remember upgrading k8s manually 4 days ago. The thing is, after I upgraded the nodes I checked that all the pods were in the Running state, so this happened after that.
Not having an individual PV per pod makes the whole operator worthless.
Also, because the operator relies on creating individual pods (with no anti-affinity, even) rather than a StatefulSet, you can't actually create PVCs/PVs that would reliably match the scheduling that Kubernetes does. IMHO, people should probably just stop relying on the operator and use a proper StatefulSet directly.
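For illustration, the StatefulSet approach suggested above might look something like the sketch below. Everything here is an assumption rather than taken from this issue: the names, labels, image tag, server flags, and storage size all need to be adapted to your deployment. The key points are `volumeClaimTemplates` (one PVC per pod, following the pod across reschedules) and `podAntiAffinity` (no two cluster members on the same node).

```yaml
# Illustrative StatefulSet for a 3-node NATS Streaming cluster.
# All names and values are placeholders, not from this issue.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats-streaming
spec:
  serviceName: nats-streaming
  replicas: 3
  selector:
    matchLabels:
      app: nats-streaming
  template:
    metadata:
      labels:
        app: nats-streaming
    spec:
      # Spread pods across nodes so a single node drain or failure
      # cannot take out the cluster quorum.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: nats-streaming
              topologyKey: kubernetes.io/hostname
      containers:
        - name: nats-streaming
          image: nats-streaming:0.17.0   # assumed tag
          args:
            - "--clustered"
            - "--cluster_id=nats-streaming"
            - "--store=file"
            - "--dir=/data"
          volumeMounts:
            - name: data
              mountPath: /data
  # One PVC per pod, so each replica keeps its own file store
  # across restarts and rescheduling.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

With this layout, pod `nats-streaming-0` is always re-bound to PVC `data-nats-streaming-0`, which is exactly the per-pod storage guarantee the operator-created bare pods lack.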
Wasn't resilience supposed to be the great benefit of deploying NATS clusters? I came to the office this morning and found ALL `nats-streaming-1-*` pods in `CrashLoopBackOff` with around 500 restarts, and meanwhile ALL messages have obviously been lost. Even if I delete all the pods, it still doesn't recover. I have to delete the whole `natsstreamingcluster.streaming.nats.io/nats-streaming-1` and recreate it to make it work.