Hello,
I'm looking for advice, or at the very least some insight into how others are leveraging autoscaling with Thanos, specifically in a router-ingestor configuration that uses the thanos-receive-controller to manage the hashring. I've run into a few quirks with Thanos while implementing KEDA for the ingestor pods. For example, similar to the user in this discussion, I've been noticing a rise in 500s and 503s when scaling events occur. I've monitored the hashring to confirm it is updated promptly during scaling events (it is). I've also noticed a complete halt in ingestion once we reach a high replica count (15+ pods), which I haven't been able to explain.
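For context, this is roughly the query we watch to catch those errors during scaling events. It assumes the default Thanos HTTP instrumentation (`http_requests_total` with `handler` and `code` labels) and a `job="thanos-receive"` label; adjust to your own labeling:

```promql
# 5xx rate on the receive handler, broken out by status code,
# so scale-event spikes (500s vs 503s) are visible separately
sum by (code) (
  rate(http_requests_total{job="thanos-receive", handler="receive", code=~"5.."}[5m])
)
```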
I know there are also concerns about scaling down too fast and accidentally losing data if pods are removed before the retention window expires, so we intentionally slow down the scale-down operation. However, I've also read in some places that scaling up can be disruptive to Thanos as well. All in all, there doesn't seem to be much documentation on autoscaling Thanos components, so I wanted to see if/how everyone else is doing it and whether there are any known best practices for pairing the two. The router component is also scaled by the Kubernetes HPA, but we wanted to use KEDA for the ingestor so we could scale on custom metrics.
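To make the setup concrete, here is a minimal sketch of the kind of KEDA `ScaledObject` I mean, with the scale-down deliberately slowed via the HPA `behavior` passthrough. All names, thresholds, and addresses are illustrative placeholders, not our production values:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: thanos-receive-ingestor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: thanos-receive-ingestor   # hypothetical StatefulSet name
  minReplicaCount: 3
  maxReplicaCount: 15
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          # Slow scale-down so a pod's TSDB blocks can be flushed/uploaded
          # before it is removed; the window should comfortably exceed the
          # local retention configured on the receivers.
          stabilizationWindowSeconds: 3600
          policies:
            - type: Pods
              value: 1
              periodSeconds: 600   # remove at most 1 pod per 10 minutes
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # placeholder
        # Example custom metric: total remote-write request rate
        query: sum(rate(http_requests_total{job="thanos-receive", handler="receive"}[5m]))
        threshold: "1000"   # illustrative per-replica target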