
fix: Add back the helm suspended metric #1111

Closed
wants to merge 1 commit

Conversation

jmleddy

@jmleddy jmleddy commented Nov 18, 2024

At some point we had this metric and then we lost it. We discovered this after we started suspending a bunch of things but could not get the metric to appear, meaning we are currently in a quasi-state of having releases suspended across all our clusters that we don't know about.

@stefanprodan
Member

stefanprodan commented Nov 19, 2024

The suspend label has been included in the gotk_resource_info metric, which is provided by kube-state-metrics. I recommend you migrate your alerts and dashboards to the new metric, as the old ones were deprecated long ago. Docs here: https://fluxcd.io/flux/monitoring/metrics/#resource-metrics
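(For anyone migrating, a minimal PrometheusRule sketch built on that metric might look like the following. The metric name comes from the Flux docs linked above; the alert name, duration, and the exact label names (`suspended`, `exported_namespace`) depend on the kube-state-metrics config you ship, so treat them as assumptions.)

```yaml
# Minimal sketch of an alert on the new metric, assuming the label names
# used by the flux2-monitoring-example kube-state-metrics config.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-suspended-resources
spec:
  groups:
    - name: flux-suspend
      rules:
        - alert: FluxHelmReleaseSuspended
          # gotk_resource_info is exported by kube-state-metrics, not by the controllers
          expr: gotk_resource_info{customresource_kind="HelmRelease", suspended="true"} == 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "HelmRelease {{ $labels.exported_namespace }}/{{ $labels.name }} is suspended"
```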

@jmleddy
Author

jmleddy commented Nov 19, 2024

This is extra toil for us to get back a metric that we were alerting on and have completely lost. We have no idea how many Helm releases are paused and not applying resource request updates or whatever. And it's inconsistently applied: why do our Kustomizations still report when they are stalled, but our Helm releases don't? I realize that there are different maintainers who have different opinions about which metrics should be exposed, but to the end user this all just looks like "flux", since all the controllers come with Flux.

For anyone who might find this PR and wonders what the kube-state-metrics config is, it seems to be here.
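To give a rough idea of the shape of that config, here is a trimmed-down sketch of a kube-state-metrics CustomResourceState entry for HelmRelease. The field names follow the kube-state-metrics CustomResourceState API; the group/version and the exact label paths Flux uses are assumptions, so check the flux2-monitoring-example repo for the real thing.

```yaml
# Sketch of a CustomResourceState config that would produce a
# gotk_resource_info metric for HelmRelease objects (trimmed down;
# the real config covers all Flux kinds and more labels).
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: helm.toolkit.fluxcd.io
        version: v2          # may be v2beta2 on older Flux versions
        kind: HelmRelease
      metricNamePrefix: gotk
      metrics:
        - name: resource_info
          help: "The current state of a Flux HelmRelease resource."
          each:
            type: Info
            info:
              labelsFromPath:
                name: [metadata, name]
                suspended: [spec, suspend]
```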

@stefanprodan
Member

stefanprodan commented Nov 19, 2024

> I realize that there are different maintainers that have different opinions

The core maintainers make the decisions about the common behaviour of all Flux controllers, and metrics fall into this category. We made the decision to drop the resource-specific metrics from the controllers' exporters and rely on kube-prometheus-stack. The deprecation notice can be found here: https://fluxcd.io/flux/monitoring/metrics/#warning-deprecated-resource-metrics

This controller was the last one promoted to GA, so we removed the deprecated metrics from it, but we should've done that in all controllers. We'll make sure the old metrics are removed across all Flux components in the next release.

@jmleddy
Author

jmleddy commented Nov 19, 2024

Thank you, though I would prefer you add back this metric everywhere rather than requiring everyone to add 275 lines of YAML to their kube-prometheus-stack Helm chart. The inconsistency is even worse than the first decision, as it feels uneven, and it probably also led to slower detection of the issue on our side, since we were still finding suspends "sometimes". So I'm looking forward to a consistent view from the Flux controller maintainers here :)

@jmleddy
Author

jmleddy commented Nov 19, 2024

Also, is this you? fluxcd/flux2-monitoring-example#35 (comment)

@stefanprodan
Member

@jmleddy you can read our motivation in this issue: fluxcd/flux2#4128

If you don't like the kube-state-metrics approach, feel free to use the Flux Operator; the tradeoff is that you can't customise those metrics in any way.

@jmleddy
Author

jmleddy commented Nov 19, 2024

Okay, I thought the controller was running as part of the operator; I must not have my kubeconfig right. I'll run it as part of ksm (kube-state-metrics), thanks!

@jmleddy jmleddy closed this Nov 19, 2024