fix(alerts): add instance label to KubeAggregatedAPIErrors #991

Merged (6 commits) on Dec 10, 2024


sebastiangaiser
Contributor

Supersedes #774.

@skl please let me know what you think about this ;)

Signed-off-by: Sebastian Gaiser <sebastiangaiser@users.noreply.github.com>
@sebastiangaiser
Contributor Author

Another idea would be to add `for: 10m` to the alert.

Collaborator

@skl left a comment

Thanks for contributing and drawing my attention to this!

I ended up doing a bit of a deep dive to understand the metric properly, and I left you a suggestion for improving the PromQL query in the review.

Could you please add 'for': '10m' to the alert?

And update the description to something like:

description: 'Kubernetes aggregated API {{ $labels.instance }}/{{ $labels.name }} has reported {{ $labels.reason }} errors%s.' % [
  utils.ifShowMultiCluster($._config, ' on cluster {{ $labels.%(clusterLabel)s }}' % $._config),
],

I'm hoping all of that would resolve the underlying issue, wdyt?
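
For illustration, here is a rough sketch of how the two requested changes might sit together in one rule. It is not the merged code: the local `utils` stub, the `_config` values, and the `rule` field name are hypothetical stand-ins so the fragment evaluates on its own, and `expr` is left out because the suggested query is not quoted in this thread.

```jsonnet
// Hypothetical stand-in for the mixin's utils library, assumed here to
// return the given string only when multi-cluster display is enabled.
local utils = {
  ifShowMultiCluster(config, string):: if config.showMultiCluster then string else '',
};

{
  // Assumed config values, for illustration only.
  _config:: {
    showMultiCluster: true,
    clusterLabel: 'cluster',
  },

  // 'rule' is a hypothetical field name used to hold the sketch.
  rule: {
    alert: 'KubeAggregatedAPIErrors',
    // expr omitted: the query suggested in the review is not shown in this thread.
    // 'for' must be quoted because it is a Jsonnet keyword.
    'for': '10m',
    annotations: {
      description: 'Kubernetes aggregated API {{ $labels.instance }}/{{ $labels.name }} has reported {{ $labels.reason }} errors%s.' % [
        utils.ifShowMultiCluster($._config, ' on cluster {{ $labels.%(clusterLabel)s }}' % $._config),
      ],
    },
  },
}
```

Under these assumptions, the description should end in "errors on cluster {{ $labels.cluster }}." when `showMultiCluster` is true and in "errors." otherwise.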

Review comment on alerts/kube_apiserver.libsonnet (outdated, resolved)
Co-authored-by: Stephen Lang <skl@users.noreply.github.com>
…adjust description

Signed-off-by: Sebastian Gaiser <sebastiangaiser@users.noreply.github.com>
@sebastiangaiser
Contributor Author

Sounds good. Thank you @skl for investing time here, I added all your suggestions.

@skl changed the title from "fix(alerts): use sum by instance max for 'KubeAggregatedAPIErrors'" to "fix(alerts): add instance label to KubeAggregatedAPIErrors" on Dec 10, 2024
Signed-off-by: Stephen Lang <stephen.lang@grafana.com>
@skl
Collaborator

skl commented Dec 10, 2024

Fixed a syntax error for you in 21c0b87

@skl
Collaborator

skl commented Dec 10, 2024

The lint failure is inherited from main; it can be ignored, as I'm dealing with it separately:

prometheus_alerts.yaml:342-346 Warning: Aggregation using `without()` can be fragile when used inside binary expression because both sides must have identical sets of labels to produce any results, adding or removing labels to metrics used here can easily break the query, consider aggregating using `by()` to ensure consistent labels. (promql/fragile)

@skl merged commit d6ab1a7 into kubernetes-monitoring:master on Dec 10, 2024
8 of 9 checks passed