-
adding a
But this expression uses subquery (may be there is a better way to add We cannot create expression like this via Pyrra right now as I need |
-
I am trying to provide an SLO for platform services like istio, nginx-ingress-controller, etc. None of the existing SLO types, such as ratio or latency, seem to help, since I want to evaluate the uptime of nginx as a service, istiod as a service, and so on.
So I attempted to use BoolGauge, which is supposed to work for blackbox-exporter-type situations.
Here is my SLO config
What I observe is that if the pods and the svc connected to those pods are up - I get
pyrra_availability = 100%
and also error budget = 100%
But once I shut down the pods to test error budget depletion, the availability metric and the budget both crash to zero. I would have expected the budget to burn down slowly. If I change the time range to a period when the pods were up, pyrra_availability is reported as 100%.
Any idea what is misconfigured here? Or is this a bug in Pyrra's recording rule expressions?
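To make the "burn down slowly" expectation concrete, here is a minimal sketch of the usual error budget arithmetic. The 99% objective and the one-sample-per-minute scrape interval are hypothetical numbers, not values from my config, and the function name is made up for illustration; I am assuming Pyrra follows this standard definition.

```python
def error_budget_remaining(good_samples: int, total_samples: int, objective: float) -> float:
    """Fraction of error budget left: 1 - (observed error rate / allowed error rate)."""
    availability = good_samples / total_samples
    return 1 - (1 - availability) / (1 - objective)


# Hypothetical setup: 99% objective, one sample per minute over a 1w window.
TOTAL = 7 * 24 * 60  # number of samples in one week

for minutes_down in (0, 10, 50, 100):
    remaining = error_budget_remaining(TOTAL - minutes_down, TOTAL, 0.99)
    print(f"{minutes_down:>3} min down -> {remaining:.1%} budget left")
```

Under these assumptions each minute of downtime removes only a small slice of the budget, which is the gradual burn I expected to see instead of an immediate drop to zero.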
My analysis showed me that: both below expressions have
sum(up:sum1w{job="kubernetes-pods",namespace="app4",slo="sample-svc-uptime-slo"})
and
sum(up:count1w{job="kubernetes-pods",namespace="app4",slo="sample-svc-uptime-slo"})
which in turn use
sum by (__name__, job, namespace) (sum_over_time(up{job="kubernetes-pods",namespace="app4"}[1w]))
and
sum by (__name__, job, namespace) (count_over_time(up{job="kubernetes-pods",namespace="app4"}[1w]))
Both expressions, sum_over_time and count_over_time, produce identical graphs, which I think is why the availability plummets to zero.
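A small simulation of the two aggregations may show why the graphs can be identical. It is only a sketch: the helper functions mimic sum_over_time and count_over_time on a plain list, and the idea that deleted pods drop out of the kubernetes-pods job (so no up samples are written at all) is an assumption about my setup, not something I have confirmed.

```python
from typing import Optional, Sequence

Sample = Optional[float]  # None stands for "no sample was scraped"


def sum_over_time(samples: Sequence[Sample]) -> float:
    """Sum of all samples that exist in the window."""
    return sum(v for v in samples if v is not None)


def count_over_time(samples: Sequence[Sample]) -> int:
    """Number of samples that exist in the window."""
    return sum(1 for v in samples if v is not None)


def availability(samples: Sequence[Sample]) -> Optional[float]:
    """sum / count over the window, or None when the window has no samples."""
    count = count_over_time(samples)
    if count == 0:
        return None
    return sum_over_time(samples) / count


healthy = [1.0] * 90

# Case A: failed scrapes are recorded as up == 0, so sum and count diverge.
recorded_as_zero = healthy + [0.0] * 10

# Case B (assumed): the targets vanish, so the outage leaves no samples at all.
no_samples = healthy + [None] * 10

for name, window in (("A", recorded_as_zero), ("B", no_samples)):
    print(name, sum_over_time(window), count_over_time(window), availability(window))
```

In case B every sample that exists is 1, so the sum and the count stay equal no matter how long the outage lasts. That would match the identical graphs, though it does not by itself explain the drop to zero.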
Is my usage of up as the metric wrong for this kind of SLO evaluation?