Fix usage of the Metrics Timer. #135
Conversation
Thanks! I do think it might make sense to keep the job id in the dimensions. What do you say?
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   84.48%   84.76%   +0.28%
==========================================
  Files          22       22
  Lines         709      709
==========================================
+ Hits          599      601       +2
+ Misses        110      108       -2
```
The matching is on the dimensions, so that wouldn't fix the issue. Setting aside the memory issue, which I've just found out affects Prometheus 1 only, I think: if you use the job id in the dimensions, you're not aggregating anything. You get a separate histogram made of one data point for each run of a job. Like this:
[screenshot: one single-sample histogram series per job run]
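The matching behaviour above can be sketched with the swift-metrics `Timer` API (the `job_duration` label and the dimension names here are illustrative, not this project's actual ones):

```swift
import Foundation
import Metrics

// Handles match on label + dimensions, so these two aggregate into the
// same underlying time series for the whole process:
let t1 = Timer(label: "job_duration",
               dimensions: [("status", "success"), ("jobName", "email")])
let t2 = Timer(label: "job_duration",
               dimensions: [("status", "success"), ("jobName", "email")])
t1.recordSeconds(1.2)
t2.recordSeconds(0.8)   // lands in the same histogram as t1's sample

// A per-run job id defeats that: every run mints a fresh series holding
// a single data point, which aggregates nothing and grows memory.
let perRun = Timer(label: "job_duration",
                   dimensions: [("status", "success"),
                                ("jobId", UUID().uuidString)])
perRun.recordSeconds(1.2)
```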
I think this is the kind of output that's useful?
[screenshot: timer output aggregated per job name]
Cool, I thought keeping the id around would make sense for debugging issues etc., but TIL!
@0xTim, not sure if you want to give this a once-over too.
LGTM
These changes are now available in 1.16.1
When using the `Timer`, aggregate the measurements by status and job name rather than by status and job id. The job id is unique to each job run, so in the current implementation a new summary is created for every run, which doesn't collect useful metrics data and causes excessive memory usage.
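A minimal sketch of the change described above, assuming the swift-metrics `Timer` API; the `job_duration` label, dimension names, and the `jobFinished` hook are illustrative, not this package's actual identifiers:

```swift
import Metrics

// Illustrative completion hook for a queued job; not the project's
// actual API.
func jobFinished(jobName: String, jobId: String,
                 status: String, durationNanos: Int64) {
    // Before the fix: dimensions keyed on the unique jobId, yielding one
    // throwaway summary per run and unbounded series growth.
    // After the fix: key on the stable job name, so every run of the
    // same job feeds one timer per (status, jobName) pair.
    let timer = Timer(label: "job_duration",
                      dimensions: [("status", status),
                                   ("jobName", jobName)])
    timer.recordNanoseconds(durationNanos)
}
```

Keeping dimension values low-cardinality like this is also what Prometheus-style backends expect; each distinct dimension set becomes its own time series.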