Fix usage of the Metrics Timer. #135
Conversation
Thanks! I do think it might make sense to keep the job id in the dimensions. What do you say?
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   84.48%   84.76%   +0.28%
==========================================
  Files          22       22
  Lines         709      709
==========================================
+ Hits          599      601       +2
+ Misses        110      108       -2
```
The matching is on the dimensions, so that wouldn't fix the issue. Setting aside the memory issue, which I've just found out affects Prometheus 1 only, I think: if you use the job id in the dimensions, you're not aggregating anything. You get a separate histogram made of one data point for each run of a job. Like this:
[screenshot: one single-sample histogram series per job run]
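The matching behaviour above can be sketched with the swift-metrics `Timer` API (the `job_duration` label and the dimension names here are illustrative, not this project's actual ones):

```swift
import Foundation
import Metrics

// Handles match on label + dimensions, so these two aggregate into the
// same underlying time series for the whole process:
let t1 = Timer(label: "job_duration",
               dimensions: [("status", "success"), ("jobName", "email")])
let t2 = Timer(label: "job_duration",
               dimensions: [("status", "success"), ("jobName", "email")])
t1.recordSeconds(1.2)
t2.recordSeconds(0.8)   // lands in the same histogram as t1's sample

// A per-run job id defeats that: every run mints a fresh series holding
// a single data point, which aggregates nothing and grows memory.
let perRun = Timer(label: "job_duration",
                   dimensions: [("status", "success"),
                                ("jobId", UUID().uuidString)])
perRun.recordSeconds(1.2)
```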
I think this is the kind of output that's useful?
[screenshot: timer output aggregated per job name]
Cool, I thought keeping the id around would make sense for debugging issues etc., but TIL!
@0xTim, not sure if you want to give this a once-over too.
LGTM
These changes are now available in 1.16.1
When using the `Timer`, aggregate the measurements by status and job name rather than by status and job id. The job id is unique to each job run, so in the current implementation a new summary is created for every run, which doesn't collect useful metrics data and causes excessive memory usage.
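A minimal sketch of the change described above, assuming the swift-metrics `Timer` API; the `job_duration` label, dimension names, and the `jobFinished` hook are illustrative, not this package's actual identifiers:

```swift
import Metrics

// Illustrative completion hook for a queued job; not the project's
// actual API.
func jobFinished(jobName: String, jobId: String,
                 status: String, durationNanos: Int64) {
    // Before the fix: dimensions keyed on the unique jobId, yielding one
    // throwaway summary per run and unbounded series growth.
    // After the fix: key on the stable job name, so every run of the
    // same job feeds one timer per (status, jobName) pair.
    let timer = Timer(label: "job_duration",
                      dimensions: [("status", status),
                                   ("jobName", jobName)])
    timer.recordNanoseconds(durationNanos)
}
```

Keeping dimension values low-cardinality like this is also what Prometheus-style backends expect; each distinct dimension set becomes its own time series.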