-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus metric ceremony #4034
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4034 +/- ##
======================================
- Coverage 72% 72% -0%
======================================
Files 368 369 +1
Lines 58484 59126 +642
Branches 58484 59126 +642
======================================
+ Hits 42327 42658 +331
- Misses 14060 14353 +293
- Partials 2097 2115 +18
... and 25 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Possible modification:
We can add the stage name just for reason number (2) |
utilities/src/with_std/metrics.rs
Outdated
non_const_labels: &[&str; { $labels.len() - $const_labels.len() }], | ||
) { | ||
if self.drop { | ||
self.labels.insert(non_const_labels.map(|s| s.to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this appends to self.labels
every time you increment come counter (and the same in all other functions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we need to drop the metric and we have some const_label, we need to specify const_label + non_const_label when we delete it.
In order to do that we need to save the non_const_labels we use so that when we delete it we can reconstruct all the combination used (const_labels + non_const_labels1, const_labels + non_const_labels2, etc...)
I will change the name of the field to something like non_const_label_used to make it more clear!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. You have to do this because there is no way to partially match labels when deleting (e.g. everything with a given cermeony_id) and we don't know all labels in advance? Seems like there is quite a bit of overhead/complexity just to increment an integer, and this feels error prone.
I'm just thinking out loud, but do we really need to use things like IntCounterVec
? It is not that difficult to produce the values in the format that prometheus expects. Say, if we stored these metrics in our own datastructure, we could easily have things like "delete all entires for a given ceremony id", e.g. using a map with ceremony id as the key. I don't think there is anything else we'd need to worry about with regards to cleaning up. Just feels like most difficulties/complexity comes from the fact that we have to work around the library's limitations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, unfortunately it is not possible to just specify a single label when deleting, you need to provide the full combination of labels in the correct order (I think because behind the scenes IntCounterVec is concatenating the string and hashing them to obtain a key to the map containing the actual value).
And yeah probably creating our own version of the prometheus client which we can tune to our needs would reduce complexity a lot! I guess we decided to go with the prometheus library at the beginning when we didn't know yet all the metrics required and how to handle them.
This is definitely something worth keeping in mind for the future, rn I think it is not worth to change everything tho.
…ONY_TIMEOUT_MISSING_MSG
…of wrapper, passed as a parameter when calling the macro
e7c17ae
to
7e90d22
Compare
@@ -346,23 +351,28 @@ macro_rules! build_counter_vec_struct { | |||
pub struct $struct_ident { | |||
metric: &'static $metric_ident, | |||
const_labels: [String; { $const_labels.len() }], | |||
labels: HashSet<[String; { $labels.len() - $const_labels.len() }]>, | |||
non_const_labels_used: HashSet<[String; { $labels.len() - $const_labels.len() }]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine it is not too difficult to have to first part of this macro to be implemented using the second part of the marco with an empty const label array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it is possible to do so, but wouldn't this add some useless complexity to the metric which don't use the const_labels? (also we would then have to pass an empty array every time we interact with those metrics making the code a bit messy?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are probably ways to remove the empty array argument, but yeah, complicating the macro might not be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(dummy message so I can submit review)
do conversion inside the constructor add test to check deletion of metrics inside CeremonyMetrics strucs
…-> deletion done before as expected
utilities/src/with_std/metrics.rs
Outdated
request_test("metrics", reqwest::StatusCode::OK, "# HELP ceremony_bad_msg Count all the bad msgs processed during a ceremony\n# TYPE ceremony_bad_msg counter\nceremony_bad_msg{chain=\"Chain1\",reason=\"AA\"} 1\n# HELP ceremony_duration Measure the duration of a ceremony in ms\n# TYPE ceremony_duration gauge\nceremony_duration{ceremony_id=\"7\",ceremony_type=\"Keygen\",chain=\"Chain1\"} 999\n# HELP ceremony_msg Count all the processed messages for a given ceremony\n# TYPE ceremony_msg counter\nceremony_msg{ceremony_id=\"7\",ceremony_type=\"Keygen\",chain=\"Chain1\"} 2\n# HELP ceremony_timeout_missing_msg Measure the number of missing messages when reaching timeout\n# TYPE ceremony_timeout_missing_msg gauge\nceremony_timeout_missing_msg{ceremony_id=\"7\",ceremony_type=\"Keygen\",chain=\"Chain1\",stage=\"stage1\"} 5\n# HELP stage_completing Count the number of stages which are completing succesfully by receiving all the messages\n# TYPE stage_completing counter\nstage_completing{chain=\"Chain1\",stage=\"stage1\"} 2\nstage_completing{chain=\"Chain1\",stage=\"stage2\"} 1\n# HELP stage_duration Measure the duration of a stage in ms\n# TYPE stage_duration gauge\nstage_duration{ceremony_id=\"7\",chain=\"Chain1\",phase=\"processing\",stage=\"stage1\"} 78\nstage_duration{ceremony_id=\"7\",chain=\"Chain1\",phase=\"receiving\",stage=\"stage1\"} 780\n# HELP stage_failing Count the number of stages which are failing with the cause of the failure attached\n# TYPE stage_failing counter\nstage_failing{chain=\"Chain1\",reason=\"NotEnoughMessages\",stage=\"stage3\"} 1\n").await; | ||
|
||
//Second request we get only the metrics which don't depend on a specific label like ceremony_id | ||
request_test("metrics", reqwest::StatusCode::OK, "# HELP ceremony_bad_msg Count all the bad msgs processed during a ceremony\n# TYPE ceremony_bad_msg counter\nceremony_bad_msg{chain=\"Chain1\",reason=\"AA\"} 1\n# HELP stage_completing Count the number of stages which are completing succesfully by receiving all the messages\n# TYPE stage_completing counter\nstage_completing{chain=\"Chain1\",stage=\"stage1\"} 2\nstage_completing{chain=\"Chain1\",stage=\"stage2\"} 1\n# HELP stage_failing Count the number of stages which are failing with the cause of the failure attached\n# TYPE stage_failing counter\nstage_failing{chain=\"Chain1\",reason=\"NotEnoughMessages\",stage=\"stage3\"} 1\n").await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation, but it is nearly impossible to tell if this output is what we should expect (it is hard to parse and this contains a lot of noise e.g. help messages). Make me want to write our own metrics data structure even more (so we can look at it without encoding).
One thing you can do to improve readability somewhat is to use raw string literals:
request_test("metrics", reqwest::StatusCode::OK,
r#"# HELP ceremony_bad_msg Count all the bad msgs processed during a ceremony
# TYPE ceremony_bad_msg counter
ceremony_bad_msg{chain="Chain1",reason="AA"} 1
# HELP ceremony_duration Measure the duration of a ceremony in ms
# TYPE ceremony_duration gauge
ceremony_duration{ceremony_id="7",ceremony_type="Keygen",chain="Chain1"} 999
# HELP ceremony_msg Count all the processed messages for a given ceremony
# TYPE ceremony_msg counter
ceremony_msg{ceremony_id="7",ceremony_type="Keygen",chain="Chain1"} 2
# HELP ceremony_timeout_missing_msg Measure the number of missing messages when reaching timeout
# TYPE ceremony_timeout_missing_msg gauge
ceremony_timeout_missing_msg{ceremony_id="7",ceremony_type="Keygen",chain="Chain1",stage="stage1"} 5
# HELP stage_completing Count the number of stages which are completing succesfully by receiving all the messages
# TYPE stage_completing counter
stage_completing{chain="Chain1",stage="stage1"} 2
stage_completing{chain="Chain1",stage="stage2"} 1
# HELP stage_duration Measure the duration of a stage in ms
# TYPE stage_duration gauge
stage_duration{ceremony_id="7",chain="Chain1",phase="processing",stage="stage1"} 78
stage_duration{ceremony_id="7",chain="Chain1",phase="receiving",stage="stage1"} 780
# HELP stage_failing Count the number of stages which are failing with the cause of the failure attached
# TYPE stage_failing counter
stage_failing{chain="Chain1",reason="NotEnoughMessages",stage="stage3"} 1
"#
).await;
(Note that there should be no leading whitespace, otherwise the strings wouldn't be identical)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, maybe we could use some simple string manupulation before comparing them, .e.g
[
r#"# HELP ceremony_bad_msg Count all the bad msgs processed during a ceremony"#,
r#"# TYPE ceremony_bad_msg counter"#
].join('\n')
or even skipping the # HELP
and # TYPE
lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not ideal, that's why I added the check_deleted_metrics function to be sure everything was correctly deleted, still I wanted to check the final output for a request that's why I kept this messy request_test() as well.
Anyway I followed your first suggestion and used the raw string literals, it is much more clear in this way what we are expecting.
utilities/src/with_std/metrics.rs
Outdated
} | ||
|
||
//First request after the ceremony ended we get all the metrics (same as the request above), and after we delete the ones that have no more reason to exists | ||
request_test("metrics", reqwest::StatusCode::OK, "# HELP ceremony_bad_msg Count all the bad msgs processed during a ceremony\n# TYPE ceremony_bad_msg counter\nceremony_bad_msg{chain=\"Chain1\",reason=\"AA\"} 1\n# HELP ceremony_duration Measure the duration of a ceremony in ms\n# TYPE ceremony_duration gauge\nceremony_duration{ceremony_id=\"7\",ceremony_type=\"Keygen\",chain=\"Chain1\"} 999\n# HELP ceremony_msg Count all the processed messages for a given ceremony\n# TYPE ceremony_msg counter\nceremony_msg{ceremony_id=\"7\",ceremony_type=\"Keygen\",chain=\"Chain1\"} 2\n# HELP ceremony_timeout_missing_msg Measure the number of missing messages when reaching timeout\n# TYPE ceremony_timeout_missing_msg gauge\nceremony_timeout_missing_msg{ceremony_id=\"7\",ceremony_type=\"Keygen\",chain=\"Chain1\",stage=\"stage1\"} 5\n# HELP stage_completing Count the number of stages which are completing succesfully by receiving all the messages\n# TYPE stage_completing counter\nstage_completing{chain=\"Chain1\",stage=\"stage1\"} 2\nstage_completing{chain=\"Chain1\",stage=\"stage2\"} 1\n# HELP stage_duration Measure the duration of a stage in ms\n# TYPE stage_duration gauge\nstage_duration{ceremony_id=\"7\",chain=\"Chain1\",phase=\"processing\",stage=\"stage1\"} 78\nstage_duration{ceremony_id=\"7\",chain=\"Chain1\",phase=\"receiving\",stage=\"stage1\"} 780\n# HELP stage_failing Count the number of stages which are failing with the cause of the failure attached\n# TYPE stage_failing counter\nstage_failing{chain=\"Chain1\",reason=\"NotEnoughMessages\",stage=\"stage3\"} 1\n").await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
succesfully -> successfully
* added macro to create gauges that get deleted * added ceremony_duration metric * fixed gauge to handle convertion to i64 * ceremony missing messages on timeout metric added * added chain label to CEREMONY_PROCESSED_MSG, CEREMONY_DURATION, CEREMONY_TIMEOUT_MISSING_MSG * modified macro to support drop (deletion of labels) on all the types of wrapper, passed as a parameter when calling the macro * added STAGE_DURATION metric * use collect_array * added STAGE_COMPLETING/STAGE_FAILING metrics * avoid saving labels already seen (add to the hashset) if we don't drop the metric * fixed missing imports caused by rebasing * fix double imports * avoid using format! and to_string every time -> use clone() * fixed typo * addressed PR comments * use Option for stage/ceremony _start do conversion inside the constructor add test to check deletion of metrics inside CeremonyMetrics strucs * fixed test * cargo fmt * added manual deletion inside tests and make sure it returns an error -> deletion done before as expected * address review comments
* added macro to create gauges that get deleted * added ceremony_duration metric * fixed gauge to handle convertion to i64 * ceremony missing messages on timeout metric added * added chain label to CEREMONY_PROCESSED_MSG, CEREMONY_DURATION, CEREMONY_TIMEOUT_MISSING_MSG * modified macro to support drop (deletion of labels) on all the types of wrapper, passed as a parameter when calling the macro * added STAGE_DURATION metric * use collect_array * added STAGE_COMPLETING/STAGE_FAILING metrics * avoid saving labels already seen (add to the hashset) if we don't drop the metric * fixed missing imports caused by rebasing * fix double imports * avoid using format! and to_string every time -> use clone() * fixed typo * addressed PR comments * use Option for stage/ceremony _start do conversion inside the constructor add test to check deletion of metrics inside CeremonyMetrics strucs * fixed test * cargo fmt * added manual deletion inside tests and make sure it returns an error -> deletion done before as expected * address review comments
* added macro to create gauges that get deleted * added ceremony_duration metric * fixed gauge to handle convertion to i64 * ceremony missing messages on timeout metric added * added chain label to CEREMONY_PROCESSED_MSG, CEREMONY_DURATION, CEREMONY_TIMEOUT_MISSING_MSG * modified macro to support drop (deletion of labels) on all the types of wrapper, passed as a parameter when calling the macro * added STAGE_DURATION metric * use collect_array * added STAGE_COMPLETING/STAGE_FAILING metrics * avoid saving labels already seen (add to the hashset) if we don't drop the metric * fixed missing imports caused by rebasing * fix double imports * avoid using format! and to_string every time -> use clone() * fixed typo * addressed PR comments * use Option for stage/ceremony _start do conversion inside the constructor add test to check deletion of metrics inside CeremonyMetrics strucs * fixed test * cargo fmt * added manual deletion inside tests and make sure it returns an error -> deletion done before as expected * address review comments
* added macro to create gauges that get deleted * added ceremony_duration metric * fixed gauge to handle convertion to i64 * ceremony missing messages on timeout metric added * added chain label to CEREMONY_PROCESSED_MSG, CEREMONY_DURATION, CEREMONY_TIMEOUT_MISSING_MSG * modified macro to support drop (deletion of labels) on all the types of wrapper, passed as a parameter when calling the macro * added STAGE_DURATION metric * use collect_array * added STAGE_COMPLETING/STAGE_FAILING metrics * avoid saving labels already seen (add to the hashset) if we don't drop the metric * fixed missing imports caused by rebasing * fix double imports * avoid using format! and to_string every time -> use clone() * fixed typo * addressed PR comments * use Option for stage/ceremony _start do conversion inside the constructor add test to check deletion of metrics inside CeremonyMetrics strucs * fixed test * cargo fmt * added manual deletion inside tests and make sure it returns an error -> deletion done before as expected * address review comments
Pull Request
Closes: PRO-xxx
Checklist
Please conduct a thorough self-review before opening the PR.
Summary
This PR adds the metric relative to a Ceremony: