Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: metrics NG #22

Merged
merged 17 commits into from
Jul 8, 2024
Merged

feat: metrics NG #22

merged 17 commits into from
Jul 8, 2024

Conversation

xDarksome
Copy link
Member

Description

Replaces metrics crate with the new one built on top of metrics facade.

Copy-pasted crate doc:

Alternative API facade to the [`metrics`] backend.

Priorities: performance, ergonomics (in that order).

Metrics reporting should be as fast as possible, without substantially
hurting the code ergonomics.

A trivial atomic counter increment MUST NOT allocate stuff on the heap
(looking at you [`metrics::counter`]) and it SHOULD NOT acquire locks or
do [`HashMap`](std::collections::HashMap) lookups unless absolutely
necessary.

Additionally:

  • remove metrics example, examples are now in metrics crate docs
  • update alloc example to use new metrics

How Has This Been Tested?

Some unit tests / doc tests inside metrics crate itself, plus going to test in the relay

Due Diligence

  • Breaking change
  • Requires a documentation update
  • Requires a e2e/integration test update

@xDarksome xDarksome self-assigned this Jul 1, 2024
type Target = M;

fn resolve_labels(&self, (label,): (EnumLabel<NAME, T>,)) -> &M {
let debug_panic = || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this outside the function body?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It's only being used inside this function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. Maybe it's just me, but it feels weird to have these embedded util functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An item should have the smallest possible scope so you don't have a question "where else this thing is being used"

{
// TODO: Switch to `[(T, M); T::VARIANT_COUNT]` once generic parameters are
// allowed to be used in const expressions.
type MetricCollection = Vec<(T, M)>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is focused on micro-optimizations, we should consider stack-allocated SmallVec for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't be on the stack anyway, as it's always static. And it will only allocate once per metric during the initialization
I guess it will remove a single indirection (heap pointer) for happy-path scenarios

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the idea is to reduce indirection and allocations on push/resize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some benchmarking and the difference is around a rounding error, however Vec allows to allocate exactly the amount we need, as we know all the label values beforehand.

};

let idx = label.0.ordinal();
let mut idx = if idx < 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use debug_assert!(idx >= 0)?

Copy link
Member Author

@xDarksome xDarksome Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I'm also doing idx = 0 in the branch, so the release build won't crash

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be replaced by idx.max(0).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check that the index is in [0, vec.len()) range

idx as usize
};

if idx >= self.collection.len() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd use debug_assert!() instead of custom panic macro/function.

}

pub struct StringCollection<T, M: 'static> {
inner: ArcSwap<HashMap<T, &'static M>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you're migrating the relay, can you guesstimate the highest cardinality that such StringCollection can have? If it's pretty low, maybe it'd make sense to replace HashMap with SmallVec.

Copy link
Member Author

@xDarksome xDarksome Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would depend on the application and whether people ab(using) this instead of using proper EnumLabels.
But for the relay the only "valid" use-cases are HTTP routes and status codes, which makes the cardinality <10

Probably makes sense to implement both versions, but let's try SmallVec

Copy link
Collaborator

@heilhead heilhead Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for the relay the only "valid" use-cases are HTTP routes and status codes, which makes the cardinality <10

If it's route_name * response_code it'd likely be more than 10. Not sure if it's worth it. Can you double check the existing HTTP route metrics to confirm cardinality?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different labels use separate collections, so it won't multiply them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely have <10 http routes in the relay

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark results: Binary search on Vec/SmallVec is 2x faster for 10 elements, and the same as HashMap lookup for 20 elements.
After that the performance falls significantly, so I suggest we leave HashMap here, as StringCollection is not the recommended way to implement labels anyway and it should support all other edge-cases.

let mut inner_clone: HashMap<_, _> = (**inner).clone();

let name = const { resolve_label_name::<NAME>() };
let label_ = Label::new(name, label.to_owned().to_string());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the _ suffix?

Copy link
Member Author

@xDarksome xDarksome Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have the brain power to come up with a different name, basically there are two labels in scope and both of them are being used later. It's similar to the math notation when you have variable a and a'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks weird...

Copy link
Collaborator

@heilhead heilhead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, though I left a few suggestions.

The main thing missing is tests. Can we also cover the Box::leak stuff with some allocation tests to confirm there's no unsound or memory leak scenario?

@xDarksome
Copy link
Member Author

xDarksome commented Jul 3, 2024

@heilhead converting Duration to milliseconds everywhere is PITA, as metrics already supports Duration directly, but converts it to f64 seconds.

As we are going to update the graphs anyway, I'm removing the conversion here and from the relay, so Duration is always being represented in seconds.

Copy link
Collaborator

@heilhead heilhead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@xDarksome xDarksome merged commit af0464e into main Jul 8, 2024
6 checks passed
@xDarksome xDarksome deleted the feat/metrics-ng branch July 8, 2024 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants