First pass at tracking system metrics #48
base: main
Conversation
This spawns a background thread to track common system metrics; it currently only tracks a few, but we can add more as needed.

I evaluated a few different metrics crates:

- [opentelemetry-system-metrics](https://crates.io/crates/opentelemetry-system-metrics)
- [sys_metrics](https://crates.io/crates/sys_metrics)
- [heim](https://github.com/heim-rs/heim?tab=readme-ov-file)

Of these, sys_metrics doesn't support Windows, but is the most actively maintained.

As a follow-up, it might be useful to track Tokio runtime metrics as well: https://docs.rs/tokio/latest/tokio/runtime/struct.RuntimeMetrics.html

I should also note that this follows the naming conventions specified here: https://opentelemetry.io/docs/specs/semconv/general/metrics/#general-metric-semantic-conventions
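For concreteness, here's a minimal sketch of the shape this takes (not the actual code in this PR): a background thread that samples a few values on an interval and reports them under OTel-style names. `record_gauge`, `read_cpu_utilization`, and `read_memory_usage` are placeholders for the metrics handle from the tracing/OTel setup and for whichever system-metrics crate ends up being used.

```rust
use std::{thread, time::Duration};

// Spawn a background thread that periodically samples system metrics.
// Metric names follow the OTel general metric semantic conventions.
fn spawn_system_metrics_thread() -> thread::JoinHandle<()> {
    thread::spawn(|| loop {
        record_gauge("system.cpu.utilization", read_cpu_utilization());
        record_gauge("process.memory.usage", read_memory_usage());
        thread::sleep(Duration::from_secs(5));
    })
}

// Placeholder: forward to whatever metrics backend the tracing/OTel setup exposes.
fn record_gauge(name: &'static str, value: f64) {
    let _ = (name, value);
}

// Placeholders: these readings would come from the chosen system-metrics crate.
fn read_cpu_utilization() -> f64 {
    0.0
}

fn read_memory_usage() -> f64 {
    0.0
}
```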
Do we want to track system-level metrics from within the process? Would it not be simpler and more flexible to let the users reap those from the underlying system, using whatever makes sense for them?
I have two main thoughts here, though I'll defer to @KtorZ as he's the one that requested this.
That definitely also crossed my mind, and was part of the question when we talked about it on Discord. Yet, I also agree with @Quantumplation that it's a good complement, because we can make metrics available that are runtime-specific rather than process-specific. I still expect anyone doing ops to have their preferred ways of monitoring processes. But we can at least provide some simple metrics (if only for development) as well as some more fine-grained ones that aren't immediately observable from a process.
@@ -39,7 +40,7 @@ async fn main() -> miette::Result<()> {
     let args = Cli::parse();
 
     let result = match args.command {
-        Command::Daemon(args) => cmd::daemon::run(args, counter).await,
+        Command::Daemon(args) => cmd::daemon::run(args, counter, metrics.clone()).await,
Note that this counter was merely me toying around with passing a metric counter all the way down to the ledger. What I had in mind was for this to become some sort of interface / handle to metrics in general, possibly abstracted behind traits and driven by the tracing setup.
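To make that concrete, here's a rough sketch of what such a handle could look like, assuming a trait-object design; all names here are hypothetical and not from this PR.

```rust
use std::sync::Arc;

/// Hypothetical metrics interface; the concrete implementation would be
/// constructed wherever the tracing/OTel pipeline is set up.
pub trait Metrics: Send + Sync {
    fn incr_counter(&self, name: &'static str, by: u64);
    fn record_gauge(&self, name: &'static str, value: f64);
}

/// Cheap-to-clone handle that the daemon, ledger, etc. can hold
/// instead of a bare counter.
#[derive(Clone)]
pub struct MetricsHandle(Arc<dyn Metrics>);

impl MetricsHandle {
    pub fn new(inner: Arc<dyn Metrics>) -> Self {
        Self(inner)
    }

    pub fn incr_counter(&self, name: &'static str, by: u64) {
        self.0.incr_counter(name, by);
    }

    pub fn record_gauge(&self, name: &'static str, value: f64) {
        self.0.record_gauge(name, value);
    }
}
```

Call sites like `cmd::daemon::run` could then take this handle in place of the counter that's currently being threaded through.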
Agreed, I had the same thought. I thought about trying to refactor it as part of this PR, but figured a lighter touch, at least for the draft, was better.
(I can take a pass at something like that in a follow-up PR, if you want!)
This should make conditional compilation easier, since we can enable/disable the module as a whole.
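A hedged sketch of what that gating could look like; the `system-metrics` feature name and module layout are hypothetical, not from this PR.

```rust
// Cargo.toml (hypothetical feature):
//
//   [features]
//   system-metrics = []

// main.rs / lib.rs: compile the module only when the feature is enabled.
#[cfg(feature = "system-metrics")]
mod system_metrics;

#[cfg(feature = "system-metrics")]
pub fn start_system_metrics() {
    // e.g. spawn the background sampling thread defined in the module.
    system_metrics::spawn();
}

/// No-op stub when the feature is disabled, so call sites don't need
/// their own `cfg` guards.
#[cfg(not(feature = "system-metrics"))]
pub fn start_system_metrics() {}
```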
Opening this as a draft PR, looking for feedback on the structure / organization before I write a few more tests and add a few more metrics.