First pass at tracking system metrics #48
Open
Quantumplation wants to merge 6 commits into main from pi/sys-metrics
Changes from 2 commits
Commits (6)
2e56f9f First pass at tracking system metrics (Quantumplation)
32e8d9e chore: clippy fixes (Quantumplation)
d5a0185 Merge branch 'main' into pi/sys-metrics (Quantumplation)
2d86c88 Disable system metrics on windows until the dependency supports windows (Quantumplation)
96abd6e Fix windows build (Quantumplation)
0b257ef Move internals into submodule (Quantumplation)
@@ -0,0 +1,124 @@
use std::time::Duration;

use miette::IntoDiagnostic;
use opentelemetry::{
    metrics::{Counter, Gauge, MeterProvider},
    KeyValue,
};
use opentelemetry_sdk::metrics::SdkMeterProvider;
use sys_metrics::{
    cpu::{CpuTimes, LoadAvg},
    memory::Memory,
};
use tokio::task::JoinHandle;
use tracing::warn;

pub fn track_system_metrics(metrics: SdkMeterProvider) -> JoinHandle<()> {
    tokio::spawn(async move {
        let counters = make_system_counters(metrics);
        let mut delay = Duration::from_secs(1);
        loop {
            // TODO(pi): configurable parameter?
            tokio::time::sleep(delay).await;

            let reading = match get_reading() {
                Ok(sys) => sys,
                Err(err) => {
                    warn!("failed to read system metrics: {}", err);
                    // Back off slightly so the logs aren't as noisy
                    delay *= 2;
                    if delay > Duration::from_secs(30) {
                        delay = Duration::from_secs(30);
                    }
                    continue;
                }
            };
            delay = Duration::from_secs(1);

            record_system_metrics(reading, &counters);
        }
    })
}

struct SystemCounters {
    total_memory: Gauge<u64>,
    free_memory: Gauge<u64>,
    cpu_load: Gauge<f64>,
    user_time: Counter<u64>,
}

#[derive(Debug)]
struct Reading {
    memory: Memory,
    cpu: CpuTimes,
    load: LoadAvg,
}

fn make_system_counters(metrics: SdkMeterProvider) -> SystemCounters {
    // TODO: standardize with the Haskell node somehow?
    let meter = metrics.meter("system");
    let total_memory = meter
        .u64_gauge("memory.limit")
        .with_description("The total system memory, updated once per second")
        .with_unit("MB")
        .build();

    let free_memory = meter
        .u64_gauge("memory.usage")
        .with_description("The free system memory, measured once per second")
        .with_unit("MB")
        .build();

    let cpu_load = meter
        .f64_gauge("cpu.utilization")
        .with_description("the 1m average load, measured once per second")
        .build();

    let user_time = meter
        .u64_counter("cpu.time")
        .with_description("the total cpu time spent in user processes")
        .with_unit("ms")
        .build();

    SystemCounters {
        total_memory,
        free_memory,
        cpu_load,
        user_time,
    }
}

fn get_reading() -> miette::Result<Reading> {
    use sys_metrics::*;
    let memory = memory::get_memory().into_diagnostic()?;
    let cpu = cpu::get_cputimes().into_diagnostic()?;
    let load = cpu::get_loadavg().into_diagnostic()?;

    Ok(Reading { memory, cpu, load })
}

fn record_system_metrics(reading: Reading, counters: &SystemCounters) {
    counters.total_memory.record(reading.memory.total, &[]);
    counters.free_memory.record(reading.memory.free, &[]);
    counters.cpu_load.record(reading.load.one, &[]);
    counters
        .user_time
        .add(reading.cpu.user, &[KeyValue::new("state", "user")]);
    counters
        .user_time
        .add(reading.cpu.system, &[KeyValue::new("state", "system")]);
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn can_read_system_metrics() {
        let reading = get_reading().expect("failed to read system metrics");
        assert!(reading.memory.free > 0, "failed to read free memory");
        assert!(reading.memory.total > 0, "failed to read total memory");
        assert!(reading.cpu.user > 0, "failed to read user cpu time");
        assert!(reading.load.one > 0.0, "failed to read cpu load average");
    }
}
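One point worth checking in `record_system_metrics`: `Counter::add` records an increment, so if `reading.cpu.user` and `reading.cpu.system` are cumulative totals (as the "total cpu time" description suggests, though that is an assumption about `sys_metrics` here), adding the full value every second would overcount. A minimal sketch of one way to turn cumulative readings into per-interval increments follows; `CumulativeTracker` is a hypothetical helper, not part of this PR or of either library.

```rust
/// Hypothetical helper: remembers the previous cumulative reading so that
/// only the increase since the last sample is reported.
#[derive(Debug, Default)]
struct CumulativeTracker {
    last: Option<u64>,
}

impl CumulativeTracker {
    /// Returns how much `current` grew since the previous call.
    /// The first call only establishes a baseline and returns 0; a reading
    /// lower than the previous one (e.g. after a reset) also returns 0.
    fn delta(&mut self, current: u64) -> u64 {
        let delta = match self.last {
            Some(previous) => current.saturating_sub(previous),
            None => 0,
        };
        self.last = Some(current);
        delta
    }
}

fn main() {
    let mut user = CumulativeTracker::default();
    // Simulated cumulative CPU-time readings, one per sampling interval.
    for reading in [1_000u64, 1_250, 1_250, 1_900] {
        println!("cumulative = {reading}, delta = {}", user.delta(reading));
    }
}
```

Under that assumption, the loop would keep one tracker per state (`user`, `system`) and pass `tracker.delta(reading.cpu.user)` to `add` instead of the raw value. Returning 0 on the first sample is a design choice: it drops the time accumulated before the process started rather than reporting it as one large spike.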
Note that this `counter` was merely me toying around with passing a metric counter all the way down the ledger. What I had in mind was for it to become some sort of interface / handle to metrics in general, possibly abstracted behind traits and driven by the tracing setup.
Agreed, I had the same thought. I considered refactoring it as part of this, but figured a lighter touch was better, at least for the draft.
(I can take a pass at something like that in a follow up PR, if you want!)
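For reference, the "handle to metrics abstracted behind traits" idea from this thread could look roughly like the sketch below. Everything in it is hypothetical and for illustration only: `MetricsHandle`, `NoopMetrics`, `InMemoryMetrics`, `apply_block`, and the metric names do not come from this PR, and a real version would presumably have an implementation backed by the OpenTelemetry meter.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical interface the ledger could depend on instead of a concrete counter.
trait MetricsHandle {
    fn increment(&self, name: &'static str, by: u64);
}

/// Discards everything; useful where metrics are disabled.
struct NoopMetrics;

impl MetricsHandle for NoopMetrics {
    fn increment(&self, _name: &'static str, _by: u64) {}
}

/// Keeps counts in memory so tests can assert on what was recorded.
#[derive(Default)]
struct InMemoryMetrics {
    counts: Mutex<HashMap<&'static str, u64>>,
}

impl InMemoryMetrics {
    /// Current value of a counter, or 0 if it was never incremented.
    fn get(&self, name: &str) -> u64 {
        let counts = self.counts.lock().expect("metrics mutex poisoned");
        counts.get(name).copied().unwrap_or(0)
    }
}

impl MetricsHandle for InMemoryMetrics {
    fn increment(&self, name: &'static str, by: u64) {
        let mut counts = self.counts.lock().expect("metrics mutex poisoned");
        *counts.entry(name).or_insert(0) += by;
    }
}

/// Stand-in for ledger code: it only sees the trait, never the backend.
fn apply_block(metrics: &dyn MetricsHandle, transactions: u64) {
    metrics.increment("ledger.blocks_applied", 1);
    metrics.increment("ledger.transactions_applied", transactions);
}

fn main() {
    let metrics = InMemoryMetrics::default();
    apply_block(&metrics, 3);
    apply_block(&metrics, 5);
    println!("blocks applied: {}", metrics.get("ledger.blocks_applied"));
    println!("transactions applied: {}", metrics.get("ledger.transactions_applied"));

    // The same ledger code runs unchanged with metrics switched off.
    apply_block(&NoopMetrics, 1);
}
```

The appeal of this shape is that the ledger no longer needs to know which telemetry backend is configured, and tests can check recorded values without standing up an exporter.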