[host receiver]: cpu reporting metrics #1534

Open
MovieStoreGuy opened this issue Oct 29, 2024 · 6 comments

@MovieStoreGuy

Component(s)

receiver/hostmetrics

Is your feature request related to a problem? Please describe.

Currently the host cpu scraper reports the following optional metrics:

  • system.cpu.logical.count - combined total of threads available
  • system.cpu.physical.count - total number of cores available

On a system with a dual-socket (or greater) motherboard, this can be confusing, since there is currently no way to count the number of CPU sockets available.
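
To make the distinction between sockets, cores, and threads concrete, here is a minimal sketch that derives all three from text in the style of `/proc/cpuinfo` on x86 Linux. The sample data, `CpuTopology`, and `count_topology` are hypothetical names used only for illustration, and this is not necessarily how the hostmetrics scraper computes its values.

```python
from typing import NamedTuple

# Hypothetical excerpt in the style of /proc/cpuinfo on x86 Linux:
# 2 sockets, 2 cores per socket, 2 hardware threads per core.
SAMPLE_CPUINFO = "\n\n".join(
    f"processor\t: {cpu}\nphysical id\t: {socket}\ncore id\t\t: {core}"
    for cpu, (socket, core) in enumerate(
        (s, c) for s in range(2) for c in range(2) for _ in range(2)
    )
)


class CpuTopology(NamedTuple):
    sockets: int  # distinct "physical id" values
    cores: int    # distinct (physical id, core id) pairs
    threads: int  # number of "processor" entries


def count_topology(cpuinfo: str) -> CpuTopology:
    """Count sockets, cores, and threads from cpuinfo-style text."""
    sockets, cores, threads = set(), set(), 0
    for block in cpuinfo.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        threads += 1
        # Assumption: a missing "physical id" means a single socket, and a
        # missing "core id" means each processor entry is its own core.
        socket = fields.get("physical id", "0")
        sockets.add(socket)
        cores.add((socket, fields.get("core id", fields["processor"])))
    return CpuTopology(len(sockets), len(cores), threads)


print(count_topology(SAMPLE_CPUINFO))
# CpuTopology(sockets=2, cores=4, threads=8)
```

Going by the descriptions above, the 8 would surface as system.cpu.logical.count and the 4 as system.cpu.physical.count, while the 2 sockets have no metric at all.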

Describe the solution you'd like

The ideal scenarios are:

  1. Improve the documentation around what each metric is actually capturing in relation to threads and cores
  2. Add a metric to show the number of installed CPUs
  3. Rename the metrics to system.cpu.cores.count, system.cpu.threads.count, and system.cpu.count so it is clear what each represents.

Describe alternatives you've considered

I don't think it is common to run machines that have multiple CPU sockets, which I suspect is why this hasn't come up sooner. The other option would be to include a resource tag with the CPU socket name.
(Trying not to confuse myself with cpu, core, and thread)

Additional context

No response

@MovieStoreGuy MovieStoreGuy added the enhancement New feature or request label Oct 29, 2024

@ChrsMark
Member

Since these are already part of the Semantic Conventions, I guess this should be discussed at the SemConv level first?

/cc @open-telemetry/semconv-system-approvers

@braydonk
Contributor

braydonk commented Oct 30, 2024

I generally agree with system.cpu.cores.count, system.cpu.threads.count, and system.cpu.count. I find that simpler to understand, and it is more directly related to how the system reports this info to you (see lscpu on Linux for example).

Based on these SemConv naming rules, I think the metric names should be pluralized, since cores, threads, and physical CPUs are countable quantities and are not UpDownCounters, i.e. system.cpu.cores and system.cpu.threads. system.cpus is a very weird metric name though, so I'm not sure. I will ask what the semconv maintainers think; I don't have an opinion other than that cpus just looks weird.

In addition, perhaps a system.cpu.sockets metric would be a good idea too. This is a similarly easy piece of info to get from the system.

@mx-psi mx-psi transferred this issue from open-telemetry/opentelemetry-collector-contrib Oct 30, 2024
@braydonk
Contributor

Addendum to what I said above.

The CPU count can change on the same host resource (this last came up when we were discussing whether these metrics should be a resource attribute or a timeseries), so these might actually be instrumented as UpDownCounters after all. I'll summarize the options for either direction below.

Additionally, to get around my cpus weirdness, we could keep the logical vs physical verbiage.

If this is considered a countable entity that falls under pluralization we should do:

  • system.cpu.physical.cores
  • system.cpu.logical.cores
  • system.cpu.threads
  • system.cpu.sockets

Or if these are UpDownCounters/gauges:

  • system.cpu.physical.core.count
  • system.cpu.logical.core.count
  • system.cpu.thread.count
  • system.cpu.socket.count

system.cpu.*.core.count is quite wordy though, so we could also go with what the issue originally suggested, system.cpu.count and system.cpu.core.count. My instinct is that I like the physical and logical verbiage anyway (though we definitely should improve our documentation on what exactly that means), but the names are quite wordy with .count at the end.


I think the end state for these metrics is for them to become attributes on the eventual host entity. However, it would be really nice to get this defined well enough that we could adopt it in the collector before entities make it there; the original issue is correct that there is a usability problem right now, and I'd like to get a fix in without waiting for Entities to standardize. If anyone has pointers on how these metrics can be designed so that it wouldn't be hard to transition them to entity attributes in the future, I'd appreciate them.

@trask
Member

trask commented Oct 30, 2024

Here's some prior art, in case it helps: jvm.cpu.count

@braydonk
Contributor

Thanks @trask, that's good to know. That precedent is probably enough to say that this should be an UpDownCounter timeseries. I was waffling on whether this should be an UpDownCounter or a Gauge, but if a similar metric is already an UpDownCounter then that may be enough. I also figured there is a potential use case where an operator wants to sum a physical CPU count metric across a pool of VMs, for example to estimate cost or to track CPU cores against some platform quota. It's a niche case, but the fact that I could think of a use case at all seemed like enough to call it an UpDownCounter as well.
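
As a rough illustration of that aggregation use case, here is a toy model in plain Python (not the OpenTelemetry SDK). `CoreCountPool`, the host names, and the quota value are all hypothetical; the point is only that additive, UpDownCounter-style values can be summed across hosts and still mean something.

```python
from collections import defaultdict


class CoreCountPool:
    """Toy model of an additive, UpDownCounter-style core count per host."""

    def __init__(self) -> None:
        self._cores = defaultdict(int)

    def add(self, host: str, delta: int) -> None:
        # Deltas may be negative, e.g. when a VM is resized down.
        self._cores[host] += delta

    def total(self) -> int:
        # Summing across hosts is meaningful because the values are additive.
        return sum(self._cores.values())

    def within_quota(self, quota: int) -> bool:
        return self.total() <= quota


pool = CoreCountPool()
pool.add("vm-a", 8)
pool.add("vm-b", 16)
pool.add("vm-c", 16)  # pool total is now 40
pool.add("vm-c", -8)  # vm-c resized from 16 to 8 cores
print(pool.total(), pool.within_quota(32))  # 32 True
```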
