Problem
We have a group of metrics, `cpu-stats`, containing the metrics `average_load_<min>_minute`, where `<min> ::= one|five|fifteen`. Each one shows the load averaged over the given time span and also across all cores.
It sometimes happens that, although the validator is very busy, only a few cores are loaded and the rest are idle.
Because the metric is averaged across cores, it is currently impossible to detect this situation from it.
Proposed Solution
The current metric averages over two dimensions: cores and time.
Here we are interested in the per-core dimension. It would be nice to have a simple metric that measures how the load is distributed among cores, for example the number of busy cores (load > 85%) and the number of idle cores (load < 5%); I am not sure what the best practice is here. A histogram is complicated to track, so one or two numbers would be preferable.
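To make the idea concrete, here is a minimal sketch of how the two numbers could be computed. It is not how the validator collects `cpu-stats` today. It assumes Linux, reads per-core tick counters from `/proc/stat`, and interprets "load" as per-core utilization between two samples. The function names (`parse_proc_stat`, `core_load_counts`, `sample_core_load_counts`) are hypothetical, and the thresholds are just the example values from above.

```python
import time
from typing import Dict, Tuple

BUSY_THRESHOLD = 0.85  # example value from this issue: load > 85%
IDLE_THRESHOLD = 0.05  # example value from this issue: load < 5%


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {core_name: (idle_ticks, total_ticks)} from /proc/stat text."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line
        # and everything that is not a CPU line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal [guest ...].
        # Treat idle + iowait as idle time; guest time is already in user.
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        cores[fields[0]] = (idle, sum(ticks[:8]))
    return cores


def core_load_counts(
    before: Dict[str, Tuple[int, int]],
    after: Dict[str, Tuple[int, int]],
    busy_threshold: float = BUSY_THRESHOLD,
    idle_threshold: float = IDLE_THRESHOLD,
) -> Tuple[int, int]:
    """Return (busy_cores, idle_cores) for the interval between two samples."""
    busy_cores = 0
    idle_cores = 0
    for core, (idle1, total1) in after.items():
        if core not in before:
            continue  # core came online between samples; skip it
        idle0, total0 = before[core]
        d_total = total1 - total0
        if d_total <= 0:
            continue  # no ticks elapsed; utilization is undefined
        load = 1.0 - (idle1 - idle0) / d_total
        if load > busy_threshold:
            busy_cores += 1
        elif load < idle_threshold:
            idle_cores += 1
    return busy_cores, idle_cores


def sample_core_load_counts(interval_s: float = 1.0) -> Tuple[int, int]:
    """Take two /proc/stat samples interval_s apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    return core_load_counts(before, after)
```

With something like this, a report of "3 busy, 60 idle" on a 64-core machine would expose the imbalance that the averaged load hides, using only two integers per sample.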
@yihau