
[Monitoring] Report pressure stall information in metrics (Linux) #41054

Open
andrewkroh opened this issue Sep 30, 2024 · 1 comment
Labels
enhancement needs_team Indicates that the issue/PR needs a Team:* label

Comments

@andrewkroh
Member

Describe the enhancement:

Report Linux Pressure Stall Information (PSI) metrics in the beat metrics. Include PSI info when

Describe a specific use case for the enhancement or feature:

When reviewing diagnostic snapshots, this information would be used to detect whether CPU, memory, or I/O pressure could be causing processing to stall. If stalls occur frequently (e.g. a high percentage in avg300), that is a strong signal that some resource on the host is overloaded, which might be a cause of the reported issue.
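A minimal Go sketch of the avg300 check described above, assuming a hypothetical threshold of 10% (the issue does not define what counts as a "high percentage"). It extracts avg300 from a single `some` or `full` line and compares it against the cut-off. `avg300`, `underPressure`, and `avg300Threshold` are illustrative names, not existing Beats APIs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// avg300Threshold is a hypothetical cut-off: the percentage of wall time
// stalled, averaged over 300 seconds, above which pressure counts as "high".
const avg300Threshold = 10.0

// avg300 extracts the avg300 value from one PSI line, for example
// "some avg10=0.09 avg60=0.23 avg300=0.34 total=29210385599".
func avg300(line string) (float64, error) {
	for _, field := range strings.Fields(line) {
		if strings.HasPrefix(field, "avg300=") {
			return strconv.ParseFloat(strings.TrimPrefix(field, "avg300="), 64)
		}
	}
	return 0, fmt.Errorf("no avg300 field in %q", line)
}

// underPressure reports whether the line's avg300 exceeds avg300Threshold.
func underPressure(line string) (bool, error) {
	v, err := avg300(line)
	if err != nil {
		return false, err
	}
	return v > avg300Threshold, nil
}

func main() {
	line := "some avg10=0.09 avg60=0.23 avg300=0.34 total=29210385599"
	high, err := underPressure(line)
	if err != nil {
		panic(err)
	}
	fmt.Println("sustained pressure:", high) // prints "sustained pressure: false"
}
```

Checking avg300 rather than avg10 favors sustained pressure over short spikes, which fits the goal of spotting a host that is chronically overloaded.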


Here are some examples of reading PSI info from the CLI.

System Level PSI

cat /proc/pressure/{cpu,io,irq,memory}
some avg10=0.09 avg60=0.23 avg300=0.34 total=29210385599
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
some avg10=0.00 avg60=0.00 avg300=0.00 total=2022562678
full avg10=0.00 avg60=0.00 avg300=0.00 total=1843728046
full avg10=0.00 avg60=0.00 avg300=0.00 total=16487083525
some avg10=0.00 avg60=0.00 avg300=0.00 total=2107930
full avg10=0.00 avg60=0.00 avg300=0.00 total=1986722
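All four system-level files share one line format, so a single parser can cover them. The sketch below is a hypothetical Go helper (`parsePSI` and `PSILine` are illustrative names, not Beats code). It assumes the kernel's documented format, where the avg fields are percentages and `total` is cumulative stall time in microseconds. As the output above shows, `irq` reports only a `full` line, so the returned map may lack a `some` key; files that cannot be read (PSI disabled, or `irq` unavailable for the kernel version or configuration) are skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// PSILine holds one "some" or "full" line of a PSI file.
// Avg* values are percentages; Total is cumulative stall time in microseconds.
type PSILine struct {
	Avg10, Avg60, Avg300 float64
	Total                uint64
}

// parsePSI parses the contents of a PSI file into a map keyed by "some"/"full".
// Unknown keys are ignored; a field without "=" is treated as an error.
func parsePSI(content string) (map[string]PSILine, error) {
	out := make(map[string]PSILine)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		var l PSILine
		for _, f := range fields[1:] {
			key, val, ok := strings.Cut(f, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", f)
			}
			var err error
			switch key {
			case "avg10":
				l.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				l.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				l.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				l.Total, err = strconv.ParseUint(val, 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", f, err)
			}
		}
		out[fields[0]] = l
	}
	return out, sc.Err()
}

func main() {
	// Read each system-wide PSI file, skipping any that cannot be read.
	for _, res := range []string{"cpu", "io", "irq", "memory"} {
		data, err := os.ReadFile("/proc/pressure/" + res)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", res, err)
			continue
		}
		psi, err := parsePSI(string(data))
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", res, err)
			continue
		}
		fmt.Printf("%s: %+v\n", res, psi)
	}
}
```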

Cgroup V2 PSI

If the host is using cgroup v2 and the process is a member of a cgroup, then we can get PSI information scoped to the tasks in the group.

cat /etc/mtab | grep cgroup
cgroup /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
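Detecting cgroup v2 amounts to finding a mount whose filesystem type is `cgroup2`, as in the mtab line above. A minimal Go sketch, assuming the standard mtab field order (device, mount point, type, options, ...); `cgroup2Mount` is a hypothetical helper. It does not decode the octal escapes (such as `\040`) that mtab uses for spaces in mount points, and on hybrid hosts it returns the first cgroup2 mount it finds.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cgroup2Mount scans mtab-formatted text (device, mount point, fs type, ...)
// and returns the mount point of the first cgroup2 filesystem.
func cgroup2Mount(mtab string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(mtab))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[2] == "cgroup2" {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/etc/mtab")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if mnt, ok := cgroup2Mount(string(data)); ok {
		fmt.Println("cgroup v2 mounted at", mnt)
	} else {
		fmt.Println("no cgroup v2 mount found")
	}
}
```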

cat /proc/self/cgroup
0::/
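On cgroup v2 the process's membership appears as a `0::<path>` entry, and that group's PSI files sit at `<mount><path>/<resource>.pressure`. A Go sketch, assuming the unified hierarchy is mounted at `/sys/fs/cgroup` as in the mtab output above; `cgroupV2Path` and `pressureFiles` are hypothetical names. When the path is `/`, as in the example, the files resolve to the mount root.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupV2Path returns the unified-hierarchy path from /proc/<pid>/cgroup
// content; the cgroup v2 entry has the form "0::<path>".
func cgroupV2Path(procCgroup string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(procCgroup))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), true
		}
	}
	return "", false
}

// pressureFiles builds the per-cgroup PSI file paths under a cgroup2 mount.
func pressureFiles(mount, cgPath string) []string {
	var files []string
	for _, res := range []string{"cpu", "io", "irq", "memory"} {
		files = append(files, filepath.Join(mount, cgPath, res+".pressure"))
	}
	return files
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	cgPath, ok := cgroupV2Path(string(data))
	if !ok {
		fmt.Fprintln(os.Stderr, "process is not in a cgroup v2 hierarchy")
		os.Exit(1)
	}
	// Assumes the unified hierarchy is mounted at /sys/fs/cgroup.
	for _, f := range pressureFiles("/sys/fs/cgroup", cgPath) {
		content, err := os.ReadFile(f)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", f, err)
			continue
		}
		fmt.Printf("%s:\n%s", f, content)
	}
}
```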

cat /sys/fs/cgroup/{cpu,io,irq,memory}.pressure 
some avg10=0.00 avg60=0.00 avg300=0.00 total=41743102
full avg10=0.00 avg60=0.00 avg300=0.00 total=40889871
some avg10=0.01 avg60=0.09 avg300=0.03 total=63542837
full avg10=0.01 avg60=0.09 avg300=0.03 total=63432991
full avg10=0.00 avg60=0.00 avg300=0.00 total=64963754
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
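The `total` column is a cumulative stall counter in microseconds, so a metrics collector could also derive the stalled share of its own reporting interval from two consecutive samples, rather than relying only on the kernel's fixed 10/60/300-second averages. A hedged Go sketch; `stallPercent` is a hypothetical helper and the sample values in `main` are made up.

```go
package main

import (
	"fmt"
	"time"
)

// stallPercent converts two samples of a PSI "total" counter (cumulative
// stall time in microseconds) into the percentage of the sampling interval
// spent stalled. It returns 0 for a non-positive interval or if the counter
// went backwards (for example after a cgroup is recreated).
func stallPercent(prevTotal, curTotal uint64, interval time.Duration) float64 {
	us := interval.Microseconds()
	if us <= 0 || curTotal < prevTotal {
		return 0
	}
	return 100 * float64(curTotal-prevTotal) / float64(us)
}

func main() {
	// Hypothetical samples taken 10s apart: 1s of stall in a 10s window.
	fmt.Printf("%.1f%%\n", stallPercent(63542837, 64542837, 10*time.Second)) // prints "10.0%"
}
```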
@botelastic botelastic bot added the needs_team label Sep 30, 2024
@botelastic

botelastic bot commented Sep 30, 2024

This issue doesn't have a Team:<team> label.
