scripts/lbnl_hw.nhc: nhc_hw_gather_data is slow at parsing /proc/cpuinfo #149

Open
sdiak opened this issue Apr 5, 2024 · 3 comments

@sdiak

sdiak commented Apr 5, 2024

Hello,

On a cluster that I installed for a client, the function nhc_hw_gather_data() spends a lot of time parsing /proc/cpuinfo.

[root@<redacted> ~]# time nhc

real    0m9.824s
user    0m1.532s
sys     0m8.232s

With the proposed patch, the wall-clock time drops by a factor of about 5.8 (9.824s to 1.692s), and sys time falls from 8.232s to 0.105s:

[root@<redacted>~]# time nhc

real    0m1.692s
user    0m1.563s
sys     0m0.105s

The patch works by caching the content of /proc/cpuinfo in a local variable.
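For readers who do not want to open the attachment, here is a minimal bash sketch of the caching idea. It is not the code from the patch: the function name count_processors_cached and its optional file argument are hypothetical, and counting "processor" lines is only a stand-in for whatever nhc_hw_gather_data() actually extracts. The point is simply that the file is read once into a local variable and the loop then parses that in-memory copy.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration only; NOT the code from the attached patch.
# Counts the "processor" entries in a cpuinfo-style file, reading the file once.
count_processors_cached() {
    # The optional path argument is an assumption, added so the function
    # can be exercised against a fixture file.
    local cpuinfo_file="${1:-/proc/cpuinfo}"
    local cpuinfo line count=0

    # Read the whole file a single time into a local variable (the "cache").
    cpuinfo="$(< "$cpuinfo_file")"

    # Parse the cached copy instead of reading the file line by line.
    while IFS= read -r line; do
        if [[ "$line" == processor* ]]; then
            count=$((count + 1))
        fi
    done <<< "$cpuinfo"

    echo "$count"
}
```

Calling count_processors_cached with no argument reads /proc/cpuinfo; passing a path makes it easy to try the function on a saved copy of the file from an affected node.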

System information:

  • OS: Red Hat Enterprise Linux release 8.8 (Ootpa)
  • Kernel: 4.18.0-477.27.1.el8_8.x86_64
  • CPU: dual-socket AMD EPYC 9654 96-Core Processor

0001-Makes-proc-cpuinfo-parsing-faster-by-caching-the-con.patch

@OleHolmNielsen

We have similar nodes, and NHC is somewhat faster:

time nhc

real 0m3.018s
user 0m0.253s
sys 0m2.084s

OS: Rocky Linux 8.9 (Green Obsidian)
Kernel: 4.18.0-513.18.1.el8_9.x86_64
CPU: Dual-socket AMD EPYC 9474F 48-Core Processor

@sdiak
Author

sdiak commented Apr 5, 2024

This is running on an idle node where the CPUs are set to powersave.

@sdiak
Author

sdiak commented Apr 5, 2024

This is a problem at this particular site because the CPU throttling that occurs when a power supply is lost causes Slurm to drain the node:

  • without the patch: nhc times out and Slurm drains the node
  • with the patch: time nhc reports 19.992s and the node stays in production with degraded performance
