At Harvard IQSS, we are working on a metrics widget for Open OnDemand.
After evaluating different data sources for the metrics, we have settled on getting them directly from Slurm.
As a proof of concept, we created a script that uses the sacct command to extract the relevant historical data from Slurm and build the required metrics for our new dashboard widget. The historical data will cover three periods: the last 30 days, the last 7 days, and the last 24 hours.
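To make the sacct step more concrete, here is a minimal sketch of how the command line for each reporting period could be assembled. This is not the actual script: the `PERIODS` and `FIELDS` names, the `build_sacct_command` helper, and the choice of fields are assumptions made for illustration, and the shortest period is assumed to be 24 hours. The sacct options used (`--allocations`, `--noheader`, `--parsable2`, `--starttime`, `--endtime`, `--format`, `--user`) are standard ones.

```python
# Hypothetical helper: builds the sacct argument list for one reporting period.
# The period keys and the field selection are illustrative assumptions.
PERIODS = {
    "30d": "now-30days",
    "7d": "now-7days",
    "24h": "now-24hours",
}

FIELDS = [
    "JobID", "State", "Elapsed", "Timelimit", "AllocCPUS",
    "TotalCPU", "ReqMem", "Submit", "Start", "AllocTRES",
]


def build_sacct_command(period, user=None):
    """Return the sacct argv for the given period key ('30d', '7d' or '24h')."""
    cmd = [
        "sacct",
        "--allocations",   # one record per job, no individual steps
        "--noheader",      # omit the header row
        "--parsable2",     # '|' delimited output without a trailing delimiter
        "--starttime", PERIODS[period],
        "--endtime", "now",
        "--format", ",".join(FIELDS),
    ]
    if user:
        cmd += ["--user", user]  # restrict the query to a single user
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a host where sacct is available.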
A summary of our metrics requirements:
Number of CPU/GPU jobs by State: Completed, Timeout, Canceled, Out of Memory, Failed
CPU Metrics: Average Used, Average Efficiency, Average Allocated, Total Walltime
GPU Metrics: Average Allocated, Total Usage
Memory Metrics: Average Used, Average Efficiency, Average Allocated, Total Used
Time Metrics: Average Used, Average Efficiency, Average Allocated, Average Waiting Time
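As a rough illustration of how two of these metrics could be derived, the sketch below counts jobs by state and computes an overall CPU efficiency from sacct output. It assumes `--parsable2 --noheader` output with exactly the fields `State,Elapsed,AllocCPUS,TotalCPU`, and it assumes efficiency is defined as total CPU time divided by elapsed time multiplied by allocated CPUs. The function names and the sample rows in the usage note are made up for the example and do not come from the prototype.

```python
from collections import Counter


def parse_duration(text):
    """Convert a Slurm duration such as '1-02:03:04' or '05:06.789' to seconds."""
    days, _, clock = text.rpartition("-")
    parts = [float(p) for p in clock.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with zeros
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def summarize(sacct_output):
    """Return (jobs per state, overall CPU efficiency) for 'State|Elapsed|AllocCPUS|TotalCPU' rows."""
    states = Counter()
    used = 0.0        # CPU seconds actually consumed
    allocated = 0.0   # CPU seconds reserved (elapsed time * allocated CPUs)
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        state, elapsed, alloc_cpus, total_cpu = line.split("|")
        # "CANCELLED by 1234" is reported as a single state; keep the first word only
        states[state.split()[0]] += 1
        allocated += parse_duration(elapsed) * int(alloc_cpus)
        used += parse_duration(total_cpu)
    efficiency = used / allocated if allocated else 0.0
    return states, efficiency
```

For example, `summarize("COMPLETED|01:00:00|4|02:00:00\nCANCELLED by 1001|00:30:00|2|00:30:00\n")` counts one COMPLETED and one CANCELLED job. Memory and time metrics could follow the same pattern with additional fields.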
As an OnDemand prototype, we have created a Slurm adapter extension, applied via a monkey patch, that executes the relevant sacct command, as well as a widget that calculates the metrics and displays the results.
We will open a PR with the Slurm adapter changes needed to support this sacct command, as we think it would be useful for other institutions.