Generalize System Status app for use at other sites #92
Comments
Would it be best to remove Ganglia and focus on supporting Grafana?
I would say no, since many sites still utilize Ganglia.
Alan Chalker, Ph.D.
(Replying by email to Mario Squeo, Friday, September 25, 2020 10:26 AM: "Should we remove Ganglia and focus on supporting Grafana only?")
Just remove that.
What did you have in mind here? I think the MVP might not require this.
@ericfranz It's not required. Since there's support for customization on the Dashboard, we could bring that functionality here eventually.
We do have Ganglia, but I know next to nothing about it, though we could probably plug it in. I would vote for making it optional, if possible. What I was mostly after is output from sinfo showing what resources are available at a given time, so that people can decide which cluster and partition to use to get their job running ASAP. E.g., for our simplest cluster, sinfo gives this:

We have 2 main partitions: "lonepeak", with the synonym "lonepeak-shared" for jobs that can share a node, and the "owner" partition, which consists of the liu-lp and fischer-lp nodes, their shared synonyms, and the guest access to the owner nodes (lonepeak-guest) and its shared synonym.

So, in the simplest case, we could report the status of the "lonepeak" and "lonepeak-guest" partitions (alloc, idle, drain, mix = partially occupied) and, potentially, how busy each owner partition is, since guests sometimes target specific owner nodes to reduce their chance of preemption (owner jobs preempt guest jobs).

I hope this helps with the generalization strategy, or possibly with making plug-ins for site-specific setups like ours. Feel free to let me know if I can help with anything.
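To make the sinfo idea above concrete, here is a minimal Python sketch (not code from the app) that summarizes node states per partition. It assumes sinfo's `-h` (no header), `-p` (partition list) and `-o` (format) options with the `%P` (partition, `*` marks the default), `%t` (compact state) and `%D` (node count) fields; the partition list and the names `parse_sinfo` and `partition_summary` are hypothetical.

```python
import subprocess
from collections import defaultdict

# Hypothetical partitions to report on; a real deployment would read these from config.
PARTITIONS = ["lonepeak", "lonepeak-guest"]

# %P = partition name ("*" marks the default), %t = compact node state, %D = node count.
SINFO_FORMAT = "%P %t %D"


def parse_sinfo(text):
    """Turn `sinfo -h -o "%P %t %D"` output into {partition: {state: nodes}}."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        partition, state, nodes = fields
        partition = partition.rstrip("*")
        # Compact states can carry flag suffixes such as "*" (not responding)
        # or "~" (powered off); drop them so "idle~" is counted as "idle".
        state = state.rstrip("*~#!%$@^-")
        # Sum, in case sinfo emits several lines for the same partition/state.
        summary[partition][state] += int(nodes)
    return {p: dict(states) for p, states in summary.items()}


def partition_summary(partitions=PARTITIONS):
    """Run sinfo restricted to the given partitions and summarize node states."""
    output = subprocess.run(
        ["sinfo", "-h", "-p", ",".join(partitions), "-o", SINFO_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sinfo(output)
```

For example, `parse_sinfo("lonepeak alloc 10\nlonepeak idle 4\n")` returns `{"lonepeak": {"alloc": 10, "idle": 4}}`. Keeping the parsing separate from the subprocess call lets it be tested without a Slurm installation.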
@mcuma we made a few quick changes so that the app at least runs at CHPC now. Here is a screenshot:

If you just get the latest code from the master branch and touch tmp/restart.txt, it should run. If you are updating a previously cloned version, you will need to …

That said, as you can see from the screenshot, it just builds these graphs for each cluster, not for the partitions of a specific cluster. It seems like what you are looking for might be best served by a custom widget, once we are able to easily support that type of thing in OnDemand.
Or are we talking about the same graphs as above, but with the ability to make graphs per partition instead of per cluster, or to pick which cluster and partitions to graph?
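If per-partition graphs also need job counts, a similar sketch could tally running and pending jobs from squeue. It assumes `squeue -h -p <list> -o "%P %T"` output (partition and extended job state), and that a pending job eligible for several partitions lists them comma-separated; `parse_squeue` and `job_summary` are hypothetical names, not part of the app.

```python
import subprocess
from collections import Counter


def parse_squeue(text, partitions):
    """Count running and pending jobs per partition from `squeue -h -o "%P %T"` output."""
    counts = {p: Counter() for p in partitions}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        job_partitions, state = fields
        # Assumption: a job eligible for several partitions lists them
        # comma-separated; count it once for each partition being graphed.
        for p in job_partitions.split(","):
            if p in counts:
                counts[p][state] += 1
    # Counter returns 0 for states that never appeared.
    return {p: {"running": c["RUNNING"], "pending": c["PENDING"]} for p, c in counts.items()}


def job_summary(partitions):
    """Run squeue constrained to the given partitions and summarize job states."""
    output = subprocess.run(
        ["squeue", "-h", "-p", ",".join(partitions), "-o", "%P %T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(output, partitions)
```

Jobs in partitions that are not being graphed are ignored, so the same parser works whether or not squeue was constrained with `-p`.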
Great, let me try that and let you know how it goes. Judging from the screenshot, it looks good enough for now. If I get a chance, I may hack at it to graph the two partitions (lonepeak, lonepeak-guest) separately; I should be able to do that from skimming the code.
I can confirm that System Status works on both our test and production servers. Thanks for getting this fixed so quickly.
More sites have expressed interest in the System Status app [1], but the latest version still contains OSC-specific code. Let's generalize the app so it can be dropped in and deployed at other sites.
Ideally we would support all of the adapters that Open OnDemand supports, but focusing on SLURM clusters is a good place to start.
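One way to leave room for other adapters while starting with SLURM is a small registry keyed on the `v2.job.adapter` value of an Open OnDemand cluster config. This is a hypothetical Python sketch, not the app's design: `StatusAdapter`, `SlurmStatus` and `status_adapter_for` are illustrative names, and the adapter only builds the sinfo/squeue command lines rather than running them.

```python
from abc import ABC, abstractmethod


class StatusAdapter(ABC):
    """Hypothetical interface: one implementation per scheduler adapter."""

    @abstractmethod
    def commands(self, partitions):
        """Return the command lines used to collect node and job status."""


class SlurmStatus(StatusAdapter):
    """SLURM implementation; other schedulers would get their own subclass."""

    def commands(self, partitions):
        # Only constrain to partitions when some are configured.
        constraint = ["-p", ",".join(partitions)] if partitions else []
        return {
            "nodes": ["sinfo", "-h", *constraint, "-o", "%P %t %D"],
            "jobs": ["squeue", "-h", *constraint, "-o", "%P %T"],
        }


# Keys are adapter names as they appear under v2.job.adapter in a cluster config.
ADAPTERS = {"slurm": SlurmStatus}


def status_adapter_for(cluster_config):
    """Pick the status adapter for an already-parsed cluster config dict."""
    name = cluster_config.get("v2", {}).get("job", {}).get("adapter")
    if name not in ADAPTERS:
        raise NotImplementedError(f"System Status does not support the {name!r} adapter yet")
    return ADAPTERS[name]()
```

Unsupported adapters fail loudly instead of producing empty graphs, and adding a scheduler later means adding one subclass and one registry entry.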
Todo (WIP):
Support a `custom: systemstatus: partitions: [ serial, parallel ]` cluster configuration. If it exists, create a separate graph for each listed partition (and constrain the sinfo/squeue calls to those partitions).

[1] https://discourse.osc.edu/t/system-status-app/1129
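The partitions option in the Todo item could drive graph selection roughly as follows. This Python sketch assumes the cluster's `custom` block has already been loaded from YAML into a dict; `GraphTarget` and `graph_targets` are hypothetical names, and with no partitions configured it falls back to a single whole-cluster graph, matching the app's current per-cluster behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GraphTarget:
    cluster: str
    partition: Optional[str]  # None means "graph the whole cluster"

    @property
    def title(self):
        return self.cluster if self.partition is None else f"{self.cluster}: {self.partition}"

    def sinfo_args(self):
        # Constrain sinfo to this graph's partition, if it has one.
        base = ["sinfo", "-h", "-o", "%P %t %D"]
        return base if self.partition is None else base + ["-p", self.partition]

    def squeue_args(self):
        base = ["squeue", "-h", "-o", "%P %T"]
        return base if self.partition is None else base + ["-p", self.partition]


def graph_targets(cluster_id, custom):
    """One graph per configured partition, else one graph for the cluster."""
    partitions = ((custom or {}).get("systemstatus") or {}).get("partitions") or []
    if not partitions:
        return [GraphTarget(cluster_id, None)]
    return [GraphTarget(cluster_id, p) for p in partitions]
```

For example, `graph_targets("mycluster", {"systemstatus": {"partitions": ["serial", "parallel"]}})` yields two targets titled "mycluster: serial" and "mycluster: parallel".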