WRF-Hydro calibration with HTCondor #257
Replies: 7 comments
-
A little update: we met with @mnfienen and discussed the option of running the WRF-Hydro calibration with HTCondor on cloud while still benefiting from the existing WRF-Hydro calibration workflow. We have a few questions:
Thanks!
-
I'm adding @jlafonta-usgs, since he's the USGS product owner for all this. Thanks for the update, @arezoorn!
-
Regarding #3, this option is the best for making sure we can do the work in the allocated time. We could finish in 3 to at most 4 weeks after getting an allocation on Cheyenne.
-
@arezoorn re: your original #2 and #3 info, where you were quoting costs: were you referring to 500 basins because that is how many you expect to need to calibrate on a platform beyond Denali? Or is it just the increment for which you derived a cost estimate?
-
It's just a number to use as a reference at this time. Moving that many basins off Denali might help us keep our end-of-December deadline, though I am not certain about that.
-
k. thx for that clarification. it might be good to start to think about how many basins' worth of work needs to be done off of Denali.
-
@ishita9 you should be receiving credentials on our AWS account this morning. Please work with @mnfienen to make sure you're not surprised by any of our rules or our definitely-not-standard account configuration. Re: additional funding, things are looking good. Please do not start any major spends yet, as we're still working out how to actually pay for things. Ishita's experiments on cloud for < $10K are fine; just please keep me updated on that.
-
@arezoorn and @ishita9 have been working on calibrating WRF-Hydro for 1200 separate basins on Denali, along w/the Denali help team. Initial estimates for time to complete calibration are turning out to have been a bit optimistic. The main issue seems to be that only a single instance of the workflow can run on a Denali node, which results in under-utilization of the CPUs on a node and longer run times. There are also job prioritization and per-user limits, but those are known characteristics of the system. We'll keep digging w/the Denali help team in case there are options that haven't been discussed or tried yet. Next week, we'll talk w/the Denali mgmt team about hopefully upping our Quality of Service (QOS) priority for a short period. This is a new process for us, so we're not sure what they'll be able to accommodate.
We're working on a few "Plan B" things that might help us meet the timeline for delivery of the calibrated modeling application, including also leveraging NCAR's Cheyenne platform.
This issue is to record that Arezoo and Ishita will talk with @mnfienen about their workflow, Denali roadblocks, and what it might take to implement their workflow using HTCondor on cloud (including costs).
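To make the node-packing issue concrete, here is a rough back-of-the-envelope sketch in Python. Everything except the 1200-basin count is a placeholder assumption (hours per basin, node count, cores per node, cores per workflow), and the function names are hypothetical; it only illustrates how wall-clock time scales when more than one workflow instance can share a node, and it is not a measurement of Denali, Cheyenne, or any cloud setup.

```python
import math


def estimate_wall_hours(n_basins, hours_per_basin, n_nodes, workflows_per_node=1):
    """Rough wall-clock hours to calibrate n_basins.

    Assumes every basin takes the same time and each node runs a fixed
    number of workflow instances concurrently (both are simplifications).
    """
    slots = n_nodes * workflows_per_node   # basins that can run at the same time
    waves = math.ceil(n_basins / slots)    # sequential batches needed
    return waves * hours_per_basin


def node_utilization(cores_per_workflow, cores_per_node, workflows_per_node=1):
    """Fraction of a node's cores kept busy, capped at 1.0."""
    busy = workflows_per_node * cores_per_workflow
    return min(1.0, busy / cores_per_node)


# Hypothetical inputs: only the 1200 basins comes from this thread.
N_BASINS = 1200
HOURS_PER_BASIN = 48      # assumed
N_NODES = 20              # assumed
CORES_PER_NODE = 40       # assumed
CORES_PER_WORKFLOW = 8    # assumed

single = estimate_wall_hours(N_BASINS, HOURS_PER_BASIN, N_NODES)
packed = estimate_wall_hours(N_BASINS, HOURS_PER_BASIN, N_NODES, workflows_per_node=4)

print("one workflow per node:", single, "hours,",
      node_utilization(CORES_PER_WORKFLOW, CORES_PER_NODE), "utilization")
print("four workflows per node:", packed, "hours,",
      node_utilization(CORES_PER_WORKFLOW, CORES_PER_NODE, 4), "utilization")
```

Swapping in measured per-basin run times and the real node sizes would give a first estimate of how many basins' worth of work needs to move off Denali to hit the deadline.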