WRF-Hydro calibration with HTCondor #257

rviger-usgs · 2022-11-14T22:58:29Z

@arezoorn and @ishita9, along with the Denali help team, have been working on calibrating WRF-Hydro for 1200 separate basins on Denali. Initial estimates of the time to complete calibration have turned out to be a bit optimistic. The main issue seems to be that only a single instance of the workflow can run on a Denali node, which leaves CPUs on each node under-utilized and lengthens run times. There are also job prioritization and per-user limits, but those are known characteristics of the system. We'll keep digging w/the Denali help team in case there are options that haven't been discussed/tried yet. Next week, we'll talk w/the Denali mgmt team about temporarily raising our Quality of Service (QOS) priority. This is a new process for us, so we're not sure what they'll be able to accommodate.

We're working on a few "Plan B" options that might help us meet the timeline for delivering the calibrated modeling application, including leveraging NCAR's Cheyenne platform.

This issue is to record that Arezoo and Ishita will talk with @mnfienen about their workflow, Denali roadblocks, and what it might take to implement their workflow using HTCondor on cloud (including costs).

arezoorn · 2022-11-17T19:08:05Z

A little update: we met with @mnfienen and discussed running WRF-Hydro calibration with HTCondor on the cloud while building on the existing WRF-Hydro calibration workflow. We have a few questions:

1. Ishita would need access to the cloud to see whether she can actually modify the workflow to fit it. How easy or difficult is it to get access, and how long does it take? Who should we get in touch with to give her access to the USGS cloud resources (if I can call it that!)?
2. The cloud cost comes out to roughly $70K to $100K to calibrate 500 basins. Not sure if that is a number we want to consider for digging into the cloud solution.
3. Cheyenne is a little cheaper, more like $50K for 500 basins. Roland is looking into the option of purchasing hours on Cheyenne from CISL.
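To compare the platforms on a common footing, the cloud and Cheyenne totals quoted above can be normalized to a per-basin cost. This is a minimal sketch that assumes cost scales linearly with the number of basins, which the estimates themselves do not claim; the 500-basin count is only a reference increment, not the number of basins that would actually move off Denali. The function name `cost_per_basin` is hypothetical and used only for illustration.

```python
# Per-basin cost implied by the rough totals quoted in this thread (USD).
# Assumption: cost scales linearly with basin count (not stated in the estimates).
REFERENCE_BASINS = 500

estimates = {
    "cloud (low)": 70_000,
    "cloud (high)": 100_000,
    "Cheyenne": 50_000,
}


def cost_per_basin(total_usd: float, n_basins: int = REFERENCE_BASINS) -> float:
    """Spread a total estimate evenly across n_basins (hypothetical helper)."""
    return total_usd / n_basins


for platform, total in estimates.items():
    print(f"{platform}: ${cost_per_basin(total):,.0f} per basin")
```

Under that linear assumption, the cloud works out to roughly $140 to $200 per basin and Cheyenne to roughly $100 per basin.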

Thanks!
Arezoo

rviger-usgs · 2022-11-17T19:14:52Z

1. @ishita9 I'll initiate getting your credentials added to our AWS account. Don't think it'll be a big effort. We'll need to make sure we also get you the label for the project so costs are appropriately covered. Please watch your USGS email account for next steps (I'm guessing that is what the group that controls our account will use as a default communication channel).
2. We'll hold off on deciding whether to pursue this for a little bit; hoping Ishita's initial explorations firm up the budget. @mnfienen also mentioned that adapting the workflow for AWS (mostly to leverage HTCondor) might also help optimize execution on Denali, which would be nice!
3. Okay, I'll presume that, beyond getting increased QOS priority on Denali, this is your preferred option/recommendation (and yes, I'm starting to look at what might need to happen re: funding for this option).

I'm adding @jlafonta-usgs, since he's the USGS product owner for all this.

Thanks for the update, @arezoorn!

arezoorn · 2022-11-17T19:47:09Z

Regarding #3, this option is the best in terms of making sure we can do the work in the allotted time. We could finish it within 3, or at most 4, weeks after getting an allocation on Cheyenne.

rviger-usgs · 2022-11-17T20:00:22Z

@arezoorn re: your original #2 and #3 info, where you were quoting costs, were you referring to 500 basins b/c this is how many you expect to need to calibrate on a platform beyond Denali? Or is it just the increment for which you derived a cost estimate?

arezoorn · 2022-11-17T20:05:20Z

Just a number to use as a reference at this time; it might help us keep our end-of-December deadline, though I am not certain about that.

rviger-usgs · 2022-11-17T20:07:15Z

k. thx for that clarification. it might be good to start to think about how many basins' worth of work needs to be done off of Denali.

rviger-usgs · 2022-11-18T16:39:21Z

@ishita9 you should be receiving credentials on our AWS account this morning. Please work with @mnfienen to make sure you're not surprised by any of our rules or our definitely-not-standard account configuration.

re: additional funding, things are looking good. Please do not start any major spends yet while we figure out how to actually pay for things. Ishita's experiments on cloud for < $10K are fine, just please keep me updated on that.
