add GPU DYAMOND runs #659

juliasloan25 · 2024-03-01T05:40:21Z

Purpose

closes #658

Only adding a longrun, no shortrun.
This run exceeds the memory available on P100s. Caltech's V100s come with either 16 GB or 32 GB of memory, neither of which is large enough for this job, according to https://www.hpc.caltech.edu/resources. Instead of running on central like the rest of the longruns, this job will run on clima (which has A100s with 80 GB of memory). I've opened an issue to address the allocations seen in this run: #683

view run on buildkite here: https://buildkite.com/clima/climacoupler-longruns/builds/480#_

Content

- add config file based on config/longrun_configs/dyamond_target.yml for longrun
  - set anim: false for gpu-compatibility
- add longrun using GPU
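
For orientation, the new config can be sketched as below. The keys and values shown are the ones named in this PR (`anim` from the list above, the rest from the diff of config/longrun_configs/gpu_dyamond_target.yml); their ordering is an assumption, and any settings inherited from dyamond_target.yml are omitted.

```yaml
# config/longrun_configs/gpu_dyamond_target.yml (excerpt, key order assumed)
anim: false                      # animations disabled for GPU compatibility
monthly_checkpoint: false
run_name: "gpu_dyamond_target"
start_date: "19790301"
t_end: "1days"
```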

I have read and checked the items on the review checklist.

LenkaNovak

Is there a way we could request an H100 for this job only? I don't think the allocation enhancements will be addressed anytime soon. If that's possible, I would suggest commenting out the regular CI job for now, but retaining the longrun one, which we only run once a week on Sundays.

juliasloan25 · 2024-03-05T00:38:06Z

> Is there a way we could request an H100 for this job only? I don't think the allocation enhancements will be addressed anytime soon. If that's possible, I would suggest commenting out the regular CI job for now, but retaining the longrun one, which we only run once a week on Sundays.

ClimaAtmos has a separate buildkite pipeline that runs target GPU simulations on clima (see the runs and the pipeline.yml itself). I can implement the same thing for us

LenkaNovak · 2024-03-05T05:33:46Z

> > Is there a way we could request an H100 for this job only? I don't think the allocation enhancements will be addressed anytime soon. If that's possible, I would suggest commenting out the regular CI job for now, but retaining the longrun one, which we only run once a week on Sundays.
>
> ClimaAtmos has a separate buildkite pipeline that runs target GPU simulations on clima (see the runs and the pipeline.yml itself). I can implement the same thing for us

Does this allow us to specify the hardware for just one run though?

juliasloan25 · 2024-03-05T23:18:30Z

> > > Is there a way we could request an H100 for this job only? I don't think the allocation enhancements will be addressed anytime soon. If that's possible, I would suggest commenting out the regular CI job for now, but retaining the longrun one, which we only run once a week on Sundays.
>
> > ClimaAtmos has a separate buildkite pipeline that runs target GPU simulations on clima (see the runs and the pipeline.yml itself). I can implement the same thing for us
>
> Does this allow us to specify the hardware for just one run though?

No, it would be a separate pipeline where this job would be run. I think this will be useful for GPU scaling runs too
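
A minimal sketch of what such a separate pipeline could look like, using standard Buildkite pipeline fields. The queue name, the `slurm_gpus` agent tag, the launch script, and the timeout are all illustrative assumptions; none of them is taken from this PR or from the ClimaAtmos pipeline.

```yaml
# Hypothetical pipeline file for GPU target runs on clima.
# Queue name, agent tags, and the launch script are assumptions for illustration.
agents:
  queue: clima                  # assumed queue name; applies to every step below

steps:
  - label: "GPU DYAMOND target"
    key: "gpu_dyamond_target"
    # run_longrun.sh is a hypothetical wrapper that launches the run with the new config
    command: "./run_longrun.sh config/longrun_configs/gpu_dyamond_target.yml"
    agents:
      slurm_gpus: 1             # assumed agent tag requesting a single GPU
    timeout_in_minutes: 240     # arbitrary illustrative limit
```

Top-level `agents` acts as the default for all steps in the file, which matches the separate-pipeline approach: the hardware is chosen for the pipeline as a whole, and GPU scaling runs could be added as further steps.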

LenkaNovak

LGTM, thank you, @juliasloan25. Just had a question about the sim length.

LenkaNovak · 2024-03-09T02:30:30Z

config/longrun_configs/gpu_dyamond_target.yml

+monthly_checkpoint: false
+run_name: "gpu_dyamond_target"
+start_date: "19790301"
+t_end: "1days"


If it's a long run, could we run it for longer (e.g. 50 days) or does the simulation crash? 👀

LenkaNovak self-requested a review March 4, 2024 18:33

LenkaNovak reviewed Mar 4, 2024

juliasloan25 force-pushed the js/gpu-dyamond branch from 764c0ae to bc282ea Compare March 5, 2024 00:41

juliasloan25 force-pushed the js/gpu-dyamond branch 2 times, most recently from 4feeb75 to 2b7b7c1 Compare March 6, 2024 02:02

add GPU DYAMOND run

fc75f4b

juliasloan25 force-pushed the js/gpu-dyamond branch from d7ec7c8 to fc75f4b Compare March 8, 2024 23:08

juliasloan25 requested a review from LenkaNovak March 9, 2024 00:35

LenkaNovak approved these changes Mar 9, 2024

juliasloan25 merged commit f784726 into main Mar 9, 2024
9 checks passed

juliasloan25 deleted the js/gpu-dyamond branch March 9, 2024 04:51

juliasloan25 mentioned this pull request Mar 11, 2024

extend GPU DYAMOND run length #685

Closed
