Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix GPU bug from sif model #719

Merged
merged 4 commits into from
Aug 14, 2024
Merged

fix GPU bug from sif model #719

merged 4 commits into from
Aug 14, 2024

Conversation

AlexisRenchon
Copy link
Member

@AlexisRenchon AlexisRenchon commented Aug 9, 2024

Fix GPU bug from sif model

Closes #719

Closes #717

@AlexisRenchon AlexisRenchon added the bug Something isn't working label Aug 9, 2024
@AlexisRenchon AlexisRenchon requested a review from Sbozzolo August 9, 2024 22:28
@AlexisRenchon
Copy link
Member Author

not sure why buildkite is failing now

ERROR: LoadError: Cannot locate artifact 'era5_land_forcing_data2021' for x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-julia_version+1.10.3 in '/central/scratch/esm/slurm-buildkite/climaland-ci/4745/climaland-ci/Artifacts.toml'

it passed just before, I just squashed.

@AlexisRenchon
Copy link
Member Author

worked on GPU:

[arenchon@hpc-22-32 ClimaLand.jl]$ julia -i --project=.buildkite experiments/benchmarks/land.jl
[ Info: Run: Global RichardsModel
[ Info: Resolution: (101, 15)
[ Info: Timestep: 900.0 s
[ Info: Duration: 21600.0 s
[ Info: Starting profiling
[ Info: Num samples: 28
[ Info: Average time: 14.7 s
[ Info: Max time: 14.8 s
[ Info: Min time: 14.6 s
[ Info: Standard deviation time: 0.00361 s
[ Info: Done profiling
[ Info: Saved compute flame to land_benchmark_gpu/flame_gpu.html
[ Info: Saved allocation flame to land_benchmark_gpu/alloc_flame_gpu.html

@AlexisRenchon
Copy link
Member Author

note this PR contains the change in #717 so we can close it after this is merged

@Sbozzolo Sbozzolo added the Run benchmarks Add this label to run benchmarks on clima label Aug 13, 2024
AlexisRenchon and others added 2 commits August 13, 2024 13:02
Since LAI changes infrequently, use a dataset with only weekly data so that we read data less frequently.
This new data set is created in this PR: https://github.com/CliMA/ClimaArtifacts/pull/5/files and on the Caltech cluster.

Co-authored-by: AlexisRenchon <a.renchon@gmail.com>
Co-authored-by: SBozzolo <gbozzola@caltech.edu>
Co-authored-by: kmdeck <kdeck@caltech.edu>
The implementation of the SIF model in the Canopy module broke GPU compatibility. This commit fixes this bug.

Co-authored-by: AlexisRenchon <a.renchon@gmail.com>
Co-authored-by: SBozzolo <gbozzola@caltech.edu>
Co-authored-by: kmdeck <kdeck@caltech.edu>
@AlexisRenchon
Copy link
Member Author

@kmdeck

Hello Kat!
Can you look at this PR? More specifically, the error related to the albedo Artifact...

This PR was passing CI at some point, then there was a problem with not finding the albedo file in Caltech box so we changed it, but now one of the test fail

@kmdeck
Copy link
Member

kmdeck commented Aug 14, 2024

@kmdeck

Hello Kat! Can you look at this PR? More specifically, the error related to the albedo Artifact...

This PR was passing CI at some point, then there was a problem with not finding the albedo file in Caltech box so we changed it, but now one of the test fail

I am taking a look now. I dont understand what happened with the file, but there are still NaNs present. maybe that is why it isnt passing? https://github.com/CliMA/ClimaLand.jl/actions/runs/10378241417/job/28734087789?pr=719#step:6:581

@AlexisRenchon
Copy link
Member Author

@kmdeck
Hello Kat! Can you look at this PR? More specifically, the error related to the albedo Artifact...
This PR was passing CI at some point, then there was a problem with not finding the albedo file in Caltech box so we changed it, but now one of the test fail

I am taking a look now. I dont understand what happened with the file, but there are still NaNs present. maybe that is why it isnt passing? https://github.com/CliMA/ClimaLand.jl/actions/runs/10378241417/job/28734087789?pr=719#step:6:581

does this look ok?
c546593

I think there's two albedo files, and we weren't sure which one to pick

sw_albedo_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412_v2_no-nans.nc
And
sw_albedo_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412_v2.nc

I am surprised the "no_nans" actually has nans.

The albedo artifact was broken, this commit updates the albedo file to
sw_albedo_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412_v2_no-nans.nc

Co-authored-by: AlexisRenchon <a.renchon@gmail.com>
Co-authored-by: SBozzolo <gbozzola@caltech.edu>
@kmdeck kmdeck merged commit 89b329c into main Aug 14, 2024
7 of 9 checks passed
@AlexisRenchon AlexisRenchon deleted the ar/sif_GPU branch October 21, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Run benchmarks Add this label to run benchmarks on clima
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants