Support global-workflow using Rocky 8 on CSPs #2998

Merged 37 commits on Dec 24, 2024


@weihuang-jedi weihuang-jedi commented Oct 10, 2024

Description

With ParallelWorks now defaulting to Rocky 8 on CSPs, and moving to Rocky 8 only after 1/1/2025,
we need to modify the global-workflow module files to use the spack-stack supported on Rocky 8,
and test compiling and running to make sure everything works under Rocky 8.

i) New features in the Rocky 8 update:

a. The wave component worked in the C48_S2SWA_gefs case, so SUPPORT_WAVES is set to "YES" in awspw.yaml.
In fact, if SUPPORT_WAVES is not set to "YES", setup_expt.py raises an exception.

b. Two types of nodes (chips/queues) are used on AWS, compute and process: forecasts run in the "compute" queue,
which uses big nodes (more cores), and all other jobs run in the "process" queue, which uses small nodes (fewer cores).
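
As a rough illustration of items a and b, the sketch below shows one way a wave-support guard and a two-queue selection could look. Every name in it (`check_wave_support`, `select_queue`, `FORECAST_JOBS`, the dict layout) is hypothetical and is not taken from setup_expt.py or awspw.yaml; only the `SUPPORT_WAVES` key and the "compute"/"process" queue names come from the description above.

```python
# Illustrative sketch only: function names, the job-name set, and the dict
# layout are assumptions, not code from global-workflow.

FORECAST_JOBS = {"fcst", "efcs"}  # assumed names of forecast jobs


def check_wave_support(host_config: dict, app_uses_waves: bool) -> None:
    """Raise if the experiment needs waves but the host does not enable them."""
    supported = str(host_config.get("SUPPORT_WAVES", "NO")).upper() == "YES"
    if app_uses_waves and not supported:
        raise NotImplementedError(
            "Waves requested, but SUPPORT_WAVES is not 'YES' for this host"
        )


def select_queue(job_name: str) -> str:
    """Forecast jobs go to the big-node queue, everything else to the small one."""
    return "compute" if job_name in FORECAST_JOBS else "process"


# Assumed content of a host file such as awspw.yaml, already parsed to a dict.
awspw = {"SUPPORT_WAVES": "YES"}
check_wave_support(awspw, app_uses_waves=True)  # no exception expected
print(select_queue("fcst"), select_queue("arch"))
```

With `SUPPORT_WAVES` left at "NO", the same call would raise for a wave-enabled case such as C48_S2SWA_gefs, which mirrors the behavior described in item a.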

ii) The Rocky 8 update requires the following submodules at or newer than the commits below.

  1. gfs_utils:

commit 4848ecbb5e713b16127433e11f7d3edc6ac784c4 (HEAD, origin/develop, origin/HEAD, develop)
Author: Wei Huang <wei.huang@noaa.gov>
Date: Fri Oct 18 10:41:25 2024 -0600

Make gfs-utils compile on CSPs with Rocky 8 (#81)

Support Rocky 8 on CSPs.

  2. ufs_utils:

commit 23237610845c3a4438b21b25e9b3dc25c4c15b73 (HEAD)
Author: Wei Huang <wei.huang@noaa.gov>
Date: Wed Oct 9 11:55:13 2024 -0600

Support UFS_UTILS on CSPs under Rocky 8 (#989)

Fixes #982.

  3. upp:

commit 66a422db80ea129dd87285fe6e811d4b6e1fe29b (HEAD)
Author: Wei Huang <wei.huang@noaa.gov>
Date: Wed Oct 2 14:38:22 2024 -0600

Make UPP works with Rocky 8 on CSPs (#1034)

* Make UPP works with Rocky 8 on CSPs

* Remove unneeded path

* simplify modulefile

  4. ufs_model:

commit 29c2703c715ebdb47bbd4bcc811db340eae530e5 (HEAD)
Author: Cameron Book <43379611+ulmononian@users.noreply.github.com>
Date: Tue Nov 12 13:08:12 2024 -0800

Add developmental test cases: idealized baroclinic wave and 2020 July CAPE cases + https://github.com/ufs-community/ufs-weather-model/pull/2459 (#2461)

* UFSWM - Add tests-dev ATM-only idealized dry baroclinic wave test and a 2020 July CAPE case
* UFSWM - Update modulefile to support Rocky 8 on CSPs, with ParallelWorks

---------

Co-authored-by: Wei Huang <wei.huang@noaa.gov>
Co-authored-by: Jong Kim <jong.kim@noaa.gov>

Resolves #2997

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

How has this been tested?

  • Clone and build on CSPs
  • Forecast-only on AWS
  • GEFS test on AWS

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • I have made corresponding changes to the system documentation if necessary

@DavidNew-NOAA

These are the lines in the log file that indicate a regression test failure:

 0: terminate called after throwing an instance of 'oops::TestReferenceFloatMismatchError'
 0:   what():  Test reference Float mismatch @ Line:10
 0: Test Val : 2.2505994886159897e-02
 0: Ref  Val : 2.2395884618163109e-02
 0: Delta    : 1.1011026799678802e-04
 0: Relative tolerance: 2.2450939752161502e-05
 0: Absolute tolerance: 1.0000000000000001e-05
 0: Test Line: 'water_vapor_mixing_ratio_wrt_moist_air       | Min:+1.6836553484722572e-08 Max:+2.2505994886159897e-02 RMS:+5.0110427820345893e-03'
 0: Ref Line : 'water_vapor_mixing_ratio_wrt_moist_air       | Min:+1.6847316430812498e-08 Max:+2.2395884618163109e-02 RMS:+5.0107896876815721e-03'
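
The arithmetic behind this failure can be reproduced with a short sketch. The values are copied from the log above; the form of the relative tolerance (1e-3 times the mean magnitude of the two values) is an inference from the printed numbers, not something confirmed from the oops source, and the variable names are illustrative.

```python
import math

# Values copied from the log lines above.
test_val = 2.2505994886159897e-02
ref_val = 2.2395884618163109e-02
abs_tol = 1.0000000000000001e-05

# Absolute difference between test and reference values.
delta = abs(test_val - ref_val)

# Assumption: the logged "Relative tolerance" matches 1e-3 times the mean
# magnitude of the two values. This is inferred from the numbers only.
rel_fraction = 1.0e-3
rel_tol = rel_fraction * 0.5 * (abs(test_val) + abs(ref_val))

print(f"delta   = {delta:.6e}")
print(f"rel_tol = {rel_tol:.6e}")
print(f"abs_tol = {abs_tol:.6e}")

# The difference is larger than both logged tolerances.
exceeds_both = delta > rel_tol and delta > abs_tol
print("exceeds both tolerances:", exceeds_both)
```

Under that assumption, the difference in the maximum of water_vapor_mixing_ratio_wrt_moist_air is roughly five times the relative tolerance and about eleven times the absolute tolerance.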

@RussTreadon-NOAA

Thank you @DavidNew-NOAA for looking at the output. As David notes, the reference check failed not due to DA but rather due to changes in the background used by H(x) in enkfgdas_atmensanlobs.

Do we expect this PR to alter forecast fields (deterministic and/or ensemble)? The modeling team should confirm that observed differences in model output are acceptable.

@RussTreadon-NOAA

This PR does not change the sorc/ufs_model.fd hash. g-w PR #3145 updated the sorc/ufs_model.fd hash. This updated hash is included in this PR.

It was noted in g-w PR #3163 that the updated sorc/ufs_model.fd hash altered forecast output. GDASApp reference files have been updated and are included in the updated sorc/gdas.cd hash in g-w PR #3163. As documented in g-w PR #3163, g-w CI passes on WCOSS2 (Dogwood), Hera, and Orion.

One path forward is to

  1. merge g-w PR #3163 into g-w develop
  2. update NOAA-EPIC:csps-rocky8 with the updated g-w develop
  3. rerun Hera CI for this PR

@aerorahul aerorahul added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Dec 23, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Dec 23, 2024
@emcbot emcbot added CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully and removed CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Dec 23, 2024
@emcbot

emcbot commented Dec 24, 2024

CI Passed on Hera in Build# 1
Built and ran in directory /scratch1/NCEPDEV/global/CI/2998


Experiment C48_ATM_c157e5e2 Completed 2 Cycles: *SUCCESS* at Mon Dec 23 21:59:23 UTC 2024
Experiment C48mx500_3DVarAOWCDA_c157e5e2 Completed 2 Cycles: *SUCCESS* at Mon Dec 23 21:59:26 UTC 2024
Experiment C48mx500_hybAOWCDA_c157e5e2 Completed 2 Cycles: *SUCCESS* at Mon Dec 23 22:05:32 UTC 2024
Experiment C96_S2SWA_gefs_replay_ics_c157e5e2 Completed 1 Cycles: *SUCCESS* at Mon Dec 23 22:12:18 UTC 2024
Experiment C96C48_hybatmaerosnowDA_c157e5e2 Completed 3 Cycles: *SUCCESS* at Mon Dec 23 23:12:46 UTC 2024
Experiment C96C48_hybatmDA_c157e5e2 Completed 3 Cycles: *SUCCESS* at Mon Dec 23 23:12:54 UTC 2024
Experiment C96_atm3DVar_c157e5e2 Completed 3 Cycles: *SUCCESS* at Mon Dec 23 23:18:50 UTC 2024
Experiment C96C48_ufs_hybatmDA_c157e5e2 Completed 3 Cycles: *SUCCESS* at Mon Dec 23 23:56:03 UTC 2024
Experiment C48_S2SW_c157e5e2 Completed 2 Cycles: *SUCCESS* at Tue Dec 24 00:15:46 UTC 2024
Experiment C48_S2SWA_gefs_c157e5e2 Completed 1 Cycles: *SUCCESS* at Tue Dec 24 00:39:50 UTC 2024


@aerorahul aerorahul left a comment


looks good. tests have passed on Hera and Hercules.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Dec 24, 2024
@WalterKolczynski-NOAA

gdas build fails on WCOSS, but it fails in develop too

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Dec 24, 2024
@WalterKolczynski-NOAA

CI Tests set up to run in /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_2998/RUNTESTS on WCOSS

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Dec 24, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 290f1d2 into NOAA-EMC:develop Dec 24, 2024
10 of 11 checks passed
tsga added a commit to tsga/global-workflow that referenced this pull request Jan 4, 2025
* develop:
  Ensure OCNRES and ICERES have 3 digits in the archive script (NOAA-EMC#3199)
  Set runtime shell requirements within Jenkins Pipeline (NOAA-EMC#3171)
  Add efcs and epos to ufs_hybatm xml (NOAA-EMC#3192) (NOAA-EMC#3193)
  Fix GEFS and SFS compile flags in build_all.sh (NOAA-EMC#3197)
  Remove early-cycle EnKF forecast (NOAA-EMC#3185)
  Fix mod_icec bug in atmos_prod (NOAA-EMC#3167)
  Create compute build option (NOAA-EMC#3186)
  Support global-workflow using Rocky 8 on CSPs (NOAA-EMC#2998)
Labels
CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully
Development

Successfully merging this pull request may close these issues.

support global-workflow on CSPs with Rocky 8
7 participants