Gaea C6 support for UFSWM #2448
Conversation
cpld_control_p8 intel fails by timing out, so some work is needed to tweak the configs to better match the C6 hardware. There are still plenty of other items to check here; this is just a placeholder for now. Please feel free to send PRs to my fork/branch to add/adjust/fix any issues.
Also, once things start falling into place, we'll need to make sure intelllvm support is available for C6.
@BrianCurtis-NOAA, name change suggestion:
@BrianCurtis-NOAA Shall I retry building with these?
cpld_control_p8 fails with:
and control_p8 runs to completion:
@DusanJovic-NOAA does this look OK?
Yes.
@BrianCurtis-NOAA @jkbk2004 @FernandoAndrade-NOAA I believe EPIC now has full access to the
@BrianCurtis-NOAA can you sync up the branch? I think I am able to create a baseline on C6: /gpfs/f6/bil-fire8/world-shared/role.epic/UFS-WM_RT/NEMSfv3gfs.
Continuing to see failures with various cases.
Roughly three different behaviors and error messages:
@ulmononian @RatkoVasic-NOAA we need troubleshooting from the library side.
@RatkoVasic-NOAA @BrianCurtis-NOAA
Any combination is OK, as long as they are the same length.
@MichaelLueken just FYI regarding C5/C6 naming conventions. I recall there was a desire to sync the SRW CI/CD pipeline with certain Gaea C5/C6 naming conventions.
I'll be going with gaeac6 and gaeac5, FYI. I'll make those changes at some point tomorrow. |
@BrianCurtis-NOAA @ulmononian @jkbk2004
Also adding in ./tests/fv3_conf/fv3_slurm.IN_gaea:
Please try what @RatkoVasic-NOAA has suggested in your job cards before fv3.exe is run: export FI_VERBS_PREFER_XRC=0. This is a known issue inherent to the C5 system; it may also be worth trying on C6.
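For reference, a minimal sketch of where that export would sit in a Slurm job card. The job name, account, and node geometry below are placeholders, not values from this PR:

```shell
#!/bin/bash
#SBATCH --job-name=cpld_control_p8    # placeholder test name
#SBATCH --account=bil-fire8           # placeholder account
#SBATCH --nodes=2                     # placeholder geometry
#SBATCH --time=00:30:00

# Suggested workaround: disable XRC in the libfabric verbs provider
# before the executable is launched.
export FI_VERBS_PREFER_XRC=0

srun ./fv3.exe
```

The key point is only that the export happens in the job script ahead of the srun line, so the launched MPI ranks inherit it.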
@jkbk2004 @BrianCurtis-NOAA
@BrianCurtis-NOAA @jkbk2004 @ulmononian
If more work is needed on Gaea C6, I can make a PR now. Only four files needed changes, provided here.
Let me put all of this together and update this PR. |
This is not up-to-date for either CMEPS or CDEPS. |
It will be interesting to see how ESMF version 8.8.0 will affect things on Gaea C6 with ESMF-managed threading. The latest feature-frozen beta for 8.8.0 is v8.8.0b10. A lot of work for 8.8.0 was in one of the areas of the framework that affects high core count ESMF-managed threading runs. @GeorgeVandenberghe-NOAA reported some positive effects (using the earlier snapshot v8.8.0b09) for large core count runs on Gaea C5.
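For context, ESMF-managed threading in the UFS weather model is controlled per component in ufs.configure; the component below and its PET/thread counts are illustrative only, not the geometry used in this PR:

```
# ufs.configure fragment (illustrative values)
ATM_model:                      fv3
ATM_petlist_bounds:             0 311
ATM_omp_num_threads:            4
```

With settings like these, ESMF spawns the OpenMP threads for each component itself, which is the code path the 8.8.0 work is expected to affect at high core counts.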
I was going to build v8.8.b902, but should I instead just make v8.8.0b10 available everywhere I can and freeze on that for the next few months for large core count runs? I can build it on Hercules, Orion, Hera (irrelevant there), Gaea C5, and Gaea C6. I am, of course, forbidden from building it on WCOSS2. I build in my private stack outside of spack-stack before spack-stack is ready to include it.
--
George W Vandenberghe
Lynker Technologies at NOAA/NWS/NCEP/EMC
5830 University Research Ct., Rm. 2141
College Park, MD 20740
301-683-3769 (work), 301-775-1547 (cell)
@GeorgeVandenberghe-NOAA it's probably worth some coordination with the spack-stack folks on the UFS side, like @AlexanderRichert-NOAA and @RatkoVasic-NOAA. Spack-stack is moving forward with the latest ESMF beta tag v8.8.0b10: JCSDA/spack-stack#1409
FYI: WCOSS2 won't accept a beta snapshot, so if we want to get the latest ESMF onto WCOSS2, it will need an official release at some point soon. Also, since that process has typically been slow, we will want to get it started as soon as there is an official release.
@BrianCurtis-NOAA The official ESMF 8.8.0 release date is planned for early/mid January. |
But of course we do need the beta testing, so we understand how 8.8.0 will be doing in the field. |
Yep. Won't happen on WCOSS2 by policy 😡
I will likely be ahead of spack-stack in making this available. You've answered the question; I will go with v8.8.0b10. It will probably be available on Gaea C5 under my stack tomorrow, and on Orion/Hercules on Thursday.
There are some slow compile times (s2swa_32bit_pdlib_sfs_intel, s2swa_debug_intel, s2s_intel, s2swa_faster_intel); it may be worth monitoring whether they persist.
Agree with Nick, we definitely need to look into those compile times for C6.
Many thanks to the EPIC group for helping to get this PR to the finish line.
Is ESMF/8-8-10 in spack-stack on Gaea C6, or should I try to build it there? I haven't focused much on C6 because ufs-weather-model didn't work there until just now, and due to policies NCEP does not have much of a footprint on C6. I can cobble together build and run systems, but it's great that I will no longer need to, and I dream of being able to sunset all of my private hacks forever.
@GeorgeVandenberghe-NOAA not for now. UFS-WM is using spack-stack@1.6.0 (with esmf@8.6.0), and the latest spack-stack (1.8.0) is installed with esmf@8.6.1. It can be added as a chained environment, as @AlexanderRichert-NOAA did on Hercules:
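For anyone trying this themselves, a hedged sketch of what a chained environment can look like using Spack's upstreams mechanism. Every path, environment name, and version below is a placeholder; the actual Hercules setup referenced above may differ:

```shell
# Create a local environment that reuses installed packages from an
# existing spack-stack deployment instead of rebuilding them.
spack env create esmf-test
spack env activate esmf-test

# Point at the upstream install tree (placeholder path).
spack config add "upstreams:spack-stack-1.6.0:install_tree:/path/to/spack-stack-1.6.0/install"

# Only the new ESMF (and anything not found upstream) gets built locally.
spack add esmf@8.8.0b10
spack concretize
spack install
```

The idea is that the chained environment resolves most dependencies from the upstream spack-stack and builds only the newer ESMF on top.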
Commit Queue Requirements:
Description:
This PR will bring in all changes necessary to provide Gaea C6 support for UFSWM
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: