Update modulefiles to use spack-stack unified environment #1707
Conversation
@ulmononian @Hang-Lei-NOAA What is the status of the spack-stack installation on wcoss2?
@jkbk2004 v1.3.0 has been installed on acorn for UFS testing:
following on what @Hang-Lei-NOAA mentioned: @AlexanderRichert-NOAA is currently conducting tests w/ this acorn installation, and we will merge his updates to the
@ulmononian hopefully, wcoss2 modulefiles as well to make the PR fully ready. BTW, as you expect baseline changes, can we work on quantifying the effect of the changes? I can work on it with you. Let me know.
it is out of my wheelhouse to update wcoss2... @Hang-Lei-NOAA would you be willing to make any necessary updates to the wcoss2 modulefile to reflect implementation of the spack-stack 1.3.0 unified environment? you could submit a PR directly to the branch associated w/ this PR, if that would work for you. @jkbk2004 yes -- let's touch base about quantifying the changes.
update: the acorn modulefile updates from @AlexanderRichert-NOAA were just merged.
@DusanJovic-NOAA by chance, did you ever have success running
@ulmononian Will the SCOTCH library (required for the unstructured WW3) be available in this unified spack-stack? I am working on committing that capability to UFS and will need it available. I was able to compile on cheyenne.intel (NOAA-EMC/hpc-stack#501 (comment)) using your install there.
No, I did not run any gnu/openmpi tests on Hera recently.
looking into getting this added now. apologies for the delay. what version would be ideal?
@ulmononian I know they are debugging issues w/ the SCOTCH library and expect to eventually need to rebuild once the issue is found. But for the low task counts in the RTs, the issue they're debugging shouldn't be a problem. So I think 7.0.3 (which is what I tested on Cheyenne) would be fine, unless you have a better idea.
Matching what you have on cheyenne should be fine for the SCOTCH version. There is a known bug in scotch that the developers are working to solve. Once that fix is available (it is still in the debugging process) I will let you know, and we'll need the new version then.
@DeniseWorthen @JessicaMeixner-NOAA thanks for this information. i'll try to add 7.0.3 for now. we should then be able to add it to the unified-environment installations on some platforms pretty quickly, i think. would orion, hera, and cheyenne work to start? also, any eta on the bugfix that will necessitate the version change?
Hi @ulmononian, I'm working with the SCOTCH developer to resolve this, so I'll try to give an ETA. My best estimate is ~1-3 weeks. We are narrowing down where the issue is, though it is within the SCOTCH code base, so it is a little hard to be more exact as an outside developer. Hopefully we are near the end!
@JessicaMeixner-NOAA can better answer the timeline question. The other option would be to add the METIS library to spack-stack for now (not on wcoss2, obviously), which would also work for my purposes. Once SCOTCH is fixed, the tests could be switched to using it. I would like to be able to commit both a gnu and an intel test for the unstructured mesh in UFS.
I wonder how multiple versions of the same package are handled with spack-stack. Some development work might need a newer version of a library, which would then be required for the PR in the end. For example, I am working on land component development which requires the ESMF 8.5.0 beta release. Once I bring those changes to my fork, it will break my development unless I keep using hpc-stack. Any ideas or suggestions?
thanks for this estimate, @MatthewMasarik-NOAA! @DeniseWorthen while i see the functional purpose of temporarily using METIS, my feeling is that it would be prudent to go directly to scotch in spack-stack given the wcoss2 rejection of METIS and the overall workload. on another note: i definitely understand you wanting to include gnu/intel tests. the issue isn't actually adding scotch (that should be simple); rather, it's sorting out the GNU/openmpi configuration of the spack-stack UE on hera and intel/cheyenne. until this is sorted, this modulefile PR will not be ready for review/merge. depending on your timeline to get the scotch/ww3 changes into develop, it may be best to utilize the hpc-stack installations.
@uturuncoglu spack-stack is able to support multiple versions within the same environment (e.g. the current unified environment supports multiple esmf and mapl versions). i believe @climbfuji or @AlexanderRichert-NOAA could provide more details regarding the paradigm/process for adding package versions needed for ufs-wm development (e.g. esmf 8.5.0 beta as you mentioned), though i expect the process to be similar to how hpc-stack updates are handled now.
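To illustrate that multi-version support, here is a minimal sketch only; the module names and version strings below are assumptions for illustration, not the actual contents of the unified environment:

```sh
# Hypothetical illustration: a spack-stack unified environment can carry more
# than one version of a package side by side (versions here are placeholders).
module avail esmf              # list the esmf versions provided by the environment
module load esmf/8.4.2         # the version a default build might use
# module load esmf/8.5.0b0     # a development fork could request a newer/beta
                               # version instead, if it has been added to the UE
```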
Fyi, if it's helpful: SCOTCH should work fine for the range of use in the UFS RTs (MPI task counts < ~2K would be OK). At this time I think it's most likely that the scotch rebuild that will be needed will be due to a change in the source code. It's possible, but unlikely, that we will need to alter the build process in any significant way.
@ulmononian Your comments on using METIS temporarily are understood. Is SCOTCH in the hpc-stack? I couldn't find it.
i am going to transfer this conversation to NOAA-EMC/hpc-stack#501.
@ulmononian Actually I think scotch is available in the stack, at least on hera for both intel and gnu. Many apologies for the confusion.
@DeniseWorthen the confusion was my fault because scotch/7.0.3 was NOT yet available when you mentioned it yesterday; i just did not notify you in time that i ran the installations last night for the orion (intel) and hera (intel/gnu) stacks currently used by WM develop. let me know if you have any issues, please. further, scotch/7.0.3 should be ready for testing within the spack-stack unified environment by the end of today. i will post confirmation when done. any testing of that stack is appreciated!
@DeniseWorthen @JessicaMeixner-NOAA @MatthewMasarik-NOAA i installed scotch/7.0.3 in the spack-stack UE on hera (both intel and gnu). it's not currently in my fork branch
@ulmononian Thanks. I've been able to compile and run my unstructured feature branch with SCOTCH (using hpc-stack) on hera by adding
Is that the correct method I should be using? |
yes -- that looks right to me. very awesome that it is working for you!
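The exact snippet referenced above is not reproduced here; as a rough, assumed sketch of that kind of change (module name and version taken from the discussion, the placement and surrounding steps are hypothetical), the build environment would simply gain the scotch module before compiling:

```sh
# Assumed sketch only: make SCOTCH visible to the unstructured-WW3 build by
# loading its module alongside the usual stack modules before compiling.
module load scotch/7.0.3
# ... then configure and build the weather model for the chosen app as usual
```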
@ulmononian I'm having trouble coordinating all the places this issue is being discussed. But as a heads-up, turnaround on hera is abysmal right now. I know scotch/hpc-stack/intel compiles and runs. I'm trying gnu now. I will test your spack-ue branch also, but it will be slow. On Cheyenne, I can also build and run w/ scotch/hpc-stack/intel. Scotch is not in the gnu hpc-stack there. Is everything available to try cheyenne w/ scotch for the spack-stack? Turnaround is much faster for me there.
i understand. this is sort of an hpc-stack, spack-stack, and WM issue all at once, so i'm not sure of the best place for it. no rush for the hera GNU (hpc-stack) or hera spack-stack UE tests. i can add it to cheyenne gnu hpc-stack and cheyenne spack-stack -- might be a few hours.
@FernandoAndrade-NOAA can you run the develop branch and build only compile_s2swa_faster_intel on jet? That way, we can distinguish whether the issue is the current jet system situation or the spack-stack side.
Using the current commit-1, these were the compile times reported on hera:
For jet, they were:
So, the 01:36:02 compile time for compile_s2swa_faster_intel looks like normal behavior on jet; at least it has nothing to do with this pr.
I believe I know the cause of this on Jet. The applications are built with two sets of instructions
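For context (an assumption about the mechanism, not taken from this PR or from the project's build settings): with the Intel compiler, one common way an application ends up built with two sets of instructions is multi-target code generation via the -ax option, which emits an additional CPU-dispatched code path on top of the baseline and therefore lengthens compiles:

```sh
# Illustration only: Intel's -ax flag adds an extra code path per listed target
# on top of the baseline path, increasing compile time.
# (Flags and file name are placeholders, not the project's actual settings.)
ifort -O2 -axCORE-AVX2 -c some_module.F90
```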
@DavidHuber-NOAA Thanks for the note.
Regression testing is complete. I've sent review requests.
Oh @ulmononian, can you please resolve the 4 conversations above?
@zach1221 Dusan is out this week, but I read through those and they can be addressed post-PR, so they can be resolved. @junwang-noaa needs to comment on her question if she's OK addressing that outside of the PR. Otherwise we need to wait for those questions to be answered.
@ulmononian The PR says that the results will change; have you looked at the differences to confirm the changes are expected?
@mark-a-potts have you built the spack-stack on cloud? Would you please list the stack locations so that other developers can use them? Thanks a lot!
Yes. The unified environment is installed under /contrib/EPIC/space-stack/spack-stack-1.4.1/envs/unified-dev
@jkbk2004 ran butterfly tests and compared against hpc-stack results. everything looked reasonable.
I have no interest in delaying this PR, but are the butterfly test results shown anywhere? I couldn't find anything in the associated issue (#1651).
@ulmononian Thanks. I've linked the comment to issue #1651.
Description
Now that spack-stack/1.4.1 has been released and the spack-stack Unified Environment (UE) installations are underway on all supported platforms (those pertinent to the WM are Acorn, Cheyenne, Gaea, Hera, Jet, Orion, NOAA Cloud/Parallelworks, and S4), modulefiles for these machines should be updated to use spack-stack in place of hpc-stack. Further, `ufs_common` will need to be updated to use the module versions included in the UE (which are at least as up-to-date as, or newer than, the current `ufs_common` modules). UE installation and (some) testing information can be found here and here. More background info on the UE within the context of the WM can be found in #1651.

Preliminary testing of the WM against the official UE installations has been performed on Hera, Orion, Cheyenne, Jet, Gaea, NOAA Cloud (Parallelworks), S4, and Acorn.

Modulefiles to be updated through this PR include: Acorn, Cheyenne, Gaea, Hera, Jet, Linux, MacOSX, Orion, NOAA Cloud (Parallelworks), and S4.

Some additional modifications may be required for certain platforms outside of the modulefiles alone (e.g., Cheyenne's fv3_conf files to address the switch from mpt to impi).

While spack-stack is available and currently being tested on Hercules and Gaea C5, those machines are being addressed in separate PRs (#1733 and #1784, respectively).
This work is in collaboration with @AlexanderRichert-NOAA, @climbfuji, @mark-a-potts, and @srherbener.
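To make the nature of the modulefile change concrete, here is a minimal hedged sketch of the swap, expressed as the equivalent interactive module commands; the paths, metamodule versions, and the `ufs_common` placement are placeholders, not the actual contents of this PR's modulefiles:

```sh
# Hypothetical before/after (paths and versions are placeholders).

# hpc-stack style (old):
#   module use /path/to/hpc-stack/modulefiles/stack
#   module load hpc/1.2.0 hpc-intel/2022.1.2 hpc-impi/2022.1.2

# spack-stack unified environment style (new):
module use /path/to/spack-stack-1.4.1/envs/unified-env/install/modulefiles/Core
module load stack-intel/2021.5.0
module load stack-intel-oneapi-mpi/2021.5.1
module load ufs_common    # library versions now resolve from the UE
```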
Testing Progress (`cpld_control_p8`), spack-stack 1.4.1:

- `/lfs/h1/emc/nceplibs/noscrub/alexander.richert/ufs-weather-model-cameron-pr/tests/logs/log_acorn`; `/lfs/h2/emc/ptmp/alexander.richert/FV3_RT/rt_30530` (@AlexanderRichert-NOAA)
- `/glade/work/heinzell/1.4.1/tests/logs/log_cheyenne`; `/glade/scratch/heinzell/FV3_RT/rt_11684` (@climbfuji)
- `/lustre/f2/scratch/Cameron.Book/FV3_RT/rt_20311` (@ulmononian)
- `/scratch1/NCEPDEV/nems/Alexander.Richert/ufs-weather-model-spack-stack-pr/tests/logs/log_hera` (@AlexanderRichert-NOAA)
- `/lfs4/HFIP/hfv3gfs/Cameron.Book/RT_RUNDIRS/Cameron.Book/FV3_RT/rt_5838` (@ulmononian)
- `/work/noaa/stmp/cbook/stmp/cbook/FV3_RT/rt_409735` (@ulmononian)
- `/scratch/users/mpotts/FV3_RT/rt_441936/cpld_control_p8_intel` (@mark-a-potts)

Top of commit queue on: TBD
Input data additions/changes
Anticipated changes to regression tests:
The majority of RTs are expected to change, as this is a fundamental stack change. New baselines will most likely be required.
Subcomponents involved:
Combined with PR's (If Applicable):
Commit Queue Checklist:
Linked PR's and Issues:
Depends on #1745 (currently GOCART submodule hash update and two .rc file changes cherry-picked from this PR)
#1651
Testing Day Checklist:
Testing Log (for CM's):