Meeting Notes 2021 Software
Feel free to reorder items in the list. Put them in priority order rather than the order in which they were entered. We always need a good chunk of time for prioritization and looking at issues. Anyone is free to add to the agenda. Anyone is free to reorder; in general we let items that came first go first. Agenda items can also carry a marker: brainstorm, make-decision, long-term planning, or make criteria for decision. We recommend periodically adding highlights or wins of the week.
- Erik/Negin: Adjust time of standup meeting?
- Erik: Meetings for the rest of the year?
- Erik: Is this last meeting of 2021? What are the CTSM software "wins" of 2021? These are some that I saw looking through the ChangeSum file:
- FATES on main-dev
- New science: BHS, grid-cell error checks, Arctic fixes from Leah, NEON!, MIMICS-prep, Antarctica compset, Medlyn for non-PHS, FATES-SP mode, PPE branch
- CN-Matrix work (not quite on master, but has been used to do spinup faster)
- release-cesm2.2.01
- New scripts for subset and run NEON
- Start bounds at 1 on every task
- FATES sharing of fire data
- NUOPC the mediator default
- PTCLM aging off and replaced by the subset_data developed by Negin
- Erik/Bill/Adrianna: CDEPS issue #135
- Erik: Note from last week: features that are default for a given version can be removed when that version goes away. This could also be the case for features that are optional, but tied to a version. For example, I would envision VIC being tied to clm4_5 and going away with it.
- Negin: Python command-line arguments and flags.
- Erik: go over content of answer changing tag
- Erik: Some thoughts on the purposes of this meeting, if we have time (we could also save this for January, to start the year off thinking about how these meetings should be done):
- CTSM science leads are here to direct priorities of software efforts
- CTSM scientists are here to represent users of the system
- CTSM SE's are here partially to get priority direction from CTSM leads
- CTSM SE's are also here to get technical help on their work for the week (let's make sure this is working for everyone and all SE's are getting the input/feedback they need)
- FATES SE's are critically important parts of the team, both as CTSM/FATES users and as CTSM/FATES SE's. So we need your input to make sure the CTSM system works well from a FATES user and SE perspective. We may want to have a guideline for discussion length: if something isn't resolved reasonably quickly, we set up a different meeting for it (because of the different needs of attendees). By the way, I think python SE development needs a community to collaborate on best practices.
Erik raised the question of whether we can remove VIC.
There is a tension between wanting to maintain a multi-physics model on the one hand, but also recognizing that each option has a real maintenance cost.
Dave: a general rule is that it makes sense to remove an option when there is pretty clearly no scientific benefit of that option over others. (So, for example, we could remove the old snow cover fraction method because it's hard to justify that it has any scientific benefit over the newer method.)
Regarding VIC, we should probably get input from Martyn and Andy.
We had a (somewhat controversial/heated) discussion about how to handle flags like "--no-feature" vs. "--feature false". Negin gave a very helpful presentation on the different options for this.
Our general feeling is that we should stick with --feature (for things where something is off by default and you want to turn it on) or --no-feature (for things where something is on by default and you want to turn it off). There were concerns raised about the clarity of --no-feature. However, this is the way that most Unix utilities work, and the advantage is, for something that is either on or off by default, you can see what the default operation is and what you can change just by looking at the available flag names, without having to read through all of the documentation of default values.
Adrianna made the good suggestion of, whenever possible, thinking about how --no-feature can be reworded to be positive - e.g., rather than --no-single-pft, instead use something like --allow-multiple-pfts.
In any case, a lot of this will come down to having clear documentation. For these boolean flags, we feel it's more confusing than helpful to have something like "default: True" or "default: False". Instead, clearly document what the behavior is if you add the flag, and what the behavior is if you don't add the flag.
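As a rough illustration of the convention above, here is a minimal sketch using Python's standard argparse module. The tool itself and the flags --verbose and --no-cleanup are hypothetical, made up for this example; --allow-multiple-pfts is the positive rewording suggested above, not an existing option of any CTSM script. The help strings follow the documentation suggestion: they say what happens with the flag and without it, rather than stating "default: True" or "default: False".

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical tool (all flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Hypothetical example tool")

    # Off by default: adding the flag turns the behavior on.
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Print progress messages. Without this flag, only warnings "
        "and errors are printed.",
    )

    # On by default: adding the --no-... flag turns the behavior off.
    # With store_false, argparse sets the value to True when the flag is absent.
    parser.add_argument(
        "--no-cleanup",
        dest="cleanup",
        action="store_false",
        help="Keep temporary files. Without this flag, temporary files are "
        "deleted when the tool finishes.",
    )

    # Positive rewording of a --no-single-pft style flag.
    parser.add_argument(
        "--allow-multiple-pfts",
        action="store_true",
        help="Allow more than one PFT. Without this flag, a single PFT is used.",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Since we are supporting python 3.7, the sketch pairs store_true / store_false with an explicit dest rather than using argparse.BooleanOptionalAction, which was only added in python 3.9.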
Feeling is we're supporting python 3.7 and later.
We should put in place some GitHub action-based testing to run the python testing on different python versions.
- Erik: NOTE I have to leave at just before 10am to get my COVID booster. Should be back afterwards.
- Ryan: FATES unit testing
- Will: Are there lightly used compsets or features we can remove (Gokhan's request to co-chairs)?
- New soils data from ISRIC should be available next week.
- Keep going through "bug - impacts science"
Point of this is: things that take some support but are less important, since we're getting a growing support burden without an increase in support resources.
One possibility is the river model: it would be great if we could just support a single river model. It might be best overall if we could just use mizuRoute, dropping MOSART and RTM. It sounds like it should be possible to make routings for arbitrary topography, which addresses the need of the paleo group.
We should keep in mind things that are taking some maintenance burden without much value. Can we make a list of such things, so that we can remove them when we come across a project where maintaining those old things is becoming a burden?
Some examples of things that could probably be dropped are:
- VIC
- Old snow cover fraction method
- CNDV
We can also consider: When can we drop support for CLM4.5?
Dave also points out: having the new toolchain for making grids will help remove some support needs.
Dave: also, encouraging people to use existing tools rather than everyone building their own custom tools.
- Adrianna: single point stuff
- Bill: Sam's upcoming PR on surface dataset tool
- netcdf via lfs: importance of having lfs installed; checker via github actions (test in a test repo?)
- sys test: note ncar_pylib loading method
Adrianna has been working on the script that Negin had been working on, to subset data for a single point run.
There have been some discussions about different cases that this should handle.
Should datm from tower data be included here?
- Will: this is so idiosyncratic that we shouldn't try to support that here
- Negin: put this flag in initially, but agrees this might not be something we want to support now
- Will: the supported tower sites (for PLUMBER) are already made. Could potentially point to those via user mods.
Regional: question about whether there should be an option to select an initial resolution, and how.
- Jackie: for FATES, a lot of people do regional cases. She typically subsets the 1 deg or 1/2 deg datasets to do this.
- Jackie: one of the challenges for a lot of people is finding the file paths. Currently you need to make a dummy case in order to find the most recent files.
- It might make sense to have the regional subsetting script create a dummy case under the hood: This way, if we update out-of-the-box surface datasets, we don't need to update the files pointed to by the regional subsetting script. So the user would need to specify a compset and grid, then the script would create the case and figure out the file paths.
- Adrianna was hoping there would be a way to call the namelist generation script directly without creating a case, but unfortunately that is hard (or impossible) right now. We should keep that in mind when pythonizing build namelist
- Jackie: one challenge can be coming up with the number of processors. It will fail if you give it too many processors.
Will: Will this be two scripts or one?
- Adrianna: right now it's one, but you pick regional or single-point when running it, but we might separate that into two separate scripts.
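One possible way to keep a single script while sharing arguments between the regional and single-point modes is argparse's parent-parser mechanism. This is only a sketch under stated assumptions: the sub-command names and every flag below (--outdir, --lat, --lon, --lat-min, and so on) are hypothetical and are not taken from the actual subset_data script.

```python
import argparse


def build_parser():
    """Parser with 'point' and 'region' modes that share common arguments."""
    # Arguments common to both modes live on a parent parser.
    # add_help=False avoids a duplicate -h/--help in the sub-parsers.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument(
        "--outdir",
        default="subset_output",
        help="Directory where the subset files are written (hypothetical flag).",
    )

    parser = argparse.ArgumentParser(description="Hypothetical subsetting tool")
    modes = parser.add_subparsers(dest="mode", required=True)

    # Single-point mode: one latitude/longitude pair.
    point = modes.add_parser("point", parents=[shared])
    point.add_argument("--lat", type=float, required=True)
    point.add_argument("--lon", type=float, required=True)

    # Regional mode: a latitude/longitude box.
    region = modes.add_parser("region", parents=[shared])
    region.add_argument("--lat-min", type=float, required=True)
    region.add_argument("--lat-max", type=float, required=True)
    region.add_argument("--lon-min", type=float, required=True)
    region.add_argument("--lon-max", type=float, required=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

If the modes are later split into two separate scripts, the same parent parser could be imported by both, so the shared arguments would still be defined in one place.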
How to subset the mesh file? This will probably be a challenge. We should talk to Mariana or the ESMF group to see if they have any ideas. We might need to back up to the scrip grid file or even a domain file, subset that, and then build a mesh file.
- Bill: celebration: 200 forks of CTSM!
- Erik: How do we prevent issues like #1543 where a change was made, but then a change against that came in by accident afterwards?
- Bill: small, short-lived branches (I think this is the main kind of thing that CI was designed to solve)
- Bill: How should we store netCDF inputs and outputs for use in python testing? https://github.com/ESCOMP/CTSM/issues/1548
- Erik: Win for the week -- Bill and Greg quickly found problems I ran into with the nuopc tag. Thanks to both! Also the standup meeting was very helpful and has been helpful for things before this weekly meeting.
- Erik: I assume the other win for the week was the NEON meeting?
- Will / Bill: Go through "bug - impacts science" items
- Erik: Add Limitation as a label?
- Erik: I went to the meeting facilitation and also have been talking to Beatrice about meetings. I'd like to hear from people their experience of this meeting (also I'd love to do a quick one on one meeting with you). Beatrice pointed out that having attendees create the agenda would be considered odd (I think it works for us though), and I also discussed with her about working on other things during meetings (I think this is good and bad).
- Erik: As a member of MGEN (Men for Gender Equity at NCAR/UCAR/UCP) I'd like to hear the experiences of women in meetings. As a person with a lot of privilege, it unfortunately takes effort to identify situations and figure out how to respond. So please, women, if you feel comfortable, set up a time to talk to me about your experiences. This was an assignment we talked about doing at last month's meeting.
Dave's suggestion: go through agenda items in priority order. Save at least the last 1/2 hour to go through issues and upcoming tags.
Does anyone have ideas for how we can keep others more informed of things that are discussed at the software meetings?
The upcoming tags project could be useful to people; many people may not be aware of that. We could also remind people of where to find the agenda.
Will could summarize the recent tags that have been made at the CLM science meetings.
We got through https://github.com/ESCOMP/CTSM/issues/1280
- Will: When do we make our first CTSM5.2 tag (and 'finalize' CTSM5.1)? We have made progress on a number of accomplishments that make me inclined to release CTSM5.1, including FATES-SP, nuopc, plus the NEON and PPE branches. Should we release hillslope hydrology too? This seems worth considering from both a scientific and SE perspective for priorities and dates.
- Will: How do we consider supporting and curating tools for model analyses? Negin has great examples from NEON, Daniel has more for the PPE, ctsm-py also has some useful tools, and FATES likely does too? These wouldn't be on the escomp/ctsm repo, but elsewhere on github. How do we organize & navigate this? Also maybe worth a science discussion too?
- Erik: How should we handle CLM_ACCELERATED_SPINUP turning off MOSART and RTM? I've got it hooked up in MOSART buildnml to set MOSART_MODE=NULL, but this is an awkward thing to do: you don't expect build-namelist to set XML variables. And there might be an order dependence in this (that could be removed by moving it to buildlib though). This is issues 30 and 48 in RTM/MOSART: https://github.com/ESCOMP/RTM/issues/30 and https://github.com/ESCOMP/MOSART/issues/48
- Erik/Adrianna: subset_data questions. Can we use base class for shared arguments between point and regional?
- Erik/Bill: cime no-leap issue at year 100. https://github.com/ESMCI/cime/issues/3135
- Bill: New GitHub projects (https://github.com/orgs/ESCOMP/projects/2/views/8 ). Would this be useful? I'm not sure.... I like that you can dump all issues into a single project and then slice and dice them in various ways (e.g., we could have categories for performance, big projects like water isotopes, etc.). But I'm really not sure if this would help us stay on top of our issues any better than our current systems. Thoughts?
Erik: One thing that still needs to come in is CN-Matrix.
We're adding a few other things to the ctsm5.1 milestone.
Adrianna: It would be helpful to make available some data that the various scripts could run on, so you can see how it's supposed to work (as a more useful tutorial). Others feel that would be very useful.
Is it also useful to have some general-purpose utilities?
This ties in with the need to rewrite the LMWG diagnostics package.
Dave suggests engaging with ESDS. Negin is planning on presenting NEON visualization stuff to ESDS.
Negin: one thing she did was something that reads all of the variables from a single-point simulation and plots them in an interactive way.
ILAMB is also going to keep growing, which we should keep in mind as part of the big picture.
Dave points out reasons why we wouldn't want separate compsets or even user mods directories - since typically you clone a non-spinup case to make a spinup case.
We'll probably go with a solution where you need to explicitly turn off the runoff model, maybe with a check that forces you to be explicit about it if you have turned on accelerated spinup.
Idea of assigning some measure of size to issues (story points, rough time estimate, etc.) along with importance (how many users would this impact, how big of an impact): having these two pieces of information attached to each issue might help us find good issues to pick off (quick and/or important).
Adrianna also suggests reframing issues as user stories: this can make them clearer as well as provide more clear motivation for working on an issue.
- Bill: Some questions related to FATES performance (I was asked this recently and wasn't sure how to answer):
- Do the computationally intensive parts of FATES work on linked lists, or are data copied in and out of arrays for the computationally intensive portions?
- I know that CTSM is (significantly) more expensive when running with FATES, but does most of this extra expense occur within the FATES code (thus, working with the FATES data structures) or within the CTSM code (thus, working with the CTSM data structures, and things are just more expensive because there are more patches, for example)?
FATES uses linked lists almost exclusively, though there are some places where array calculations are done.
Bill's over-arching question is what percentage of the time in a CTSM-FATES run is spent working with linked lists vs. arrays (thinking about possibilities for vectorization).
Adrianna also wonders if there are things going on like sorting of the linked lists that themselves might be large costs.
The biggest point may be the need to do a more detailed performance analysis of FATES runs to understand where the time is being spent. How much of the time is spent in FATES itself vs. the CTSM code; how much of the FATES time is spent on the science computations vs. infrastructurey code.
We should add timer calls in the FATES code. Can also use ARM MAP and/or VTune for more detailed analyses.
- Ryan/Greg: Passing model timestep early for fates: https://github.com/ESCOMP/CTSM/pull/1304#discussion_r733053071
Long-term it could be good not to rely on the model's dtime being the same as "dtime" from the driver (which specifies the model coupling interval; currently we force the time step to be the same as the coupling interval, but that wouldn't need to be the case).
In the near-term, we're okay with either option: either passing dtime to initialize2 and using that in FATES's initialization, or splitting FATES's initialization to allow it to use the correct dtime.
- Erik/Greg: What are the best practices for creating baselines that we should adopt? Save generate until last run when all is working?
- Ryan/Greg: Discussion of reduced logging messages and frequent "BalanceCheck, solar radiation balance error (W/m2)" messages https://docs.google.com/presentation/d/1SGtt5wZKHyQeySXi1ED4xc0BDk4-WKJbNGPEOwbUNlM/edit?usp=sharing
One issue is that namelist baselines can sometimes not get updated if you rerun an existing test.
Bill's feeling is: he often ignores NLCOMP failures unless it's a tag where this is particularly relevant (e.g., a refactor of build-namelist). Others are okay with that for now.
Note that the compare_test_results and bless_test_results scripts in cime can be useful, but Bill's experience is that they are problematic in terms of namelist comparisons.
We had a long discussion on how to deal with warnings, which I'm not fully capturing here.
A big focus was on dealing with balance check warnings.
One idea would be to have a threshold of the number of warnings that are acceptable before killing the run, if a namelist flag is set. By default this wouldn't happen, but we'd do that in the default testmod. (Ideally the acceptability level would scale with the amount of time and number of grid cells owned by the processor.) But we're not sure if that's worth implementing.
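To make the scaling idea concrete, here is a small sketch of how such a warning budget might be computed. It is written in Python purely for illustration (the real check would live in the model code), and every name in it is hypothetical, including the idea of expressing the budget as an allowed number of warnings per grid cell per time step.

```python
def warnings_exceed_budget(
    n_warnings, n_cells, n_steps, max_warnings_per_cell_step, enabled=False
):
    """Return True if the run should be killed because of too many warnings.

    n_warnings: warnings issued so far on this processor
    n_cells: number of grid cells owned by this processor
    n_steps: number of time steps completed so far
    max_warnings_per_cell_step: allowed warning rate (hypothetical setting)
    enabled: stands in for the namelist flag; off by default, so by default
        no number of warnings kills the run
    """
    if not enabled:
        return False
    # The budget grows with both the run length and the number of grid cells.
    allowed = max_warnings_per_cell_step * n_cells * n_steps
    return n_warnings > allowed


if __name__ == "__main__":
    # 4 cells for 10 steps at 0.25 warnings per cell-step allows 10 warnings.
    print(warnings_exceed_budget(11, 4, 10, 0.25, enabled=True))
```

In this picture, the default testmod would be the place that passes enabled=True, while production runs would leave it off.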
In terms of the FATES solar energy balance warning that Jackie is running into: for now, suggestion is to increase the tolerance on both the warning and the error by 1 order of magnitude, and see if that fixes the issue. (Currently the warning and error tolerances for this check just differ by 2 orders of magnitude; we don't want them to be too much closer than that, since that raises possibilities of the run aborting when it shouldn't in a long production simulation.)
- Erik/Negin: Negin found an issue with the mkunitymap script which we were using as part of the process for CTSM-WRF coupling. This script is only useful for creating domain files for land-only regional cases. Since NUOPC means we don't need domain files, should we remove this from the process?
- Erik: In the update to NUOPC as the default I figure I'll remove PTCLM and associated stuff
- Bill: useful blog post on how to be a good author of PRs
Background: Negin was helping the Norwegian group with an issue in running CTSM-WRF for their domain. She is getting fracs much larger than 1 on the ESMF-generated mapping files. Will probably ask for help from the ESMF group.
Question of whether to maintain mkunitymap. Feeling is that, as long as ESMF works to create a mapping file between two identical grids, we should probably just use that to have one consistent process.
Note that we DO still need domain files for CTSM-LILAC.
Overall FYI type things....
- Erik: By the way Louisa said that GEOS-Chem is going to go with using the current dry-deposition methodology, rather than passing the guts of CLM into CAM. I think this is good.
- Erik: Beatrice in the communication review had some really useful ideas about meeting agendas and is going to do a training on meeting facilitation in October. I thought her suggestions were really useful, and we might want to go over them after she presents them. I want to endorse attending her training.
- Erik/Adrianna: Note, Adrianna pointed this cartoon out to me, which points out a specific sort of dysfunction that we should be sure to avoid. https://xkcd.com/1597/ Note, I'm not concerned with Adrianna's git skills, but think the cartoon illustrates an important point about our process. Requiring scientists to learn so much about software tools (like git) that they can't actually put effort into their core work is not good. Which is maybe what the point of the cartoon is. git is a powerful tool, and critical for our work as SE's. But it's good to examine who really needs to work with it, as there is a learning curve. The same is true of other things as well; the question is how to get the right balance.
- Erik: When we meet in person I hope everyone will feel free to be nurturing and bring cookies, without fear of being thought less of. That's an area that I hope we all can fight any implicit biases we have.
GEOS-Chem will work with the current dry-deposition methodology, rather than passing the guts of CLM into CAM.
Adrianna points out: it can be easier to understand & remember git commands if you understand the conceptual model (e.g., representation as a directed acyclic graph).
Adrianna points to https://rachelcarmena.github.io/2018/12/12/how-to-teach-git.html
Greg points to https://www.atlassian.com/git/tutorials as something else that he has found useful.
And Negin adds https://swcarpentry.github.io/git-novice/
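The directed-acyclic-graph picture mentioned above can be made concrete with a toy model. The sketch below is not how git is implemented; it just represents a made-up history as a mapping from each commit to its parents (the commit names A to E are invented) and walks the graph to find everything reachable from a commit, which is roughly what "the history of a branch" means.

```python
# Each commit points at its parent commit(s). A merge commit has two parents.
history = {
    "A": [],          # initial commit
    "B": ["A"],
    "C": ["B"],       # commit made on the main branch
    "D": ["B"],       # commit made on a feature branch
    "E": ["C", "D"],  # merge of the feature branch into main
}


def ancestors(commit, parents):
    """Return the set of all commits reachable from `commit` via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # The merge commit E can reach every earlier commit; D cannot reach C.
    print(sorted(ancestors("E", history)))
    print(sorted(ancestors("D", history)))
```

In this picture a branch is just a name attached to one commit, and merging creates a commit with more than one parent.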
Negin points out that another complexity is dealing with git within a docker container. She has figured out some ways to make this easier, and can write this up on the wiki.
- Erik/Adrianna: Adrianna suggests making Tutorial video of small tasks. Possibly split existing video up.
- Erik: Plan for move to NUOPC. We should give some guidance to users in wiki, README files, and ctsm-dev. And also go over some of this in a CLM science meeting. I think Mariana's PR should go before the update, and then the NEON UI afterwards. We also want to have cesm2_3_beta06 to happen before we do the update in CTSM.
- Erik: There's some work we need to get into NUOPC for the Hillslope model. See: https://github.com/ESCOMP/CMEPS/pull/230, https://github.com/ESCOMP/CESM_share/pull/9 and then this issue: https://github.com/ESCOMP/CDEPS/issues/120
Will discovered that datm is now reading calendar attributes – in particular, noleap vs. Gregorian. Just need to be aware of this: it needs to be set correctly on the files.
Adrianna suggests that some short video tutorials could be a helpful way for people to learn various things. e.g., screencasts of doing git stuff, how to find errors in logs, etc.
Having these on YouTube would let us see how much they're being viewed.
Having either short videos or videos broken into short chapters could be more useful than a single, multi-hour video. It sounds like adding chapter bookmarks to a video is pretty easy: see https://support.google.com/youtube/answer/9884579?hl=en
One thing we could do is, for the next CTSM tutorial, focus on how we can break things down into videos that can be posted.
Another thing we can do is: next time we see some need for some documentation of how to do something, try to do a video instead of written documentation: it could actually be faster (and more complete) to do it that way.
The NEON project is another good example of where this could be useful.
Some things that differ now:
- setting datm and streams settings
- cpl log has been replaced by drv log and med log
- domain file is (mostly) not required for nuopc (but some subtleties about this)
- also some development-centric things, like new capabilities for 3d stream fields
Plan:
- Ask Mariana to write up some things about the changes, probably in a CESM forum post that we can point people to
- In terms of CTSM-specific things, probably the main focus should be on updating our user's guide and README files that will now be incorrect in some ways
The streams differences are big enough that CTSM users probably do need to know about it: any changes you make to datm will need to change now. So probably at least worth bringing this up at a CLM science meeting (and we could record it!). But this isn't super urgent: don't need to have this ready the day we do the code update. Maybe plan on doing this in 4 weeks.
Maybe also do a wider announcement, probably by pointing people to whatever Mariana writes up.
- Dave/Peter: Talk about LULCC in FATES
- Erik: 360x720cru mask -- I was going to use the tx0.1v3 ocean mask, but it wasn't available. MOM is actually running a 0.66 degree grid, which is maybe more appropriate?
- PPE tests
Probably doesn't matter hugely. Anything that works should be fine.
Erik got things working with the tx0.1v3 mask, so will probably go with that.
Want to run a parallel SP perturbed parameter experiment.
First step is figuring out which parameters apply to SP. Doing that by simply running a set of short runs and seeing which ones are bit-for-bit.
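The screening step could look something like the sketch below. It is an assumption-laden illustration, not the actual PPE workflow: it supposes that the output data of each short run has already been read into a bytes object, and the function and parameter names are hypothetical. Real history files also carry metadata such as creation timestamps, so in practice one would compare the variable data rather than the whole files.

```python
def parameters_that_matter(baseline, perturbed_outputs):
    """Return names of parameters whose perturbation changed the output.

    baseline: bytes holding the output data of the unperturbed run
    perturbed_outputs: dict mapping a parameter name to the bytes holding the
        output data of the short run in which that parameter was perturbed
    """
    changed = []
    for name, data in perturbed_outputs.items():
        # Bit-for-bit identical output means the parameter had no effect
        # in this configuration, so it can be dropped from the experiment.
        if data != baseline:
            changed.append(name)
    return sorted(changed)


if __name__ == "__main__":
    base = bytes([0, 1, 2, 3])
    runs = {
        "param_a": bytes([0, 1, 2, 3]),  # bit-for-bit with the baseline
        "param_b": bytes([0, 1, 2, 4]),  # differs, so param_b matters
    }
    print(parameters_that_matter(base, runs))
```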
Next phase: ramping up to PPE with tower sites.
- Ideally, we'd have a flexible system that could run over different sets of sites: NEON, PLUMBER, etc. (or a subset of one of those). One challenge of non-NEON sites is that there is more variability between sites.
- Note that PLUMBER would be somewhat similar to NEON, in that we have met data from the tower sites.
Dave: there is an initial prestaging of data; that's already been done for the PLUMBER sites and Ameriflux sites; similarly, we have already preprocessed the forcing data. So then the question is whether it's worth having a similar process for running these different classes of sites.
Will: it could be useful to have a set of usermods directories for these other sites, like we do for NEON. This points to the correct surface dataset, etc.
Keith: one difference between PLUMBER runs and NEON runs is that they wanted the output in local time. So they start the model at a time that starts at local midnight; his script figures out what time that is. That would need to be implemented in the NEON scripts. Erik: that could be done in the usermods directory.
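The local-midnight calculation can be sketched as follows. This is not Keith's script; it is a minimal illustration that assumes each site comes with a fixed UTC offset in hours (for example -7 for a site seven hours behind UTC), and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def utc_start_for_local_midnight(local_date, utc_offset_hours):
    """Return the UTC datetime corresponding to 00:00 local time.

    local_date: the local calendar date on which the run should start
    utc_offset_hours: local time minus UTC, in hours (assumed fixed per site)
    """
    local_midnight = datetime(local_date.year, local_date.month, local_date.day)
    # Local time = UTC + offset, so UTC = local time - offset.
    return local_midnight - timedelta(hours=utc_offset_hours)


def seconds_into_utc_day(moment):
    """Return the time of day of `moment` in seconds since 00:00 UTC."""
    return moment.hour * 3600 + moment.minute * 60 + moment.second


if __name__ == "__main__":
    start = utc_start_for_local_midnight(date(2018, 1, 1), -7)
    print(start, seconds_into_utc_day(start))
```

A usermods directory for a site could then record the resulting start date and time of day, so that the offset only has to be worked out once per site.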
Negin: distinction between running for tower sites vs. running for any arbitrary single point. Do we want a separate capability for running for any arbitrary single point (like what Adrianna is working on)? That will be discussed in an upcoming meeting.
Adrianna: Ideally we'd share as much code as possible between those two use cases.
- Dave: some examples of things that would be in common are: handling spinup; higher temporal frequency output
In terms of non-uniformity: Erik suggests this can be done in the usermods directory.
Strategies for running multiple sites in a single job: either multi-driver (the new version of multi-instance), or sparse grid.
- There are some performance considerations, but feeling is that that isn't the big bottleneck.
- A bigger issue is file management.
- But any of these tricks would be hard with the non-uniform PLUMBER sites.
- Overall feeling is that this probably isn't worth dealing with for now.
- Erik: Propose most PR's be drafts. Only have open PR's on top five or so that have a card in Upcoming Tags and are the ones at the top. If someone is working on more than one PR, only the next one should be open.
- Erik: mizuRoute would like to do its development with the 360x720cru grid. I stopped making those files by default; should we reconsider this? Should I make some now for that use? When running with mizuRoute do we want to always run CTSM on a regular grid and mizuRoute on HRU's, or would we want to run CTSM on HRU's as well? If the latter, we'd have to use Naoki's mapping to create the mapping files for mksurfdata_map.
- Erik: Adrianna is doing some work on the single point effort, which is great. It just seems like we should coordinate this with Negin and the NEON effort. Maybe another meeting?
- Erik: Who are the people interested in SLIM? I think it's Isla, Marysa, Wenwen, Dave and Gordon? I have a meeting on Friday for the kick off...
Erik suggests: Propose most PR's be drafts. Only have open PR's on top five or so that have a card in Upcoming Tags and are the ones at the top. If someone is working on more than one PR, only the next one should be open.
We'll try doing this.
Erik: mizuRoute would like to do its development with the 360x720cru grid. I stopped making those files by default; should we reconsider this? Should I make some now for that use? When running with mizuRoute do we want to always run CTSM on a regular grid and mizuRoute on HRU's, or would we want to run CTSM on HRU's as well? If the latter, we'd have to use Naoki's mapping to create the mapping files for mksurfdata_map.
People feel it makes sense to move ahead with the 1/2 degree grid. We're likely going to want to move towards that for other reasons anyway.
Would we ever run CTSM on the HRUs? Sean is doing this for CONUS. He was using a simpler grid than Naoki, but even so was finding that the default code would often crash due to the complex grid. But he is finding that there seems to be some scientific advantage of running on HRU-type grids for hillslope.
A big piece of complexity is configuring the datm streams.
The mechanism for changing this will differ with NUOPC; we should just target NUOPC. This will be simpler with NUOPC / CDEPS.
There is still a need for listing out the different files, which is one of the trickier parts.
Adrianna proposes having the subset_data script create the user datm streams file.
Erik has envisioned that you would initially create a usermods directory, then when you create a case, you would point to that usermods directory.
Adrianna is thinking that there would be one script to subset data for your given point (because you're likely going to want to use the same subset data for a bunch of cases), and then have a separate script for firing off the case.
For setting up a case, Negin feels it's better to start with the run_neon script. But it's a little different from run_neon; for example, neon has the compsets already defined. Then subset_data would be the other piece.
Where would the usermods get created? Probably at the time you create the subset data, since you only need to create the usermods directory once for a given subsetting of the data.
We could have a top-level script that wraps both (or have a flag to the runner that calls the subsetter if you haven't already done the subsetting).
- Erik: I noticed we should run pylint on buildnml/buildlib
- Erik: I uncovered issue #1372; when MCT is removed it would be good to resolve this. In looking at it I became concerned about NUOPC validation, but maybe I shouldn't worry...
- Erik: check_input_data gives a message about any file that's purposely set to " " (like finidat). Maybe we should remove those from ctsm.input_data_list?
- Dave: Lots of new output variables in latest tag, how should we manage the variable list html?
- Dave: Peter has time to start working on LULCC in FATES. Perhaps devote some of this meeting to LULCC in FATES technical discussion next week?
- Dave: I was contacted about a possible NASA proposal to implement carbon riverine transport into MOSART ... but wondering if I should push them to mizuRoute. Quick discussion about mizuRoute.
We should do this.
Bill's thought: would like to move the implementation into python/ctsm, then this will happen automatically.
Feeling is: remove these blank files from the input data list, to avoid this warning that could be confusing.
We should keep https://escomp.github.io/ctsm-docs/versions/master/html/users_guide/setting-up-and-running-a-case/master_list_file.html up to date.
In addition, Dave suggests that we add some version of the file to the clm50 documentation... doesn't need to be absolutely correct.
Also would like to have a version of this page for a FATES case. So have a second test that turns this on for a FATES case. Long-term we might want to just have a single page (especially once FATES history variables are namespaced with a FATES prefix), but for now a FATES case won't turn on a bunch of BGC variables, so we really do need two versions for the time being.
Also, add cautionary note at the top saying: not all variables are relevant / present for all cases.
If it's easy, also add what case was run at the top of this file... but that might turn into a rabbit hole, in which case we won't worry about it.
We'll plan to talk more about this soon.
A starting point would be having Peter pass in the information FATES would need, which includes some of the raw information (rather than the processed information that we currently have).
We're stuck in an in-between state: we don't have in-house expertise in MOSART, but mizuRoute isn't fully operational yet.
What are the barriers with adopting mizuRoute at this point?
- Jim Edwards recently fixed an issue with pio
- For mizuRoute's standard (complex) grids, there are issues creating the mapping files with ESMF. Naoki has a script to do this, but there isn't a simple process to do this for new land grids.
- Special cases, such as irrigation, ice runoff
- See also https://github.com/NCAR/mizuRoute/issues?q=is%3Aopen+is%3Aissue+label%3Acesm-coupling
We'll probably try to push down the mizuRoute path.
- Erik/Negin: Adopt "black" for our python process? https://black.readthedocs.io/en/stable/index.html
- Bill: I have no objections to this in principle, though since I don't feel that the problems it solves are significant, I'm concerned about the need to be careful about how this is integrated into the workflow for it to do more good than harm – specifically in terms of doing big reformats that can result in merge conflicts. If someone can propose an easy way (that won't take much of anyone's time to set up) to integrate this robustly into the development workflow, then I'm open to it. It may also be worth coordinating with Jason Boutte at LBL(?) who has proposed something similar for CIME, so that this is done consistently across these projects.
- Erik: I'd like to propose that we set up a GitHub action that just verifies that checked-in code passes black. This could be similar to the action we have to make sure check boxes are addressed in a PR. It will require us to run our python code through black by hand beforehand.
- Erik/Negin: Also google standard: https://google.github.io/styleguide/pyguide.html
- Erik: Prepare for possible reversion by putting things that might be reverted in single commits (or a small number). And it's good to easily identify them. That's another advantage of having simple issues to provide that identification. If you create an issue for something that ends up being reverted the documentation for it will all be in one place, and it will be easy to find the commit hashes involved. This is also why I'd say it's best to not "--squash" merges so that you get this kind of finer resolution of commits.
- Erik: Half degree issues. We removed the 360x720cru maps and replaced them with the 0.5x0.5 maps that we already had. I ended up undoing that. Sam and I talked and had a couple of ideas. One is to try it with nuopc. The other is: since we can't remove the 360x720cru grid, why don't we remove the 0.5x0.5 grid? The latter failed when I tried it.
- Erik: Half degree PE layout.
- Erik: For our Dynamic lakes test I need to allow methane to be off, but use_nitrif_denitrif on for BGC mode. I wanted to only allow use_nitrif_denitrif/methane to be switched for FATES. Doing so would require putting together another test. What I'm doing now is that this is marked as a warning and the model dies in preview_namelist, but you can allow it with an ignore-warnings flag.
Bill is concerned that it needs to be all or nothing, in that modifying existing code can lead to conflicts.
Erik's proposal is that we have a GitHub action checker that marks a PR as not ready if it doesn't pass black checks. Then, if this fails, you can manually run "black" in your branch and push the results.
Then, if people want to, they can integrate black in their editor (or via a pre-commit hook, etc.), but they wouldn't have to.
Bill, Negin and Erik are all happy with this.
So there are two steps:
- Install a GitHub action that checks the code via black
- Add a Makefile rule that runs black on all the code under python/ctsm
Negin notes that there are a few options for running black:
- `black python/ctsm/`
- `black --check python/ctsm/`
- `black --check --diff {whatever…}`
Erik feels we should not apply this to contrib scripts, at least initially. Bill agrees.
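The check step could be wrapped roughly as follows. This is only a sketch: `black_command` and `code_is_formatted` are hypothetical helper names, and it assumes black is installed in the active python environment. It relies on `black --check` returning a non-zero exit status when files would be reformatted.

```python
import subprocess
import sys


def black_command(paths, check=True, diff=False):
    """Build the argument list for running black as a module."""
    cmd = [sys.executable, "-m", "black"]
    if check:
        cmd.append("--check")  # report only; do not rewrite files
    if diff:
        cmd.append("--diff")   # print what would change
    return cmd + list(paths)


def code_is_formatted(paths):
    """Return True if 'black --check' exits with status 0 for these paths."""
    result = subprocess.run(black_command(paths), check=False)
    return result.returncode == 0


# Show the command a CI step or Makefile rule would run for python/ctsm.
print(" ".join(black_command(["python/ctsm/"], diff=True)[1:]))
```

A GitHub action or Makefile rule could call `code_is_formatted(["python/ctsm/"])` and fail when it returns False; contrib scripts are left out simply by not passing their paths.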
We thought it would work to remove the 360x720cru grid (in order to pare down the number of source and destination files we need to support), but it failed when trying to run a case with this. We think the problem is a difference between whether longitudes are defined 0 to 360 or -180 to 180 – so there are inconsistencies with the domain file (not just in the labeling, but also in what is grid cell #1 in the ordering).
It probably still makes sense to only have one, but we'd need to figure out how to make that one work.
We might have more success just supporting this with nuopc, since nuopc doesn't require a domain file.
Part of the issue is that there are 0.5 deg raw datasets.
Erik: It seems that everything in CESM goes 0 to 360.
Sam: Should we have an error check in mksurfdata_map that checks this early so that issues like this are caught early (when creating the surface dataset), before you get to the run?
- It seems like the important thing is consistency with the domain file (or the mesh file in the case of nuopc). We could think about enforcing consistency for regular grids, though it wouldn't make sense for irregular grids.
Conclusions: Sigh. It seemed like a good idea to clean this up, but this is turning into a big rabbit hole. So we'll probably leave it for now.
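For reference, a small sketch of the kind of early consistency check Sam suggested, for regular grids only. The function names are hypothetical and the two half-degree longitude arrays are constructed here for illustration, not read from the actual grid or domain files.

```python
def lon_convention(lons):
    """Classify longitudes as '0_360', '-180_180', or 'ambiguous'."""
    if any(lon < 0.0 for lon in lons):
        return "-180_180"
    if any(lon > 180.0 for lon in lons):
        return "0_360"
    return "ambiguous"  # all values in [0, 180]: either convention fits


def to_0_360(lons):
    """Map longitudes onto [0, 360) without changing their order."""
    return [lon % 360.0 for lon in lons]


def check_consistent(grid_lons, domain_lons, tol=1.0e-6):
    """Raise ValueError unless both lists match cell by cell, in order."""
    if len(grid_lons) != len(domain_lons):
        raise ValueError("grid and domain have different numbers of longitudes")
    for i, (a, b) in enumerate(zip(grid_lons, domain_lons)):
        if abs(a - b) > tol:
            raise ValueError(
                f"longitude mismatch at index {i}: {a} vs {b} "
                f"({lon_convention(grid_lons)} vs {lon_convention(domain_lons)})"
            )


# Two illustrative half-degree grids with different conventions.
lons_pm180 = [-179.75 + 0.5 * i for i in range(720)]
lons_0_360 = [0.25 + 0.5 * i for i in range(720)]

print(lon_convention(lons_pm180), lon_convention(lons_0_360))
# Relabeling alone does not help: cell #1 is still a different cell.
print(to_0_360(lons_pm180)[0], lons_0_360[0])
```

The last line illustrates the ordering problem noted above: converting the labels to 0 to 360 leaves the first grid cell at 180.25 rather than 0.25, so the data would also have to be reordered.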
- Erik: Tools to create calling tree? Came up in CESM tutorial...
- Erik: What involvement should Peter have for new raw datasets coming in? Is he the scientific lead on this? When do we have him present to us? Should he be involved in the tools project with Negin and Sam?
We're not aware of tools, at least for Fortran.
Dave: if we had a tool that allowed you to dynamically jump between things that would be great.
Bill showed some aspects of this using Fortran language server (available in any editor) but not the full hierarchy for Fortran (as you can get in other languages like python).
There are a number of projects that are pushing towards 3km resolution, so we need to think about how to get a collection of raw datasets that can support that.
We'll have some more discussions on new raw datasets in general – who's involved, who (if anyone) will lead this....
For some work, Peter has put all of the raw datasets at the same resolution... we'd need to talk about that approach vs. what we currently do.
For CLASP, we'll need to come up with a characteristic length scale – which is essentially how connected different surface types are.
We had some discussion about supporting high resolution datasets in our surface dataset generation toolchain. At first this felt concerning. But actually, if we're talking about a relatively small number of resolutions, and on our own systems, then we can continue to limp along with the regridding toolchain we have. And even in terms of community support, as long as the number of grids isn't too great, we could treat this creation of surface datasets as a support we provide: that might be easier than trying to maintain OCGIS robustly for all users.
- Bill: Should we remove the statement in the ChangeLog that you need to check the PFS test? I do this because we said we would, but I don't find it at all useful: I feel like we'd really need to dig more into timing data to be able to make a useful comparison.
- Erik: Created tools/site_and_regional subdirectory and tools/tests for them as well as changing README files. I marked the old scripts as deprecated. Needs some more work to add unit testing and a description of the workflow using the subset_data and modify_singlept_site scripts and then we can remove the old scripts and tests for them.
- Erik: In ctsm5.1.dev052 I marked two NEON issues (#1429 and #1446) as fixed. Negin agrees with the second but disagrees that the first is completely resolved, so that should be reopened. Also note that it looks to me like there are a couple of outstanding NEON issues; just making sure you are aware.
- Erik/Will: Will proposes having MIMICS history variables with MIMICS as a prefix. See my note in #1318. https://github.com/ESCOMP/CTSM/pull/1318#discussion_r687289582
- Erik/Will/Sam: We met with Keith Lindsay about the Newton-Krylov spinup methodology he implemented for POP tracers. It looks like in principle it's doing something similar to what's being done in CN-Matrix, but in offline python code. The diff for CN-Matrix is 25k lines, while Keith has put in a similar size for his python solver repository. To support it for us might be a few hundred lines of YAML code and some specialized pre-conditioner python code.
- Erik: Note, I'm going to create surface datasets for TRENDY-2021 with latest ctsm-dev version (ctsm5.1.dev052) rather than the release branch, since now the best mksurfdata_map version is on the release branch.
The ChangeLog suggests checking performance changes for every tag, but this can turn into a rabbit hole and not be too useful (you really have to dive into the detailed timings for this to be useful).
We won't require checking performance with every tag; we'll just check it if we think performance is likely to have changed.
Erik: Created tools/site_and_regional subdirectory and tools/tests for them as well as changing README files. I marked the old scripts as deprecated. Needs some more work to add unit testing and a description of the workflow using the subset_data and modify_singlept_site scripts and then we can remove the old scripts and tests for them.
Moved PTCLM there and made what's likely to be the last tag of that before it goes away.
Negin points out: It can be challenging to have a totally generic functionality, because different use cases require overwriting different fields, and you have different data available, etc.
Note that this also relates somewhat to the work Sam is doing to overwrite fields for the simpler models project; plan is to reuse code for those two purposes.
For MIMICS, plan is to add a mimics prefix (or something like that) to each variable.
(Something like this will be even more important when we eventually move all parameters to an on-the-fly generated netcdf param file.)
Erik/Will/Sam: We met with Keith Lindsay about the Newton-Krylov spinup methodology he implemented for POP tracers. (This is something that he developed himself.) It looks like in principle it's doing something similar to what's being done in CN-Matrix, but in offline python code. The diff for CN-Matrix is 25k lines, while Keith has put in a similar size for his python solver repository. To support it for us might be a few hundred lines of YAML code and some specialized pre-conditioner python code.
Will is considering this for MIMICS. Keith made it sound like the Newton-Krylov code can handle soil BGC spinup pretty easily. Will feels like this is a potentially promising way to handle spinup.
We would lose the traceability piece of the matrix code, though none of us knows how to use that anyway.
Another advantage of the Newton-Krylov approach is that there is in-house expertise, which isn't as much the case for the matrix code.
Plan is to test the Newton-Krylov with the BGC code. Can compare answers, look at speeds, etc.
For MIMICS, this is appealing because we can get the spinup method working for our current BGC code, then use that same spinup method for MIMICS - separating the concerns of implementing MIMICS and doing the spinup.
The way Keith does this is: runs ocean for some amount of time, then spits out restart file (for final state) and history files (with time averages). Then the Newton-Krylov solver looks at the states and tendencies, then analytically solves to try to get equilibrium.
You then do this for a few iterations. May need to pass vegetation states, not just soil pools.
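To make the idea concrete, here is a toy sketch of the run-then-solve loop. It is not Keith's solver: it uses a made-up two-pool carbon model with invented rate constants, and a direct Newton solve with a finite-difference Jacobian instead of a Krylov method with a preconditioner. The quantity driven to zero is the drift of the state over one model segment.

```python
def run_segment(state, years=10, u=1.0, k1=0.5, k2=0.001, f=0.3):
    """Advance a toy two-pool carbon model with annual forward-Euler steps."""
    c1, c2 = state
    for _ in range(years):
        d1 = u - k1 * c1            # litter pool: input minus decay
        d2 = f * k1 * c1 - k2 * c2  # soil pool: transfer from litter minus decay
        c1, c2 = c1 + d1, c2 + d2
    return (c1, c2)


def residual(state):
    """Drift over one segment; zero when the state is in equilibrium."""
    end = run_segment(state)
    return (end[0] - state[0], end[1] - state[1])


def newton_spinup(state, iterations=3):
    """Newton iterations on residual(x) = 0 with a finite-difference Jacobian."""
    x = list(state)
    for _ in range(iterations):
        g = residual(x)
        cols = []
        for j in range(2):
            h = 1.0e-3 * max(1.0, abs(x[j]))  # perturbation for column j
            xp = list(x)
            xp[j] += h
            gp = residual(xp)
            cols.append(((gp[0] - g[0]) / h, (gp[1] - g[1]) / h))
        (a, c), (b, d) = cols  # Jacobian [[a, b], [c, d]]
        det = a * d - b * c
        # Solve J * dx = -g by Cramer's rule and update the state.
        x[0] += (-g[0] * d + b * g[1]) / det
        x[1] += (-a * g[1] + c * g[0]) / det
    return tuple(x)


print("after one 10-year segment:", run_segment((0.0, 0.0)))
print("after Newton spinup:      ", newton_spinup((0.0, 0.0)))
```

For this toy model the analytic equilibrium is (u/k1, f*u/k2) = (2.0, 300.0). Brute-force integration would need thousands of model years to bring the slow pool there, while the Newton iterations need only a handful of short segments. In the real workflow each residual evaluation is a model run, which is why the number of iterations matters.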
One other argument for this is that it may work more easily with FATES (Matrix would have been really challenging with FATES).
One possible downside with this method is that you need to maintain the state transitions (i.e., what fluxes flow into what states) separately in yaml & Fortran; but the benefits likely outweigh this. (This ties in with the preconditioner, I think, but don't really understand this....)
We should probably get a presentation and discussion from Keith L at a CLM-science meeting so we can all understand this better.
Ryan's sense is that FATES isn't amenable to matrix methods, because there are so many nonlinear terms. That's what Yiqi has said about MIMICS, too, though Keith L has given Will the impression that this method can work for MIMICS.
- Erik -- Can PPE branch be updated without disrupting the science that's already happening with it?
- Erik -- Keith can you give me some guidance on the Medlyn branch?
Keith: Probably needs to redo a control anyway given that there is new science since the last control.
At this point, the purposes the PPE branch is serving are:
- It has the CN matrix changes
- It's serving as a stable branch for Daniel's work
Maybe the easiest thing for Keith would be to make a one-off branch off of master that adds:
- CN matrix
- LUNA daylength change
Plan is for Daniel to incorporate these changes, if he agrees with them.
- Erik -- Shift time of standup meeting to something Negin can come to.
- Erik -- NEON changed the format of one of their files that breaks our code. How do we prevent them from doing that?
- Erik/Greg -- Running on LBL's cluster. Load balancing (probably just have the SE's talk this one over).
- Erik -- Jim has some trouble with some fire datasets for NEON that isn't obvious to me.
- Erik -- Methane/nitrif-denitrif and FATES. Methane was added as an option for FATES recently by Ryan. But, we are discussing how nitrif-denitrif requires Methane, doesn't that mean FATES requires methane? Is FATES without soil decomp? Can Methane be turned on without soil BGC?
- Erik/Bill -- Jim would like to do a tag that fixes some problems in the latest tag for NEON. Obviously, this means without testing on cheyenne.
Feeling is that these have generally been useful. Will move time to right after the CSEG meeting (Tuesday at 3 pm).
NEON changed the format of listing.csv, which breaks our code.
It turns out that Will asked for this change, and told them not to worry about backwards compatibility.
Moving forward, we will ask them to maintain backwards compatibility, but right now things are in flux.
Besides the listing.csv issue, there are a couple of other issues that are preventing NEON cases from running right now (one that Jim and Erik are talking about? and a zbedrock issue that impacts some sites)... working through them.
Will plan to do a tag fixing these issues.
Methane was added as an option for FATES recently by Ryan. But, we are discussing how nitrif-denitrif requires Methane, doesn't that mean FATES requires methane? Is FATES without soil decomp? Can Methane be turned on without soil BGC?
Ryan: FATES works with carbon-only mode BGC active.
Since we always want nitrif-denitrif on, we should always turn the methane model on, with or without FATES.
Will asks: if you are running carbon-only with FATES, why do you care about nitrif-denitrif?
- Ryan: we don't, really. But the coupling needs to be set up for nutrients (for E3SM for now), so would like to have this active and tested in the coupling with CTSM, too (otherwise would need to do it later).
Bottom line is that we'll remove the switch for methane. (For FATES, it shouldn't matter one way or the other.)
Ryan: one other possibility is that you could separate the portion of the code that is needed for nitrif-denitrif from the rest of the methane model. Then you could just run that. But feeling is that that isn't worth the time.
- Erik -- Military time for local history option? Should there be more checking to ensure the interval is contained within the averaging window?
Currently set up as Lxxxxx where xxxxx is a 5-digit number giving seconds in day.
Erik wonders if it would be better to do this in military time instead of seconds.
We do see some advantage to specifying this in military time – but it could be good to keep it consistent with other ways of specifying time and with how this is implemented in CAM. Since feelings aren't strong, and we aren't sure how much this will be used, let's keep it as is for now.
Should there be more checking to ensure the interval is contained within the averaging window? Probably not necessary.
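For comparison, converting between the two notations is straightforward. A sketch with hypothetical helper names, assuming the `Lxxxxx` form described above and a four-digit 24-hour `HHMM` string (leftover seconds are dropped):

```python
def local_time_to_hhmm(option):
    """Convert 'Lxxxxx' (seconds into the day) to a 24-hour 'HHMM' string."""
    if len(option) != 6 or option[0] != "L" or not option[1:].isdigit():
        raise ValueError(f"expected 'L' followed by 5 digits, got {option!r}")
    seconds = int(option[1:])
    if seconds >= 86400:
        raise ValueError(f"{seconds} is not a valid number of seconds in a day")
    hours, remainder = divmod(seconds, 3600)
    return f"{hours:02d}{remainder // 60:02d}"


def hhmm_to_local_time(hhmm):
    """Convert a 24-hour 'HHMM' string to the 'Lxxxxx' seconds form."""
    hours, minutes = int(hhmm[:2]), int(hhmm[2:])
    return f"L{hours * 3600 + minutes * 60:05d}"


print(local_time_to_hhmm("L37800"))  # 10:30 local time
print(hhmm_to_local_time("1330"))
```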
For CLM5, we limited bedrock depth to 8.5m for spinup reasons. Should think about whether to apply that limiting to NEON data.
We had a long discussion on the topic of 3rd-party library dependencies in our python tools. The basic conclusion is that we'll try to compromise: If a 3rd-party library adds significant value, we'll use it (with approval from the ctsm-software group), but if there is only minor benefit over what's available in the python standard library, then prefer to stick with the standard library and avoid 3rd party dependencies.
Negin asks about possible changes to inputdata file naming. She has two related proposals:
- Changing date stamp on the file to a version number (v1, v2, etc.)
- Putting the version number only on a directory rather than every file within it
Bill and Erik feel (2) could be feasible in some situations. Specifically, if everything in a directory is updated all at once, then it makes sense to put the date on the directory rather than every file within the directory. The possible issue is if you think you may need to update one or a subset of files within that directory and it isn't worth recreating the entire directory.
(1) would require a broader CSEG discussion. It seems like there are pros and cons of each....
So one way to decide what to do is: If you find a problem with a given dataset, will it be okay to just update the whole directory? If so, can version the directory; if not (because the files are big, for example), it may be better to version the individual files and not the directory.
- Dave: Crops in FATES-SP https://github.com/NGEET/fates/issues/760
- Bill: GetGlobalWrite
https://github.com/NGEET/fates/issues/760
Can we have FATES handle crops in SP mode similarly to what's done in non-FATES – i.e., treating crops as grasses, but with a different subgrid structure – one crop per column? This way crops can use the same physics as natural veg in SP mode.
We may need to use Ryan's new column-level logical flag in more places.
Need to check some other subgrid setup (e.g., initSubgridMod, subgridMod) and setup of filters, as well as any new idiosyncrasies that are coming in with the FATES-SP PR.
Need to double check that there are no assumptions in FATES that there is only one column that it's operating on, particularly in terms of subgrid weights.
Rename: WritePointContext
Places to make sure we use this output:
- BalanceCheckMod
- CNBalanceCheck
- CNPrecisionControl
- Urban radiation
Could look at all endrun calls... but there are a lot, so maybe defer this.
- PR 1414
- Sunniva: news from Norwegian group
- Erik: Question from CESM Diagnostics discussion from Matvey "Can there be links added for diagnostic tools on the CTSM GitHub? The same way there are links to setups and tutorials in the description and on CESM2 page or CLM wiki? Especially the tools that are useful for the unsupported runs?"
- Ryan: When to use aliases? (both formal aliases and variables acting as aliases) Case study: https://github.com/NGEET/fates/pull/738/files#diff-e5e2459b995e00503077bce86074962e2acc4780ca00f248242b153527655c42R866
- Fates SP mode ERS failing exact restart comparison (https://github.com/ESCOMP/CTSM/pull/1182#issuecomment-867350137)
- All: Look at issues
- All: Go through upcoming tags and prioritization
How can we improve our onboarding process?
Negin suggests more step-by-step guide on joining different GitHub repositories, Slack channels, etc.
Sunniva likes the good-first-issue tag.
Can point to the postprocessing documentation. But porting that to other machines is non-trivial. Keith always runs the diagnostics within the context of the postprocessing. For a while, he was maintaining the ability to run them outside that context, but that became difficult.
So for now, unfortunately, we can't really support this outside of standard uses on cheyenne.
We can try to work with Matvey to find some solution that works for him.
ILAMB can be run on other systems. This can be run outside of the postprocessing.
Are there any thoughts / plans about replacement for the LMWG diagnostics package? Not really. It may be that what Negin is building up for NEON can be expanded and merged with what's happening with AMWG. Daniel is also interested. But so far there's no funding to do the needed work.
Some key takeaways:
- Use associates to do renames
- Don't change the name (to facilitate grepping the code)
- This generally applies to subroutine argument names as well
We generally feel that associates are helpful to shorten mathematical equations and make them more understandable.
Can shorten names like EDPftvarcon_inst that are used in many places throughout the code.
- Erik: Can we keep the PPE branch up to date with main-dev? That will help with issues that crop up on the branch.
- All: Look at issues
- All: Go through upcoming tags and prioritization
- Erik: I think we should move the Monday "standing meeting" to Tuesday at 11:30?
- Erik: Should I delete Ryan/Greg's branches after I merge their PR? I do that for mine...
- Erik: Bright spot -- our process caught a problem
- Sam's priorities
- All: Look at issues
- All: Go through upcoming tags and prioritization
It would be helpful to have Sam work with Negin on the python toolchain work.
It would also help Negin to have Sam available to consult on some WRF-CTSM things.
It could be good for Sam to reserve his final hours on this project for this sort of consulting-type work.
We need to figure out some path forward.
Bill: can see three paths forward (there may be others):
- Use OCGIS, at least for now
- Come up with some way(s) to bypass the need for any super-high-resolution (e.g., higher than about 3' resolution)
- Develop our own tools, maybe based on ideas from how WRF does the remapping?
Sam had issues with OCGIS historically: every time a few months went by, something would be broken.
How close is OCGIS to working?
- Sam: the last he heard, the new metadata (area variables) we need should be included. Sam hasn't tested the new version yet, so it might require a little help with getting the final piece working. But it probably isn't a huge amount of work to get things finally working
Negin wonders if xESMF would provide the capabilities we need.
Path forward:
- Negin will reach out to see if xESMF would be an option (both to xESMF developer and getting Rocky's thoughts)
- If not, we will try to integrate OCGIS for now; long-term plan still TBD based on OCGIS long-term support outlook
- Erik: We had a go at doing a tag together with Ryan and Negin. Was it helpful?
- Erik: We tried a round of a standup meeting with Bill, Naoki and me. We did keep it short and Bill and I worked on something together afterwards.
- Erik/Ryan: Plan to meet on Friday to go over his PR about FATES running CH4, which starts introducing things that will also be needed for running FATES with crops. Does anyone else want to participate?
- Erik: I'd like to look at improving our process on a regular basis. How about we schedule a time every 3 months or so? In this meeting? Separately?
- Bill: For Sam's PR: should he update surface datasets to keep things consistent?
- All: Look at issues
- All: Go through upcoming tags and prioritization
Mixed feelings on this. But it sounds like it will be a while before we're ready for other surface dataset updates, so it might make sense to have Sam go ahead and create new datasets now so we have baselines for other changes.
Though actually, we could wait a little bit to see what happens with the workflow project, in case other answer changes to the surface dataset might be imminent.
Here are the Better Scientific Software Tutorial talks I recommend everyone watch. The entire series is below in the April 29th meeting notes. Note that if you start watching the first, it will normally queue the next part in the series if you have autoplay on. Watching all of these will take about an hour and a half.
- Part1 https://youtu.be/msWtgEw2VhY (18 minutes) Introduction "Better Software to increase scientific productivity" Science through computing is only as credible as the software that produces it. Challenges and best practices for scientific software.
- Part2 https://youtu.be/7bOblsb-6OA (21 minutes) Agile Methodologies for Software Development. There are other methodologies, but Agile is the closest to what we actually do, and makes the most sense for long term scientific projects. We would benefit from implementing some of the lessons learned in Agile methodologies (There are multiple versions that go under the "Agile" umbrella).
- Part5 https://youtu.be/yv4s01_VXWA (19 minutes) Scientific Software Design. Design principles for HPC Scientific software. Separation of concerns.
- Part8 https://youtu.be/8P638BH6PPs (20 minutes) Improving Reproducibility through better Software Practices. Why this is critical in scientific software. The credibility of your science depends on the credibility of your code and software practices.
- Part9 https://youtu.be/2zZeRdUqrIE (8 minutes) Summary. Work on your pain points.
Bill really likes the idea of having an onboarding checklist / guidance. This would include both workflow things (familiarity with our git process, testing, etc.) and aspects of the code that we feel are important to understand before an SE can be independent (getting familiarity with adding things to namelists, filters, adding history fields, etc.). Idea would be to have some process for bringing new developers up to speed, and having a sense when they have learned the essentials to be fairly independent.
- Scientists could also benefit from this, though they may not need to go through all of it.
Greg: in a previous position, they developed an onboarding process. Part of that was identifying some tasks that would be good for new people to take on.
What about onboarding of users? That's important, too, though probably needs a different set of guidance.
Part of the issue is that, while we have a lot of documentation for new developers, it can be hard to navigate it all. So part of this could be better organizing our documentation, and/or having a page that points to various guides.
Different categories:
- Software Engineer
- Scientific Model Developer
- Scientific Model User (User would include papers and UNIX guide, mailing lists, meetings, etc.)
Greg: I'm assuming CESM-Lab will be a helpful onboarding tool as well. Containerization of ctsm/fates has been helpful in shortening the time to first run for new post docs.
Ryan: we have this walk-through that is based off of the CTSM tutorial workshop, I hope this has helped and facilitated on-boarding new users: https://github.com/NGEET/fates/wiki/Running-FATES:-A-Walk-Through,-February-2019
Some suggestion of having some guidance on editors, including pointing out various modern editor features that can help with productivity.
Could mention different editors (https://andreyorst.gitlab.io/posts/2020-04-29-text-editors/ is a helpful rundown), and possibly have example setups of editors. See also https://fortran-lang.org/learn/os_setup/text_editors
We'd like to have more continuous testing. However, it is challenging to make this work with our workflow that so heavily relies on baseline tests. This requires further discussion.
Unit testing and test driven development can be very useful. But writing maintainable unit tests can be very hard. So Bill isn't sure if it's worth trying to get scientists trained and experienced enough to do this.
Negin suggests code coverage tools.
- Erik: at the tutorial: They talked about using GNU gcov.
- Negin: see https://drive.google.com/file/d/1U-CVIpaDjbEqPiEjn2964-7X1SWDxQ8r/view
Bill's thought afterwards: Stefan Muszala spent a lot of time trying to get code coverage statistics for our test suite a while ago. The end result was much more confusing than helpful and we gave up on it. This isn't to say we shouldn't try again, but I'm wary of this becoming another time sink based on this experience.
Sunniva points out that it would be useful to provide some training / guidance to scientists on the kinds of things that are worth manually checking. For example, if you decrease a certain parameter, then you expect X output variable to go up.
- And Dave points out: you need to do it again at the end of the process, as the code is getting finalized.
Dave also suggests leveraging the diagnostics package and putting some tooling around that to screen for large changes in certain variables.
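A rough sketch of what such a screening step could look like, assuming the diagnostics have already been reduced to one number per variable (e.g., a global annual mean). The function name, the threshold, and the values below are all made up for illustration.

```python
def flag_large_changes(baseline, candidate, rel_threshold=0.1):
    """Return {variable: relative change} for changes above the threshold."""
    flagged = {}
    for name, base in baseline.items():
        if name not in candidate:
            continue  # variable missing from the new run; not compared here
        scale = max(abs(base), 1.0e-30)  # guard against division by zero
        rel = (candidate[name] - base) / scale
        if abs(rel) > rel_threshold:
            flagged[name] = rel
    return flagged


# Made-up summary values for a control run and a run with new code.
baseline = {"GPP": 120.0, "TLAI": 2.0, "TWS": 1500.0}
candidate = {"GPP": 150.0, "TLAI": 2.05, "TWS": 1500.0}

for name, rel in flag_large_changes(baseline, candidate).items():
    print(f"{name} changed by {100.0 * rel:+.1f}%")
```

A developer would still need to judge whether a flagged change is expected; the screen only points at where to look.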
Dave: the other thing is that scientific developers tend to test only in the configuration they're interested in. So it would help to have them run a few compsets in different configurations (e.g., with a short test list).
Adrianna: Also in FATES it would be nice to set up some parameter checks for “realistic, won’t crash the model” bounds. This could be run before the job even gets submitted.
- Erik: For CTSM the PPE project will likely help some of those.
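A sketch of the pre-submission bounds check Adrianna describes. The parameter names and bounds here are placeholders, not actual FATES parameters or recommended ranges, and `check_parameter_bounds` is a hypothetical helper.

```python
def check_parameter_bounds(params, bounds):
    """Return a list of messages for parameters outside their allowed range."""
    problems = []
    for name, value in params.items():
        if name not in bounds:
            continue  # no bounds defined for this parameter
        low, high = bounds[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems


# Placeholder bounds and values, for illustration only.
bounds = {"example_vcmax": (10.0, 150.0), "example_leaf_longevity": (0.1, 10.0)}
params = {"example_vcmax": 500.0, "example_leaf_longevity": 1.5}

for message in check_parameter_bounds(params, bounds):
    print("WARNING:", message)
```

Running this before the job is submitted means a bad value is reported immediately rather than after the run crashes.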
Idea of being able to run parameterizations interactively.
Ryan points out that FATES does this: https://github.com/NGEET/fates/tree/master/functional_unit_testing/
Adrianna has found it very helpful to convert her tasks to user stories. She notes that user stories are higher level than tasks – but may have tasks under them. (Adrianna uses https://clickup.com/)
One question is whether we should include reviewing FATES issues in the CTSM-software meetings.
- Ryan supports this
- Erik suggests: I think we cover this by creating ctsm issues for fates issues we want to discuss and have ctsm input on.
Dave suggests (and others agree): make sure we have plenty of time for prioritization of upcoming tags: let's leave more time for that.
We sometimes go on too long on a given topic. Possible solution: Dave will tell us to stop when we've gone on too long, since he's good at that. We want to make sure we leave enough time for going through PR / issue priority (at least about 1/2 hour).
Also, consider covering weekly agenda items by priority instead of just the order they were added?
Should we try something like scrum – or more generally, a short daily check-in / stand-up meeting? We see that there could be some value in that. But that is most useful when people are all working on the same big task.
- Greg: one of the big values of scrum is giving the opportunity to share roadblocks, particularly when your problem involves interfacing with other groups.
- Erik proposes trying this with a 15-minute Monday morning meeting. Mainly SEs, but scientists can join if they want to.
Possible idea of: a couple of times a year, setting aside a week or two to all go through the issue backlog: dealing with ones that can easily be dealt with, closing others.
- Ryan/Will: FATES, nutrients & CH4 in E3SM
  - Short term, do we already require CH4 in BGC runs?
  - Medium term, similar PR in CTSM?
  - Long term, decouple CH4 from nutrient code?
- Negin: For the purpose of CESM-Lab: Are the NEON PRs going into cesm2.2 or cesm2.3? Or would we like to have a separate version of CESM-Lab for NEON?
For nitrification / denitrification, you need ch4 turned on. We need some changes so that this is done correctly with FATES. Ryan has some changes just about ready.
There are a couple of pieces of this:
- FATES needs to provide the necessary fields to allow the ch4 model to run (this wasn't as bad as Ryan originally feared)
- Nitrification / denitrification don't work correctly in the model if ch4 model is off, because the methane model is responsible for soil oxygen calculations. E3SM silently allowed this; Erik thinks CTSM has an error check to prevent running this way.
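The consistency rule in the second point can be written as a simple check. This is an illustrative sketch only: the function and exception names are hypothetical, and it does not reproduce the actual check in the CTSM namelist build.

```python
class NamelistError(Exception):
    """Raised for an inconsistent combination of namelist settings."""


def check_nitrif_requires_methane(use_nitrif_denitrif, use_methane):
    """Abort if nitrification/denitrification is on while methane is off.

    The methane model supplies the soil oxygen calculations that
    nitrification/denitrification depends on.
    """
    if use_nitrif_denitrif and not use_methane:
        raise NamelistError(
            "nitrif_denitrif is on but the methane model is off; "
            "soil oxygen would not be calculated"
        )


check_nitrif_requires_methane(use_nitrif_denitrif=True, use_methane=True)
try:
    check_nitrif_requires_methane(use_nitrif_denitrif=True, use_methane=False)
except NamelistError as err:
    print("error:", err)
```

Failing loudly here avoids the silent behavior noted for E3SM.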
Long term, Will wonders if we should take soil oxygen concentrations out of the methane code – but that would be a big project.
Erik points out that there are namelist items for turning nitrif_denitrif on/off as well as methane on/off. This was for the sake of supporting CN mode. But since we're deprecating CN mode, should those be hard-coded to on?
- Will: probably, but need to think about how MIMICS would interact with that. Probably MIMICS would use the same ch4 & nitrif_denitrif schemes.
Ryan asks if there is any liability with having ch4 on in a coupled run? No, because we don't pass ch4 to the atmosphere.
NEON support will be in CESM2.3, not CESM2.2.
Erik asks if we should have additional testing for CESM-Lab. Negin doesn't think that's needed.
Erik will use v3 of the 0.1 deg ocean mask, which puts the Caspian Sea on land.
- Erik/Greg: Wondering if there could be a project page for the single-point work? (https://github.com/ESCOMP/CTSM/projects/34)
- Erik: Short term tool chain work vs. the long term project. Let's coordinate...
- Erik: Updated CTSM-FATES wiki page. Should Erik keep doing the last bit for FATES tags? Does Greg want to do this on his own?
- Will: Validation of the CESM/CMEPS/CDEPS/NUOPC configuration (to be used in CESM2.3), compared to default CESM2.2 cases (mct-data-model), discussed at co-chairs. Who's doing this? What do we need to communicate to model users?
- Erik: half degree lake problem. Was there a reason to use the ORCHIDEE landmask? Is using the 10th degree ocean ok here?
- Erik: Standards for input files that AMWG has. I suggest we adopt something similar. https://www.cesm.ucar.edu/working_groups/Atmosphere/required-meta-data-amwgcam-datasets.html
- Erik: drydep design going on. Currently land unit types are independently being read inside of CAM (with different datasets than the CTSM subunit structure). This needs to be handled in a different manner with NUOPC. Will likely need to send land data to CAM.
- Negin: I am wondering if there could be a neon folder (for example under python) in CTSM repository, which includes the sites and scripts for running over all the sites, etc. It has some advantages and disadvantages... I am not sure what would be the best way forward...
- Negin: Taking subset_data.py out of tools/contrib... what needs to be done, should we pursue this?
- Negin: Does CIME have any python infrastructure for submitting jobs on Cheyenne, Casper, etc.?
- Erik: The SEA conference recordings are now available. Here are the Better Scientific Software Tutorial talks. Note if you start watching the first it will normally queue the next part in the series if you have autoplay on. Each part is about 15-20 minutes, last one only 8 minutes. Slides for all are here: https://bssw-tutorial.github.io/events/2021-03-25-iss.html
- Part1 https://youtu.be/msWtgEw2VhY Introduction "Better Software to increase scientific productivity" (ALL) Science through computing is only as credible as the software that produces it. Challenges and best practices for scientific software.
- Part2 https://youtu.be/7bOblsb-6OA Agile Methodologies for Software Development. There are other methodologies, but Agile is the closest to what we actually do, and makes the most sense for long term scientific projects. We would benefit from implementing some of the lessons learned in Agile methodologies (There are multiple versions that go under the "Agile" umbrella). (ALL)
- Part3 https://youtu.be/7VpYJIsr8bc. Git workflows. (This is perhaps one only the SE's should watch?)
- Part4 https://youtu.be/Rt7qF_lOCNE Software testing. Verification vs. Testing vs. Validation. Code coverage tools gcov. (All, but some parts are mainly SE)
- Part5 https://youtu.be/yv4s01_VXWA Scientific Software Design. Design principles for HPC Scientific software. Separation of concerns. (ALL)
- Part6 https://youtu.be/23DKQx1FRRg Software Testing Part II. Balancing test suite. (Should just SE's watch?)
- Part7 https://youtu.be/bbtBf_5w67Q Refactoring Scientific Software. (All, but some parts are mainly SE)
- Part8 https://youtu.be/8P638BH6PPs Improving Reproducibility through better Software Practices. Why this is critical in scientific software. The credibility of your science depends on the credibility of your code and software practices. (ALL)
- Part9 https://youtu.be/2zZeRdUqrIE Summary. Work on your pain points. (ALL)
All talks: https://sea.ucar.edu/conference/2021/agenda
The NEON work is a piece of the single point work, though not all of it. Note that some of the NEON work will be more generally applicable.
Erik, Negin & Jim have been putting in some capabilities that, while targeted to NEON, are generally applicable to any single point run.
In terms of making a separate single-point project: feeling is that's unnecessary. Let's stick with the existing NEON project. Can add separate issues for things needed for single-point FATES as needed, and can add those to the NEON project board if applicable.
Negin & Sam are focused on fixing a few remaining things. mkmapdata.py is working.
One issue is related to job submission. Bill, Negin and Erik have all written some job submission stuff. We should ideally settle on one set of infrastructure to handle this. There is some overlap with what cime does, though cime may have too much machinery (and being tied to a case) for what we need here.
Keith is going to do a climate-length run of this to validate CMEPS against MCT.
Erik: Sam's recollection was that we used a landmask that was needed for the specific runs for which that grid was created (TRENDY). But it sounds like that reason is no longer important.
Dave raises the question of whether the CRU data are available over lakes, but his sense is that they are.
Currently land unit types are independently being read inside of CAM. Erik suggests that we instead have CTSM pass the vegetation fractions.
Dave suggests getting Keith involved in this discussion.
Thought is:
- Top-level scripts under tools/neon
- Python modules under python/ctsm/neon
Some rationale:
- Keep python/ctsm focused on a python module/library
- Top-level scripts should be in a location that doesn't have the language in its name
We probably do want this moved out of tools/contrib.
To move this out of tools/contrib, we should have some kind of testing done with it.
We could have tools/subset_data, with the internals then moved into python/ctsm.
Let's aim for the meeting two weeks from now to talk about this.
- Will/Erik: Can we remove soil CN modules, testing, etc (the CN versions as opposed to the BGC versions which are CENTURY based)? This relates to #1340.
- Erik: Note I had a PR with a checklist item in a resolved conversation so it was hard to find without Bill's script gh-pr-query. Should we provide more support for that tool?
- Bill: I'm curious: I have trouble finding checklist items even in non-resolved conversations; do you have a way to find those without this tool?
- Erik/Bill: Should all tags have a PR?
- Bill: I don't mind doing this if others would find it more useful than bothersome (getting the extra notifications); I'd propose simply copying the ChangeLog entry into the PR comment immediately before merging.
- Erik: Accumulation variables happen even when they aren't needed. I think this is probably OK as it would take work to do it in a reasonable manner. The straightforward way would be to add logic embedded into the base classes that do the accumulation -- but this would be a mess.
Will: The old CN modules (from CLM4) aren't as good scientifically – for example, CN not vertically resolved. His impression is that few, if any, people are using these. Maintaining this is a maintenance issue, and makes the parameter file more confusing.
General feeling: probably fine to remove it.
- Wrap up interpolation issues with Bill
- Try to wrap up outstanding mapping PRs (now that the ESMF group has provided some fixes)
- WRF-CTSM sensitivity (NWP vs. SP): main thing is making sure the SP run we did is correct
- BGC spinup?
- If time, work on coupling to a different atmosphere model like COSMO?
Maybe not necessary.
One thought is: for accumulation variables specific to some science parameterization, have the accumulation variable stored in the science module that needs it (rather than in some central place like TemperatureType).
- Greg/Erik: Automatic testing of fates test list when tags made on cheyenne?
- Greg/Erik: FATES tag about snow
- Erik: FYI: mizuRoute tests are restarting correctly now! Will call its test list mizuroute. Why did we get away from the "aux_" prefix?
- Erik: Upcoming tags, does NEON have priority above others? Also our next CTSM tags go into CESM2.3.alpha04 now right?
- Bill: l2g_scale_type follow-ups
- Better name?
- Named constants
- Will make a project
- Priority?
- ERI issue?
Three purposes:
- Better coverage of FATES in our testing: may be handled by adding to the aux_clm test suite
- Having baselines available
- Catching things earlier
Question about how this will work in terms of workflow... thinking about how we moved away from post-tag testing in the past.
But it sounds like this would have value in this case: we would expect these tests to pass almost all the time (so not necessary for the person making a CTSM tag to check the results), but it's useful for the sake of baselines.
And actually, they would do something a little different from just testing on the latest CTSM tag: they would probably want to plug in the latest FATES tag to the context of the latest CTSM tag.
Some things they could leverage:
- cime nightly aux_baseline testing run by Chris Fischer
- auto-emailer cron job that looks for new tags
- Jim Edwards may have some similar things
Overall feeling: this seems worthwhile to move forward with.
Rename: landunit_mask
Agreement about moving to named constants
- Erik: Move talking about Better Scientific Software two weeks down. Hopefully, recordings are done by then.
- Erik: Idea from SEA Tutorial -- work on one "pain point" in the SE process at a time.
- Erik: Highly recommend the SEA Tutorial on Improving Scientific Software. We should consider watching the recording together. They gave out some resources as well that we might benefit from. I should likely go through it first.
- Erik: Another idea they had was to have checklists for onboarding and offboarding of people that contribute to the project. We have lots of people outside NCAR that contribute that we need to spin up.
- Erik: Another idea they had that I think maybe we have been able to accomplish is separating code into "fast changing" and "slow changing". People have to meet a higher bar to contribute to the latter. But, also trying to separate code into parts for different domain experts. So software engineers work on parts having to do with parallelism and scientists work on the parts that directly have to do with their science, for example. If the code is mixed it's hard for anyone to accomplish anything. Overall I think that's something we've been able to do.
- Bill: Following up about the isotope email
- Erik: Thoughts on GPUs for CTSM? Negin will be attending a workshop, will be good for us to hear about it.
GPUs make most sense when you have small kernels of code that are responsible for a large portion of the computational cost of the model, ideally without much conditional logic. Uncertain how well this would work for CTSM.
- Erik/Bill: We need to start transitioning to using the NUOPC driver
- Erik: PRTAllometricCNPMod.F90 issues see #1129. Is there something beyond Fortran 2003 that is used here?
- Erik: FYI. Matrix spinup sequence is pretty good right now and consists of the following: CLM_ACCELERATED_SPINUP="on"; CLM_ACCELERATED_SPINUP="sasu"; use_matrixcn=T,use_soil_matrixcn=T; normal mode
- Erik: Should we turn reseed_dead_plants to true when spinup is done? Which steps should that be automatic for?
- Bill: FATES running mean: does this allow for restarting in the middle of an interval? It looks to me like the implementation isn't amenable to this.
- Bill: type labels on issues (one type; types that are green with a leading -)
- Bill: is the wiki page on answer changes still worth keeping up to date? If so, I can go back through the last 20 tags and update this.
Over the next few months, we'll be migrating to using the NUOPC driver.
The NEON single point workflow will only work with NUOPC.
Some things still needed for NUOPC:
- Validation
- Maybe some remaining performance things
- coupler aux history files
When should reseed_dead_plants be done? Dave: Only when going into AD spinup by default. Thinks this should not be done in "sasu": if they can't survive AD spinup, they should be dead.
Erik suggests setting reseed_dead_plants to true by default for the start of AD mode. Dave thinks that is reasonable, and may already be done. But he also notes that we're not at the stage quite yet of codifying this.
We're not sure whether the post-sasu step just needs to be one time step, or longer.
We had some discussion about needing to support mid-period restarts.
Ryan plans to make things default to the exponential approximation of a running mean like we do in CTSM, but provide an option to do a real running mean.
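To make the restart trade-off concrete, here is a minimal sketch of the two options. It is not the CTSM or FATES implementation; the class names and the fixed 1/window weight are assumptions for illustration only. The exponential form carries a single value across a restart, while a true running mean has to save its whole sample buffer to resume mid-interval.

```python
from collections import deque


class ExponentialRunningMean:
    """Exponential approximation of a running mean (hypothetical helper).

    Only `value` has to be written to a restart file, so restarting in the
    middle of an averaging interval is straightforward.
    """

    def __init__(self, window_steps):
        self.window_steps = window_steps
        self.value = None

    def update(self, sample):
        if self.value is None:
            # First sample initializes the mean.
            self.value = float(sample)
        else:
            weight = 1.0 / self.window_steps
            self.value = (1.0 - weight) * self.value + weight * sample
        return self.value


class TrueRunningMean:
    """Exact mean over the last `window_steps` samples (hypothetical helper).

    The whole buffer would need to be saved on restart to resume mid-interval.
    """

    def __init__(self, window_steps):
        self.buffer = deque(maxlen=window_steps)

    def update(self, sample):
        self.buffer.append(float(sample))
        return sum(self.buffer) / len(self.buffer)
```

The difference in restart state (one number versus a full window of samples) is the main practical cost of offering the real running mean as an option.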
- Ryan's presentation on software practices
- Erik: When should we move from "master" to "main" for github? Should we send a message to ctsm-dev talking about it? Any other list to send to? I talked about this for CESM in a CSEG meeting, different components are doing it at different times. CAM and MOM have already done it.
Ryan: a big positive of CTSM is that there is a mission-driven mindset (as opposed to a bunch of independent mercenaries who just are focused on their own work).
Ryan's suggestion: zoom office hours, where people can join if they have questions
Dave raises the question: finding the balance of who should be attending software-oriented meetings.
Some questions of workflow: We've been relying on the GitHub PR workflow. To some extent, this has had positives. But there are also problems with this asynchronous workflow, where feedback tends to happen late in the project.
Sunniva points out that a lot of scientists lack some basic software development skills – a big one being git, but also Fortran as well as the CTSM code base. So there's a lot that people need to learn before they can even get to the point of opening a PR.
Dave: we can consider having a session at every working group meeting talking about git / software development workflow.
Sunniva and Bill (and others in Sunniva's group) both felt that the pair programming last week (for ozone) was really effective (and fun!).
General agreement that we should try more pair programming.
Question of at what stage pair programming would be most effective – given that science development often involves a year or more of experimentation.
- Maybe the answer is to be involved at a few stages: in the initial design then a couple of other times along the way.
- Also, to some extent we should just try some things and see what works.
Also continue to build a focus on rewarding code development – e.g., through the Andrew Slater award.
Whenever is useful for people using CTSM-FATES.
Note: Could run just the subset of the aux_clm test suite that is FATES tests (e.g., by using query_tests, grepping for fates and dumping it to a file that can be run) – but then you need to set up your baseline directories with sym links to the old baselines.
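As a rough illustration of the "grep for fates and dump to a file" step, the sketch below filters a list of test names in Python. It assumes the test listing has already been reduced to one test name per line; the function names are hypothetical and nothing here calls query_tests itself.

```python
def select_fates_tests(test_names):
    """Keep names that mention FATES (case-insensitive).

    Order is preserved and duplicates are dropped. Assumes one test name
    per entry, which is an assumption about how the listing was prepared.
    """
    seen = set()
    selected = []
    for name in test_names:
        name = name.strip()
        if name and "fates" in name.lower() and name not in seen:
            seen.add(name)
            selected.append(name)
    return selected


def write_test_list(test_names, path):
    """Write one test name per line to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(test_names) + "\n")
```

The baseline directories would still need to be set up separately, as noted above.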
Maybe a good time frame for this is when we wrap up CTSM5.1 and make that switch. The main thing that needs to be done is changing anything needed in the documentation.
- Erik: High resolution struggles. Pat C. is working on 3-km global datasets. The landuse time-series file will be 1 TB. Since the data that it represents is really only smoothed data from a much lower resolution, this is kind of a waste of disk space. I'm not sure of the answer to this, but we may need to think about a different way forward for very high resolution work than what we currently do.
- Erik: CN-Matrix. The spinup is done by saving the starting state on restart and then it impacts the solution every nyr_forcing years on calendar change. This should be refactored to use the accumulation capability, which is more general and able to work on restart, etc. As implemented now it would require adding a new "instant" option to the accumulator.
- Erik: Another issue with CN-matrix is some of the spinup modes that we are no longer using -- hence we aren't testing. But, there is code specific to how they are handled so if we want these other modes to work we should test them. Or perhaps remove them if they aren't needed (this is the iloop stuff). Chris thought the "fast mode" spinup isn't useful (where it updates every year) because the differences in 20 years of climate are important.
- Erik: How should we handle c13/c14 arrays in the Matrix modules? On izumi in order to have them in associate statements they need to be allocated, but they aren't if use_c13/use_c14 is off.
- Erik/Naoki: mizuRoute. We have a f19-MERIT-Hydro configuration working. I wonder if CTSM is too coarse for the MERIT-Hydro grid? This configuration is such that mizuRoute becomes the bottleneck. One thing that needs to be done for mizuRoute is to run hybrid parallelism (MPI tasks and OpenMP threads). Technically CTSM isn't quite as good with OpenMP as with MPI, but if mizuRoute is the bottleneck that will be the way to run it.
- Erik: Was there anything that came up in the LMWG meetings that we should discuss?
- Erik: Turns out the cmip5 simulations were spoiled because they started in 2006, but the start_year for streams was set to 2015. I'd like to think of ways of preventing this type of problem. The main thing I've come up with is to make sure streams are set to "limit" rather than "cycle", so once they get to the end they don't go back to the beginning. The extend option is a little better, but if you set the start year wrong, it'll use that value until you get to the start year you set.
- Erik: NEON work. Jim thought of a great idea to overload the PTS_MODE option in cime so that you can use it for single point simulations so that you don't have to create domain files for single point simulations. This will require changes in cime as well as CTSM. See https://github.com/ESMCI/cime/pull/3868.
- Erik: izumi is now really hard to work with because most simulations fail the first submission, so you have to resubmit many times. It used to be that only a few would fail, and a single resubmission was enough, now it's constant. I sent a message to help at CGD ISG, and haven't heard back.
- Negin: WRF-CTSM questions on CESM forums. Do we want to keep WRF-CTSM threads under CTSM, CLM, MOSART, RTM in CESM forums?
1 TB of data for landuse time series for 3 km runs.
Would it make sense to use streams and interpolate on the fly? One issue is that we want the landuse timeseries to be consistent with the surface dataset in terms of our custom interpolation, at least for the PCT fields. Things other than PCT could be well suited to streams files.
Dave also questions whether we need transient land use for a 3 km run: they will probably run for a few years at most. This is probably the answer for now.
For now, we can open an issue and tell the high res group that we need resources and/or time to figure this out long-term.
There are new high-res soil data at 250m. We should consider pulling that in. (They also have a lower-res version.)
This also has higher vertical resolution than our current product. We may want to use the native soil layers then let mksurfdata_map map them to our soil layers. (Our current soil data has an intermediate soil layer structure that no longer really makes sense.)
This also includes pH, which eventually may be important.
Should we think about streams for soil texture? Dave thinks maybe no, long-term: We may move to more sophisticated treatment / pre-processing of soil variables which may be better suited for preprocessing in our surface dataset generation rather than using a streams file.
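For the idea of keeping the native soil layers and mapping them onto the model's soil layers, one simple option is an overlap-weighted average by layer thickness. The sketch below shows that technique only; it is not what mksurfdata_map does, and the function name and interface-depth convention are assumptions.

```python
def remap_layers(src_interfaces, src_values, dst_interfaces):
    """Overlap-weighted mean of a layered property onto new layers.

    Interfaces are depths (e.g. in m) increasing downward, with
    len(interfaces) == number of layers + 1. A destination layer with no
    overlap with any source layer gets None.
    """
    dst_values = []
    for top, bottom in zip(dst_interfaces[:-1], dst_interfaces[1:]):
        weighted_sum = 0.0
        total_overlap = 0.0
        for i, value in enumerate(src_values):
            # Thickness of source layer i that lies inside [top, bottom].
            overlap = min(bottom, src_interfaces[i + 1]) - max(top, src_interfaces[i])
            if overlap > 0.0:
                weighted_sum += overlap * value
                total_overlap += overlap
        if total_overlap > 0.0:
            dst_values.append(weighted_sum / total_overlap)
        else:
            dst_values.append(None)
    return dst_values
```

A thickness-weighted mean is reasonable for quantities like percent sand or clay; other variables (pH, for example) might need a different treatment.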
Erik's intro: The spinup is done by saving the starting state on restart and then it impacts the solution every nyr_forcing years on calendar change. This should be refactored to use the accumulation capability, which is more general and able to work on restart, etc. As implemented now it would require adding a new "instant" option to the accumulator.
Should this be changed to use some accumulated / averaged quantity rather than instantaneous? Dave points out that, for many of the fields, which are states, the final value is actually better than a 20-year average. Leaf carbon is an exception, as is litter to some extent, but they may be relatively less important than some of the other pools.
Erik was wondering if he really wanted an average but just did what was simplest to code. But now it sounds like maybe the instantaneous is okay or better, at least for some of the fields.
We may need to discuss this further with Chris, Yiqi and others.
(See above for context.)
We'll defer this discussion.
nag complains when you try to associate a variable with an un-allocated array, which is the case for things like ciso variables.
For at least the example we looked at, feeling is it's better just not to have associate statements (or pointers): just access the variable directly (rather than having a bunch of associate blocks throughout the code).
Ryan also points out: maybe nag is being an overprotective parent here and we should find a way to bypass this particular check.
Still need to work out what we'd want to do here.
MizuRoute runs best hybrid, so we'd want to take advantage of the new capabilities Jim has been putting in to allow CTSM to run with a different number of threads than mizuRoute.
For now we should try to find a configuration at 1 degree that gives reasonable performance, even if that means degrading the mizuRoute grid. Then can iterate on that. For now the focus is on getting lakes connected, etc., so that Inne can move forward.
Current plan: If Farshid has time to work on this, then we can have the conversation about how / if we want to do something about this.
This will partly involve some evaluation of whether scientific advantages come from lateral transport or something else (e.g., that could be realized via hillslopes).
Whether or not we use gridcell-to-gridcell communication for hydrology, though, we'll probably eventually want it for other purposes: seed dispersal, beetles, fire, etc.
https://github.com/ESCOMP/CTSM/issues/1229
Sunniva is going to talk with Lei to try to move this forward.
One thing in terms of process is: we should maybe be better about replying to an issue offering encouragement (or not) when it's filed.
Erik: Turns out the cmip5 simulations were spoiled because they started in 2006, but the start_year for streams was set to 2015. I'd like to think of ways of preventing this type of problem. The main thing I've come up with is to make sure streams are set to "limit" rather than "cycle", so once they get to the end they don't go back to the beginning. The extend option is a little better, but if you set the start year wrong, it'll use that value until you get to the start year you set.
Currently:
- ndep is set to cycle; this is the worst issue
- urban & pop density are set to extend (which in this case leads to a small issue, but not a huge deal)
Cycle probably makes sense for a spinup run, though. Can we make the default differ based on whether you're doing a spinup run or transient run?
In terms of limit vs. extend: Dave isn't sure what to do. With limit, we may constantly be fielding questions, and extend may not be too bad.
The solution may be to develop and maintain a checklist of gotchas.
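The difference between the three settings can be shown with a simplified model of how a stream picks a data year. This is only a sketch: the function is hypothetical, it ignores the year-alignment setting that real streams also have, and it is not the data model code.

```python
def stream_year(model_year, first_year, last_year, mode):
    """Return the data year a stream would use for `model_year`.

    Simplified assumptions: "cycle" wraps around the data range, "extend"
    holds the nearest end value, and "limit" refuses to run outside the range.
    """
    if first_year <= model_year <= last_year:
        return model_year
    if mode == "cycle":
        span = last_year - first_year + 1
        return first_year + (model_year - first_year) % span
    if mode == "extend":
        return min(max(model_year, first_year), last_year)
    if mode == "limit":
        raise ValueError(
            f"model year {model_year} is outside stream range "
            f"{first_year}-{last_year}"
        )
    raise ValueError(f"unknown mode: {mode}")
```

With a run starting in 2006 and a stream whose data start in 2015, "extend" quietly uses the 2015 data, "cycle" quietly reads a year from the far end of the data range, and only "limit" stops the run so the mistake is noticed.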
Jim had the idea to overload this for single point simulations so we don't need domain files for single point. Erik likes this idea.
- Dave: Recap of energy exchange discussion.
- Erik: NEON work, when we update surface datasets will we update the NEON tower sites at the same time? Will that be someone else's responsibility? Jim has a few NEON sites set up; is that list going to grow to many, many sites? How many? I assume the site location info (location size and lat/lon/area/land-frac) won't change, but surface datasets should change with each general update?
- Erik: NEON work, I'd like to have this align with user-mods and CLM_USRDAT_NAME. So we aren't adding a new solution alongside already existing ones. Who should we work with on this? I'm going to talk to Jim tomorrow about this.
- Will: Need to create F-compset for CESM2.3+ for testing of CTSM5.1 in coupled simulations:
- Evaluate BHS effects in coupled simulations on mean climate and diel cycles.
- Development of new dust model inputs to CAM
- Needed for dycore testing (by April?).
- Eventually for CESM-FATES simulations.
- Erik: Some code style questions. Look at issue #1284
See https://github.com/ESCOMP/CTSM/pull/1249
Thoughts there are relevant for the more general case like hillslope (multiple columns, each with multiple patches).
Dave suggests: As a first implementation, maybe we just don't allow transient land cover combined with multiple vegetated columns. We're comfortable with that.
Scientifically, there isn't a right answer as to whether you have one pft per column or all pfts on the same column. But it's nice to have the flexibility.
Because of how MOM6 works - modeling water mass explicitly - it needs to track energy coming in. From the CTSM perspective, this relates to heat coming in via rivers. The other piece is tracking heat through precipitation.
For the precipitation piece: CAM will think about what it would take to track the temperature / energy of precipitation. If they are able to do this, then we may need to give some thought to how we want to handle it. Right now we don't explicitly model the temperature / heat associated with a few water pools. So we would either need to explicitly model the temperature of those pools or come up with a way to work around that (though, in the long-term, the latter could end up being as much effort as doing it right).
When we update surface datasets, will we also need to update NEON surface datasets? Will: probably yes, but probably by pulling from a default surface dataset. This workflow will be developed. Long-term, it would be great if that workflow is incorporated into the surface dataset creation; or maybe the NEON datasets would be created on the fly with CESMLab when you create a case?
Erik thinks NEON will differ from what we have done before for single-point: For other single point cases, the onus has been on the user to update their single point data set when needed. We may want something different for NEON.
Bottom line is that the plan for how to maintain these datasets hasn't received a lot of thought yet.
There will be around 40 sites. Creating the surface datasets for those 40 sites should be very fast, so it would be reasonable to add it to our automatic surface dataset generation script.
One next, more involved activity is allowing users to run the hillslope model for each site. But that is something beyond year 1.
An argument for a more user-defined workflow is that some people may want to apply this to other, non-NEON sites.
Concern with the addition of yet another method for single point. Probably want to consolidate this so there aren't so many different ways to run single point.
CAM is creating at least one F2000 compset with CTSM5.1 (at least, that's the plan).
Probably use compset long name for things that aren't going to be too frequent yet. For initial testing, we should just go ahead and run with the long name for now.
Our #1 priority should be code readability, unless we have a strong reason to think that efficiency matters for that part of the code.
- Greg: Nag compiler test failures on izumi: https://github.com/ESCOMP/CTSM/pull/1264#issuecomment-772851390 The nag compiler is aborting with a panic for a new FATES module. We've tried the updated nag compiler, and are investigating if earlier versions of the module work.
- Erik: Changed labels for FATES.
- Erik: FATES tags. Greg will run normal testing and fates test lists (and the fates test list on the baseline). FATES users should be encouraged to only use the CTSM tags that had the full FATES testing done on them.
- Erik: Should we add something to a tagname to denote that FATES testing was run on it? Or an easy to see checkbox in the ChangeLog file? Technically anytime the FATES science version is updated answers change when FATES is on -- when do they click the "answers change significantly in various versions"? Should there be an extra box for answers change for FATES versions?
- Erik/Sam/Will: Should CENTURY vs. CN vs. MIMICS soil BGC type become Object Oriented code as a refactoring? My thinking is not, but it is sort of a pseudo-OO setup as it has a base "class" module that is then extended in specific ways for the CN and BGC particular variants (DecompCascade modules). These will be extended with a MIMICS version and a MIMICS+ version.
- Erik: This issue: https://bb.cgd.ucar.edu/cesm/threads/issue-when-resubmiting.5879/ https://github.com/ESCOMP/CTSM/issues/1269 The long term solution will be to get the "v2" version. Do we just have a warning for 2011-2013? Do we make the default to stop at 2010?
- Erik: Something we've done that was bad was to NOT put units on restart variables. Hence, new code that comes in doesn't usually have units on it either. I think this is bad, although it would be better if restart and history files shared the same long_name and units. In any case when we add new variables to restart files should we add their units or not?
- Erik: Water isotope issue on the PPE branch. It shows up on the last commit on PPE, but doesn't show up before that, or on the WUE PR on its own either.
- Erik: Longlei will need some help with the dust model changes, because it changes CTSM, CAM, and coupler.
We'll just delete all nag non-debug tests, since this doesn't seem to have much value, and we hate to waste time on trying to track down a compiler issue.
Dave's feeling is that the data aren't very good over Antarctica anyway, so you shouldn't be looking closely at what's happening there. So this should just be a warning.
There's no real talk of having a new version.
We could eventually fix it so there at least isn't a step function over Antarctica.
Let's put units on restart variables moving forward.
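One lightweight way to keep this from slipping would be a check that flags variables without units. The sketch below works on a plain dictionary of variable metadata; the function name and the dictionary layout are assumptions for illustration, not an existing CTSM tool.

```python
def variables_missing_units(variables):
    """Return sorted names of variables lacking a non-empty 'units' entry.

    `variables` maps a variable name to a dict of its attributes
    (hypothetical layout, e.g. as read from a restart file's metadata).
    """
    missing = []
    for name, attrs in variables.items():
        units = attrs.get("units")
        if units is None or not str(units).strip():
            missing.append(name)
    return sorted(missing)
```

For example, `variables_missing_units({"a": {"units": "K"}, "b": {"long_name": "x"}})` would report only `b`.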
This involves moving some stuff that was in CAM into CTSM, then passing new fields through the coupler. So this requires some coordination.
Let's get Keith's eyes on this in addition to Erik. But let's not get caught up in the cime and CAM work ourselves.
- Bill: Slack thoughts/questions
- Erik/Sean: BalanceCheck talks about "urban model" stopping. There used to be separate statements for urban and non-urban, is the latest correct? Or did one section get lost by accident? Just want to make sure it's right.
- PR #1264: fates_main_api merge into ctsm master
- Post-1264 merge: diversification of fates testmods for aux_clm?
- Erik: Note that right now I need to approve the FATES PR, do the merge and make the tag. Do we want to give Ryan and Greg permission to do that? Or should we keep me doing the final part for a few more tags?
- Erik: Note that when bringing FATES tags to CTSM main-dev we will be more picky. Which in the long run is better, but might be annoying at times. :-)
Greg is working through Erik's review.
Should we have the aux_clm test suite diversify the FATES test mods somewhat? Part of the point would be to catch anything that changes answers for alternative FATES configurations when something in CTSM changes.
- Bill's gut feeling: start by seeing how much we can get by diversifying the existing fates tests in the aux_clm test suite (rather than having them all test effectively the same configuration).
This question came up with respect to FATES, though could also apply to Sunniva, etc. Currently, the set of people who can actually push to master is very small, and limited to NCAR SEs.
Bill: torn between a desire to keep the process efficient (which partly means avoiding unnecessary hand-offs) and wanting to keep the number of people who have push access to master very small, and limited to NCAR SEs.
For now, we'll keep this final step done by NCAR SEs, but we can consider opening it up in the future.
Bill notes: the bigger piece of efficiency is deciding when a PR review is needed. If we can decide that (for example) code that has been approved by Ryan or Greg doesn't need a separate review from NCAR SEs, that will have a much bigger impact on efficiency than letting them be the ones to hit the "merge" button.
- I hope this won't be a substitute for commenting in GitHub issues, because I've found it VERY helpful to have everything related to an issue in one place for later reference.
- If people want me to be engaged in this, I will need a tutorial on how to use Slack effectively, without it filling the workday with constant, irrelevant interruptions.
Erik was at first concerned about efficiency, but found this really effective for working through MizuRoute problems.
Negin feels like Slack and GitHub serve different purposes. Slack is better for synchronous communication.
Overall feeling: Erik & Negin raise some good points about a niche where Slack can be useful, and addressed some of Bill's concerns (integration with GitHub, threading conversations, etc.). For now we're keeping this open as a possible communication mechanism as an experiment. But the intended use is for things that would otherwise typically be done via in-person discussions. In particular, there is general agreement that it's very useful to have conversations on GitHub, so we should continue to put anything that may have long-term value there.
Grasses are dying with BHS. Unclear why this is. It may be due to a different change Sean made with respect to the LAI that impacts sensible heat flux. It may also be an interaction with the trees in the column.
Looking into it... If we can't solve it, we may just consider this a feature of the new model, and fix it in the context of getting the whole new model working well.
- Erik: How do we move forward with changes needed for CTSM5.2 surface datasets? Is it OK to advance mksurfdata_map disconnected from current surface datasets? We should only make new datasets when CTSM5.2 is ready to go. See #1252.
- Erik: Looks like Longlei will need some help to bring in changes to dust model.
- Will & Sunniva: #1260 soil decomposition model options [high latitude focus]
- Bill: CN Balance with CN Matrix
Current urban dataset is incompatible with code.
In addition, there was an update to the lake raw dataset. This is compatible with the code, but changes answers.
Bill: As a general rule, we should aim to have mksurfdata_map generate datasets that are the same as what's being used out-of-the-box, to avoid accidental answer changes if someone generates their own surface dataset.
Erik will back out these changes to the raw datasets we're using, and then put those changes on a branch to come in later.
For both urban and lakes, we're going to replace the old versions and switch to the new (no option to use the old).
Sam Levis is going to work on implementing MIMICS. So it would be good to do this in a way that doesn't end up implementing things twice, but instead reuses code as much as possible.
This turns out to be a more general issue: pconv = pprod10 = pprod100 = 0 for crops, so deadstem gets lost with landcover change.
We'll set pconv to 0 for crops.
- Dave: Questions for Ryan about stocking density in FATES (Erik note, I've set it up so that nstem from the params file is used for all BGC cases [whether BHS is on or off])
- Greg: fates_main_api update to ctsm5.1.dev020 and fixing CNFireArea change
- Erik: A FATES tag came out with info on the previous tag; is this OK? Since we don't plan on doing very many of these, I'm thinking that's fine.
- Erik: Should we talk about #1247 and getting BHS to work with FATES?
- Erik: For mizuRoute lake work to balance water mizuRoute should send back the amount of water it takes out. Normally that would be -P-E, but if the lake is drained it could be less than that. In that case CTSM will need to know and compensate for water balance. Is the idea to assume that doesn't happen and allow water imbalance? When should we meet to talk about lakes in CTSM/mizuRoute?
- Erik: Tracking down the energy balance error led Sean to find the error in BHS. I plan to have him show me what he did for that. Do others want to be involved? I also want to have Keith show me how he's been doing the same on the matrix branch. Likely won't do either for at least a little while. So this is partially FYI.
- Erik: List of little things...
- Erik: There are two caveats I put in the BHS tag: Setting of leaf/stem biomass should be refactored and removed from CanopyFluxes and put in either BGC or SP modules. There's a change in CNGapMortality that could change for DV when AD mode is on (should we fix this later?)
- Erik: One other thought on BHS: there are new history variables for RAH and RAW labeled as 1 and 2. I think they really mean in-canopy and below-canopy, so they should be labeled that way for readability. I don't see why they are labeled with numbers rather than something a user could understand. Should we make an issue to change this?
- Erik: Why is 1 subtracted from spinup_factor_deadwood for fire?
- Erik: There's another spinup_state simplification that I could do, but didn't. Later?
- Erik/Sean: Sean asks if dbh for prognostic crops on the params file should be the same as for grasses/generic crops. It isn't actually used, but people may ask "why are they different"?
- Erik: FYI. Because I had to change the params file anyway, the clm4_5 and clm5_0 params files do allow you to run with BHS on. We didn't need this, but it wasn't going to affect the time of anything.
- Bill: Issue with h2osoi_ice negative
- Bill: ozone
Sean: need number of stems per m^2
Ryan: these properties are emergent in FATES.
Sean: there are some global numbers, but they differ quite a bit from some of the site-level values that they used when initially developing the code (e.g., at Niwot Ridge). It seems like there's probably some compensation between parameters. Can we learn anything from FATES in terms of how the different parameters correlate? For example, correlation between stocking density and diameters?
There isn't a direct equation in FATES that relates these quantities, but you could look at FATES output to derive these relationships. Note that these emergent relationships are constrained by the need to hit a particular biomass, given some allometry equations.
For FATES, they like to run a lot of simulations at sites that have inventory data, and try to match the inventory data.
Some questions that come out of this:
- Should we make these changes now, before doing PPE stuff? That would be Sean's inclination: do it now and learn from PPE.
- Dave: interested in getting the PPE going soon, but could do this if it's going to be quick.
- Sean's going to try to make this change. Will modify nstem and dbh for SP mode. For BGC mode, it calculates dbh based on nstem and biomass.
- Hard-coded parameter used for leaf mass per area: Sean has converted this to use slatop. Do we want to put that in there?
- Dave: thinks this doesn't matter for now. Suggests making the change now in a separate PR.
Trying to use FATES fire data modules isn't working now.
Erik suggests the two of them talking.
Let's not worry about this
Ryan submitted a PR to Sean's branch with some changes needed. It's not urgent. A better time to get this in could be after the latest FATES is on master. Will require synchronized PRs in FATES and CTSM.
Erik: in #1247, we set the biomass in CanopyFluxes. Suggests a refactor where we pull this out, so that it is set in different places in the BGC and SP code, avoiding the conditional in CanopyFluxes.
Erik: For mizuRoute lake work to balance water mizuRoute should send back the amount of water it takes out. Normally that would be -P-E, but if the lake is drained it could be less than that. In that case CTSM will need to know and compensate for water balance. Is the idea to assume that doesn't happen and allow water imbalance? When should we meet to talk about lakes in CTSM/mizuRoute?
A few ways we could handle this:
- Send back an "unmet flux" to CTSM. We've thought about doing this for ice sheets, but never implemented it.
- Do something similar to irrigation, where we have a limit based on available water. Maybe the best, though Bill remembers some subtleties with this.
- Do something similar to what happens with irrigation without the volr limiting: pull extra from the ocean.
Dave suggests devoting a CLM meeting to this: it could be valuable for a lot of people.
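As a minimal sketch of the first two options above (a withdrawal limited by available water, with the shortfall reported back as an "unmet flux"), the bookkeeping could look like the following. The function name, the use of positive volumes per time step, and the sign convention are all assumptions for illustration; this is not mizuRoute or CTSM code.

```python
def lake_withdrawal(requested, available):
    """Split a requested withdrawal into the part taken and the unmet part.

    Both arguments are non-negative water volumes for one time step
    (hypothetical units and sign convention, not those of mizuRoute or CTSM).
    """
    if requested < 0 or available < 0:
        raise ValueError("requested and available must be non-negative")
    taken = min(requested, available)  # cannot remove more than the lake holds
    unmet = requested - taken          # shortfall the land model would need to know about
    return taken, unmet


# A drained lake: only 3 units are available when 5 are requested.
taken, unmet = lake_withdrawal(requested=5.0, available=3.0)
print(taken, unmet)  # 3.0 2.0
```

With the third option, the `unmet` amount would instead be pulled from the ocean rather than sent back to CTSM.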
Erik: There are two caveats I put in the BHS tag: Setting of leaf/stem biomass should be refactored and removed from CanopyFluxes and put in either BGC or SP modules. There's a change in CNGapMortality that could change for DV when AD mode is on (should we fix this later?)
Dave: not a priority.
Erik will make an issue for this.
Changing name of RAH and RAW history variables.
- Sean: this was historical; changing this on history files makes sense.
Dave isn't sure. Charlie might know.
This isn't used, but should it be the same as for grasses/generic crops?
Sean: it's off by a 0; wonders if this is a typo.
Dave: is there a convention for unused parameters? Should we set this to a missing value? This has come up a few times.
- We could set this to _FillValue, but we think we currently don't handle this cleanly. We could relatively easily set _FillValue things to NaN, so that if you try to access them the code blows up in an obvious way.
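The idea of turning _FillValue entries into NaN so that accidental use fails loudly can be sketched as follows. This is only an illustration: the sentinel value `1.0e36`, the function names, and the example values are assumptions, not anything taken from the CTSM params file or its reading code. Because a NaN in Python propagates silently rather than trapping, the sketch adds an explicit check at the point of access.

```python
import math

FILL_VALUE = 1.0e36  # assumed sentinel; the real _FillValue may differ


def fill_to_nan(values, fill_value=FILL_VALUE):
    """Return a copy of values with every fill_value entry replaced by NaN."""
    return [math.nan if v == fill_value else v for v in values]


def read_param(values, index, name):
    """Return values[index], failing loudly if that entry was never set."""
    value = values[index]
    if math.isnan(value):
        raise ValueError(f"parameter {name!r} at index {index} is unset (_FillValue)")
    return value


# Hypothetical per-PFT parameter where the second entry is unused.
dbh = fill_to_nan([0.5, FILL_VALUE])
print(read_param(dbh, 0, "dbh"))  # 0.5
# read_param(dbh, 1, "dbh") would raise ValueError
```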
- Mariana: Simplified method for specifying land mask
- Dave: Initializing C13 run without C13 on IC file
- Erik: Do we want FATES to work with Biomass Heat Storage? And if so do we want to hook it up now or later?
No longer need fatmlndfrac (domain) file. Instead, read in an ocean mesh and regrid on the fly in initialization.
For lilac, for backwards compatibility, still reading in the fatmlndfrac. We could move to just using the land mask that is already sent through the interface from the atmosphere, thus not requiring any additional file to specify the mask.
This leads to roundoff-level changes in latitude from the current baselines. And there will be some big changes in f10 cases; we'll do that as a separate step.
In single-point cases, assume mask is 1, and no need for a domain file or anything else.
There is some question about roundoff-differences in landmask arising from differences in processor count. Will look into this more.
Biomass heat storage will be on by default in CLM51 cases.
For now, these two can't be turned on together. We had some discussion about what would be needed for them to work together.
Sean is working on one final test in the test suite that's failing: spinup. Then Erik will rerun full testing.
https://github.com/ESCOMP/CTSM/pull/1231 is pretty much ready. But then there are some other changes on CTSM master that need to be pulled in to fates_main_api. That needs to be done before we can consider deprecating fates_main_api.
Erik just added one clm51 fates test. Most tests in the test suite are still with clm50.
For now, there's no reason for fates users to use clm51. As we add more things, there may be a reason to switch - though, so far, a lot of the clm51 changes (e.g., Leah's arctic stuff & luna bug fixes) shouldn't impact fates.
At one point Erik suggested merging up to ctsm5.1.dev012 first, since that's what the PPE branch is based off of. But it might be fine to just merge all the way up to the latest master (currently dev020) all at once.
For merging fates_main_api to master, we're pretty much okay with this happening whenever. Greg will run full testing & write ChangeLog, then Erik or Bill will do the final merge to master.