-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mksurfdata_map: replace source (SRC) files of various masks with SRC files with no mask #823
mksurfdata_map: replace source (SRC) files of various masks with SRC files with no mask #823
Conversation
1. I modified the list of SRC resolutions in mkmapdata.sh and in checkmapfiles.ncl to only include the "nomask" grid for each resolution. I removed the 360x720cru SRC resolution entirely because the only difference from the 0.5x0.5 SRC file was that longitudes were given 0 to 360 instead of -180 to 180. 2. I placed three new files in /glade/p/cesmdata/cseg/inputdata/mappingdata/grids that were not available previously: - SCRIPgrid_0.25x0.25_nomask_c191014.nc - SCRIPgrid_0.9x1.25_nomask_c191014.nc - SCRIPgrid_3x3min_nomask_c191014.nc 3. Modified namelist_defaults_ctsm, namelist_defaults_ctsm_tools, and namelist_definition_ctsm to be consistent with (1) and (2). 4. qsub regridbatch.sh appears to work for all SRC grids.
Can one of you explain why the UGRID file should be treated differently from the others? |
@billsacks the ugrid required different arguments to ESMF regrid weights. If that's still the case for OCGIS then it will need to be kept. But, if OCGIS detects the filetype -- you might not need to keep track of it. |
Thanks @ekluzek . Does that imply that we need to keep source masking done at the time of mapping file creation? @slevisconsulting seems to imply this, but I don't understand why we need to treat the UGRID file differently in that respect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just the first step in the process. This removes the maps that will be redundant when mksurfdata is changed to use the maps separately. Those two changes are linked together so more work is required before this can come in.
This part looks good though.
Ahhh, yes. OK I reread @slevisconsulting note about that. The reason the one UGRID might as well be handled differently is because it's the only 1km grid file. So you might as well keep the mask associated with it. It would actually move extra work into mksurfdata to manage the mask separately. For grids with several different masks this makes sense, but this is a high resolution grid and there's only one, so it will slow mksurfdata down to manage the mask separately. This does mean not everything is handled the same way -- but I think it's OK for this case. If it were a low resolution map and there was only one of them, you might as well handle it like the others. But, for the high resolution map we probably don't want to both pay the cost in changing mksurfdata AND pay the cost of making it slower by changing it. |
Okay, I see your points @ekluzek . On the other hand, we're going to pay a long-term cost in terms of the potential confusion of having things done differently for different raw datasets. I'm okay with this as long as this is documented somewhere - i.e., we should document that, in nearly all cases, we use mapping files with no masking and apply the mask separately in mksurfdata_map; but there is this one exception (and give the reasons you laid out above). |
Keeping record of a few issues that I have encountered.
The contents of these namelists differ (shown here in opposite order): The latter formatting caused me trouble when I tried running |
Following up on my previous post: |
- mkdomainMod.F90: Removed error check confirming that the input domain mask should equal the gridmap mask because the gridmap mask (SRC mask found in the "nomask" mapping file) now equals 1 everywhere - mkgridmapMod.F90: Replaced wtnorm with gridmap%frac_dst in subroutine gridmap_areaave_srcmask - mklaiMod.F90: Set mask_src equal to tdomain%frac instead of equal to 1 Answers over continent appear unchanged. Answers over coastal areas appear different by more than round off. I don't know if that's a problem.
@ekluzek (and @billsacks ?) pls check the work in my latest commit and let me know if my changes make sense to you when you get a chance. Note that I did not pass frac_src and frac_dst as arguments the way we had discussed. As far as I could tell, it was not necessary. Also note that these mods resulted in bigger than round off changes along coastlines. Do you interpret this as a problem with my approach? Or maybe in my approach I didn't capture everything that I should have... |
For frac_dst, see my above comment (#823 (comment)). For frac_src, I'm thinking that when you said this in your initial comment:
we actually meant mask_src (though I can't remember for sure). For lai, this is already passed in, which is why you didn't need to do that step. You'll need to pass in mask_src for most other fields. |
I interpret that as a possible problem that at least warrants further investigation. I'm just thinking: It may be worth the time to set up an artificial grid for testing purposes. For example, if you have a 0.5 deg raw dataset (is that the resolution of LAI?) you could set up an exactly 1 deg model grid. Then it would be relatively easy to see what the values in each model grid cell should be. |
I agree with @billsacks that I wouldn't expect the change to change results to more than roundoff. It's possible we could be convinced that answer changes beyond that would be OK, but I wouldn't expect it. One of the only reasons I can think of to expect answers to change beyond roundoff is if there's a bug in how the mask is handled either inside mksurfdata_map or in the ESMF regridder. |
Based on your comment @billsacks I could see the problem stemming from my setting I will look into it. |
Results now are the same as before the code modifications to within roundoff and roughly -1e-9 to 1e-9
@@ -57,7 +57,7 @@ module mkpctPftTypeMod | |||
end interface pct_pft_type | |||
|
|||
! !PRIVATE TYPES: | |||
real(r8), parameter :: tol = 1.e-12_r8 ! tolerance for checking equality | |||
real(r8), parameter :: tol = 1.e-9_r8 ! tolerance for checking equality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to raise the tolerance for this test to pass:
convert_from_p2g ERROR: default_pct_p2l must sum to 100
sum(default_pct_p2l) = 100.000000000149
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, together with the still-slightly-larger-than-expected differences from before, is making me nervous. This amounts to a relative error of 1e-12, which is about 4 orders of magnitude greater than double precision roundoff.
I feel like we should either (1) trace this greater error back to the source and convince ourselves that this is an acceptable level of error, or (2) play with the source-masking remapping algorithm to try to reduce roundoff-level errors.
Regarding (2), the first thing I'd try is to change this:
dst_array(no) = dst_array(no) + wt*mask_src(ni)*src_array(ni)/wtnorm(no)
I'd delete the division by wtnorm(no) here. Instead, I'd change the where block at the end of the routine:
where (wtnorm == 0._r8)
dst_array = nodata
end where
to:
where (wtnorm == 0._r8)
dst_array = nodata
elsewhere
dst_array = dst_array / wtnorm
end where
My gut feeling is that just doing this division once will be less prone to introducing errors in the final result, though I'm not sure of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if it would help, I'm happy to help with this investigation if you give me directions for how to reproduce what you've been doing for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@billsacks I tried your suggestion, but it did not help. I have now pushed new changes that correct the problem.
Latest commit: |
@slevisconsulting I'm confused. You say the latest commit is to within roundoff, but then you state it's within +-1.e-9 -- roundoff is roundoff is roughly +-1.e-15. +-1.e-9 might be OK, but it is more than roundoff, unless the numbers being differenced are larger than order 1.0. |
@ekluzek thank you for the correction. I'm sorry for using the wrong terminology. Is this level of change is acceptable? |
Ah, yes - good catch. I think it's arguable which one is correct to use in the remapping, but it makes sense that using the mask would be consistent with what was being done when we were relying on the ESMF regridder for source masking. |
In terms of your last two questions. We still want to change from scrip grid to ESMF mesh files. Is that what the UNSTRUCT format is? You might as well include them in what you are doing. The rimport script will fail if a file doesn't exist. I would suggest before you use it though that you do.... cd $CSMDATA So use the list to check for the files existence beforehand. |
I think I did this correctly: |
With BobO's correction, these tests that were failing previously now PASS: |
I think this PR is ready for final review and other near-merge activities. I expect this to be my last post until I return to work on 5/18. In my local branch directory In Now that I have replaced the |
Thank you for all of your work on this, @slevisconsulting! It really sounds like you're close now!
This is probably fine. However – partly connected to this increase in error, but more just for its own sake – if you haven't already done so, I'd suggest doing a quick review of the diffs that have come to master since the original version of this branch in the files modified by this PR. That is, look quickly at the diffs between ctsm1.0.dev080 and ctsm5.1.dev039 in relevant files: You had a couple of other questions about whether to keep various files and possibly import them to the inputdata repository. I don't have my head back in this enough to answer your specific questions, but my general rule is: keep any files that are actually needed (i.e., pointed to in xml), and delete the others (or move them to some other location outside of inputdata if they might be needed in the future but aren't currently referenced). |
@billsacks originally I had done this by looking at |
That's very helpful, but not really what I meant. Looking at the diffs on master (not bringing your branch into the picture at all) between those two ctsm tags can be helpful with these long-lived branches. For example, if someone introduced an entirely new module in mksurfdata_map that needs to be adjusted similarly to how you adjusted the existing modules, you wouldn't be able to see that just by looking at the diffs between your branch and master. I don't feel it's critical that you do this, but it tends to make me more comfortable to do that one extra quick check when I'm working with such a long-lived branch. |
I tried the rimport command
|
Accept permanently. |
rimport ran to completion. I have updated the ChangeLog. Question: will you want me to commit/push the changes to |
@slevisconsulting you should update mkmapdata.sh to point to the version of ESMF needed for this to work correctly. Is Bob O's version now the latest version of ESMF out there? If there's a later version that has his fixes in place you might not need it, but use the latest one that has the fixes needed in place. |
Add NEON sites. Add cime/cdeps support and capability to run NEON tower sites. This also brings in Negin's first version of the subset_data.py script to create datasets for single-point and regional cases. And a version of it was run to create the NEON surface datasets. To setup a NEON site do the following cd cime/scripts ./create_newcase --case myNEONtest --res CLM_USRDAT --compset IHist1PtClm51Bgc \ --user-mods-dir NEON/NIWO --driver nuopc --run-unsupported (There's also a I1PtClm51Bgc compset that can be used for fixed conditions) Note, also that several externals were updated to the version in cesm2_3_alpha03a. This means that answers change when CISM is turned on. Some of the grids for compsets tested were updated to use the new grid name "gris" in place of the older "gland".
@ekluzek and I talked about this today. What we want is similar to what was noted in this old comment: #823 (comment) (the list of testing to run). In particular, you should run the tools tests (I think you already have) and a single system test to make sure there are no errors in the xml files. It would probably be a good idea to re-verify that you can make all surface datasets via Makefile.data again (if you haven't done that since the merge up to the latest master). Before too long, we'd like to update all of the surface datasets so that they are consistent with these tools changes, but we (including @negin513) decided it makes sense to wait a bit on this in case there are other answer changes that will come in with the toolchain work. We also decided this can be the next tag to go in, given that you seem just about ready. So please go ahead with the final steps at your convenience. |
That's right, I completed these tests when I merged to dev038. At your recommendation, I am rerunning the Makefile.data test now that I have merged to dev039. As before, I am getting the following failures that I consider "expected" because they also occur in the baseline:
I will post an update in the next couple of hours if anything unexpected happens with the last 8 jobs in this test, which is in progress.
Sounds good. Let me know if you think of anything else that needs to happen. |
@slevisconsulting - I remembered one more thing that could be good to do before finalizing this PR, if you haven't already: Rerunning CIME's check_map tool on some (or all) of the new mapping files. I don't feel that this absolutely needs to be done, but the problems uncovered March, 2020 manifested as conservation errors in the check_map tool. See #823 (comment) for details. It would probably be good to at least check whether the 3min to spectral element (ne) mapping files are now good according to that tool. And if you can quickly (e.g., with a script) iterate over all of the new mapping files, it wouldn't hurt to just run this on everything to verify that the latest ESMF version properly conserves. I don't want this to become a big rabbit hole, though, so please feel free to ignore if this seems like something that would take more time than the value it provides. |
I'm also comfortable with deferring this check until after the tag is merged, in order to just get this in and done: Any issues uncovered are likely to be issues with the ESMF mapping tool, and as such might take a long time to resolve. Again, though, in either case, no need to spend significant time on this. |
Hmm, I tried the following with both the current module loads, as well as with the ones needed to get BobO's fix (see further below):
or
Both worked for this file that I generated last year Both failed for this file that I generated last year
And both returned a seg fault for the files created earlier this month, e.g.
So I think that you're right about this being a new can of worms... |
I'm not too concerned about the conservation error of 2.9e-12: I have a feeling that is just slightly greater than the tolerance, which is fairly arbitrary. But the seg fault on the new file indeed seems like a can of worms. Darn. I guess we can just move ahead without trying to do those tests. |
Ok. And would you like me to report the seg fault to someone in the ESMF group? |
No. This is a CIME tool, not an ESMF tool. It's not clear to me at this point where the problem lies. |
I went over the diffs brought in with the dev038 merge focusing on the following files and didn't spot anything concerning:
I also typed |
I just double checked this. Your conflict resolution looks good to me. |
@slevisconsulting - FYI, I'm making some minor adjustments to the ChangeLog entry: This tag doesn't actually change answers for the model itself, even though It will change answers once we update surface datasets. I'm clarifying that in the ChangeLog. Also, even when we do bring in new surface datasets, I wouldn't call the changes "significant" (according to the section near the top), since they are only about roundoff-level; so I'm unchecking the checkboxes for significant answer changes. |
Based on notes that I collected in a meeting with @ekluzek and @billsacks on 2019/9/24:
Description of changes
1) mkmapdata.sh requires no change in method; only changes in the list of SRC files.
Collapse multiple SRC files of a given resolution to a single "nomask" SRC file. A single weight (aka map) file will result that corresponds to each nomask case. Apply the mask in mksurfdat instead (see (2) below).
Create nomask files for SRC resolutions that don’t have them. Not for UGRID file(s) if I understood Erik correctly, but this comment may apply only to this UGRID file:
UGRID_1km-merge-10min_HYDRO1K-merge-nomask_c130402.nc
.2) mksurfdata
mkgridmapMod has loop over ns in subr. *default (used currently) and in subr. *sourcemask that I should now use, except for UGRID file(s) (again, exception may apply only to the above mentioned UGRID file).
Frac_src will be passed in as an arg now.
Frac_dst has same meaning as weight_norm, so pass weight_norm through subroutines to replace tgridmap%frac_dst.
Start with mklai. Switch the call from *default to *sourcemask. Mklai with the right mask should be identical (test 1).
mkindexmapMod has similar loop over ns that will need changing. Look out for any additional places that may need changing. If subr. *sourcemask2 is not in use, remove.
After mklai, work with mkpft and the rest.
May need to change the mask to “nomask” in some raw datasets.
Bill was looking at one with “topo” in the name and decided that this one has no masking.
3) Last step...
Mksurfdata.pl -debug will generate a namelist that I can modify by hand. Run manually with input from the namelist. This could end up with roundoff diffs.
Specific notes
Contributors other than yourself, if any:
@ekluzek @billsacks
CTSM Issues Fixed (include github issue #):
Fixes #286
Fixes #938
Are answers expected to change (and if so in what way)?
Expecting roundoff changes
Any User Interface Changes (namelist or namelist defaults changes)?
namelist_defaults_ctsm.xml
namelist_defaults_ctsm_tools.xml
namelist_definitions_ctsm.xml
all change, but this should be transparent to users working with default source (SRC) and destination (DST) resolutions. Users working with user-generated SRC resolutions will now need to create only one nomask SRC file per custom SRC resolution. They will then need to apply any associated custom masks at the "surface-dataset" generation stage of the process (see (2) above).
Progress and testing performed:
This far I am close to completing step (1) above. I have:
qsub regridbatch.sh
for the nomask list of SRC files; this has generated weight (aka map) files for all SRC/DST combinations, except the 1km DST grid as far as I can tell.