Discussion: Fixing ObsPack diagnostics in GEOS-Chem and/or standardizing ObsPack file inputs #2328

Open
Tracked by #2312
yantosca opened this issue Jun 14, 2024 · 3 comments
Labels
category: Discussion An extended discussion of a particular topic topic: Diagnostics Related to output diagnostic data topic: Input Data Related to input data

Comments

@yantosca
Contributor

Your name

Bob Yantosca

Your affiliation

Harvard + GCST

Please provide a clear and concise description of your question or discussion topic.

The ObsPack diagnostic relies on input files from NOAA that often have inconsistent dimensions, depending on which sites they come from. This issue can be a central point of discussion for efforts to fix outstanding issues in the ObsPack diagnostic in GEOS-Chem.

There are also a few Python packages for prepping ObsPack inputs that we could try to add to the community folder of GCPy. We can also keep track of these development efforts here.

Tagging @eastjames @jhaskinsPhD @alli-moon

@yantosca yantosca added topic: Input Data Related to input data topic: Diagnostics Related to output diagnostic data category: Discussion An extended discussion of a particular topic labels Jun 14, 2024
@yantosca yantosca self-assigned this Jun 14, 2024
@yantosca yantosca changed the title Discussion: FIxing ObsPack diagnostics in GEOS-Chem and/or standardizing ObsPack file inputs Discussion: Fixing ObsPack diagnostics in GEOS-Chem and/or standardizing ObsPack file inputs Jun 14, 2024
@jhaskinsPhD
Contributor

Hi Folks,

I last had Obspack working for me in v13.3.4, but have gotten errors using it in GCClassic v14.0.0, v14.2.3 and v14.3.0. The error I get looks like this:

```
********************************************
* B e g i n   T i m e   S t e p p i n g !! *
********************************************

---> DATE: 2013/06/01  UTC: 00:00
 HEMCO already called for this timestep. Returning.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

In Ncrd_1d_Char #2:  NetCDF: Start+count exceeds dimension bound
     65536         6

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Code stopped from DO_ERR_OUT (in module NcdfUtil/m_do_err_out.F90) 

This is an error that was encountered in one of the netCDF I/O modules,
which indicates an error in writing to or reading from a netCDF file!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

real 51.28
user 441.31
sys 5.71
srun: error: notch201: task 0: Exited with exit code 231
```

I think this is coming from a call to DO_ERR_OUT (in m_do_err_out.F90) that I added when we made the update to include the Obspack wildcard. That call makes the run stop with an error if it can't find or read the Obspack input files while the diagnostic is turned on; before that change, the run would actually complete and just not save any Obspack output, without any error. The problem itself seems to be with the expected dimensions of the inputs.
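For context, "Start+count exceeds dimension bound" is the standard netCDF message for a read whose requested extent runs past the end of a dimension. The sketch below models that condition in Python using the 0-based start indices of the netCDF C API (the Fortran interfaces take 1-based indices, which, as far as I know, are shifted before the same check is made). The function name `extent_in_bounds` is hypothetical and not part of any netCDF library.

```python
def extent_in_bounds(start, count, dim_lens):
    """Return True if reading `count` elements from 0-based `start` fits in
    every dimension, i.e. start + count <= dimension length on each axis.
    A False result corresponds to netCDF's "Start+count exceeds dimension bound"."""
    if not (len(start) == len(count) == len(dim_lens)):
        raise ValueError("start, count and dim_lens must have the same rank")
    return all(
        0 <= s and c >= 0 and s + c <= n
        for s, c, n in zip(start, count, dim_lens)
    )


# A char(obs, string200)-style variable with 48 observations:
print(extent_in_bounds([0, 0], [48, 200], [48, 200]))  # True
print(extent_in_bounds([0, 0], [48, 201], [48, 200]))  # False
```

In other words, the error means the reader asked for more elements along some dimension than the file actually has, which fits the suspicion that the expected and actual dimensions of the inputs disagree.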

@eastjames has code to convert the default Obspack files into a format readable by GEOS-Chem here and I have code to make your own Obspack files for GEOS-Chem sampling here.

Between the two repos, I think we have a consistent way to build Obspack files for GEOS-Chem input that match the documentation of the required Obspack inputs on ReadTheDocs, but I haven't done enough digging to figure out why exactly the newer versions are throwing this error.

I know that at some point between v13.0.0 and v13.3.4, the required Obspack input files changed from requiring a "time components" input (a [YYYY, MM, DD, HH, mm, SS] list) to a 'time' input in units of seconds since 1970-01-01 00:00:00, and the code got a bit pickier about the types of the required inputs (e.g. the Obspack ID is required to be an S200 bytes string). Some of the default Obspack files seem to have both of these time inputs, but as someone pointed out at IGC11, the default variables in the different types of Obspack files vary a bit. I think something in the GEOS-Chem code that reads in the files must be expecting one of these time inputs in a different format than we're giving it now, so some info on exactly what it expects would be valuable.
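To illustrate the two time conventions, here is a minimal sketch, assuming the time components are UTC [YYYY, MM, DD, HH, mm, SS] rows, that converts them to seconds since 1970-01-01 00:00:00 and stores the result as 8-byte integers. The helper name `components_to_epoch_seconds` is hypothetical; `calendar.timegm` is the standard-library inverse of `time.gmtime`.

```python
import calendar

import numpy as np


def components_to_epoch_seconds(components):
    """Convert rows of [YYYY, MM, DD, HH, mm, SS] (assumed UTC) into an int64
    array of seconds since 1970-01-01 00:00:00 UTC."""
    seconds = [
        calendar.timegm(tuple(int(v) for v in row[:6]))
        for row in components
    ]
    return np.array(seconds, dtype=np.int64)


# Sampling time used in the SOAS example: 2013-06-01 00:00:00 UTC
print(components_to_epoch_seconds([[2013, 6, 1, 0, 0, 0]]))  # [1370044800]
```

Storing the result as int64 matches the newer file layout; whether GEOS-Chem expects int or int64 for this variable is exactly the open question in this thread.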

In my repo, I've uploaded an example Obspack input file here that I created using my obspack_io.py script for sampling at the SOAS site on June 1st, 2013. It worked in previous versions, but now throws the error above when I turn the diagnostic on. When I compared it to a file I generated with @eastjames's repo from more classic Obspack files, it seemed entirely consistent with the required inputs listed on ReadTheDocs, so I thought it was just an error in my code until IGC11.

@yantosca
Contributor Author

Thanks @jhaskinsPhD for this. One complicating factor is that we had to switch from the netCDF-F77 to the netCDF-F90 interface (which was required by CESM). This shouldn't matter, but who knows. I will try to replicate your error with the sample ObsPack file.

For your reference, here is some info about netCDF strings.
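In netCDF classic-style files, a variable such as `char obspack_id(obs, string200)` is a 2-D array of single characters rather than an array of strings. Below is a minimal NumPy sketch of the round trip between Python strings and that NUL-padded layout. The helper names are hypothetical, and the 200-character width is an assumption suggested by the dimension names (`string200`, `string_of_200chars`). netCDF4-python provides `stringtochar` and `chartostring` for the same purpose.

```python
import numpy as np

STRLEN = 200  # assumed length of the string200 dimension


def ids_to_char_array(ids, strlen=STRLEN):
    """Pack strings into an (n, strlen) array of single bytes, NUL-padded,
    matching the layout of a netCDF char(obs, string200) variable."""
    encoded = [s.encode("ascii") for s in ids]
    too_long = [s for s in encoded if len(s) > strlen]
    if too_long:
        # NumPy would silently truncate these, so fail loudly instead.
        raise ValueError(f"{len(too_long)} id(s) exceed {strlen} characters")
    fixed = np.array(encoded, dtype=f"S{strlen}")
    return fixed.view("S1").reshape(len(ids), strlen)


def char_array_to_ids(chars):
    """Inverse of ids_to_char_array: join each row and drop the NUL padding."""
    return [row.tobytes().rstrip(b"\x00").decode("ascii") for row in chars]
```

A reader that expects a different string length than the file provides would be requesting a different extent along the second dimension, which is one way a char read can run past a dimension bound.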

@yantosca
Contributor Author

Hi @jhaskinsPhD and @eastjames. I was looking at the example netCDF file in the https://github.com/jhaskinsPhD/gcpy_campaigns repository (which I believe represents the most recent ObsPack format), and I get this output:

```
$ ncdump -cts obspack_input.20130601.nc | grep obs
1:netcdf obspack_input.20130601 {
3:      obs = 48 ;
6:      int64 obs(obs) ;
7:              obs:long_name = "Sample latitude" ;
8:              obs:_Storage = "chunked" ;
9:              obs:_ChunkSizes = 1024LL ;
10:             obs:_Endianness = "little" ;
11:             obs:_Storage = "contiguous" ;
12:             obs:_Endianness = "little" ;
13:     int64 time(obs) ;
23:     float latitude(obs) ;
33:     float longitude(obs) ;
43:     float altitude(obs) ;
54:     char obspack_id(obs, string200) ;
55:             obspack_id:long_name = "Unique ObsPack observation id" ;
56:             obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
57:             obspack_id:_Storage = "chunked" ;
58:             obspack_id:_ChunkSizes = 1LL ;
59:             obspack_id:_DeflateLevel = 5LL ;
60:             obspack_id:_Storage = "contiguous" ;
61:     int64 CT_sampling_strategy(obs) ;
79: obs = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
```

But in one of my older obspack data files that I had used for testing, I get this output:

```
$ ncdump -cts obspack_co2_1_OCO2MIP_2018-11-28.2018092 | grep obs
61:netcdf obspack_co2_1_OCO2MIP_2018-11-28.20180926 {
3:      obs = UNLIMITED ; // (662 currently)
7:      int obs(obs) ;
8:              obs:long_name = "obs" ;
9:              obs:_Storage = "chunked" ;
10:             obs:_ChunkSizes = 1024 ;
11:             obs:_Endianness = "little" ;
12:     int time(obs) ;
20:     int model_sample_window_start(obs) ;
28:     int model_sample_window_end(obs) ;
36:     float latitude(obs) ;
44:     float longitude(obs) ;
52:     float altitude(obs) ;
61:     float value(obs) ;
70:     double time_decimal(obs) ;
79:     int time_components(obs, calendar_components) ;
88:     char obspack_id(obs, string_of_200chars) ;
89:             obspack_id:long_name = "Unique ObsPack observation id" ;
90:             obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
91:             obspack_id:_Storage = "chunked" ;
92:             obspack_id:_ChunkSizes = 1, 200 ;
93:             obspack_id:_DeflateLevel = 5 ;
94:     int obs_flag(obs) ;
95:             obs_flag:units = "binary" ;
96:             obs_flag:_FillValue = -9 ;
97:             obs_flag:long_name = "obspack flag" ;
98:             obs_flag:comment = "Determined by data provider (1: large spatial scale representation; 0: local/regional influence)" ;
99:             obs_flag:_Storage = "chunked" ;
100:            obs_flag:_ChunkSizes = 662 ;
101:            obs_flag:_DeflateLevel = 5 ;
102:            obs_flag:_Endianness = "little" ;
103:    int CT_sampling_strategy(obs) ;
111:    float CT_MDM(obs) ;
120:    float CT_RMSE(obs) ;
129:    int CT_assim(obs) ;
138:    int CT_may_reject(obs) ;
146:    int CT_may_localize(obs) ;
170: obs = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
```
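The two listings above can also be compared mechanically. Here is a minimal Python sketch in which the type and dimensions of each variable common to both files were transcribed by hand from the ncdump output (type names as ncdump prints them); `diff_schemas` is a hypothetical helper, not part of GCPy.

```python
# (type, dimensions) per variable, transcribed from the ncdump listings above.
# Only variables present in both files are included.
NEWER = {
    "obs": ("int64", ("obs",)),
    "time": ("int64", ("obs",)),
    "latitude": ("float", ("obs",)),
    "longitude": ("float", ("obs",)),
    "altitude": ("float", ("obs",)),
    "obspack_id": ("char", ("obs", "string200")),
    "CT_sampling_strategy": ("int64", ("obs",)),
}
OLDER = {
    "obs": ("int", ("obs",)),
    "time": ("int", ("obs",)),
    "latitude": ("float", ("obs",)),
    "longitude": ("float", ("obs",)),
    "altitude": ("float", ("obs",)),
    "obspack_id": ("char", ("obs", "string_of_200chars")),
    "CT_sampling_strategy": ("int", ("obs",)),
}


def diff_schemas(a, b):
    """Return {name: (a_entry, b_entry)} for shared variables that differ."""
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}


for name, (old, new) in diff_schemas(OLDER, NEWER).items():
    print(f"{name}: {old} -> {new}")
```

This flags the int to int64 change for `obs`, `time` and `CT_sampling_strategy`, plus the renamed string dimension of `obspack_id`.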

As you can see, the time variable changed from int (aka INTEGER) to int64 (aka INTEGER*8) between the older and newer files. I suspect this has to do with the Year 2038 problem: a signed 4-byte integer can't store Unix time values (seconds since 1970-01-01 00:00:00 UTC) after January 2038, so an 8-byte integer is needed. Maybe they updated the format of the data files to be proactive.
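The arithmetic behind the 2038 limit can be checked directly. This sketch uses only the standard library; `wrap_int32` is a hypothetical helper that mimics two's-complement wraparound of a signed 4-byte integer.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # 2147483647, largest signed 4-byte integer


def wrap_int32(x):
    """Reinterpret an integer as a signed 32-bit value (two's-complement wraparound)."""
    return (x + 2**31) % 2**32 - 2**31


# Last instant representable as int32 seconds since 1970:
last_int32_time = EPOCH + timedelta(seconds=INT32_MAX)
print(last_int32_time)  # 2038-01-19 03:14:07+00:00

# One second later wraps to the most negative int32, i.e. back to 1901:
wrapped = wrap_int32(INT32_MAX + 1)
print(wrapped)                             # -2147483648
print(EPOCH + timedelta(seconds=wrapped))  # 1901-12-13 20:45:52+00:00
```

An int64 time variable has no such limit on any practical time scale, which is consistent with the format change being proactive.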

Maybe a long-term fix would be to add logic to GEOS-Chem that tests the type of the time variable and then reads it into an appropriately typed variable. I'm not sure I have time to do that now, but it's something to think about.
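The dispatch idea can be sketched in Python as a hypothetical helper, `read_time_as_int64`, that accepts the time array as stored and always hands back int64. In the Fortran code, the analogous step would presumably query the variable's type (e.g. the `xtype` argument of `nf90_inquire_variable`, which reports `NF90_INT` or `NF90_INT64`) and then read into an INTEGER or INTEGER*8 buffer. This is an illustration of the approach, not existing GEOS-Chem code.

```python
import numpy as np


def read_time_as_int64(values):
    """Hypothetical helper: accept an ObsPack `time` array stored as 4-byte
    (int) or 8-byte (int64) integers and return it as int64."""
    arr = np.asarray(values)
    if arr.dtype == np.int32:
        # Older files: 4-byte "int"; widening to int64 is always exact.
        return arr.astype(np.int64)
    if arr.dtype == np.int64:
        # Newer files: already 8-byte "int64".
        return arr
    raise TypeError(f"unsupported ObsPack time type: {arr.dtype}")


old_style = np.array([1370044800], dtype=np.int32)
print(read_time_as_int64(old_style).dtype)  # int64
```

Widening to a single 8-byte type downstream keeps the rest of the diagnostic independent of which file format was read.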
