Generate version history json file - for use by ES-DOCS #2
Comments
@davidhassell @eguil here is a first pass at a version lookup file - please review and let me know if fields should be ordered differently to make it easier to use. PCMDI/input4MIPs-cmor-tables/Versions/6.2.0.json Once we have finalised the format, I can regenerate the versions back in time. Any future release will have a new 6.x.y.json file generated |
@davidhassell @eguil, I'm wondering whether sorting these by |
@davidhassell @eguil it would be great to get your feedback soon on the format, as I am anticipating significant changes as new datasets are generated and published, and without feedback you're going to be stuck using the existing format of the json info |
@davidhassell @eguil I have made a change to the format, please take a look at the files now building in PCMDI/input4MIPs-cmor-tables/Versions |
@davidhassell @eguil This JSON file with the versions is very helpful. It would provide the user with even more information, if you add a short reason for deprecation. Currently, I can only mention that the data is deprecated and point the user to the current version and the version google doc (see e.g.: https://doi.org/10.22033/ESGF/input4MIPs.1120 ). |
@MartinaSt thanks for the feedback. If you see these files as useful, then please feel free to suggest changes/augmentations/amendments to the format and content, so that they are as easy as possible for you to use. My plan was that, once a format has been finalized, I will generate versions extending all the way back to the original release v6.0.0 (20th December 2016), as noted in the google doc. I think having each of the DOIs for published/DOI-minted data would also be a useful addition |
@durack1 Thanks, Paul. Having the change of the DRS and my matching in mind, it would be great if you could add:
Could you avoid wildcard notations, e.g. '2017-05-18 (-AIR-*)', and replace these with all individual versions? The current notation is difficult to parse. In the currentVersionNotes you have split the note into multiple list entries. It would be good to have a single note per data version. Example from the 6.2.1 JSON: It would be great if you could make these changes. Is this information sufficient or do you need more information from me? Adding the DOIs is an excellent idea. It would be easiest if we had the DRS of the data collection at the DOI granularity directly in the JSON, e.g. %(mip_era)s.%(activity)s.%(institution)s.%(source_id)s [CMIP6.input4MIPs.PCMDI.PCMDI-AMIP-1-1-2]. |
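As an illustration of the DRS template mentioned in the comment above, here is a minimal Python sketch that fills the template and splits an identifier back into its facets. The template string and the example facet values are taken from the comment; the function names `build_drs_id` and `parse_drs_id` are hypothetical and are not part of any existing input4MIPs or ESGF tool.

```python
# Template quoted in the comment above (Python %-style named placeholders).
DRS_TEMPLATE = "%(mip_era)s.%(activity)s.%(institution)s.%(source_id)s"

# Facet order implied by the template; used when splitting an identifier.
DRS_FACETS = ["mip_era", "activity", "institution", "source_id"]


def build_drs_id(facets):
    """Fill the DRS template from a dict of facet values (hypothetical helper)."""
    return DRS_TEMPLATE % facets


def parse_drs_id(drs_id):
    """Split a DRS id into its facets; assumes no facet value contains a dot."""
    parts = drs_id.split(".")
    if len(parts) != len(DRS_FACETS):
        raise ValueError("expected %d dot-separated facets, got %d"
                         % (len(DRS_FACETS), len(parts)))
    return dict(zip(DRS_FACETS, parts))


example = {
    "mip_era": "CMIP6",
    "activity": "input4MIPs",
    "institution": "PCMDI",
    "source_id": "PCMDI-AMIP-1-1-2",
}
print(build_drs_id(example))
```

Using the DRS id as the JSON key in this way would let a citation service look up an entry directly from the facets it already holds, which appears to be the motivation for the request.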
@MartinaSt I hadn't thought about your use of this, glad it will be useful for you. Can you take a pass at editing the current Versions/6.2.1.json version of the file to the format that you want? If I have an example of the changes that you want implemented, it'll be easier for me to propagate the changes across all datasets in the collection. |
@durack1 The ideal structure from the citation point of view, with the examples ImperialCollege and PNNL-JGCRI in the new DRS, would be: { DOI information is accessible as JSON using the above DRS_ids via: |
@MartinaSt thanks for this, the citation was not the target for the existing format so I'll have to consider merging these both. @davidhassell @eguil it would be really useful for you to chime in, as once a format has been settled you'll have to deal with this anyway you can |
At what level of the DRS hierarchy are we planning to publish DOIs? If we want to auto-generate version history files then it is important to know which level of the directory structure they apply to. |
@agstephens You can see the citation granularity, which is in use for input4MIPs, in my example. I have used the DRS_id on the citation granularity as key. |
@esdoc-system-user your comment above "data field should be an array not an object", can you further explain? The current file version/format can be viewed here |
It may be some use to summarize how ES-DOC will be storing dataset descriptions. The properties we might collect are (summarized from the CIM definition)
Apart from name, all properties are optional. Currently, the name, availability and description are captured in the ES-DOC CMIP6 experiments spreadsheet, which is rendered in the ES-DOC viewer (https://search.es-doc.org; e.g. the descriptions of pre-industrial aerosols for esm-piControl may be seen here) |
@davidhassell thanks for this, I believe these properties are exactly what I was hoping to gather, so will consider these along with the requirements outlined by @MartinaSt above #2 and propose a new format before preparing the 6.0.0 -> 6.2.3 version json files |
@durack1, independent of the format for the version information, the current information is outdated (version 6.2.3, November 2017). When do you plan an update? |
Hi folks, as discussed at the WIP call this morning we need to work on the input4MIPs dataset version history so that this information can be provided for model simulations to be accurately documented (which combination of the numerous forcing datasets available). It would be useful for @davidhassell @charliepascoe to engage on this so that we can generate an easy to use format that can be updated live as additional datasets are updated and contributed to the project @eguil @momipsl @taylor13 @MartinaSt |
It will be necessary to assign the input4MIPs collection version (currently 6.2.14, see here) to each valid dataset. When a dataset is deprecated its collection version remains static, and the new version gets the new collection tag. For example, if a new volcanic forcing dataset (v4) is released, the input4MIPs collection is incremented to 6.2.15: in the 6.2.14.json file the v3 dataset has collection version 6.2.14; in the 6.2.15.json file v3 continues to have collection version 6.2.14, whereas v4 has 6.2.15 |
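The versioning rule described in the comment above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the field names `collectionVersion` and `status`, the dataset ids `volcanic-v3` and `volcanic-v4`, and the function `roll_forward` are all hypothetical and do not reflect the finalized JSON format.

```python
def roll_forward(previous, new_collection_version, new_datasets, superseded=()):
    """Build the listing for the next collection version (hypothetical helper).

    previous: dict mapping dataset id -> {"collectionVersion": str, "status": str}
    new_datasets: ids of datasets first released in new_collection_version
    superseded: ids of existing datasets deprecated by this release
    """
    current = {}
    for dataset_id, record in previous.items():
        entry = dict(record)  # copy; the original collectionVersion stays static
        if dataset_id in superseded:
            entry["status"] = "deprecated"
        current[dataset_id] = entry
    for dataset_id in new_datasets:
        # Only newly released datasets receive the new collection tag.
        current[dataset_id] = {
            "collectionVersion": new_collection_version,
            "status": "latest",
        }
    return current


# Contents of a hypothetical 6.2.14.json, reduced to one dataset.
v6_2_14 = {"volcanic-v3": {"collectionVersion": "6.2.14", "status": "latest"}}

# Releasing v4 increments the collection to 6.2.15 and deprecates v3.
v6_2_15 = roll_forward(v6_2_14, "6.2.15", ["volcanic-v4"],
                       superseded=["volcanic-v3"])
```

In this sketch the earlier listing is never modified, so each 6.x.y.json file remains a snapshot of the collection at that release.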
@durack1 thanks for coming back to this issue. Please keep the version information that data providers included in the dataset names, and/or add the ESGF version under which the dataset was published. Otherwise I will lose the connection to the dataset version in the citation. |
@eguil @davidhassell @momipsl this is the conversation that we can hopefully spend some time finalizing tomorrow - the format that @MartinaSt suggested is above #2 |
Hi @durack1 and all, Thanks for taking the time to talk through this a couple of days ago. To summarize, these are the attributes in the JSON files that I think we can use for ES-DOC:
I understand that all of these items are readily available. Of course any extra attributes that are needed, e.g. for citations are all fine and will not affect ES-DOC. This could be, more or less, a mingling of Martina's and Paul's JSON examples:
Thanks, David |
@durack1, @davidhassell: Following today's discussion in the input4MIPs meeting, I propose that we add an attribute "VersionLink", which links to a document (PDF) with a detailed description of the issue with the dataset version. |
@MartinaSt just circling back around on this. Is there an API to call to query the DOIs issued by the DKRZ citation service? As the archive has grown so much now, I'm reluctant to try and hand-spin this versioning information. I'm looking into harvesting all the metadata attributes from the ESGF project so I can populate all the fields comprehensively |
@durack1 : I'd like to come back to this version documentation issue. As the errata are not accessible for the data citation (an access by DRS CV is required) the revised version information is the only possibility to access+display version/errata information on the DOI landing page. What are your plans with this version documentation? Any idea about a schedule? |
@mauzey1 it'd be great to get back to this and get it done. What did you need from me to get this finalized? |
@durack1
Each entry has the This table was made with this Python script: https://github.com/mauzey1/esgf-utils/blob/d1e4215fd36ffa67f3a46bba7a2cd324ce5121b2/update-reports/input4MIPs_report.py |
@MartinaSt @davidhassell we should really circle around on this so we can finalize the forcing versioning json and you guys can start using it. How does the above format #2 look? |
@durack1, thanks for getting this forward. I use this information to document version and error information on the DOI landing pages. Therefore I need information on (see above):
|
@MartinaSt, it seems the above includes such information, so e.g. version/versionInfo: Regarding the versionNotes, what format would you want for this, a drop-down selection, or free-form text? Just wondering the use case and whether a char limit etc is required. @davidhassell please chime in now, as once this format is set it's not likely to be revisited and may require ES-DOCs to harvest information separate to this version json data |
@durack1 Sorry, I did not scroll to the right to see the deprecated information. Regarding the errata information: The content is up to the data creators, so free text. The important part for me is that I can show a reliable and meaningful errata information for deprecated data or the reason for deprecation on the DOI landing page. |
@MartinaSt no problem, so how about if we have a situation where two datasets are latest? With the PCMDI-AMIP-X-Y-Z data, the 1.1.3 (from memory) was the official CMIP6 release, whereas normally a 6 monthly update is released, which deprecates the previous version (but not 1.1.3 which will always be available as "latest"). Is such logic a problem? Are there any other considerations we need to factor in whilst finalizing the format? |
@durack1 yes, the errata might be related to some but not all datasets. The "versionNote" is directly related to the "deprecated/latest/None" information and thus including it breaks the proposed json. Maybe we can assume that, for such a case, there is only one reason for the deprecation of some (but not all) datasets and include it in the upper level alongside "version"? |
@MartinaSt well how about we proceed this way, we'll work to generate the 6.2.37 version of the json, review this and once we have a finalized format generate all the previous versions back to initial 6.0.0 (20th December 2016) release, sound good? |
@durack1 Any suggestion to get this finalized is a good one! So from my view: Go ahead! Just as two comments: I will wait for the final format before I do any code changes and I will do the change if the version includes the errata information for the users (otherwise the deprecation flag does not add much information to the already available version, which is part of the DRS). It's a matter of spending my time most efficiently on the different projects I am involved in... |
@MartinaSt completely understood, and agree (spending valuable time appropriately). @mauzey1 is addressing a number of high priority requirements, and this is in the queue after these, so I'd hope we can get the latest version json finalized first, then we can double check it contains everything in the format required and then roll back to the start. There are a couple of new datasets that have started to appear for review, so 6.2.37 will be incrementing over the coming months |
@MartinaSt @mauzey1 is working on this as a second priority to the CMIP publication page. He has already generated the attached, and so we'll need to tweak this format to get to the finish line |
@durack1 @mauzey1 Ok, now I am on the right page. Thanks for the JSON and your effort. It looks good except for: Sorry to be persistent, but the most important information for me is the errata information, or in other words the reason for deprecation in the JSON for the "deprecated" cases. Is it possible to add such information to the JSON? |
@MartinaSt yes sorry for my loose comments. Yeah that should be possible, so if a dataset is "deprecated", we could add an additional field such as |
Thanks @durack1 ! |
@durack1 The ESGF database only provides whether or not a dataset was deprecated; it does not provide any notes about why it was deprecated. How will we get this information? Would we just contact the people who published the datasets for the reason for deprecation, and manually add it to the rest of the information? |
@mauzey1 thanks for circling around on this. I have this information, so let me know where this should be put, so we can integrate it |
@durack1 Is there a github repo where you could store that information? It would make that info more accessible and easy to update. |
I was hoping that this information would be stored alongside the version info. There is no current github based index, rather I would need to generate this for inclusion alongside the versioning info |
@durack1 Is there a file with this information you can post here? I would like to rewrite my program that created the input4MIPs version info file to include that information. That way if changes were to happen to the status of the datasets, we can update the deprecation info and rebuild the version info file. |
@mauzey1 that will be something that I need to go through notes (not digital) to generate - so a job for me. Do you have a list of the datasets that have been identified as deprecated? That will simplify my job, and I should be able to post text within this thread for simplicity. We will obviously have to figure out where to put this going forward, but get up-to-date first |
@durack1 Here is a list of datasets listed as "deprecated" from input4MIPs. |
Migrating this issue from https://github.com/PCMDI/input4MIPs-cmor-tables to this repo |
@davidhassell @charliepascoe just pinging you guys on this thread. We've (for years) had an aspiration to bring the forcing datasets used for simulations into the documentation list, so this will be our best way to achieve this - we now have a live repo with the latest data identity and status, we just need to wrap this up for documentation purposes |
@eguil @davidhassell this issue has been generated following the email correspondence regarding https://es-doc.org/cmip6-ensembles-conformance/
It will be useful to iterate over the format of the json version info within this issue