Create a list of models and tools that can be used to test workflow managers #197
Comments
mHM
Will start with mHM since it has great docs, the source is simple & clear, and, not being a complete coupled ESM, it should be easier to run (:crossed_fingers:). They provide two “test domains” that can be executed after
So for the RO-Crate file, maybe an Autosubmit + mHM workflow could work. It'd be better if the workflow also prepared data for mHM based on the selected days for the workflow, thus using at least
The easiest test scenario would be somewhere in Germany or Europe (as the data mentioned comes from EU agencies). But maybe it'd be possible to use somewhere else like Tamana-shi, Kumamoto, Japan, or Noumea, New Caledonia (or these two).
2023-04-08
The GIS data preparation step is a bit hard to follow, especially if ArcGIS Map is really needed (it would be easier with QGIS). So creating the data for another basin looks like a task that demands more time than a few hours every other weekend. Let's see if there's some data ready to be used, and that can be used with different days.
2023-04-09
So, using their test domains, the
That can be used, then, to create a workflow that takes as input the dates for these periods (or maybe just for running the model). The output of the workflow would be the outputs of the mHM model (NetCDF files and another txt file). Perhaps we could also have an extra task to run ncview and export a plot, also used as output. All of this can be packed as an RO-Crate (without using FormalParameters), and it should run on any of these WMSs.
2023-04-11
Created a repository for an Autosubmit workflow to run mHM: https://github.com/kinow/auto-mhm-test-domains
It includes the test domain data from 5.12.0, but that will be replaced by a task that clones the repository for v5.12.0 instead, to avoid including data with a different license in the git repo. This will be a good test for an RO-Crate with an Autosubmit Project of type Git (that needs to be an input in the workflow).
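The workflow described in these notes (dates as input, a clone step, the model run, and an optional plot) can be sketched independently of any WMS. This is only a rough illustration: the task names `clone_mhm`, `run_mhm` and `plot_output` and the function `build_workflow` are made up here, and nothing below is taken from the auto-mhm-test-domains repository or from Autosubmit.

```python
from datetime import date
from graphlib import TopologicalSorter  # Python 3.9+


def build_workflow(start: date, end: date) -> list[dict]:
    """Return the tasks in execution order, each tagged with the run period."""
    if end < start:
        raise ValueError("end date must not precede start date")

    # Hypothetical task names; each task maps to the tasks it depends on.
    dependencies = {
        "clone_mhm": set(),          # fetch the model sources and test domains
        "run_mhm": {"clone_mhm"},    # run the model for the selected period
        "plot_output": {"run_mhm"},  # optional: export a plot of the output
    }

    # A topological sort gives an order in which every task runs
    # after all of its dependencies.
    order = TopologicalSorter(dependencies).static_order()
    return [
        {"task": name, "start": start.isoformat(), "end": end.isoformat()}
        for name in order
    ]


if __name__ == "__main__":
    # The dates are arbitrary placeholders, not the test domain periods.
    for step in build_workflow(date(2000, 1, 1), date(2000, 1, 31)):
        print(step)
```

Each WMS would express the same three-step chain in its own configuration format; the sketch only pins down the inputs and the ordering that every implementation should share.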
The idea here is to find at least a couple, maybe three or four, models and/or tools that can be used to create the same workflow in Cylc, ecFlow, Autosubmit, Steep WMS (cyclic), StreamFlow (cyclic w/ CWL dev loops), etc., and in the process take notes of what can be improved in each workflow manager.
At the same time, one of these will be used to produce RO-Crates and validate the Autosubmit RO-Crate implementation, and it will be uploaded to WorkflowHub.eu (ResearchObject/ro-crate-py#148).
The notes about the workflow implementation in different WMSs may help identify features that are missing or that could be improved in these WMSs. They could also serve as a resource for the maintainers of these WMSs if they choose to support different cases (e.g. some WMSs may not be suitable for climate models with ensembles that require restarting/re-running, or for cyclic NWP models with critical operational needs), or if they decide to support RO-Crate.
Requirements
Bonus points for the use case that:
Models and tools
Wave models
ecmwf-ifs/ecwam
NOAA-EMC/WW3
Earth System models
E3SM-Project/E3SM
Hydrology
UFZ/mhm
Software related to models
NCAR/PyCECT
Links
pangeo-data/awesome-open-climate-science
RO-Crates
While integrating these models and tools into workflows for different workflow managers, it's possible to take notes on how easy it would be for these workflows to be archived as an RO-Crate.
It's clear now that:
1.1. That can be solved now with a custom JSON file containing entries compatible with the JSON-LD used to add/update entries in the RO-Crate file: Add methods for adding and updating JSON-LD directly (partials for WMS) ResearchObject/ro-crate-py#149
2.1. In cases like this, the approach above might be useful when combined with entries that provide a list of inputs/outputs, maybe using glob patterns like **/*.nc.
2.2. It might be hard or nearly impossible to use BioSchemas FormalParameters as CWL/Galaxy/StreamFlow (these mainly rely on CWL, I think): Document how to create a Workflow Run Crate file ResearchObject/ro-crate-py#148 (comment). So in these cases we can just have a list of inputs & outputs as File and Dataset.
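The two ideas above (a glob pattern that lists outputs as File entries, and a JSON partial merged into the crate's JSON-LD) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: `file_entities` and `merge_entities` are hypothetical helpers, not ro-crate-py or Autosubmit API, and linking the new entries from the root dataset through hasPart is left out.

```python
import json
from pathlib import Path


def file_entities(root: Path, pattern: str = "**/*.nc") -> list[dict]:
    """Build one RO-Crate File entity per file under root matching pattern."""
    return [
        {"@id": path.relative_to(root).as_posix(), "@type": "File"}
        for path in sorted(root.glob(pattern))
        if path.is_file()
    ]


def merge_entities(metadata: dict, entities: list[dict]) -> dict:
    """Add or update entities in the @graph of the metadata, keyed by @id."""
    graph = {entity["@id"]: entity for entity in metadata.get("@graph", [])}
    for entity in entities:
        # Existing keys are kept unless the partial overrides them.
        graph[entity["@id"]] = {**graph.get(entity["@id"], {}), **entity}
    return {**metadata, "@graph": list(graph.values())}


if __name__ == "__main__":
    # Hypothetical minimal metadata; a real crate has more entries.
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [{"@id": "./", "@type": "Dataset"}],
    }
    merged = merge_entities(metadata, file_entities(Path(".")))
    print(json.dumps(merged, indent=2))
```

Keying the graph by @id means a partial can both introduce new entries and refine existing ones, which is the behaviour a WMS-provided JSON file would need.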