Best practice for feeding into WorkflowHub #581
-
Hi, I just got alerted to this. I know @simleo is connected to several activities around this; in particular, I'm thinking of the Galaxy IWC workflows and the LifeMonitor GitHub app. Is there a solution that @hexylena could explore?
-
Discussed in the WorkflowHub call today: further work is needed to formalize this, because with just a landing page and without the RO-Crate or repository it's much harder to "get" the workflow consistently.

For IWC there is also the issue of what to index. https://github.com/galaxyproject/iwc is a GitHub repository without any Bioschemas markup exposed on the Web; the markup could potentially be on http://galaxyproject.github.io/iwc ? It is stated that the IWC workflows are listed on the public instances, but pages like https://usegalaxy.eu/workflows/list_published have a flat list of all the workflows, not a single page per workflow as is traditional for Bioschemas markup. Bioschemas has had a few ways of dealing with such collections, but there is no clear guidance on the Bioschemas website.

It would be unclear what WorkflowHub would link to: perhaps https://usegalaxy.org/u/iwc/w/dctmd-calculations-with-gromacs-20 (just picking one of the instances?) as a landing page that can have the markup and then links to the .ga file https://usegalaxy.org/workflow/export_to_file?id=195a608c481040de (or ideally a Workflow RO-Crate).

These are the kind of details and conventions that would have to be sorted out before standalone Bioschemas could be a reliable import mechanism for WorkflowHub. In the first instance we therefore went with RO-Crate and CWL, as they can be maintained in Git repositories and imported from there directly, as is done at the moment.
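To make the landing-page idea concrete, here is a rough Python sketch of the kind of JSON-LD such a per-workflow page could embed. It is only an illustration under assumptions: the function name, the placeholder workflow name, the extra context entry mapping `ComputationalWorkflow` to the Bioschemas namespace, and the use of Schema.org's `encoding` / `contentUrl` to point at the .ga file are my choices, not an agreed Bioschemas or WorkflowHub convention.

```python
import json


def workflow_landing_jsonld(landing_url, download_url, name):
    """Build a minimal JSON-LD description for one workflow landing page.

    Hypothetical helper: the property choices below are assumptions,
    not an established import convention.
    """
    return {
        "@context": [
            "https://schema.org",
            # Assumption: map the Bioschemas type explicitly.
            {"ComputationalWorkflow": "https://bioschemas.org/ComputationalWorkflow"},
        ],
        "@type": ["SoftwareSourceCode", "ComputationalWorkflow"],
        "@id": landing_url,
        "url": landing_url,
        "name": name,
        "programmingLanguage": {"@type": "ComputerLanguage", "name": "Galaxy"},
        # Assumption: link from the landing page to the downloadable .ga file.
        "encoding": {"@type": "MediaObject", "contentUrl": download_url},
    }


doc = workflow_landing_jsonld(
    landing_url="https://usegalaxy.org/u/iwc/w/dctmd-calculations-with-gromacs-20",
    download_url="https://usegalaxy.org/workflow/export_to_file?id=195a608c481040de",
    name="Placeholder workflow name",  # placeholder, not the real title
)

# The markup would be embedded in the landing page as a script element.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(doc, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Which instance hosts the landing page, and which property carries the download link, are exactly the conventions that would still need to be agreed.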
-
Currently, the submission of IWC workflows to WorkflowHub and LifeMonitor works as follows:
The plan is to replace this soon, with the LifeMonitor GitHub app handling all or part of the above steps. In both the old and the new process, however, registration acts on the workflow repositories, not on a web page for the workflow that might have Bioschemas markup embedded in it. So what would be nice to have is a way to add Bioschemas / Schema.org metadata to the source repository. This would add to the metadata that's already being harvested, e.g., the "creator", "license", "release" etc. fields.

CWL has a formalism for adding EDAM types to parameters and arbitrary metadata (Schema.org or other) to the workflow. Does / could Galaxy support something like that? If not, a convention could be set up to allow workflow authors to specify additional metadata through one or more additional files, e.g. in JSON-LD format.

Note that a good chunk of the metadata, i.e., the part related to the workflow's internals (steps, tools, parameters and their relationships), can often be extracted automatically from the workflow. In Workflow Run RO-Crate we are defining a profile called Provenance Run Crate that can represent such metadata (prospective provenance), together with information about an execution of the workflow (retrospective provenance). The prospective provenance part is independent of specific executions, so it could be lifted up to Workflow RO-Crate and be supported by WorkflowHub (e.g., for diagrams, tabular descriptions, indexing by tool, ...).

In runcrate (Workflow Run RO-Crate's toolkit) we're extracting this description from CWL workflows, and @pauldg is working on something similar for Galaxy workflows.
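As a rough illustration of the "extracted automatically" part, the sketch below reads the top-level metadata fields and the step/tool structure from a Galaxy workflow loaded as a dict. This is not how runcrate or the LifeMonitor app does it; `summarize_galaxy_workflow` is a hypothetical helper and `SAMPLE` is an invented toy workflow, reduced to the few keys the sketch relies on (`steps`, `tool_id`, `input_connections`).

```python
def summarize_galaxy_workflow(ga):
    """Collect harvestable metadata and a step/tool outline from a .ga dict."""
    # Top-level fields of the kind already harvested from the repository.
    metadata = {
        key: ga[key] for key in ("name", "creator", "license", "release") if key in ga
    }

    steps = []
    # Step keys are stringified integers; sort them numerically.
    for key in sorted(ga.get("steps", {}), key=int):
        step = ga["steps"][key]
        upstream = set()
        for conn in step.get("input_connections", {}).values():
            # A connection may be a single dict or a list of dicts.
            for c in conn if isinstance(conn, list) else [conn]:
                upstream.add(c["id"])
        steps.append(
            {
                "id": step.get("id", int(key)),
                "name": step.get("name"),
                "tool_id": step.get("tool_id"),  # None for input steps
                "depends_on": sorted(upstream),
            }
        )
    return {"metadata": metadata, "steps": steps}


# Invented toy workflow, not a real IWC workflow.
SAMPLE = {
    "a_galaxy_workflow": "true",
    "name": "toy workflow",
    "license": "MIT",
    "release": "0.1",
    "creator": [{"class": "Person", "name": "A. Author"}],
    "steps": {
        "0": {"id": 0, "type": "data_input", "name": "Input dataset",
              "tool_id": None, "input_connections": {}},
        "1": {"id": 1, "type": "tool", "name": "Toy tool", "tool_id": "toy_tool",
              "input_connections": {"input": {"id": 0, "output_name": "output"}}},
    },
}

summary = summarize_galaxy_workflow(SAMPLE)
tools = [s["tool_id"] for s in summary["steps"] if s["tool_id"]]
print(summary["metadata"])
print(tools)
```

For a real workflow the dict would come from `json.load` on the .ga file. Anything the workflow file cannot express would still have to come from an additional metadata file of the kind suggested above.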
-
The above text originated in a discussion on the web site pull request BioSchemas/bioschemas.github.io#547 (comment)