Allow to attach partials to a crate? #146
I don't think this is feasible, mainly because RO-Crate is not just about entities but also about their relationships. Suppose you somehow loaded authors from a file like this:
[
  {
    "@id": "https://orcid.org/0000-0002-1825-0097",
    "@type": "Person",
    "name": "Josiah Carberry"
  },
  {
    "@id": "https://orcid.org/0000-0000-0000-0000",
    "@type": "Person",
    "name": "John Doe"
  }
]
What do you link them to? What's the crate entity that was authored by these persons? You'd need a way to express that, but the user who writes the above file does not even know these entries are going to end up in an RO-Crate. Even if that were doable, you'd basically be asking users to write full RO-Crate markup. The engine code that generates the RO-Crate is the only component that has all the knowledge to make sense of the whole thing. It knows that e.g. those are workflow authors (because they were entered in a file meant to communicate that information) and it knows which crate entity represents the workflow.
See @dgarijo and @PavelAntonia's approach in https://github.com/oeg-upm/ya2ro, which uses a YAML template to make an RO-Crate and can even look up ORCIDs, e.g.:
type: "paper"
title: "DockerPedia: A Knowledge Graph of Software Images and their Metadata"
authors:
  - # Daniel Garijo
    orcid: http://orcid.org/0000-0003-0454-7145
    role: "Researcher"
  -
    name: "Maximiliano Osorio"
    position: "Research Programmer"
    description: "Computer Scientist at the Information Sciences Institute of the University of Southern California."
Could this be used to create a skeleton RO-Crate that is then augmented by the Autosubmit code?
With ro-crate-py you could of course also load entities from a second RO-Crate. But copying them over may need some deep-copy logic in case they reference other contextual entities (e.g. for affiliation).
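The deep-copy concern can be illustrated on plain JSON-LD dictionaries. This is only a sketch: it works on the raw `@graph` lists of two crates rather than on ro-crate-py objects, and the helper names (`index_graph`, `iter_refs`, `copy_with_references`) are hypothetical, not part of any library.

```python
import copy


def index_graph(graph):
    """Map each entity's @id to the entity dict."""
    return {entity["@id"]: entity for entity in graph}


def iter_refs(value):
    """Yield every @id referenced from a property value (dict or list)."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from iter_refs(item)


def copy_with_references(entity_id, source_graph, target_graph):
    """Copy an entity and everything it transitively references.

    Entities whose @id already exists in target_graph are left untouched.
    """
    source = index_graph(source_graph)
    present = {entity["@id"] for entity in target_graph}
    pending = [entity_id]
    while pending:
        current = pending.pop()
        if current in present or current not in source:
            continue  # already there, or not described in the source graph
        entity = copy.deepcopy(source[current])
        target_graph.append(entity)
        present.add(current)
        for key, value in entity.items():
            if not key.startswith("@"):
                pending.extend(iter_refs(value))
    return target_graph


source_graph = [
    {"@id": "https://orcid.org/0000-0001-8250-4074", "@type": "Person",
     "name": "Bruno P. Kinoshita",
     "affiliation": {"@id": "https://ror.org/05sd8tv96"}},
    {"@id": "https://ror.org/05sd8tv96", "@type": "Organization",
     "name": "Barcelona Supercomputing Center"},
]
target_graph = [{"@id": "./", "@type": "Dataset"}]
copy_with_references("https://orcid.org/0000-0001-8250-4074",
                     source_graph, target_graph)
```

Copying the person also brings the organization along, because the `affiliation` reference is followed. Nothing links the copied person to `./` yet; that link still has to come from whoever knows the relationship.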
That looks similar to the current COMPSs & Autosubmit approach. In COMPSs (cc @rsirvent) you have to use a YAML like:
COMPSs Workflow Information:
  name: Name of your COMPSs application
  description: Detailed description of your COMPSs application
  license: Apache-2.0 # Provide better a URL, but these strings are accepted:
                      # https://about.workflowhub.eu/Workflow-RO-Crate/#supported-licenses
  sources_dir: [path_to/dir_1, path_to/dir_2] # Optional: List of directories containing application source files.
                                              # Relative or absolute paths can be used
  sources_main_file: my_main_file.py # Optional Name of the main file of the application, located in one of the
                                     # sources_dir. Relative paths from a sources_dir or absolute paths can be used
  files: [main_file.py, aux_file_1.py, aux_file_2.py] # List of application files
                                                       # Relative or absolute paths can be used
Authors:
  - name: Author_1 Name
    e-mail: author_1@email.com
    orcid: https://orcid.org/XXXX-XXXX-XXXX-XXXX
    organisation_name: Institution_1 name
    ror: https://ror.org/XXXXXXXXX # Find them in ror.org
  - name: Author_2 Name
    e-mail: author2@email.com
    orcid: https://orcid.org/YYYY-YYYY-YYYY-YYYY
    organisation_name: Institution_2 name
    ror: https://ror.org/YYYYYYYYY # Find them in ror.org
And in Autosubmit I implemented it so users have to provide this info that's missing from our workflow configuration:
license: Apache-2.0 # Find in https://spdx.org/licenses/
authors:
  - name: Bruno P. Kinoshita
    email: bruno.depaulakinoshita@bsc.es
    orcid: https://orcid.org/0000-0001-8250-4074
    organisation_name: Barcelona Supercomputing Center
    ror: https://ror.org/05sd8tv96 # Find them in https://ror.org
  - name....
A common approach for these three implementations/tools would be really great.
Maybe there's something in JSON-LD to combine schemas or files? If so, we could have these implementations ask users to provide a partial JSON-LD or YAML-LD and then just merge it with the RO-Crate metadata? Thanks!
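One way a shared approach could look is to turn the author records from the YAML into JSON-LD entities with a single helper. The following is a sketch under assumptions: the records are already parsed into dictionaries with the Autosubmit-style keys shown above (`name`, `email`, `orcid`, `organisation_name`, `ror`), every author has an `orcid`, and the function name `authors_to_jsonld` is hypothetical.

```python
def authors_to_jsonld(authors):
    """Turn parsed author records into JSON-LD entities.

    Returns (references, entities): the references go into the "author"
    property of a crate entity, the entities go into the "@graph".
    """
    references, entities, seen_orgs = [], [], set()
    for author in authors:
        # Assumption: the ORCID URL is used as the Person's @id.
        person = {"@id": author["orcid"], "@type": "Person",
                  "name": author["name"]}
        if "email" in author:
            person["email"] = author["email"]
        ror = author.get("ror")
        if ror:
            person["affiliation"] = {"@id": ror}
            if ror not in seen_orgs:  # describe each organization once
                seen_orgs.add(ror)
                org = {"@id": ror, "@type": "Organization"}
                if "organisation_name" in author:
                    org["name"] = author["organisation_name"]
                entities.append(org)
        entities.append(person)
        references.append({"@id": person["@id"]})
    return references, entities


authors = [{
    "name": "Bruno P. Kinoshita",
    "email": "bruno.depaulakinoshita@bsc.es",
    "orcid": "https://orcid.org/0000-0001-8250-4074",
    "organisation_name": "Barcelona Supercomputing Center",
    "ror": "https://ror.org/05sd8tv96",
}]
references, entities = authors_to_jsonld(authors)
```

The helper only produces the entities and the references. Deciding which crate entity gets the `author` property is still left to the workflow engine, which is the point raised earlier in the thread.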
I spent the afternoon today reading about JSON-LD and RO-Crate, and reading the [...]
From what I understood, the entities, mapped as Python classes in [...] The [...]
So I think I could simplify the process of attaching external partial information to existing data within the crate, similar to how you would do [...]
My idea is to provide a JSON file, with a similar structure to the [...]
Here's what I sketched today (I refrained from touching the code in Autosubmit until I have it clearer in my mind and on paper) (oh, and using JS due to comments):
# TODO: add a section to AS docs explaining the idea behind it, link to schemaOrg page and playground, and provide examples for data missing from AS that is interesting for workflow authors/devs, such as license, inputs, outputs, etc
{
  # No context here, as these are partial/patches.
  "@graph": [
    # This is for the metadata itself, extra data that we want to add. Matching is through @id!
    {
      "@id": "ro-crate-metadata.json",
      "license": "https://spdx.org/licenses/Apache-2.0.html",
      "author": [
        {
          "@id": "https://orcid.org/0000-0001-8250-4074"
        }
      ]
    },
    # This is for the Autosubmit processed/unified workflow configuration.
    {
      "@id": "./",
      "author": [
        {
          "@id": "https://orcid.org/0000-0001-8250-4074"
        }
      ],
      "license": "https://spdx.org/licenses/Apache-2.0.html"
    },
    {
      "@id": "https://spdx.org/licenses/Apache-2.0.html",
      "@type": "CreativeWork",
      "identifier": "Apache-2.0",
      "name": "Apache License 2.0",
      "url": "https://www.apache.org/licenses/LICENSE-2.0"
    },
    # This is related to authorship & affiliation.
    {
      "@id": "https://orcid.org/0000-0001-8250-4074",
      "@type": "Person", # When the @type is present, we will search it in the ro-crate-py classes and call crate.add(#type-class, @id, properties).
      "name": "Bruno P. Kinoshita",
      "affiliation": {
        "@id": "https://ror.org/05sd8tv96"
      },
      "contactPoint": {
        "@id": "mailto:blabla@bsc.es"
      }
    },
    {
      "@id": "mailto:blabla@bsc.es",
      "@type": "ContactPoint", # When the @type does not match a class, we will use ContextEntity.
      "contactType": "Author",
      "email": "blabla@bsc.es",
      "identifier": "blabla@bsc.es",
      "url": "https://orcid.org/0000-0001-8250-4074"
    },
    {
      "@id": "https://ror.org/05sd8tv96",
      "@type": "Organization",
      "name": "Barcelona Supercomputing Center"
    },
    # This is for the inputs & outputs.
    {
      "@id": "autosubmit-complete-workflow.yml",
      "input": [
        { "@id": "#param001" }
      ],
      "output": [
        { "@id": "#output001" }
      ]
    },
    {
      "@id": "#param001"... WIP
    }
  ]
}
With that I will simplify my code, and instead of parsing YAML and writing custom code to "stitch" things up, I will:
I started working on the inputs & outputs for Autosubmit, but I couldn't find good examples on WorkflowHub.eu. I will comment on #148 as I think that's pertinent to the latest comments there. -Bruno
If you're only going to allow contextual entities as new entities to be added, there's no need to search for an existing type in the model. Just add it as a ContextEntity:
from rocrate.model.contextentity import ContextEntity

org_dict = {
    "@id": "https://ror.org/05sd8tv96",
    "@type": "Organization",
    "name": "Barcelona Supercomputing Center"
}
# The @id is passed separately; the remaining keys become the properties.
org_id = org_dict.pop("@id")
# crate is the existing ROCrate instance.
org = crate.add(ContextEntity(crate, org_id, properties=org_dict))
For updates, after removing all keys that start with @:
entity._jsonld.update(update_dict)
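The add-or-update logic described in this comment can be sketched on plain dictionaries, matching on `@id` as in the patch file above. This is an illustration only: the list `graph` stands in for the crate's entities, `apply_partial` is a hypothetical name, and with ro-crate-py the two branches would roughly correspond to the `crate.add(...)` and `_jsonld.update(...)` calls quoted above.

```python
def apply_partial(graph, partial):
    """Merge partial entities into graph in place, matching on @id."""
    by_id = {entity["@id"]: entity for entity in graph}
    for patch in partial:
        existing = by_id.get(patch["@id"])
        if existing is None:
            # Unknown @id: treat the patch as a new (contextual) entity.
            new_entity = dict(patch)
            graph.append(new_entity)
            by_id[new_entity["@id"]] = new_entity
        else:
            # Known @id: update properties, never touching @id or @type.
            updates = {key: value for key, value in patch.items()
                       if not key.startswith("@")}
            existing.update(updates)
    return graph


graph = [{"@id": "./", "@type": "Dataset", "name": "My workflow run"}]
partial = [
    {"@id": "./",
     "license": {"@id": "https://spdx.org/licenses/Apache-2.0.html"}},
    {"@id": "https://spdx.org/licenses/Apache-2.0.html",
     "@type": "CreativeWork",
     "name": "Apache License 2.0"},
]
apply_partial(graph, partial)
```

Note that `update` overwrites a property that already exists. Whether list-valued properties such as `author` should be replaced or extended is a design choice this sketch does not settle.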
Hi,
For Autosubmit, since the workflow configuration doesn't contain the information needed for RO-Crate, I used the exact same approach from COMPSs and asked users to provide a YAML file with authors & license.
Then I create the objects and attach/add them to the ro-crate-py object.
The implementation in Autosubmit is similar, but not identical, to COMPSs. Other workflow managers with similar needs may craft yet another way of doing the same.
It would be nice if there was a way to load RO-Crate-py entities directly from a dictionary/YAML data. Something like
Not sure how to validate the format of the entities... maybe instead of YAML receive JSON-LD directly, or provide a tool/script to read SPARQL+SHACL, etc.?
Cheers
Bruno
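On the validation question, a lightweight structural check may already catch common mistakes before any SHACL-style validation is considered. The sketch below is an assumption about what such a check could cover, not an existing ro-crate-py feature: it verifies that every entity has an `@id`, that new entities carry an `@type`, and that every reference resolves either within the partial file or to an `@id` already in the crate. The names `check_partial` and `_refs` are hypothetical.

```python
def _refs(value):
    """Yield every @id referenced from a property value (dict or list)."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from _refs(item)


def check_partial(partial, known_ids):
    """Return a list of problems found in a partial @graph.

    known_ids holds the @ids that already exist in the crate.
    """
    problems = []
    defined = {entity["@id"] for entity in partial if "@id" in entity}
    resolvable = defined | set(known_ids)
    for entity in partial:
        entity_id = entity.get("@id")
        if entity_id is None:
            problems.append("entity without @id")
            continue
        if entity_id not in known_ids and "@type" not in entity:
            problems.append(f"{entity_id}: new entity has no @type")
        for key, value in entity.items():
            if key.startswith("@"):
                continue
            for ref in _refs(value):
                if ref not in resolvable:
                    problems.append(
                        f"{entity_id}: {key} points to unknown {ref}")
    return problems


partial = [
    {"@id": "./",
     "author": [{"@id": "https://orcid.org/0000-0001-8250-4074"}]},
    {"@id": "https://ror.org/05sd8tv96",
     "name": "Barcelona Supercomputing Center"},
]
problems = check_partial(partial, known_ids={"./", "ro-crate-metadata.json"})
```

Here the check reports two problems: the `author` reference has no matching `Person` entity, and the organization is new but lacks an `@type`.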