Case study: NCBITaxon #7

Open
mslw opened this issue Nov 27, 2023 · 1 comment

mslw commented Nov 27, 2023

This issue is about controlled dictionaries and about creating tabby input enriched by ontology lookup.

Current state

The current sfb tabby requires sample[organism] to be expressed as an ID in the NCBI organismal taxonomy, formatted as, e.g., NCBITaxon:9606.

This translates (by string substitution) to http://purl.obolibrary.org/obo/NCBITaxon_9606, which can be looked up (also via an API), e.g. in OLS. The entry for NCBITaxon:9606 yields, among other things, a label (i.e. the Latin name) and an exact synonym (the GenBank common name, i.e. the English name).
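The string substitution could be sketched as below. This is only an illustration, not existing datalad-tabby code: the function name curie_to_iri and the constant OBO_BASE are made up for this example, and the sketch assumes the OBO PURL pattern shown above (prefix and local ID joined by an underscore).

```python
# Hypothetical helper, not part of datalad-tabby.
# Assumes the OBO PURL convention: <base><prefix>_<local id>.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie: str) -> str:
    """Turn a compact ID such as 'NCBITaxon:9606' into an OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {curie!r}")
    return f"{OBO_BASE}{prefix}_{local_id}"


print(curie_to_iri("NCBITaxon:9606"))
# prints http://purl.obolibrary.org/obo/NCBITaxon_9606
```

No ontology lookup happens here; the function only rewrites the identifier.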

For feeding this info to the catalog, I like using the OpenMINDS controlled term for Species, because it has fields such as name (required), preferredOntologyIdentifier, and synonym. These map nicely onto the Latin name, IRI, and English name, and make it easy to create a catalog template for displaying this information.

Consequently, the dataset attribute is currently modelled as follows (note that using Species as the attribute IRI is probably not a good idea):

sample[organism]:
  slot_uri: openminds:Species
  required: true
  multivalued: true
  description: >-
    Classification of organism(s) associated with, or studied
    for the dataset. One or more organisms can be given, one per
    column. Organisms must be identified by their ID in the
    NCBI organismal taxonomy, which can be searched at
    https://www.ebi.ac.uk/ols4/ontologies/ncbitaxon. For
    example, the identifier for human or homo sapiens is
    NCBITaxon:9606. The column value should be NCBITaxon:9606 in
    this case.

There is currently no range (or string pattern) defined, and there is no custom Species object definition.

Note: the same applies to sample[organismPart] / openminds:UBERONParcellation.

Questions

  • Should we define our own Species object (with an IRI pointing at OpenMINDS or not) with the three properties listed above?
  • Should we keep it as a string, and just provide a string-matching pattern for validation?
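To make the second option concrete, here is a rough sketch of what string-pattern validation could look like. The exact pattern is an assumption (the prefix NCBITaxon, a colon, then one or more digits), and the names NCBITAXON_PATTERN and is_valid_organism are invented for this example; nothing like this is defined in the schema yet.

```python
import re

# Assumed pattern: 'NCBITaxon:' followed by digits only.
NCBITAXON_PATTERN = re.compile(r"NCBITaxon:[0-9]+")


def is_valid_organism(value: str) -> bool:
    """Check that the whole value looks like an NCBITaxon identifier."""
    return NCBITAXON_PATTERN.fullmatch(value) is not None


print(is_valid_organism("NCBITaxon:9606"))  # prints True
print(is_valid_organism("homo sapiens"))    # prints False
```

A check like this only validates the shape of the value. It cannot tell whether the ID actually exists in the taxonomy.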

Thoughts

The problem is that a datalad-tabby convention could convert NCBITaxon:1234 into a full IRI, but it couldn't (and probably shouldn't) perform an ontology lookup. This ties into the question of which stage of our processing we are modelling. Without a lookup, the convention cannot produce a valid openMINDS Species object (there would be no name).

I really like an OpenMINDS-like representation for feeding data that is based on a controlled dictionary into the catalog.

I am tempted to define my own Species object that would require only an IRI / preferred ontology identifier, with the other fields optional. These fields could then be filled in during preparation for the catalog, and our schema would sort of live in the middle of the tabby-to-catalog process.
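A minimal sketch of that idea, under stated assumptions: the class below is a hypothetical stand-in, not a schema definition, with field names loosely following the openMINDS properties mentioned above. The enrich function and the FAKE_LOOKUP table are placeholders for whatever ontology lookup would run during catalog preparation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Species:
    """Hypothetical Species object: only the identifier is required."""
    preferred_ontology_identifier: str
    name: Optional[str] = None      # Latin name, filled in later
    synonym: Optional[str] = None   # English name, filled in later


# Stand-in for a real ontology lookup (e.g. a query against OLS).
FAKE_LOOKUP: Dict[str, Dict[str, str]] = {
    "http://purl.obolibrary.org/obo/NCBITaxon_9606": {
        "label": "Homo sapiens",
        "synonym": "human",
    },
}


def enrich(species: Species, lookup: Dict[str, Dict[str, str]]) -> Species:
    """Return a copy with name and synonym filled in, if the IRI is known."""
    record = lookup.get(species.preferred_ontology_identifier)
    if record is None:
        return species  # nothing known, keep the minimal object
    return replace(
        species,
        name=record.get("label"),
        synonym=record.get("synonym"),
    )


# Stage 1: what a tabby convention could produce without any lookup.
minimal = Species("http://purl.obolibrary.org/obo/NCBITaxon_9606")
# Stage 2: enrichment during preparation for the catalog.
print(enrich(minimal, FAKE_LOOKUP))
```

The minimal object is valid on its own, so the schema can describe the middle of the process, while the catalog step adds the display fields.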

mslw commented Dec 14, 2023

A snapshot from a whiteboard discussion, to refresh our memories:

[Image: photo of the whiteboard, PXL_20231213_143224428]
