Update and improve vessel DOSDP and crosswalk robot template (#17)
* fix reference

* add part_of to vessel dosdp

* fix empty body part in crosswalk

* update data for dosdp and robot template

* add term id and label to missing BodySubPart

* remove double :: in FMA terms coming from source in crosswalk table

* update crosswalk robot template

* update components and patterns

* fix warning on markdown could not find link when use word in between `[ ]` in readme

* add changes in dosdp to readme

* refine text and update TODOs

---------

Co-authored-by: Anita Caron <anitacaron@users.noreply.github.com>
Anita Caron and anitacaron authored Jun 4, 2024
1 parent 097d4fa commit 38e0801
Showing 10 changed files with 6,197 additions and 4,246 deletions.
45 changes: 29 additions & 16 deletions README.md
@@ -5,22 +5,35 @@

## Pipeline workflow

This workflow is a working in process.

1. Use a [DOSDP](src/patterns/dosdp-patterns/vessel.yaml) to define all vessels available in the [datasource](https://github.com/hubmapconsortium/hra-vccf/blob/main/Vessel.csv). A python script is used to generate the [data](src/patterns/data/default/vessel.tsv) for the DOSDP.
- [label, human_label] The colum `VesselBaseName` is used for the VCCF term label and for `oio:obo_foundry_uniquename` annotation, adding "(Human)". We use this column as label because it's the vessel name without the “#N” at the end. This applies to vessels with more than one `BranchesFrom`. However, in cases where there is a specific number of vessels in the body, the `VesselBaseName` includes a number for each different vessel. In UBERON, these cases should be added only one term. **TODO: Remove vessels with numbers**.
- [parent] The column `VesselTypeID` is used for the vessel classification. Possible values are, as UBERON term, heart chamber, artery, arteriole, capillary, venule, vein, or sinus. In the cases there isn't a matching UBERON term for the vessel, defined in the column `UBERON`. In other words, when the vessel exists in FMA or no matching term is available. **TODO: add `VesselTypeID` for all cases**.
- When `VesselTypeID` is `heart chamber` the values in `BranchesFrom` and `Vessel` are the same, which in fact are not vessels. e should not add the `BranchesFrom` relationship because it's the same as in `Vessel` which is not true. It should include only the `VesselTypeID` as parent and link to the matching UBERON term. **TODO: Remove relationship in these cases**. **TODO: Remove heart chamber type terms from data related to vessel DP?**.
- [location] The column `BodyPart` is used for a simplified definition generation. In some cases, this can be redundant with data available in the [crosswalk table](https://github.com/hubmapconsortium/hra-vccf/blob/main/VesselOrganCrosswalk.csv) which defines specific relationship between the vessel and the tissue. **TODO: Improve the use of this column in the DOSDP**.
- [xrefs] The columns `ReferenceURL` and `ReferenceDOI` are used as xref for the definition in the pattern. Some values in `ReferenceURL` are the URL for the UBERON matching term. These cases the reference is empty. When the `ReferenceURL` is a PMC publication url, it's transformed to PMID. **TODO: Remove DOI if `ReferenceURL` is from PMC publication**. **TODO: Search for DOI in radiopaedia cases**
- [synonym, synonym_xref] The column `FMALabel` and `FMA` are used as exact synonym and synonym xref, respectively, when available.
- [taxon] The taxon NCBITaxon:9606 (Homo Sapiens) is asserted in all VCCF terms as `present in taxon` relationship.
1. [Robot template](src/templates/vessel_relation.tsv) to create relationship between vessels defined in the [datasource](https://github.com/hubmapconsortium/hra-vccf/blob/main/Vessel.csv).
This workflow is a work in progress.

1. Use a [DOSDP](src/patterns/dosdp-patterns/vessel.yaml) to define all vessels available in the [datasource](https://github.com/hubmapconsortium/hra-vccf/blob/main/Vessel.csv). A Python script generates the [data](src/patterns/data/default/vessel.tsv) for the DOSDP.
- **label, human_label** The column `VesselBaseName` is used for the VCCF term label and for the `oio:obo_foundry_uniquename` annotation, adding "(Human)". We use this column as a label because it's the vessel name without the “#N” at the end. This applies to vessels with more than one `BranchesFrom`. However, in cases with a specific number of vessels in the body, the `VesselBaseName` includes a number for each vessel. In UBERON, each of these cases should be added as a single term.
- **TODO: Remove vessels with numbers**.
- **parent** The column `VesselTypeID` is used for the vessel classification. Possible values are, as UBERON terms, heart chamber, artery, arteriole, capillary, venule, vein, or sinus.
- **TODO: Add more specific classifications when possible.**
- When `VesselTypeID` is `heart chamber`, the values in `BranchesFrom` and `Vessel` are the same, and these entries are not vessels. The `BranchesFrom` relationship should not be added here because it only repeats the value in `Vessel`, so the resulting relationship would be false. The term should include only the `VesselTypeID` as a parent and link to the matching UBERON term.
- **TODO: Remove relationship in these cases**
- **TODO: Remove heart chamber type terms from data related to vessel DP?**.
- **location** The column `BodyPartID` is used as the logical axiom `part of` to identify the part of the body where the vessel is located.
- **location_label** The column `BodyPart` generates a simplified definition. In some cases, this can be redundant with data available in the [crosswalk table](https://github.com/hubmapconsortium/hra-vccf/blob/main/VesselOrganCrosswalk.csv), which defines the specific relationship between the vessel and the tissue.
- **TODO: Improve the use of this column in the DOSDP**.
- **xrefs** The columns `ReferenceURL` and `ReferenceDOI` are used as xrefs for the definition in the pattern. Some values in `ReferenceURL` are the URLs for the matching UBERON term. In these cases, the reference is empty. When the `ReferenceURL` is a PMC publication URL, it's converted to a PMID.
- **TODO: Remove DOI if `ReferenceURL` is from PMC publication**.
- **TODO: Search for DOI in radiopaedia cases**
- **fma_xref** When available, the column `FMA` is used as `oboInOwl:hasDbXref` for the term.
- **synonym, synonym_xref** The columns `FMALabel` and `FMA` are used as exact synonym and synonym xref, respectively, when the `label` is different from the `FMALabel`. The synonym is added in lowercase.
- **taxon** The taxon NCBITaxon:9606 (Homo sapiens) is asserted in all VCCF terms as a `present in taxon` relationship.
2. [Robot template](src/templates/vessel_relation.tsv) to create relationships between vessels defined in the [datasource](https://github.com/hubmapconsortium/hra-vccf/blob/main/Vessel.csv).
- This uses the column `BranchesFrom`, which means the “parent” vessel that is one step closer to the heart. For the ontology, we use `connecting branch of`.
- Following data source documentation, for veins it is `drains to` rather than `connecting branch of`. **TODO: Define relationship by `VesselType`**.
1. [Robot template](src/templates/vessel_organ_crosswalk.tsv) to create relation between vessel and tissue. The data source is the [crosswalk table](https://github.com/hubmapconsortium/hra-vccf/blob/main/VesselOrganCrosswalk.csv).
- [Vessel] The column `Vessel` is the VCCF terms created in the DOSDP. The search is done by the label.
- [relationships] For each relationship in the `Relationship` column, is added in the template. **TODO: When there isn't a matching UBERON term in the column `BodySubPartID`, use `BodyPartID` which is the organ. However, we need to discuss what to do in cases `BodyPart` is angiosome**.
- Following data source documentation, for veins, it is `drains to` rather than `connecting branch of`.
- **TODO: Define relationship by `VesselType`**.
3. [Robot template](src/templates/vessel_organ_crosswalk.tsv) to create relations between vessels and tissues. The data source is the [crosswalk table](https://github.com/hubmapconsortium/hra-vccf/blob/main/VesselOrganCrosswalk.csv).
- **Vessel** The column `Vessel` contains the VCCF terms created in the DOSDP. The terms are looked up by label.
- **Relationships** Each relationship in the `Relationship` column is added to the template.
- When there isn't a matching UBERON term in the column `BodySubPartID`, a VCCF ID is created and added to the template together with the label from `BodySubPart`. The new VCCF ID is then used for the relationship. This process creates 76 terms.
- **TODO: We still need to discuss what to do in cases where `BodyPart` is angiosome**.
- **TODO: Double-check that there aren't any unmapped UBERON terms in `BodySubPart`.**
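
The column mapping in step 1 and the relation choice in step 2 can be sketched roughly as below. This is only an illustration, not the repository's actual script: the function names `build_dosdp_row` and `branch_relation` are hypothetical, the output keys are assumed to match the bold variable names listed above, the "label (Human)" spacing is a guess, the case-insensitive label comparison is an assumption, and treating only the value `vein` in `VesselType` as a vein is also an assumption (the README leaves this as a TODO). The xref handling is left out because converting a PMC URL to a PMID needs an external lookup.

```python
def build_dosdp_row(row: dict) -> dict:
    """Map one Vessel.csv row to one DOSDP data row (hypothetical helper)."""
    label = row["VesselBaseName"].strip()
    fma_id = row.get("FMA", "").strip()
    fma_label = row.get("FMALabel", "").strip()

    out = {
        "label": label,
        # Assumed format: the README only says "(Human)" is added.
        "human_label": f"{label} (Human)",
        "parent": row["VesselTypeID"],
        "location": row["BodyPartID"],       # used for the `part of` axiom
        "location_label": row["BodyPart"],   # used in the generated definition
        "fma_xref": fma_id,
        "synonym": "",
        "synonym_xref": "",
        "taxon": "NCBITaxon:9606",
    }

    # Add the FMA label as a lowercase exact synonym only when it differs
    # from the label (compared case-insensitively, which is an assumption).
    if fma_label and fma_label.lower() != label.lower():
        out["synonym"] = fma_label.lower()
        out["synonym_xref"] = fma_id
    return out


def branch_relation(vessel_type: str) -> str:
    """Pick the relation for `BranchesFrom` (assumes `VesselType` holds labels)."""
    if vessel_type.strip().lower() == "vein":
        return "drains to"
    return "connecting branch of"
```

With a made-up row such as `{"VesselBaseName": "example artery", "FMALabel": "Example Arterial Trunk", ...}`, the sketch emits the lowercase FMA label as a synonym, and `branch_relation("vein")` selects `drains to`.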

## Description of columns in [datasource](https://github.com/hubmapconsortium/hra-vccf/blob/main/Vessel.csv) [(source)](https://www.nature.com/articles/s41597-023-02018-0#Sec7)

@@ -38,7 +51,7 @@ This workflow is a working in process.

**UBERON**: The ID of the vessel in the UBERON ontology.

[Not directly used; UBERON import via ODK] **UBERONLabel**: The main label of the vessel in UBERON (imported from UBERON).
(Not directly used; UBERON import via ODK) **UBERONLabel**: The main label of the vessel in UBERON (imported from UBERON).

**FMA**: The ID of the vessel in the FMA ontology.
