Skip to content

HACK FOR NF AND HELP END NF & OTHER RARE DISEASES FOR PATIENTS ALL OVER THE WORLD

License

Notifications You must be signed in to change notification settings

KrishnaTO/NF_hackathon2020

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NF_hackathon2020

COLLABORATIVE ONTOLOGY FOR NEUROFIBROMATOSIS

This project created a R-package essential to start compiling & assembling the varied data published through ontologies.

Specifically, the functions retrieve ontology data, based on a term, from BioOntology Repository, and parse it to get a combined ontology of the term. The reason for using BioOntology is mentioned in the 'Presentation Script' (unfortunately, time to record a presentation was not found as there weren't additional members, likely due to the seemingly complicated methods for handling Ontologies and inexperienced former opinion of the trivialities of making an Ontology).

However, this package is ready to be published as an ongoing project to connect to the base of a domain's knowledge base (Ontologies) via published data.

The R package can be found here:

https://github.com/KrishnaTO/OntologyTermAggregator

The demo is available here: https://github.com/KrishnaTO/NF_hackathon2020/blob/main/Demo/SearchviaCombinedOntology.Rmd

The reason Docker was not used was that this package & demo requires a user-specific API key. Although there probably is a method to adding the API key within the docker file, it still would not impact the reproducibility, as all of the data is hosted

The input used, Searchtoclasses_json.RData, is just a pre-downloaded convinience data object to be called for efficiency purposes from within the script. It will not be useful for any other term substitution other than the default within the demo.

The outputs are combined_output.csv, which hold the unique IDs, Descriptions, IRI links and other variables from the script, and the latter combined_output.complex.RData which holds the properties and metalinks for the term sources. At this time, the data can be followed by their variable names within the R global environment, and further development will follow due to time constraints.

To reproduce these results, please clone this repo, and run the Demo Rmd file. To fully use the functionality, you will have to obtain a BioOntology API Key (freely available via signup/account settings). If prefer not to, the input data will allow you to continue running the script from line 61.

Conclusion/Discussion:

Future considerations and TODOs for this package were originally outlined in the Presentation Script and quoted here:

Some additional potentials include exporting in Protege readable RDF format, which will allow researchers to visualize and share the term graphs relatively more easily.

Another is to rank terms by various factors to add some simplications before domain experts are required to curate. R can also be used to directly output the associations directly, instead of relying on Protege. These functions can also be extended to expand the trees with parents / children fitting of this combined term. Lastly, the overwhelming opportunity of applying Natural Language Processing to simplify the aggregated axioms, just as much as you can use NLP to process written articles to build upon term usage.

Future collaborators would ideally include the skills I'm sorely lacking, especially being able to run the OWL API, compiled from java. As well, collaborators who may have NLP knowledge to automate cleaning up the rampant minor modifications present throughout the results would be of great help. Lastly, anyone looking to decrypt RDF/XML syntax data for building knowledge graphs would be welcome!

About

HACK FOR NF AND HELP END NF & OTHER RARE DISEASES FOR PATIENTS ALL OVER THE WORLD

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published