bio-ontology-research-group / hpi-predict Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

The scripts to reproduce the analysis of the HPI prediction paper

GPL-3.0 license

0 stars 0 forks Branches Tags Activity

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
LICENSE		LICENSE
README.md		README.md
hyperopt.ipynb		hyperopt.ipynb
plots.ipynb		plots.ipynb
predictions_bacteria_intersection.tsv		predictions_bacteria_intersection.tsv
predictions_virus_intersection.tsv		predictions_virus_intersection.tsv
preprocessing_HPIDB.ipynb		preprocessing_HPIDB.ipynb
preprocessing_pathogen.ipynb		preprocessing_pathogen.ipynb
preprocessing_protein.ipynb		preprocessing_protein.ipynb
train&predict.ipynb		train&predict.ipynb
train.ipynb		train.ipynb

Repository files navigation

Prediction of Host Pathogen Interactions

The scripts to reproduce the analysis of the HPI prediction paper

Notebooks

All notebooks should be run with Python3.

1. Preprocessing

preprocessing_pathogen.ipynb: From PathoPhenoDB, generates the input file for OPA2Vec.
preprocessing_protein.ipynb: From HPO, MGI and GO, generates the input file for OPA2Vec (example is given for the intersection set of HP, MP and GO).
preprocessing_HPIDB.ipynb: From HPIDB, generates hpidb2.score.txt by converting entrez IDs to UniProt IDs hyperopt.

2. Hyperparameter optimization using hyperas

hyperopt.ipynb: Runs the hyperas library to tune hyperparameters.

3. Training, evaluation and prediction

train.ipynb: Trains the prediction model by leave-one-taxon-out cross-validation and computing the performance. The input files need to be changed for each set of features.
train&predict.ipynb: Trains and saves the model predictions to predictions.tsv.

4. Plotting the figures in the paper

plots.ipynb: generates the TPR plots and the AUC-Epoch figures.

Prediction results

predictions_bacteria_intersection.tsv: predicted protein partners for 29 bacteria taxa based on the intersection feature for proteins and the taxonomic propogation
predictions_virus_intersection.tsv: predicted protein partners for 96 virus taxa based on the intersection feature for proteins and the taxonomic propogation

About

The scripts to reproduce the analysis of the HPI prediction paper

machine-learning pathogen ontology predictive-analytics infectious-diseases host-pathogen

GPL-3.0 license

Custom properties

Report repository

Releases

No releases published

Packages

No packages published

Languages

Jupyter Notebook 100.0%