Fixed links in the README and produced a Python-specific README_PYPI for Pypi
annazhukova committed Jun 15, 2022
1 parent c1bfac2 commit cfd3981
Showing 4 changed files with 192 additions and 27 deletions.
3 changes: 2 additions & 1 deletion Manifest.in
@@ -3,4 +3,5 @@ include phylodeep/pca_a_priori/scalers/*
include phylodeep/pca_a_priori/simulated_data/*
include phylodeep/pretrained_models/models/*
include phylodeep/pretrained_models/scalers/*
include phylodeep/pretrained_models/weights/*
include phylodeep/pretrained_models/weights/*
include README_PYPI.md
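
Because `setup.py` in this commit reads `README_PYPI.md` at build time, the file has to end up in the source distribution, which is what the new `include` line is for. A minimal sketch of how one might verify that on a built archive; `sdist_contains` is a hypothetical helper, not part of the commit, and it assumes a gzip-compressed tar sdist:

```python
import tarfile


def sdist_contains(sdist_path, filename):
    """Return True if any member of the .tar.gz archive has the given base name."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        # Member names look like "<package>-<version>/<relative path>".
        return any(name.split("/")[-1] == filename for name in archive.getnames())
```

After building the sdist, a call such as `sdist_contains("dist/phylodeep-0.3.tar.gz", "README_PYPI.md")` should return `True`; the archive path here is an assumption about the local build output.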
43 changes: 22 additions & 21 deletions README.md
@@ -2,23 +2,25 @@

PhyloDeep is a Python library for parameter estimation and model selection from phylogenetic trees, based on deep learning.

For more information on the method, including the covered parameter subspace, please refer to the preprint here: [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.11.435006v1)
For more information on the method, including the covered parameter subspace, please refer to the preprint here: [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.11.435006v3).

Together with the phylodeep package code (in the folder phylodeep), we provide:
- all data shown in the preprint (in the folder data_publication)
- used simulators to train deep learners (in the folder simulators)
- the tree analyzed in the study as a showcase application (in the folder test_tree_HIV_Zurich, for description and
original reference see below)
- all data shown in the preprint (in the folder [data_publication](data_publication))
- used simulators to train deep learners (in the folder [simulators](simulators))
- the tree analyzed in the study as a showcase application (in the folder [test_tree_HIV_Zurich](test_tree_HIV_Zurich),
for description and original reference see below).

The data are extensive (including 50,000 testing trees), so we do not recommend copying the whole repository.

The installation time of the package can be up to several minutes, including downloading dependencies. The run time
should be a couple of seconds. The package was tested in Linux (Ubuntu 18.08), Windows 10 and MacOS.

## Installation

PhyloDeep is available for Python 3.6 on [pip](https://pypi.org/project/phylodeep).
PhyloDeep is available for Python 3.6 on [pip](https://pypi.org/project/phylodeep) (see the installation instructions below).

Alternatively, it can be used via Docker or Singularity: [evolbioinfo/phylodeep](https://hub.docker.com/r/evolbioinfo/phylodeep/tags).

## Installation

### Windows
For **Windows** users, we recommend installing __phylodeep__ via [Cygwin environment](https://www.cygwin.com/).
@@ -65,18 +67,19 @@ If you installed __phylodeep__ with conda, do not forget to activate the corresp
conda activate phyloenv
```


We recommend performing the a priori model adequacy check first, to assess whether the input data resemble the
simulations on which the neural networks were trained.

Here, we use an HIV tree reconstructed from 200 sequences, published in Phylodynamics on local sexual contact networks
by Rasmussen et al in PloS Computational Biology in 2017, and that you can find at [github](./test_tree_HIV_Zurich/Zurich.trees)
and in this [repository](https://github.com/evolbioinfo/phylodeep/blob/main/test_tree_HIV_Zurich/Zurich.trees).
### Example data
Here, we use an HIV tree reconstructed from 200 sequences, published in "Phylodynamics on local sexual contact networks"
by Rasmussen _et al._ [[PLoS Comput. Biol. 2017]](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005448),
which you can find at [PairTree GitHub](https://github.com/davidrasm/PairTree)
and in [test_tree_HIV_Zurich/Zurich.trees](test_tree_HIV_Zurich/Zurich.trees).

### Python

```python
from phylodeep import BD, BDEI, BDSS, SUMSTATS, FULL
from phylodeep import BD, BDEI, BDSS, FULL
from phylodeep.checkdeep import checkdeep
from phylodeep.modeldeep import modeldeep
from phylodeep.paramdeep import paramdeep
@@ -107,7 +110,7 @@ param_BDSS = paramdeep(path_to_tree, sampling_proba, model=BDSS, vector_represen

### Command line

```python
```bash

# we use here a tree of 200 tips

@@ -124,20 +127,18 @@ paramdeep -t ./Zurich.trees -p 0.25 -m BDSS -v CNN_FULL_TREE -o HIV_Zurich_BDSS_
paramdeep -t ./Zurich.trees -p 0.25 -m BDSS -v FFNN_SUMSTATS -o HIV_Zurich_BDSS_FFNN_CI.csv -c
```

### Example of output and interpretations
Here, we use an HIV tree reconstructed from 200 sequences, published in Phylodynamics on local sexual contact networks
by Rasmussen et al in PloS Computational Biology in 2017, and that you can find at [github](./test_tree_HIV_Zurich/Zurich.trees)
### Example of output and interpretations

The a priori model adequacy check results in the following figures:

#### BD model adequacy test
![BD model adequacy](./phylodeep/test/BD_model_adequacy.png)
![BD model adequacy](phylodeep/test/BD_model_adequacy.png)

#### BDEI model adequacy test
![BDEI model adequacy](./phylodeep/test/BDEI_model_adequacy.png)
![BDEI model adequacy](phylodeep/test/BDEI_model_adequacy.png)

#### BDSS model adequacy test
![BDSS model adequacy](./phylodeep/test/BDSS_model_adequacy.png)
![BDSS model adequacy](phylodeep/test/BDSS_model_adequacy.png)

For the three models (BD, BDEI and BDSS), the HIV tree data point (represented by a red star) lies well inside the data cloud
of simulations, where warm colors correspond to a high density of simulations. The simulations and HIV tree data point were
@@ -170,5 +171,5 @@ due to internal rescaling of all input trees. It should apply to any tree.

## Preprint

Voznica J, Zhukova A, Boskova V, Saulnier E, Lemoine F, Moslonka-Lefebvre M, Gascuel O (2021)
__Deep learning from phylogenies to uncover the transmission dynamics of epidemics__. [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.11.435006v1)
Voznica J, Zhukova A, Boskova V, Saulnier E, Lemoine F, Moslonka-Lefebvre M, Gascuel O (2022)
__Deep learning from phylogenies to uncover the transmission dynamics of epidemics__. [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.11.435006v3)
163 changes: 163 additions & 0 deletions README_PYPI.md
@@ -0,0 +1,163 @@
# PhyloDeep

PhyloDeep is a Python library for parameter estimation and model selection from phylogenetic trees, based on deep learning.

For more information on the method, including the covered parameter subspace, please refer to the preprint here: [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.11.435006v3).

The installation time of the package can be up to several minutes, including downloading dependencies. The run time
should be a couple of seconds. The package was tested in Linux (Ubuntu 18.08), Windows 10 and MacOS.

## Installation

### Windows
For **Windows** users, we recommend installing __phylodeep__ via [Cygwin environment](https://www.cygwin.com/).
First install Python 3.6 and pip3 from the Cygwin packages. Then install __phylodeep__:
```bash
pip3 install phylodeep
```

### All other platforms

You can install __phylodeep__ for Python 3.6 with or without [conda](https://conda.io/docs/), following the procedures described below:

#### Installing with conda

Once you have conda installed, create an environment for __phylodeep__ with Python 3.6 (here we name it phyloenv):

```bash
conda create --name phyloenv python=3.6
```

Then activate it:
```bash
conda activate phyloenv
```

Then install __phylodeep__ in it:

```bash
pip install phylodeep
```

#### Installing without conda

Make sure that Python 3.6 and pip3 are installed, then install __phylodeep__:

```bash
pip3 install phylodeep
```

## Usage

If you installed __phylodeep__ with conda, do not forget to activate the corresponding environment (e.g. phyloenv) before using PhyloDeep:
```bash
conda activate phyloenv
```

We recommend performing the a priori model adequacy check first, to assess whether the input data resemble the
simulations on which the neural networks were trained.

### Example data

Here, we use an HIV tree reconstructed from 200 sequences, published in "Phylodynamics on local sexual contact networks"
by Rasmussen _et al._ [[PLoS Comput. Biol. 2017]](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005448),
which you can find at [PairTree GitHub](https://github.com/davidrasm/PairTree)
and in [test_tree_HIV_Zurich/Zurich.trees](https://github.com/evolbioinfo/phylodeep/blob/main/test_tree_HIV_Zurich/Zurich.trees).

### Python

```python
from phylodeep import BD, BDEI, BDSS, FULL
from phylodeep.checkdeep import checkdeep
from phylodeep.modeldeep import modeldeep
from phylodeep.paramdeep import paramdeep


path_to_tree = './Zurich.trees'

# set presumed sampling probability
sampling_proba = 0.25

# a priori check for models BD, BDEI, BDSS
checkdeep(path_to_tree, model=BD, outputfile_png='BD_a_priori_check.png')
checkdeep(path_to_tree, model=BDEI, outputfile_png='BDEI_a_priori_check.png')
checkdeep(path_to_tree, model=BDSS, outputfile_png='BDSS_a_priori_check.png')


# model selection
model_BDEI_vs_BD_vs_BDSS = modeldeep(path_to_tree, sampling_proba, vector_representation=FULL)

# the selected model is BDSS

# parameter inference
param_BDSS = paramdeep(path_to_tree, sampling_proba, model=BDSS, vector_representation=FULL,
ci_computation=True)

# for the interpretation of results, please see below
```

### Command line

```bash

# we use here a tree of 200 tips

# a priori model adequacy check: highly recommended
checkdeep -t ./Zurich.trees -m BD -o BD_model_adequacy.png
checkdeep -t ./Zurich.trees -m BDEI -o BDEI_model_adequacy.png
checkdeep -t ./Zurich.trees -m BDSS -o BDSS_model_adequacy.png

# model selection
modeldeep -t ./Zurich.trees -p 0.25 -v CNN_FULL_TREE -o model_selection.csv

# parameter inference
paramdeep -t ./Zurich.trees -p 0.25 -m BDSS -v CNN_FULL_TREE -o HIV_Zurich_BDSS_CNN.csv
paramdeep -t ./Zurich.trees -p 0.25 -m BDSS -v FFNN_SUMSTATS -o HIV_Zurich_BDSS_FFNN_CI.csv -c
```

### Example of output and interpretations

The a priori model adequacy check results in the following figures:

#### BD model adequacy test
![BD model adequacy](https://raw.githubusercontent.com/evolbioinfo/phylodeep/main/phylodeep/test/BD_model_adequacy.png)

#### BDEI model adequacy test
![BDEI model adequacy](https://raw.githubusercontent.com/evolbioinfo/phylodeep/main/phylodeep/test/BDEI_model_adequacy.png)

#### BDSS model adequacy test
![BDSS model adequacy](https://raw.githubusercontent.com/evolbioinfo/phylodeep/main/phylodeep/test/BDSS_model_adequacy.png)

For the three models (BD, BDEI and BDSS), the HIV tree data point (represented by a red star) lies well inside the data cloud
of simulations, where warm colors correspond to a high density of simulations. The simulations and the HIV tree data point were
in the form of summary statistics prior to applying PCA. All three models thus pass the model adequacy check.

We then apply model selection using the full tree representation and obtain the following result:

| Model | Probability BDEI | Probability BD | Probability BDSS |
| -------- | ------------- | ------------- | ------------- |
| __Predicted probability__ | 0.00 | 0.00 | 1.00 |

The BDSS probability is by far the highest, so the BDSS model is confidently selected.
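
The selection rule described here amounts to taking the model with the highest predicted probability. A minimal sketch using the values from the table above; the dictionary layout and the `select_model` helper are illustrative assumptions and do not reflect the actual return type of `modeldeep`:

```python
# Probabilities copied from the model selection table above.
model_probabilities = {"BDEI": 0.00, "BD": 0.00, "BDSS": 1.00}


def select_model(probabilities):
    """Return the (model name, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])


selected_model, selected_probability = select_model(model_probabilities)
print(selected_model, selected_probability)  # prints: BDSS 1.0
```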

Finally, under the selected model BDSS, we predict parameter values together with 95% CIs:

| | R naught | Infectious period | X transmission | Superspreading fraction |
| ------------- | ------------- | ------------- | ------------- | ------- |
| __predicted value__ | 1.69 | 9.78 | 9.34 | 0.079 |
| __CI 2.5%__ | 1.40 | 8.12 | 6.65 | 0.050 |
| __CI 97.5%__ | 2.08 | 12.26 | 10 | 0.133 |

The point estimates for the parameters that are not time-related (R naught, X transmission and superspreading fraction) are
well inside the parameter ranges of the simulations and thus seem valid (R naught between 1 and 5, X transmission between 3
and 10, superspreading fraction between 0.05 and 0.20).
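
The range check described in this paragraph can be sketched in a few lines. The ranges and point estimates below are copied from the text and the table above; the `out_of_range` helper and the dictionary keys are hypothetical names chosen for illustration, not part of the phylodeep API:

```python
# Simulation ranges quoted in the paragraph above (BDSS model).
SIMULATION_RANGES = {
    "R naught": (1.0, 5.0),
    "X transmission": (3.0, 10.0),
    "Superspreading fraction": (0.05, 0.20),
}

# Point estimates copied from the results table.
point_estimates = {
    "R naught": 1.69,
    "X transmission": 9.34,
    "Superspreading fraction": 0.079,
}


def out_of_range(estimates, ranges):
    """Return the names of parameters whose estimate falls outside its range."""
    return [
        name
        for name, value in estimates.items()
        if not ranges[name][0] <= value <= ranges[name][1]
    ]


print(out_of_range(point_estimates, SIMULATION_RANGES))  # prints: []
```

An empty list means every non-time-related estimate lies within the simulated ranges, which matches the conclusion stated above.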


The time-related parameters (the infectious period and, for the BDEI model, the incubation period) are in the same units as the
branch lengths of the input tree, here in years (9.78 years). The covered parameter space for time-related parameters is large
due to internal rescaling of all input trees. It should apply to any tree.

## Preprint

Voznica J, Zhukova A, Boskova V, Saulnier E, Lemoine F, Moslonka-Lefebvre M, Gascuel O (2022)
__Deep learning from phylogenies to uncover the transmission dynamics of epidemics__. [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.11.435006v3)
10 changes: 5 additions & 5 deletions setup.py
@@ -11,11 +11,11 @@
os.path.join('pretrained_models', 'models', '*.json'),
os.path.join('pretrained_models', 'scalers', '*.pkl'),
os.path.join('pretrained_models', 'weights', '*.h5'),
'README.md']},
long_description=open('README.md').read(),
'README_PYPI.md']},
long_description=open('README_PYPI.md').read(),
long_description_content_type='text/markdown',
classifiers=[
'Development Status :: 2 - Pre-Alpha',
'Development Status :: 4 - Beta',
'Environment :: Console',
'Intended Audience :: Science/Research',
'Topic :: Scientific/Engineering :: Artificial Intelligence',
@@ -24,13 +24,13 @@
'Topic :: Software Development :: Libraries :: Python Modules',
'Programming Language :: Python :: 3'
],
version='0.2.61',
version='0.3',
description='Phylodynamic paramater and model inference using pretrained deep neural networks.',
author='Jakub Voznica',
author_email='jakub.voznica@pasteur.fr',
maintainer='Anna Zhukova',
maintainer_email='anna.zhukova@pasteur.fr',
url='https://github.com/evolbioinfo/pastml',
url='https://github.com/evolbioinfo/phylodeep',
keywords=['phylodynamics', 'molecular epidemiology', 'phylogeny', 'model selection',
'paramdeep', 'phylodeep', 'deep learning', 'convolutional networks'],
install_requires=['ete3', 'pandas', 'numpy', 'scipy==1.1.0', 'scikit-learn==0.19.1',
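
The `long_description=open('README_PYPI.md').read()` line in this hunk relies on the platform's default text encoding and on `setup.py` being run from the repository root. A minimal sketch of a more defensive variant; `read_long_description` is a hypothetical helper and not part of the commit:

```python
from pathlib import Path


def read_long_description(readme_name="README_PYPI.md", base_dir=None):
    """Read the README as UTF-8 text, resolved relative to base_dir (default: cwd)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # read_text opens and closes the file itself, so no handle is left open.
    return (base / readme_name).read_text(encoding="utf-8")
```

In `setup.py`, one could pass `base_dir=Path(__file__).parent` so that the README is found regardless of the current working directory.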