Skip to content

Estimating Phylogenetic trees using six microorganisms 16S rRNA gene with Unsupervised Learning, web based tools and Molecular Evolutionary Genetics Analysis MEGA7

License

Notifications You must be signed in to change notification settings

Shuyib/Phylogenetic-tree-study

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Phylogenetic tree study Continuous Integration/Continuos Delivery Binder

Phylogenetic-tree-study

Estimating Phylogenetic trees using 30 microorganisms (previously 6 organisms: review data folder and notebook. Looking at the 16S rRNA gene with Unsupervised Learning, web based tools and Molecular Evolutionary Genetics Analysis MEGA7. Further we are looking at motifs and finding out what they do.

It is important to know these regions since they can potentially give use clues about the regions we can target for targeted DNA therapies.

How to run in a virtual environment

Make the virtual environment. When working in your own system

python3 -m venv phylo-env   

Activate the virtual environment.

source phylo-env/bin/activate   

Install packages. You need an email to be in your .bashrc file to run biopython.

make install run_script

In your terminal, in the directory where you cloned this repository. Run this command to run notebooks.

jupyter notebook Phylogenetic_trees_unsupervised_learning.ipynb

Previously, we've not provided a codebook/data description file since one of the headings cover that in the notebook. Otherwise, you can check out the notebook or the HTML file i've provided in the repository.

How to run project in Docker

Build DockerFile

sudo docker build -t phylo-exp .

Run the Docker image

sudo docker run -it -p 8888:8888 --rm phylo-exp:latest

Initialize dvc to the folder. To allow us to use dvc functionality to be used in the repo. NB. files will be created in the directory.

dvc init

Track changes to the different data files. The reason why we are doing this is because these files will change during the experiment especially if the investigator wants to try other experiments with more data.

dvc add updated_data/data.md
dvc add updated_data/sequences.fasta
dvc add updated_data/sequence_metrics.csv

Commit changes to save the changes that have occured in the repository.

dvc commit