This repository contains the scripts (as Jupyter notebooks) used to generate the figures for the analysis of Saccharopolyspora genomes in Nuhamunada, M., O.S. Mohite et al. (2023) using BGCFlow.
git clone https://github.com/matinnuhamunada/saccharopolyspora_manuscript.git
# create and activate new conda environment
conda create -n bgcflow pip -y
conda activate bgcflow
# install BGCFlow wrapper
pip install git+https://github.com/NBChub/bgcflow_wrapper.git
# clone BGCFlow to "bgcflow" folder
bgcflow clone bgcflow
- Download the dataset containing the BGCFlow runs from Zenodo
# move to bgcflow dir
cd bgcflow
# download and extract dataset
wget https://zenodo.org/record/8018055/files/saccharopolyspora_dataset.zip
unzip saccharopolyspora_dataset.zip
# go back to the manuscript dir
cd ../saccharopolyspora_manuscript/
# edit the location of the bgcflow dir to the right directory
nano config.yaml
Install these conda environments:
mamba env create -f python_notebook.yaml
mamba env create -f r_notebook.yaml
mamba env create -f <bgcflow_dir>/workflow/envs/cblaster.yaml
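Before opening the notebooks, it can help to confirm that the environments were actually created. The snippet below is a minimal sketch, not part of the original workflow: `missing_envs` is a hypothetical helper that reads environment names from standard input and prints any required name that is absent. It assumes the YAML files create environments named `python_notebook` and `r_notebook`, as used in the activation commands in this README. The cblaster environment is left out because its name is defined inside `cblaster.yaml`.

```shell
# Hypothetical helper: print each required environment name (given as
# arguments) that does not appear in the list of names read from stdin.
missing_envs() {
    available=$(cat)
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string
        printf '%s\n' "$available" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Only query conda when it is installed; awk keeps the first column of
# every non-comment, non-empty line of `conda env list`.
if command -v conda >/dev/null 2>&1; then
    conda env list | awk '!/^#/ && NF {print $1}' | missing_envs python_notebook r_notebook
fi
```

If the command prints nothing, both notebook environments are present. Any name it prints still needs to be created with the matching `mamba env create` command.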
- There are two kinds of notebooks, R (.R.ipynb) and Python (.python.ipynb)
- Run each notebook using the corresponding conda environment: python_notebook or r_notebook
- Start a Jupyter session
# for python
conda activate python_notebook
jupyter lab
# for R
conda activate r_notebook
jupyter lab
- Run the notebooks in order
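To see which environment each notebook needs, the file suffix convention described above can be turned into a small lookup. This is a sketch under stated assumptions: `env_for_notebook` is a hypothetical helper, the notebooks are assumed to sit in the current directory, and lexical filename order is assumed to match the intended run order, which this README does not confirm.

```shell
# Hypothetical helper: map a notebook filename to the conda environment
# named in this README, based on its suffix.
env_for_notebook() {
    case "$1" in
        *.R.ipynb)      echo "r_notebook" ;;
        *.python.ipynb) echo "python_notebook" ;;
        *)              echo "unknown"; return 1 ;;
    esac
}

# List notebooks in lexical order together with the environment to activate.
for nb in *.ipynb; do
    [ -e "$nb" ] || continue   # skip the literal pattern when nothing matches
    printf '%s\t%s\n' "$nb" "$(env_for_notebook "$nb" || true)"
done
```

Each output line pairs a notebook with `python_notebook` or `r_notebook`. A notebook that follows neither suffix convention is reported as `unknown`.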
Matin Nuhamunada, Omkar S. Mohite, Patrick V. Phaneuf, Bernhard O. Palsson, and Tilmann Weber. (2023). BGCFlow: Systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets. bioRxiv 2023.06.14.545018; doi: https://doi.org/10.1101/2023.06.14.545018
Nuhamunada, Matin, & Mohite, Omkar Satyavan. (2023). BGCFlow Analysis of Saccharopolyspora Genomes (0.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8018055