Skip to content

Jupyter notebook companion for the manuscript "BGCFlow: Systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets."

License

Notifications You must be signed in to change notification settings

NBChub/saccharopolyspora_manuscript

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

47 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

README

This repository contains the scripts (in jupyter notebooks) to generate the figure for the analysis of Saccharopolyspora genomes in Nuhamunada, M., O.S. Mohite et al. (2023) using BGCFlow.

USAGE

1. Clone this repository

git clone https://github.com/matinnuhamunada/saccharopolyspora_manuscript.git 

2. Set up BGCFlow

# create and activate new conda environment
conda create -n bgcflow pip -y
conda activate bgcflow

# install BGCFlow wrapper
pip install git+https://github.com/NBChub/bgcflow_wrapper.git

# clone BGCFlow to "bgcflow" folder
bgcflow clone bgcflow

2. Download the dataset

  • Donwload the dataset containing the BGCFlow runs from Zenodo
# move to bgcflow dir
cd bgcflow

# download and extract dataset
wget https://zenodo.org/record/8018055/files/saccharopolyspora_dataset.zip
unzip saccharopolyspora_dataset.zip

3. Set configurations

# go back to the manuscript dir
cd ../saccharopolyspora_manuscript/

# edit the location of the bgcflow dir to the right directory
nano config.yaml

4. Setting up Conda Environments

Install these conda environments:

mamba env create -f python_notebook.yaml
mamba env create -f r_notebook.yaml
mamba env create -f <bgcflow_dir>/workflow/envs/cblaster.yaml

5. Run the notebooks

  • There are two kind of notebooks, R (.R.ipynb) and python (.python.ipynb)
  • Run the notebook using the corresponding conda environment: python_notebook or r_notebook
  • Start jupyter session
# for python
conda activate python_notebook
jupyter lab
# for R
conda activate r_notebook
jupyter lab
  • Run the notebooks in order

Citation

Matin Nuhamunada, Omkar S. Mohite, Patrick V. Phaneuf, Bernhard O. Palsson, and Tilmann Weber. (2023). BGCFlow: Systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets. bioRxiv 2023.06.14.545018; doi: https://doi.org/10.1101/2023.06.14.545018

Nuhamunada, Matin, & Mohite, Omkar Satyavan. (2023). BGCFlow Analysis of Saccharopolyspora Genomes (0.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8018055

About

Jupyter notebook companion for the manuscript "BGCFlow: Systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets."

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • HTML 97.6%
  • Jupyter Notebook 2.4%