seqSight is a tool to jointly profile microbial strains, genes, and biosynthetic gene clusters from metagenomics data. It is designed to provide maximum utility to the user by incorporating analysis modules that quantify not only bacterial strains but also gene families and biosynthetic gene clusters. seqSight also incorporates quality control modules and visualization tools.
Xinyang Zhang, Tyson Dawson, Keith A. Crandall, Ali Rahnavard (2023+), seqSight: jointly profile microbial strains, genes, and biosynthetic gene clusters from metagenomics data, https://github.com/omicsEye/seqSight
Please check our omicsEye Support Forum for common questions before opening an issue thread there.
Database | Description |
---|---|
MIBiG | MIBiG is a comprehensive database that focuses on natural product BGCs. It focuses on providing curated BGCs with associated metadata, including chemical structures, gene annotations, and experimental data. MIBiG serves as a centralized resource for known BGCs and aims to standardize the annotation and reporting of BGC information. |
antiSMASH | antiSMASH, in addition to being a BGC prediction tool, maintains a database of predicted BGCs. The database includes a broad range of BGCs, covering various secondary metabolite classes. |
ClusterFinder | The ClusterFinder database is a repository of predicted BGCs identified by the ClusterFinder tool. It focuses on BGCs identified in bacterial genomes and provides information on gene clusters, predicted products, and associated metadata. It aims to facilitate the exploration and analysis of BGC diversity in bacteria. |
PRISM | PRISM hosts a database of predicted BGCs identified by the PRISM tool. It includes a collection of BGCs and associated secondary metabolite annotations. The database aims to facilitate the analysis and comparison of BGCs across different genomes. |
BiG-FAM | BiG-FAM maintains a database of biosynthetic gene families identified by the BiG-FAM tool. It provides information on conserved gene families across BGCs and supports comparative analysis and functional characterization. |
- Generality: seqSight uses sequence reads as input with filtering and QC.
- Mapping database:
  - Taxonomic Reference Genomes
  - Biosynthetic gene clusters
- Downstream Analysis:
  - Gene Family Pathway Analysis
- Matplotlib
- Python 3.9
- Numpy 1.9.*
- Pandas (version >= 0.18.1)
- bowtie2 v2.4.4
- biopython 1.79
- Scipy (version >= 0.17.0)
- Cython (version >= 0.29.2)
- Scikit-learn (version >= 0.14.1)
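The requirements listed above can be checked from Python before installing seqSight. The following is a minimal sketch, not part of seqSight itself: the function name `check_requirements` is hypothetical, and the PyPI distribution names are assumed to match the packages in the list. It only reports what is installed; it does not compare version numbers.

```python
import shutil
from importlib import metadata

# PyPI distribution names assumed to correspond to the requirements list.
REQUIRED_PACKAGES = [
    "matplotlib", "numpy", "pandas", "biopython",
    "scipy", "cython", "scikit-learn",
]


def check_requirements(packages=REQUIRED_PACKAGES, executables=("bowtie2",)):
    """Map each requirement to its installed version (or path), or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    for exe in executables:
        # shutil.which returns the full path of the executable, or None.
        report[exe] = shutil.which(exe)
    return report


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {found if found else 'MISSING'}")
```

Any entry reported as `MISSING` would need to be installed into the active environment before running seqSight.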
If you have a working conda on your system, you can safely skip to step three.
- First, install conda:
Go to the Anaconda website and download the latest version for your operating system.
DO NOT FORGET TO ADD CONDA TO your system PATH
- Second, check for conda availability:
Open a terminal (or command line for Windows users) and run:
conda --version
It should output something like this:
conda 4.12.0
If not, you must make conda available to your system before continuing. If you have problems adding conda to PATH, you can find instructions here.
- Third, create a new conda environment (let's call it seqSight_env) with the following command:
conda create --name seqSight_env python=3.9
- Then activate your conda environment:
conda activate seqSight_env
- Finally, install seqSight:
  - You can install it directly from GitHub:
python -m pip install git+https://github.com/omicsEye/seqSight
  - Or clone the seqSight repository, change into the cloned directory, and run:
python -m pip install .
To test if seqSight is installed correctly, you may run the following command in the terminal:
seqSight -h
This prints the seqSight command-line options:
$ seqSight -h
usage: seqSight [-h] [-U MAP_INPUTREAD] [-1 MAP_INPUTREAD1] [-2 MAP_INPUTREAD2] [-targetRefFiles MAP_TARGETREF] [-filterRefFiles MAP_FILTERREF]
[-targetAlignParams MAP_TARGETALIGNPARAMS] [-filterAlignParams MAP_FILTERALIGNPARAMS] [-outDir MAP_OUTDIR] [-outAlign MAP_OUTALIGN] [-indexDir MAP_INDEXDIR]
[-targetIndexPrefixes MAP_TARGETINDEX] [-filterIndexPrefixes MAP_FILTERINDEX] [-targetAlignFiles MAP_TARGETALIGN] [-filterAlignFiles MAP_FILTERALIGN]
options:
-h, --help show this help message and exit
-U MAP_INPUTREAD Input Read Fastq File (Unpaired/Single-end)
-1 MAP_INPUTREAD1 Input Read Fastq File (Pair 1)
-2 MAP_INPUTREAD2 Input Read Fastq File (Pair 2)
-targetRefFiles MAP_TARGETREF
Target Reference Genome Fasta Files Full Path (Comma Separated)
-filterRefFiles MAP_FILTERREF
Filter Reference Genome Fasta Files Full Path (Comma Separated)
-targetAlignParams MAP_TARGETALIGNPARAMS
Target Mapping Bowtie2 Parameters (Default: seqSight chosen best parameters)
-filterAlignParams MAP_FILTERALIGNPARAMS
Filter Mapping Bowtie2 Parameters (Default: Use the same Target Mapping Bowtie2 parameters)
-outDir MAP_OUTDIR Output Directory (Default=. (current directory))
-outAlign MAP_OUTALIGN
Output Alignment File Name (Default=outalign.sam)
-indexDir MAP_INDEXDIR
Index Directory (Default=. (current directory))
-targetIndexPrefixes MAP_TARGETINDEX
Target Index Prefixes (Comma Separated)
-filterIndexPrefixes MAP_FILTERINDEX
Filter Index Prefixes (Comma Separated)
-targetAlignFiles MAP_TARGETALIGN
Target Alignment Files Full Path (Comma Separated)
-filterAlignFiles MAP_FILTERALIGN
Filter Alignment Files Full Path (Comma Separated)
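As an illustration of how the options in the help text above fit together, here is a minimal sketch that assembles a paired-end command line in Python. The helper `build_seqsight_command` and all file names are hypothetical; only the flags themselves come from the help output, and whether this particular combination is the recommended invocation is an assumption.

```python
import shlex


def build_seqsight_command(read1, read2, target_refs, filter_refs=None,
                           out_dir=".", out_align="outalign.sam"):
    """Assemble a seqSight argument list from the flags shown in `seqSight -h`."""
    cmd = ["seqSight", "-1", read1, "-2", read2,
           # Reference FASTA files are passed as one comma-separated value.
           "-targetRefFiles", ",".join(target_refs)]
    if filter_refs:
        cmd += ["-filterRefFiles", ",".join(filter_refs)]
    cmd += ["-outDir", out_dir, "-outAlign", out_align]
    return cmd


if __name__ == "__main__":
    # Hypothetical input paths, for illustration only.
    cmd = build_seqsight_command(
        "sample_R1.fastq", "sample_R2.fastq",
        target_refs=["/data/refs/target_a.fasta", "/data/refs/target_b.fasta"],
        filter_refs=["/data/refs/host.fasta"],
        out_dir="results", out_align="sample.sam",
    )
    # Print the command for inspection; it could then be run with
    # subprocess.run(cmd, check=True) once the input files exist.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.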
The two required input parameters are:
- -i or --input: the input reads
- --output-folder: a folder containing all the output files
A list of all options is provided in the options section.
$ seqSight -h
usage: seqSight [-h]
- Bayesian Reassignment
- Taxonomic profiling
- Visualization
seqSight's repository features utility scripts to help in the manipulation of sample output and its visualization. These scripts can be found under the utils folder in the seqSight directory.
The script merge_seqSight_tables.py merges seqSight output from several samples into one table of bugs (rows) by samples (columns), listing the relative normalized abundance of each bug in each sample.
merge_seqSight_tables.py [path_of_folder_contains_outputs] > output/merged_abundance_table.txt
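To make the shape of the merged table concrete, the sketch below shows one way a bugs-by-samples merge can be done with the Python standard library. It is not the code of merge_seqSight_tables.py: the function names are hypothetical, and the per-sample format (tab-separated `bug<TAB>abundance` lines with no header) is an assumption, since the actual seqSight output layout is not described here. Bugs absent from a sample are filled with 0.0.

```python
import csv
import sys


def read_abundance_table(path):
    """Read one per-sample table (ASSUMED format: bug<TAB>abundance, no header)."""
    abundances = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                abundances[row[0]] = float(row[1])
    return abundances


def merge_tables(per_sample):
    """Merge {sample: {bug: abundance}} into a header and bug-by-sample rows."""
    samples = sorted(per_sample)
    bugs = sorted({bug for table in per_sample.values() for bug in table})
    header = ["bug"] + samples
    rows = [[bug] + [per_sample[s].get(bug, 0.0) for s in samples]
            for bug in bugs]
    return header, rows


def write_merged(header, rows, out=sys.stdout):
    """Write the merged table as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up abundances for two hypothetical samples.
    demo = {
        "sampleA": {"bug_1": 0.7, "bug_2": 0.3},
        "sampleB": {"bug_1": 1.0},
    }
    write_merged(*merge_tables(demo))
```

In practice, `per_sample` would be filled by calling `read_abundance_table` on each file in the output folder, keyed by sample name.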
- Go to seqSight/Notebooks and download FiveTargetNum.tsv, FiveTargetReads.tsv, and stackedplot.ipynb.
- FiveTargetNum.tsv and FiveTargetReads.tsv are two output files generated by seqSight.
- Run the code either on Google Colab or in your local environment.
- The stacked bar plot shows the composition distribution and the corresponding reads.
- The final plot should look like the following: