seqSight

seqSight is a tool to jointly profile microbial strains, genes, and biosynthetic gene clusters from metagenomics data. It is designed to provide maximum utility by incorporating analysis modules that quantify not only bacterial strains but also gene families and biosynthetic gene clusters. seqSight also includes quality control modules and visualization tools.

Citation:

Xinyang Zhang, Tyson Dawson, Keith A. Crandall, Ali Rahnavard (2023+), seqSight: jointly profile microbial strains, genes, and biosynthetic gene clusters from metagenomics data, https://github.com/omicsEye/seqSight

Attention

Please check our omicsEye Support Forum for common questions before opening an issue thread there.

seqSight user manual

Database	Description
`MIBiG`	MIBiG is a comprehensive database of natural product biosynthetic gene clusters (BGCs). It provides curated BGCs with associated metadata, including chemical structures, gene annotations, and experimental data. MIBiG serves as a centralized resource for known BGCs and aims to standardize the annotation and reporting of BGC information.
`antiSMASH`	antiSMASH, in addition to being a BGC prediction tool, maintains a database of predicted BGCs. The database includes a broad range of BGCs, covering various secondary metabolite classes.
`ClusterFinder`	The ClusterFinder database is a repository of predicted BGCs identified by the ClusterFinder tool. It focuses on BGCs identified in bacterial genomes and provides information on gene clusters, predicted products, and associated metadata. It aims to facilitate the exploration and analysis of BGC diversity in bacteria.
`PRISM`	PRISM hosts a database of predicted BGCs identified by the PRISM tool. It includes a collection of BGCs and associated secondary metabolite annotations. The database aims to facilitate the analysis and comparison of BGCs across different genomes.
`BiG-FAM`	BiG-FAM is a database of biosynthetic gene cluster families (GCFs), built by clustering BGCs with BiG-SLiCE. It provides information on families of related BGCs and supports comparative analysis and functional characterization.

Real world examples
- Visualization
Support

Features

Generality: seqSight takes sequencing reads as input and applies filtering and quality control (QC).
Mapping databases:
- Taxonomic Reference Genomes
- Biosynthesis gene clusters
Downstream Analysis:
- Gene Family Pathway Analysis

seqSight

REQUIREMENTS

INSTALLATION

If you already have a working conda installation, you can skip to step three.

First, install conda:
Go to the Anaconda website and download the latest version for your operating system.
Do not forget to add conda to your system PATH.
Second, check that conda is available:
open a terminal (or the command prompt on Windows) and run:

conda --version

it should output something like this:

conda 4.12.0

If not, you must make conda available on your system before continuing. If you have problems adding conda to PATH, you can find instructions here.

Third, create a new conda environment (let's call it seqSight_env) with the following command:

conda create --name seqSight_env python=3.9

Then activate your conda environment:

conda activate seqSight_env

Finally, install seqSight.
You can install it directly from GitHub:

python -m pip install git+https://github.com/omicsEye/seqSight

Or, from a local clone of the seqSight repository, change into the cloned directory and run:

python -m pip install .

Getting Started with seqSight

Test seqSight

To check that seqSight is installed correctly, run the following command in a terminal:

seqSight -h

This prints the seqSight command-line options.
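If you want to script this check (for example at the top of a batch job), the sketch below does the same thing from Python. It is not part of seqSight; it assumes only that the `seqSight` executable is on PATH and that `seqSight -h` exits with status 0, as argparse-based command-line tools do. The function name `seqsight_available` is hypothetical.

```python
import shutil
import subprocess


def seqsight_available(executable: str = "seqSight") -> bool:
    """Return True if `executable` is on PATH and `<executable> -h` exits with status 0."""
    path = shutil.which(executable)
    if path is None:
        return False
    # Run the help command and discard its output; only the exit status matters here.
    result = subprocess.run(
        [path, "-h"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("seqSight OK" if seqsight_available() else "seqSight not found on PATH")
```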

Options

$ seqSight -h
usage: seqSight [-h] [-U MAP_INPUTREAD] [-1 MAP_INPUTREAD1] [-2 MAP_INPUTREAD2] [-targetRefFiles MAP_TARGETREF] [-filterRefFiles MAP_FILTERREF]
                [-targetAlignParams MAP_TARGETALIGNPARAMS] [-filterAlignParams MAP_FILTERALIGNPARAMS] [-outDir MAP_OUTDIR] [-outAlign MAP_OUTALIGN] [-indexDir MAP_INDEXDIR]
                [-targetIndexPrefixes MAP_TARGETINDEX] [-filterIndexPrefixes MAP_FILTERINDEX] [-targetAlignFiles MAP_TARGETALIGN] [-filterAlignFiles MAP_FILTERALIGN]

 
options:
  -h, --help            show this help message and exit
  -U MAP_INPUTREAD      Input Read Fastq File (Unpaired/Single-end)
  -1 MAP_INPUTREAD1     Input Read Fastq File (Pair 1)
  -2 MAP_INPUTREAD2     Input Read Fastq File (Pair 2)
  -targetRefFiles MAP_TARGETREF
                        Target Reference Genome Fasta Files Full Path (Comma Separated)
  -filterRefFiles MAP_FILTERREF
                        Filter Reference Genome Fasta Files Full Path (Comma Separated)
  -targetAlignParams MAP_TARGETALIGNPARAMS
                        Target Mapping Bowtie2 Parameters (Default: seqSight chosen best parameters)
  -filterAlignParams MAP_FILTERALIGNPARAMS
                        Filter Mapping Bowtie2 Parameters (Default: Use the same Target Mapping Bowtie2 parameters)
  -outDir MAP_OUTDIR    Output Directory (Default=. (current directory))
  -outAlign MAP_OUTALIGN
                        Output Alignment File Name (Default=outalign.sam)
  -indexDir MAP_INDEXDIR
                        Index Directory (Default=. (current directory))
  -targetIndexPrefixes MAP_TARGETINDEX
                        Target Index Prefixes (Comma Separated)
  -filterIndexPrefixes MAP_FILTERINDEX
                        Filter Index Prefixes (Comma Separated)
  -targetAlignFiles MAP_TARGETALIGN
                        Target Alignment Files Full Path (Comma Separated)
  -filterAlignFiles MAP_FILTERALIGN
                        Filter Alignment Files Full Path (Comma Separated)
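When calling seqSight from a pipeline, it can help to assemble the argument list programmatically. The sketch below uses only the flags listed above (`-U`, `-1`, `-2`, `-targetRefFiles`, `-filterRefFiles`, `-outDir`, `-outAlign`) and joins multiple reference files with commas, as the help text describes. Which combination of reference or index options seqSight actually requires is not stated here, so treat this as an illustration; the function name and all file paths are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def build_seqsight_command(
    target_refs: Sequence[str],
    reads_single: Optional[str] = None,
    reads_pair: Optional[Sequence[str]] = None,
    filter_refs: Sequence[str] = (),
    out_dir: str = ".",
    out_align: str = "outalign.sam",
) -> List[str]:
    """Assemble a seqSight argument list from options shown by `seqSight -h`."""
    # Exactly one read layout: -U for single-end, or -1/-2 for paired-end.
    if (reads_single is None) == (reads_pair is None):
        raise ValueError("pass either reads_single or reads_pair, not both or neither")
    cmd = ["seqSight"]
    if reads_single is not None:
        cmd += ["-U", reads_single]
    else:
        read1, read2 = reads_pair
        cmd += ["-1", read1, "-2", read2]
    # Several reference FASTA files go into one comma-separated value.
    cmd += ["-targetRefFiles", ",".join(target_refs)]
    if filter_refs:
        cmd += ["-filterRefFiles", ",".join(filter_refs)]
    cmd += ["-outDir", out_dir, "-outAlign", out_align]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_seqsight_command(
        ["/refs/bacteria.fa", "/refs/bgc.fa"],
        reads_pair=["sample_R1.fastq", "sample_R2.fastq"],
        filter_refs=["/refs/human.fa"],
        out_dir="results",
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems; the list can be passed directly to `subprocess.run(cmd, check=True)` to launch the run.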

Input

The main input parameters are:

-U (single-end reads) or -1 and -2 (paired-end reads): input reads in FASTQ format.
-outDir: the folder that will contain all output files (default: the current directory).

A full list of options is provided in the Options section.
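Before a long run it can be worth checking that the FASTQ input is well formed and, for paired-end data, that both mate files contain the same number of reads. seqSight may already perform its own checks; the sketch below is an independent pre-flight step, not part of seqSight, and assumes plain four-line FASTQ records. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each 4-line FASTQ record, checking basic structure."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines, e.g. a trailing empty line
        # Read the next three lines explicitly; None marks a truncated record.
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated record starting at {header!r}")
        seq, plus, qual = [line.rstrip("\r\n") for line in rest]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def count_paired_reads(lines1: Iterable[str], lines2: Iterable[str]) -> int:
    """Count records in two mate files and fail if the counts differ."""
    n1 = sum(1 for _ in fastq_records(lines1))
    n2 = sum(1 for _ in fastq_records(lines2))
    if n1 != n2:
        raise ValueError(f"mate files disagree: {n1} vs {n2} reads")
    return n1
```

Open file handles can be passed directly, for example `count_paired_reads(open("sample_R1.fastq"), open("sample_R2.fastq"))`; for gzipped input, use `gzip.open(path, "rt")` instead of `open`.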

Output


seqSight pipelines

Taxonomic profiling

Bayesian Reassignment

Taxonomic profiling

Visualization

Utilities

The seqSight repository includes utility scripts to help manipulate and visualize sample output. These scripts are in the utils folder of the seqSight directory.

Merge Tables

The script merge_seqSight_tables.py combines seqSight output from several samples into a single table of bugs (rows) versus samples (columns), listing the normalized relative abundance of each bug in each sample.

merge_seqSight_tables.py [path_of_folder_contains_outputs] > output/merged_abundance_table.txt
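The merge amounts to an outer join of per-sample profiles on the bug name. The sketch below illustrates that idea; it is not the code of merge_seqSight_tables.py. It assumes, hypothetically, that each per-sample profile is a tab-separated file of `bug<TAB>abundance` rows with optional `#` comment lines and no header row; the real seqSight output layout may differ. Bugs missing from a sample are filled with 0.

```python
import csv
import io
import sys
from typing import Dict, List, TextIO


def read_profile(handle: TextIO) -> Dict[str, float]:
    """Parse one per-sample profile, assumed to be 'bug<TAB>abundance' rows ('#' lines skipped)."""
    profile: Dict[str, float] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        profile[row[0]] = float(row[1])
    return profile


def merge_profiles(profiles: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Build a bugs (rows) x samples (columns) table; bugs absent from a sample get 0."""
    samples = sorted(profiles)
    bugs = sorted({bug for profile in profiles.values() for bug in profile})
    table = [["bug"] + samples]
    for bug in bugs:
        table.append([bug] + [str(profiles[s].get(bug, 0.0)) for s in samples])
    return table


def write_tsv(table: List[List[str]], out: TextIO = sys.stdout) -> None:
    """Write the merged table as tab-separated text."""
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(table)


if __name__ == "__main__":
    # Two toy profiles standing in for per-sample output files.
    profiles = {
        "sampleA": read_profile(io.StringIO("E_coli\t0.7\nB_fragilis\t0.3\n")),
        "sampleB": read_profile(io.StringIO("B_fragilis\t1.0\n")),
    }
    write_tsv(merge_profiles(profiles))
```

With real files, each `io.StringIO(...)` would be replaced by an open file handle, and the sample name would typically be derived from the file name.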

Visualization Demo

Go to seqSight/Notebooks and download FiveTargetNum.tsv, FiveTargetReads.tsv, and stackedplot.ipynb.
FiveTargetNum.tsv and FiveTargetReads.tsv are two output files generated by seqSight.
Run the code either on Google Colab or in your local environment.
The stacked bar plot shows the composition of each sample and the corresponding read counts.
The final plot should look similar to the example figure in the seqSight repository.
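The core of a stacked bar plot is turning each sample's per-taxon read counts into relative abundances and stacking them. The sketch below shows only that computation, independent of the notebook; it assumes the counts for one sample are already available as a dictionary of taxon to read count, since the exact column layout of FiveTargetReads.tsv is not described here. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # (taxon, bottom, height)


def stacked_segments(counts: Dict[str, float]) -> List[Segment]:
    """Turn one sample's per-taxon read counts into stacked-bar segments of relative abundance."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    segments: List[Segment] = []
    bottom = 0.0
    # A fixed (alphabetical) order stacks taxa the same way in every sample.
    for taxon in sorted(counts):
        height = counts[taxon] / total
        segments.append((taxon, bottom, height))
        bottom += height
    return segments
```

With matplotlib, each segment maps onto one call such as `ax.bar(sample_index, height, bottom=bottom, label=taxon)`, repeated for every sample.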
