Smash++

A fast tool to find and visualize rearrangements in DNA sequences.

Install

Follow the instructions below for your operating system. Building from source requires CMake (>= 3.9) and a C++14-compliant compiler. A precompiled executable for 64-bit operating systems is also available in the experiment/bin directory.
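Before building from source, it can help to confirm that the CMake requirement is met. The following is a minimal sketch, assuming a shell where `sort -V` is available; the helper names `version_ge` and `check_cmake` are illustrative and not part of Smash++.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) being available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the installed CMake meets the 3.9 minimum.
check_cmake() {
  if ! command -v cmake >/dev/null 2>&1; then
    echo "cmake: not found"
    return 1
  fi
  # The first line of `cmake --version` reads "cmake version X.Y.Z".
  cmake_ver="$(cmake --version | head -n 1 | awk '{print $3}')"
  if version_ge "$cmake_ver" 3.9; then
    echo "cmake $cmake_ver: OK"
  else
    echo "cmake $cmake_ver: too old (need >= 3.9)"
    return 1
  fi
}

check_cmake || true
```

Version sort matters here because a plain string comparison would rank 3.10 below 3.9.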

Docker

Pull the image:

docker pull smortezah/smashpp

and run it:

docker run -it smortezah/smashpp

Conda

Install Miniconda, then run the following:

conda install -c bioconda -y smashpp

Ubuntu

Install Git, CMake and g++:

  apt update && apt install -y git cmake g++

Clone Smash++ and install it:

  git clone --depth 1 https://github.com/smortezah/smashpp.git
  cd smashpp
  bash install.sh

macOS

Install Homebrew, Git and CMake:

  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  brew install git cmake

Clone Smash++ and install it:

  git clone --depth 1 https://github.com/smortezah/smashpp.git
  cd smashpp
  bash install.sh

Windows

Install WSL (Windows Subsystem for Linux), then clone and install Smash++ as on Ubuntu:

git clone --depth 1 https://github.com/smortezah/smashpp.git
cd smashpp
./install.sh

Note: on any operating system, if permission is denied, run sudo bash install.sh instead of ./install.sh or bash install.sh.

Run

./smashpp [OPTIONS]  -r <REF-FILE>  -t <TAR-FILE>

For example,

./smashpp -r ref -t tar

It is recommended to choose short names for reference and target sequences.
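Short names also keep derived file names manageable. Judging by the example later in this document, running with -r ref -t tar produces a position file called ref.tar.pos. The sketch below illustrates that naming, assuming the pattern is `<ref>.<tar>.pos` for plain file names in the working directory (how directory components are handled is not covered here). `pos_name` and the long file names are hypothetical.

```shell
# Print the position file name expected for a reference/target pair,
# assuming the <ref>.<tar>.pos pattern seen in the example section.
pos_name() {
  echo "$1.$2.pos"
}

pos_name ref tar
# Hypothetical long names, shown only to contrast the resulting file name.
pos_name human_chr21_assembly.fa chimp_chr21_assembly.fa
```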

Options

To see the possible options for Smash++, type:

./smashpp

which provides the following:

SYNOPSIS
  ./smashpp [OPTIONS]  -r <REF-FILE>  -t <TAR-FILE>

OPTIONS
  Required:
  -r  <FILE>         = reference file (Seq/FASTA/FASTQ)
  -t  <FILE>         = target file    (Seq/FASTA/FASTQ)

  Optional:
  -l  <INT>          = level of compression: [0, 6]. Default -> 3
  -m  <INT>          = min segment size: [1, 4294967295]     -> 50
  -e  <FLOAT>        = entropy of 'N's: [0.0, 100.0]         -> 2.0
  -n  <INT>          = number of threads: [1, 255]           -> 4
  -f  <INT>          = filter size: [1, 4294967295]          -> 100
  -ft <INT/STRING>   = filter type (windowing function):     -> hann
                       {0/rectangular, 1/hamming, 2/hann,
                       3/blackman, 4/triangular, 5/welch,
                       6/sine, 7/nuttall}
  -fs [S][M][L]      = filter scale:
                       {S/small, M/medium, L/large}
  -d  <INT>          = sampling steps                        -> 1
  -th <FLOAT>        = threshold: [0.0, 20.0]                -> 1.5
  -rb <INT>          = ref beginning guard: [-32768, 32767]  -> 0
  -re <INT>          = ref ending guard: [-32768, 32767]     -> 0
  -tb <INT>          = tar beginning guard: [-32768, 32767]  -> 0
  -te <INT>          = tar ending guard: [-32768, 32767]     -> 0
  -ar                = consider asymmetric regions           -> no
  -nr                = do NOT compute self complexity        -> no
  -sb                = save sequence (input: FASTA/FASTQ)    -> no
  -sp                = save profile (*.prf)                  -> no
  -sf                = save filtered file (*.fil)            -> no
  -ss                = save segmented files (*.s[i])         -> no
  -sa                = save profile, filtered and            -> no
                       segmented files
  -rm k,[w,d,]ir,a,g/t,ir,a,g:...
  -tm k,[w,d,]ir,a,g/t,ir,a,g:...
                     = parameters of models
                <INT>  k:  context size
                <INT>  w:  width of sketch in log2 form,
                           e.g., set 10 for w=2^10=1024
                <INT>  d:  depth of sketch
                <INT>  ir: inverted repeat: {0, 1, 2}
                           0: regular (not inverted)
                           1: inverted, solely
                           2: both regular and inverted
              <FLOAT>  a:  estimator
              <FLOAT>  g:  forgetting factor: [0.0, 1.0)
                <INT>  t:  threshold (no. substitutions)
  -ll                = list of compression levels
  -h                 = usage guide
  -v                 = more information
  --version          = show version

AUTHOR
  Morteza Hosseini     seyedmorteza@ua.pt

SAMPLE
  ./smashpp -r ref -t tar -l 0 -m 1000
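The integer ranges listed above can be checked before a long run is started. The following is a minimal sketch that validates three of the options (-l, -m, -n) against the documented ranges and prints the resulting command line. `in_range` and `smashpp_cmd` are hypothetical helpers, not part of Smash++, and the defaults are the ones shown in the usage text.

```shell
# Succeed if integer $1 lies in the closed range [$2, $3].
in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Print a smashpp command line after validating a few options.
# Arguments: ref file, target file, level, min segment size, threads.
smashpp_cmd() {
  ref=$1; tar=$2; level=${3:-3}; min_seg=${4:-50}; threads=${5:-4}
  in_range "$level" 0 6 ||
    { echo "level must be in [0, 6]" >&2; return 1; }
  in_range "$min_seg" 1 4294967295 ||
    { echo "min segment size must be in [1, 4294967295]" >&2; return 1; }
  in_range "$threads" 1 255 ||
    { echo "number of threads must be in [1, 255]" >&2; return 1; }
  echo "./smashpp -r $ref -t $tar -l $level -m $min_seg -n $threads"
}

# Same settings as the SAMPLE line, with the default thread count made explicit.
smashpp_cmd ref tar 0 1000
```

Printing the command instead of executing it keeps the sketch usable as a dry run; the printed line can be pasted into the shell once it looks right.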

To see the options for Smash++ Visualizer, type:

./smashpp -viz

which provides the following:

SYNOPSIS
  ./smashpp -viz [OPTIONS]  -o <SVG-FILE>  <POS-FILE>

OPTIONS
  Required:
  <POS-FILE>         = position file, generated by
                       Smash++ tool (*.pos)

  Optional:
  -o  <SVG-FILE>     = output image name (*.svg).    Default -> map.svg
  -rn <STRING>       = reference name shown on output. If it
                       has spaces, use double quotes, e.g.
                       "Seq label". Default: name in header
                       of position file
  -tn <STRING>       = target name shown on output
  -l  <INT>          = type of the link between maps: [1, 6] -> 1
  -c  <INT>          = color mode: [0, 1]                    -> 0
  -p  <FLOAT>        = opacity: [0.0, 1.0]                   -> 0.9
  -w  <INT>          = width of the sequence: [8, 100]       -> 10
  -s  <INT>          = space between sequences: [5, 200]     -> 40
  -tc <INT>          = total number of colors: [1, 255]
  -rt <INT>          = reference tick: [1, 4294967295]
  -tt <INT>          = target tick: [1, 4294967295]
  -th [0][1]         = tick human readable: 0=false, 1=true  -> 1
  -m  <INT>          = minimum block size: [1, 4294967295]   -> 1
  -vv                = vertical view                         -> no
  -nrr               = do NOT show relative redundancy       -> no
                       (relative complexity)
  -nr                = do NOT show redundancy                -> no
  -ni                = do NOT show inverse maps              -> no
  -ng                = do NOT show regular maps              -> no
  -n                 = show 'N' bases                        -> no
  -stat              = save stats (*.csv)                    -> stat.csv
  -h                 = usage guide
  -v                 = more information
  --version          = show version

AUTHOR
  Morteza Hosseini     seyedmorteza@ua.pt

SAMPLE
  ./smashpp -viz -vv -o simil.svg ref.tar.pos
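The usage text notes that labels containing spaces must be quoted. The sketch below shows one way to keep such labels intact when assembling the visualizer arguments in a script. `viz_args` is a hypothetical helper, and the label strings are placeholders.

```shell
# Build the visualizer arguments as positional parameters so that a label
# containing spaces stays a single argument, then print one per line.
# Arguments: position file, reference label, target label, output SVG.
viz_args() {
  pos_file=$1; ref_label=$2; tar_label=$3; out_svg=${4:-map.svg}
  set -- -viz -rn "$ref_label" -tn "$tar_label" -o "$out_svg" "$pos_file"
  printf '%s\n' "$@"
}

# "Seq label" appears on a single line, i.e. as one argument.
viz_args ref.tar.pos "Seq label" "Target label"

# Equivalent direct invocation, assuming ./smashpp is in the current directory:
#   ./smashpp -viz -rn "Seq label" -tn "Target label" -o map.svg ref.tar.pos
```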

Example

After installing Smash++, copy its executable into the example directory and change to that directory:

cp smashpp example/
cd example/

This directory contains two 1000-base sequences: the reference sequence, named ref, and the target sequence, named tar. Now run Smash++ and then the visualizer:

./smashpp -r ref -t tar
./smashpp -viz -o example.svg ref.tar.pos
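The two steps can be wrapped in a small script that stops early if something is missing. This is a sketch under the assumption that both output files are written to the current directory, as the commands above suggest. `run_example` is a hypothetical helper.

```shell
# Run the two example steps and confirm that the expected outputs appear.
# $1 is the path to the smashpp executable (default: ./smashpp).
run_example() {
  bin=${1:-./smashpp}
  if [ ! -x "$bin" ]; then
    echo "smashpp executable not found at $bin" >&2
    return 1
  fi
  # Step 1: compare the sequences; this should produce ref.tar.pos.
  "$bin" -r ref -t tar || return 1
  [ -f ref.tar.pos ] || { echo "ref.tar.pos was not created" >&2; return 1; }
  # Step 2: render the position file as an SVG image.
  "$bin" -viz -o example.svg ref.tar.pos || return 1
  [ -f example.svg ] && echo "wrote example.svg"
}

run_example ./smashpp || echo "example did not complete"
```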

Cite

If you use Smash++, please cite the following:

M. Hosseini, D. Pratas, B. Morgenstern, A.J. Pinho, "Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements," GigaScience, vol. 9, no. 5, 2020. DOI: 10.1093/gigascience/giaa048

Issues

Please let us know if there are any issues.

License

Smash++ is licensed under GNU GPL v3.
