Next-generation genomic mobile element and structural variant identification tool.
Chapulin is a robust, portable and blazing fast tool to identify mobile element insertions as well as structural variants in resequenced population data with a reference assembly. Chapulin uses alignment files (SAM) to scan for reads putatively associated with mobile elements or structural variants, and then performs calls based on either a user-selected threshold or a calculated probability threshold. Chapulin's input and output formats are fully compatible with commonly used software, e.g. RepeatModeler.
Chapulin offers two different scanning modes: Mobile Element (ME) and Structural Variant (SV).
Additionally, to improve user experience, Chapulin offers Cache Registering (CR), Generate Configuration (GC) and AutoCompletion (AC).
Soon to come
Prerequisites:
brew install danielrivasmd/chapulin
Soon to come
curl -SsL https://fbecart.github.io/ppa/debian/KEY.gpg | sudo apt-key add -
sudo curl -SsL -o /etc/apt/sources.list.d/fbecart.list https://fbecart.github.io/ppa/debian/fbecart.list
sudo apt update
sudo apt install chapulin
Soon to come
Prerequisites:
cargo install chapulin
Soon to come
Simply download the release binary for your operating system. Chapulin is self-contained so it does not need dependencies.
Chapulin is written in Rust, so you'll need to grab a Rust installation in order to compile it.
To build Chapulin:
git clone https://github.com/DanielRivasMD/Chapulin
cd Chapulin
cargo build --release
./target/release/chapulin --version
Chapulin 0.1.0
To run the test suite, use:
cargo test
To view the documentation, run:
cargo doc
To open the documentation in your browser, run:
cargo doc --open
Use chapulin help, chapulin -h or chapulin --help to display help on the command line. Running chapulin with no arguments or flags also triggers help.
chapulin 0.1.0
Daniel Rivas <danielrivasmd@gmail.com>
Chapulin: Mobile Element Identification
Software for mobile element identification in resequenced short-read data with a reference genome.
Available subcommands are:
Mobile Element (ME): performs sequence similarity search to a customized mobile element library and
insertion calls by probability or a set threshold. Aliases: 'me', 'MobileElement'.
Structural Variant (SV): performs read selection based on alignment data and variant calls by
probability or a set threshold. Aliases: 'sv', 'StructuralVariant'.
Cache Registering (CR): checks for reference genome and mobile element library cache in
configuration directory. In case caches are not found, reads files and writes cache. Aliases: 'cr',
'CacheRegistering'.
Generate Configuration (GC): generates a configuration template. Observe that not all values from
config file are used at all times. Aliases: 'gc', 'GenerateConfiguration'.
AutoCompletion (AC): generates autocompletions to stdout for your shell. Pipe into a file and
install to get help when using Chapulin. See `chapulin AC --manual` for details. Aliases: 'ac',
'AutoCompletion'.
USAGE:
chapulin [SUBCOMMAND]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
SUBCOMMANDS:
AC AutoCompletion [aliases: ac, AutoCompletion]
CR Cache Registering [aliases: cr, CacheRegistering]
GC Generate Configuration [aliases: gc, GenerateConfiguration]
ME Mobile Element Identification [aliases: me, MobileElement]
SV Structural Variant Identification [aliases: sv, StructuralVariant]
T Testing
help Prints this message or the help of the given subcommand(s)
Mobile Element mode is meant for scanning mobile elements, such as LTR elements, in a host genome. This mode depends on two inputs: an alignment file against the desired reference, used to locate putative insertions on chromosomal coordinates, and a mobile element library, which can be produced by running RepeatModeler on a reference genome. Alternatively, a customized library can be used, which makes it possible to adapt Chapulin's search algorithm to other purposes, such as orphan gene discovery. See the example Mobile Element subcommand below.
Structural Variant mode scans for structural variants in a host genome, such as insertions, duplications and inversions. The algorithm relies on read length, read depth and read orientation provided by the raw reads and the alignment file. Additionally, a list of known structural variant coordinates can be supplied to improve the chance of identification. See the example Structural Variant subcommand below.
Cache Registering mode writes cache files that later runs can read from, which improves running time. This is useful when you want to analyze several individuals using a single mobile element library or a single reference genome. See the example Cache Registering subcommand below.
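The multi-individual workflow described above could be scripted roughly as follows. This is only a sketch: the `scan_cohort` function, the `CHAPULIN` variable and the layout of one TOML config file per individual in a single directory are assumptions made for illustration, not part of Chapulin. Only the `CR` and `ME -c <CONFIG>` invocations come from this README.

```shell
#!/usr/bin/env bash
# Sketch: register caches once, then scan every individual with the same
# reference genome and mobile element library.
# CHAPULIN is a hypothetical override for the binary name; it is left unquoted
# below on purpose so that it may hold a command plus arguments.
CHAPULIN="${CHAPULIN:-chapulin}"

# scan_cohort <config_dir>
# Assumes one configuration file per individual (hypothetical layout).
scan_cohort() {
  local config_dir="$1"
  # Write the caches if they are missing, or reuse them if they exist.
  $CHAPULIN CR || return 1
  # Run Mobile Element mode once per individual configuration file.
  for config in "$config_dir"/*.toml; do
    [ -e "$config" ] || continue   # skip when the directory has no configs
    $CHAPULIN ME -c "$config" || return 1
  done
}
```

Calling `scan_cohort configs` would then run Cache Registering once and Mobile Element mode for each file in `configs/`. The same pattern should apply to `SV` runs.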
Chapulin does not come with a configuration file by default. However, by using Generate Configuration (GC) you can generate an editable TOML file with preloaded defaults. In case you make a mistake editing the configuration, Chapulin's error handling will let you know exactly what failed and display examples to help you fix it. See the example Generate Configuration subcommand below.
In case you want to explore Chapulin interactively, you might want to install autocompletion for your shell by running AutoCompletion (AC). Specific instructions on how to install these autocompletions for your shell can be found with the flag --manual. You can also preview your run by running any command with the flag --dry-run. See the example AutoCompletion subcommand below.
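A minimal sketch of how the `--dry-run` preview and the autocompletion output might be used from a script. The function names and the `CHAPULIN` variable are hypothetical. The sketch assumes that `--dry-run` exits with a non-zero status when something is wrong, and that `chapulin AC` needs no further arguments. Neither is confirmed here, so check `chapulin AC --manual` and the subcommand help first.

```shell
#!/usr/bin/env bash
# CHAPULIN is a hypothetical override for the binary name; it is left unquoted
# below on purpose so that it may hold a command plus arguments.
CHAPULIN="${CHAPULIN:-chapulin}"

# preview_then_run <config>
# Preview a Mobile Element run; launch the real run only if the preview succeeds.
# Assumption: a failed preview returns a non-zero exit status.
preview_then_run() {
  local config="$1"
  $CHAPULIN ME -c "$config" --dry-run && $CHAPULIN ME -c "$config"
}

# save_completions <output_file>
# Write the autocompletion script from stdout into a file for later installation.
save_completions() {
  $CHAPULIN AC > "$1"
}
```

For example, `preview_then_run my_config.toml` previews and then runs the scan, while `save_completions chapulin_completions` stores the generated completions. Where that file has to be installed depends on your shell, as described by `chapulin AC --manual`.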
Soon to come
Below you can find a config example. Please observe that you can also obtain a config template by using the Generate Configuration, or GC, command from Chapulin.
chapulin ME -c <CONFIG>
chapulin SV -c <CONFIG>
chapulin CR
chapulin GC
chapulin AC
The word chapulín derives from Náhuatl chapōlin, where the compounds chapā[nia] and ōlli mean "to bounce" and "rubber", respectively. Thus, chapulín means "insect that bounces like rubber".
Inhabitants and visitors of Mexico City will be familiar with the 'Chapulín' image through its reference to the beautiful "Chapultepec", or "Chapulín's hill", forest, castle and metro station. It also alludes to the delicious chapulín tacos eaten in Central and South Mexico.
Chapulin is distributed under the terms of the GNU GENERAL PUBLIC LICENSE.
See LICENSE for details.
Author's Note: This package is still under active development and is subject to change.