VNtyper 2.0 is a pipeline for genotyping the MUC1 coding Variable Number Tandem Repeat (VNTR) in Autosomal Dominant Tubulointerstitial Kidney Disease (ADTKD-MUC1) from Short-Read Sequencing (SRS) data. This version integrates two variant calling algorithms (Kestrel and, optionally, code-adVNTR), logging to both the console and a log file, and installation via pip or Conda.
- Features
- Installation
- Usage
- Pipeline Overview
- Dependencies
- Pipeline Logic Diagram
- Results
- Notes
- Citations
- Contributing
- License
- Contact
- Variant Calling Algorithms:
  - Kestrel: Mapping-free genotyping using k-mer frequencies.
  - code-adVNTR (optional): Profile-HMM based method for VNTR genotyping.
- Comprehensive Logging:
  - Logs both to the console and a dedicated log file.
  - Generates MD5 checksums for all downloaded and processed files.
- Flexible Installation:
  - Supports installation via pip using setup.py.
  - Provides Conda environment setup for easy dependency management.
- Subcommands:
  - install-references
  - pipeline
  - fastq
  - bam
  - kestrel
  - report
  - cohort
VNtyper 2.0 can be installed using either pip with setup.py or via Conda environments for streamlined dependency management.
- Clone the repository and install with pip:
mkdir vntyper
git clone https://github.com/hassansaei/vntyper.git
cd vntyper
pip install .
VNtyper 2.0 offers multiple subcommands that can be used depending on your input data and requirements. Below are the main subcommands available:
To run the entire pipeline on paired-end FASTQ files or BAM files:
vntyper pipeline \
--config-path /path/to/config.json \
--fastq1 /path/to/sample_R1.fastq.gz \
--fastq2 /path/to/sample_R2.fastq.gz \
--output-dir /path/to/output/dir \
--threads 4
Alternatively, using a BAM file:
vntyper pipeline \
--config-path /path/to/config.json \
--bam /path/to/sample.bam \
--output-dir /path/to/output/dir \
--threads 4
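When the pipeline is launched from a script rather than typed by hand, the two invocations above can be assembled programmatically. The following is a minimal sketch: `build_pipeline_command` is a hypothetical helper, not part of VNtyper, and it uses only the flags shown in the commands above. It also assumes that exactly one input type (paired FASTQ or BAM) is given per run.

```python
import shlex
from typing import List, Optional


def build_pipeline_command(
    config_path: str,
    output_dir: str,
    threads: int = 4,
    fastq1: Optional[str] = None,
    fastq2: Optional[str] = None,
    bam: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `vntyper pipeline` from the documented flags."""
    fastq_given = [f for f in (fastq1, fastq2) if f is not None]
    if bam is not None and not fastq_given:
        input_args = ["--bam", bam]
    elif bam is None and len(fastq_given) == 2:
        input_args = ["--fastq1", fastq1, "--fastq2", fastq2]
    else:
        # Assumption of this sketch: mixed or incomplete inputs are rejected.
        raise ValueError("Provide either both FASTQ files or a single BAM file")
    return (
        ["vntyper", "pipeline", "--config-path", config_path]
        + input_args
        + ["--output-dir", output_dir, "--threads", str(threads)]
    )


if __name__ == "__main__":
    command = build_pipeline_command(
        config_path="/path/to/config.json",
        output_dir="/path/to/output/dir",
        bam="/path/to/sample.bam",
    )
    # Print the command instead of running it; pass `command` to
    # subprocess.run(command, check=True) to execute it.
    print(shlex.join(command))
```

Returning a list rather than a single string avoids shell quoting problems when paths contain spaces.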
Install the reference files required by the pipeline:
vntyper install-references \
--output-dir /path/to/reference/install \
--config-path /path/to/config.json \
--skip-indexing # Optional: skip BWA indexing if needed
Generate a summary report for your VNTR genotyping analysis:
vntyper report \
--output-dir /path/to/output/dir \
--config-path /path/to/config.json
Process raw FASTQ files to prepare them for genotyping:
vntyper fastq \
--fastq1 /path/to/sample_R1.fastq.gz \
--fastq2 /path/to/sample_R2.fastq.gz \
--output-dir /path/to/output/dir
Process a BAM file to prepare it for genotyping:
vntyper bam \
--alignment /path/to/sample.bam \
--output-dir /path/to/output/dir \
--threads 4
VNtyper 2.0 integrates multiple steps into a streamlined pipeline. The following is an overview of the steps involved:
- FASTQ Quality Control: Raw FASTQ files are checked for quality.
- Alignment: Reads are aligned using BWA (if FASTQ files are provided).
- Kestrel Genotyping: Mapping-free genotyping of VNTRs.
- (Optional) adVNTR Genotyping: Profile-HMM based method for VNTR genotyping (requires additional setup).
- Summary Report Generation: A final HTML report is generated to summarize the results.
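The Kestrel step works from k-mer frequencies rather than from read alignments. As a toy illustration of that starting point only (this is not Kestrel's algorithm, which goes on to reconstruct haplotypes from the frequencies), the sketch below counts k-mers directly in read sequences. The function name, the choice of `k`, and the example reads are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads, skipping k-mers that contain N."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of length k across the read; reads shorter than k
        # produce an empty range and contribute nothing.
        for start in range(len(seq) - k + 1):
            kmer = seq[start : start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    example_reads = ["ACGTAC", "CGTACG"]  # invented reads, not MUC1 sequence
    for kmer, n in sorted(count_kmers(example_reads, k=3).items()):
        print(kmer, n)
```

Because no alignment is needed to build such a table, k-mer based methods can sidestep the mapping ambiguity that repetitive regions like VNTRs cause for aligners.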
VNtyper 2.0 relies on several tools and Python libraries. Ensure that the following dependencies are available in your environment:
- Python >= 3.9
- BWA
- Samtools
- Fastp
- Pandas
- Numpy
- Biopython
- Pysam
- Jinja2
- Matplotlib
- Seaborn
- IGV-Reports
You can easily set up these dependencies via the provided Conda environment file.
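Before running the pipeline, it can help to confirm that the tools listed above are visible in the active environment. The sketch below is one way to do that with the standard library. The executable names (`bwa`, `samtools`, `fastp`) and the import names (for example `Bio` for Biopython and `igv_reports` for IGV-Reports) are assumptions based on how these projects are usually packaged, not something VNtyper itself specifies.

```python
import importlib.util
import shutil
from typing import Dict, Iterable, List

# Assumed executable and import names; adjust if your installation differs.
EXECUTABLES = ["bwa", "samtools", "fastp"]
MODULES = [
    "pandas", "numpy", "Bio", "pysam",
    "jinja2", "matplotlib", "seaborn", "igv_reports",
]


def find_missing(
    executables: Iterable[str], modules: Iterable[str]
) -> Dict[str, List[str]]:
    """Return the executables not on PATH and the modules that cannot be found."""
    missing_executables = [name for name in executables if shutil.which(name) is None]
    missing_modules = [
        name for name in modules if importlib.util.find_spec(name) is None
    ]
    return {"executables": missing_executables, "modules": missing_modules}


if __name__ == "__main__":
    for kind, names in find_missing(EXECUTABLES, MODULES).items():
        print(f"missing {kind}: {', '.join(names) or 'none'}")
```

The check only tests for presence; it does not verify tool versions.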
Below is a logical overview of the VNtyper pipeline:
graph TD
A[Input: FASTQ/BAM] -->|Quality Control| B[Alignment BWA]
B -->|Genotyping| C[Kestrel]
C --> D[Optional: adVNTR]
D --> E[Generate Summary Report]
E --> F[Output: VCF, Summary HTML]
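The flow in the diagram depends on the input type and on whether adVNTR is enabled. The sketch below makes that branching explicit. `planned_steps` is a hypothetical helper rather than VNtyper code, and it assumes that FASTQ quality control and BWA alignment are both skipped when a BAM file is supplied.

```python
from typing import List


def planned_steps(input_type: str, run_advntr: bool = False) -> List[str]:
    """List the pipeline steps for a given input type ("fastq" or "bam")."""
    if input_type not in ("fastq", "bam"):
        raise ValueError("input_type must be 'fastq' or 'bam'")
    steps: List[str] = []
    if input_type == "fastq":
        # Assumption: these two steps apply only to raw FASTQ input.
        steps += ["FASTQ quality control", "BWA alignment"]
    steps.append("Kestrel genotyping")
    if run_advntr:
        steps.append("adVNTR genotyping")  # optional, needs additional setup
    steps.append("Summary report generation")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(planned_steps("fastq", run_advntr=True), start=1):
        print(f"{number}. {step}")
```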
- This tool is for research use only.
- Use high-coverage whole-exome sequencing (WES) data to genotype the MUC1 VNTR accurately.
- For questions or issues, refer to the GitHub repository.
If you use VNtyper 2.0 in your research, please cite the following:
- Saei H, Morinière V, Heidet L, et al. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data. iScience. 2023.
- Audano PA, Ravishankar S, et al. Mapping-free variant calling using haplotype reconstruction from k-mer frequencies. Bioinformatics. 2018.
- Park J, Bakhtiari M, et al. Detecting tandem repeat variants in coding regions using code-adVNTR. iScience. 2022.
We welcome contributions to VNtyper. Please refer to the CONTRIBUTING.md file for guidelines.
VNtyper is licensed under the BSD 3-Clause License. See the LICENSE file for more details.