🧬🛠️De Bruijn Graph-Based Assembler

📝Introduction

This repository contains a de Bruijn graph-based assembler, applied here to the genome of enterovirus A71. This genome makes a convenient test case because it is short (7408 nucleotides), linear, and non-segmented.

The provided fastq file was generated using the ART program [Huang 2011] with the following command:

art_illumina -i eva71.fna -ef -l 100 -f 20 -o eva71 -ir 0 -dr 0 -ir2 0 -dr2 0 -na -qL 41 -rs 1539952693

The reads have maximum quality (41) and contain no insertions or deletions (all insertion and deletion rates are set to 0 in the command above). Only reads from the 5' -> 3' strand are provided.

In the data/ folder, you will find:

eva71.fna: the genome of the virus of interest
eva71_plus_perfect.fq: reads
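
As a rough illustration of how the reads file can be consumed, here is a minimal sketch that extracts the sequences from FASTQ input. It assumes the standard four-line record layout; the function name read_fastq and the toy records are hypothetical and are not taken from this repository's code.

```python
import io


def read_fastq(handle):
    """Yield the sequence of each four-line FASTQ record (hypothetical helper)."""
    lines = (line.strip() for line in handle)
    # Pulling from the same iterator four times groups the lines into records:
    # header, sequence, '+' separator, quality string.
    for header, sequence, _plus, _quality in zip(lines, lines, lines, lines):
        if header.startswith("@"):
            yield sequence


# Toy input; 'J' is Phred quality 41 in the common Phred+33 encoding.
example = io.StringIO("@read1\nACGT\n+\nJJJJ\n@read2\nTTGA\n+\nJJJJ\n")
print(list(read_fastq(example)))
```

For the real data, the same function could be called on a handle opened with open("data/eva71_plus_perfect.fq").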

🔄Dependency Installation

To set up the environment, run the following commands:

conda env create -f environment.yml
conda activate genome-assembler

🧑‍💻️Usage

Clone the repository and navigate to the project folder:

git clone git@github.com:zhukovanadezhda/genome-assembler.git
cd genome-assembler

Run the assembler with:

python3 debruijn/debruijn.py -i <input fastq file> -k <kmer size> -o <output file>

-i: single-end fastq file
-k: k-mer size (optional, default is 21)
-o: file with the contigs
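
To give an idea of what the k-mer size controls, here is a minimal sketch of de Bruijn graph construction in plain Python. It is an illustration under simplifying assumptions (error-free reads, a single strand, no branching), not necessarily how debruijn.py is implemented; the names cut_kmers, build_graph, and walk and the toy reads are hypothetical.

```python
from collections import defaultdict


def cut_kmers(read, k):
    """Yield every k-mer of a read, from left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes following it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in cut_kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, start):
    """Follow unambiguous edges from start and spell the path as a sequence."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop instead of looping forever on a cycle
            break
        seen.add(node)
        contig += node[-1]
    return contig


reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]  # toy overlapping reads
graph = build_graph(reads, k=4)
print(walk(graph, "ATG"))
```

Each k-mer becomes an edge between its (k-1)-mer prefix and suffix, so a larger k yields more specific nodes but requires longer exact overlaps between reads.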

⚙️Testing

You can test the program by running:

pytest --cov=debruijn

🎁Example of Usage

To assemble the enterovirus A71 genome, you can run the following command:

python3 debruijn/debruijn.py -i data/eva71_plus_perfect.fq

This will produce a file named contigs.fasta, which contains the contig(s) generated by the assembler. If the program successfully resolves the genome, only one contig should be produced, representing the complete assembly of the enterovirus A71 genome.
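
A quick way to check how many contigs were produced is to parse the FASTA output. The sketch below assumes plain FASTA (a ">" header line followed by one or more sequence lines); the function name read_fasta and the header text in the toy input are hypothetical, since the exact header format written by the assembler is not described here.

```python
import io


def read_fasta(handle):
    """Return a list of (header, sequence) pairs from FASTA input."""
    records, header, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input standing in for contigs.fasta.
example = io.StringIO(">contig_0\nACGT\nACGT\n")
contigs = read_fasta(example)
print(len(contigs), [len(sequence) for _, sequence in contigs])
```

On the real output, read_fasta(open("contigs.fasta")) returning a single record would correspond to the one-contig case described above.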

Verifying the Assembly with BLAST

To confirm that the assembled genome matches the reference genome, you can use BLAST to compare the generated contig against the reference sequence. Follow these steps:

Install BLAST+ (on Debian or Ubuntu) using:

sudo apt install ncbi-blast+

Create a BLAST database from the reference genome:

makeblastdb -in data/eva71.fna -dbtype nucl

Run the BLAST comparison:

cd data
blastn -query ../contigs.fasta -db eva71.fna

Example BLAST Output

Here is an example of what the beginning of the BLAST report might look like:

>EVA71_BrCr_U22521 EVA71
Length=7408

Score = 13649 bits (7391),  Expect = 0.0
Identities = 7391/7391 (100%), Gaps = 0/7391 (0%)
Strand=Plus/Plus

Query  1     GTGGGTTGTCACCCACCCACAGGGTCCACTGGGCGCTAGTACACTGGTATCTCGGTACCT  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  12    GTGGGTTGTCACCCACCCACAGGGTCCACTGGGCGCTAGTACACTGGTATCTCGGTACCT  71
...

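This output shows an alignment with 100% identity and no gaps over 7391 of the 7408 reference nucleotides, starting at reference position 12. The assembled contig therefore matches the reference exactly over the aligned region, indicating that the assembly was successful.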
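
Because the reads are error-free and come from a single strand, an exact substring search can serve as a lightweight cross-check alongside BLAST. The sketch below is only an illustration: the function name locate and the toy sequences are hypothetical, and it does not handle mismatches or reverse complements.

```python
def locate(contig, reference):
    """Return the 1-based start of an exact match of contig in reference, or None."""
    index = reference.find(contig)
    return None if index == -1 else index + 1


reference = "CCCCACGTACGG"  # toy reference, not the real genome
contig = "ACGTACG"          # toy contig
print(locate(contig, reference))
```

The returned position uses the same 1-based convention as the Sbjct coordinates in the BLAST report.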

✉️Contact

If you have any questions, feel free to contact me via email: nadiajuckova@gmail.com
