-
Notifications
You must be signed in to change notification settings - Fork 0
Calculate Assembly Statistics
Emily Delorean edited this page Oct 7, 2024
·
2 revisions
Use bbtools to calculate assembly stats. It is important to specify -Xmx5g t=4
or bbtools stats.sh
will not run.
#!/bin/bash
#SBATCH --job-name=STATS
#SBATCH --ntasks-per-node=4
#SBATCH -o "%x_%j.o"
#SBATCH -e "%x_%j.e"
################
module load bbtools
echo ${file}
stats.sh -Xmx5g t=4 in=assembly.fa
This script compares the heterozygosity between the two genome assemblies using mummer alignments.
#!/bin/bash
#SBATCH --job-name="genome_heterozygosity"
#SBATCH -n 40
#SBATCH -t 1-10:00:00
#SBATCH --output="%x_%j.o" # job standard output file (%j replaced by job id)
#SBATCH --error="%x_%j.e" # job standard error file (%j replaced by job id)
module load mummer
query='HDA149v1.0_assembly.fasta'
ref='HDA330v1.0_assembly.fasta'
out='HDA149_onto_HDA330'
nucmer --prefix=${out} ${ref} ${query}
# Non repeat snps
show-snps -Clr ${out}.delta > ${out}_nonrepeat.snps
# Count the number of SNPS
wc -l HDA149_onto_HDA330_nonrepeat.snps
- It generatestwo files. 1) the alignment: HDA149_onto_HDA330.delta and 2) the SNPs HDA149_onto_HDA330_nonrepeat.snps
- Each SNP is on a single line in the .snps file. Use
wc -l HDA149_onto_HDA330_nonrepeat.snps
to count the number of lines and subtract the 5 header lines to count the number of SNPs. Divide the number of SNPs by the genome size and multiply by 100 to get the percent heterozygosity. For example, if there are 4088858 lines in the snps file and the chile pepper genome is 3.5 Gb then (4,088,853 SNPs / 3,500,000,000 bp) x 100 = 0.1168% heterozygosity.