Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ivar variants: reducing q-score threshold results in high P-value for deletion #96

Open
pmenzel opened this issue Apr 22, 2021 · 1 comment

Comments

@pmenzel
Copy link

pmenzel commented Apr 22, 2021

Dear @gkarthik ,

I tried to change the option -q in ivar variants to values below the default 20 and discovered a problem with calling deletions once the q-score threshold is set to a value below 10.
My data is SARS-CoV2 and the deletion in question is the del69/70 in spike at position 21764.

With this command I tried all values for -q from 0 to 20:

for i in `seq 0 20`;
do
  samtools mpileup -A -d 0 --reference nCoV-2019.reference.fasta -B -Q 0 input.bam | \
  ivar variants -r nCoV-2019.reference.fasta -m 20 -p q$i.variants -q $i -t 0.25;
done

The output for position 21764 is:

grep 21764 q*.variants.tsv | column -t -s$'\t'
q0.variants.tsv:MN908947.3   21764  A  -TACATG  4079  4070  33  4058  0  0   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q1.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  1   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q2.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  2   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q3.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  3   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q4.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  4   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q5.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  5   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q6.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  6   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q7.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  7   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q8.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  8   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q9.variants.tsv:MN908947.3   21764  A  -TACATG  4070  4061  34  4058  0  9   0.990239  4098  1  FALSE  NA  NA  NA  NA  NA
q10.variants.tsv:MN908947.3  21764  A  -TACATG  4070  4061  34  4058  0  10  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q11.variants.tsv:MN908947.3  21764  A  -TACATG  4070  4061  34  4058  0  11  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q12.variants.tsv:MN908947.3  21764  A  -TACATG  4070  4061  34  4058  0  12  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q13.variants.tsv:MN908947.3  21764  A  -TACATG  4070  4061  34  4058  0  13  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q14.variants.tsv:MN908947.3  21764  A  -TACATG  4070  4061  34  4058  0  14  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q15.variants.tsv:MN908947.3  21764  A  -TACATG  3891  3882  34  4058  0  15  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q16.variants.tsv:MN908947.3  21764  A  -TACATG  3891  3882  34  4058  0  16  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q17.variants.tsv:MN908947.3  21764  A  -TACATG  3891  3882  34  4058  0  17  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q18.variants.tsv:MN908947.3  21764  A  -TACATG  3891  3882  34  4058  0  18  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q19.variants.tsv:MN908947.3  21764  A  -TACATG  3891  3882  34  4058  0  19  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA
q20.variants.tsv:MN908947.3  21764  A  -TACATG  3891  3882  34  4058  0  20  0.990239  4098  0  TRUE   NA  NA  NA  NA  NA

For -q 10 and above, the variant is found with P-value = 0, and for -q 9 and below, it is not found with P-value = 1.

The main difference between the files is that the column ALT_QUAL takes on exactly the value set for -q, probably because of

q = min_qual; // For insertions and deletion ust use minimum quality.

This value is then used here

err = pow(10, ( -1 * (it->mean_qual)/10));
for calculating the P-value, and it seems that it somehow misbehaves at values below 10?

@pmenzel
Copy link
Author

pmenzel commented Nov 5, 2021

Seeing the same issue with the 6p deletion GAGTTCA22028G:EFR156G in Delta:

When setting -q 10, the P-value is all good:

samtools mpileup -A -d 0 --reference nCoV-2019.reference.fasta -B -Q 0 mapped.primertrimmed.sorted.bam | \
grep -w 22028 | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 10 -t 0.25 && column -t -s$'\t' variants.tsv
[mpileup] 1 samples in 1 input files
[mpileup] Max depth set to maximum value (2147483647)
A GFF file containing the open reading frames (ORFs) has not been provided. Amino acid translation will not be done.
REGION      POS    REF  ALT      REF_DP  REF_RV  REF_QUAL  ALT_DP  ALT_RV  ALT_QUAL  ALT_FREQ  TOTAL_DP  PVAL         PASS  GFF_FEATURE  REF_CODON  REF_AA  ALT_CODON  ALT_AA
MN908947.3  22028  G    -AGTTCA  25      25      34        25      0       10        1         25        0.000322779  TRUE  NA           NA         NA      NA         NA

When setting -q 0, the P-value is set to 1:

samtools mpileup -A -d 0 --reference nCoV-2019.reference.fasta -B -Q 0 mapped.primertrimmed.sorted.bam | \
grep -w 22028 | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 0 -t 0.25 && column -t -s$'\t' variants.tsv
[mpileup] 1 samples in 1 input files
[mpileup] Max depth set to maximum value (2147483647)
A GFF file containing the open reading frames (ORFs) has not been provided. Amino acid translation will not be done.
REGION      POS    REF  ALT      REF_DP  REF_RV  REF_QUAL  ALT_DP  ALT_RV  ALT_QUAL  ALT_FREQ  TOTAL_DP  PVAL  PASS   GFF_FEATURE  REF_CODON  REF_AA  ALT_CODON  ALT_AA
MN908947.3  22028  G    -AGTTCA  25      25      34        25      0       0         1         25        1     FALSE  NA           NA         NA      NA         NA


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant