Commit

Merge pull request #32 from EBIvariation/remove_non_ATGC
Remove variants that have non-ATGC bases in the reference
tcezard authored Aug 23, 2021
2 parents f7303a5 + bc3661f commit f44fac1
Showing 2 changed files with 9 additions and 7 deletions.
12 changes: 5 additions & 7 deletions conda.yml
@@ -4,10 +4,8 @@ channels:
 - conda-forge
 - bioconda
 dependencies:
-- bedtools=2.29.2
-- bowtie2=2.4.1
-- minimap2=2.17
-- samtools=1.9
-- bcftools=1.9
-- bedops=2.4.39
-- tabix=0.2.6
+- bedtools
+- minimap2
+- samtools
+- bcftools
+- tabix
4 changes: 4 additions & 0 deletions variant_remapping_tools/reads_to_remapped_variants.py
@@ -8,6 +8,7 @@
 from Bio.Alphabet import generic_dna
 import pysam

+nucleotide_alphabet = {'A', 'T', 'C', 'G'}

 def reverse_complement(sequence):
     return str(Seq(sequence, generic_dna).reverse_complement())
@@ -27,6 +28,9 @@ def calculate_new_variant_definition(left_read, right_read, ref_fasta, original_
     new_ref = fetch_bases(ref_fasta, left_read.reference_name, left_read.reference_end + 1,
                           right_read.reference_start - left_read.reference_end).upper()

+    if len(set(new_ref).difference(nucleotide_alphabet)) != 0 :
+        failure_reason = 'Reference Allele not in ACGT'
+
     new_pos = left_read.reference_end + 1

     # 1. Handle reference strand change
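
The new check can be tried in isolation. The following is a minimal sketch, not the code from this commit: only nucleotide_alphabet and the failure message are taken from the diff, while the helper names non_acgt_bases and reference_failure_reason are hypothetical and introduced purely for illustration.

```python
# Taken from the diff: the set of bases accepted in a reference allele.
nucleotide_alphabet = {'A', 'T', 'C', 'G'}


def non_acgt_bases(allele):
    """Return the set of characters in `allele` outside A, C, G, T.

    Hypothetical helper; the allele is upper-cased first, mirroring the
    .upper() call applied to new_ref in the diff.
    """
    return set(allele.upper()).difference(nucleotide_alphabet)


def reference_failure_reason(new_ref):
    """Return the failure message used in the diff, or None if new_ref is clean.

    Hypothetical wrapper; in the commit the message is assigned to
    failure_reason inside calculate_new_variant_definition.
    """
    if non_acgt_bases(new_ref):
        return 'Reference Allele not in ACGT'
    return None


if __name__ == '__main__':
    for allele in ('ACGT', 'acgt', 'ACNT', 'AC-T'):
        print(allele, reference_failure_reason(allele))
```

Note that an empty allele passes this check, because it contains no characters outside the alphabet; the same is true of the set-difference test in the diff.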