Disambiguate reads that were mapped to multiple references

clintval/neodisambiguate

neodisambiguate


Disambiguate reads that were mapped to multiple references.

[Image: Torres del Paine]

Install with the Conda or Mamba package manager after setting your Bioconda channels:

❯ conda install neodisambiguate

Introduction

Alignment disambiguation is commonly performed on sequencing data from transduction, transfection, transgenic, or xenograft (including patient-derived xenograft) experiments. This tool compares alignment metrics for a template that has been aligned to several different references in order to determine which reference is the most likely source. Disambiguation is performed per template, and information from primary, secondary, and supplementary alignments is used as evidence.
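As an illustration of the per-template comparison described above, here is a minimal Python sketch. It is not the tool's implementation: the metric (a summed alignment score per reference) and the function name `assign_template` are assumptions made for this example, and the actual metrics used by neodisambiguate are not specified here.

```python
from collections import defaultdict


def assign_template(alignments):
    """Pick the most likely source reference for one template.

    `alignments` is an iterable of (reference_name, alignment_score) pairs
    covering every primary, secondary, and supplementary record of the
    template. Summing scores per reference is an assumption of this sketch.
    Returns the winning reference name, or None when the result is ambiguous.
    """
    totals = defaultdict(int)
    for reference, score in alignments:
        totals[reference] += score
    if not totals:
        return None
    best = max(totals.values())
    winners = [ref for ref, total in totals.items() if total == best]
    # A tie between references means the template cannot be assigned.
    return winners[0] if len(winners) == 1 else None


human_wins = assign_template([("hg38", 140), ("hg38", 30), ("mm10", 120)])
tie = assign_template([("hg38", 100), ("mm10", 100)])
```

In this sketch a tie yields `None`, which corresponds to the ambiguous case; a real strategy would likely combine several metrics before declaring a tie.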

All templates that are positively assigned to a single source reference are written to a reference-specific output BAM file. Any templates with an ambiguous reference assignment are written to a separate, input-specific BAM file of ambiguous alignments.

Only BAMs produced by the Burrows-Wheeler Aligner (bwa) or STAR are currently supported. Input BAMs of any sort order are accepted; however, an internal sort to queryname order is performed unless the BAM is already queryname sorted. All output BAM files are written in the same sort order as the input BAM files. Although paired-end reads give the most discriminatory power when disambiguating short-read sequencing data, this tool accepts paired-end, single-end (fragment), and mixed-pairing input data.
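The reason for the queryname sort is that all records of a template must be adjacent before they can be compared as a unit. A minimal Python sketch of that grouping step follows. It works on plain `(query_name, payload)` tuples rather than real BAM records, the function name `templates` is hypothetical, and plain lexicographic ordering stands in for whatever queryname comparator the tool actually uses.

```python
from itertools import groupby
from operator import itemgetter


def templates(records, sort_order):
    """Yield (query_name, [payloads]) with one entry per template.

    `records` is a list of (query_name, payload) tuples and `sort_order`
    mimics the SO value of a SAM header. Records are sorted only when the
    input is not already in queryname order.
    """
    if sort_order != "queryname":
        # sorted() is stable, so records of one template keep their order.
        records = sorted(records, key=itemgetter(0))
    for name, group in groupby(records, key=itemgetter(0)):
        yield name, [payload for _, payload in group]


records = [("q2", "R1"), ("q1", "R1"), ("q2", "R2"), ("q1", "R2")]
grouped = dict(templates(records, sort_order="coordinate"))
```

Skipping the sort for inputs that are already queryname sorted avoids redundant work, which matches the behavior described above.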

Features

  • Accepts SAM/BAM sources of any sort order
  • Disambiguates an arbitrary number of BAMs, each aligned to a different reference
  • Writes the ambiguous alignments to a separate directory
  • Extensible implementation which supports alternative disambiguation strategies
  • Benchmarks show high accuracy (results are linked from the project repository)

Command Line Usage

❯ neodisambiguate -i infile1.bam infile2.bam -o out/disambiguated

Example Usage

To disambiguate templates for sample dna00001 that are aligned to human (A) and mouse (B):

❯ neodisambiguate -i dna00001.A.bam dna00001.B.bam -o out/dna00001 -n hg38 mm10
❯ tree out/
  out/
  ├── ambiguous-alignments/
  │  ├── dna00001.A.ambiguous.bai
  │  ├── dna00001.A.ambiguous.bam
  │  ├── dna00001.B.ambiguous.bai
  │  └── dna00001.B.ambiguous.bam
  ├── dna00001.hg38.bai
  ├── dna00001.hg38.bam
  ├── dna00001.mm10.bai
  └── dna00001.mm10.bam
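
The layout above suggests a simple naming pattern: assigned reads go to `<prefix>.<name>.bam`, and ambiguous reads go to an `ambiguous-alignments/` directory next to the prefix, named after each input file. The Python sketch below reproduces only that pattern from the example. The function `output_paths` is hypothetical, index files (`.bai`) are omitted, and the tool's real naming logic may differ.

```python
from pathlib import PurePosixPath


def output_paths(inputs, prefix, names):
    """Return (assigned_bams, ambiguous_bams) following the example layout.

    `inputs` are the input BAM paths, `prefix` is the value passed to -o,
    and `names` are the reference names passed to -n. This mirrors the
    example tree only and is an assumption, not the tool's code.
    """
    prefix = PurePosixPath(prefix)
    assigned = [f"{prefix}.{name}.bam" for name in names]
    ambiguous_dir = prefix.parent / "ambiguous-alignments"
    ambiguous = [
        # stem drops only the final ".bam": "dna00001.A.bam" -> "dna00001.A"
        str(ambiguous_dir / f"{PurePosixPath(path).stem}.ambiguous.bam")
        for path in inputs
    ]
    return assigned, ambiguous


assigned, ambiguous = output_paths(
    ["dna00001.A.bam", "dna00001.B.bam"], "out/dna00001", ["hg38", "mm10"]
)
```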

Local Installation

Bootstrap compilation and build the executable with:

./mill neodisambiguate.executable
./bin/neodisambiguate --help

Prior Art

This project was inspired by AstraZeneca's disambiguate: