Many state-of-the-art algorithms for ONT data analysis, including sequence alignment, error correction, variant calling, haplotype phasing, genome assembly, and genome polishing, depend directly or indirectly on the specific flowcell type and basecaller configuration.
The evidence for each dependency is provided via the hyperlink attached to each software name.
Software | Application | Flowcell type | Basecaller type | Basecaller version | Basecalling mode [1] | Indirect dependence |
---|---|---|---|---|---|---|
Minimap2 | Sequence alignment | ✅ | ||||
HERRO | Error correction | ✅ | ||||
Clair3 | Variant calling | ✅ | ✅ | ✅ | HAC/SUP [2] | |
DeepVariant | Variant calling | ✅ | ||||
PEPPER | Variant calling | ✅ | ✅ | ✅ | SUP | |
Dysgu | Variant calling | ✅ | ||||
nanomonsv | Variant calling | ✅ | ✅ | ✅ | unspecified | |
Medaka | Variant calling | ✅ | ✅ | ✅ | FAST, HAC, SUP | |
WhatsHap | Haplotype phasing | | | | | ✅ [3] |
Margin | Haplotype phasing | ✅ | ✅ | ✅ | unspecified | |
HapCut2 | Haplotype phasing | | | | | ✅ [3] |
LongPhase | Haplotype phasing | | | | | ✅ [3] |
Flye/MetaFlye | Genome assembly | ✅ | ✅ | ✅ | HAC/SUP | |
Shasta | Genome assembly | ✅ | ✅ | ✅ | HAC for Guppy4, SUP for Guppy6 | |
Canu | Genome assembly | ✅ | ||||
wtdbg2 | Genome assembly | | | | | ✅ [4] |
MarginPolish | Genome polishing | ✅ | ||||
HomoPolish | Genome polishing | ✅ | ||||
Medaka | Genome polishing | ✅ | ✅ | ✅ | FAST, HAC, SUP | |
[1] In “Basecalling mode”, “/” means the models might be used interchangeably, while “,” means each mode has a distinct associated model. If a mode is not listed, it means the current version of the software does not explicitly support it.
[2] Clair3 uses the same model for the HAC and SUP modes of R9 Guppy 6 data, but uses different models for the HAC and SUP modes of R10 data.
[3] These algorithms depend on the outputs of variant callers, which may be influenced by flowcell type or basecaller configuration.
[4] Wtdbg2 depends on sequencing error rate, which is influenced by flowcell type and basecaller configuration.
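The notation in footnote [1] can be read mechanically. The following is a minimal sketch, not part of any of the listed tools: the helper names `parse_basecalling_modes` and `may_share_model` are hypothetical, and the parser only handles simple cells such as `HAC/SUP` or `FAST, HAC, SUP`, not free-text cells such as `unspecified` or `HAC for Guppy4, SUP for Guppy6`.

```python
from typing import FrozenSet, List


def parse_basecalling_modes(cell: str) -> List[FrozenSet[str]]:
    """Split a simple 'Basecalling mode' cell into groups of modes.

    "," separates modes that each have a distinct model;
    "/" joins modes whose models might be used interchangeably.
    """
    groups = []
    for part in cell.split(","):
        modes = frozenset(m.strip().upper() for m in part.split("/") if m.strip())
        if modes:
            groups.append(modes)
    return groups


def may_share_model(cell: str, mode_a: str, mode_b: str) -> bool:
    """Return True if both modes fall in the same "/"-joined group."""
    a, b = mode_a.upper(), mode_b.upper()
    return any(a in g and b in g for g in parse_basecalling_modes(cell))


if __name__ == "__main__":
    print(may_share_model("HAC/SUP", "HAC", "SUP"))
    print(may_share_model("FAST, HAC, SUP", "HAC", "SUP"))
```

As footnote [2] shows for Clair3, a "/" group is only a hint: whether two modes actually share a model can still depend on the flowcell type.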
Using the widely used HG002 data, we tested three popular ONT data analysis tools that require the flowcell type or basecaller configuration in order to select the best model or choose specific parameters.
The exact versions of the tested software are listed below.
Software | Version | Application | Detailed documentation |
---|---|---|---|
Clair3 | 1.0.4 | variant calling | Clair3 |
Shasta | 0.11.1 | genome assembly | Shasta |
Medaka | 1.11.3 | genome polishing | Medaka |
Before evaluating the performance of the ONT software, please download the shared HG002 FASTQ files from ScienceDB and decompress them into the folder basecalled_data.
The data is shared at https://www.scidb.cn/en/detail?dataSetId=b9eca82475a64772a67ec9b7dac2beb3
You can follow the instructions here to download the shared data.
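The decompression step could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the downloaded files are gzip-compressed (`*.gz`), which this document does not confirm, and the function name `decompress_fastq` and the `downloads` directory are hypothetical. Adjust it to the actual archive format of the shared data.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fastq(download_dir, out_dir="basecalled_data"):
    """Decompress every *.gz file in download_dir into out_dir.

    Assumes gzip compression; returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(download_dir).glob("*.gz")):
        dest = out / src.name[:-3]  # drop the trailing ".gz"
        # Stream the data so large FASTQ files are not loaded into memory.
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dest)
    return written


if __name__ == "__main__":
    # "downloads" is a placeholder for wherever the ScienceDB files were saved.
    for path in decompress_fastq("downloads"):
        print(path)
```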