ONT software misuse config or model

List of ONT data analysis software that relies on flowcell type or basecaller configuration

Many state-of-the-art algorithms for ONT data analysis, covering sequence alignment, error correction, variant calling, haplotype phasing, genome assembly, and genome polishing, rely directly or indirectly on the flowcell type and basecaller configuration used to produce the data.

Evidence of the required information is provided via hyperlinks attached to each software name.

| Software | Application | Flowcell type | Basecaller type | Basecaller version | Basecalling mode [1] | Indirect dependence |
|---|---|---|---|---|---|---|
| Minimap2 | Sequence alignment | | | | | |
| HERRO | Error correction | | | | | |
| Clair3 | Variant calling | | | | HAC/SUP [2] | |
| DeepVariant | Variant calling | | | | | |
| PEPPER | Variant calling | | | | SUP | |
| Dysgu | Variant calling | | | | | |
| nanomonsv | Variant calling | | | | unspecified | |
| Medaka | Variant calling | | | | FAST, HAC, SUP | |
| WhatsHap | Haplotype phasing | | | | | [3] |
| Margin | Haplotype phasing | | | | unspecified | |
| HapCut2 | Haplotype phasing | | | | | [3] |
| LongPhase | Haplotype phasing | | | | | [3] |
| Flye/MetaFlye | Genome assembly | | | | HAC/SUP | |
| Shasta | Genome assembly | | | | HAC for Guppy4, SUP for Guppy6 | |
| Canu | Genome assembly | | | | | |
| wtdbg2 | Genome assembly | | | | | [4] |
| MarginPolish | Genome polishing | | | | | |
| HomoPolish | Genome polishing | | | | | |
| Medaka | Genome polishing | | | | FAST, HAC, SUP | |

[1] In “Basecalling mode”, “/” means the models might be used interchangeably, while “,” means each mode has a distinct associated model. If a mode is not listed, it means the current version of the software does not explicitly support it.

[2] Clair3 uses the same model for the HAC and SUP modes of R9 Guppy 6 data, but uses different models for the HAC and SUP modes of R10 data.

[3] These algorithms depend on the outputs of variant callers, which may be influenced by flowcell type or basecaller configuration.

[4] Wtdbg2 depends on sequencing error rate, which is influenced by flowcell type and basecaller configuration.
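The "/" and "," notation described in note [1] can be made concrete with a small sketch. The helper names `parse_modes` and `share_model` are hypothetical and are not part of any listed tool. The sketch only handles plain cells such as "HAC/SUP" or "FAST, HAC, SUP", not free-text cells such as the Shasta entry.

```python
def parse_modes(cell: str) -> list[list[str]]:
    """Split a 'Basecalling mode' cell into groups of modes.

    Modes joined by '/' land in the same group (models may be used
    interchangeably); modes separated by ',' land in separate groups
    (each mode has its own model). Hypothetical helper for illustration.
    """
    groups = []
    for part in cell.split(","):
        modes = [m.strip().upper() for m in part.split("/") if m.strip()]
        if modes:
            groups.append(modes)
    return groups


def share_model(cell: str, mode_a: str, mode_b: str) -> bool:
    """Return True if both modes appear in the same '/' group of the cell."""
    a, b = mode_a.upper(), mode_b.upper()
    return any(a in group and b in group for group in parse_modes(cell))


print(parse_modes("HAC/SUP"))                     # [['HAC', 'SUP']]
print(parse_modes("FAST, HAC, SUP"))              # [['FAST'], ['HAC'], ['SUP']]
print(share_model("HAC/SUP", "hac", "sup"))       # True
print(share_model("FAST, HAC, SUP", "hac", "sup"))  # False
```

A mode that is absent from the cell never shares a model with anything here, which matches the note that unlisted modes are not explicitly supported.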

ONT software performance with correct and incorrect configurations

Using the widely used HG002 data, we tested three popular ONT data analysis tools that require the flowcell type or basecaller configuration in order to select the best model or set specific parameters.

The exact versions of the tested software are listed below.

| Software | Version | Application | Detailed documentation |
|---|---|---|---|
| Clair3 | 1.0.4 | Variant calling | Clair3 |
| Shasta | 0.11.1 | Genome assembly | Shasta |
| Medaka | 1.11.3 | Genome polishing | Medaka |

Data

HG002 basecalled data

Before evaluating ONT software performance, please download the shared HG002 FASTQ files from ScienceDB and decompress them into the folder basecalled_data.

The data is shared through https://www.scidb.cn/en/detail?dataSetId=b9eca82475a64772a67ec9b7dac2beb3

You can follow the instructions here to download the shared data.
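The decompression step could be sketched as follows. This assumes the downloaded files are gzip-compressed and named `*.fastq.gz`, which this README does not state, so adjust the pattern to match the actual archive format. The function name `decompress_fastq` and the source folder are hypothetical. Only `basecalled_data` comes from the instructions above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fastq(src_dir, dest_dir="basecalled_data"):
    """Decompress every *.fastq.gz file in src_dir into dest_dir.

    Assumption: the shared files are gzip archives. Returns the list of
    decompressed FASTQ paths that were written.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    written = []
    for archive in sorted(src.glob("*.fastq.gz")):
        target = dest / archive.name[: -len(".gz")]  # drop the .gz suffix
        # Stream the data so large FASTQ files are not loaded into memory.
        with gzip.open(archive, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written
```

For example, `decompress_fastq("downloads")` would write the decompressed files to `basecalled_data/`, where `downloads` is a placeholder for wherever the ScienceDB files were saved.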