nanodisco is a toolbox for de novo discovery of all three types (6mA, 5mC and 4mC) of DNA methylation from individual bacteria and microbiomes using nanopore sequencing. For microbiomes, nanodisco also supports the use of DNA methylation patterns as natural epigenetic barcodes to facilitate high-resolution metagenomic binning. Specifically, nanodisco can be used to:
- Discover DNA methylation motifs de novo, identify the specific type (6mA, 5mC or 4mC, namely typing) of a methylation motif, and identify which specific position within the motif is methylated (namely fine mapping).
- Perform metagenomic binning based on microbial DNA methylation patterns by constructing and clustering a methylation profile matrix.
- Combine the two functionalities above for de novo methylation motif discovery and metagenomic analysis of microbiomes.
We are actively developing nanodisco to facilitate usage and broaden features. All feedback is more than welcome. You can reach us on Twitter (@iamfanggang and @AlanTourancheau) or directly through the GitHub issues system.
02/13/21: Updated to v1.0.2, including a new nanodisco score command and the --split_fasta option to generate binned fasta files.
nanodisco is distributed as a fully functional image, bypassing the need to install any dependencies other than the virtualization software. We currently recommend using Singularity (v3.2.1 and above), which can be installed on Linux systems and is often the solution preferred by HPC administrators (Quick Start). nanodisco was tested extensively with Singularity v3.2.1 and v3.5.2.
singularity pull --name nanodisco.sif library://fanglab/default/nanodisco # Download the image from cloud.sylabs.io
singularity build nd_env nanodisco.sif # Create a container named nd_env
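Since Singularity v3.2.1 or later is recommended, a quick version check before pulling the image can avoid a failed build. The following is a minimal bash sketch, assuming GNU sort (for -V) and that singularity --version prints a string containing an X.Y.Z version such as "singularity version 3.5.2". The helper names version_ge and check_singularity are hypothetical and are not part of nanodisco or Singularity.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of nanodisco or Singularity) to check the
# recommended minimum Singularity version (v3.2.1) before pulling the image.

# version_ge A B: succeed if version A >= version B, using GNU `sort -V`.
version_ge() {
  # The smaller of the two versions sorts first; if that is B, then A >= B.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_singularity [MIN]: report the installed version, fail if older than MIN.
check_singularity() {
  local min="${1:-3.2.1}" found
  if ! command -v singularity >/dev/null 2>&1; then
    echo "singularity not found in PATH" >&2
    return 1
  fi
  # Assumption: the version string contains an X.Y.Z token, e.g.
  # "singularity version 3.5.2"; keep the first such token.
  found="$(singularity --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
  if [ -z "$found" ]; then
    echo "could not parse the singularity version" >&2
    return 1
  fi
  if version_ge "$found" "$min"; then
    echo "singularity $found (>= $min)"
  else
    echo "singularity $found is older than the recommended $min" >&2
    return 1
  fi
}
```

For example, check_singularity && singularity pull --name nanodisco.sif library://fanglab/default/nanodisco only pulls the image when the check passes. Comparing with sort -V rather than plain string comparison keeps 3.10.0 correctly ordered after 3.2.1.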
To showcase the toolbox applications and facilitate an understanding of the methods, we provide examples for the analysis of two datasets presented in our preprint. These datasets can be downloaded with the following commands from within a nanodisco container: get_data_bacteria and get_data_microbiome.
singularity build --sandbox nd_example nanodisco.sif # Create a writable container (directory) named nd_example
singularity run --no-home -w nd_example # Start an interactive shell to use nanodisco, type `exit` to leave
The image retrieved from the Sylabs Cloud Library with singularity pull (nanodisco.sif) is already built and can be reused at will. Containers built with these instructions are writable, meaning that results from nanodisco analyses can be retrieved when the container is not running. Outputs for the following commands can be found at ./path/to/nd_example/home/nanodisco/analysis.
Goal: Identify the specific type (6mA, 5mC or 4mC, namely typing) of a methylation motif, and identify which specific position within the motif is methylated (namely fine mapping). The detailed method is described in the preprint.
Inputs:
- Current differences file (pre-computed in the following example; can be generated with nanodisco difference)
- Reference genome file (.fasta)
- Methylation motifs for which one wants to perform typing and fine mapping
Outputs: For each queried methylation motif, nanodisco identifies the methylation type and the methylated position, summarized in a heatmap (analysis/Ecoli_motifs/Motifs_classification_Ecoli_nn_model.pdf). See Figure 4d in the preprint as an example. In addition, the predicted methylation type and methylated position for each motif are compiled in a text file (analysis/Ecoli_motifs/Motifs_classification_Ecoli_nn_model.tsv). For the four E. coli motifs in the example below, the heatmap reads as follows:
1. AACNNNNNNGTGC: highest value (85) is on the 6mA row with offset +1 (relative to the first base), meaning that the second base (A) is 6mA
2. CCWGG: highest value (95) is on the 5mC row with offset +1 (relative to the first base), meaning that the second base (C) is 5mC
3. GATC: highest value (91) is on the 6mA row with offset +1 (relative to the first base), meaning that the second base (A) is 6mA
4. GCACNNNNNNGTT: highest value (84) is on the 6mA row with offset +2 (relative to the first base), meaning that the third base (A) is 6mA
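In each case above, the offset works as a 0-based index into the motif string, so the methylated base is the character at that offset. A minimal bash sketch of this reading, assuming the offset falls inside the motif; methylated_base and describe_call are hypothetical helpers, not nanodisco commands.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of nanodisco) for reading a fine-mapping
# call: the heatmap offset is treated as a 0-based index into the motif.

# methylated_base MOTIF OFFSET: print the base at OFFSET in MOTIF.
methylated_base() {
  local motif="$1" offset="$2"
  # Assumption: only offsets that fall inside the motif are handled here.
  if ! [[ "$offset" =~ ^[0-9]+$ ]] || (( offset >= ${#motif} )); then
    echo "offset $offset is outside motif $motif" >&2
    return 1
  fi
  printf '%s\n' "${motif:$offset:1}"
}

# describe_call MOTIF OFFSET TYPE: one-line summary using a 1-based position.
describe_call() {
  local base
  base="$(methylated_base "$1" "$2")" || return 1
  printf '%s: base %d (%s) is %s\n' "$1" "$(( $2 + 1 ))" "$base" "$3"
}
```

For instance, describe_call GCACNNNNNNGTT 2 6mA prints "GCACNNNNNNGTT: base 3 (A) is 6mA", matching the fourth item of the list.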
Example commands:
get_data_bacteria # Retrieve E. coli current differences and reference genome
nanodisco characterize -p 4 -b Ecoli -d dataset/EC_difference.RDS -o analysis/Ecoli_motifs -m GATC,CCWGG,GCACNNNNNNGTT,AACNNNNNNGTGC -t nn -r reference/Ecoli_K12_MG1655_ATCC47076.fasta
In this example, the current differences file (EC_difference.RDS) was generated from a whole E. coli nanopore sequencing dataset from the preprint, using nanodisco difference. Runtime is ~1 min with 4 threads (~6.5 GB of memory used).
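The -m option of nanodisco characterize takes the motifs as a single comma-separated string. When the motif list grows, it can be assembled from separate words and checked for typos first. This is a bash sketch under stated assumptions: join_motifs is a hypothetical helper, and the check only accepts uppercase IUPAC DNA codes, which may be stricter or looser than what nanodisco itself accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nanodisco): validate motifs against
# uppercase IUPAC DNA codes and join them with commas for `-m`.

# join_motifs MOTIF...: print the motifs as one comma-separated string.
join_motifs() {
  local m
  if (( $# == 0 )); then
    echo "no motifs given" >&2
    return 1
  fi
  for m in "$@"; do
    # Assumption: motifs use only the IUPAC nucleotide codes below.
    if ! [[ "$m" =~ ^[ACGTRYSWKMBDHVN]+$ ]]; then
      echo "invalid motif: $m" >&2
      return 1
    fi
  done
  # "$*" joins the arguments with the first character of IFS.
  local IFS=,
  printf '%s\n' "$*"
}
```

The example command could then pass -m "$(join_motifs GATC CCWGG GCACNNNNNNGTT AACNNNNNNGTGC)", which expands to the same GATC,CCWGG,GCACNNNNNNGTT,AACNNNNNNGTGC string.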
Goal: Construct methylation profiles for metagenomic contigs, identify informative features, and perform methylation binning for high-resolution metagenomic analysis.
Inputs:
- Current differences file (pre-computed in the following example)
- Metagenomic de novo assembly (.fasta)
- Metagenomic contig coverage files
- De novo discovered methylation motifs
- (Optional) Annotations for metagenomic contigs (e.g. species of origin) and a list of contigs from Mobile Genetic Elements (MGEs)
Outputs: t-SNE scatter plots demonstrating the species-level clustering of metagenomic contigs (analysis/binning/Contigs_methylation_tsne_MGM1_motif.pdf), as presented in Figure 5a of the preprint.
Example commands:
get_data_microbiome # Retrieve current differences, de novo metagenome assembly, etc
nanodisco profile -p 4 -r reference/metagenome.fasta -d dataset/metagenome_subset_difference.RDS -w dataset/metagenome_WGA.cov -n dataset/metagenome_NAT.cov -b MGM1_motif -o analysis/binning --motifs_file dataset/list_de_novo_discovered_motifs.txt
nanodisco binning -r reference/metagenome.fasta -s dataset/methylation_profile_MGM1_motif.RDS -b MGM1_motif -o analysis/binning
nanodisco plot_binning -r reference/metagenome.fasta -u analysis/binning/methylation_binning_MGM1_motif.RDS -b MGM1_motif -o analysis/binning -a reference/motif_binning_annotation.RDS --MGEs_file dataset/list_MGE_contigs.txt
In this example, the current differences file (metagenome_subset_difference.RDS) was generated from a mouse gut microbiome nanopore sequencing dataset (MGM1 in the preprint) using nanodisco difference. This example corresponds to the procedure referred to as guided methylation binning, in which methylation motifs were already discovered de novo. Runtime is ~10 min with 4 threads (~4 GB of memory used).
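The three guided-binning commands share the same reference, base name (-b) and output directory, so they can be chained in one bash function that defines these values once and stops at the first failing step. This is a sketch, not a nanodisco command: guided_binning, run and the DRY_RUN switch are hypothetical names, and the paths are exactly those of the example, including the binning step reading dataset/methylation_profile_MGM1_motif.RDS.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not a nanodisco command) chaining the three
# guided-binning steps of the example; paths are those used above.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

guided_binning() {
  local ref=reference/metagenome.fasta out=analysis/binning base=MGM1_motif
  # 1. Build contig methylation profiles for the de novo discovered motifs.
  run nanodisco profile -p 4 -r "$ref" -d dataset/metagenome_subset_difference.RDS \
    -w dataset/metagenome_WGA.cov -n dataset/metagenome_NAT.cov \
    -b "$base" -o "$out" --motifs_file dataset/list_de_novo_discovered_motifs.txt || return 1
  # 2. Methylation binning, reading the profile file used in the example.
  run nanodisco binning -r "$ref" -s "dataset/methylation_profile_${base}.RDS" \
    -b "$base" -o "$out" || return 1
  # 3. t-SNE plot of the binning result with contig annotations and MGEs.
  run nanodisco plot_binning -r "$ref" -u "$out/methylation_binning_${base}.RDS" \
    -b "$base" -o "$out" -a reference/motif_binning_annotation.RDS \
    --MGEs_file dataset/list_MGE_contigs.txt
}
```

Running DRY_RUN=1 guided_binning prints the three commands without executing them, which is a cheap way to review paths before a ~10 min run; guided_binning alone runs them in order.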
For a comprehensive description of nanodisco, including an installation guide and a detailed tutorial, please consult the complete documentation.
Tourancheau, A., Mead, E.A., Zhang, XS. & Fang, G. Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat Methods (2021). doi:10.1038/s41592-021-01109-3