This repository contains the official implementation of Deepred-Mt, along with instructions for reproducing results presented in "Deepred-Mt: Deep Representation Learning for Predicting C-to-U RNA Editing in Plant Mitochondria", by A. A. Edera, I. Small, D. H. Milone, and M. V. Sanchez-Puerta. Download PDF.
In land plants, the editosome is a highly sophisticated molecular machine that post-transcriptionally converts cytidines into uridines (C-to-U) at highly specific RNA positions called editing sites. This RNA editing seems to be partially governed by cis elements, which remain recalcitrant to characterization.
Deepred-Mt is a novel neural network able to predict C-to-U editing sites in angiosperm mitochondria. Given an RNA sequence, consisting of a central cytidine flanked by 20 nucleotides on each side, Deepred-Mt scores how probable its editing is.
The score is computed from complex cis elements or motifs automatically extracted from the flanking bases by a multi-layer convolutional neural network, whose full architecture is schematically shown below.
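To make the idea of a convolutional filter acting as a motif detector concrete, here is a minimal sketch in plain Python. It is not the Deepred-Mt architecture: the alphabet ordering, the `one_hot` and `conv1d` helpers, and the hand-written "UCG" filter are all assumptions made for illustration, whereas the real network learns its filters from data.

```python
ALPHABET = "ACGU"  # assumed channel order; the real tool's encoding may differ


def one_hot(window):
    """Encode each nucleotide as a 4-element indicator vector (zeros if unknown)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in window.upper().replace("T", "U")]


def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one activation per offset."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for k in range(width):
            for channel in range(len(ALPHABET)):
                total += encoded[start + k][channel] * kernel[k][channel]
        activations.append(total)
    return activations


# Hypothetical filter that responds maximally to the motif "UCG".
motif_kernel = one_hot("UCG")

# 41-nt window: a central C (index 20) flanked by 20 nucleotides on each side.
window = "A" * 19 + "U" + "C" + "G" + "A" * 19
encoded = one_hot(window)
activations = conv1d(encoded, motif_kernel)
best_offset = max(range(len(activations)), key=activations.__getitem__)
```

Here the filter peaks at the offset where "UCG" overlaps the central cytidine. A trained network would stack several such layers and combine many filters before producing a score.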
To submit RNA/DNA sequences for predicting their C-to-U editing sites with Deepred-Mt, use the following link:
Note 1: To be able to submit, you must be logged in with a Google Account (e.g., Gmail).
Note 2: If difficulties are experienced when submitting sequences, try to use Google Chrome as the web browser.
If you encounter problems when submitting sequences, please report an issue.
To install Deepred-Mt on your computer, the following dependencies must be installed:
First, create and activate a new Conda environment:
conda create -n deepredmt python=3.7
conda activate deepredmt
Next, install Deepred-Mt from the sources:
pip install -U "deepredmt @ git+https://github.com/aedera/deepredmt.git"
Once installed, Deepred-Mt can be executed on the command line to predict C-to-U editing sites from a desired FASTA file. Here is an example FASTA file called seqs.fas:
deepredmt seqs.fas
This command extracts cytidines from the FASTA file to make predictions based on their surrounding nucleotides.
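The extraction step described above can be sketched as follows. This is an illustrative approximation, not the tool's internal code: `read_fasta`, `cytidine_windows`, and the choice to pad sequence ends with "N" are assumptions of this sketch, and Deepred-Mt may handle sequence boundaries differently.

```python
FLANK = 20  # nucleotides on each side of the central cytidine


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def cytidine_windows(sequence, flank=FLANK, pad="N"):
    """Yield (position, window) for every C in the sequence.

    Padding near the ends is an assumption of this sketch,
    not documented Deepred-Mt behavior.
    """
    padded = pad * flank + sequence + pad * flank
    for pos, base in enumerate(sequence):
        if base == "C":
            yield pos, padded[pos:pos + 2 * flank + 1]


# Toy input standing in for the lines of a FASTA file.
example = [">toy", "ATGCATTC", "GA"]
records = list(read_fasta(example))
windows = list(cytidine_windows(records[0][1]))
```

Each window is 41 nucleotides long with the candidate cytidine at index 20, which matches the input shape described for the model.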
The following notebooks reproduce experiments in the article.
The experiments reported in the manuscript used three datasets built from these FASTA files, extracted from nucleotide sequences encoding mitochondrial proteins from 21 plant species. In these files, 'E' nucleotides indicate C-to-U editing sites identified using published RNA-seq data obtained from the European Nucleotide Archive.
Dataset | Description |
---|---|
Training data | 41-bp nucleotide windows whose center positions are either unedited (C) or edited (E) cytidines. Nucleotide windows are labeled according to both the nucleotide in their central positions (0/C, 1/E) and their corresponding editing extents (a value ranging from 0 to 1) |
Task-related sequences | Sequences used for the augmentation strategy proposed in the article. These sequences are 41-bp nucleotide windows whose center positions are thymidines homologous to one of the editing sites in the training data |
Control data | Control data containing the fake editing signal "GGCG" within the downstream regions of nucleotide windows labeled as 1 (edited) |
More information on the data format is provided here.
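As a rough illustration of the conventions in the table above, the sketch below derives a binary label from the central nucleotide of a window and plants the fake "GGCG" signal downstream of the center. The helper names, the overwrite strategy, and the `offset` value are hypothetical; the actual position and construction of the control signal are defined by the article and the data files, not by this example.

```python
CENTER = 20  # zero-based index of the central position in a 41-nt window


def label_window(window):
    """Return 1 for an edited center ('E') and 0 for an unedited center ('C')."""
    center = window[CENTER]
    if center not in "CE":
        raise ValueError(f"unexpected central nucleotide: {center!r}")
    return 1 if center == "E" else 0


def plant_signal(window, signal="GGCG", offset=5):
    """Overwrite part of the downstream region with a fake signal.

    `offset` (distance from the center) is a hypothetical choice.
    """
    start = CENTER + offset
    return window[:start] + signal + window[start + len(signal):]


unedited = "A" * 20 + "C" + "T" * 20
edited = "A" * 20 + "E" + "T" * 20

# Only windows labeled 1 (edited) receive the fake signal.
control = plant_signal(edited) if label_window(edited) == 1 else edited
```

A model that scores such control windows highly because of "GGCG" has learned the planted signal, which is what makes this kind of data useful as a sanity check.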
In our experiments, Deepred-Mt was compared to two state-of-the-art methods for predicting editing sites: PREP-Mt and PREPACT. The following figure shows precision-recall curves obtained from the predictions of each method. Deepred-Mt achieves the highest F1 scores and the largest areas under the precision-recall curve (AUPRC) in two predictive scenarios: one excluding synonymous sites (dashed lines) and the other including them (solid lines).
Method | AUPRC (synonymous sites excluded) | F1 (synonymous sites excluded) | AUPRC (synonymous sites included) | F1 (synonymous sites included) |
---|---|---|---|---|
PREPACT | 0.91 | 0.89 | 0.79 | 0.82 |
PREP-Mt | 0.88 | 0.91 | 0.76 | 0.84 |
Deepred-Mt | 0.96 | 0.92 | 0.91 | 0.86 |
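For readers unfamiliar with the metrics in this table, the following sketch shows one common way to compute precision, recall, F1, and an AUPRC estimate from labels and scores. It uses the step-wise average-precision estimator and an arbitrary 0.5 threshold; both are assumptions of this example, and the article may have used a different estimator or threshold. The labels and scores are toy values, not results from the paper.

```python
def precision_recall_f1(labels, scores, threshold=0.5):
    """Compute precision, recall, and F1 for predictions at a score threshold."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    if positives == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / positives


# Toy data: 1 = edited site, 0 = unedited site.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
precision, recall, f1 = precision_recall_f1(labels, scores)
auprc = average_precision(labels, scores)
```

F1 depends on the chosen threshold, while AUPRC summarizes performance across all thresholds, which is why the table reports both.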
Contributions from anyone are welcome. You can start by adding a new entry here.
Deepred-Mt is licensed under the MIT license. See LICENSE for more details.