Ilia Abolhasani edited this page May 31, 2023 · 3 revisions

AmiR-P3 Wiki

Welcome to the wiki for AmiR-P3! This wiki provides documentation and guides for using the AmiR-P3 repository. Below is an outline of the available wiki pages:

  • Overview: Learn about the purpose and goals of AmiR-P3.
  • Installation: Instructions on how to install and set up AmiR-P3.
  • Documentation: Detailed documentation on the integrated components and tools.
  • Contributing: Information on how to contribute to the AmiR-P3 project.

Pages

Overview

AmiR-P3 is an advanced ab initio plant miRNA prediction pipeline written in Python 3.*. It is designed to address the challenges associated with miRNA prediction in plants by leveraging various computational techniques and integrating several tools.

The pipeline lets users adjust prediction criteria based on current biological knowledge of plant miRNA properties. It starts by finding potential homologs of known plant miRNAs in the input sequence(s) and ensures they do not overlap with protein-coding regions. It then computes minimum free energy structures of the candidate RNA sequences using tools such as CONTRAfold, Mfold, UNAFold, ViennaRNA, and MXfold2 (if installed). A pre-trained deep learning classification model predicts potential miRNAs from the computed structures. Finally, a set of criteria is applied to select the most likely miRNAs from the predicted set.

AmiR-P3 offers comprehensive sequence analysis, compatibility with ViennaRNA, and optional coding-sequence filtration using Diamond or Blastx. It is designed to predict miRNAs across a range of plant species, which makes it suitable for both conserved and novel miRNA discovery.

To get started with AmiR-P3, refer to the Installation and Documentation sections for instructions on installation, setup, and running the pipeline.

Installation

To use AmiR-P3, it is recommended to use the provided Docker image. The Docker image contains all the necessary software, making the installation process straightforward. Follow these steps to install and run AmiR-P3 using Docker:

  1. Install Docker on your system by following the official Docker installation guide for your operating system.

  2. Pull the AmiR-P3 Docker image from Docker Hub by running the following command:

       docker pull micrornaproject/amir-p3
  3. Once the image is downloaded, you can run the AmiR-P3 pipeline using the following command:

       docker run -v /path/to/input:/data -v /path/to/output:/output micrornaproject/amir-p3 python3 amiR-P3.py -i /data/input_sequence.fasta -o /output/predicted_miRNAs.fasta

    Replace /path/to/input with the directory that contains your input sequence file, and /path/to/output with the directory where the results should be written. Both directories are mounted into the container as /data and /output, respectively. The file input_sequence.fasta should contain the genomic sequence(s) from which you want to predict miRNAs.

    Alternatively, you can start the AmiR-P3 Docker container in interactive mode, which gives you a shell inside the container where you can run the pipeline manually:

    sudo docker run -it micrornaproject/amir-p3:latest /bin/bash   
    # Run the AmiR-P3 pipeline inside the Docker container
    python3 amiR-P3.py --input ./data/example/example_genome.fasta --experiment test
  4. AmiR-P3 will generate the predicted miRNAs in the specified output directory.
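
If you launch the container from a script, it can help to build the command programmatically. The following is a minimal sketch, not part of AmiR-P3 itself: the function name build_docker_command is hypothetical, the image name is the one pulled in step 2, and the -i and -o flags are copied from the command in step 3.

```python
import shlex
from pathlib import Path


def build_docker_command(input_dir, output_dir, fasta_name, result_name,
                         image="micrornaproject/amir-p3"):
    """Assemble the `docker run` argument list from step 3 (hypothetical helper)."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    # Docker bind mounts given with -v need absolute host paths.
    for directory in (input_dir, output_dir):
        if not directory.is_absolute():
            raise ValueError(f"{directory} must be an absolute path")
    return [
        "docker", "run",
        "-v", f"{input_dir}:/data",      # host input directory -> /data
        "-v", f"{output_dir}:/output",   # host output directory -> /output
        image,
        "python3", "amiR-P3.py",
        "-i", f"/data/{fasta_name}",
        "-o", f"/output/{result_name}",
    ]


if __name__ == "__main__":
    cmd = build_docker_command("/path/to/input", "/path/to/output",
                               "input_sequence.fasta", "predicted_miRNAs.fasta")
    # Print the command for inspection; to launch the container instead,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.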

Note: The Docker image contains all the necessary software except mxfold2, which must be installed manually if you select it as the second structure prediction tool. To install mxfold2, follow these steps:

  1. Download mxfold2:
    wget https://github.com/keio-bioinformatics/mxfold2/releases/download/v0.1.1/mxfold2-0.1.1.tar.gz
  2. Install mxfold2:
    pip3 install mxfold2-0.1.1.tar.gz
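
After installing, you may want to confirm that the tool is reachable before selecting it. The check below is a small sketch using only the Python standard library. It assumes the pip package installs a command named mxfold2 on the PATH; the helper name tool_available is hypothetical.

```python
import shutil


def tool_available(executable):
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    # Assumption: the installed command is called "mxfold2".
    if tool_available("mxfold2"):
        print("mxfold2 found on PATH")
    else:
        print("mxfold2 not found; install it before selecting it in AmiR-P3")
```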
    

NR Database Download

To use Diamond or Blastx for filtering out protein-coding sequences, you need the NCBI NR (non-redundant) protein database. Follow these steps to download it:

  1. Visit the following link: NR Database Download

  2. Download the nr.gz file to your local machine.

  3. Extract the contents of the nr.gz file. Any tool that handles gzip archives will work, for example gunzip nr.gz.

After extracting the database, you will need to provide the path to the NR database file when running the AmiR-P3 pipeline using the --nr command-line argument.

Please note that the NR database is a large file, and the download process may take some time depending on your internet connection speed.
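
If you prefer to extract the archive from Python, for instance as part of a setup script, the standard library is enough. This is a minimal sketch: the function name extract_gzip and the chunk size are illustrative choices, not part of AmiR-P3.

```python
import gzip
import shutil
from pathlib import Path


def extract_gzip(archive_path, chunk_size=1024 * 1024):
    """Decompress a .gz file (e.g. nr.gz) next to itself and return the new path."""
    archive_path = Path(archive_path)
    if archive_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {archive_path.name}")
    target = archive_path.with_suffix("")  # nr.gz -> nr
    # Copy in chunks so the large database is never held in memory at once.
    with gzip.open(archive_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return target


if __name__ == "__main__":
    extracted = extract_gzip("nr.gz")
    print(f"Extracted database: {extracted}")
```

The returned path is the file you would then pass to the pipeline with the --nr argument.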

Documentation

The AmiR-P3 pipeline is designed to predict putative miRNAs from genomic or transcriptomic sequences in plants. It leverages various computational techniques and integrates several tools to provide accurate and reliable predictions. This section provides detailed documentation and usage instructions for AmiR-P3.

AmiR-P3 follows a systematic workflow to predict miRNAs, consisting of the following steps:

  1. Homology search: AmiR-P3 searches for potential homologs of known plant miRNAs in the input sequence(s) while ensuring they do not overlap with protein-coding regions.

  2. Secondary structure prediction: The pipeline computes the minimum free energy structure of the presumed RNA sequence using one of the available secondary structure prediction methods: Mfold, UNAFold, ViennaRNA, CONTRAfold, or MXfold2 (if installed).

  3. Deep learning-based classification: AmiR-P3 employs a pre-trained deep learning classification model to predict potential miRNAs based on the computed RNA structures.

  4. Selection criteria: A set of criteria is applied to select the most likely miRNAs from the predicted set, taking into account parameters such as seed position, Jaccard similarity thresholds, and protein coding elimination.
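
For readers unfamiliar with the Jaccard similarity mentioned in step 4, the sketch below shows the general measure: the size of the intersection of two sets divided by the size of their union. It is an illustration only. Which items AmiR-P3 actually compares is not specified here, so the k-mer helper and the threshold value are assumptions made for the example.

```python
def jaccard_similarity(a, b):
    """Return |A & B| / |A | B| for two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


def kmers(sequence, k=3):
    """Return the set of overlapping k-mers of a sequence (illustrative helper)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


if __name__ == "__main__":
    THRESHOLD = 0.5  # hypothetical cut-off, not an AmiR-P3 default
    candidate = "UGACAGAAGAGAGUGAGCAC"
    reference = "UGACAGAAGAUAGAGAGCAC"
    score = jaccard_similarity(kmers(candidate), kmers(reference))
    print(f"Jaccard = {score:.2f}, passes threshold = {score >= THRESHOLD}")
```

A score of 1.0 means the two sets are identical and 0.0 means they share nothing, so a threshold between those values controls how similar two candidates must be.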

Contributing

We welcome contributions to the AmiR-P3 project! If you would like to contribute, please follow the guidelines below:

Bug Reports and Feature Requests

If you encounter any bugs or have suggestions for new features, please open an issue on the issue tracker. When submitting a bug report or feature request, please provide as much detail as possible to help us understand and reproduce the issue.

Pull Requests

We encourage you to submit pull requests with bug fixes, improvements, or new features. To contribute code, follow these steps:

  1. Fork the repository and create a new branch for your contribution.

  2. Make your changes in the new branch. Ensure that your code adheres to the project's coding style and guidelines.

  3. Test your changes to ensure they work as intended.

  4. Commit your changes and push them to your forked repository.

  5. Submit a pull request to the main repository. Provide a clear and descriptive title for your pull request, along with a detailed description of the changes you made.

  6. Your pull request will be reviewed by the project maintainers. They may provide feedback or request further changes before merging your contribution.

Code Style

Please follow the existing code style and conventions used in the project. Ensure that your code is properly formatted, documented, and free of linting errors.

Documentation

Improvements to the documentation, including the wiki pages, are also highly appreciated. If you find any gaps or errors in the documentation, please open an issue or submit a pull request to help us improve it.

Non-Commercial Usage

Please note that the AmiR-P3 repository is free for non-commercial usage only. If you intend to use the code or the pipeline for commercial purposes, please contact the project maintainers to discuss licensing options.

We appreciate your understanding and collaboration in contributing to the AmiR-P3 project!