SPRINT2 is an enhanced version of SPRINT for detecting RNA editing sites from RNA-seq data.
Publication: https://doi.org/10.1093/bioinformatics/btx473
You can install the package via pip on Linux:
pip install sprint2
or install via Git clone:
git clone https://github.com/xieyunxiao/SPRINT2/
Repeat annotation files can be downloaded from: https://github.com/xieyunxiao/SPRINT2/repeat_data/
Double-stranded RNA annotation files can be downloaded from: https://github.com/xieyunxiao/SPRINT2/dsRNA_data/
SPRINT2 requires Unix and Python 3.8.
If you have Anaconda installed, run:
bash require.bash
If you do not have Anaconda installed, install these tools:
SAMTOOLS == 1.2 (using htslib == 1.2.1)
BEDTOOLS == v2.30.0
BWA == 0.7.12
BLAT == v.36x7
Then install the required Python packages via pip:
pip install -r requirements
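Before running SPRINT2, it can help to confirm that the external tools listed above are reachable on your PATH. The following is a minimal sketch, not part of SPRINT2: the function name check_tools is hypothetical, and it only checks that each command exists, not that its version matches the one listed.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise. (check_tools is a hypothetical helper.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools samtools bedtools bwa blat || echo "Install the missing tools before running SPRINT2." >&2
```

If a tool is installed outside PATH, pass its full path to SPRINT2 with the corresponding option instead (for example -b for BWA).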
Step 1. Mask A with G in the reference FASTA file.
usage: MaskAGref [-h] [-r REFERENCE] [-o OUTPUT] [-t GTF] [-b BWA] [-p CPU]
optional arguments:
-h, --help show this help message and exit
-r REFERENCE path to reference FASTA file (default:~/reference.fa)
-o OUTPUT, --output OUTPUT
path to output directory (default:~/)
-t GTF, --gtf GTF path to transcript annotation GTF file #Optional
-b BWA, --bwa BWA path to BWA (default:~/bwa)
-p CPU, --cpu CPU CPU number (default:CPU=1)
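To illustrate what "masking A with G" means at the sequence level, the sketch below rewrites every A in the sequence lines of a FASTA file as G and leaves header lines untouched. It is an illustration only: the function name mask_a_with_g is hypothetical, keeping lowercase bases lowercase is an assumption, and MaskAGref itself may do more than this (its options also take a GTF file and a path to BWA).

```shell
# Replace A with G (and a with g) in FASTA sequence lines; keep '>' headers as-is.
# mask_a_with_g is a hypothetical helper, not the MaskAGref implementation.
mask_a_with_g() {
    awk '/^>/ { print; next }
         { gsub(/A/, "G"); gsub(/a/, "g"); print }' "$@"
}

printf '>chr1\nACGTAAcaT\n' | mask_a_with_g
# prints:
# >chr1
# GCGTGGcgT
```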
Step 2. Get candidate double-stranded RNAs.
optional arguments:
-h, --help show this help message and exit
-o path to output directory (default:~/)
-r path to reference FASTA file (default:~/reference.fa)
-t path to transcript annotation bed file (default:~/transcript.bed)
-b BLAT_FILE, --Blat_file BLAT_FILE
path to BLAT (default:~/blat)
-B path to BEDTOOLS (default:~/bedtools)
-lc LC_PATH, --lc_path LC_PATH
path to Low_complexity.txt (default:~/Low_complexity.txt)
-p CPU, --cpu CPU CPU number (default:CPU=1)
Step 3. Get RNA editing sites. (Note: gzipped input is not supported. Decompress ".fastq.gz" files with "gunzip" first.)
usage: getRES [-h] [-s SNV_PATH] [-o RES_PATH] [-r1 READ1] [-r2 READ2] [-r REFERENCE_FILE] [-rp REPEAT_FILE] [-b BWA] [-B BEDTOOLS] [-S SAMTOOLS] [-ds CANDIDATE_DSRNA] [-p CPU]
Get candidate double-stranded RNA
optional arguments:
-h, --help show this help message and exit
-s SNV_PATH path to SNV directory (default:~/1_SNV_calling/)
-o RES_PATH path to RES directory (default:~/2_RES_calling/)
-r1 READ1, --read1 READ1
path to FASTQ_1 file (default:~/test_1.fastq.fa)
-r2 READ2, --read2 READ2
path to FASTQ_2 file #Optional
-r REFERENCE_FILE path to reference FASTA file (default:~/reference.fa)
-rp REPEAT_FILE, --Repeat_file REPEAT_FILE
path to repeat annotation BED file #Optional
-b BWA, --bwa BWA path to BWA (default:~/bwa)
-B BEDTOOLS path to BEDTOOLS (default:~/bedtools)
-S SAMTOOLS path to SAMTOOLS (default:~/samtools)
-ds CANDIDATE_DSRNA path to candidate dsRNA directory #Optional
-p CPU, --cpu CPU CPU number (default:CPU=1)
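Because getRES does not accept gzipped FASTQ files, the reads have to be decompressed beforehand. The sketch below is one way to do that while keeping the compressed originals. The function name decompress_fastq is hypothetical, and the file names in the usage comment are placeholders.

```shell
# Decompress each given .fastq.gz next to the original, keeping the .gz file.
# Existing outputs are left untouched. (decompress_fastq is a hypothetical helper.)
decompress_fastq() {
    for gz in "$@"; do
        out="${gz%.gz}"
        if [ -e "$out" ]; then
            echo "skip: $out already exists" >&2
        else
            gunzip -c "$gz" > "$out"
        fi
    done
}

# Example: decompress_fastq test_1.fastq.gz test_2.fastq.gz
```

The resulting .fastq files can then be passed to getRES with -r1 and -r2.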
Here is an example of how to use sprint2:
cd test/
MaskAGref -o ./ -r reference.fa -t annotation.gtf -b ./bwa -p 10
getDsRNA -o ./ -r reference.fa -t transcript.bed -b ./blat -B /local/bin/bedtools -lc Low_complexity.txt -p 10
getRES -s ./1_SNV_calling/ -o ./2_RES_calling/ -r1 test_1.fastq -r2 test_2.fastq -r reference.fa -rp repeat.bed -b ./bwa -B /local/bin/bedtools -S ./samtools -ds ./dsRNA_file/ -p 10
Step 1. Install RepeatMasker: download the latest RepeatMasker installation package from the RepeatMasker website (http://www.repeatmasker.org/RMDownload.html) and install it on a Linux system with the following commands:
wget http://www.repeatmasker.org/RepeatMasker/RepeatMasker-4.1.5.tar.gz
cp RepeatMasker-4.1.5.tar.gz /usr/local/
cd /usr/local/
tar zxvf RepeatMasker-4.1.5.tar.gz
cd RepeatMasker
perl ./configure
During configuration, the other software and tools that RepeatMasker depends on must also be downloaded and installed.
Step 2. Download the genome sequence: download the FASTA file of the species' genome sequence from the UCSC Genome Browser, for example:
wget http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.fa.gz
gunzip hg19.fa.gz
Step 3. Download the sequence databases required by RepeatMasker (see the RepeatMasker website for what is needed), for example:
wget https://www.dfam.org/releases/Dfam_3.7/families/Dfam.h5.gz
gunzip Dfam.h5.gz
mv Dfam.h5 /usr/local/RepeatMasker/Libraries
tar zxvf RepeatMaskerGenomeAnnotations_LATEST.tar.gz
Step 4. Run RepeatMasker with the following command:
RepeatMasker -species human -dir output_dir hg19.fa
The -species parameter specifies the sequence database used by RepeatMasker, the -dir parameter specifies the output directory, and hg19.fa is the input sequence file. After the analysis is finished, RepeatMasker will generate multiple result files in the output directory, including the masked sequence file and the statistics file.
Step 5. Get the repeat annotation file and the low-complexity annotation file:
mv output_dir/hg19.fa.out repeat.bed
getLowComplexity.py repeat.bed Low_complexity.txt
Note that the sequence databases downloaded in the above steps are protected by copyright and require a license to use. Also, the above code is for reference only and the actual operation steps may vary depending on the situation.
- reference.fa: Reference genome FASTA file.
- annotation.gtf: Gene annotation GTF file.
- transcript.bed: Transcripts BED file (Optional file).
| Chrom | Start | End | transcript | Strand | Gene Symbol | Transcript ID | Transcript type |
- read1.fastq: FASTQ file of read 1 (required).
- read2.fastq: FASTQ file of read 2. Omit this parameter for single-end sequencing.
- repeat.bed: Repeat annotation file. If this parameter is omitted, repeat-based regular RES are not reported.
- Low_complexity.txt: Low complexity region file.
- dsRNA_file/: Candidate double-stranded RNA. If this parameter is omitted, dsRNA-based regular RES are not reported.
| Chrom1 | Start1 | End1 | Chrom2 | Start2 | End2 |
- bwa: Path to BWA.
- blat: Path to BLAT.
- bedtools: Path to BEDTOOLS.
- samtools: Path to SAMTOOLS.
- reference_mskAG.fa: Reference FASTA file with A masked as G.
- dsRNA_file/: Candidate double-stranded RNA.
- 1_SNV_calling/: SNV files. regular.snv | hyper_mskAG.snv | hyper_mskTC.snv
| Chrom | Start(0-base) | End(1-base) | Type | Supporting_reads |
- 2_RES_calling/: RES files.
| Chrom | Start(0-base) | End(1-base) | Type | Supporting_reads | Depth |
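The RES column layout above lends itself to simple post-filtering. The sketch below keeps sites with a minimum number of supporting reads and appends the ratio Supporting_reads / Depth as a seventh column. It rests on several assumptions: the files are tab-separated, the columns appear in the order listed above, and the ratio is a meaningful editing level for your data. The function name filter_res and the sample rows are made up for illustration.

```shell
# Keep sites with at least MIN supporting reads (column 5) and a non-zero depth
# (column 6), then append column5/column6 as a seventh column.
# Usage: filter_res MIN [FILE...]   (filter_res is a hypothetical helper)
filter_res() {
    min_reads="$1"; shift
    awk -F '\t' -v OFS='\t' -v min="$min_reads" '
        $5 + 0 >= min && $6 + 0 > 0 { print $0, sprintf("%.3f", $5 / $6) }
    ' "$@"
}

# Two made-up rows; only the first has >= 2 supporting reads.
printf 'chr1\t99\t100\tAG\t5\t20\nchr1\t199\t200\tTC\t1\t10\n' | filter_res 2
# prints: chr1	99	100	AG	5	20	0.250
```

With a minimum of 1 or more, lines whose fifth column is not numeric (such as a header line, if one is present) are dropped by the filter.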
