RAFFT is a folding tool that builds fast-folding paths for a given sequence. Starting from the completely unfolded structure, it quickly identifies stems with an FFT-based technique. Then, it forms the stem that improves the overall stability. Multiple folding paths can be explored and displayed. Therefore, given a sequence, the user will obtain several structures or folding paths.
- Vienna RNA package (version >= 2.4.17)
- Numpy, scipy with python 3
To install RAFFT in your local python library
pip install . --user --upgrade
Then two command line tools will be available: rafft
for the folding and
rafft_kin
for the kinetic analysis.
Another more performant implementation of the core algorithm is available here: rafft-rs
This implementation is able to generate output in a compatible format that can be used for the same kinetic analysis using rafft_kin
.
Please refer to this repository for documentation about differences in usage.
rafft-rs
is also added as a submodule to this repository. Use git submodule init
after cloning to pull its code.
For the examples, we used the Coronavirus frameshifting stimulation element obtained from RFAM.
To display only the final structures:
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 5
To display the visited/saved intermediates:
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 5 --traj
The algorithm has two critical parameters:
-ms <INT>
is the number of saved structures at each folding step (default=1)-n <INT>
is the number of positional lag to search for stems (default=100)
The search of stems can be tuned using weights on base pairs types:
--GC <FLOAT>
GC base pairs weight (default = 3.0)--AU <FLOAT>
AU base pairs weight (default = 2.0)--GU <FLOAT>
GU base pairs weight (default = 1.0)
The input is one sequence in the standard input or a simple text file (it can be in Fasta format).
For the trajectory output format: at each step, numbered from 0 to 3, the structures saved are given below the sequence with their stability (computed with Vienna RNA API).
GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU # ---------0---------- .................................................................................. 0.0 # ---------1---------- .....(((((((((((..........)))))))))))............................................. -14.0 ..................................((((((((................))))))))................ -6.8 ...................(((((.............................................)))))........ -6.4 ..................................((((.......))))................................. -5.5 (((((..............))))).......................................................... -4.6 # ---------2---------- ........((((((((..........))))))))(((((((((((.........))).))))))))................ -23.1 ........((((((((..........))))))))((((((((..((........))..))))))))................ -20.9 ...................(((((..........(((((((((((.........))).))))))))...)))))........ -18.8 ........((((((((..........))))))))((((((((...((....)).....))))))))................ -18.7 .....(((((((((((.((.....)))))))))))))....................((((((.............)))))) -18.2 # ---------3---------- ........((((((((.((.....))))))))))(((((((((((.........))).))))))))................ -24.0 ........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................ -24.0 ........((((((((..........))))))))(((((((((((.........))).))))))))................ -23.1 ........((((((((.((.....))))))))))((((((((..((........))..))))))))................ -21.8 ........((((((((..........))))))))((((((((..((........))..))))))))................ -20.9
The output of the final structures only is the following.
GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU ........((((((((.((.....))))))))))(((((((((((.........))).))))))))................ -24.0 ........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................ -24.0 ........((((((((..........))))))))(((((((((((.........))).))))))))................ -23.1 ........((((((((.((.....))))))))))((((((((..((........))..))))))))................ -21.8 ........((((((((..........))))))))((((((((..((........))..))))))))................ -20.9
To create the fast-folding path figures, one can use the utility
utility/plot_path.py
on rafft output:
It uses VARNA to produce the secondary structure representation, should be download directly from its website
cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 5 --traj > rafft.out
python ../utility/plot_path.py rafft.out -he 500 -wi 900 -rv 1 -o path_5.png
With 20 saved structures:
From the above fast-folding graph, one can produce kinetic trajectories. Starting from the completely unfolded structures, it simulates the folding process.
cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 20 --traj > rafft_20.out
rafft_kin rafft_20.out -o kinetic.png --plot -mt 40
The output has the following form
[...] <structure> <population> <Energy> <Structure ID> .....(((((((((((.((.....)))))))))))))((((((((.........))).)))))................... 0.009 -23.2 44 ........((((((((..........))))))))(((((((((((.........))).))))))))................ 0.011 -23.1 21 ((((((((.......))).)))))(((.....(((((((((((((.........))).))))))))...))......))).. 0.014 -23.6 62 (((((.(((.......))))))))(((.....((((((((((..((........))..))))))))...))......))).. 0.014 -23.6 63 ........((((((((.((.....))))))))))(((((((((((.........))).))))))))................ 0.049 -24.0 42 ........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................ 0.049 -24.0 43 (((((.(((.......))))))))(((((.....(((((((((((.........))).))))))))..)))))......... 0.063 -24.5 41 (((((.(((.......))))))))(((((.....(((((((((((((....)).))).))))))))..)))))......... 0.063 -24.5 61 (((((.(((.......))))))))(((.....(((((((((((((.........))).))))))))....)).....))).. 0.168 -25.1 60 (((((.(((.......))))))))(((.....(((((((((((((.........))).))))))))...))......))).. 0.531 -25.8 59
From the fast-folding graph, one can also draw a landscape using the multidimensional scaling algorithm to map the structures onto a plan. It tries to preserve as much as possible the base pair distance between structures.
cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 20 --traj > rafft_20.out
python ../utility/surface.py rafft_20.out -o landscape.png
(Initial and minimum energy structure are circled in black)
The folding function and the kinetic function can both be called from the rafft package.
from rafft.rafft import fold
from rafft.rafft_kin import kinetics
seq = "GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU"
final_set_struct, trajectory = fold(seq, max_stack=20, traj=True)
traj_k, times, struct_list, equi_pop = kinetics(trajectory, 40, 32)
for st, nrj, prob, sid in equi_pop[::-1][:10]:
print(f"{st} {prob:5.2f}")
The dataset curated we used for the benchmarks is in
benchmarks_results/benchmark_cleaned_all_length.csv
.
The benchmark results files (and associated script to produce them) are given in the following table (for details about those results, see the associated reference):
Method | file | Notes |
---|---|---|
RAFFT | rafft_100n_50ms_best_nrj_scores.csv | -n 100 -ms 50 (best energy) |
rafft_100n_50ms_scores.csv | -n 100 -ms 50 (best score) | |
rafft_200n_200ms_scores.csv | -n 200 -ms 200 (best score) | |
MFE | mfe_scores.csv | bench_mfe.py |
ML | mxfold_scores.csv | bench_mxfold.py |
analysis.org
and utils_analysis.py
contain the pieces of script used to
perform the analysis and the figures.
For the test case, we used the command line given in the Usage section above. Figures were derived from their output.