Patch for the AbinitioRelax application in the Rosetta macromolecular modelling suite. Implements bilevel optimisation and Iterated Local Search (ILS) for improved conformational sampling.
- Rosetta v3.4 source code (other versions will not work). This requires a license which is freely available to non-commercial users.
- C++ compiler (tested with GCC 4.8.2 on Linux and llvm-gcc-4.2.1 on OSX)
- Python 2 (for running SCons; tested with v2.7.5 on Linux and v2.7.12 on OSX)
- patch (tested with GNU patch v2.7.1 on Linux and patch v2.5.8 on OSX)
- Linux (preferred) or Mac OSX (see Note for Mac users below)
This patch modifies the source code for the AbinitioRelax application, as well as some core Rosetta classes. I have not tested and cannot guarantee functionality of other applications in the Rosetta suite after the patch has been applied. I recommend that you keep the patched source tree separate from any other installations of Rosetta that you might have.
The rosetta_database/ directory is not modified in any way, so you can keep a single copy of it.
- Obtain Rosetta source distribution and database (version 3.4)
  You need to obtain a Rosetta licence from https://www.rosettacommons.org/software and follow the instructions to download Rosetta. Make sure to download version 3.4, as other versions will not work. You will need at least the two downloads marked "required" (source code and database).
- Apply the patch.
  The script patch_rosetta3.4.sh handles the patch process. Supply it with the full path to the rosetta_source directory you downloaded in step 1.
      $ cd /path/to/Bilevel-ILS-Rosetta
      $ ./patch_rosetta3.4.sh /path/to/rosetta_source
  The script will do some simple checks to ensure that the patch can be applied.
- Compile patched version of Rosetta.
  Once the patch has succeeded, you will need to compile the patched Rosetta source tree. Rosetta comes with a version of SCons, a Python-based build system.
      $ cd /path/to/rosetta_source
      $ ./external/scons-local/scons.py bin mode=release
  Optionally, you can specify the build target as bin/AbinitioRelax to compile only what is necessary for that application. The compilation process can be quite slow, but on multi-core systems you can supply e.g. -j24 to get SCons to run multiple build jobs in parallel (24 in this example). Successful compilation is usually indicated by the message "scons: done building targets."
- Test
  Sample inputs and outputs are available in test/. These were generated on a Linux system. Modify test/inputs/options.txt to correctly indicate the paths to your patched rosetta_source and rosetta_database directories. Then run the patched executable:
      $ cd /path/to/Bilevel-ILS-Rosetta/test/inputs
      $ /path/to/rosetta_source/bin/AbinitioRelax.default.linuxgccrelease @options.txt
  Note that the extension applied to the executable will depend on the system and compiler used. For example, on a Mac using clang it would be AbinitioRelax.default.macosclangrelease. As far as I'm aware, there is no difference between the .default and non-.default versions of the executable.
- The example directory
  This directory contains an example options file and a helper script that generates it. The settings in this options file are typical of a "production" run, and these settings were used for all results in our paper. The key difference between these settings and those in test/inputs/options.txt is the value of the increase_cycles parameter.
- The tools directory
  This directory contains the script used to make the patch files, and a list of all files that were added or modified relative to standard Rosetta. You can use the script if you want to make your own version of our protocols. You may need to modify the paths defined in it.
By default, SCons will try to use whatever gcc refers to in your shell. On standard configurations of MacOS, for example, this is usually Apple's LLVM compiler. To change the default, edit the file /path/to/rosetta_source/tools/build/basic.options.
Another common error when running SCons is e.g.
### KeyError: "Unknown version number 4.8 for compiler 'gcc'"
In this case, you will need to let SCons know about this version of GCC. Edit the file /path/to/rosetta_source/tools/build/options.settings and add the correct version (in this example, "4.8") to the list of GCC versions provided.
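Before editing the build configuration, it can help to check which compiler gcc actually resolves to and what version it reports. The following is a minimal sketch: major_minor is a hypothetical helper introduced here for illustration, and it assumes that the version SCons complains about is the "major.minor" part of the compiler version, as the example error message (4.8 for GCC 4.8.2) suggests.

```shell
#!/bin/sh
# Hypothetical helper: reduce a full version string such as "4.8.2" to the
# "major.minor" form ("4.8") that appears in the KeyError message.
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Report which gcc the shell resolves to, and what it says about itself.
if command -v gcc >/dev/null 2>&1; then
    echo "gcc resolves to: $(command -v gcc)"
    # The first line of --version shows whether this is really GCC or an
    # LLVM-based compiler installed under the name gcc.
    echo "identifies as:   $(gcc --version 2>&1 | head -n 1)"
    echo "major.minor:     $(major_minor "$(gcc -dumpversion)")"
else
    echo "no gcc found on PATH"
fi
```

If the major.minor value printed here is the one named in the KeyError, that is the string to add to the list of GCC versions.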
I have only tested this on OSX versions <= 10.9.5, using rather old compilers (llvm-gcc-4.2). I think this version was current at the time Rosetta 3.4 was released, and I simply kept my system that way. The easiest way to get up and running is to use this (or a similar) compiler as your system default gcc. If you do not want this, it is still possible to use a compiler other than the system default. I was able to use gcc48 from MacPorts, but this requires editing the build configuration files in /path/to/rosetta_source/tools/build. Please raise an issue and let me know if you are trying to do this, as it is a bit fiddly.
I did try to build with Apple's newer LLVM compilers (v6.0), but this did not work. I suspect the fix is a matter of adjusting compiler options in /path/to/rosetta_source/tools/build/basic.settings; you may be able to get ideas from the version of this file in newer releases of Rosetta, which work fine on up-to-date MacOS systems. All my production work was done on Linux with GCC, so this is just something I never spent much time on. Please let me know if you are able to get it to work!
Follow the approach in Step 4 of the Installation instructions. An example "production" version of the options file is given in the example/ directory. To generate sets of 1000 decoys, we ran 100 independent runs of our protocols with these options, the key being -abinitio:increase_cycles 100. This setup uses the same budget of function evaluations as 1000 runs of regular Rosetta AbinitioRelax with -abinitio:increase_cycles 10.
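The budget equivalence above can be checked with a little arithmetic. This sketch assumes that the number of function evaluations in a run scales linearly with the increase_cycles multiplier, so that the total budget is proportional to (number of runs) x (increase_cycles). The variable names are illustrative only, and the decoys-per-run figure assumes every run contributes equally to the 1000-decoy set.

```shell
#!/bin/sh
# Budgets are expressed in units of (number of runs) x (increase_cycles).
bilevel_runs=100
bilevel_cycles=100
standard_runs=1000
standard_cycles=10

bilevel_budget=$((bilevel_runs * bilevel_cycles))
standard_budget=$((standard_runs * standard_cycles))

echo "bilevel-ILS budget:      $bilevel_budget"
echo "standard Rosetta budget: $standard_budget"

# Assuming equal contributions, each of the 100 runs must return this many
# structures to reach a 1000-decoy set.
decoys_per_run=$((1000 / bilevel_runs))
echo "decoys per bilevel-ILS run: $decoys_per_run"
```

Both budgets come out the same, which is the sense in which the two setups are comparable.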
The fragment sets used in our papers can be found here:
In addition to the default energy-based archiving strategy, you can choose to use a couple of different archiving strategies for local minima. Two examples are the Stochastic Ranking-based archiver operating on contact maps (SRCM) and the Elitist Random (ER) archiver. For details on these archivers, consult Chapter 5 of my PhD thesis, which is available here. A paper based on this chapter is in preparation.
Once you have applied the bilevel-ILS patch to Rosetta, navigate to /path/to/rosetta_source/src/protocols/moves/MonteCarlo.cc and edit the (mis-named) function MonteCarlo::create_archives_from_cmdline(), using the examples provided in the comments there (see also Known Issues below). Each archiver is added to a std::vector<MGFUtils::MGFArchive>, and is constructed as
    MGFUtils::MGFArchive(base_size, max_size, archive_type, tag)
where base_size and max_size are ints that specify the base and maximum number of structures in the archive, archive_type is one of a few predefined strings that selects the type of archiver, and tag is a string that identifies the specific archiver and is appended to the filenames of the output structures. tag must be unique across all archivers. Some archiver types can be configured with additional options; examples of these are also provided in the comments. One such option is the 'rho' parameter for archives based on stochastic ranking, which can be set e.g. to 0.5 using
    Archives.back().set_SR_PROB(0.5);
immediately after adding the archive.
You'll notice that a few additional archivers are also implemented; feel free to experiment! Only the energy-based archiver is enabled by default. Once you've made your edits, you will need to recompile the codebase using the instructions above.
If you find this repo useful, please cite our paper in Scientific Reports.
- Table with all newly defined command-line options, what they do and their defaults
- The nstruct parameter must always be set to 1.
  Setting this parameter to anything greater than 1 will still work, but you will overwrite output files from prior iterations of the nstruct loop. To run the application many times and obtain a larger decoy set, I recommend performing each run in a separate working directory.
  The actual number of structures returned by the application is set when configuring the solution archiving strategy (or strategies) employed. This is done in the (admittedly mis-named) function MonteCarlo::create_archives_from_cmdline() in src/protocols/moves/MonteCarlo.cc. My plan was eventually to introduce some sort of configuration file for this part of the application, so that users don't need to recompile every time they want to try out a new configuration of archivers. Please raise an issue if this is something you'd find useful.
- The command-line parameter -frags:annotate must (currently) always be provided.
  This option makes the application keep track of the source of the structural information (PDB template and residue numbers) for each residue in the structure, following each fragment insertion. Correctly implementing this was tricky and needed many edits to many files.
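As a rough illustration of the recommendation to give each run its own working directory, the loop below creates one directory per run and launches the executable inside it. This is only a sketch: ROSETTA_BIN, OPTIONS, N_RUNS and the run_NNN directory naming are placeholders chosen here, and it assumes the options file uses absolute paths (so that it still resolves from inside each run directory) and already sets nstruct to 1 and -frags:annotate.

```shell
#!/bin/sh
# Placeholder settings; override them in the environment or edit in place.
ROSETTA_BIN="${ROSETTA_BIN:-/path/to/rosetta_source/bin/AbinitioRelax.default.linuxgccrelease}"
OPTIONS="${OPTIONS:-/path/to/options.txt}"
N_RUNS="${N_RUNS:-100}"

i=1
while [ "$i" -le "$N_RUNS" ]; do
    run_dir=$(printf 'run_%03d' "$i")
    mkdir -p "$run_dir"
    # Run in a subshell so the cd does not leak into the next iteration.
    # Output files land in $run_dir and cannot overwrite those of other runs.
    ( cd "$run_dir" && "$ROSETTA_BIN" "@$OPTIONS" > log.txt 2>&1 ) \
        || echo "run $i exited with an error; see $run_dir/log.txt" >&2
    i=$((i + 1))
done
```

The runs here execute one after another; on a cluster you would more likely submit each iteration as a separate job, keeping the same one-directory-per-run layout.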