Shaun M. Kandathil, Joe G. Greener and David T. Jones
University College London
- Changes for compatibility with PyTorch 0.4.0 and above, while still reading in the trained model saved with 0.3.0. The 'reference' version using PyTorch 0.3 is still provided.
- Minor bugfixes in test script.
- Updated documentation.
- Bash shell
- C and C++ compilers (tested with GCC 4.4.2, 4.8.5, and 5.4.0)
- Python 2 or 3 (preferably miniconda/anaconda, as this makes the PyTorch install much easier; tested on miniconda Python3)
- The following Python modules:
  - PyTorch >= 0.4.0 (tested on 1.1.0)
- Third-party programs:
  - HH-suite v3.0+ and a recent UniClust30 database (for making alignments; skip if you will only use pre-made alignments)
  - CCMpred v0.1.0 (this exact version is required; available here)
  - FreeContact 1.0.21 (available here)
  - Legacy BLAST 2.2.26 (executables `blastpgp` and `makemat`) and a suitable non-redundant database, e.g. UniRef90, formatted using `formatdb` (needed to generate PSIPRED and SOLVPRED inputs). Legacy BLAST is available here.
All other required programs written by our group are now bundled in this repo and do not need to be installed separately.
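A quick way to confirm that the prerequisites listed above are visible on your `PATH` is sketched below. The executable names (`hhblits`, `ccmpred`, `freecontact` and so on) are assumptions based on the names these tools' binaries usually have, and `find_missing_tools` is a hypothetical helper; if your installation uses other names or locations, adjust the list accordingly.

```shell
# Print the name of every given executable that is not on PATH.
# Succeeds only if all of them were found.
find_missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    [ "$status" -eq 0 ]
)

# Assumed executable names; edit to match your installation.
tools="gcc g++ python hhblits ccmpred freecontact blastpgp makemat formatdb"

# Word splitting of $tools is intended here.
if missing=$(find_missing_tools $tools); then
    echo "All listed tools were found on PATH."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

A tool reported as missing is not necessarily a problem, since the paths to the third-party programs are given explicitly when configuring DMP.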
On some distributions, the C++ compiler is a separate add-on package and may not be installed by default. For example, on CentOS you will need to `yum install` packages `gcc` AND `gcc-c++`.
For most `conda` users, `conda install -c pytorch pytorch` should work. Alternatively, visit https://pytorch.org/get-started/locally/
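After installing, you may want to confirm that the PyTorch version is at least 0.4.0. The following is a minimal sketch: `version_ge` is a hypothetical helper that relies on the `-V` (version sort) option of GNU `sort`, and it assumes that `python` is the interpreter DMP will use.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed PyTorch version; empty if torch cannot be imported.
torch_version=$(python -c 'import torch; print(torch.__version__)' 2>/dev/null || true)

if [ -z "$torch_version" ]; then
    echo "PyTorch could not be imported by 'python'."
elif version_ge "$torch_version" "0.4.0"; then
    echo "PyTorch $torch_version is new enough."
else
    echo "PyTorch $torch_version is older than 0.4.0."
fi
```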
cd src/; make; make install
Edit `run_DMP.sh` to indicate the paths to the third-party programs listed above, as well as other variables such as the number of threads to use for various programs. User-editable variables are demarcated by comment lines. We do not recommend changing anything outside this region unless you know what you are doing.
cd test; ./testDMP.sh
The script will use the configuration you have provided in `run_DMP.sh` and run a test contact prediction.
NB: the test script will not run PSI-BLAST or HHblits; it runs only the remaining parts of the DMP pipeline (equivalent to running Option 4 below).
Different versions of OSs, compilers etc. can lead to differing contact scores (as well as outputs from the feature generation programs), so we only test the ranking of the top-L predicted contacts against a reference output.
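To illustrate what comparing only the ranking of top-scoring contacts can look like, here is a small sketch. It is not the logic of the test script: the helper names are hypothetical, the scores are made up, and each line of a contact file is assumed to hold five whitespace-separated columns (two residue indices, two distance bounds, and a score).

```shell
# Print the top-N contacts of a contact file as "i j" pairs, best score first.
# Assumes five columns per line: i j d_min d_max score.
top_contacts() {
    LC_ALL=C sort -k5,5gr "$1" | head -n "$2" | awk '{ print $1, $2 }'
}

# Succeed if two contact files rank their top-N contacts identically,
# regardless of the actual score values.
same_top_ranking() {
    [ "$(top_contacts "$1" "$3")" = "$(top_contacts "$2" "$3")" ]
}

# Demonstration with made-up scores in a temporary directory.
tmpdir=$(mktemp -d)
printf '1 9 0 8 0.91\n2 12 0 8 0.40\n3 20 0 8 0.75\n' > "$tmpdir/pred.con"
printf '1 9 0 8 0.88\n3 20 0 8 0.70\n2 12 0 8 0.35\n' > "$tmpdir/ref.con"
if same_top_ranking "$tmpdir/pred.con" "$tmpdir/ref.con" 3; then
    echo "Top-3 rankings match."
else
    echo "Top-3 rankings differ."
fi
rm -r "$tmpdir"
```

Comparing ranks rather than raw scores tolerates the small numerical differences described above while still detecting a changed ordering.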
Run `/path/to/run_DMP.sh -h` to see the available options. DMP runs a number of programs to generate input features; their outputs are stored in a number of intermediate files. By default, DMP will attempt to reuse any files with the correct filenames (this is useful for debugging and allows you to 'continue' a failed run). You can force regeneration of intermediate files with the `--force` option.
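The reuse behaviour can be pictured with the sketch below. It is an illustration of the general pattern, not the actual code in `run_DMP.sh`: `run_step` and the `FORCE` variable are hypothetical names, and treating only non-empty files as reusable is a choice made for this sketch.

```shell
# Run a command to produce an output file, unless a non-empty copy of that
# file already exists and regeneration has not been forced (FORCE=1).
# Usage: run_step OUTPUT_FILE COMMAND [ARGS...]
FORCE=${FORCE:-0}

run_step() {
    step_output=$1
    shift
    if [ "$FORCE" != 1 ] && [ -s "$step_output" ]; then
        echo "Reusing existing $step_output"
    else
        echo "Generating $step_output"
        "$@" > "$step_output"
    fi
}

# Demonstration with a stand-in command in a temporary directory.
workdir=$(mktemp -d)
run_step "$workdir/example.out" echo "first run"    # generates the file
run_step "$workdir/example.out" echo "second run"   # reuses the file
FORCE=1
run_step "$workdir/example.out" echo "forced run"   # regenerates the file
cat "$workdir/example.out"
rm -r "$workdir"
```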
At a minimum, you must provide the path to a FASTA-formatted target sequence in order to run DMP. There are a few different ways to run it:
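Option 1: From sequence only (requires both BLAST and HHblits):
/path/to/run_DMP.sh -i input.fasta
Option 2: From sequence and pre-made alignment (requires BLAST but not HHblits):
/path/to/run_DMP.sh -i input.fasta -a input.aln
Option 3: From sequence and PSSM in legacy BLAST makemat format (requires HHblits but not BLAST):
/path/to/run_DMP.sh -i input.fasta -m input.mtx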
Option 4: From sequence, pre-made alignment, and PSSM in legacy BLAST makemat format (does not require BLAST or HHblits):
/path/to/run_DMP.sh -i input.fasta -a input.aln -m input.mtx
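If you process many targets, a small wrapper can pick the appropriate invocation from the files that already exist. The sketch below uses only the `-i`, `-a` and `-m` options shown above; `dmp_args` is a hypothetical helper, and it assumes that the alignment and PSSM, when present, sit next to the FASTA file, share its basename, and that paths contain no spaces.

```shell
# Print the run_DMP.sh arguments matching the inputs that exist for a target.
# Assumes files named target.fasta, target.aln and target.mtx in one directory.
dmp_args() (
    fasta=$1
    base=${fasta%.*}            # strip the extension, e.g. ".fasta"
    args="-i $fasta"
    if [ -s "$base.aln" ]; then
        args="$args -a $base.aln"
    fi
    if [ -s "$base.mtx" ]; then
        args="$args -m $base.mtx"
    fi
    echo "$args"
)

# Example: show the command that would be run (nothing is executed here).
echo "/path/to/run_DMP.sh $(dmp_args input.fasta)"
```

With neither an alignment nor a PSSM present, the printed command is the sequence-only invocation; with both present, it is the Option 4 invocation.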
The primary output from the script is a file with the extension `.con` that has the contact predictions in CASP format, without headers and footers.
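As a sketch of how such a file might be post-processed, the helper below keeps only confident contacts between residues that are far apart in sequence. It assumes five whitespace-separated columns per line (two residue indices, two distance bounds, and a score in the fifth column); `filter_contacts` is a hypothetical name, and the thresholds and scores in the demonstration are arbitrary.

```shell
# Print contacts whose score is at least MIN_SCORE and whose residues are
# at least MIN_SEP positions apart in sequence.
# Usage: filter_contacts FILE MIN_SCORE MIN_SEP
filter_contacts() {
    awk -v min_score="$2" -v min_sep="$3" '
        { sep = $2 - $1; if (sep < 0) sep = -sep }
        sep >= min_sep && $5 >= min_score
    ' "$1"
}

# Demonstration with made-up contacts in a temporary file.
confile=$(mktemp)
printf '1 3 0 8 0.99\n5 40 0 8 0.80\n7 60 0 8 0.20\n' > "$confile"
filter_contacts "$confile" 0.5 24
rm "$confile"
```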
A number of intermediate files are also generated during a run, corresponding to the outputs from the various input feature generation programs. These are then composed into a few more files so that the data are in the correct format for the ResNet to use. These latter files have the extensions `.map`, `.21c` and `.fix`.
By default, all of these intermediate files are retained after the run has completed; this behaviour can be changed by specifying `--cleanup` to `run_DMP.sh`. When `--cleanup` is specified, only the PSICOV-formatted MSA (extension `.aln`), the PSI-BLAST PSSM (extension `.mtx`) and the `.con` file are retained. The input `.fasta` sequence file is never deleted.
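The effect of cleaning up can be pictured with the following sketch. It is not the code used by `run_DMP.sh`: `cleanup_intermediates` is a hypothetical helper, and it assumes that all files belonging to a target share the target's basename. The demonstration works on empty placeholder files in a temporary directory.

```shell
# Delete every file named PREFIX.* in DIR except those with the retained
# extensions (.aln, .mtx, .con) and the input .fasta file.
# Usage: cleanup_intermediates DIR PREFIX
cleanup_intermediates() {
    for f in "$1/$2".*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.aln|*.mtx|*.con|*.fasta) ;;   # retained
            *) rm -- "$f" ;;                # everything else is removed
        esac
    done
}

# Demonstration on placeholder files.
demo=$(mktemp -d)
for ext in fasta aln mtx con map 21c fix; do
    : > "$demo/target.$ext"
done
cleanup_intermediates "$demo" target
ls "$demo"
rm -r "$demo"
```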
If you find DeepMetaPSICOV useful, please cite our paper in Proteins: https://onlinelibrary.wiley.com/doi/full/10.1002/prot.25779
Yes. DMP was trained on data output by specific versions of the feature generation programs, and you need to use the same versions during inference.
The version of `alnstats` in this repository is not affected by the bug in question. Since we verified that the training of DMP did not suffer from the bug, we are releasing the bug-free version for inference. If for any reason you'd like the buggy version of `alnstats`, do get in touch.
We know. Keen users will also have spotted that we use a number of input features in common with MetaPSICOV. We wanted to assess whether we improve over MetaPSICOV, DeepCov etc. using exactly the same training data, where possible.