FEP/REST pipeline can be used to calculate the relative free energy change upon changing one part of molecule to the other. Examples include:
- FE change upon inducing mutation into the protein. By calculating relative change of unfolding FE, we can find which mutation stabilizes or destablizes the protein, protein complex, or protein-ligand binding.
- FE change upon changing a part of a ligand. Calculating relative change of total binding FE, we can find which ligand can bind to the protein the most.
In the example below I show how to calculate the stability change of a protein in terms of the Gibbs free-energy of folding.
If you have supercomputer, this part is recommended to be run locally. You will need to git clone
this (fepsuite) repository both locally and remotely.
You first need Eigen3 library to compile fepgen
. If you are using Ubuntu you can install via
sudo apt install libeigen3-dev
If you are using other OS, you can just download the Eigen3 archive and extract it. In that case, modify EIGENFLAGS
in Makefile
to include/
directory of Eigen.
After that, compile the fepgen program:
cd feprest/fepgen
make
If you have successfully compiled the program, you will be able to run fepgen
:
% ./fepgen
usage: ./fepgen --structureA=string --structureB=string --topologyA=string --topologyB=string --structureO=string --topologyO=string [options] ...
options:
--help Print this message
-v, --verbose Be more verbose
-q, --quiet Suppress unnecessary information
--maxdist Maximum distances to fit (double [=1])
--force-bond Cluster bond force constants (double [=500000])
…
This fepgen
program is used to generate input file. We assume that you will copy files generated by fepgen
to remote computer center via scp
or rsync
; how to generate files will be described later.
We are currently using FASPR to prepare protein structure with mutation.1 FASPR
can be prepared as:
git clone https://github.com/tommyhuangthu/FASPR.git
cd FASPR
g++ -ffast-math -O3 -o FASPR src/*.cpp
If you successfully compiled FASPR
, you should be able to run FASPR as a command:
% ./FASPR
###########################################################################
FASPR (Version 20200309)
A method for fast and accurate protein side-chain packing,
which is an important problem in protein structure prediction
and protein design.
Copyright (c) 2020 Xiaoqiang Huang
Yang Zhang Lab
Dept. of Computational Medicine and Bioinformatics
Medical School
University of Michigan
Email:tommyhuangthu@foxmail.com, xiaoqiah@umich.edu
###########################################################################
Usage: ./FASPR -i input.pdb -o output.pdb
[-s sequence.txt] to load a sequence file
If you are using supercomputer, you need to run this on on remote side.
Before running the pipeline, prepare mdtraj and pymbar (ver 3.0.3) package for python3.
pip3 install numpy cython==0.29.37 --user
pip3 install mdtraj pymbar==3.0.3 --user
python3 -c "import mdtraj" # checks whether it works correctly
Newer cython
(>=3.0) changed the interface significantly and mdtraj
does not work with the new one.
pymbar
must be 3.0.3
because pymbar
frequently changes the API within minor version update. (We are planning to remove pymbar
dependency in the future.)
To run feprest
pipeline you need to compile function-enhanced version of GROMACS. Currently we are only supporting GROMACS 2020.
tar xf gromacs-2020.7.tar.gz
cd gromacs-2020.7
patch -p1 < /path/to/fepsuite/feprest/gmx_patch/gromacs-2020-hrex.patch
mkdir build; cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/gromacs-2020-hrex -DGMX_MPI=on ..
make -j8 && make install
If you are using CUDA 12+, GROMACS 2020 has a trouble compiling. In this case, adding target computing capability in cmake as -DGMX_CUDA_TARGET_COMPUTE="80;86;89"
may be helpful. Note "80;86;89"
part should be numbers corresponding to your GPU card; see NVIDIA's official CC information for details.
If you have trouble compiling gromacs, consult your supercomputer center's support team.
Here, we show how to generate FEP/REST input in the protein mutation case. For other case see the advanced guide at the bottom of this document.
Make a working directoy and download PDB file of T4 Lysozyme.
mkdir ~/feprest-test
cd ~/feprest-test
cp -r /path/to/fepsuite/forcefields/amber14sb_OL15_fs1.ff .
wget https://files.rcsb.org/download/2LZM.pdb.gz
gunzip 2LZM.pdb.gz
At this moment, you may get either 2lzm.pdb
or 2LZM.pdb
depending on your OS. From now on I assume you have 2LZM.pdb.
Fortunately, PDB 2LZM do not have any missing residues, so we can directly feed this structure into the mutant structure prepration. But first we need to remove water.
grep -v HOH 2LZM.pdb > 2LZM_nowat.pdb
Then, feed it to preparation script:
python3 /path/to/fepsuite/feprest/tools/prep_mutation_fep.py --faspr /path/to/FASPR/FASPR --pdb 2LZM_nowat.pdb --mutation L99A --ff amber14sb_OL15_fs1
If the program finishes without a problem, you should find four directories: wt
, L99A
, wt_L99A
and wt_L99A_ref
.2 The first one wt
corresponds to wild-type structure (without mutation), the second one L99A
corresponds to Leu99Ala mutation. The third one wt_L99A
is the "mixed" structure of wild-type and Leu99Ala mutant. If you visualize wt_L99A/fepbase.pdb
, you will see at residue 99 that Leu sidechain and Ala hydrogens are overlapping.
Likewise, if you look into fepbase.top
, you will see residue 99 (originally Leu) in [ atoms ]
section is mixed with Ala residue:
1560 O 98 ALA O 1560 -5.67900e-01 1.60000e+01 O -5.67900e-01 1.60000e+01
1561 N 99 LEU N 1561 -4.15700e-01 1.40100e+01 N -4.15700e-01 1.40100e+01
1562 H 99 LEU H 1562 2.71900e-01 1.00800e+00 H 2.71900e-01 1.00800e+00
1563 CX 99 LEU CA 1563 -5.18000e-02 1.20100e+01 CX 3.37000e-02 1.20100e+01
1564 H1 99 LEU HA 1564 9.22000e-02 1.00800e+00 H1 8.23000e-02 1.00800e+00
1565 A2C 99 LEU CB 1565 -1.10200e-01 1.20100e+01 CT -1.82500e-01 1.20100e+01
1566 C 99 LEU C 1566 5.97300e-01 1.20100e+01 C 5.97300e-01 1.20100e+01
1567 O 99 LEU O 1567 -5.67900e-01 1.60000e+01 O -5.67900e-01 1.60000e+01
1568 HC 99 LEU HB1 1568 4.57000e-02 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1569 HC 99 LEU HB2 1569 4.57000e-02 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1570 A3C 99 LEU CG 1570 3.53100e-01 1.20100e+01 PHA 0.00000e+00 1.20100e+01
1571 HC 99 LEU HG 1571 -3.61000e-02 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1572 CT 99 LEU CD1 1572 -4.12100e-01 1.20100e+01 PHA 0.00000e+00 1.20100e+01
1573 HC 99 LEU HD11 1573 1.00000e-01 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1574 HC 99 LEU HD12 1574 1.00000e-01 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1575 HC 99 LEU HD13 1575 1.00000e-01 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1576 CT 99 LEU CD2 1576 -4.12100e-01 1.20100e+01 PHA 0.00000e+00 1.20100e+01
1577 HC 99 LEU HD21 1577 1.00000e-01 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1578 HC 99 LEU HD22 1578 1.00000e-01 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1579 HC 99 LEU HD23 1579 1.00000e-01 1.00800e+00 PHA 0.00000e+00 1.00800e+00
1580 PHA 99 ALA HB1 1580 0.00000e+00 1.00800e+00 HC 6.03000e-02 1.00800e+00 ; <- Ala's HB1
1581 PHA 99 ALA HB2 1581 0.00000e+00 1.00800e+00 HC 6.03000e-02 1.00800e+00 ; <- Ala's HB2
1582 PHA 99 ALA HB3 1582 0.00000e+00 1.00800e+00 HC 6.03000e-02 1.00800e+00 ; <- Ala's HB3
1583 N 100 ILE N 1583 -4.15700e-01 1.40100e+01 N -4.15700e-01 1.40100e+01
If you are familiar with GROMACS [ atoms ]
format, you can also see some atoms have different states between A and B. For example, atom CB
is
1565 A2C 99 LEU CB 1565 -1.10200e-01 1.20100e+01 CT -1.82500e-01 1.20100e+01
, where the atom has A2C
atomtype (AMBER 2C
) in state A while it has CT
atomtype in state B.
The fourth, wt_L99A_ref
contains the "reference state", where proteins are cut into three-residue peptide with residues 98, 99 and 100 (residues are abruptly cut without capping). This state is used to calculate the free-energy change in the unfolded state.
scp -r wt_L99A wt_L99A_ref amber14sb_OL15.ff your-supercomputer.somewhere:somedirectory
ssh your-supercomputer.somewhere:
cd somedirectory
Then, copy template file from rundir_template to start the calculation.
cp -r /path/to/fepsuite/feprest/rundir_template/* .
Update run.zsh
and para_conf.zsh
according to your environment. If you are using GPU you typically may want to update the following part:
# Number of MPI processes per replica. For GPU, recommended = 1.
PARA=8 # -> 1
# Number of threads per process. Recommended: for CPU: 1, for GPU: 2-6.
TPP=1 # -> e.g. 4
After the update, run the pipeline with
./run.zsh wt_L99A all
./run.zsh wt_L99A_ref all
and the pipeline will run to the end. The pipeline consists of stages 1-8. Stage 1-3 runs with single replica (minimization, NVT, NPT equilibration run), and stage 4+ runs with multiple copies of the system (NVT, NPT, replica-exchanged production run).
If the script stops due to an error, and if you want to rerun the stage, you can rerun the stage by giving appropriate stage number. For example
./run.zsh wt_L99A 3 4
runs stage 3, waits the completion of stage 3, then runs stage 4 of wt_L99A
. You can see the current completed stages in wt_L99A/done_step.txt
.
If everything works fine, output should be generated at wt_L99A/bar1.log
and wt_L99A_ref/bar1.log
. The final line of each log file represents the free-energy change. By subtracting the reference state from the calculation (such like wt_L99A/bar1.log
- wt_L99A_ref/bar1.log
), you will get the free-energy change upon mutation. If the calculation goes without the problem it should be something around +3-8 kcal/mol, i.e. L99A is predicted to destabilize the protein significantly. Note the experimental value for the free-energy difference of folding is +4.5 kcal/mol.3
(TODO)
Current script does not work with MacOS (gsed
instead of sed
is needed)
Footnotes
-
You can use - and devs are actually using - other softwares to generate mutated structures, e.g. pymol, MODELLER, AlphaFold2, or PDBfixer, to name a few. See the advanced guide for the manual mutagenesis process. ↩
-
prep_mutation_fep.py
tries to reuse the intermediate files to reduce the number of calls to commands such asgmx grompp
. While this is harmless for typical uses, if you want to regenerate files with different parameters (water models, thickness, ions, etc.), you need to clean directories to force regenerating files, e.g.rm -rf wt L99A wt_L99A
. ↩ -
Eriksson AE, Baase WA and Matthews BW. J. Mol. Biol. 229, 747-769 (1993). If you are going to look into the article, please note the sign is the opposite because Eriksson et al. considered the free-energy difference of unfolding. ↩