Skip to content

Latest commit

 

History

History
242 lines (173 loc) · 12.1 KB

README.md

File metadata and controls

242 lines (173 loc) · 12.1 KB

FEP/REST pipeline: free-energy perturbation pipeline for relative free-energy difference analysis

What's this?

FEP/REST pipeline can be used to calculate the relative free energy change upon changing one part of molecule to the other. Examples include:

  • FE change upon inducing mutation into the protein. By calculating relative change of unfolding FE, we can find which mutation stabilizes or destablizes the protein, protein complex, or protein-ligand binding.
  • FE change upon changing a part of a ligand. Calculating relative change of total binding FE, we can find which ligand can bind to the protein the most.

How to use the pipeline

In the example below I show how to calculate the stability change of a protein in terms of the Gibbs free-energy of folding.

Preparations

Preparation 1: (local) compiling fepgen, FEP input file generator

If you have supercomputer, this part is recommended to be run locally. You will need to git clone this (fepsuite) repository both locally and remotely.

You first need Eigen3 library to compile fepgen. If you are using Ubuntu you can install via

sudo apt install libeigen3-dev

If you are using other OS, you can just download the Eigen3 archive and extract it. In that case, modify EIGENFLAGS in Makefile to include/ directory of Eigen.

After that, compile the fepgen program:

cd feprest/fepgen
make

If you have successfully compiled the program, you will be able to run fepgen:

% ./fepgen
usage: ./fepgen --structureA=string --structureB=string --topologyA=string --topologyB=string --structureO=string --topologyO=string [options] ... 
options:
      --help                   Print this message
  -v, --verbose                Be more verbose
  -q, --quiet                  Suppress unnecessary information
      --maxdist                Maximum distances to fit (double [=1])
      --force-bond             Cluster bond force constants (double [=500000])
…

This fepgen program is used to generate input file. We assume that you will copy files generated by fepgen to remote computer center via scp or rsync; how to generate files will be described later.

Preparation 2: (local) preparing FASPR to mutate amino acid on protein

We are currently using FASPR to prepare protein structure with mutation.1 FASPR can be prepared as:

git clone https://github.com/tommyhuangthu/FASPR.git
cd FASPR
g++ -ffast-math -O3 -o FASPR src/*.cpp

If you successfully compiled FASPR, you should be able to run FASPR as a command:

% ./FASPR
###########################################################################
                    FASPR (Version 20200309)                 
  A method for fast and accurate protein side-chain packing, 
which is an important problem in protein structure prediction
and protein design.

Copyright (c) 2020 Xiaoqiang Huang
Yang Zhang Lab
Dept. of Computational Medicine and Bioinformatics
Medical School
University of Michigan
Email:tommyhuangthu@foxmail.com, xiaoqiah@umich.edu
###########################################################################
Usage: ./FASPR -i input.pdb -o output.pdb
[-s sequence.txt] to load a sequence file

Preparation 3: (remote) preparing python libraries

If you are using supercomputer, you need to run this on on remote side.

Before running the pipeline, prepare mdtraj and pymbar (ver 3.0.3) package for python3.

pip3 install numpy cython==0.29.37 --user
pip3 install mdtraj pymbar==3.0.3 --user
python3 -c "import mdtraj"   # checks whether it works correctly

Newer cython (>=3.0) changed the interface significantly and mdtraj does not work with the new one. pymbar must be 3.0.3 because pymbar frequently changes the API within minor version update. (We are planning to remove pymbar dependency in the future.)

Preparation 4: (remote) compiling patched GROMACS

To run feprest pipeline you need to compile function-enhanced version of GROMACS. Currently we are only supporting GROMACS 2020.

tar xf gromacs-2020.7.tar.gz
cd gromacs-2020.7
patch -p1 < /path/to/fepsuite/feprest/gmx_patch/gromacs-2020-hrex.patch
mkdir build; cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/gromacs-2020-hrex -DGMX_MPI=on ..
make -j8 && make install

If you are using CUDA 12+, GROMACS 2020 has a trouble compiling. In this case, adding target computing capability in cmake as -DGMX_CUDA_TARGET_COMPUTE="80;86;89" may be helpful. Note "80;86;89" part should be numbers corresponding to your GPU card; see NVIDIA's official CC information for details.

If you have trouble compiling gromacs, consult your supercomputer center's support team.

Preparing the input structure

Here, we show how to generate FEP/REST input in the protein mutation case. For other case see the advanced guide at the bottom of this document.

Preparing FEP input file with prep_mutation_fep.py

Make a working directoy and download PDB file of T4 Lysozyme.

mkdir ~/feprest-test
cd ~/feprest-test
cp -r /path/to/fepsuite/forcefields/amber14sb_OL15_fs1.ff .
wget https://files.rcsb.org/download/2LZM.pdb.gz
gunzip 2LZM.pdb.gz

At this moment, you may get either 2lzm.pdb or 2LZM.pdb depending on your OS. From now on I assume you have 2LZM.pdb.

Fortunately, PDB 2LZM do not have any missing residues, so we can directly feed this structure into the mutant structure prepration. But first we need to remove water.

grep -v HOH 2LZM.pdb > 2LZM_nowat.pdb

Then, feed it to preparation script:

python3 /path/to/fepsuite/feprest/tools/prep_mutation_fep.py --faspr /path/to/FASPR/FASPR --pdb 2LZM_nowat.pdb --mutation L99A --ff amber14sb_OL15_fs1

If the program finishes without a problem, you should find four directories: wt, L99A, wt_L99A and wt_L99A_ref.2 The first one wt corresponds to wild-type structure (without mutation), the second one L99A corresponds to Leu99Ala mutation. The third one wt_L99A is the "mixed" structure of wild-type and Leu99Ala mutant. If you visualize wt_L99A/fepbase.pdb, you will see at residue 99 that Leu sidechain and Ala hydrogens are overlapping.

Likewise, if you look into fepbase.top, you will see residue 99 (originally Leu) in [ atoms ] section is mixed with Ala residue:

1560   O   98   ALA      O 1560 -5.67900e-01  1.60000e+01   O -5.67900e-01  1.60000e+01
1561   N   99   LEU      N 1561 -4.15700e-01  1.40100e+01   N -4.15700e-01  1.40100e+01
1562   H   99   LEU      H 1562  2.71900e-01  1.00800e+00   H  2.71900e-01  1.00800e+00
1563  CX   99   LEU     CA 1563 -5.18000e-02  1.20100e+01  CX  3.37000e-02  1.20100e+01
1564  H1   99   LEU     HA 1564  9.22000e-02  1.00800e+00  H1  8.23000e-02  1.00800e+00
1565  A2C  99   LEU     CB 1565 -1.10200e-01  1.20100e+01  CT -1.82500e-01  1.20100e+01
1566   C   99   LEU      C 1566  5.97300e-01  1.20100e+01   C  5.97300e-01  1.20100e+01
1567   O   99   LEU      O 1567 -5.67900e-01  1.60000e+01   O -5.67900e-01  1.60000e+01
1568  HC   99   LEU    HB1 1568  4.57000e-02  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1569  HC   99   LEU    HB2 1569  4.57000e-02  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1570  A3C  99   LEU     CG 1570  3.53100e-01  1.20100e+01 PHA  0.00000e+00  1.20100e+01
1571  HC   99   LEU     HG 1571 -3.61000e-02  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1572  CT   99   LEU    CD1 1572 -4.12100e-01  1.20100e+01 PHA  0.00000e+00  1.20100e+01
1573  HC   99   LEU   HD11 1573  1.00000e-01  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1574  HC   99   LEU   HD12 1574  1.00000e-01  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1575  HC   99   LEU   HD13 1575  1.00000e-01  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1576  CT   99   LEU    CD2 1576 -4.12100e-01  1.20100e+01 PHA  0.00000e+00  1.20100e+01
1577  HC   99   LEU   HD21 1577  1.00000e-01  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1578  HC   99   LEU   HD22 1578  1.00000e-01  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1579  HC   99   LEU   HD23 1579  1.00000e-01  1.00800e+00 PHA  0.00000e+00  1.00800e+00
1580 PHA   99   ALA    HB1 1580  0.00000e+00  1.00800e+00  HC  6.03000e-02  1.00800e+00 ; <- Ala's HB1
1581 PHA   99   ALA    HB2 1581  0.00000e+00  1.00800e+00  HC  6.03000e-02  1.00800e+00 ; <- Ala's HB2
1582 PHA   99   ALA    HB3 1582  0.00000e+00  1.00800e+00  HC  6.03000e-02  1.00800e+00 ; <- Ala's HB3
1583   N  100   ILE      N 1583 -4.15700e-01  1.40100e+01   N -4.15700e-01  1.40100e+01

If you are familiar with GROMACS [ atoms ] format, you can also see some atoms have different states between A and B. For example, atom CB is

1565  A2C  99   LEU     CB 1565 -1.10200e-01  1.20100e+01  CT -1.82500e-01  1.20100e+01

, where the atom has A2C atomtype (AMBER 2C) in state A while it has CT atomtype in state B.

The fourth, wt_L99A_ref contains the "reference state", where proteins are cut into three-residue peptide with residues 98, 99 and 100 (residues are abruptly cut without capping). This state is used to calculate the free-energy change in the unfolded state.

Copy FEP input file to remote server

scp -r wt_L99A wt_L99A_ref amber14sb_OL15.ff your-supercomputer.somewhere:somedirectory
ssh your-supercomputer.somewhere:
cd somedirectory

Running pipeline on remote server

Then, copy template file from rundir_template to start the calculation.

cp -r /path/to/fepsuite/feprest/rundir_template/* .

Update run.zsh and para_conf.zsh according to your environment. If you are using GPU you typically may want to update the following part:

# Number of MPI processes per replica. For GPU, recommended = 1.
PARA=8  # -> 1

# Number of threads per process. Recommended: for CPU: 1, for GPU: 2-6.
TPP=1   # -> e.g. 4

After the update, run the pipeline with

./run.zsh wt_L99A all
./run.zsh wt_L99A_ref all

and the pipeline will run to the end. The pipeline consists of stages 1-8. Stage 1-3 runs with single replica (minimization, NVT, NPT equilibration run), and stage 4+ runs with multiple copies of the system (NVT, NPT, replica-exchanged production run).

If the script stops due to an error, and if you want to rerun the stage, you can rerun the stage by giving appropriate stage number. For example

./run.zsh wt_L99A 3 4

runs stage 3, waits the completion of stage 3, then runs stage 4 of wt_L99A. You can see the current completed stages in wt_L99A/done_step.txt.

Understanding output

If everything works fine, output should be generated at wt_L99A/bar1.log and wt_L99A_ref/bar1.log. The final line of each log file represents the free-energy change. By subtracting the reference state from the calculation (such like wt_L99A/bar1.log - wt_L99A_ref/bar1.log), you will get the free-energy change upon mutation. If the calculation goes without the problem it should be something around +3-8 kcal/mol, i.e. L99A is predicted to destabilize the protein significantly. Note the experimental value for the free-energy difference of folding is +4.5 kcal/mol.3

Advanced guide: preparing FEP/REST input manually

(TODO)

Advanced guide: matching atoms manually

Limitations

Current script does not work with MacOS (gsed instead of sed is needed)

Footnotes

  1. You can use - and devs are actually using - other softwares to generate mutated structures, e.g. pymol, MODELLER, AlphaFold2, or PDBfixer, to name a few. See the advanced guide for the manual mutagenesis process.

  2. prep_mutation_fep.py tries to reuse the intermediate files to reduce the number of calls to commands such as gmx grompp. While this is harmless for typical uses, if you want to regenerate files with different parameters (water models, thickness, ions, etc.), you need to clean directories to force regenerating files, e.g. rm -rf wt L99A wt_L99A.

  3. Eriksson AE, Baase WA and Matthews BW. J. Mol. Biol. 229, 747-769 (1993). If you are going to look into the article, please note the sign is the opposite because Eriksson et al. considered the free-energy difference of unfolding.