- Matt Robinson -
<matthew.robinson@yale.edu>
chain_joiner 'fills in' missing residues in PDB files and creates a new, fixed PDB file. The package mainly uses MODELLER, a homology and comparitive modelling program created by the Sali lab at UCSF. In order to fill in the missing residue gaps, MODELLER is used to create a homology model with the original PDB file serving as a template and the full sequence serving as the target sequence to model.
The process of adding in the missing residues is highly based on this helpful tutorial on the Sali lab's website:
-
Use Modeller to extract the sequence from the original PDB file with missing residues.
-
Create an alignment file, alignment.ali, between the original PDB structure with missing residues and the full fasta sequence (usually available on PDB website).
-
Use Modeller to create the actual homology model.
NOTE: The way the alignment file is constructed in chain_joiner is not guarenteed to work for every possible protein. It is often a good idea to check the automatically generated alignment file if the output seems suspect. Also, the results of chain_joiner will not be that great if the loop is of substantial size. For this, more detailed homology modelling is likely needed. All of the mistakes are my own and not due to MODELLER.
Running with locally stored PDB and fasta files:
join_chains -p pdbfile.pdb -f fastafile.fasta [options]
Running with PDB and sequence files from the Protein Data Bank's website:
chain_joiner_online -p PDB_ID [options]
Input Arguments:
[options]
-a, --automodel
The simplest method for simple comparitive modeling. Will not give
great results but suggested when many chain breaks are present. [default: True]
-f, --fixed_automodel
Builds an automodel and keeps the non-missing residues fixed,
whereas they can move in the other methods. [default: False]
-l, --loopmodel
Builds a model by refining the loop with the missing residues.
Suggested when have one small chain break in the PDB. [default: False]
USING CONDA:
The simplest way to install chain_joiner is through the conda package manager:
conda install -c mc-robinson -c conda-forge -c salilab chain_joiner
Using this method, conda will automatically install both biopandas and modeller, which chain_joiner depends on.
NOTE: one must first obtain a MODELLER license key (free for academic users) from the Sali lab's website. Then set this license equal to an enviroment variable called KEY_MODELLER
.
USING PIP:
One can also download chain_joiner from the Python Package Index (PyPI).
pip install chain_joiner
However, this will require that both MODELLER and biopandas are installed separetly. MODELLER, will likely involve a conda install anyway. So it is suggested you just install chain_joiner with conda, which will handle the MODELLER install automatically.
- Sebastian Raschka. Biopandas: Working with molecular structures in pandas dataframes. The Journal of Open Source Software, 2(14), jun 2017. doi: 10.21105/joss.00279. URL http://dx.doi.org/10.21105/joss.00279.
- B. Webb, A. Sali. Comparative Protein Structure Modeling Using Modeller. Current Protocols in Bioinformatics, John Wiley & Sons, Inc., 5.6.1-5.6.32, 2014.
If you run into an error/issue with chain_joiner, please let me know at matthew.robinson@yale.edu. Feel free to also contribute and send me a pull request.