DisVis-based filtering of contacts from co-evolution data (or other sources)
Here you can find the dataset described in the manuscript "Improving the Quality of Co-evolution Intermolecular Contact Prediction with DisVis" by Siri Camee van Keulen and Alexandre M.J.J. Bonvin.
The content of this Github repository can be downloaded in its entirety from Zenodo:
This repository contains 26 directories, one for each complex in the dataset. The name of each directory is composed of the PDB ID, the Green ID (see manuscript) and the number of true contacts within the top 10 according to Green et al. [1].
PDBID_GreenID_contacts (e.g. 1FM0_allpdb0609_6)
│ GreenID_contacts_pdb_disvis_top20_10A.txt
│
└─── ana_scripts
│
└─── p1_coevol_20restraints_0%
│
└─── p2_coevol_20restraints_50%
│
└─── p3_coevol_10restraints_0%
│
└─── p4_disvis_10restraints_0%
│
└─── p5_coevol_10restraints_50%
│
└─── p6_disvis_10restraints_50%
│
└─── p7_coevol_5restraints_0%
│
└─── p8_disvis_5restraints_0%
│
└─── p9_disvis_20restraints_zscore_lt0_5_50%
│
└─── p10_disvis_20restraints_zscore_lt1_50%
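The complex directory naming scheme described above (PDBID_GreenID_contacts) can be split back into its parts with a few lines of Python. This is only a minimal sketch: the function and class names are hypothetical and not part of the repository, and it assumes the PDB ID contains no underscore and the contact count is the last underscore-separated field.

```python
from typing import NamedTuple


class ComplexName(NamedTuple):
    """Parts of a complex directory name (hypothetical helper type)."""
    pdb_id: str
    green_id: str
    true_contacts: int  # true contacts within the top 10 (Green et al.)


def parse_complex_dirname(name: str) -> ComplexName:
    # Split from both ends so that a Green ID containing underscores
    # would still be kept whole (assumption, not confirmed by the dataset).
    pdb_id, rest = name.split("_", 1)
    green_id, contacts = rest.rsplit("_", 1)
    return ComplexName(pdb_id, green_id, int(contacts))


print(parse_complex_dirname("1FM0_allpdb0609_6"))
```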
Each complex folder contains ten protocol directories, the ana_scripts directory and one file. This file (GreenID_contacts_pdb_disvis_top20_10A.txt) lists the top 20 contacts (excluding contacts that involve unresolved residues) in DisVis format.
DisVis Format Example
A 53 CA B 11 CA 0 10
Here a contact is described between the CA atom of residue 53 of chain A and the CA atom of residue 11 of chain B. The lower bound is 0 Angstrom and the upper bound is 10 Angstrom.
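As an illustration of the format above, the following sketch parses one restraint line into named fields. The class and function names are hypothetical, and skipping blank lines and lines starting with "#" is an assumption about how such files may be written, not something stated in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restraint:
    """One DisVis distance restraint (hypothetical helper type)."""
    chain1: str
    resid1: int
    atom1: str
    chain2: str
    resid2: int
    atom2: str
    lower: float  # lower distance bound in Angstrom
    upper: float  # upper distance bound in Angstrom


def parse_restraint(line: str) -> Restraint:
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    c1, r1, a1, c2, r2, a2, lower, upper = fields
    return Restraint(c1, int(r1), a1, c2, int(r2), a2, float(lower), float(upper))


def read_restraints(path: str) -> list:
    # Assumption: blank lines and '#' comment lines carry no restraint.
    with open(path) as handle:
        return [
            parse_restraint(line)
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        ]


print(parse_restraint("A 53 CA B 11 CA 0 10"))
```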
This file can be used together with the PDB files of the complex on the DisVis webserver [2] to calculate the z-score for each contact. The PDB files of each complex can be found in every protocol folder in the complex directory and are named by combining the GreenID, the number of true contacts within the top 10 according to Green et al. [1] and the chain ID (e.g. allpdb0609_6_A.pdb). The PDB files for both chain A and chain B are required to run the DisVis calculation.
Ten directories in each complex directory contain the files needed to run the protocols described in the manuscript. The protocol numbering follows Table 2 in the manuscript.
The name of each protocol directory combines the protocol number according to the manuscript, the contact method used to rank the contacts in the distance restraint file (disvis or coevolution), the number of contacts included in the distance restraint file and the percentage of contacts randomly removed from the contact list during docking (e.g. p6_disvis_10restraints_50%).
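The protocol naming scheme can be decoded with a regular expression. The sketch below is an assumption-laden illustration: the function name and dictionary keys are hypothetical, and reading the suffixes zscore_lt0_5 and zscore_lt1 as z-score cutoffs of 0.5 and 1 is an interpretation of the directory names, so check it against the manuscript.

```python
import re

# Pattern derived from the ten protocol directory names listed in this README.
PROTOCOL_RE = re.compile(
    r"^p(?P<number>\d+)"
    r"_(?P<method>coevol|disvis)"
    r"_(?P<n_restraints>\d+)restraints"
    r"(?:_zscore_lt(?P<zscore>\d+(?:_\d+)?))?"
    r"_(?P<removal>\d+)%$"
)


def parse_protocol_dirname(name: str) -> dict:
    match = PROTOCOL_RE.match(name)
    if match is None:
        raise ValueError(f"not a protocol directory name: {name!r}")
    zscore = match.group("zscore")
    return {
        "number": int(match.group("number")),
        "method": match.group("method"),
        "n_restraints": int(match.group("n_restraints")),
        # Assumption: '0_5' encodes the decimal cutoff 0.5.
        "zscore_cutoff": float(zscore.replace("_", ".")) if zscore else None,
        "removal_percent": int(match.group("removal")),
    }


for example in ("p6_disvis_10restraints_50%", "p9_disvis_20restraints_zscore_lt0_5_50%"):
    print(example, parse_protocol_dirname(example))
```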
Inside each protocol directory, the layout is as follows:
ProtocolNumber_ContactMethod_NumberOfContacts_Removal (e.g. p1_coevol_20restraints_0%)
│ GreenID_contacts_chainA.pdb
│ GreenID_contacts_chainB.pdb
│ ambig.tbl
│ hbonds.tbl
│ run.cns
│
└─── output
- Protein Structures:
  - GreenID_contacts_chainA.pdb: coordinates for chain A
  - GreenID_contacts_chainB.pdb: coordinates for chain B
- Distance restraints:
  - ambig.tbl: ambiguous interaction restraints
  - hbonds.tbl: unambiguous restraints defined to keep the chains together in case of a chain break
- Docking input:
  - run.cns: the HADDOCK parameter file defining the docking protocol and settings
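Before launching a docking run, it can be useful to confirm that a protocol directory is complete. The following sketch checks for the input files listed above. The function name is hypothetical, and because the exact PDB file names differ per complex, it only checks that at least two .pdb files are present.

```python
from pathlib import Path

# Fixed-name input files listed in this README.
REQUIRED_FILES = ("ambig.tbl", "hbonds.tbl", "run.cns")


def missing_inputs(protocol_dir) -> list:
    """Return a list of missing items; an empty list means nothing is missing."""
    root = Path(protocol_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # One PDB file per chain (A and B) is expected.
    n_pdb = len(list(root.glob("*.pdb")))
    if n_pdb < 2:
        missing.append(f"chain PDB files (found {n_pdb}, expected 2)")
    if not (root / "output").is_dir():
        missing.append("output/")
    return missing
```

Calling `missing_inputs` on a protocol directory path (for example one ending in `p1_coevol_20restraints_0%`) returns an empty list when all inputs are present.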
The output directory contains selected docking output and the output of the analysis scripts included in the ana_scripts directory:
- ambig.tbl: ambiguous interaction restraints used during docking
- DockQ.dat: DockQ output for all 200 models of itw, including the i-RMSD values
- cluster.out: cluster output list of the models generated in itw
- file.list: models ranked according to HADDOCK itw score
- clusters_haddock-sorted.stat_best4: clusters ranked according to HADDOCK itw score
- file_nam_clust{Cluster_Number}_best4: top 4 models of every cluster
Each complex directory includes an ana_scripts directory (see Section 1.1). It provides all scripts needed to obtain the DockQ.dat and cluster files in the output directory.
- target.pdb: reference complex structure with renumbered atoms and renamed chain IDs that match the HADDOCK output
- All required scripts for the analysis:
  - cluster-fnat.csh
  - fraction-native.csh
  - i-rmsd_to_xray.csh
  - l-rmsd_to_xray.csh
  - run_all-no-it0.csh
  - make-target-files.csh
  - run_all.csh
  - run_all-dockQ.csh
  - run_dockQ.csh
- All required files for the analysis:
  - target.contacts10
  - target.izoneA
  - target.contacts5
  - target.izoneB
  - target.izone
  - target.lzone
The 200 predicted models for each protocol can be found on Zenodo: