This repository contains a comprehensive pipeline designed to evaluate protein-ligand interactions using molecular docking. The pipeline automates several critical steps in the docking process, making it easier and faster to conduct docking experiments. It supports multiple docking software tools, including Vina and OpenEye (OE), and is designed for evaluating small molecules against biological targets. This is useful for a large compound library or molecules generated from a diffusion model, which this page will also use.
- Automated Pipeline: Automates the retrieval, preparation, and docking of protein-ligand complexes. Scripts used for docking are detailed in the
pipeline_scripts
folder. - Flexibility: Supports docking with both Vina and OpenEye, providing flexibility depending on your needs.
- Error Handling: Includes robust error handling and logging to ensure smooth execution of the docking experiments.
- Extensive Documentation: Detailed descriptions of each part of the pipeline, including the methods for preparation, docking, and evaluation, detailed in the Jupyter Notebooks in the
jupyter_notebooks
folder.
Overall, the pipeline consists of 3 stages (not including evaluation of results):
- Download: Retrieves protein-ligand complexes from the PDB database.
- Splitting: Separates the complex into individual protein (receptor) and ligand files.
- Fixing Receptor Structure: Optional fixing of the receptor file for docking purposes, including adding missing residues and hydrogens.
- Receptor Preparation: Prepares the receptor file with the necessary modifications (hydrogen addition, metal ion preservation, etc.) for docking.
- Ligand Preparation: Ensures the ligand file is in the correct format and properly configured for interaction with the binding site.
- Design Unit Creation (OE Only): Generates design units from the protein-ligand complex for OpenEye docking.
- Docking: Executes the docking experiments using Vina or OE.
- Scoring: Extracts and records the binding scores, such as ΔG (Vina) or Chemgauss4 (OE), into a CSV file.
- Data Storage: Systematically stores all inputs, outputs, and intermediate files, ensuring reproducibility.
- Error Rate Analysis: Monitors the pipeline's performance by calculating the error rate.
- Docking Results Evaluation: Benchmarks the computed binding free energies against experimental values or cross-validates them against other docking platforms.
-
Clone the Repository:
git clone https://github.com/meyresearch/EZDock.git
-
Install Dependencies: Make sure to install all required Python libraries as listed in requirements.txt.
pip install -r requirements.txt
-
If there is any missing packages: Go to step_1b_docking_complex_to_dG.ipynb to see if anything is missing. Uncomment the notebook sections that are missing and execute to install them.
The pipeline is done within the python script EZDock_vina.py and EZDock_OE.py.
Before moving on to docking, you need to set up the license path for your OpenEye Toolkits by changing the following:
license_path = '/home/PATH_TO_LICENSE/oe_license.txt'
This is required for both the Vina and the OE pipelines, as some OE functionalities are used to help split complexes into proteins and ligands.
To run this pipeline, you need to edit the python script and specify the paths for the ligand, receptor and reference complex you wish to use for docking, specifically these few lines:
all_ligands = glob.glob('Ligand_Folder/*.sdf') # change this to the path where the ligands are stored
receptor_file = 'receptor.pdb' # change this to the name of the receptor
complex_file = 'complex.pdb' # change this to the name of the reference complex
os.system(f'cp Complex_Folder/{complex_file} .') # change this to the path where the complexes are stored
os.system(f'cp Receptor_Folder/{receptor_file} .') # change this to the path where the receptors are stored
Then, you can specify whether you'd like the pipeline to preserve water, preserve metal, and specify the name of the output CSV file containing all the docking results, by changing the corresponding variables in this funcion:
oe_process_lig_prot(ligand_file,
receptor_file,
complex_file,
preserve_water=True, # default is False
preserve_metal=True, # default is True
csv_out_file='oe_docking_data.csv') # output csv file
or in the Vina pipeline:
vina_process_lig_prot(ligand_file,
receptor_file,
complex_file,
preserve_water=True, # default is False
preserve_metal=True, # default is True
csv_out_file='oe_docking_data.csv') # output csv file
For a detailed explanation of the pipeline codes and how each part works, please visit step_1b_docking_complex_to_dG.ipynb.
To see how the figures in the figures folder were generated and how the data was cleaned, visit step_2_docking_stats_plots.ipynb.
A selected number of results of the docking experiments are stored in the results/ directory, with each run creating a folder containing:
- Docking Scores: A CSV file with the docking scores and any errors encountered after running a high-throughput screening. There may be cleaned versions with only valid data kept, and updated with additional information like ligand SMILES.
- Starting Files: The files that used as input.
- Prepared Files: The prepared receptor and ligand files.
- Output Files: The docked ligand pose(s). (For Vina, this is {exp}_out.pdbqt; for OE, this is {exp}_docked.sdf and there might be 2 files, of which only one is the ligand in the correct place - only the score of the correct poses are kept in the resulting CSV file.)
- Logs: Detailed logs of the docking process for troubleshooting.
- Use of
argparse
to allow users to define variables on the commandline and use the script as an executable.
Contributions are welcome! Please fork this repository and submit a pull request with your improvements or bug fixes.
Selected datasets are available in this Git repo, but you can click here to see all the data files generated for the experiments conducted for robustness testing and for the diffusion model-generated molecules. You will need your UoE login.
Note: this is 17 GB in a zipped file!
See the SILVR_w_EDM and EZDock_OE.py folder for all the codes used. This is mostly codes from SILVR except for the docking parts.