KarmaDock: a deep learning paradigm for ultra-large library docking with fast speed and high accuracy
Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein–ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
If you find it useful, please cite:
Efficient and accurate large library ligand docking with KarmaDock Zhang, Xujun#; Zhang, Odin#; Shen, Chao; Qu, Wanglin; Chen, Shicheng; Cao, Hanqun; Kang, Yu; Wang, Zhe; Wang, Ercheng; Zhang, Jintu; Deng, Yafeng; Liu, Furui; Wang, Tianyue; Du, Hongyan; Wang, Langcheng; Pan, Peichen*; Chen, Guangyong*; Hsieh, Chang-Yu*; Hou, Tingjun*. Published in: Nature Computational Science, 2023, Vol. 3, No. 9, pp. 789-804. DOI: 10.1038/s43588-023-00511-5
The package development version is tested on Linux: Ubuntu 18.04 operating systems.
Dependencies for KarmaDock:
pytorch
pyg
rdkit=2022.09.1 (important!!!)
mdanalysis
prody
git clone https://github.com/schrojunzhang/KarmaDock.git
you can install the env via yaml file
cd KarmaDock
conda env create -f karmadock_env.yaml
or you can download the conda-packed file, and then unzip it in ${anaconda install dir}/anaconda3/envs
. ${anaconda install dir}
represents the dir where the anaconda is installed. For me, ${anaconda install dir}=/root .
mkdir ${anaconda install dir}/anaconda3/envs/karmadock
tar -xzvf karmadock.tar.gz -C ${anaconda install dir}/anaconda3/envs/karmadock
conda activate karmadock
Assume that the project is at /root
and therefore the project path is /root/KarmaDock.
You can download the PDBBind 2020 core set without preprocessing from the PDBBind website OR you can download the version where protein files were prepared by Schrodinger.
cd /root/KarmaDock
wget https://zenodo.org/record/7788083/files/pdbbind2020_core_set.zip?download=1
unzip -q pdbbind2020_core_set.zip?download=1
The purpose of this step is to identify residues that are within a 12Å radius of any ligand atom and use them as the pocket of the protein. The pocket file (xxx_pocket_ligH12A.pdb) will also be saved on the complex_file_dir
.
cd /root/KarmaDock/utils
python -u pre_processing.py --complex_file_dir ~/your/PDBBindDataset/path
e.g.,
cd /root/KarmaDock/utils
python -u pre_processing.py --complex_file_dir /root/KarmaDock/pdbbind2020_core_set
This step will generate graphs for protein-ligand complexes and save them (*.dgl) to graph_file_dir
.
cd /root/KarmaDock/utils
python -u generate_graph.py
--complex_file_dir ~/your/PDBBindDataset/path
--graph_file_dir ~/the/directory/for/saving/graph
e.g.,
cd /root/KarmaDock/utils
python -u generate_graph.py --complex_file_dir /root/KarmaDock/pdbbind2020_core_set --graph_file_dir /root/KarmaDock/pdbbind_graph
This step will perform ligand docking (predict binding poses and binding strengthes) based on the graphs. (finished in about 0.5 min)
cd /root/KarmaDock/utils
python -u ligand_docking.py
--graph_file_dir ~/the/directory/for/saving/graph
--model_file ~/path/of/trained/model/parameters
--out_dir ~/path/for/recording/BindingPoses&DockingScores
--docking True/False whether generating binding poses
--scoring True/False whether predict binding affinities
--correct True/False whether correct the predicted binding poses
--batch_size 64
--random_seed 2023
e.g.,
cd /root/KarmaDock/utils
python -u ligand_docking.py --graph_file_dir /root/KarmaDock/pdbbind_graph --model_file /root/KarmaDock/trained_models/karmadock_screening.pkl --out_dir /root/KarmaDock/pdbbind_result --docking True --scoring True --correct True --batch_size 64 --random_seed 2023
Assume that the project is at /root
and therefore the project path is /root/KarmaDock.
You can download the DEKOIS 2.0 dataset without preprocessing from the DEKOIS website OR you can download the version where protein files were prepared by Schrodinger, glide-docked poses were provided.
cd /root/KarmaDock
wget https://zenodo.org/record/8131256/files/DEKOIS2.zip?download=1
unzip -q DEKOIS2.zip?download=1
This step will perform virtual screening for a specific target PDK1 (predict binding poses and binding strengthes).
You can run the following command on CPUs before performing virtual screening (generate graphs in advance)
cd /root/KarmaDock/utils
python -u virtual_screening_pipeline.py
--mode generate_graph
--ligand_smi ~/the/directory/for/ligand/library/smi
--protein_file ~/the/directory/for/target/protein/pdb
--crystal_ligand_file ~/the/directory/for/crystal/ligand/mol2/for/binding/pocket
--graph_dir ~/the/directory/for/saving/ligand/graphs
--random_seed 2023
e.g.,
cd /root/KarmaDock/utils
python -u virtual_screening_pipeline.py --ligand_smi /root/KarmaDock/DEKOIS2/pdk1/active_decoys.smi --protein_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_protein.pdb --crystal_ligand_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_ligand.mol2 --graph_dir /root/KarmaDock/DEKOIS2/pdk1/karmadock_liggraph --random_seed 2023
Then, you can run the following command on GPUs to perform virtual screening (predict binding poses and binding strengthes)
cd /root/KarmaDock/utils
python -u virtual_screening_pipeline.py
--mode vs
--protein_file ~/the/directory/for/target/protein/pdb
--crystal_ligand_file ~/the/directory/for/crystal/ligand/mol2/for/binding/pocket
--graph_dir ~/the/directory/for/saving/ligand/graphs
--out_dir ~/path/for/recording/BindingPoses&DockingScores
--score_threshold 50
--batch_size 64
--random_seed 2023
--out_uncoorected
--out_corrected
e.g.,
cd /root/KarmaDock/utils
python -u virtual_screening_pipeline.py --protein_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_protein.pdb --crystal_ligand_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_ligand.mol2 --graph_dir /root/KarmaDock/DEKOIS2/pdk1/karmadock_liggraph --out_dir /root/KarmaDock/DEKOIS2/pdk1/karmadocked --score_threshold 50 --batch_size 64 --random_seed 2023 --out_uncoorected --out_corrected
For pure GPU machines, you can run the following command to perform virtual screening (generate graphs on the fly)
cd /root/KarmaDock/utils
python -u virtual_screening.py
--ligand_smi ~/the/directory/for/ligand/library/smi
--protein_file ~/the/directory/for/target/protein/pdb
--crystal_ligand_file ~/the/directory/for/crystal/ligand/mol2/for/binding/pocket
--out_dir ~/path/for/recording/BindingPoses&DockingScores
--score_threshold 50
--batch_size 64
--random_seed 2023
--out_uncoorected
--out_corrected
e.g.,
cd /root/KarmaDock/utils
python -u virtual_screening.py --ligand_smi /root/KarmaDock/DEKOIS2/pdk1/active_decoys.smi --protein_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_protein.pdb --crystal_ligand_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_ligand.mol2 --out_dir /root/KarmaDock/DEKOIS2/pdk1/karmadocked --score_threshold 50 --batch_size 64 --random_seed 2023 --out_uncoorected --out_corrected