Equivalent Graph Neural Network-based Accurate and Ultra-fast Virtual Screening of Small Molecules Targeting miRNA-Protein Complex
If you find this work useful, please cite:
Equivalent Graph Neural Network-based Virtual Screening of Ultra-large Chemical Libraries Targeting miRNA-Protein Complex. Huabei Wang, Zhimin Zhang, Guangyang Zhang, Ming Wen*, and Hongmei Lu*. Will be published in: DOI:
autodock vina
python
The development version of the package has been tested on Linux (Ubuntu 22.04).
Dependencies for miRPVS:
pytorch
pyg
rdkit=2022.09.1
git clone https://github.com/huabei/miRPVS.git
You can create the conda environment from the YAML file:
cd miRPVS
conda env create -f requirements.yaml
conda activate miRPVS
This project uses ashleve/lightning-hydra-template as its base.
The entire docking dataset can be downloaded from the ZINC20 Tranches. We chose the drug-like subset that includes 3D structures as the docking dataset.
The molecules were downloaded directly in pdbqt format, which AutoDock Vina can use for docking without conversion.
Due to the large amount of data, we can construct an index file for the entire dataset in order to facilitate statistics and sampling of the dataset. By running the following command, you can generate an index file for each subfolder, as well as a structural information file for the molecules.
# assume the ZINC data is placed in the zinc20_drug-like_3d folder
cd data
mkdir zinc20_drug-like_3d
cd zinc20_drug-like_3d
# run the ZINC download script here.
# after the download completes, build the index of all molecules.
cd ..
ls zinc20_drug-like_3d | xargs -I {} python create_zinc20_hdf5.py {}
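To illustrate what an index with per-molecule structural information might contain, here is a minimal sketch. It is not the logic of create_zinc20_hdf5.py: the function names `summarize_pdbqt` and `build_index` are hypothetical, the result is a plain dictionary rather than an HDF5 file, and it assumes one molecule per .pdbqt file.

```python
from pathlib import Path


def summarize_pdbqt(text):
    """Count atoms and collect AutoDock atom types from PDBQT text.

    Assumption: the text describes a single molecule.
    """
    atom_types = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # The AutoDock atom type is the last field of a PDBQT atom record.
            atom_types.append(line.split()[-1])
    return {"n_atoms": len(atom_types), "atom_types": sorted(set(atom_types))}


def build_index(folder):
    """Map each molecule file name (without extension) to its summary."""
    index = {}
    for path in sorted(Path(folder).glob("*.pdbqt")):
        index[path.stem] = summarize_pdbqt(path.read_text())
    return index
```

An index like this lets you compute statistics (for example, the atom-count distribution) and draw samples without re-reading every structure file.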
Use the following command to extract 1/600 of all molecules to train the model and randomly sample 10k molecules out of the extracted molecules to optimize the docking parameters.
cd data
python sample_data.py zinc20_drug-like_3d
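The two-stage sampling described above can be sketched as follows. This is an illustration of the idea, not the code in sample_data.py; the function name `sample_for_training`, its parameters, and the fixed seed are assumptions.

```python
import random


def sample_for_training(molecule_ids, fraction=1 / 600, n_tune=10_000, seed=0):
    """Draw a training subset, then a smaller subset for tuning docking parameters.

    Returns (train_ids, tune_ids); tune_ids is always a subset of train_ids.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n_train = min(len(molecule_ids), max(1, round(len(molecule_ids) * fraction)))
    train_ids = rng.sample(molecule_ids, n_train)
    # Never request more tuning molecules than were extracted.
    tune_ids = rng.sample(train_ids, min(n_tune, len(train_ids)))
    return train_ids, tune_ids
```

Sampling the tuning set from the extracted molecules, rather than from the full library, means the molecules used to optimize the docking parameters already belong to the set that will be docked for training.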
Docking a large number of molecules is best done across multiple compute nodes. If you are using a Slurm cluster, you can refer to the file data/submit_batch_dock_job.py to distribute the docking tasks.
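As a rough sketch of how the docking workload might be divided among nodes, the snippet below splits a ligand list into near-equal batches and builds one AutoDock Vina command per ligand. It does not reproduce data/submit_batch_dock_job.py; `split_into_batches` and `vina_command` are hypothetical helpers, and the file paths in the usage example are placeholders.

```python
def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)  # the first `extra` batches get one more item
        batches.append(items[start:start + size])
        start += size
    return batches


def vina_command(receptor, ligand, config, out):
    """Build the argument list for one Vina run (suitable for subprocess.run)."""
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--config", config,  # search box and docking parameters live in this file
        "--out", out,
    ]
```

Each batch could then be written to its own job script, for example with `[vina_command("receptor.pdbqt", lig, "conf.txt", lig + ".out.pdbqt") for lig in batch]`.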
After obtaining the docking results for all molecules, and assuming that all the docking energies are saved in the dock_results folder, run the following command to construct the training dataset.
cd data
python construct_train_dataset_from_dock_output.py dock_results
This creates a dataset folder to store the raw data.
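For orientation, the sketch below shows one way a docking energy could be read from a Vina output file. It assumes the results are Vina output pdbqt files, in which each pose carries a `REMARK VINA RESULT:` line and poses are listed best first. The actual layout of dock_results and the logic of the dataset construction script may differ, and the function names here are hypothetical.

```python
def best_vina_energy(pdbqt_text):
    """Return the affinity (kcal/mol) of the first pose, or None if no result line exists."""
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # The line holds: affinity, RMSD lower bound, RMSD upper bound.
            return float(line.split(":", 1)[1].split()[0])
    return None


def collect_energies(outputs):
    """Map molecule name to best energy, skipping molecules without a result.

    `outputs` is a dict of molecule name to the text of its Vina output file.
    """
    energies = {}
    for name, text in outputs.items():
        energy = best_vina_energy(text)
        if energy is not None:
            energies[name] = energy
    return energies
```

The resulting name-to-energy mapping is the kind of label table a regression model can be trained against.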
config folder
You just need to configure your own hyperparameters in config/experiment and then run:
python src/train.py experiment=egnn
The configuration used for this work is also stored in the config/experiment directory and can be used directly. The config folder contains the hyperparameter configuration files for model training, which can be changed as appropriate.
Similarly, modify the configuration file config/experiment/egnn_tune.yaml and run the following command to perform a hyperparameter search for the model.
python src/train.py experiment=egnn_tune
Configure the config/eval.yaml file with your data locations, model parameter paths, etc., and then run:
python src/eval.py
Configure the config/predict.yaml file with your data locations, model parameter paths, etc., and then run:
python src/predict.py