Equivariant Graph Neural Network-based Accurate and Ultra-fast Virtual Screening of Small Molecules Targeting miRNA-Protein Complex

Software Requirements

AutoDock Vina

Python

OS Requirements

The development version of the package has been tested on Linux (Ubuntu 22.04).

Python Dependencies

Dependencies for miRPVS:

pytorch
pyg
rdkit=2022.09.1

Installation Guide

Download this repo

git clone https://github.com/huabei/miRPVS.git

Install env

You can install the environment from the YAML file:

cd miRPVS
conda env create -f requirements.yaml
conda activate miRPVS

This project uses ashleve/lightning-hydra-template as its base.

Dataset Download and Processing

The full docking dataset can be downloaded from the ZINC20 Tranches. We chose the subset of drug-like molecules that have 3D structures as the docking dataset.

The molecules were downloaded in pdbqt format, which AutoDock Vina can use directly for docking.

Because the dataset is large, we build an index file for the whole dataset to make statistics and sampling easier. Running the following commands generates an index file for each subfolder, as well as a structural information file for the molecules.

# assume the ZINC data is placed in the zinc20_drug-like_3d folder
cd data
mkdir zinc20_drug-like_3d
cd zinc20_drug-like_3d
# run the ZINC download script here
# after the download completes, build the index of all molecules
cd ..
ls zinc20_drug-like_3d | xargs -I {} python create_zinc20_hdf5.py {}
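
To illustrate what a per-subfolder index can look like, here is a minimal sketch. It is not the code of create_zinc20_hdf5.py: it writes a plain CSV instead of HDF5 so that it needs only the standard library, and the function name build_index, the column names, and the use of the file stem as molecule ID are assumptions made for this example.

```python
import csv
from pathlib import Path


def build_index(subfolder: Path, index_path: Path) -> int:
    """Write one CSV row (molecule id, relative path) per .pdbqt file.

    Returns the number of molecules that were indexed.
    """
    # Sort so the index is reproducible across runs and machines.
    files = sorted(subfolder.rglob("*.pdbqt"))
    with index_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["mol_id", "rel_path"])
        for path in files:
            # Assumption: the file stem is usable as a molecule identifier.
            writer.writerow([path.stem, path.relative_to(subfolder).as_posix()])
    return len(files)
```

With such an index, counting or sampling molecules only requires reading the index file rather than walking the whole directory tree again.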

Sample Ligand and Docking

Sample

Use the following command to extract 1/600 of all molecules for training the model, and then randomly sample 10k of the extracted molecules for optimizing the docking parameters.

cd data
python sample_data.py zinc20_drug-like_3d
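
The two-stage sampling described above can be sketched as follows. This is only an illustration under stated assumptions, not the logic of sample_data.py: the function name sample_molecules, its parameters, and the fixed seed are all hypothetical.

```python
import random


def sample_molecules(mol_ids, fraction=1 / 600, n_tune=10_000, seed=0):
    """Return (training subset, tuning subset drawn from the training subset)."""
    if not mol_ids:
        return [], []
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    n_train = max(1, round(len(mol_ids) * fraction))
    train = rng.sample(mol_ids, n_train)
    # The tuning set cannot be larger than the pool it is drawn from.
    tune = rng.sample(train, min(n_tune, len(train)))
    return train, tune
```

Drawing the tuning set from the training subset, rather than from the full library, means the docking parameters are optimized on molecules that will be docked anyway.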

Docking

Docking a large number of molecules is best done across multiple compute nodes. If you are using a Slurm cluster, you can refer to data/submit_batch_dock_job.py for how to distribute the docking tasks.
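
As a rough sketch of how a ligand list might be divided among jobs, the helper below splits it into near-equal batches and builds one AutoDock Vina command per ligand. It does not reproduce data/submit_batch_dock_job.py; the function names and all file names are hypothetical. Only the Vina options --receptor, --ligand, --config and --out are taken from the Vina command line.

```python
def split_into_batches(ligands, n_jobs):
    """Split ligands into n_jobs contiguous batches whose sizes differ by at most one."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(ligands), n_jobs)
    batches, start = [], 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one additional ligand.
        size = base + (1 if job < extra else 0)
        batches.append(ligands[start:start + size])
        start += size
    return batches


def vina_command(receptor, ligand, config, out):
    """Build an AutoDock Vina command as an argument list (nothing is executed here)."""
    return ["vina", "--receptor", receptor, "--ligand", ligand,
            "--config", config, "--out", out]
```

Each batch can then be handed to one cluster job, which loops over its ligands and runs the corresponding command.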

Constructing the training dataset

Once docking has finished for all molecules, and assuming that all docking energies are saved in the dock_results folder, run the following command to construct the training dataset.

cd data
python construct_train_dataset_from_dock_output.py dock_results

This creates a dataset folder that stores the raw data.
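
For orientation, the sketch below shows one way to read a docking energy from a Vina output file. It assumes the results are Vina output pdbqt files, in which each pose is preceded by a "REMARK VINA RESULT:" line and the first pose is the best-scoring one. The actual format expected by the script above may differ, and the name best_vina_energy is hypothetical.

```python
import re

# Matches the affinity (first number) on a "REMARK VINA RESULT:" line.
_RESULT = re.compile(r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)


def best_vina_energy(pdbqt_text):
    """Return the affinity in kcal/mol of the first pose, or None if no result line exists."""
    match = _RESULT.search(pdbqt_text)
    return float(match.group(1)) if match else None
```

Returning None for files without a result line makes it easy to skip ligands whose docking run failed.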

Train model

Config folder

You just need to set your own hyperparameters in configs/experiment and then run:

python src/train.py experiment=egnn

The configuration used for this work is also stored in the configs/experiment directory and can be used directly. The configs folder contains the hyperparameter configuration files for model training, which can be changed as appropriate.
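
Because the project is based on a Hydra template, individual values can usually also be overridden on the command line as key=value pairs instead of editing the YAML files. The helper below is a small sketch that assembles such a command. The function name train_command is hypothetical, and the override keys in the usage note are assumptions that may not exist in this project's configs.

```python
import shlex


def train_command(experiment, overrides=None):
    """Build a training command with Hydra-style key=value overrides."""
    args = ["python", "src/train.py", f"experiment={experiment}"]
    # Sort the keys so the same overrides always give the same command.
    for key, value in sorted((overrides or {}).items()):
        args.append(f"{key}={value}")
    # shlex.join quotes any argument that would be unsafe in a shell.
    return shlex.join(args)
```

For example, train_command("egnn", {"trainer.max_epochs": 50}) builds a command that runs the egnn experiment with a different epoch limit, provided that key exists in the trainer config.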

Model Tuning

Similarly, modify the configuration file configs/experiment/egnn_tune.yaml and run the following command to perform a hyperparameter search for the model.

python src/train.py experiment=egnn_tune

Eval

Configure configs/eval.yaml with your data locations, model parameter paths, etc., and then run:

python src/eval.py
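
This README does not state which metrics src/eval.py reports. As an assumption, the sketch below shows two metrics that are commonly used when predicted energies are compared with docked energies: root-mean-square error and the Pearson correlation coefficient. The function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)
```

Both functions expect non-empty inputs, and pearson_r additionally requires that neither sequence is constant.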

Screen The Whole Dataset

Configure configs/predict.yaml with your data locations, model parameter paths, etc., and then run:

python src/predict.py
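
After screening, the usual next step is to keep the molecules with the best predicted scores. The sketch below assumes that predictions are available as (molecule ID, predicted energy) pairs and that a lower energy means stronger predicted binding. The output format of src/predict.py is not described here, so the pair format and the name top_hits are assumptions.

```python
import heapq


def top_hits(predictions, k):
    """Return the k (mol_id, energy) pairs with the lowest predicted energy."""
    # nsmallest keeps only k candidates in memory, so `predictions` can be
    # a generator that streams over a very large library.
    return heapq.nsmallest(k, predictions, key=lambda pair: pair[1])
```

The selected hits can then be re-docked with AutoDock Vina to confirm the predicted energies.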
