Automated 3D Pre-training for Molecular Property Prediction

This repository provides the source code for 'Adaptive 3D Pre-Training for Molecular Property Prediction'. 3D PGT aims to use geometric information to design pre-training tasks to enhance the property prediction tasks of downstream 2D molecular graphs. The whole process consists of two stages：

In the 3D pre-training stage, 3D PGT performs several generative pre-training tasks on the dataset containing 3D information
In the finetune stage, 3D PGT fine-tunes the pre-trained model on molecular datasets containing only 2D topological structures and performs property prediction tasks

Requirements

python>=3.7, pytorch=1.10.0, pytorch_geometric==2.0.4, numpy>=1.21.2, pandas>=1.3.4
rdkit>=2022.9.3, scikit-learn>=1.1.2, ogb>=1.3.3

Dataset Preprocessing

For dataset preprocessing in GEOM, please use the following commands:

python GEOM_dataset_preparation.py -n_mol $N_MOL --n_upper $N_UPPER --data_folder $ SLURM_TMPDIR

For PCQM4Mv2 dataset, it is a recently published dataset for the OGB Large Scale Challenge built to aide the development of state-of-the-art machine learning models for molecular property prediction. The task is for the quantum chemistry task of predicting the HOMO-LUMO energy gap of a molecule.

For 3D pre-training in 3D-PGT

You can implement adaptive 3D pre-training by running the following code:

# Running 3D PGT for pre-training on GEOM dataset
python pretrain_main.py --cfg configs/GPS/pre-train_Drugs.yaml wandb.use False

# Running 3D PGT for pre-training on PCQM4Mv2 dataset
python pretrain_main.py --cfg configs/GPS/pcqm4m-GPS.yaml wandb.use False

For Downstream tasks

You can use the following code to finetune downstream tasks, but pay attention to setting the addresses of downstream task datasets and pre-trained model files in the config file.

Running 3D PGT for finetuning on GEOM-Drugs dataset
python main.py --cfg configs/GPS/finetune_Drugs.yaml

Cite

Please kindly cite our paper if you use this code:

@article{wang2023automated,
  title={Automated 3D Pre-Training for Molecular Property Prediction},
  author={Wang, Xu and Zhao, Huan and Tu, Weiwei and Yao, Quanming},
  journal={arXiv preprint arXiv:2306.07812},
  year={2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
configs		configs
datasets		datasets
graphgps		graphgps
run		run
tests		tests
unittests		unittests
8benchmark.sh		8benchmark.sh
8benchmark_muv+hiv.sh		8benchmark_muv+hiv.sh
8benchmark_sider+clintox.sh		8benchmark_sider+clintox.sh
Esol.sh		Esol.sh
GEOM_dataset_preparation.py		GEOM_dataset_preparation.py
LICENSE		LICENSE
Method_Framework.png		Method_Framework.png
README.md		README.md
Supplementary_Appendix.pdf		Supplementary_Appendix.pdf
extended_feature.py		extended_feature.py
finetune_l4_1k.sh		finetune_l4_1k.sh
finetune_main.py		finetune_main.py
finetune_p4_1k.sh		finetune_p4_1k.sh
lipo.sh		lipo.sh
local_feature.py		local_feature.py
main.py		main.py
pretrain_main.py		pretrain_main.py
setup.py		setup.py
train_class_8.sh		train_class_8.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated 3D Pre-training for Molecular Property Prediction

Requirements

Dataset Preprocessing

For 3D pre-training in 3D-PGT

For Downstream tasks

Cite

About

Releases

Packages

Languages

License

LARS-research/3D-PGT

Folders and files

Latest commit

History

Repository files navigation

Automated 3D Pre-training for Molecular Property Prediction

Requirements

Dataset Preprocessing

For 3D pre-training in 3D-PGT

For Downstream tasks

Cite

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages