Skip to content

Latest commit

 

History

History
62 lines (49 loc) · 3.03 KB

README.md

File metadata and controls

62 lines (49 loc) · 3.03 KB

Exploring Chemical Space with
Score-based Out-of-distribution Generation

This is the official code repository for the paper Exploring Chemical Space with Score-based Out-of-distribution Generation (ICML 2023), in which we propose Molecular Out-Of-distribution Diffusion (MOOD).

Contribution

  • We propose a novel score-based generative model for OOD generation, which overcomes the limited explorability of previous models by leveraging the novel OOD-controlled reverse-time diffusion.
  • Since the extended exploration space by the OOD control contains molecules that are chemically implausible, we propose a framework for molecule optimization that leverages the gradients of the property prediction network to confine the generated molecules to a novel yet chemically meaningful space.
  • We experimentally demonstrate that the proposed MOOD can generate novel molecules that are drug-like, synthesizable, and have high binding affinity for five protein targets, outperforming existing molecule generation methods.

Dependencies

Run the following commands to install the dependencies:

conda create -n mood python=3.8
conda activate mood
conda install -c pytorch pytorch==1.12.0 cudatoolkit=11.3
conda install -c conda-forge rdkit=2020.09 openbabel
pip install tqdm pyyaml pandas easydict networkx==2.6.3 numpy==1.20.3
chmod u+x scorer/qvina02

Running Experiments

1. Preparation

MOOD utilizes GDSS as its backbone diffusion model. In our paper, we utilized the pretrained gdss_zinc250k_v2.pth GDSS checkpoint, which is in the folder checkpoints/ZINC250k.

Run the following command to preprocess the ZINC250k dataset:

python data/preprocess.py

2. Training a Property Prediction Network $P_\phi$

We provide the pretrained property predictor networks ($P_\text{obj}=\hat{\text{DS}} \times \text{QED} \times \hat{\text{SA}}$) for target proteins parp1, fa7, 5ht1b, braf, and jak2, respectively (prop_parp1.pth, prop_fa7.pth, prop_5ht1b.pth, prop_braf.pth, and prop_jak2.pth, respectively), in the folder checkpoints/ZINC250k.

To train your own property predictor, run the following command:

CUDA_VISIBLE_DEVICES=${gpu_id} python main.py --type train --config prop_train

You can modify hyperparameters in config/prop_train.yaml.

3. Generation and Evaluation

To generate molecules, run the following command:

CUDA_VISIBLE_DEVICES=${gpu_id} python main.py --type sample --config sample

You can modify hyperparameters in config/sample.yaml.

Citation

If you find this repository and our paper useful, we kindly request to cite our work.

@article{lee2023MOOD,
  author    = {Seul Lee and Jaehyeong Jo and Sung Ju Hwang},
  title     = {Exploring Chemical Space with Score-based Out-of-distribution Generation},
  journal   = {Proceedings of the 40th International Conference on Machine Learning},
  year      = {2023}
}