This is the official code repository for the paper Drug Discovery with Dynamic Goal-aware Fragments (ICML 2024).
In our paper, we introduce:
- Fragment-wise Graph Information Bottleneck (FGIB), a goal-aware fragment extraction method that uses graph information bottleneck (GIB) theory to construct a fragment vocabulary for target chemical properties.
- Goal-aware fragment Extraction, Assembly, and Modification (GEAM), a generative framework that consists of FGIB, soft actor-critic (SAC), and a genetic algorithm (GA), which are responsible for fragment extraction, fragment assembly, and fragment modification, respectively.
Run the following commands to install the dependencies:
conda create -n geam python==3.7
conda activate geam
conda install cudatoolkit=10.2
pip install torch==1.8.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter==2.0.8 torch-sparse==0.6.12 torch-cluster==1.5.9 -f https://data.pyg.org/whl/torch-1.8.1+cu102.html
pip install torch-geometric==2.0.4 rdkit requests urllib3==1.26.6 more_itertools gym==0.18
conda install -c conda-forge openbabel
conda install -c dglteam dgl-cuda10.2==0.8
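After installation, a quick sanity check can confirm that every dependency is visible to the interpreter. The snippet below is a minimal sketch and is not part of this repository: the list of import names is an assumption derived from the packages installed above, and `check_modules` is a hypothetical helper. It only checks that each module can be located and does not verify versions or CUDA support.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "torch", "torch_scatter", "torch_sparse", "torch_cluster",
    "torch_geometric", "rdkit", "requests", "urllib3",
    "more_itertools", "gym", "openbabel", "dgl",
]


def check_modules(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per module so that missing packages are easy to spot.
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:16s} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING most likely means the corresponding install command failed or was run outside the geam environment.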
Run the following command to preprocess the ZINC250k dataset:
python utils_fgib/data.py
We provide the pretrained FGIB models for the target proteins parp1, fa7, 5ht1b, braf, and jak2 (parp1.pt, fa7.pt, 5ht1b.pt, braf.pt, and jak2.pt, respectively) in the folder ckpt.
To train your own FGIB, run the following command:
python train_fgib.py -g ${gpu_id} -t ${target}
# e.g., python train_fgib.py -g 0 -t parp1
We provide the initial fragment vocabularies for the target proteins parp1, fa7, 5ht1b, braf, and jak2 (parp1.txt, fa7.txt, 5ht1b.txt, braf.txt, and jak2.txt, respectively) in the folder data.
To construct the initial fragment vocabulary from scratch, run the following command:
python get_frags.py -g ${gpu_id} -t ${target} -m ${gib_path} -v ${vocab_path}
# e.g., python get_frags.py -g 0 -t parp1 -m ckpt/parp1.pt -v data/parp1.txt
To generate molecules with GEAM, run the following command:
python run.py -g ${gpu_id} -t ${target} -m ${gib_path} -v ${vocab_path}
# e.g., python run.py -g 0 -t parp1 -m ckpt/parp1.pt -v data/parp1.txt
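To generate molecules for all five provided targets in one go, the command above can be wrapped in a small driver. The following is a sketch only, not a script shipped with this repository: `build_run_command` and `run_all` are hypothetical helpers, and the sketch assumes the provided checkpoints and vocabularies sit at ckpt/${target}.pt and data/${target}.txt. By default it only prints the commands; pass `dry_run=False` to actually execute them.

```python
import subprocess

# Target proteins for which checkpoints and vocabularies are provided.
TARGETS = ["parp1", "fa7", "5ht1b", "braf", "jak2"]


def build_run_command(target, gpu_id=0):
    """Build the run.py command line for one target using the documented flags."""
    return [
        "python", "run.py",
        "-g", str(gpu_id),
        "-t", target,
        "-m", f"ckpt/{target}.pt",   # assumed checkpoint location
        "-v", f"data/{target}.txt",  # assumed vocabulary location
    ]


def run_all(targets=TARGETS, gpu_id=0, dry_run=True):
    """Print (dry run) or execute the run.py command for each target in turn."""
    commands = [build_run_command(t, gpu_id) for t in targets]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop if one of the runs fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all()
```

The runs are sequential on a single GPU; distributing targets across GPUs would only require passing a different `gpu_id` per target.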
To evaluate the generated molecules, run the following command:
python eval.py ${file_name} -t ${target}
# e.g., python eval.py results/file_name.csv -t parp1
If you find this repository and our paper useful, please cite our work:
@article{lee2024GEAM,
author = {Seul Lee and Seanie Lee and Kenji Kawaguchi and Sung Ju Hwang},
title = {Drug Discovery with Dynamic Goal-aware Fragments},
journal = {Proceedings of the 41st International Conference on Machine Learning},
year = {2024}
}