This is the official code repository for the paper READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis (bioRxiv, 2023).
We also provide a web version for ease of use.
Download the necessary data folder READRetro_data from Zenodo to ensure proper execution of the code and demonstrations in this repository.
The directory structure of READRetro_data is as follows:
READRetro_data
├── data.sh
├── data
│   ├── model_train_data
│   └── multistep_data
├── model
│   ├── bionavi
│   ├── g2s
│   │   └── saved_models
│   ├── megan
│   └── retroformer
│       └── saved_models
├── result
└── scripts
Place READRetro_data into the READRetro directory (i.e., READRetro/READRetro_data) and run sh data.sh in READRetro_data to set up the data.
Ensure the data is correctly located in READRetro. Verify the following:
READRetro/retroformer/saved_models should match READRetro_data/model/retroformer/saved_models.
READRetro/g2s/saved_models should match READRetro_data/model/g2s/saved_models.
READRetro/data should match READRetro_data/data/multistep_data.
READRetro/result should match READRetro_data/result.
READRetro/scripts should match READRetro_data/scripts.
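The checklist above can also be checked programmatically. The following is a minimal sketch, not part of the repository: the helper names `relative_files` and `check_layout` are hypothetical, and it assumes the script is run from the READRetro directory. Because it is not stated here whether data.sh copies or moves the files, the sketch only compares file names against READRetro_data when the source directory still exists; otherwise it just checks that each target directory exists and is non-empty.

```python
from pathlib import Path

# Pairs taken from the checklist above:
# (location inside READRetro, source inside READRetro_data).
EXPECTED_PAIRS = [
    ("retroformer/saved_models", "READRetro_data/model/retroformer/saved_models"),
    ("g2s/saved_models", "READRetro_data/model/g2s/saved_models"),
    ("data", "READRetro_data/data/multistep_data"),
    ("result", "READRetro_data/result"),
    ("scripts", "READRetro_data/scripts"),
]


def relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def check_layout(repo_root, pairs=EXPECTED_PAIRS):
    """Map each problematic target directory to a list describing the problem.

    An empty dict means every target exists, is non-empty, and (when the
    source directory is still present) contains every file name found there.
    File contents are not compared, only names.
    """
    repo_root = Path(repo_root)
    problems = {}
    for target_rel, source_rel in pairs:
        target = repo_root / target_rel
        source = repo_root / source_rel
        if not target.is_dir():
            problems[target_rel] = ["<directory missing>"]
            continue
        have = relative_files(target)
        if not have:
            problems[target_rel] = ["<directory empty>"]
            continue
        if source.is_dir():
            missing = sorted(relative_files(source) - have)
            if missing:
                problems[target_rel] = missing
    return problems


if __name__ == "__main__":
    for directory, issues in check_layout(".").items():
        print(directory, "->", issues)
```

An empty result suggests the layout is in place; any listed entry points at a directory to inspect by hand.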
The directories READRetro_data/model/bionavi, READRetro_data/model/megan, and READRetro_data/data/model_train_data are required for reproducing the values in the manuscript.
Run the following commands to install the dependencies:
conda create -n readretro python=3.8
conda activate readretro
conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch
pip install easydict pandas tqdm numpy==1.22 OpenNMT-py==2.3.0 networkx==2.5
conda install -c conda-forge rdkit=2019.09
Alternatively, you can install the readretro package through pip:
conda create -n readretro python=3.8 -y
conda activate readretro
pip install readretro==1.2.0
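After installation, the environment can be sanity-checked against the pins from the manual commands above. This is a hedged sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper names are hypothetical, the prefix-matching rule is a simplification of how pip resolves `==`, and conda-installed packages such as rdkit may not expose metadata this way, so rdkit is left out. The pip package route may resolve different versions; in that case the report is informational only.

```python
from importlib import metadata

# Pins copied from the manual installation commands above.
# "torch" is the distribution name assumed for the conda "pytorch" package.
PINNED = {
    "torch": "1.12.0",
    "numpy": "1.22",
    "OpenNMT-py": "2.3.0",
    "networkx": "2.5",
}


def installed_version(dist_name):
    """Return the installed version string, or None if it cannot be found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pin):
    """True when `installed` equals the pin or extends it (1.22.0 matches 1.22)."""
    if installed is None:
        return False
    base = installed.split("+")[0]  # drop local build tags such as +cu113
    return base == pin or base.startswith(pin + ".")


def check_pins(pins=PINNED, lookup=installed_version):
    """Map each distribution name to (installed version, matches pin)."""
    report = {}
    for name, pin in pins.items():
        found = lookup(name)
        report[name] = (found, matches_pin(found, pin))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins().items():
        print(f"{name}: installed={found} pinned={PINNED[name]} ok={ok}")
```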
We provide the trained models through Zenodo.
You can also use your own models trained with the official code for Graph2SMILES (https://github.com/coleygroup/Graph2SMILES) and Retroformer (https://github.com/yuewan2/Retroformer).
More detailed instructions can be found in demo.ipynb.
Run the following commands to evaluate the single-step performance of the models:
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py # ensemble
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m retroformer # Retroformer
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m g2s -s 200 # Graph2SMILES
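To run the three evaluations above in one go, a small driver can loop over them. This is an illustrative sketch rather than a script shipped with the repository: `EVAL_CONFIGS`, `build_eval_command`, and `run_all` are hypothetical names, the flags are copied verbatim from the commands above, and the script is assumed to be started from the READRetro directory. By default it only prints what it would run.

```python
import os
import shlex
import subprocess
import sys

# Flag sets copied from the three commands above.
EVAL_CONFIGS = {
    "ensemble": [],
    "retroformer": ["-m", "retroformer"],
    "g2s": ["-m", "g2s", "-s", "200"],
}


def build_eval_command(config, python=sys.executable):
    """Return the argument list for one single-step evaluation run."""
    return [python, "eval_single.py", *EVAL_CONFIGS[config]]


def run_all(gpu_id=0, dry_run=True):
    """Run (or, with dry_run, only format) every configuration in turn.

    Returns a dict mapping configuration name to the command line string
    (dry run) or to the process return code (real run).
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    results = {}
    for name in EVAL_CONFIGS:
        cmd = build_eval_command(name)
        if dry_run:
            results[name] = shlex.join(cmd)
        else:
            results[name] = subprocess.run(cmd, env=env).returncode
    return results


if __name__ == "__main__":
    for name, line in run_all(gpu_id=0, dry_run=True).items():
        print(f"{name}: {line}")
```

Calling `run_all(gpu_id=0, dry_run=False)` executes the evaluations sequentially on the chosen GPU and returns their exit codes.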
Run the following command to plan paths of multiple products using multiprocessing:
CUDA_VISIBLE_DEVICES=${gpu_id} python run_mp.py
# e.g., CUDA_VISIBLE_DEVICES=0 python run_mp.py
You can modify other hyperparameters described in run_mp.py.
Lower num_threads if you run out of GPU capacity.
Run the following command to plan the retrosynthesis path of your own molecule:
CUDA_VISIBLE_DEVICES=${gpu_id} python run.py ${product}
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py 'O=C1C=C2C=CC(O)CC2O1'
run_readretro -rc ${retroformer_ckpt} -gc ${g2s_ckpt} ${product}
# e.g., run_readretro -rc retroformer/saved_models/biochem.pt -gc g2s/saved_models/biochem.pt 'O=C1C=C2C=CC(O)CC2O1'
# you can replace the checkpoints with your own trained checkpoints of retroformer and g2s
# you should set the corresponding vocab file as an option if you replace the checkpoints
You can modify other hyperparameters described in run.py.
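SMILES strings often contain characters the shell treats specially, such as parentheses and `#`, which is why the example above wraps the product in single quotes. When generating commands for several products, the quoting can be delegated to `shlex.quote` from the standard library. The helper name `run_command_line` below is hypothetical and only formats the command; it does not run anything.

```python
import shlex


def run_command_line(product, gpu_id=0):
    """Return a shell line equivalent to the run.py example above.

    shlex.quote adds single quotes only when the SMILES string contains
    characters that the shell would otherwise interpret.
    """
    return "CUDA_VISIBLE_DEVICES={} python run.py {}".format(
        int(gpu_id), shlex.quote(product)
    )


if __name__ == "__main__":
    for smiles in ["O=C1C=C2C=CC(O)CC2O1", "C#N"]:
        print(run_command_line(smiles))
```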
Run the following command to evaluate the planned paths of the test molecules:
python eval.py ${save_file}
# e.g., python eval.py result/debug.txt
You can reproduce the figures and tables presented in the paper or train your own models by utilizing the provided demo.ipynb.