This repository contains the code used to generate the results reported in the paper: RGCVAE: Relational Graph Conditioned Variational Autoencoder for Molecule Design.
This project uses conda environments. In the root folder you can find, for each model, the .yml file for configuring the conda environment and the .txt files for the pip environment. Note that some dependency versions can cause problems when configuring the environment. For this reason, although a setup.bash file is provided to configure each project, it is better to configure the environments manually.
The project is structured as follows:
- data: contains the code to execute to make the dataset;
- results: contains the checkpoints and the results;
- model: contains the code about the model;
- utils: contains all the utility code.
First, download the necessary files and configure the environment by running the following commands:
sh setup.bash install
conda activate rgcvae
To make the datasets, run the following commands:
cd data
python make_dataset.py --dataset [dataset]
where [dataset] can be one of:
- qm9
- qm9_long2
- zinc
- zinc_long2
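To build all four datasets in one go, the command above can be generated per dataset name. The following is a minimal sketch: the helper make_dataset_command is hypothetical (it is not part of this repository) and only assembles the command shown above, rejecting names that are not in the list.

```python
import shlex

# Dataset names listed in this README.
DATASETS = ("qm9", "qm9_long2", "zinc", "zinc_long2")


def make_dataset_command(dataset):
    """Return the argument list for building one dataset (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return ["python", "make_dataset.py", "--dataset", dataset]


if __name__ == "__main__":
    for name in DATASETS:
        # Only print the commands here; they are meant to be run from the data folder.
        print(shlex.join(make_dataset_command(name)))
```

To actually run a command from Python, pass the returned list to subprocess.run with cwd="data" and check=True.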
In order to train the model use:
python RGCVAE.py --dataset [dataset] --config '{"generation":0, "log_dir":"./results", "use_mask":false}'
In order to generate new molecules:
python RGCVAE.py --dataset [dataset] --restore results/[checkpoint].pickle --config '{"generation":1, "log_dir":"./results"}'
In order to reconstruct the molecules:
python RGCVAE.py --dataset [dataset] --restore results/[checkpoint].pickle --config '{"generation":2, "log_dir":"./results"}'
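The three commands above differ only in the "generation" value (0 to train, 1 to generate, 2 to reconstruct) and in whether --restore is passed. As a rough sketch, the command line can be assembled from Python so that the config is always valid JSON. The helper rgcvae_command and the MODES mapping are hypothetical names introduced for illustration; only the flags and config keys come from the commands above.

```python
import json
import shlex

# "generation" values as used in the commands above.
MODES = {"train": 0, "generate": 1, "reconstruct": 2}


def rgcvae_command(dataset, mode, checkpoint=None, **extra):
    """Assemble the RGCVAE.py argument list for one mode (hypothetical helper)."""
    config = {"generation": MODES[mode], "log_dir": "./results"}
    config.update(extra)  # e.g. use_mask=False
    cmd = ["python", "RGCVAE.py", "--dataset", dataset]
    if checkpoint is not None:
        cmd += ["--restore", checkpoint]
    # json.dumps writes Python False/True as JSON false/true.
    cmd += ["--config", json.dumps(config)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(rgcvae_command("qm9", "train", use_mask=False)))
    # The checkpoint path is a placeholder, exactly as in the commands above.
    print(shlex.join(rgcvae_command("qm9", "generate",
                                    checkpoint="results/[checkpoint].pickle")))
```

shlex.join quotes the JSON string for the shell, which avoids hand-written quoting mistakes.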
In order to analyze the results, we used the following environment: ComparisonsDGM.
In order to optimize a molecule use the following command:
python RGCVAE.py --dataset zinc_long2 --restore results/[checkpoint].pickle --config '{"generation":1, "use_mask":false, "suffix":"opt", "optimization_step": 20, "number_of_generation":100, "prior_learning_rate":0.3, "use_argmax_nodes":true, "use_argmax_bonds":true}'
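Because the optimization config is long, it is easy to break its quoting or to write Python-style True/False by mistake. The sketch below checks a hand-written config string before launching a run. It assumes that RGCVAE.py parses --config as JSON, which the lowercase true/false in the command above suggests but which this README does not state; parse_config is a hypothetical helper.

```python
import json

# The config string from the optimization command above.
CONFIG_TEXT = (
    '{"generation":1, "use_mask":false, "suffix":"opt", '
    '"optimization_step": 20, "number_of_generation":100, '
    '"prior_learning_rate":0.3, "use_argmax_nodes":true, '
    '"use_argmax_bonds":true}'
)


def parse_config(text):
    """Parse a --config string, raising ValueError if it is not a JSON object."""
    config = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(config, dict):
        raise ValueError("--config must be a JSON object")
    return config


if __name__ == "__main__":
    for key, value in parse_config(CONFIG_TEXT).items():
        print(f"{key} = {value!r}")
```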
We will soon publish the pre-processed datasets, pre-trained models and generated molecules.
NOTE: Some functions are extracted from the following source code.
License: MIT