Plug & Play Directed Evolution

Official implementation of Plug & Play Directed Evolution (PPDE), a fast MCMC-based sampler for mixing and matching unsupervised and supervised sequence models for machine-learning-based directed evolution of proteins.

Please check out https://github.com/NREL/EvoProtGrad for an easy-to-use library that implements PPDE, is installable via pip, and supports 🤗 HuggingFace protein language models.

Install

Create the conda env with necessary dependencies:

conda env create -f environment.yml

Activate the environment:

conda activate ppde

Install the package:

poetry install

Run MNIST experiments

Simulated annealing sampler

python3 scripts/mnist_sum.py --seed 1 --sampler simulated_annealing --unsupervised_expert ebm --energy_function product_of_experts --simulated_annealing_temp 10 --muts_per_seq_param 5 --energy_lamda 30 --n_iters 20000 --log_every 50 --wild_type 1

MALA-approx sampler

python3 scripts/mnist_sum.py --seed 1 --sampler MALA-approx --unsupervised_expert ebm --energy_function product_of_experts --diffusion_step_size 0.1 --diffusion_relaxation_tau 0.9 --energy_lamda 5 --n_iters 20000 --log_every 50 --wild_type 1

CMA-ES sampler

python3 scripts/mnist_sum.py --seed 1 --sampler CMAES --unsupervised_expert ebm --energy_function product_of_experts --energy_lamda 20 --cmaes_initial_variance 0.1 --n_iters 20000 --log_every 50 --wild_type 1

PPDE sampler

python3 scripts/mnist_sum.py --seed 1 --sampler PPDE --unsupervised_expert ebm --energy_function product_of_experts --ppde_pas_length 10 --energy_lamda 10 --n_iters 20000 --log_every 50 --wild_type 1

By default, the script will save metrics and visualizations to results/mnist_sum/.

Training MNIST models

See ./scripts/train_mnist.sh for instructions on training the MNIST models. The Denoising Autoencoder (DAE) model is trained with ./scripts/train_binary_mnist_dae.py, and the supervised experts are trained with ./scripts/train_binary_mnist_regression.py.

Run Protein experiments

UPDATE: We discovered that the PPDE protein sampler was running with a "soft" maximum number of mutations from the wild type: the sampler would reset the Markov chain to the wild type whenever a mutation proposal was rejected. We have corrected the accept/reject step (L77 in ppde/protein_samplers/ppde.py) and added a proper "hard" maximum on the number of mutations. This is easily implemented in our sampler by setting to negative infinity the logits of any mutation that would result in an edit distance from the wild type greater than a threshold.
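The "hard" constraint described above can be illustrated with a minimal sketch. This is not the code in ppde/protein_samplers/ppde.py. The function name `mask_logits`, the list-of-lists logit layout, and the alphabet ordering are all hypothetical. The sketch also assumes that only substitutions are proposed, so the edit distance reduces to the Hamming distance.

```python
import math

# Assumed alphabet ordering; the repository may order residues differently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mask_logits(logits, seq, wild_type, nmut_threshold):
    """Return a copy of `logits` with disallowed substitutions set to -inf.

    logits[i][k] scores the proposal "set position i to AMINO_ACIDS[k]".
    A proposal is disallowed if the resulting sequence would differ from
    the wild type at more than `nmut_threshold` positions.
    """
    d = hamming(seq, wild_type)
    masked = []
    for i, row in enumerate(logits):
        new_row = []
        for residue, logit in zip(AMINO_ACIDS, row):
            # Distance after the substitution: drop position i's current
            # contribution, then add the contribution of the new residue.
            new_d = d - (seq[i] != wild_type[i]) + (residue != wild_type[i])
            new_row.append(logit if new_d <= nmut_threshold else -math.inf)
        masked.append(new_row)
    return masked
```

After a softmax over the masked logits, the disallowed substitutions have zero proposal probability, so the chain cannot leave the allowed radius around the wild type.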

Overall, this improves PPDE's diversity scores, average number of mutations in the population, and predicted activity scores, with a reasonable drop in evolutionary density scores due to increased exploration away from the wild type. See this PDF for updated versions of Table 1 and Table 2 (these use a "hard" maximum of 10 mutations from the wild type, which can be set with the --nmut_threshold argument in our code). To replicate the PPDE protein experiment results from the paper, set the flag --paper_results to use the "soft" maximum number of mutations constraint. Note that these flags only affect the PPDE protein sampler (not the baselines or the MNIST experiments).
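To make the difference between the two behaviours concrete, here is a small sketch of a single Metropolis-Hastings accept/reject step. It is an illustration under stated assumptions, not the repository's implementation. The function `mh_step` and its arguments are hypothetical, and the `paper_results` argument only mimics the spirit of the --paper_results flag.

```python
import math
import random


def mh_step(current, proposal, log_accept_ratio, wild_type,
            paper_results=False, rng=random):
    """One accept/reject step for a single chain (hypothetical helper).

    On acceptance the chain moves to `proposal`. On rejection:
      * paper_results=False: the chain stays at `current`, which is the
        standard Metropolis-Hastings behaviour.
      * paper_results=True: the chain resets to `wild_type`, which mimics
        the "soft" maximum-mutations behaviour described above.
    """
    # Acceptance probability is min(1, exp(log_accept_ratio)).
    accept_prob = math.exp(min(0.0, log_accept_ratio))
    if rng.random() < accept_prob:
        return proposal
    return wild_type if paper_results else current
```

Under the reset-on-reject rule, every rejection discards the mutations accumulated so far, which is consistent with the reduced exploration away from the wild type noted above.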

protein	unsupervised expert	$\lambda$
PABP_YEAST_Fields2013	potts	5
UBE4B_MOUSE_Klevit2013-nscor_log2_ratio	potts	0.5
GFP_AEQVI_Sarkisyan2016	potts	15
PABP_YEAST_Fields2013	transformer	5
UBE4B_MOUSE_Klevit2013-nscor_log2_ratio	transformer	3
GFP_AEQVI_Sarkisyan2016	transformer	1

See ./scripts/run_protein_samplers.sh or:

Random sampler

python3 scripts/directed_evolution.py --seed 1 --sampler Random --unsupervised_expert potts --energy_function product_of_experts --energy_lamda 5 --n_iters 10000 --log_every 50 --protein PABP_YEAST_Fields2013 --msa_path data/proteins/PABP_YEAST.a2m

Simulated annealing sampler

python3 scripts/directed_evolution.py --seed 1 --sampler simulated_annealing --unsupervised_expert potts --energy_function product_of_experts --energy_lamda 5 --n_iters 10000 --log_every 50 --protein PABP_YEAST_Fields2013 --msa_path data/proteins/PABP_YEAST.a2m

MALA-approx sampler

python3 scripts/directed_evolution.py --seed 1 --sampler MALA-approx --unsupervised_expert potts --energy_function product_of_experts --energy_lamda 5 --n_iters 10000 --log_every 50 --protein PABP_YEAST_Fields2013 --msa_path data/proteins/PABP_YEAST.a2m

CMA-ES sampler

python3 scripts/directed_evolution.py --seed 1 --sampler CMAES --unsupervised_expert potts --energy_function product_of_experts --energy_lamda 5 --n_iters 1000 --log_every 50 --protein PABP_YEAST_Fields2013 --msa_path data/proteins/PABP_YEAST.a2m

PPDE sampler

python3 scripts/directed_evolution.py --seed 1 --sampler PPDE --unsupervised_expert potts --energy_function product_of_experts --energy_lamda 5 --n_iters 100 --log_every 50 --protein PABP_YEAST_Fields2013 --msa_path data/proteins/PABP_YEAST.a2m

By default, the script will save metrics in .npy format to results/proteins/$PROTEIN. Compute metrics with ./scripts/make_figures.py.

Cite the work

@article{emami2023plug,
	author={Emami, Patrick and Perreault, Aidan and Law, Jeffrey and Biagioni, David and St. John, Peter C},
	title={Plug \& Play Directed Evolution of Proteins with Gradient-based Discrete MCMC},
	journal={Machine Learning: Science and Technology},
	url={http://iopscience.iop.org/article/10.1088/2632-2153/accacd},
	year={2023}
}
