Description

This repository contains an easy-to-use Python function for the KM prediction model from our paper "Deep learning allows genome-scale prediction of Michaelis constants from structural features". Please note that the provided model is not identical to the one presented in the paper: it uses slightly different enzyme representations. Instead of the UniRep model, we use the ESM-1b model to create the enzyme representations. ESM-1b has been shown to outperform UniRep; it is built on a more recent natural language processing architecture (a transformer network instead of an LSTM).

Predicting KM values for enzyme-substrate pairs

The KM prediction model was trained only with natural enzyme-substrate pairs. Hence, the model is not suited to detecting non-substrates; it should only be used to predict the KM value when the substrate of an enzyme is already known. Moreover, we trained our model only with wild-type enzymes. Therefore, we do not expect the model to be good at predicting the effect of single amino acid mutations, as it was not trained to do so.

Using KEGG Compound IDs as substrate representations

If you wish to use KEGG Compound IDs as inputs for the substrates, you need to unzip a zipped file called "mol-files", which is in the folder "data". The unzipped folder "mol-files" has to be stored in the folder "data".
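
The unzip step can also be done from Python. The following is a minimal sketch, not part of the repository's code: the function name unzip_mol_files is hypothetical, the archive file name "mol-files.zip" is an assumption (the text above only names the archive "mol-files"), and the sketch assumes the archive contains a top-level folder "mol-files".

```python
import zipfile
from pathlib import Path


def unzip_mol_files(data_dir="data", archive_name="mol-files.zip"):
    """Extract the mol-files archive into the data folder.

    Assumptions (not confirmed by the repository): the archive is named
    "mol-files.zip" and contains a top-level folder "mol-files".
    """
    data_dir = Path(data_dir)
    archive = data_dir / archive_name
    if not archive.is_file():
        raise FileNotFoundError(f"Archive not found: {archive}")

    # Extract next to the archive so the folder ends up at data/mol-files.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)

    target = data_dir / "mol-files"
    if not target.is_dir():
        raise RuntimeError(f"Expected folder {target} after extraction")
    return target
```

Calling unzip_mol_files() from the repository root would then leave the folder at data/mol-files, which is where the text above says it has to be stored.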

Alternatively, you can use InChI strings and SMILES strings as substrate representations.
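
To illustrate the three substrate input types, here is a small sketch that guesses which type a given string is. It is a heuristic for illustration only and not necessarily how the prediction function distinguishes its inputs. The helper names substrate_input_type and kegg_mol_path are hypothetical, and the file naming "<KEGG ID>.mol" inside "mol-files" is an assumption.

```python
import re
from pathlib import Path

# KEGG Compound IDs consist of the letter "C" followed by five digits.
KEGG_COMPOUND_ID = re.compile(r"^C\d{5}$")


def substrate_input_type(substrate):
    """Guess whether a string is a KEGG Compound ID, an InChI, or a SMILES."""
    s = substrate.strip()
    if KEGG_COMPOUND_ID.match(s):
        return "KEGG"
    if s.startswith("InChI="):
        return "InChI"
    # Anything else is treated as SMILES; validity is not checked here.
    return "SMILES"


def kegg_mol_path(kegg_id, data_dir="data"):
    """Build the assumed location of the mol file for a KEGG Compound ID."""
    return Path(data_dir) / "mol-files" / f"{kegg_id}.mol"
```

For example, "C00031" is classified as a KEGG Compound ID, "InChI=1S/H2O/h1H2" as an InChI string, and "CCO" as a SMILES string.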

Predicting KM values for BiGG genome-scale metabolic networks

We added two Jupyter notebooks in the folder "code" ("01 BiGG - ..." and "02 BiGG - ...") that contain code to calculate KM predictions for genome-scale metabolic networks.

Requirements

python 3.7
tensorflow 2.3.1
jupyter
pandas 1.1.3
torch 1.7.1
numpy
rdkit 2020.09.1
fair-esm 0.3.1
py-xgboost 1.3.1

The listed packages can be installed using pip and conda:

pip install torch
pip install numpy
pip install tensorflow
pip install fair-esm
conda install -c conda-forge py-xgboost=1.3.3
conda install -c rdkit rdkit
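
Because the pip commands above do not pin versions, the installed versions may differ from the ones listed under Requirements. The following sketch compares installed versions against that list. It is not part of the repository; the names PINNED, installed_version and check_requirements are hypothetical. It only covers the pip-installable packages with a listed version, since conda-installed packages such as rdkit and py-xgboost may be registered under different distribution names.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python >= 3.8
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# Versions copied from the Requirements list above.
PINNED = {
    "tensorflow": "2.3.1",
    "pandas": "1.1.3",
    "torch": "1.7.1",
    "fair-esm": "0.3.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_requirements(pinned=PINNED, lookup=installed_version):
    """Map each package to (wanted, found, status)."""
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            status = "missing"
        # Ignore local build tags such as "+cu110" when comparing.
        elif found.split("+")[0] == wanted:
            status = "ok"
        else:
            status = "mismatch"
        report[name] = (wanted, found, status)
    return report


if __name__ == "__main__":
    for name, (wanted, found, status) in check_requirements().items():
        print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A "mismatch" does not necessarily mean the function will fail, only that the environment differs from the one listed above.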

Content

The folder "code" contains a Jupyter notebook, "Tutorial KM prediction.ipynb", with an example of how to use the KM prediction function.

Problems/Questions

If you face any issues or problems, please open an issue.

AlexanderKroll/KM_prediction_function

Folders and files

Latest commit

History

Repository files navigation

Description

Predicting Km values for enzyme-substrate pairs

Using KEGG Compound IDs as substrate representations

Predicting Km values for BiGG genome-scale metabolic network

Requirements

Content

Problems/Questions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages