PROTAC-Degradation-Predictor

title	emoji	colorFrom	colorTo	sdk	sdk_version	app_file	pinned	license
PROTAC-Degradation-Predictor	🧬	pink	green	gradio	4.37.2	app.py	false	mit

A machine learning-based tool for predicting PROTAC protein degradation activity.

📚 Table of Contents

Data Curation
Installation
Documentation and Usage
Training
Citation
License

📝 Data Curation

The code for data curation can be found in the Jupyter notebook data_curation.ipynb.

The folder data/studies contains the training and test data used in each study reported in our paper. The label column that is used for predictions is named "Active (Dmax 0.6, pDC50 6.0)" and contains binary values.
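The label name suggests that a PROTAC counts as active when both thresholds are met. The sketch below illustrates that reading. It assumes Dmax is expressed as a fraction between 0 and 1, that pDC50 is the negative base-10 logarithm of DC50 in molar units, and that both comparisons are inclusive. The helper names are hypothetical and not part of the package; the authoritative definition is the one in data_curation.ipynb and the paper.

```python
import math

# Name of the binary label column, as given in this README.
LABEL_COLUMN = "Active (Dmax 0.6, pDC50 6.0)"


def pdc50_from_dc50_nm(dc50_nm: float) -> float:
    """Convert a DC50 given in nanomolar to pDC50 = -log10(DC50 in molar)."""
    return -math.log10(dc50_nm * 1e-9)


def is_active(
    dmax: float,
    pdc50: float,
    dmax_threshold: float = 0.6,
    pdc50_threshold: float = 6.0,
) -> bool:
    """Assumed labeling rule: active only if both thresholds are reached.

    Whether the original curation uses >= or > is an assumption here.
    """
    return dmax >= dmax_threshold and pdc50 >= pdc50_threshold


# A DC50 of 100 nM corresponds to a pDC50 of about 7, so with Dmax = 0.8
# this hypothetical compound would be labeled active under the rule above.
example_label = is_active(dmax=0.8, pdc50=pdc50_from_dc50_nm(100.0))
```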

🚀 Installation

To install the package, open your terminal and run the following commands:

git clone --branch=main --depth=1 https://github.com/ribesstefano/PROTAC-Degradation-Predictor.git
cd PROTAC-Degradation-Predictor
pip install .

The package has been developed on a Linux machine with Python 3.10.8. It is recommended to use a virtual environment to avoid conflicts with other packages.

🎯 Documentation and Usage

The package documentation can be found here. For a walkthrough on how to use the package, please refer to the tutorial notebook protac_degradation_predictor_tutorial.ipynb.

After installing the package, you can use it as follows:

import protac_degradation_predictor as pdp

protac_smiles = 'Cc1ncsc1-c1ccc(CNC(=O)[C@@H]2C[C@@H](O)CN2C(=O)[C@@H](NC(=O)COCCCCCCCCCOCC(=O)Nc2ccc(C(=O)Nc3ccc(F)cc3N)cc2)C(C)(C)C)cc1'
e3_ligase = 'VHL'
target_uniprot = 'P04637'
cell_line = 'HeLa'

active_protac = pdp.is_protac_active(
    protac_smiles,
    e3_ligase,
    target_uniprot,
    cell_line,
)

print(f'The given PROTAC is: {"active" if active_protac else "inactive"}')

This example demonstrates how to predict the activity of a PROTAC molecule. The is_protac_active function takes the SMILES string of the PROTAC, the E3 ligase, the UniProt ID of the target protein, and the cell line as inputs. It returns whether the PROTAC is active or not.

The function supports batch computation by passing lists of SMILES strings, E3 ligases, UniProt IDs, and cell lines. In this case, it returns a list of booleans indicating the activity of each PROTAC.
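When the inputs are stored per compound rather than per field, they need to be split into four parallel lists before the batch call. The following is a minimal sketch of that step. The helper predict_records and the Record alias are hypothetical and not part of the package; the sketch only assumes that the predictor accepts four lists in the order described above and returns one boolean per PROTAC.

```python
from typing import Callable, List, Sequence, Tuple

# One entry per PROTAC: (SMILES, E3 ligase, UniProt ID, cell line).
Record = Tuple[str, str, str, str]


def predict_records(
    records: Sequence[Record],
    predict_fn: Callable[..., Sequence[bool]],
) -> List[bool]:
    """Split per-compound records into parallel lists and call predict_fn once."""
    if not records:
        return []
    # zip(*records) transposes rows into four columns.
    smiles, e3_ligases, uniprots, cell_lines = (list(col) for col in zip(*records))
    return list(predict_fn(smiles, e3_ligases, uniprots, cell_lines))


# Intended usage with the installed package (not executed here):
#   import protac_degradation_predictor as pdp
#   records = [(protac_smiles, 'VHL', 'P04637', 'HeLa')]
#   predictions = predict_records(records, pdp.is_protac_active)
```

Passing the predictor as an argument keeps the helper independent of the package, so it can be exercised with a stand-in function before any model is loaded.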

📈 Training

Before running the experiments reported in our work or training on a custom dataset, complete the following steps (assuming you are already in the repository directory):

Download the data from the Cellosaurus database and save it in the data directory:

wget -P data/ https://ftp.expasy.org/databases/cellosaurus/cellosaurus.txt

Copy the UniProt embeddings into the data directory:

cp protac_degradation_predictor/data/uniprot2embedding.h5 data/

Create a virtual environment and install the required packages by running the following commands:

conda env create -f environment.yml
conda activate protac-degradation-predictor

The code for training the PyTorch models can be found in the file run_experiments_pytorch.py.

(Don't forget to adjust the PYTHONPATH environment variable to include the repository directory: export PYTHONPATH=$PYTHONPATH:/path/to/PROTAC-Degradation-Predictor)
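Before launching a training run, it can help to confirm that the two data files from the steps above are in place. The following is a small sketch of such a check. The function name missing_data_files is hypothetical, and the file names are taken from the download and copy commands above; whether the training script needs further files is not covered here.

```python
from pathlib import Path

# File names taken from the setup commands in this section.
REQUIRED_FILES = ("cellosaurus.txt", "uniprot2embedding.h5")


def missing_data_files(data_dir="data"):
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files in data/: " + ", ".join(missing))
    else:
        print("All required data files found.")
```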

Training on Custom Dataset

For training a model on a user-provided dataset, please refer to the guide reported in this README.

📄 Citation

If you use this tool in your research, please cite the following paper:

@article{Ribes_2024,
   title={Modeling PROTAC degradation activity with machine learning},
   volume={6},
   ISSN={2667-3185},
   url={http://dx.doi.org/10.1016/j.ailsci.2024.100104},
   DOI={10.1016/j.ailsci.2024.100104},
   journal={Artificial Intelligence in the Life Sciences},
   publisher={Elsevier BV},
   author={Ribes, Stefano and Nittinger, Eva and Tyrchan, Christian and Mercado, Rocío},
   year={2024},
   month=dec,
   pages={100104}
}

The directories logs and reports contain the logs and reports generated during the experiments reported in the paper. Additionally, in reports, one can find the pickled Optuna studies for the reported experiments.
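A pickled study can be read back with Python's standard pickle module, as in the sketch below. The helper load_pickled_study is hypothetical, and the exact file names inside reports are not listed in this README, so the path in the usage comment is a placeholder. Unpickling an Optuna study requires the optuna package to be installed, and pickle files should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path


def load_pickled_study(path):
    """Load a pickled object (for example an Optuna study) from disk."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Intended usage (placeholder path, not executed here):
#   study = load_pickled_study("reports/<study-file>.pkl")
#   print(study.best_params)  # best hyperparameters found by the study
```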

The directory models contains the trained models for the experiments reported in the paper.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.
