FlexMol: A Flexible Toolkit for Benchmarking Molecular Relation Learning

Overview

FlexMol is a powerful and flexible toolkit designed to advance molecular relation learning (MRL) by enabling the construction and evaluation of diverse model architectures across various datasets and performance metrics. FlexMol aims to streamline the development process, reduce repetitive coding efforts, and ensure fair comparisons of different models.

FlexMol offers several unique features:

Dynamic and Flexible Model Construction: Easily create over 70,000 distinct model architectures.
Comprehensive Data Support: The first MRL toolkit to support encoders for protein structures, protein sequences, and drug data at the same time.
Advanced Interaction Layers: The first MRL library to introduce interaction layers, enabling more sophisticated modeling of molecular relationships.
User-Friendly API: Customize and develop models with just a few lines of code, making the process straightforward and efficient.
Versatile Task Handling: Supports binary classification, regression, and multi-class classification tasks, covering a wide range of applications.
Customizable Encoders: Create your own custom encoders and seamlessly integrate them with preset encoders to build unique models.

Installation

Build from Source

Clone the Repository

git clone https://github.com/Steven51516/FlexMol.git
cd FlexMol

Create a New Conda Environment

conda create --name flexmol_env python=3.8
conda activate flexmol_env

Install Dependencies
```
pip install -r requirements.txt
```

Using `pip`

We plan to enable installation using pip for easier setup and dependency management. Stay tuned for updates!

Tutorials

We provide tutorials to get started with FlexMol:

| Name | Description |
| --- | --- |
| Dataloading1 | Data Loading Techniques in FlexMol |
| Dataloading2 | Using FlexMol with TDC Interface |
| 101 | Introduce FlexMol Encoders |
| 102 | Build and train a simple Dual Encoders model |
| 103 | Build models with Multiple Encoders |
| 104 | Introduce Interaction Layers |
| 105 | Integration of User-Custom Encoders |
| 106 | Example of Building Complex Models |

Model Building in FlexMol

The model building process in FlexMol is designed to be intuitive and flexible. This process is structured into three main steps:

Step 1: Task Selection and Dataset Loading

Begin by selecting a specific task, which determines the dataset you load. See the Dataloading1 and Dataloading2 tutorials for instructions on loading a custom dataset or loading through the TDC interface.

Example code to load data into FlexMol:

# Example code to load Drug-Target Interaction (DTI) data into a dataframe
# Load data into a dataframe with the columns: "Drug", "Protein", "Protein_ID (optional)", and "Y"
# The optional "Protein_ID" column can be included for 3D encoders that require PDB files as input.
DTI = load_DTI("data/toy_data/dti.txt", delimiter=" ")
print("Drug-Target Interaction data:")
print(DTI.head())
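
For orientation, the column layout described in the comments above can be reproduced without FlexMol. The sketch below parses space-delimited text with the standard library. The column order in the file (drug SMILES, protein sequence, label) and the sample rows are assumptions made for illustration, not the actual contents of data/toy_data/dti.txt, and `parse_dti_lines` is a hypothetical helper rather than part of FlexMol.

```python
import csv
import io

# Made-up sample in an assumed layout: drug SMILES, protein sequence, label.
SAMPLE = """CCO MKTAYIAKQR 1
CC(=O)O MKVLAAGIVG 0
"""

def parse_dti_lines(handle, delimiter=" "):
    """Return one record per line, keyed by the DTI column names."""
    rows = []
    for drug, protein, label in csv.reader(handle, delimiter=delimiter):
        rows.append({"Drug": drug, "Protein": protein, "Y": float(label)})
    return rows

records = parse_dti_lines(io.StringIO(SAMPLE))
for record in records:
    print(record)
```

A real loader would also need to handle the optional "Protein_ID" column when 3D encoders are used.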

Step 2: Model Customization

In the customization phase, users define their model by choosing from FlexMol's preset components, which include:

16 drug encoders
13 protein sequence encoders
9 protein structure encoders
7 interaction layers
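
To get a feel for how quickly these components multiply, the sketch below counts combinations under two simplified templates. The templates are assumptions chosen for illustration. This README does not state the counting rules behind its overall architecture figure, so the numbers here are not expected to reproduce it.

```python
# Component counts as listed above.
DRUG_ENCODERS = 16
PROTEIN_SEQUENCE_ENCODERS = 13
PROTEIN_STRUCTURE_ENCODERS = 9
INTERACTION_LAYERS = 7

protein_encoders = PROTEIN_SEQUENCE_ENCODERS + PROTEIN_STRUCTURE_ENCODERS

# Assumed template 1: one drug encoder, one protein encoder, one interaction layer.
dual_encoder_models = DRUG_ENCODERS * protein_encoders * INTERACTION_LAYERS

# Assumed template 2: one drug encoder, one sequence encoder and one structure
# encoder for the protein, all joined by a single interaction layer.
triple_encoder_models = (
    DRUG_ENCODERS
    * PROTEIN_SEQUENCE_ENCODERS
    * PROTEIN_STRUCTURE_ENCODERS
    * INTERACTION_LAYERS
)

print("dual-encoder combinations:", dual_encoder_models)
print("triple-encoder combinations:", triple_encoder_models)
```

Allowing more encoders per model, or stacking interaction layers, grows the count further.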

Drug Encoders

| Encoder Type | Models |
| --- | --- |
| Sequence | CNN, Transformer, Morgan, Daylight, ErG, PubChem, ChemBERTa, ESPF |
| Graph 2D | GCN, MPNN, GAT, NeuralFP, AttentiveFP, GIN |
| Graph 3D | SchNet, MGCN |

Protein Encoders

| Encoder Type | Models |
| --- | --- |
| Sequence | CNN, Transformer, AAC, ESPF, PseudoAAC, Quasi-seq, Conjoint triad, ESM, ProtTrans-t5, ProtTrans-bert, ProtTrans-albert, Auto correlation, CTD |
| Graph 3D | GCN, GAT, GIN, GCN_ESM, GAT_ESM, GIN_ESM, PocketDC, GVP, GearNet |

FlexMol does not restrict the number of encoders or interaction layers in a model, so these components can be combined freely into larger architectures.

from FlexMol.encoder import FlexMol

# Example code to build a simple DTI model using Transformer encoders for both drug and protein
FM = FlexMol()

# Initialize Transformer encoders for drug and protein without pooling
de = FM.init_drug_encoder("Transformer", pooling=False)
pe = FM.init_prot_encoder("Transformer", pooling=False)

# Set up the cross-attention interaction layer (requires pooling=False)
interaction_output = FM.set_interaction(
    [de, pe], 
    "cross_attention"
)

# Apply an MLP to the interaction output
output = FM.apply_mlp(interaction_output, head=1)

# Build the model
FM.build_model()
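
The comments above note that cross-attention requires `pooling=False`. A plausible reading, assumed here, is that unpooled encoders return one vector per token, which is what attention needs to operate on. The sketch below is a generic scaled dot-product cross-attention in plain Python, without the learned projections a real layer would have. It is not FlexMol's implementation, and the token vectors are made up.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Let each query token attend over all key/value tokens."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended

drug_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, dimension 2
protein_tokens = [[1.0, 0.0], [0.0, 1.0]]           # 2 tokens, dimension 2

drug_given_protein = cross_attention(drug_tokens, protein_tokens)
print(len(drug_given_protein), "drug tokens, each of dimension", len(drug_given_protein[0]))
```

With pooled inputs there would be a single vector per molecule, leaving nothing for the attention weights to distribute over.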

Step 3: Model Construction and Training

Once the model is configured, FlexMol takes over the construction process. It automates the assembly of the model and manages all aspects of data processing and training. The built-in trainer is equipped with 21 metrics for evaluating performance.
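
As a reminder of what such metrics measure, the sketch below computes accuracy, precision, recall, and F1 for binary labels from the confusion counts. These are the textbook definitions written in plain Python. `binary_metrics` is a hypothetical helper, not FlexMol's metric code, and the labels are made up.

```python
def binary_metrics(y_true, y_pred):
    """Compute four common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)
```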

from FlexMol.task import BinaryTrainer

# Example code to train a DTI model on binary classification task
# Load the user-provided datasets for training, validation, and testing
train_df, val_df, test_df = ... 
# Initialize the user-customized FlexMol instance
FM = ... 

# Configure the BinaryTrainer with the FlexMol instance and training parameters
trainer = BinaryTrainer(
    FM,  
    task="DTI",
    test_metrics=["accuracy", "precision", "recall", "f1"],
    device="cpu",
    early_stopping="roc-auc",
    epochs=30,
    patience=10,
    lr=0.0001,
    batch_size=128
)

# Prepare the datasets for training, validation, and testing
train_data, val_data, test_data = trainer.prepare_datasets(train_df=train_df, val_df=val_df, test_df=test_df)

# Train the model using the training and validation datasets
trainer.train(train_data, val_data)

# Test the model using the test dataset
trainer.test(test_data)

# Save the trained model to the specified path
trainer.save_model("path/to/save/model.pth")
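
The `early_stopping` and `patience` arguments above suggest patience-based early stopping on a validation metric. The sketch below shows the generic technique for a higher-is-better metric such as ROC-AUC. It is an illustration with a hypothetical function and made-up scores, and FlexMol's exact stopping rule may differ.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (best_epoch, stop_epoch) for a higher-is-better validation metric."""
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # New best: remember it and reset the patience window.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop here.
            return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(val_scores) - 1

val_roc_auc = [0.70, 0.75, 0.74, 0.73, 0.72, 0.71]
best_epoch, stop_epoch = early_stopping_epoch(val_roc_auc, patience=3)
print("best epoch:", best_epoch, "stopped at epoch:", stop_epoch)
```

In practice the weights from the best epoch are usually the ones kept for testing and saving.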

Design of FlexMol

FlexMol is built on a flexible, modular framework designed to facilitate the dynamic construction of molecular relation models. This section introduces two fundamental components of the FlexMol architecture: Encoders and Interaction Layers.

Encoder

The Encoder component in FlexMol is responsible for transforming raw molecular data into meaningful representations. It begins with preprocessing tasks managed by the Featurizer class, including tokenization, normalization, feature extraction, fingerprinting, and graph construction. The preprocessed data is then passed to the Encode Layer, which generates embeddings during model training and inference.
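
The split between preprocessing and embedding can be sketched in plain Python. `CharFeaturizer` and `MeanEmbeddingEncoder` below are hypothetical stand-ins, not FlexMol classes. The character-level tokenization, the fixed random embedding table, and the reading of `pooling` as "collapse token vectors into one vector" are all assumptions made for illustration.

```python
import random

class CharFeaturizer:
    """Hypothetical featurizer: turns a string into fixed-length integer ids."""

    def __init__(self, alphabet, max_len):
        # Id 0 is reserved for padding; unknown characters also map to 0.
        self.index = {ch: i + 1 for i, ch in enumerate(alphabet)}
        self.max_len = max_len

    def __call__(self, text):
        ids = [self.index.get(ch, 0) for ch in text[: self.max_len]]
        return ids + [0] * (self.max_len - len(ids))

class MeanEmbeddingEncoder:
    """Hypothetical encode layer: embedding lookup with optional mean pooling."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

    def __call__(self, ids, pooling=True):
        # Skip id 0 so padding does not contribute to the representation.
        vectors = [self.table[i] for i in ids if i != 0]
        if not pooling:
            return vectors  # one vector per token
        if not vectors:
            return [0.0] * self.dim
        return [sum(v[j] for v in vectors) / len(vectors) for j in range(self.dim)]

featurizer = CharFeaturizer(alphabet="CNO()=", max_len=5)
encoder = MeanEmbeddingEncoder(vocab_size=7, dim=4)

ids = featurizer("CCO")            # preprocessing, done once per molecule
pooled = encoder(ids)              # single vector for the whole molecule
per_token = encoder(ids, pooling=False)
print(ids, len(pooled), len(per_token))
```

Keeping featurization separate means it can run once up front, while the embedding step runs on every training and inference pass.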

Interaction Layer

The Interaction Layer is crucial for capturing and modeling complex relationships between different molecular entities. Interaction layers can integrate inputs from various FlexMol components, including Encoder Layers and other Interaction Layers.
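
The composability described above can be pictured with plain functions: one interaction consumes two encoder outputs, and a second interaction consumes the first interaction's result together with another encoder output. The functions `concat` and `elementwise_product` and the vectors below are hypothetical illustrations, not FlexMol's interaction layers.

```python
def concat(*vectors):
    """Join any number of vectors end to end."""
    return [x for vector in vectors for x in vector]

def elementwise_product(a, b):
    """Multiply two equal-length vectors component by component."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]

# Stand-ins for encoder outputs.
drug_vec = [0.5, 2.0]
protein_seq_vec = [4.0, 0.5]
protein_struct_vec = [1.0, -1.0, 3.0]

# First interaction: combines two encoder outputs.
pairwise = elementwise_product(drug_vec, protein_seq_vec)

# Second interaction: takes the first interaction's output plus another encoder.
combined = concat(pairwise, protein_struct_vec)
print(combined)
```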

Contact

Reach us at sliu0727@usc.edu or open a GitHub issue.

License

FlexMol is licensed under the BSD 3-Clause License.

This software includes components modified from the DeepPurpose project, which is licensed under the BSD 3-Clause License.
