
[GraphsGPT] A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer (ICML 2024)

Zhangyang Gao*, Daize Dong*, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li

Published at the 41st International Conference on Machine Learning (ICML 2024).

Introduction

Can we model Non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The Non-Euclidean property has posed a long-standing challenge in graph modeling. Although recent graph neural networks and graph transformers encode graphs as Euclidean vectors, recovering the original graph from those vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms Non-Euclidean graphs into learnable Graph Words in Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from the Graph Words to ensure information equivalence. We pretrain GraphsGPT on $100$M molecules and obtain several interesting findings:

  • The pretrained Graph2Seq excels at graph representation learning, achieving state-of-the-art results on $8/9$ graph classification and regression tasks.
  • The pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both few-shot and conditional graph generation.
  • Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges (see the sketch after this list).
  • The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph domain tasks, excelling in both representation and generation.
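
To illustrate the mixup finding, here is a minimal sketch of graph mixup in Euclidean space. It assumes only what the paper states: Graph2Seq maps a graph to a fixed set of $K$ Graph Word vectors, and GraphGPT decodes such vectors back to a graph. The random stand-in vectors and the commented `decode` call are hypothetical placeholders, not this repository's API.

```python
import numpy as np

# Minimal sketch of Euclidean graph mixup (hypothetical, not the repo's API).
# In GraphsGPT, Graph2Seq would produce the Graph Words and GraphGPT would
# decode them; here random vectors stand in for two encoded molecules.
K, D = 8, 512                             # number of Graph Words and their dimension
rng = np.random.default_rng(0)
graph_words_a = rng.normal(size=(K, D))   # stand-in for Graph2Seq(graph_a)
graph_words_b = rng.normal(size=(K, D))   # stand-in for Graph2Seq(graph_b)

# Because Graph Words live in Euclidean space, mixup reduces to plain linear
# interpolation -- an operation that is ill-defined on raw graph structures.
lam = 0.5
mixed_words = lam * graph_words_a + (1.0 - lam) * graph_words_b

# A GraphGPT decoder would then reconstruct a molecule from the mixed words:
# mixed_graph = graphgpt.decode(mixed_words)   # hypothetical call
print(mixed_words.shape)  # (8, 512)
```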

(Figure: GraphsGPT overview, graphsgpt.svg)

This is the official code implementation of ICML 2024 paper A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer.

The model checkpoints can be downloaded from 🤗 Transformers. We provide both the foundational pretrained models with different numbers of Graph Words $\mathcal{W}$ (GraphsGPT-nW) and the conditional version with one Graph Word (GraphsGPT-1W-C). A hedged loading sketch follows the table below.

| Model Name | Model Type | Model Checkpoint |
|---|---|---|
| GraphsGPT-1W | Foundation Model | on 🤗 Transformers |
| GraphsGPT-2W | Foundation Model | on 🤗 Transformers |
| GraphsGPT-4W | Foundation Model | on 🤗 Transformers |
| GraphsGPT-8W | Foundation Model | on 🤗 Transformers |
| GraphsGPT-1W-C | Finetuned Model | on 🤗 Transformers |
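
For reference, loading a checkpoint might look like the following. This is a minimal sketch: the hub id `A4Bio/GraphsGPT-1W` is a guess based on the organization name, and the checkpoint may ship custom model classes; check the 🤗 model cards for the actual ids and loading instructions.

```python
from transformers import AutoModel

# Hypothetical hub id -- check the 🤗 model card for the real one.
MODEL_ID = "A4Bio/GraphsGPT-1W"

# trust_remote_code lets 🤗 Transformers load any custom GraphsGPT classes
# shipped with the checkpoint, if the authors registered them that way.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
print(model.config)
```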

Installation

To get started with GraphsGPT, run the following commands to set up the environment.

```bash
git clone git@github.com:A4Bio/GraphsGPT.git
cd GraphsGPT
conda create --name graphsgpt python=3.12
conda activate graphsgpt
pip install -e .[dev]
pip install -r requirement.txt
```

Quick Start

We provide some Jupyter notebooks in ./jupyter_notebooks, along with corresponding online Google Colaboratory notebooks. You can run them for a quick start.

| Example Name | Jupyter Notebook | Google Colaboratory |
|---|---|---|
| GraphsGPT Pipeline | example_pipeline.ipynb | Open In Colab |
| Graph Clustering Analysis | clustering.ipynb | Open In Colab |
| Graph Hybridization Analysis | hybridization.ipynb | Open In Colab |
| Graph Interpolation Analysis | interpolation.ipynb | Open In Colab |

Representation

You should first download the configurations and data for finetuning, and put them in ./data_finetune. (We also include the finetuned checkpoints in the model_zoom.zip file for a quick test.)

To evaluate the representation performance of Graph2Seq Encoder, please run:

```bash
bash ./scripts/representation/finetune.sh
```

You can also set the --mixup_strategy option to perform graph mixup with Graph2Seq.

Generation

For unconditional generation with GraphGPT Decoder, please refer to README-Generation-Uncond.md.

For conditional generation with GraphGPT-C Decoder, please refer to README-Generation-Cond.md.

To evaluate the few-shot generation performance of the GraphGPT Decoder, please run:

```bash
bash ./scripts/generation/evaluation/moses.sh
bash ./scripts/generation/evaluation/zinc250k.sh
```
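
One way to picture few-shot generation in Graph Word space, under the assumption that new molecules come from sampling near the Graph Words of a few encoded seed molecules: fit a simple distribution to the seeds and draw new points from it. The recipe below is illustrative only, not the paper's exact procedure, and the `decode` call is a hypothetical placeholder.

```python
import numpy as np

# Illustrative few-shot sampling in Graph Word space (not the paper's exact
# recipe). Random vectors stand in for the Graph Words of encoded seeds.
K, D, N_SEEDS = 8, 512, 16
rng = np.random.default_rng(42)
seed_words = rng.normal(size=(N_SEEDS, K, D))  # stand-in for encoded seeds

# Fit a diagonal Gaussian to the seeds, then sample new Graph Words from
# the region of Euclidean space the seeds occupy.
mu = seed_words.mean(axis=0)
sigma = seed_words.std(axis=0)
new_words = mu + sigma * rng.normal(size=(K, D))

# A GraphGPT decoder would turn these vectors into a new molecule:
# new_molecule = graphgpt.decode(new_words)   # hypothetical call
print(new_words.shape)  # (8, 512)
```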

Citation

```bibtex
@article{gao2024graph,
  title={A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer},
  author={Gao, Zhangyang and Dong, Daize and Tan, Cheng and Xia, Jun and Hu, Bozhen and Li, Stan Z},
  journal={arXiv preprint arXiv:2402.02464},
  year={2024}
}
```

Contact Us

If you have any questions, please contact:
