Camille Dunning, Zhishang Luo, Sirui Tao
Halıcıoğlu Data Science Institute, UC San Diego, La Jolla, CA
TODO: Flowchart figure
Abstract: Graph Neural Networks (GNNs) have gained tremendous popularity for their potential to effectively learn from graph-structured data, commonly encountered in real-world applications. However, most of these models, based on the message-passing paradigm (interactions within a neighborhood of a few nodes), can only handle local interactions within a graph. When we enforce the models to use information from far away nodes, we will encounter two major issues: oversmoothing & oversquashing. Architectures such as the transformer and diffusion models are introduced to solve this; although transformers are powerful, they require significant computational resources for both training and inference, thereby limiting their scalability, particularly for graphs with long-term dependencies. Hence, this paper proposes GraphHSCN—a Heterogenized Spectral Cluster Network, a message-passing-based approach specifically designed for capturing long-range interaction. On our first iteration of ablation studies, we observe reduced time complexities compared to SAN, the most popular graph transformer model, yet comparable performance in graph-level prediction tasks.
- Graph coarsening via spectral clustering: We propose a scheme to coarsen graph representation via spectral clustering with the relaxed formulation of the MinCUT problem, as presented in the paper from Bianchi et. al. We observe the structural patterns uncovered by SC reveal which long-range virtual connections should be made.
- New connections learned by a heterogeneous network: We create an intra-cluster connection with a virtual node, and learn the new relationship as a graph indepdenent of the original graph. A heterogeneous convolutional network is trained on these separate relations, further coarsening the representations. On our set of ablation studies, and after hyperparameter tuning, Graph-HSCN out-performs the traditional message-passing architectures by up to 10 percent, achieving metrics similar to those of SAN while reducing the time complexity.
To set up the environment and install all dependencies, run make env
. The logs
and datasets
directories will be created automatically at the project level.
TODO
TODO
TODO
TODO
Feel free to open an issue on this repository or e-mail adunning@ucsd.edu.
The code in this project is heavily adapted and modified from the following repositories: