This library provides Python modules and scripts to perform semi-supervised learning with heterophily (SSLH). It includes methods to perform label propagation with linearized belief propagation and to estimate class-to-class compatibilities from very sparsely labeled graphs, extending an earlier release of SSLH and prior ideas. Also included are code and experimental traces to reproduce the experiments from our SIGMOD 2020 paper: Factorized Graph Representations for Semi-Supervised Learning from Sparse Data
Overview of SSLH: Given a partially labeled graph and a class-to-class compatibility matrix, linearized belief propagation (LinBP) performs a generalized form of label propagation to label the remaining nodes. Distant compatibility estimation (DCE) estimates that compatibility matrix from the partially labeled graph itself, so propagation remains possible when the compatibilities are not known in advance. For a quick understanding of the approach, please also see the video presented at SIGMOD 2020.
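As a rough illustration of the propagation step described in the overview, the following sketch iterates a simplified LinBP-style update, F <- X + eps * W F Hc, where Hc is the compatibility matrix centered around 1/k and eps is a scaling factor chosen so that the iteration converges. This is a minimal NumPy sketch under stated assumptions, not the API of sslh/inference.py: the function name linbp_sketch, its parameters, and the toy path graph are all hypothetical, and refinements of the full method such as echo cancellation are omitted.

```python
import numpy as np


def linbp_sketch(W, X, H, num_iter=10, scale=0.5):
    """Simplified LinBP-style propagation (hypothetical helper, not the sslh API).

    W: (n, n) symmetric adjacency matrix
    X: (n, k) one-hot rows for labeled nodes, zero rows for unlabeled nodes
    H: (k, k) class-to-class compatibility matrix with rows summing to 1
    """
    k = H.shape[0]
    Hc = H - 1.0 / k  # center the compatibilities around 1/k
    # Assumption: scale the update so that its spectral radius equals
    # `scale` < 1, which guarantees that the iteration converges.
    rho_W = np.max(np.abs(np.linalg.eigvals(W)))
    rho_H = np.max(np.abs(np.linalg.eigvals(Hc)))
    eps = scale / (rho_W * rho_H)
    F = X.copy()
    for _ in range(num_iter):
        F = X + eps * (W @ F @ Hc)
    return F


# Toy example: a path graph 0 - 1 - 2 - 3 where only node 0 is labeled (class 0).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.zeros((4, 2))
X[0, 0] = 1.0
# Heterophily: neighbors tend to belong to different classes.
H = np.array([[0.1, 0.9],
              [0.9, 0.1]])

F = linbp_sketch(W, X, H)
labels = F.argmax(axis=1)
print(labels)  # [0 1 0 1]
```

On this toy graph, heterophily makes the predicted labels alternate along the path even though only node 0 is labeled. A homophily-only label propagation would instead assign class 0 to every node.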
Dependencies are listed in requirements.txt and can be installed with pip (pip install -r requirements.txt).
experiments_sigmod20/                       folder containing scripts and notebooks for recreating figures from the paper
    datacache/                              folder containing traces from experiments saved as CSV
    figs/                                   folder in which code places figures from experiments
    realData/                               place real data sets into this folder before running experiments
    ...                                     various modules that perform various experiments
    Figures_realdata_sigmod20.ipynb         Notebook that plots all figures for experiments on 8 real data sets
    Figures_syntheticdata_sigmod20.ipynb    Start here: Notebook that plots all other figures in the paper
sslh/                                       folder containing modules with main functions
    estimation.py                           module containing main functions for parameter estimation
    fileInteraction.py                      module containing functions for loading and saving experimental results
    graphGenerator.py                       module containing synthetic graph generator with planted graph properties
    inference.py                            module containing main propagation methods for linearized belief propagation
    utils.py                                module containing various helper functions
    visualize.py                            helper function to plot figures
test_sslh/                                  folder with unit tests for modules and functions in sslh/
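To give a feel for what "parameter estimation" refers to in this layout, the sketch below shows the simplest, distance-1 version of compatibility estimation: count how often labeled nodes of class i are adjacent to labeled nodes of class j, then row-normalize the counts. This is only an illustrative assumption about the underlying idea and not the DCE algorithm or the interface of sslh/estimation.py; the function name estimate_compatibilities_myopic and the toy graph are hypothetical. With very sparse labels, few labeled nodes are direct neighbors, which is why, as its name suggests, DCE looks beyond immediate neighbors.

```python
import numpy as np


def estimate_compatibilities_myopic(W, X):
    """Distance-1 compatibility estimate (hypothetical helper, not the sslh API).

    W: (n, n) symmetric adjacency matrix
    X: (n, k) one-hot rows for labeled nodes, zero rows for unlabeled nodes
    Returns a (k, k) row-stochastic estimate of the compatibility matrix.
    """
    k = X.shape[1]
    # M[i, j] counts edges from labeled class-i nodes to labeled class-j nodes.
    M = X.T @ W @ X
    row_sums = M.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)  # avoid division by zero
    # Classes without any observed labeled neighbors fall back to uniform.
    return np.where(row_sums > 0, M / safe, 1.0 / k)


# Toy example: path graph 0 - 1 - 2 - 3 with nodes 0, 1, 2 labeled 0, 1, 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1, 0],
              [0, 1],
              [1, 0],
              [0, 0]], dtype=float)

H_est = estimate_compatibilities_myopic(W, X)
print(H_est)
```

Here every observed edge between labeled nodes connects different classes, so the estimate is fully heterophilous: each class is compatible only with the other one.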
A copy of the 8 real datasets we used in our experiments is available in the form of 16 CSV files totaling 1.2GB on Google Drive.
To run the experiments, place them into the folder experiments_sigmod20/realData/, then run the respective methods in experiments_sigmod20/.
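Before launching the experiments, it can help to confirm that the downloaded files ended up in the right place. The following is a small sanity-check sketch, not part of the repository: the helper name find_csv_files is hypothetical, and it assumes that the 16 CSV files sit directly in experiments_sigmod20/realData/ rather than in subfolders, and that the script is run from the repository root.

```python
from pathlib import Path


def find_csv_files(data_dir):
    """Return the sorted names of CSV files directly inside data_dir."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(p.name for p in data_dir.glob("*.csv"))


# Assumed location, relative to the repository root.
files = find_csv_files("experiments_sigmod20/realData")
print(f"found {len(files)} CSV file(s)")
if len(files) != 16:
    print("expected 16 CSV files; check that the download is complete")
```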
- For examples on the usage of the various methods, please see the test_sslh directory in the source tree.
- /reproducibility.md contains a detailed description to reproduce the experimental results reported in the paper (as submitted to the ACM SIGMOD 2021 Reproducibility).
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
If you use this code in your work, please cite:
@inproceedings{DBLP:conf/sigmod/PLG20,
author = {Krishna Kumar P. and Paul Langton and Wolfgang Gatterbauer},
title = {Factorized Graph Representations for Semi-Supervised Learning from Sparse Data},
booktitle = {International Conference on Management of Data (SIGMOD)},
pages = {1383--1398},
publisher = {{ACM}},
year = {2020},
url = {https://doi.org/10.1145/3318464.3380577},
}
For any clarification, comments, or suggestions on the main methods in sslh/, please create an issue or contact Wolfgang.
For any questions on the scripts in experiments_sigmod20/ and the reproducibility of the experiments, please contact Paul and Krishna.