GitHub - northeastern-datalab/factorized-graphs: Semi-supervised learning and inference for sparsely labelled graphs

Factorized Graph Representations for Semi-supervised Learning from Sparse Labels

This library provides Python modules and scripts for semi-supervised learning with heterophily (SSLH). It includes methods for label propagation with linearized belief propagation and for estimating class-to-class compatibilities from very sparsely labeled graphs, extending an earlier release of SSLH and prior ideas. Also included are code and experimental traces to reproduce the experiments from our SIGMOD 2020 paper: Factorized Graph Representations for Semi-Supervised Learning from Sparse Data.

Overview of SSLH: Given a partially labeled graph and a class-to-class compatibility matrix, linearized belief propagation (LinBP) performs a generalized form of label propagation to label the remaining nodes. Distant compatibility estimation (DCE) performs the same function but does not require the compatibility matrix as input. For a quick understanding of the approach, please also see the video presented at SIGMOD 2020.
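To make the propagation step described above concrete, here is a minimal NumPy sketch of a LinBP-style update, assuming the simplified form F = X + W F H without an echo-cancellation term. The function name `linbp_sketch`, its parameters, and the way the compatibility matrix is centered and scaled are assumptions made for illustration; this is not the API of the modules in sslh/.

```python
import numpy as np

def linbp_sketch(W, X, H, eps=0.5, num_iter=20):
    """Illustrative LinBP-style propagation (hypothetical helper, not the sslh API).

    W: (n, n) symmetric adjacency matrix
    X: (n, k) explicit beliefs: one-hot rows for labeled nodes, zero rows otherwise
    H: (k, k) class-to-class compatibility matrix whose rows sum to 1
    eps: assumed scaling factor in (0, 1) that keeps the iteration convergent
    """
    k = H.shape[0]
    # Center H around the uninformative value 1/k.
    H_centered = H - 1.0 / k
    # Scale so that (spectral radius of W) * (spectral radius of scaled H) = eps < 1.
    rho_W = np.max(np.abs(np.linalg.eigvals(W)))
    rho_H = np.max(np.abs(np.linalg.eigvals(H_centered)))
    H_scaled = eps * H_centered / (rho_W * rho_H)

    F = X.astype(float)
    for _ in range(num_iter):
        # Each node combines its own explicit belief with its neighbors'
        # beliefs, modulated by the compatibility matrix.
        F = X + W @ F @ H_scaled
    return F

# Path graph 0 - 1 - 2 - 3 with only node 0 labeled (class 0).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.zeros((4, 2))
X[0, 0] = 1.0
# Heterophily: neighbors tend to belong to different classes.
H = np.array([[0.1, 0.9],
              [0.9, 0.1]])

F = linbp_sketch(W, X, H)
labels = F.argmax(axis=1)
print(labels)  # [0 1 0 1]
```

With a heterophilous compatibility matrix the predicted classes alternate along the path, which plain homophily-based label propagation would not produce.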

Dependencies

Dependencies can be installed using requirements.txt.

Project structure

experiments_sigmod20/ folder containing scripts and notebooks for recreating figures from the paper
- datacache/ folder containing traces from experiments saved as CSV
- figs/ folder in which code places figures from experiments
- realData/ place real data sets into this folder before running experiments
- ... various modules that perform various experiments
- Figures_realdata_sigmod20.ipynb Notebook that plots all figures for experiments on 8 real data sets
- Figures_syntheticdata_sigmod20.ipynb Start here: Notebook that plots all other figures in the paper
sslh/ folder containing modules with main functions
- estimation.py module containing main functions for parameter estimation
- fileInteraction.py module containing functions for loading and saving experimental results
- graphGenerator.py module containing synthetic graph generator with planted graph properties
- inference.py module containing main propagation methods for linearized belief propagation
- utils.py module containing various helper functions
- visualize.py helper function to plot figures
test_sslh/ folder with unit tests for modules and functions in sslh/
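As a rough illustration of what estimating class-to-class compatibilities means, the sketch below counts how often labeled nodes of each class are directly connected and normalizes the counts row by row. This is a much simpler neighbor-counting estimate than DCE and is not the code in estimation.py; the function name `neighbor_count_compatibility` and its inputs are hypothetical. It relies only on directly connected labeled pairs, a signal that becomes scarce when labels are very sparse.

```python
import numpy as np

def neighbor_count_compatibility(W, X):
    """Hypothetical helper: estimate H from edges between labeled nodes only.

    W: (n, n) symmetric adjacency matrix
    X: (n, k) one-hot rows for labeled nodes, zero rows for unlabeled nodes
    """
    # M[i, j] counts edge endpoints where a class-i node neighbors a class-j node.
    M = X.T @ W @ X
    row_sums = M.sum(axis=1, keepdims=True)
    # Row-normalize; classes without any labeled neighbor fall back to a uniform row.
    uniform = np.full_like(M, 1.0 / M.shape[0])
    return np.divide(M, row_sums, out=uniform, where=row_sums > 0)

# Path graph 0 - 1 - 2 - 3 with labels 0, 1, 0 and node 3 unlabeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1, 0],
              [0, 1],
              [1, 0],
              [0, 0]], dtype=float)

H_est = neighbor_count_compatibility(W, X)
print(H_est)
```

In this toy graph every labeled edge connects different classes, so the estimate is fully heterophilous.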

Real data sets

A copy of the 8 real datasets we used in our experiments is available in the form of 16 CSV files totaling 1.2GB on Google Drive. To run the experiments, place them into the folder experiments_sigmod20/realData/, then run the respective methods in experiments_sigmod20/.

Usage

For examples of how to use the various methods, please see the test_sslh directory in the source tree.
reproducibility.md contains a detailed description of how to reproduce the experimental results reported in the paper (as submitted to ACM SIGMOD 2021 Reproducibility).

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Citation

If you use this code in your work, please cite:

@inproceedings{DBLP:conf/sigmod/PLG20,
  author    = {Krishna Kumar P. and Paul Langton and Wolfgang Gatterbauer},
  title     = {Factorized Graph Representations for Semi-Supervised Learning from Sparse Data},
  booktitle = {International Conference on Management of Data (SIGMOD)},
  pages     = {1383--1398},
  publisher = {{ACM}},
  year      = {2020},
  url       = {https://doi.org/10.1145/3318464.3380577},
}

Contributors

For any clarification, comments, or suggestions on the main methods in sslh/, please create an issue or contact Wolfgang. For any questions on the scripts in experiments_sigmod20/ and the reproducibility of the experiments, please contact Paul and Krishna.
