This repository contains a series of machine learning experiments for link prediction within social networks.
We first implement and apply a variety of link prediction methods to each of the ego networks contained within the SNAP Facebook dataset and SNAP Twitter dataset, as well as to various random networks generated using networkx, and then calculate and compare the ROC AUC, Average Precision, and runtime of each method.
If you use this repository in your work, please cite the corresponding DOI:
- (Variational) Graph Auto-Encoders: An end-to-end trainable convolutional neural network model for unsupervised learning on graphs
- Node2Vec/DeepWalk: A skip-gram based approach to learning node embeddings from random walks within a given graph
- Spectral Clustering: Using spectral embeddings to create node representations from an adjacency matrix
- Baseline Indexes: Adamic-Adar, Jaccard Coefficient, Preferential Attachment
- python 2.7
- TensorFlow (1.0 or later)
- networkx
- gensim
- scikit-learn
- scipy
- jupyter notebook
- pandas
python setup.py install
facebook/
: Original Facebook ego networks dataset, with added .allfeats files (with both ego and alter features)fb-processed/
: Pickle dumps of (adjacency_matrix, feature_matrix) tuples for each ego network, and for combined networktwitter/
: Twitter ego networks dataset (combined), with adjacency matrix pickle dumpvisualizations/
: Visualizations of each network generated by networkx and matplotlibnetwork-statistics/
: .txt and .pkl files of pre-calculated network characteristics (with info on connectivity, network size, etc.) for each networktrain-test-splits/
: Pickle dumps of pre-processed train-test splits for Facebook ego networks, with varying degrees of visibility (i.e. how many edges are hidden). Includes:adj_train
,train_edges
,train_edges_false
,val_edges
,val_edges_false
,test_edges
,test_edges_false
process-ego-networks.py
: Script used to process raw Facebook data and generate pickle dumpsprocess-combined-network.py
: Script used to combine Facebook ego networks and generate complete network pickle dumpprocess-twitter-network.py
: Script used to process raw Twitter data and generate pickle dumpsfb-train-test-splits.py
: Script used to generate and store train-test splits for each Facebook ego networktwitter-train-test-splits.py
: Script used to generate and store train-test-splits for Twitter combined network
link-prediction-baselines.ipynb
: Adamic-Adar, Jaccard Coefficient, Preferential Attachmentspectral-clustering.ipynb
: Using spectral embeddings for link predictionnode2vec.ipynb
: Skip-gram based representation learning for node/edge embeddingsgraph-vae.ipynb
: (Variational) Graph Autoencoder, learns node embeddings to recreate adjacency matrix
link_prediction_scores.py
: Utility functions for running various link prediction tests
network-visualizations.ipynb
: Generate .pdf visualizations for each network, in addition to calculating and storing a variety of network metrics (e.g. transtivity, avg. clustering coefficient, etc.)
nx-graph-experiments.ipynb
: Run all link prediction tests on various types of random networks (Erdos-Renyi, Barabasi-Albert, etc.)fb-graph-experiments.ipynb
: Run all link prediction tests on each Facebook ego networkrun-all-experiments.py
: Run all link prediction experiments (on both Facebook networks and random networkx networks), save results as pickle dumps inresults
run-twitter-experiments.py
: Run all link prediction experiments on combined (directed) Twitter network. Warning: may take a while to run.
results/
: Pickle dumps of experiment results, with results (ROC AUC, ROC Curve, Avg. Precision, Runtime) stored for each link prediction method in Python dictionary forminvestigate-results.ipynb
: Generate and save bar plots/ROC curve plots for each method and graph type, save network characteristics to.txt
filesresult-plots-by-graph/
: Bar plots for the results of each experiment (ROC AUC, AP, Minimum Runtime) organized by network, in .pdf formresult-plots-by-algorithm/
: Bar plots for the AUROC results of each experiment organized by link prediction algorithm, in .pdf form