Probabilistic generative model to perform inference in attributed multilayer networks, where both edges and attributes can have arbitrary data types.

mcontisc/PIHAM

PIHAM

Probabilistic Inference in Heterogeneous and Attributed Multilayer networks

License: MIT | Made with Python | arXiv: 2301.11226

This repository contains the implementation of the PIHAM model presented in

   [1] Flexible inference in heterogeneous and attributed multilayer networks
        Contisciani M., Hobbhahn M., Power E.A., Hennig P., and De Bacco C. (2024)
        [ ArXiv ]

If you make use of this code please cite our work in the form of the reference [1] above.

What's included

  • src: Contains the Python implementation of the PIHAM algorithm, the code to generate synthetic data and additional utilities
  • data/input: Contains a synthetic dataset generated using the PIHAM approach
  • data/output: Contains some results

Requirements

To run the code, install the packages listed in requirements.txt. We suggest creating a conda environment with conda create --name PIHAM python=3.8 --no-default-packages, activating it with conda activate PIHAM, and installing all the dependencies by running (inside the PIHAM directory):

pip install -r requirements.txt

Perform inference

To perform the inference in a given heterogeneous and attributed multilayer network, run:

python main_inference.py

The script takes as input the name of the dataset, the path of the folder where it is stored, and the number of communities K. It then executes the PIHAM algorithm from the file src/model.py using the configuration provided in the src/setting_inference.yaml file.

See the demo Jupyter notebook for an example of how to analyse the output results.

Input format

The data should be stored in a .pt file, which includes:

  • A: An adjacency tensor of dimension L x N x N containing the interactions of every layer
  • X_categorical: A design matrix with the categorical attribute
  • X_poisson: A design matrix with the Poisson attributes
  • X_gaussian: A design matrix with the Gaussian attributes

Here, L is the number of layers and N is the number of nodes.
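As a quick sanity check before running inference, the shapes described above can be verified with a few lines of NumPy. This is only a minimal sketch: the helper check_input_shapes is hypothetical (it is not part of the repository), and it assumes that every design matrix has one row per node and that the categorical attribute is one-hot encoded, neither of which is stated above. How the four objects are packed inside the .pt file is also not specified here, so the sketch stops before saving.

```python
import numpy as np


def check_input_shapes(A, X_categorical, X_poisson, X_gaussian):
    """Hypothetical helper: return (L, N) if all arrays agree on N."""
    if A.ndim != 3 or A.shape[1] != A.shape[2]:
        raise ValueError(f"A must have shape L x N x N, got {A.shape}")
    L, N, _ = A.shape
    attributes = {
        "X_categorical": X_categorical,
        "X_poisson": X_poisson,
        "X_gaussian": X_gaussian,
    }
    for name, X in attributes.items():
        # Assumption: each design matrix has one row per node.
        if X.ndim != 2 or X.shape[0] != N:
            raise ValueError(f"{name} must have N = {N} rows, got {X.shape}")
    return L, N


# Toy data with the same layout as the example in this directory:
# three layers (binary, count, real) and three attributes.
rng = np.random.default_rng(0)
N = 5
A = np.stack([
    rng.integers(0, 2, size=(N, N)),   # binary interactions
    rng.poisson(2.0, size=(N, N)),     # nonnegative discrete weights
    rng.normal(size=(N, N)),           # real values
]).astype(float)
X_categorical = np.eye(3)[rng.integers(0, 3, size=N)]  # one-hot (assumed)
X_poisson = rng.poisson(3.0, size=(N, 1)).astype(float)
X_gaussian = rng.normal(size=(N, 1))

L, N_checked = check_input_shapes(A, X_categorical, X_poisson, X_gaussian)
print(L, N_checked)
```

Failing early on a shape mismatch is usually easier to debug than an error raised in the middle of the inference routine.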

The code example in this directory is suitable to analyze a network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values). However, the model can be easily adapted to accommodate datasets with other data types.

Output

The algorithm outputs a compressed file inside the data/output folder. To load the inferred results and display the out-going membership matrix, run:

import numpy as np 
theta = np.load("theta_<file_label>.npz")
print(theta["U"])

The variable theta includes the following parameters inferred by PIHAM:

  • U: The out-going membership matrix of dimension N x K
  • V: The in-coming membership matrix of dimension N x K
  • W: The affinity tensor of dimension L x K x K
  • Hcategorical: The community-covariate matrix related to the categorical attribute of dimension K x Z_categorical
  • Hpoisson: The community-covariate matrix related to the Poisson attributes of dimension K x P_poisson
  • Hgaussian: The community-covariate matrix related to the Gaussian attribute of dimension K x P_gaussian
  • Cov: The covariance matrix
  • Cov_diag: The diagonal matrix of the variances

Here, K is the number of communities, Z_categorical is the number of categories for the categorical attribute, P_poisson is the number of Poisson attributes, and P_gaussian is the number of Gaussian attributes.
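The snippet below is a rough sketch of how the loaded parameters might be inspected further. It writes a toy .npz file with random values (not real PIHAM output) so that it runs on its own, then loads it the same way as above. Two things in it are assumptions rather than facts from this repository: taking the argmax over the rows of U as a hard community assignment, and combining U, W and V through the bilinear form U W V^T per layer.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for an output file: random values with the documented shapes.
rng = np.random.default_rng(0)
N, K, L = 6, 2, 3
toy = {
    "U": rng.normal(size=(N, K)),
    "V": rng.normal(size=(N, K)),
    "W": rng.normal(size=(L, K, K)),
}
path = os.path.join(tempfile.mkdtemp(), "theta_toy.npz")
np.savez_compressed(path, **toy)

# Load the compressed archive and list what it contains.
with np.load(path) as theta:
    keys = sorted(theta.files)
    U, V, W = theta["U"], theta["V"], theta["W"]

# Assumption: the largest entry in each row of U marks a node's
# dominant out-going community.
hard_out = U.argmax(axis=1)

# Assumption: layer l is summarised by the bilinear score U @ W[l] @ V.T.
scores = np.einsum("ik,lkq,jq->lij", U, W, V)

print(keys)
print(hard_out.shape, scores.shape)
```

With real output, replace path with the location of the theta_<file_label>.npz file in data/output.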

Run a cross-validation routine

If you are interested in assessing the prediction performance of PIHAM on a dataset for a given K, run:

python main_cv.py

The script takes as input the following parameters:

  • in_folder: Path of the input folder
  • data_file: Name of the dataset to analyse
  • K: Number of communities
  • NFold: Number of folds for the cross-validation routine
  • cv_type: Type of cross-validation routine
  • out_results: Flag to save the prediction performance
  • --out_mask: Flag to save the masks used during the cross-validation routine to hide entries of A and X
  • --out_inference: Flag to save the inferred parameters during the cross-validation routine

For each fold, the script runs the PIHAM algorithm on the training set to learn its parameters, and evaluates its performance on the test set. This process is repeated NFold times, each time with a different fold as the test set. Various performance metrics are used depending on the type of information being evaluated. The results are saved in a .csv file in the data/output/cv folder.
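To make the fold logic concrete, here is a minimal sketch of one way to build masks that hide entries of an L x N x N tensor. It is not the routine used by main_cv.py: the function kfold_entry_masks is hypothetical, it splits individual entries uniformly at random, and it ignores the cv_type option as well as the attribute matrices.

```python
import numpy as np


def kfold_entry_masks(shape, n_folds, seed=0):
    """Hypothetical helper: one boolean mask per fold.

    True marks an entry held out for testing in that fold. Every entry
    is held out in exactly one fold.
    """
    rng = np.random.default_rng(seed)
    n_entries = int(np.prod(shape))
    # Shuffle the entry indices, then assign folds round-robin so that
    # fold sizes differ by at most one.
    fold_of_entry = (rng.permutation(n_entries) % n_folds).reshape(shape)
    return [fold_of_entry == f for f in range(n_folds)]


A = np.random.default_rng(1).normal(size=(3, 10, 10))
masks = kfold_entry_masks(A.shape, n_folds=5)

for fold, test_mask in enumerate(masks):
    train_entries = A[~test_mask]  # entries visible during training
    test_entries = A[test_mask]    # entries used only for evaluation
    print(fold, train_entries.size, test_entries.size)
```

Because each entry belongs to exactly one test fold, every observation is predicted once across the NFold repetitions.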

Generate synthetic data

If you want to generate synthetic data using the PIHAM approach, run:

python main_generation.py

The script takes as input the number of independent samples to generate, a random seed, the number of communities K, and the number of nodes N. The code example generates a heterogeneous and attributed network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values), using the default parameters specified in the file src/synthetic.py. However, the script can be easily adapted to generate datasets with other data types and parameters.
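For intuition about what such a three-layer network looks like, the following toy sampler draws one binary, one count-valued and one real-valued layer from shared latent variables. It is a sketch under stated assumptions, not the code in src/synthetic.py: the function sample_toy_network is hypothetical, the standard normal latent variables, the logistic and exponential links, the unit variance and the cap on the Poisson rate are all illustrative choices, and the node attributes are left out.

```python
import numpy as np


def sample_toy_network(N, K, seed=0):
    """Hypothetical sampler for a 3-layer network (attributes omitted)."""
    rng = np.random.default_rng(seed)
    # Assumption: standard normal latent memberships and affinities.
    U = rng.normal(size=(N, K))
    V = rng.normal(size=(N, K))
    W = rng.normal(size=(3, K, K))

    # Assumption: one bilinear score per layer and node pair.
    eta = np.einsum("ik,lkq,jq->lij", U, W, V)

    A = np.empty((3, N, N))
    # Layer 0: binary interactions through a logistic link.
    A[0] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta[0])))
    # Layer 1: nonnegative counts through an exponential link; the score
    # is capped at 5 only to keep the toy rates small.
    A[1] = rng.poisson(np.exp(np.minimum(eta[1], 5.0)))
    # Layer 2: real values centred on the score, unit variance.
    A[2] = rng.normal(loc=eta[2], scale=1.0)
    return A, U, V, W


A, U, V, W = sample_toy_network(N=8, K=2, seed=42)
print(A.shape)
```

Fixing the seed makes the draw reproducible, which mirrors the random seed argument of the generation script.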
