This repository contains the implementation of the PIHAM model presented in
[1] Flexible inference in heterogeneous and attributed multilayer networks
Contisciani M., Hobbhahn M., Power E.A., Hennig P., and De Bacco C. (2024)
[ArXiv]
If you use this code, please cite our work as reference [1] above.
src: Contains the Python implementation of the PIHAM algorithm, the code to generate synthetic data, and additional utilities
data/input: Contains a synthetic dataset generated using the PIHAM approach
data/output: Contains some results
To run the code, you need to install the packages listed in requirements.txt. We suggest creating a conda environment, activating it, and installing all the dependencies by running the following commands inside the PIHAM directory:
conda create --name PIHAM python=3.8 --no-default-packages
conda activate PIHAM
pip install -r requirements.txt
To perform inference on a given heterogeneous and attributed multilayer network, run:
python main_inference.py
The script takes as input the name of the dataset, the path of the folder where it is stored, and the number of communities K.
It then executes the PIHAM algorithm from the file src/model.py using the configuration provided in the src/setting_inference.yaml file.
See the demo Jupyter notebook for an example of how to analyse the output results.
The data should be stored in a .pt file, which includes:
A: An adjacency tensor of dimension L x N x N containing the interactions of every layer
X_categorical: A design matrix with the categorical attribute
X_poisson: A design matrix with the Poisson attributes
X_gaussian: A design matrix with the Gaussian attributes
Here, L is the number of layers and N is the number of nodes.
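As a quick sanity check before running the inference, the shapes described above can be verified with a few lines of Python. This is only a sketch: it uses NumPy arrays as stand-ins for the stored tensors, the helper name check_dataset is hypothetical, and it assumes that each design matrix has one row per node, which the description above does not state explicitly.

```python
import numpy as np


def check_dataset(data):
    """Check the shapes of a dataset dictionary (hypothetical helper).

    Assumption: each design matrix has one row per node.
    Returns the pair (L, N).
    """
    A = np.asarray(data["A"])
    if A.ndim != 3 or A.shape[1] != A.shape[2]:
        raise ValueError(f"A must have shape L x N x N, got {A.shape}")
    L, N, _ = A.shape
    for key in ("X_categorical", "X_poisson", "X_gaussian"):
        X = np.asarray(data[key])
        if X.shape[0] != N:
            raise ValueError(f"{key} must have {N} rows, got {X.shape[0]}")
    return L, N


# Placeholder data: 3 layers, 5 nodes, one attribute of each type.
toy = {
    "A": np.zeros((3, 5, 5)),
    "X_categorical": np.zeros((5, 1)),
    "X_poisson": np.zeros((5, 1)),
    "X_gaussian": np.zeros((5, 1)),
}
print(check_dataset(toy))
```

If the shapes are inconsistent, the helper raises a ValueError naming the offending entry, which is usually easier to read than an error raised deep inside the inference code.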
The code example in this directory is suitable for analyzing a network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values). However, the model can be adapted to accommodate datasets with other data types.
The algorithm outputs a compressed file inside the data/output folder. To load the inferred results and display the out-going membership matrix, run:
import numpy as np
theta = np.load("theta_<file_label>.npz")
print(theta["U"])
The variable theta includes the following parameters inferred by PIHAM:
U: The out-going membership matrix of dimension N x K
V: The in-coming membership matrix of dimension N x K
W: The affinity tensor of dimension L x K x K
Hcategorical: The community-covariate matrix related to the categorical attribute of dimension K x Z_categorical
Hpoisson: The community-covariate matrix related to the Poisson attributes of dimension K x P_poisson
Hgaussian: The community-covariate matrix related to the Gaussian attributes of dimension K x P_gaussian
Cov: The covariance matrix
Cov_diag: The diagonal matrix of the variances
Here, K is the number of communities, Z_categorical is the number of categories for the categorical attribute, P_poisson is the number of Poisson attributes, and P_gaussian is the number of Gaussian attributes.
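To make the expected dimensions concrete, the sketch below writes a placeholder archive with random values and reads it back with np.load. The file name theta_example.npz, the sizes, and the random contents are illustrative assumptions; only the key names U, V, W and their shapes follow the list above.

```python
import os
import tempfile

import numpy as np

N, K, L = 6, 2, 3  # illustrative sizes: nodes, communities, layers
rng = np.random.default_rng(0)

# Write a placeholder archive (random values, not inferred parameters).
path = os.path.join(tempfile.mkdtemp(), "theta_example.npz")
np.savez_compressed(
    path,
    U=rng.normal(size=(N, K)),     # out-going memberships, N x K
    V=rng.normal(size=(N, K)),     # in-coming memberships, N x K
    W=rng.normal(size=(L, K, K)),  # affinity tensor, L x K x K
)

# Read the archive back and collect the shape of every stored array.
with np.load(path) as theta:
    shapes = {name: theta[name].shape for name in theta.files}

print(shapes)
```

Listing theta.files in this way is a convenient first step when inspecting an output archive, since it shows which parameters were actually saved.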
To assess the prediction performance of PIHAM on a dataset for a given K, run:
python main_cv.py
The script takes as input the following parameters:
in_folder: Path of the input folder
data_file: Name of the dataset to analyse
K: Number of communities
NFold: Number of folds for the cross-validation routine
cv_type: Type of cross-validation routine
out_results: Flag to save the prediction performance
--out_mask: Flag to save the masks used during the cross-validation routine to hide entries of A and X
--out_inference: Flag to save the inferred parameters during the cross-validation routine
For each fold, the script runs the PIHAM algorithm on the training set to learn its parameters, and evaluates its performance on the test set. This process is repeated NFold times, each time with a different fold as the test set. Various performance metrics are used depending on the type of information being evaluated.
The results are saved in a .csv file in the data/output/cv folder.
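The fold logic described above can be sketched as follows. This is a generic illustration of K-fold masking on the adjacency tensor, not the routine used by main_cv.py: the function name kfold_masks, the uniform random assignment of entries to folds, and the seed argument are assumptions, and the options offered by cv_type may split the data differently.

```python
import numpy as np


def kfold_masks(shape, n_folds, seed=0):
    """Yield one boolean test mask per fold (True marks a hidden entry)."""
    rng = np.random.default_rng(seed)
    n_entries = int(np.prod(shape))
    # Assign every entry to exactly one fold, with near-equal fold sizes.
    fold_ids = (rng.permutation(n_entries) % n_folds).reshape(shape)
    for fold in range(n_folds):
        yield fold_ids == fold


A = np.ones((3, 10, 10))  # placeholder L x N x N tensor
masks = list(kfold_masks(A.shape, n_folds=5))

for fold, mask in enumerate(masks):
    train_entries = A[~mask]  # entries visible during fitting
    test_entries = A[mask]    # entries held out for evaluation
    print(fold, train_entries.size, test_entries.size)
```

Because each entry belongs to exactly one fold, every entry is used for testing once and for training in all the other rounds.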
To generate synthetic data using the PIHAM approach, run:
python main_generation.py
The script takes as input the number of independent samples to generate, a random seed, the number of communities K, and the number of nodes N. The code example generates a heterogeneous and attributed network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values), using the default parameters specified in the file src/synthetic.py.
However, the script can be adapted to generate datasets with other data types and parameters.
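To illustrate the three data types mentioned above, the sketch below draws a toy dataset with independent random entries. It is not the generative procedure of src/synthetic.py and contains no community structure; the distribution parameters, the number of categories, and the variable layout are placeholder assumptions chosen only to show what each kind of layer and covariate looks like.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 20  # number of nodes (illustrative)

# One layer per data type, stacked into an L x N x N tensor with L = 3.
A = np.stack([
    rng.binomial(1, 0.1, size=(N, N)).astype(float),  # binary interactions
    rng.poisson(2.0, size=(N, N)).astype(float),      # nonnegative discrete weights
    rng.normal(0.0, 1.0, size=(N, N)),                # real values
])

# One covariate per data type, each stored with one row per node.
X_categorical = rng.integers(0, 3, size=(N, 1))  # labels in {0, 1, 2}
X_poisson = rng.poisson(3.0, size=(N, 1))        # nonnegative counts
X_gaussian = rng.normal(0.0, 1.0, size=(N, 1))   # real values

print(A.shape, X_categorical.shape, X_poisson.shape, X_gaussian.shape)
```

Fixing the seed makes the toy data reproducible, which mirrors the role of the random seed passed to the generation script.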