OVERVIEW
The program SSL implements and runs tests for several semi-supervised learning methods on multiclass or multilabel graphs with available ground-truth labels.
Two modes are available:
- test: Takes as input a graph and labels for all of its nodes. Randomly samples a number of nodes (`num_seeds`) to serve as labeled seeds and predicts the labels of the remaining ones. The experiment is repeated a predefined number of times (`num_iters`), and the mean Micro-F1 and Macro-F1 scores are reported.
- predict: This is the operational mode. Given a graph and a file with labels for a subset of its nodes, the selected method is run and the predicted labels for all nodes of the graph are written to a predefined output file (`outfile`).
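To make the two scores reported in test mode concrete, here is a C sketch of how Micro-F1 and Macro-F1 are commonly computed for single-label (multiclass) predictions. It is not taken from the SSL source: the name `compute_f1`, the `f1_scores` struct, and the assumption that labels have been remapped to `0..num_classes-1` are all hypothetical, and the program's own conventions (especially for multilabel graphs) may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Micro-F1 and Macro-F1 for single-label (multiclass) predictions.
 * Assumption (hypothetical): labels are remapped to 0..num_classes-1. */
typedef struct {
    double micro_f1;
    double macro_f1;
} f1_scores;

f1_scores compute_f1(const int *truth, const int *pred, size_t n, int num_classes)
{
    assert(num_classes > 0);
    /* Per-class counts of true positives, false positives and false negatives. */
    size_t *tp = calloc((size_t)num_classes, sizeof *tp);
    size_t *fp = calloc((size_t)num_classes, sizeof *fp);
    size_t *fn = calloc((size_t)num_classes, sizeof *fn);
    assert(tp && fp && fn);

    for (size_t i = 0; i < n; i++) {
        assert(truth[i] >= 0 && truth[i] < num_classes);
        assert(pred[i] >= 0 && pred[i] < num_classes);
        if (pred[i] == truth[i]) {
            tp[truth[i]]++;
        } else {
            fp[pred[i]]++;   /* the predicted class gains a false positive */
            fn[truth[i]]++;  /* the true class gains a false negative */
        }
    }

    size_t tp_all = 0, fp_all = 0, fn_all = 0;
    double macro_sum = 0.0;
    int active = 0;  /* classes that occur in the truth or in the predictions */
    for (int c = 0; c < num_classes; c++) {
        tp_all += tp[c];
        fp_all += fp[c];
        fn_all += fn[c];
        size_t denom = 2 * tp[c] + fp[c] + fn[c];
        if (denom > 0) {
            /* per-class F1 = 2TP / (2TP + FP + FN) */
            macro_sum += 2.0 * (double)tp[c] / (double)denom;
            active++;
        }
    }

    size_t micro_denom = 2 * tp_all + fp_all + fn_all;
    f1_scores s;
    s.micro_f1 = micro_denom > 0 ? 2.0 * (double)tp_all / (double)micro_denom : 0.0;
    s.macro_f1 = active > 0 ? macro_sum / active : 0.0;
    free(tp);
    free(fp);
    free(fn);
    return s;
}
```

Micro-F1 pools the counts over all classes (for single-label data it equals accuracy), while Macro-F1 averages the per-class scores, so rare classes weigh as much as frequent ones. This sketch averages only over classes that appear in the truth or the predictions.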
Methods included:
- PPR: Personalized PageRank
- TunedRwR: Tuned random walk with restarts
- AdaDIF: Adaptive Diffusions
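PPR and AdaDIF can both be viewed as diffusions: random-walk landing probabilities started from the seed nodes of a class are combined with per-step coefficients, truncated at `--walk_length` steps. The C sketch below assumes that formulation, with transition matrix A D^{-1}, steps k = 1..K, and fixed PPR coefficients theta_k = (1 - alpha) * alpha^k. AdaDIF, as we understand it, adapts the coefficients to each class instead of fixing them; that learning step is not shown. The CSR struct, `diffuse`, `ppr_coefficients`, and the parameter `alpha` are hypothetical and not SSL's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CSR storage of an undirected graph: the neighbours of node i
 * are col_idx[row_ptr[i]] .. col_idx[row_ptr[i + 1] - 1]. */
typedef struct {
    size_t n;
    const size_t *row_ptr;
    const size_t *col_idx;
} csr_graph;

/* out = A D^{-1} in: one step of the random walk. */
static void walk_step(const csr_graph *g, const double *in, double *out)
{
    for (size_t i = 0; i < g->n; i++) {
        double acc = 0.0;
        for (size_t e = g->row_ptr[i]; e < g->row_ptr[i + 1]; e++) {
            size_t j = g->col_idx[e];
            /* deg_j >= 1 because i is a neighbour of j */
            size_t deg_j = g->row_ptr[j + 1] - g->row_ptr[j];
            acc += in[j] / (double)deg_j;
        }
        out[i] = acc;
    }
}

/* scores = sum_{k=1..walk_length} theta[k-1] * p^(k), with p^(k) = (A D^{-1})^k p^(0).
 * seed holds p^(0), e.g. 1/|S_c| on the seed nodes of class c and 0 elsewhere. */
void diffuse(const csr_graph *g, const double *seed, const double *theta,
             size_t walk_length, double *scores)
{
    double *cur = malloc(g->n * sizeof *cur);
    double *next = malloc(g->n * sizeof *next);
    assert(cur && next);
    memcpy(cur, seed, g->n * sizeof *cur);
    for (size_t i = 0; i < g->n; i++)
        scores[i] = 0.0;

    for (size_t k = 0; k < walk_length; k++) {
        walk_step(g, cur, next);          /* next = p^(k+1) */
        for (size_t i = 0; i < g->n; i++)
            scores[i] += theta[k] * next[i];
        double *tmp = cur;                /* swap the two buffers */
        cur = next;
        next = tmp;
    }
    free(cur);
    free(next);
}

/* Fixed PPR coefficients truncated at walk_length steps: theta_k = (1 - alpha) * alpha^k. */
void ppr_coefficients(double alpha, size_t walk_length, double *theta)
{
    double a = 1.0;
    for (size_t k = 0; k < walk_length; k++) {
        a *= alpha;                       /* a = alpha^(k+1) */
        theta[k] = (1.0 - alpha) * a;
    }
}
```

In a one-vs-all setup, each class gets its own seed vector and score vector, and a node's predicted class is the one with the highest score.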
INPUT FILES FORMAT
SSL loads the graph in adjacency list format from a `.txt` file that contains one edge per line, given as a tab-separated pair of node indexes: `node1_index \tab node2_index`. Node indexes should be in the range [1, 2^64].
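A C sketch of validating one line of the graph file, assuming `\tab` denotes a single tab character and lines end in `\n` or `\r\n`. `parse_edge_line` is a hypothetical helper, not part of SSL. Note that a `uint64_t` holds values only up to 2^64 - 1.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses one "node1_index<TAB>node2_index" line of the graph file.
 * Returns false on malformed input or on index 0, since indexes start at 1. */
bool parse_edge_line(const char *line, uint64_t *u, uint64_t *v)
{
    assert(line && u && v);
    char *end;
    /* strtoull would silently accept a leading '-' or spaces, so require a digit. */
    if (!isdigit((unsigned char)line[0]))
        return false;
    errno = 0;
    unsigned long long a = strtoull(line, &end, 10);
    if (errno != 0 || *end != '\t' || a == 0)
        return false;

    const char *second = end + 1;
    if (!isdigit((unsigned char)second[0]))
        return false;
    errno = 0;
    unsigned long long b = strtoull(second, &end, 10);
    if (errno != 0 || b == 0)
        return false;

    if (*end == '\r')                 /* tolerate Windows line endings */
        end++;
    if (*end != '\n' && *end != '\0')
        return false;
    *u = (uint64_t)a;
    *v = (uint64_t)b;
    return true;
}
```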
For multiclass graphs, the labels are loaded from a `.txt` file where each line has the format `node_index \tab label`. Labels must be integers in [-127, 127].
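A matching C sketch for one line of a multiclass label file, enforcing the [-127, 127] label range so each label fits in an `int8_t`. The helper name `parse_label_line` and the assumption that node indexes here are also 1-based are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses one "node_index<TAB>label" line of a multiclass label file.
 * Rejects node index 0 and labels outside [-127, 127]. */
bool parse_label_line(const char *line, uint64_t *node, int8_t *label)
{
    assert(line && node && label);
    char *end;
    if (!isdigit((unsigned char)line[0]))
        return false;
    errno = 0;
    unsigned long long n = strtoull(line, &end, 10);
    if (errno != 0 || *end != '\t' || n == 0)
        return false;

    const char *lab = end + 1;
    /* Accept an optional sign followed by at least one digit. */
    const char *digits = (*lab == '-' || *lab == '+') ? lab + 1 : lab;
    if (!isdigit((unsigned char)digits[0]))
        return false;
    errno = 0;
    long l = strtol(lab, &end, 10);
    if (errno != 0 || l < -127 || l > 127)
        return false;

    if (*end == '\r')                 /* tolerate Windows line endings */
        end++;
    if (*end != '\n' && *end != '\0')
        return false;
    *node = (uint64_t)n;
    *label = (int8_t)l;
    return true;
}
```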
For multilabel graphs, labels are loaded from a `.txt` file in compressed one-hot matrix form (see `graphs/HomoSapiens/class.txt` for an example).
In test mode, all nodes must be labeled (i.e., present in the label file).
In predict mode, any subset of nodes can be labeled.
OUTPUT FILES FORMAT
- Multiclass: As in the input, each line is `node_index \tab predicted_label`.
- Multilabel: The output for multilabel graphs is a label ranking for every node. Each line follows the format `node_index: \tab pred_1 pred_2 ... pred_c`, where `pred_i` is the i-th most probable label for this node.
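A multilabel output line of this shape can be produced from a vector of per-label scores by sorting the label indexes by decreasing score. This C sketch assumes labels are identified by their 0-based column index, that ties go to the smaller index, and that the separator is `:` followed by a single tab; all three, as well as the helper name `format_ranking`, are assumptions rather than SSL's actual behavior.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    double score;
    int label;
} scored_label;

/* qsort comparator: higher score first; ties broken by the smaller label index. */
static int by_score_desc(const void *pa, const void *pb)
{
    const scored_label *a = pa;
    const scored_label *b = pb;
    if (a->score > b->score) return -1;
    if (a->score < b->score) return 1;
    return (a->label > b->label) - (a->label < b->label);
}

/* Writes "node:\tpred_1 pred_2 ... pred_c\n" into buf, where scores[i] is the
 * score of label i. Returns the length written, or -1 if buf is too small. */
int format_ranking(uint64_t node, const double *scores, int c, char *buf, size_t bufsize)
{
    assert(c > 0 && bufsize > 0);
    scored_label *ranked = malloc((size_t)c * sizeof *ranked);
    assert(ranked);
    for (int i = 0; i < c; i++) {
        ranked[i].score = scores[i];
        ranked[i].label = i;
    }
    qsort(ranked, (size_t)c, sizeof *ranked, by_score_desc);

    size_t used = 0;
    int ok = 1;
    int w = snprintf(buf, bufsize, "%" PRIu64 ":\t", node);
    if (w < 0 || (size_t)w >= bufsize) ok = 0; else used = (size_t)w;
    for (int i = 0; ok && i < c; i++) {
        /* a space between labels, a newline after the last one */
        w = snprintf(buf + used, bufsize - used, "%d%c",
                     ranked[i].label, i + 1 < c ? ' ' : '\n');
        if (w < 0 || (size_t)w >= bufsize - used) ok = 0; else used += (size_t)w;
    }
    free(ranked);
    return ok ? (int)used : -1;
}
```

For example, scores `{0.1, 0.7, 0.2}` for node 42 yield the line `42:` followed by a tab and `1 2 0`.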
COMPILATION
Dependencies: `blas` and `pthread` must be installed.
Command line: `make clean` and then `make`
EXECUTION
Command line: ./SSL [OPTIONS]
OPTIONS
Command line optional arguments with values:
ARGUMENT | VALUES | DEFAULT | DESCRIPTION |
---|---|---|---|
--mode | test, predict | test | Operational mode (see Overview) |
--method | Tuned_RwR, AdaDIF, PPR | AdaDIF | Prediction method (see Overview) |
--graph_file | (adjacency list).txt | graphs/BlogCatalog/adj.txt | Graph file (see Input Files Format) |
--label_file | (label list or one-hot).txt | graphs/BlogCatalog/class.txt | Label file (see Input Files Format) |
--outfile | (predicted labels).txt | out/label_predictions.txt | File where predictions are stored when --mode is `predict` (see Output Files Format) |
--num_seeds | [1, 2^16] | 1030 | Number of labeled seed nodes (only used when --mode is `test`) |
--walk_length | [1, 2^16] | 10 | Length of the AdaDIF (and/or PPR) random walk |
--lambda_trwr | >= 0.0 | 1.0 | Regularization parameter for the Tuned RwR method |
--lambda_addf | >= 0.0 | 5.0 | Graph-smoothness regularization parameter for the AdaDIF method |
--num_iters | [1, 2^16] | 1 | Number of experiments performed (only used when --mode is `test`) |
Default values can be changed by editing `defs.h`.
Command line optional arguments without values:
ARGUMENT | RESULT |
---|---|
--unconstrained | Switches AdaDIF to unconstrained mode |
--single_thread | Forces single-threaded execution |
--multiclass | Specifies multiclass input/output (default is multilabel) |