Code for Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks.
Hao Peng, Ruitong Zhang, Yingtong Dou, Renyu Yang, Jingyi Zhang, Philip S. Yu.
The repository is organized as follows:
- `data/`: dataset folder
  - `YelpChi.zip`: data of the Yelp dataset;
  - `Amazon.zip`: data of the Amazon dataset;
  - `Mimic.zip`: data of the Mimic dataset;
- `log/`: log folder
- `model/`: model folder
  - `graphsage.py`: model code for the vanilla GraphSAGE model;
  - `layers.py`: RioGNN layer implementations;
  - `model.py`: RioGNN model implementations;
- `RL/`: RL folder
  - `actor_critic.py`: RL algorithm, Actor-Critic;
  - `rl_model.py`: RioGNN RL Forest implementations;
- `utils/`: functions folder
  - `data_process.py`: transfer sparse matrix to adjacency lists;
  - `utils.py`: utility functions for data i/o and model evaluation;
- `train.py`: training and testing all models

We build different multi-relational graphs for experiments in two task scenarios and three datasets:
Dataset | Task | Nodes | Relation |
---|---|---|---|
Yelp | Fraud Detection | 45,954 | rur, rtr, rsr, homo |
Amazon | Fraud Detection | 11,944 | upu, usu, uvu, homo |
MIMIC-III | Diabetes Diagnosis | 28,522 | vav, vdv, vpv, vmv, homo |
To run RioGNN on your datasets, you need to prepare the following data:
- Multiple single-relation graphs over the same set of nodes, each stored in `scipy.sparse` matrix format. You can use `sparse_to_adjlist()` in `utils.py` to convert a sparse matrix into the adjacency lists used by RioGNN;
- A numpy array with node labels. Currently, RioGNN only supports binary classification;
- A node feature matrix stored in `scipy.sparse` matrix format.
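
The three inputs listed above can be sketched on a toy graph as follows. This is a minimal illustration under stated assumptions: `to_adjacency_lists` is a hypothetical stand-in written for this example, not the repository's `sparse_to_adjlist()`, whose signature and output format may differ. The self-loops, the symmetric edges, and the toy shapes are likewise assumptions.

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def to_adjacency_lists(adj):
    """Convert a square scipy.sparse adjacency matrix to {node: set(neighbors)}.

    Hypothetical helper for illustration only; the repository's own
    sparse_to_adjlist() may use a different signature and output format.
    """
    coo = sp.coo_matrix(adj)
    adj_lists = defaultdict(set)
    # Assumption: give every node a self-loop so no neighbor set is empty.
    for node in range(coo.shape[0]):
        adj_lists[node].add(node)
    # Assumption: treat every stored entry as an undirected edge.
    for src, dst in zip(coo.row, coo.col):
        adj_lists[int(src)].add(int(dst))
        adj_lists[int(dst)].add(int(src))
    return dict(adj_lists)


# Two single-relation graphs defined over the same 4 nodes.
rel_a = sp.csr_matrix(np.array([[0, 1, 0, 0],
                                [1, 0, 1, 0],
                                [0, 1, 0, 0],
                                [0, 0, 0, 0]]))
rel_b = sp.csr_matrix(np.array([[0, 0, 0, 1],
                                [0, 0, 0, 0],
                                [0, 0, 0, 1],
                                [1, 0, 1, 0]]))

labels = np.array([0, 1, 0, 1])                  # binary node labels
features = sp.csr_matrix(np.random.rand(4, 8))   # sparse node feature matrix

relation_adj_lists = [to_adjacency_lists(rel_a), to_adjacency_lists(rel_b)]
```

Keeping one adjacency structure per relation, all indexed by the same node ids, is what lets the labels and the feature matrix be shared across relations.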
You can download the project and run the program as follows:
1. The dataset folder `data/` only contains the two fraud detection datasets. Please use one of the following links to download the Mimic dataset (~700MB):
   - Google Drive: https://drive.google.com/file/d/1WvYtNSHcvSQr8fzI9ykpgjMBSPwCTW0h/view?usp=sharing
   - Baidu Cloud: https://pan.baidu.com/s/1iyaOqnkyYGqo1Mdwt4QYnQ (password: vbwn)

   Note that all datasets need to be unzipped in the folder `data/` first.
2. Install the dependencies: `pip3 install -r requirements.txt`
3. Convert the sparse matrices to adjacency lists: `python data_process.py`
4. Train and test the model: `python train.py`

Note that running the code requires Python 3.6 or later.

- Our model supports both CPU and GPU modes; you can switch between them through the parameters `--use_cuda` and `--device`.
- Set `--data` to `yelp`, `amazon` or `mimic` to select the dataset.
- The parameter `--num_epochs` sets the maximum number of training epochs. Note that the model will stop early once reinforcement learning has explored all depths.
- The default value of the parameter `--ALAPHA` is `10`, which means that the accuracy at successive depths of the reinforcement learning tree is refined progressively to 0.1, 0.01, 0.001, etc. If you want to conduct more width and depth experiments, adjust this value.

For other dataset and parameter settings, please refer to the arg parser in `train.py`.
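
To make the effect of `--ALAPHA` concrete, the sketch below computes the per-depth accuracy values mentioned above. It assumes the accuracy at depth `d` is `1 / ALAPHA ** d`, which is one reading of the "0.1, 0.01, 0.001" progression rather than something taken from the repository code; `depth_accuracies` is a hypothetical helper name.

```python
def depth_accuracies(alpha, max_depth):
    """Return the assumed accuracy (step size) at depths 1..max_depth.

    Assumption: each additional depth refines the accuracy by a factor
    of `alpha`, so depth d uses a step of 1 / alpha ** d.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return [1 / alpha ** depth for depth in range(1, max_depth + 1)]


steps_default = depth_accuracies(10, 3)   # default --ALAPHA of 10
steps_coarse = depth_accuracies(2, 3)     # a smaller value refines more slowly
print(steps_default)
print(steps_coarse)
```

Under this reading, a smaller value gives a shallower refinement per depth (more depths are needed to reach a given accuracy), while a larger value makes each depth cover a wider set of candidates.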
Our preliminary work, CAmouflage-REsistant Graph Neural Network (CARE-GNN), is a GNN-based fraud detector built on a multi-relation graph and equipped with three modules that enhance its performance against camouflaged fraudsters.
If you use our code, please cite the paper below:
@article{peng2021reinforced,
title={Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks},
author={Peng, Hao and Zhang, Ruitong and Dou, Yingtong and Yang, Renyu and Zhang, Jingyi and Yu, Philip S.},
journal={ACM Transactions on Information Systems (TOIS)},
year={2021}
}