Wb-MSF: Multi-Source Information Diffusion Dataset on Weibo
Each cascade is saved in a CSV file with the following properties:
Property | Description |
---|---|
origin_id | id of the precursor post (the post being reposted) |
origin_uid | user id of the precursor post's author |
id | post id |
uid | user id |
created_at | (re)post time |
- Properties shown in italics are private user information and are hashed;
- Property `created_at` is formatted as `YYYY-mm-DD HH:MM:SS`;
- If `origin_id` is `NULL` and `origin_uid == uid`, the post is an original post, not a repost.
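The rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' loader: it assumes each CSV file has a header row with the column names from the table, that a missing precursor is written as the literal string `NULL` (or left empty), and that `created_at` parses with `%Y-%m-%d %H:%M:%S`. The helper names `read_cascade` and `is_original` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed strptime equivalent of the documented YYYY-mm-DD HH:MM:SS format.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_cascade(handle):
    """Parse one cascade CSV (hypothetical helper) into a list of dicts."""
    posts = []
    for row in csv.DictReader(handle):
        origin_id = row["origin_id"]
        # Assumption: a missing precursor is stored as "NULL" or an empty cell.
        if origin_id in ("", "NULL"):
            origin_id = None
        posts.append({
            "origin_id": origin_id,
            "origin_uid": row["origin_uid"],
            "id": row["id"],
            "uid": row["uid"],
            "created_at": datetime.strptime(row["created_at"], TIME_FORMAT),
        })
    return posts


def is_original(post):
    """A post is original when it has no precursor and origin_uid == uid."""
    return post["origin_id"] is None and post["origin_uid"] == post["uid"]


# Made-up sample rows; real ids are hashed strings.
sample = io.StringIO(
    "origin_id,origin_uid,id,uid,created_at\n"
    "NULL,u1,p1,u1,2020-01-01 08:00:00\n"
    "p1,u1,p2,u2,2020-01-01 08:05:30\n"
)
posts = read_cascade(sample)
sources = [p["id"] for p in posts if is_original(p)]
```

For a real file, pass `open(path, newline="")` instead of the in-memory sample. Because the dataset is multi-source, `sources` may contain more than one post id for a single cascade.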
Each dataset has a global followership network, formatted as an edge list: each line of `{dataname}_followership.txt` is a pair of comma-separated user ids.
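A short sketch of reading this edge list in Python is shown here. It assumes one `id,id` pair per line with no header row. The direction of each pair (who follows whom) is not stated here, so the code only maps the first id to the set of second ids and leaves the interpretation open. The helper names `read_followership` and `build_adjacency` and the sample ids are hypothetical.

```python
import io
from collections import defaultdict


def read_followership(handle):
    """Read comma-separated user id pairs, one pair per line (hypothetical helper)."""
    edges = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        left, right = line.split(",")
        edges.append((left.strip(), right.strip()))
    return edges


def build_adjacency(edges):
    """Map each first id to the set of second ids it is paired with.

    Assumption: edge direction is unspecified, so no follower/followee
    meaning is attached to the two positions.
    """
    adjacency = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
    return adjacency


# Made-up sample ids; real ids are hashed strings.
sample = io.StringIO("u1,u2\nu1,u3\nu2,u3\n")
edges = read_followership(sample)
adjacency = build_adjacency(edges)
```

For a real dataset, replace the in-memory sample with `open(path)` on the followership file.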
To protect user privacy, we anonymize the data, i.e., we hash the fields related to `user-id` and `post-id`.
@inproceedings{WBMSF_2022,
author = {Wu, Zhen and Zhou, Jingya and Wang, Jie and Sun, Xigang},
title = {Wb-MSF: A Large-scale Multi-source Information Diffusion Dataset for Social Information Diffusion Prediction},
booktitle = {2022 Tenth International Conference on Advanced Cloud and Big Data (CBD)},
year = {2022},
}
@inproceedings{HERIGCN_2022,
author = {Wu, Zhen and Zhou, Jingya and Liu, Ling and Li, Chaozhuo and Gu, Fei},
title = {Deep Popularity Prediction in Multi-Source Cascade with HERI-GCN},
booktitle = {2022 IEEE 38th International Conference on Data Engineering (ICDE)},
year = {2022}
}