Source code for LargeEA: Aligning Entities for Large-scale Knowledge Graphs
If there is any problem with reproduction, please create an issue on the GitHub repo page.
pytorch>=1.7.0
tensorflow>=2.4.1 (required for RREA)
faiss
transformers
datasketch[redis]
...
A full list of required packages is located in src/requirements.txt
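A minimal setup sketch, assuming pip and the requirements file path above (pick the GPU builds of pytorch/tensorflow appropriate for your hardware):

```shell
# Install the pinned dependencies from the repo; guarded so the command
# is a no-op when run outside the project root.
if [ -f src/requirements.txt ]; then
    pip install -r src/requirements.txt
fi
```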
The IDS benchmark is provided by OpenEA
Our newly proposed benchmark, DBP1M, is available on Google Drive.
First, download and unzip the dataset files, and place them in the project root folder:
unzip OpenEA_dataset_v1.1.zip
unzip mkdata.zip
The dataset (small for IDS15K, medium for IDS100K, large for DBP1M) and lang (fr or de) parameters control which benchmark to use.
For example, in the src
folder, setting dataset to small and lang to fr will run on the OpenEA EN_FR_15K_V1 dataset.
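The flag-to-benchmark mapping can be sketched as follows. Only the small/fr combination is spelled out above; the other benchmark names here are illustrative, and the real resolution happens inside src/main.py:

```python
# Illustrative mapping from the --dataset/--lang flags to benchmark names.
# Only ("small", "fr") -> EN_FR_15K_V1 is confirmed by the README; the
# remaining names follow the same pattern and are assumptions.
DATASETS = {
    ("small", "fr"): "OpenEA EN_FR_15K_V1",
    ("small", "de"): "OpenEA EN_DE_15K_V1",
    ("medium", "fr"): "OpenEA EN_FR_100K_V1",
    ("medium", "de"): "OpenEA EN_DE_100K_V1",
    ("large", "fr"): "DBP1M (EN-FR)",
    ("large", "de"): "DBP1M (EN-DE)",
}

def resolve(dataset: str, lang: str) -> str:
    """Return the benchmark selected by the two command-line flags."""
    return DATASETS[(dataset, lang)]
```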
Take DBP1M(EN-FR) as an example:
Make sure the folder for results is created:
cd src/
mkdir tmp4
First, get the BERT embeddings of all entities:
python main.py --phase 1 --dataset large --lang fr
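Phase 1 turns each entity name into a single BERT vector. A common way to collapse per-token hidden states into one embedding is masked mean pooling; a minimal numpy sketch of that step (the exact pooling used in main.py may differ):

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    hidden_states: (batch, seq_len, dim) last-layer BERT outputs
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)   # sum over real tokens only
    counts = mask.sum(axis=1).clip(min=1)         # avoid division by zero
    return summed / counts
```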
Then calculate the top-k similarities based on BERT:
python main.py --phase 2 --dataset large --lang fr
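Conceptually, this phase keeps only each source entity's k most similar target entities. A numpy sketch of top-k cosine similarity (at DBP1M scale the dependency list includes faiss, which does this far more efficiently):

```python
import numpy as np

def topk_sims(src: np.ndarray, trg: np.ndarray, k: int = 5):
    """Top-k cosine-similar target entities for each source entity.

    src: (n_src, dim), trg: (n_trg, dim) embedding matrices.
    Returns (indices, similarities), each of shape (n_src, k).
    """
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    trg = trg / np.linalg.norm(trg, axis=1, keepdims=True)
    sims = src @ trg.T                                    # cosine matrix
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]    # unordered top-k
    order = np.argsort(-np.take_along_axis(sims, idx, axis=1), axis=1)
    idx = np.take_along_axis(idx, order, axis=1)          # sort descending
    return idx, np.take_along_axis(sims, idx, axis=1)
```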
Finally, compute the string-based similarity (this requires a Redis server listening on localhost:6379):
python main.py --phase 3 --dataset large --lang fr
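The dependency list (datasketch with the redis extra) suggests this phase uses MinHash with a Redis backend. The quantity MinHash estimates is Jaccard similarity; an exact pure-Python sketch over character trigrams, for intuition only:

```python
def ngrams(s: str, n: int = 3) -> set:
    """Character n-grams of a name, padded so short names still yield grams."""
    s = f" {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Exact Jaccard over trigram sets; MinHash (datasketch) approximates
    this cheaply at DBP1M scale, with Redis holding the hash tables."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0
```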
The structure channel uses the results of the name channel to obtain name-based seeds, so make sure to run the name channel first.
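One simple way to turn name similarities into seed alignments is to keep mutual nearest-neighbour pairs; a sketch of that idea (the repo's actual seed-selection rule is not specified here and may differ):

```python
import numpy as np

def mutual_nn_seeds(sim: np.ndarray):
    """Return (src, trg) pairs that are each other's best match in `sim`,
    a (n_src, n_trg) name-similarity matrix."""
    best_trg = sim.argmax(axis=1)   # best target for each source entity
    best_src = sim.argmax(axis=0)   # best source for each target entity
    return [(i, j) for i, j in enumerate(best_trg) if best_src[j] == i]
```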
To run the RREA model:
python main.py --phase 0 --dataset large --lang fr --model rrea --epoch 100
python main.py --phase 4 --dataset large --lang fr
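LargeEA combines a name channel and a structure channel. A weighted blend of the two channels' similarity matrices is one plausible fusion rule; this sketch, including the weight alpha, is an assumption and not the repo's exact method:

```python
import numpy as np

def fuse(name_sim: np.ndarray, struct_sim: np.ndarray, alpha: float = 0.5):
    """Blend two (n_src, n_trg) similarity matrices; alpha is illustrative."""
    return alpha * name_sim + (1 - alpha) * struct_sim
```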
Please cite the following reference if you use LargeEA in a research paper:
@article{largeEA,
author = {Congcong Ge and
Xiaoze Liu and
Lu Chen and
Baihua Zheng and
Yunjun Gao},
title = {LargeEA: Aligning Entities for Large-scale Knowledge Graphs},
journal = {{PVLDB}},
volume = {15},
number = {2},
pages = {237--245},
year = {2022}
}
We use code from MRAEA, RREA, GCN-Align, DGMC, AttrGNN, OpenEA, EAKit, and SimAlign.
We also provide a modified version of OpenEA, OpenEA-TF2, in order to run experiments on an RTX 3090 GPU.