LargeEA

Source code for LargeEA: Aligning Entities for Large-scale Knowledge Graphs, VLDB 2022

If you have any problem with reproduction, please open an issue on the GitHub repo page.

Requirements

pytorch>=1.7.0

tensorflow>=2.4.1 (required for RREA)

faiss

transformers

datasketch[redis]

...

A full list of required packages is located in src/requirements.txt

Datasets

The IDS benchmark is provided by OpenEA

Our newly proposed benchmark DBP1M is available at Google Drive

First download the dataset files, then unzip them and place them in the project root folder:

unzip OpenEA_dataset_v1.1.zip
unzip mkdata.zip

The dataset parameter (small for IDS15K, medium for IDS100K, large for DBP1M) and the lang parameter (fr or de) control which benchmark to use. For example, in the src folder, setting dataset to small and lang to fr will run on the OpenEA EN_FR_15K_V1 dataset.
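The mapping from these two parameters to a benchmark can be sketched as a simple lookup. This is illustrative only, not the repository's actual code, and the benchmark names for the combinations other than small/fr are assumptions following the OpenEA naming scheme:

```python
# Hypothetical sketch of how (dataset, lang) selects a benchmark.
# Only the small/fr entry is confirmed by the README; the rest follow
# the OpenEA naming convention and are assumptions.
BENCHMARKS = {
    ("small", "fr"): "OpenEA EN_FR_15K_V1",
    ("small", "de"): "OpenEA EN_DE_15K_V1",
    ("medium", "fr"): "OpenEA EN_FR_100K_V1",
    ("medium", "de"): "OpenEA EN_DE_100K_V1",
    ("large", "fr"): "DBP1M EN-FR",
    ("large", "de"): "DBP1M EN-DE",
}

def select_benchmark(dataset: str, lang: str) -> str:
    try:
        return BENCHMARKS[(dataset, lang)]
    except KeyError:
        raise ValueError(f"unsupported combination: {dataset}/{lang}")
```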

Run

Take DBP1M(EN-FR) as an example:

Make sure the folder for results is created:

cd src/
mkdir tmp4

Name Channel

First, get the BERT embeddings of all entities:

python main.py --phase 1 --dataset large --lang fr 
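Phase 1 produces one vector per entity name. Conceptually, the per-token BERT outputs for an entity's name are pooled into a single embedding; whether LargeEA uses mean pooling specifically is an assumption, but a minimal numpy sketch of masked mean pooling looks like this:

```python
import numpy as np

def mean_pool(token_embs, mask):
    # token_embs: (seq_len, dim) per-token vectors from the encoder.
    # mask: 1 for real tokens, 0 for padding; padding is excluded from the mean.
    mask = mask[:, None].astype(float)
    return (token_embs * mask).sum(axis=0) / mask.sum()
```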

Then calculate TopK sims based on BERT:

python main.py --phase 2 --dataset large --lang fr 
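Phase 2 retrieves, for each source entity, the K most similar target entities by embedding similarity. The repository lists faiss as a requirement for this; purely as an illustration of the idea, here is a numpy sketch of top-K cosine similarity (not the repo's faiss-based implementation):

```python
import numpy as np

def topk_sims(src_emb, trg_emb, k=2):
    # L2-normalise both sides so the dot product equals cosine similarity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    trg = trg_emb / np.linalg.norm(trg_emb, axis=1, keepdims=True)
    sims = src @ trg.T
    # Indices of the k highest-similarity target rows for each source row.
    idx = np.argsort(-sims, axis=1)[:, :k]
    return idx, np.take_along_axis(sims, idx, axis=1)
```

In practice faiss replaces the dense `src @ trg.T` product with an index search, which is what makes this feasible at DBP1M scale.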

Finally, compute the string-based similarity (this requires a Redis server listening on localhost:6379):

python main.py --phase 3 --dataset large --lang fr 
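The repository uses datasketch backed by Redis for this phase. To convey the underlying idea without those dependencies, here is a minimal stdlib sketch of string similarity as Jaccard overlap of character n-grams (the function name and exact formulation are illustrative, not the repo's code):

```python
def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    # Jaccard similarity between the sets of character n-grams of two strings.
    def grams(s):
        return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)
```

datasketch's MinHash approximates exactly this kind of Jaccard similarity without materialising all pairs, which is why it appears in the requirements.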

Structure Channel

The structure channel uses the results of the name channel to obtain name-based seeds, so make sure to run the name channel first.
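One common way to turn name-channel similarities into high-confidence seed pairs is mutual nearest neighbours; whether LargeEA uses exactly this criterion is an assumption, but the sketch below shows the idea:

```python
import numpy as np

def mutual_nn_seeds(sim):
    # Hypothetical seed selection: keep (i, j) only when entity i's best
    # target match is j AND entity j's best source match is i.
    best_trg = sim.argmax(axis=1)
    best_src = sim.argmax(axis=0)
    return [(i, j) for i, j in enumerate(best_trg) if best_src[j] == i]
```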

To run RREA model:

python main.py --phase 0 --dataset large --lang fr --model rrea --epoch 100 

Channel Fusion and Eval

python main.py --phase 4  --dataset large --lang fr 
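Phase 4 combines the name-channel and structure-channel similarities and evaluates the resulting alignment. The actual fusion rule in LargeEA may differ; as a sketch under the assumption of a simple convex combination, together with Hits@1 (the standard entity-alignment metric) on a matrix whose correct matches lie on the diagonal:

```python
import numpy as np

def fuse(sim_name, sim_struct, alpha=0.5):
    # Hypothetical fusion: weighted combination of the two channels' matrices.
    return alpha * sim_name + (1 - alpha) * sim_struct

def hits_at_1(sim):
    # Fraction of source entities whose top-ranked target is the correct
    # (diagonal) one, assuming row i should align to column i.
    return float((sim.argmax(axis=1) == np.arange(sim.shape[0])).mean())
```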

Citation

Please cite the following when you use LargeEA in a research paper:

@article{largeEA,
  author    = {Congcong Ge and
               Xiaoze Liu and
               Lu Chen and
               Baihua Zheng and
               Yunjun Gao},
  title     = {LargeEA: Aligning Entities for Large-scale Knowledge Graphs},
  journal   = {{PVLDB}},
  volume    = {15},
  number    = {2},
  pages     = {237--245},
  year      = {2022}
}

Acknowledgements

We use code from MRAEA, RREA, GCN-Align, DGMC, AttrGNN, OpenEA, EAKit, and SimAlign.

We also provide a modified version of OpenEA, adapted to run experiments on an RTX 3090 GPU: OpenEA-TF2
