Source code for LargeEA: Aligning Entities for Large-scale Knowledge Graphs
If there is any problem with reproduction, please create an issue on the GitHub repo page.
pytorch>=1.7.0
tensorflow>=2.4.1 (required for RREA)
faiss
transformers
datasketch[redis]
...
A full list of required packages is located in src/requirements.txt
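A minimal setup sketch, assuming pip and the requirements file path above (pick the GPU builds of pytorch/tensorflow appropriate for your hardware):

```shell
# Install the pinned dependencies from the repo; guarded so the command
# is a no-op when run outside the project root.
if [ -f src/requirements.txt ]; then
    pip install -r src/requirements.txt
fi
```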
The IDS benchmark is provided by OpenEA
Our newly proposed benchmark, DBP1M, is available on Google Drive.
First, download and unzip the dataset files, and place them in the project root folder:
unzip OpenEA_dataset_v1.1.zip
unzip mkdata.zip
The dataset (small for IDS15K, medium for IDS100K, large for DBP1M) and lang (fr or de) parameters control which benchmark to use.
For example, in the src
folder, setting dataset to small and lang to fr will run on the OpenEA EN_FR_15K_V1 dataset.
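The flag-to-benchmark mapping can be sketched as follows. Only the small/fr combination is spelled out above; the other benchmark names here are illustrative, and the real resolution happens inside src/main.py:

```python
# Illustrative mapping from the --dataset/--lang flags to benchmark names.
# Only ("small", "fr") -> EN_FR_15K_V1 is confirmed by the README; the
# remaining names follow the same pattern and are assumptions.
DATASETS = {
    ("small", "fr"): "OpenEA EN_FR_15K_V1",
    ("small", "de"): "OpenEA EN_DE_15K_V1",
    ("medium", "fr"): "OpenEA EN_FR_100K_V1",
    ("medium", "de"): "OpenEA EN_DE_100K_V1",
    ("large", "fr"): "DBP1M (EN-FR)",
    ("large", "de"): "DBP1M (EN-DE)",
}

def resolve(dataset: str, lang: str) -> str:
    """Return the benchmark selected by the two command-line flags."""
    return DATASETS[(dataset, lang)]
```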
Take DBP1M(EN-FR) as an example:
Make sure the folder for results is created:
cd src/
mkdir tmp4
First, get the BERT embeddings of all entities:
python main.py --phase 1 --dataset large --lang fr
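Phase 1 turns each entity name into a single BERT vector. A common way to collapse per-token hidden states into one embedding is masked mean pooling; a minimal numpy sketch of that step (the exact pooling used in main.py may differ):

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    hidden_states: (batch, seq_len, dim) last-layer BERT outputs
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)   # sum over real tokens only
    counts = mask.sum(axis=1).clip(min=1)         # avoid division by zero
    return summed / counts
```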
Then calculate the top-k similarities based on BERT:
python main.py --phase 2 --dataset large --lang fr
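Conceptually, this phase keeps only each source entity's k most similar target entities. A numpy sketch of top-k cosine similarity (at DBP1M scale the dependency list includes faiss, which does this far more efficiently):

```python
import numpy as np

def topk_sims(src: np.ndarray, trg: np.ndarray, k: int = 5):
    """Top-k cosine-similar target entities for each source entity.

    src: (n_src, dim), trg: (n_trg, dim) embedding matrices.
    Returns (indices, similarities), each of shape (n_src, k).
    """
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    trg = trg / np.linalg.norm(trg, axis=1, keepdims=True)
    sims = src @ trg.T                                    # cosine matrix
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]    # unordered top-k
    order = np.argsort(-np.take_along_axis(sims, idx, axis=1), axis=1)
    idx = np.take_along_axis(idx, order, axis=1)          # sort descending
    return idx, np.take_along_axis(sims, idx, axis=1)
```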
Finally, compute the string-based similarity (this requires a Redis server listening on localhost:6379):
python main.py --phase 3 --dataset large --lang fr
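The dependency list (datasketch with the redis extra) suggests this phase uses MinHash with a Redis backend. The quantity MinHash estimates is Jaccard similarity; an exact pure-Python sketch over character trigrams, for intuition only:

```python
def ngrams(s: str, n: int = 3) -> set:
    """Character n-grams of a name, padded so short names still yield grams."""
    s = f" {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Exact Jaccard over trigram sets; MinHash (datasketch) approximates
    this cheaply at DBP1M scale, with Redis holding the hash tables."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0
```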
The structure channel uses the results of the name channel to obtain name-based seeds, so make sure to run the name channel first.
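One simple way to turn name similarities into seed alignments is to keep mutual nearest-neighbour pairs; a sketch of that idea (the repo's actual seed-selection rule is not specified here and may differ):

```python
import numpy as np

def mutual_nn_seeds(sim: np.ndarray):
    """Return (src, trg) pairs that are each other's best match in `sim`,
    a (n_src, n_trg) name-similarity matrix."""
    best_trg = sim.argmax(axis=1)   # best target for each source entity
    best_src = sim.argmax(axis=0)   # best source for each target entity
    return [(i, j) for i, j in enumerate(best_trg) if best_src[j] == i]
```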
To run the RREA model:
python main.py --phase 0 --dataset large --lang fr --model rrea --epoch 100
python main.py --phase 4 --dataset large --lang fr
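LargeEA combines a name channel and a structure channel. A weighted blend of the two channels' similarity matrices is one plausible fusion rule; this sketch, including the weight alpha, is an assumption and not the repo's exact method:

```python
import numpy as np

def fuse(name_sim: np.ndarray, struct_sim: np.ndarray, alpha: float = 0.5):
    """Blend two (n_src, n_trg) similarity matrices; alpha is illustrative."""
    return alpha * name_sim + (1 - alpha) * struct_sim
```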
Please cite the following reference if you use LargeEA in a research paper:
@article{largeEA,
author = {Congcong Ge and
Xiaoze Liu and
Lu Chen and
Baihua Zheng and
Yunjun Gao},
title = {LargeEA: Aligning Entities for Large-scale Knowledge Graphs},
journal = {{PVLDB}},
volume = {15},
number = {2},
pages = {237--245},
year = {2022}
}
We use code from MRAEA, RREA, GCN-Align, DGMC, AttrGNN, OpenEA, EAKit, and SimAlign.
We also provide a modified version of OpenEA, OpenEA-TF2, in order to run experiments on an RTX 3090 GPU.