Altegrad 2021-2022 - Citation Prediction Challenge

Authors: Apavou Clément & Belkada Younes & Zucker Arthur

The Kaggle challenge is available here: https://www.kaggle.com/c/altegrad-2021/leaderboard

🔎 Introduction

In this challenge, we are given a large scientific citation graph in which each node corresponds to an article. The dataset consists of 138,499 vertices (i.e., articles), each with its abstract and list of authors. The goal is to predict, given all this information, whether two nodes are linked by a citation. In the following sections, we elaborate on the intuitions behind our approaches and present the results we obtained, along with possible interpretations of each observation. The provided code is the code we used for the best model (i.e., the right commit).
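To make the task concrete, here is a minimal sketch of one common way to frame citation prediction as binary classification on node pairs: build an undirected adjacency structure from the known citations, describe each candidate pair with a few neighborhood and author-overlap features, and draw unlinked pairs as negative examples. This is a generic illustration, not the approach implemented in this repository; the function names (`build_adjacency`, `pair_features`, `sample_negatives`) and the choice of features are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from a list of (u, v) citation pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def pair_features(u, v, adj, authors):
    """Hypothetical hand-crafted features for a candidate pair (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = len(nu & nv)                      # number of common neighbors
    union = len(nu | nv)
    jaccard = common / union if union else 0.0  # Jaccard coefficient
    degree_product = len(nu) * len(nv)          # preferential attachment
    shared_authors = len(set(authors.get(u, ())) & set(authors.get(v, ())))
    return [common, jaccard, degree_product, shared_authors]


def sample_negatives(nodes, adj, k, seed=0):
    """Draw k distinct random node pairs that are not linked in the graph."""
    rng = random.Random(seed)
    nodes = list(nodes)
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(nodes, 2)
        if v not in adj.get(u, set()):
            negatives.add((min(u, v), max(u, v)))
    return sorted(negatives)


# Toy usage on a 4-article graph (made-up data).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = build_adjacency(edges)
authors = {0: ["A"], 3: ["A", "B"]}
print(pair_features(0, 3, adj, authors))   # [1, 0.5, 2, 1]
print(sample_negatives(range(4), adj, 2))  # [(0, 3), (1, 3)]
```

When computing features for a known (training) edge, that edge should be removed from `adj` first; otherwise the degree and Jaccard features leak the label. Note also that `sample_negatives` loops until it finds `k` unlinked pairs, so `k` must not exceed the number of non-edges.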

🔨 Getting started

pip3 install -r requirements.txt

Then,

sh download_data.sh

python3 main.py

📍 Tips

The best model can be reproduced from the best-model branch, since it does not use this implementation of the code. The present branch holds the final code: it allows customization of the various embeddings and corresponds to the latest version of the code.

🔎 Results

Model	Validation loss	Test loss (private leaderboard)	Run
Best model	0.07775	0.07939
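The table does not name the metric; assuming it is the binary log loss (cross-entropy) commonly used for this kind of link prediction task, the sketch below shows how such a score is computed from 0/1 labels and predicted citation probabilities. The function name `binary_log_loss` and the clipping constant `eps` are illustrative choices, not taken from this repository.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) is never evaluated
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Confident, correct predictions give a small loss (about 0.105 here).
print(binary_log_loss([1, 0], [0.9, 0.1]))
```

Lower is better, and confident wrong predictions are penalized heavily, which is why calibrated probabilities matter as much as ranking quality under this metric. For binary labels this is essentially the same quantity as scikit-learn's `sklearn.metrics.log_loss`.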

All experiments are available on wandb:

♦️ Best MLP architecture

📎 Presentation of our work

Report & Slides

🔧 Some tools used

Some citations

@misc{cohan2020specter,
      title={SPECTER: Document-level Representation Learning using Citation-informed Transformers}, 
      author={Arman Cohan and Sergey Feldman and Iz Beltagy and Doug Downey and Daniel S. Weld},
      year={2020},
      eprint={2004.07180},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

