The codebase for the ACL 2024 paper "Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialectal Arabic Datasets"

AMR-KELEG/ALDi-and-IAA

ALDi-and-IAA

The codebase accompanying the paper "Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialectal Arabic Datasets", accepted to ACL 2024.

Environment and Dependencies

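# Create and activate the conda environment
conda create -n "ALDI_IAA" python=3.10
conda activate ALDI_IAA

# Install the Python dependencies into the active environment
pip install -r requirements.txt

# Download the default CAMeL Tools data
camel_data -i defaults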

Datasets

| # | Dataset | Link |
|---|---------|------|
| 1 | MPOLD | GitHub |
| 2 | YouTube Cyberbullying | OneDrive |
| 3 | DCD | Personal Site |
| 4 | ArSAS | Personal Site |
| 5 | ArSarcasm-v1 | Provided by the authors |
| 6 | iSarcasm | GitHub |
| 7 | DART | Dropbox |
| 8 | Mawqif | Provided by the authors |
| 9 | ASAD | Provided by the authors |

Generating the ALDi-IAA Plots

conda activate ALDI_IAA

# 1) Manually download the dataset files to `data/raw_data/`

# 2) Augment the dataset files with ALDi scores and dialect labels
python prepare_datasets.py

# 3) Generate the agreement plots
python compute_agreement_percentages.py
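
To give a rough idea of the kind of quantity step 3 plots, here is a minimal Python sketch that bins samples by ALDi score and reports the percentage of samples on which all annotators agree. It is an illustration only, not the code in `compute_agreement_percentages.py`: the function name `full_agreement_by_aldi_bin`, the `(aldi_score, labels)` input format, the choice of four equal-width bins, and the assumption that ALDi scores lie in [0, 1] are all hypothetical, and the toy samples are made up.

```python
from collections import defaultdict


def full_agreement_by_aldi_bin(samples, n_bins=4):
    """Return {bin_index: % of samples whose annotators all agree}.

    `samples` is an iterable of (aldi_score, labels) pairs. The score is
    assumed to lie in [0, 1], and `labels` holds one label per annotator.
    Both the input format and the binning scheme are hypothetical.
    """
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for score, labels in samples:
        # Clamp so that a score of exactly 1.0 falls in the last bin.
        bin_index = min(int(score * n_bins), n_bins - 1)
        totals[bin_index] += 1
        # Full agreement: every annotator assigned the same label.
        if len(set(labels)) == 1:
            agreed[bin_index] += 1
    return {b: 100.0 * agreed[b] / totals[b] for b in sorted(totals)}


# Made-up toy data: (ALDi score, one label per annotator).
toy_samples = [
    (0.05, ["pos", "pos", "pos"]),
    (0.10, ["neg", "neg", "neg"]),
    (0.60, ["pos", "neg", "pos"]),
    (0.95, ["neg", "neg", "neg"]),
    (1.00, ["pos", "neg", "neg"]),
]
percentages = full_agreement_by_aldi_bin(toy_samples)
print(percentages)
```

Bins that contain no samples are simply absent from the returned dictionary, which avoids dividing by zero; the actual scripts may bin and aggregate differently.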
