A Dataset for Detecting Humor in Arabic Text

Humor detection is a complex and ambiguous task in natural language processing, which makes automatic humor detection challenging, particularly for low-resource languages such as Arabic. In this paper, we address the task by collecting and annotating humorous Arabic tweets (written in dialectal Arabic) and Modern Standard Arabic (MSA) text, and then performing automatic humor detection on the collected data. To establish a baseline classification system, we fine-tuned seven pre-trained Arabic language models on the dataset: AraBERTv02, AraBERTv02-Twitter, QARIB, MARBERT, MARBERTv2, CAMeLBERT-DA, and CAMeLBERT-MIX. CAMeLBERT-DA performed best, achieving an F1-score and an accuracy of 72.11%.
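The baseline results are reported as F1-score and accuracy. The sketch below shows one way to compute both metrics for the binary "humor" / "non-humor" task from lists of gold and predicted labels. It assumes macro-averaged F1 over the two classes; the README does not state which averaging the reported 72.11% uses, and the helper names (`accuracy`, `f1_for`, `macro_f1`) are hypothetical.

```python
# Hypothetical helpers for scoring the binary humor task.
# Assumption: F1 is macro-averaged over both classes; the averaging behind
# the reported 72.11% is not stated in this README.
LABELS = ("humor", "non-humor")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for(label, y_true, y_pred):
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for(label, y_true, y_pred) for label in labels) / len(labels)


gold = ["humor", "humor", "non-humor", "non-humor"]
pred = ["humor", "non-humor", "non-humor", "non-humor"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Macro-averaging weights both classes equally, so a model that favors the majority class is not rewarded for it; if a different averaging was used for the baseline, only `macro_f1` would need to change.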

File Specifications

humor.tsv: a tab-separated file containing tweets, each labeled either "humor" or "non-humor".
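A minimal sketch for loading humor.tsv with Python's standard library follows. The exact column layout is not documented here, so the code assumes the text is in the first column and the label in the last column (`TEXT_COL` and `LABEL_COL` are hypothetical settings to adjust after inspecting the file). Rows whose label is not "humor" or "non-humor", such as a possible header row, are skipped.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: text is in the first column and the label in the last column.
# Inspect humor.tsv and adjust these if its layout differs.
TEXT_COL = 0
LABEL_COL = -1
VALID_LABELS = {"humor", "non-humor"}


def load_humor_tsv(path):
    """Return a list of (text, label) pairs, skipping headers and malformed rows."""
    rows = []
    with Path(path).open(encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside tweets as literal characters.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for fields in reader:
            if len(fields) < 2:
                continue  # blank or malformed line
            label = fields[LABEL_COL].strip().lower()
            if label not in VALID_LABELS:
                continue  # e.g. a header row such as "text<TAB>label"
            rows.append((fields[TEXT_COL].strip(), label))
    return rows


if __name__ == "__main__" and Path("humor.tsv").exists():
    data = load_humor_tsv("humor.tsv")
    # Report dataset size and the class distribution.
    print(len(data), Counter(label for _, label in data))
```

Filtering on the two known label values, rather than skipping a fixed first line, lets the loader work whether or not the file has a header.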

Citation

If you use this dataset, please cite it as:

@inproceedings{alkhalifa2022humor,
  title={A Dataset for Detecting Humor in Arabic Text},
  author={Hend Al-Khalifa and Fetoun AlZahrani and Hala Qawara and Reema AlRowais and Sawsan Alowa and Luluh AlDhubayi},
  booktitle={The 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)},
  year={2022}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
