Arabic Dialect Classification

Arabic is spoken in many countries, and each country has its own dialect. The aim of this project is to build a model that predicts the dialect of a given text.

Overview

In this project, we explored several machine learning models: Support Vector Machine (SVM), XGBoost, and Multinomial Naive Bayes (MultinomialNB). Of these, MultinomialNB achieved the highest accuracy, at 79%.
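The README does not show the classifier code. As a rough illustration of what multinomial Naive Bayes does, here is a stdlib-only sketch. It scores each dialect by its log prior plus the Laplace-smoothed log probabilities of the words in the text, then picks the highest-scoring dialect. The class name `SimpleMultinomialNB`, the whitespace tokenization, and the toy sentences and labels (`EG`, `GULF`) are hypothetical. The project itself may have used a library implementation (for example scikit-learn's `MultinomialNB`) with different features.

```python
import math
from collections import Counter


class SimpleMultinomialNB:
    """Hypothetical multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        class_freq = Counter(labels)
        # Log prior: log P(dialect), estimated from label frequencies.
        self.log_prior = {c: math.log(class_freq[c] / len(labels)) for c in self.classes}
        # Word counts per dialect (whitespace tokenization is an assumption).
        word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            word_counts[label].update(text.split())
        self.vocab = set()
        for counts in word_counts.values():
            self.vocab.update(counts)
        # Smoothed log likelihood: log((count + alpha) / (total + alpha * |V|)).
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(word_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / denom) for w in self.vocab
            }
        return self

    def predict_one(self, text):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in text.split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += self.log_likelihood[c][w]
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy, hypothetical training data: two Egyptian and two Gulf sentences.
    train_texts = [
        "ازيك عامل ايه النهارده",
        "انا عايز اروح دلوقتي",
        "شلونك شخبارك اليوم",
        "ابي اروح الحين",
    ]
    train_labels = ["EG", "EG", "GULF", "GULF"]
    clf = SimpleMultinomialNB().fit(train_texts, train_labels)
    preds = clf.predict(["عامل ايه", "ابي اروح الحين"])
    print(preds, accuracy(["EG", "GULF"], preds))
```

The sketch only makes the scoring rule explicit; a real pipeline would typically feed count or TF-IDF vectors from the cleaned dataset into a library implementation.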

We then used AraBERT, a BERT-based language model for Arabic available on Hugging Face, which improved the accuracy to 82%.

Project Structure

data/: Contains the database and the cleaned data used for training and evaluation, as well as the data-fetching script (fetch_data.py).
Models/: Contains saved model parameters.
Notebooks/: Jupyter notebooks used for data exploration, model training, and evaluation.
Preprocessing/: Jupyter notebooks for the data cleaning process.
Web App/: Contains the web app script for deployment.
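The README does not say which cleaning steps the Preprocessing notebooks apply. The sketch below shows normalization steps that are common for Arabic dialect text, offered as an assumption rather than the project's actual pipeline: stripping URLs and @mentions, removing diacritics (harakat) and the tatweel elongation character, mapping alef variants to bare alef, and dropping non-Arabic characters. The function name `clean_arabic` and the exact Unicode ranges are illustrative choices.

```python
import re

# Assumed cleaning rules; the project's notebooks may differ.
URLS_MENTIONS = re.compile(r"https?://\S+|@\w+")  # links and @user mentions
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tanween, harakat, shadda, sukun, superscript alef
TATWEEL = "\u0640"  # kashida / elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda or hamza above/below
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")  # anything outside the basic Arabic letter block


def clean_arabic(text: str) -> str:
    """Hypothetical normalizer for Arabic dialect text."""
    text = URLS_MENTIONS.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # normalize to bare alef
    text = NON_ARABIC.sub(" ", text)  # drop digits, Latin letters, punctuation, emoji
    return " ".join(text.split())  # collapse repeated whitespace


if __name__ == "__main__":
    print(clean_arabic("مرحب\u0640\u0640ا ب\u0650ك\u064fم @user https://x.co 123 \u0623هلا"))
```

Normalizing alef variants and removing diacritics reduces vocabulary sparsity, which matters for count-based models such as MultinomialNB; whether these steps help a subword model like AraBERT is a separate question.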

WebApp

A Streamlit web application for the AraBERT-based model: WebApp Video
