Egyptian Arabic Wikipedia Scanner: Automatic Detection of Template-translated Articles in the Egyptian Wikipedia
- Application Description
- Application Features
- Application Datasets
- Application Local Run
- Application Citations
We release the source code of the Egyptian Arabic Wikipedia Scanner, a web-based detection system that automatically identifies template-translated articles in the Egyptian Arabic Wikipedia edition. We publicly host this application on Streamlit Community Cloud at https://egyptian-wikipedia-scanner.streamlit.app and, for better accessibility, on Hugging Face Spaces at https://huggingface.co/spaces/SaiedAlshahrani/Egyptian-Wikipedia-Scanner.
This web-based application is introduced in a research paper titled "Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition", which was accepted at LREC-COLING 2024: The 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT6). The application is currently released under an MIT license.
This web-based application, Egyptian Wikipedia Scanner, allows users to search for an article directly or select a suggested article retrieved using fuzzy search from the Egyptian Arabic Wikipedia edition. The application automatically fetches the article’s metadata (using the Wikimedia XTools API), displays the fetched metadata in a table, and automatically classifies the article as human-generated or template-translated. The application also dynamically displays the full summary of the article and provides the URL to the article to read the full text, as shown in the screenshots below.
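The metadata-fetching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual code: the XTools host and the `articleinfo` route are assumptions based on the public XTools API and should be checked against its current documentation, and the helper names `build_metadata_url` and `fetch_metadata` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: this host and route match the public XTools page-information
# endpoint; verify against the current XTools API documentation.
XTOOLS_BASE = "https://xtools.wmcloud.org/api/page/articleinfo"
PROJECT = "arz.wikipedia.org"  # Egyptian Arabic Wikipedia


def build_metadata_url(title: str, project: str = PROJECT) -> str:
    """Return the (assumed) XTools URL for a single article title."""
    # MediaWiki titles use underscores for spaces; percent-encode the rest.
    page = quote(title.strip().replace(" ", "_"), safe="")
    return f"{XTOOLS_BASE}/{project}/{page}"


def fetch_metadata(title: str, timeout: float = 10.0) -> dict:
    """Fetch article metadata as a dictionary (requires network access)."""
    request = Request(
        build_metadata_url(title),
        headers={"User-Agent": "scanner-sketch/0.1"},  # hypothetical agent string
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_metadata("مصر")` would return the parsed JSON response for that article, which could then be shown as a table or passed to a classifier. The exact fields in the response are not assumed here.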
We also release the dataset of heuristically filtered, manually processed, and automatically classified Egyptian Arabic Wikipedia articles (balanced subset) that was used to train and test this web-based application. The dataset is available on Hugging Face Hub at https://huggingface.co/datasets/SaiedAlshahrani/Detect-Egyptian-Wikipedia-Articles and is also currently released under an MIT license.
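One possible way to load this dataset programmatically is through the Hugging Face `datasets` library. The sketch below assumes that `datasets` is installed and that the repository loads with its default configuration. The split and column names are not assumed, so inspect the returned object before relying on them. The helper names are hypothetical.

```python
DATASET_ID = "SaiedAlshahrani/Detect-Egyptian-Wikipedia-Articles"


def dataset_url(dataset_id: str = DATASET_ID) -> str:
    """Return the Hugging Face Hub page for the dataset."""
    return f"https://huggingface.co/datasets/{dataset_id}"


def load_scanner_dataset(dataset_id: str = DATASET_ID):
    """Download and return the dataset (requires network access)."""
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(dataset_id)


# Example usage (downloads data on first call):
#   data = load_scanner_dataset()
#   print(data)  # shows the available splits and columns
```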
This web-based application, Egyptian Wikipedia Scanner, is publicly hosted on Streamlit Community Cloud and Hugging Face Spaces. To run the application locally on your machine instead, follow these steps.
1- Clone the application's GitHub repository to your machine. Use this command in your terminal:
git clone https://github.com/SaiedAlshahrani/Egyptian-Wikipedia-Scanner.git
cd Egyptian-Wikipedia-Scanner
2- Install the required Python packages. Use this command in your terminal:
pip install -r requirements.txt
3- Run the local Streamlit server. Use this command in your terminal:
streamlit run scanner.py
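If the last step fails, a small pre-flight check can help narrow down the cause. The sketch below is not part of the repository. It only verifies that the files named in the steps above exist in the current directory and that the `streamlit` package is importable. The helper names are hypothetical.

```python
from importlib.util import find_spec
from pathlib import Path


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


def missing_files(names, root="."):
    """Return the file names that do not exist under `root`."""
    return [name for name in names if not (Path(root) / name).is_file()]


def preflight(root="."):
    """Print any problems found and return True if none were detected."""
    problems = [f"missing file: {n}" for n in
                missing_files(["scanner.py", "requirements.txt"], root)]
    problems += [f"missing package: {n}" for n in missing_modules(["streamlit"])]
    for problem in problems:
        print(problem)
    return not problems
```

Running `preflight()` from inside the cloned `Egyptian-Wikipedia-Scanner` directory prints nothing and returns `True` when both files and the `streamlit` package are found.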
Saied Alshahrani, Hesham Haroon, Ali Elfilali, Mariama Njie, and Jeanna Matthews. 2024. Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition. arXiv preprint arXiv:2404.00565.
Saied Alshahrani, Hesham Haroon, Ali Elfilali, Mariama Njie, and Jeanna Matthews. 2024. Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition. In Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024, pages 31–45, Torino, Italia. ELRA and ICCL.