Enhancing Tesseract Arabic Text Recognition

This repository contains the fine-tuned Long Short-Term Memory (LSTM) model for Arabic text recognition in Tesseract OCR.

About the Project

This project is part of a research study titled "Enhancing Arabic Text Recognition: Fine-tuning of the LSTM Model in Tesseract OCR". The LSTM model in Tesseract OCR was fine-tuned on a diverse training dataset covering 1,038 unique Arabic fonts. The fine-tuned model was then evaluated across various Arabic text types and compared with the original Tesseract OCR model.

Trained Model

The ara.traineddata file in this repository is the result of our fine-tuning process. It is designed to improve Arabic text recognition in Tesseract OCR over the original Arabic model.

Usage

To use this fine-tuned model, download the ara.traineddata file and place it in your Tesseract 'tessdata' directory, replacing the existing Arabic trained data file (keeping a backup of the original lets you switch back). Then run Tesseract as you normally would, selecting Arabic with the -l ara option.
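The installation step can be scripted. The following is a minimal sketch using only the Python standard library; the function names install_traineddata and build_tesseract_command, and the ".bak" backup naming, are hypothetical choices for illustration and are not part of this repository. The tessdata location varies by platform and Tesseract version; running "tesseract --list-langs" prints the directory your installation uses.

```python
import shutil
from pathlib import Path


def install_traineddata(new_file, tessdata_dir, backup=True):
    """Copy the fine-tuned ara.traineddata into a tessdata directory.

    If an ara.traineddata already exists and backup is True, it is first
    copied to ara.traineddata.bak (a hypothetical naming convention).
    Returns the path of the installed file.
    """
    new_file = Path(new_file)
    target = Path(tessdata_dir) / "ara.traineddata"
    if backup and target.exists():
        # Preserve the original Arabic model so it can be restored later.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(new_file, target)
    return target


def build_tesseract_command(image, output_base, tessdata_dir=None):
    """Build a tesseract CLI invocation that selects the Arabic model.

    "-l ara" picks ara.traineddata; "--tessdata-dir" optionally points
    Tesseract at a specific tessdata directory instead of the default.
    """
    cmd = ["tesseract", str(image), str(output_base), "-l", "ara"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd
```

For example, subprocess.run(build_tesseract_command("page.png", "page"), check=True) would ask Tesseract to write the recognized Arabic text to page.txt. Passing tessdata_dir is an alternative to overwriting the system-wide file: you can keep the fine-tuned model in its own directory and leave the original in place.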

Results

The fine-tuned model showed significant performance improvements over the original model on most of the Arabic text types evaluated, including Arabic-Indic digits, European digits, normal Arabic text, and Arabic text without diacritics. For more details about the research and results, refer to the research paper.
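To run a similar side-by-side comparison on your own documents, you need ground-truth text plus the output of both models for each sample. The sketch below assumes character error rate (CER), a common OCR metric, computed from Levenshtein edit distance. This README does not say which metric the paper used, so CER here is an assumption, and compare_models is a hypothetical helper.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def compare_models(samples):
    """Return (mean CER of original model, mean CER of fine-tuned model).

    samples: iterable of (reference, original_output, finetuned_output).
    """
    samples = list(samples)
    base = [cer(ref, orig) for ref, orig, _ in samples]
    tuned = [cer(ref, fine) for ref, _, fine in samples]
    return sum(base) / len(base), sum(tuned) / len(tuned)
```

Lower CER is better. Because Python strings are sequences of Unicode code points, each Arabic letter and each diacritic counts as a separate character, so grouping samples by text type (digits, text with and without diacritics) gives a per-category view like the one described above.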

Repository: ClearCypher/enhancing-tesseract-arabic-text-recognition