Arabic OCR

An OCR system for the Arabic language that converts images of typed text into machine-encoded text.
The system targets a simplified version of the OCR problem: images that contain only Arabic characters (see the dataset link below for sample images).

Important Note

The system currently supports letters only (29 letters: ا-ى and لا); numbers and special symbols are not supported.

Setup

Install Python, then run this command:

pip install -r requirements.txt

Run

1. Put the images in the src/test directory.
2. Go to the src directory and run the following command:
```
python OCR.py
```
An output folder will be created containing:
- a text folder with one text file for each input image.
- a running_time file with the time taken to process each image.

Pipeline

Dataset

Link to dataset of images and the corresponding text: here.
We used 1000 images to generate the character dataset used for training.

Examples

Line Segmentation
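
As a rough illustration of this step, the sketch below splits a page into text lines using a horizontal projection profile, a common technique for printed text. It is not taken from this repository: the function name `segment_lines`, the `min_height` parameter, and the assumption that the input is already binarized (text pixels = 1, background = 0) are all hypothetical, and the actual implementation may handle diacritics and noise differently.

```python
import numpy as np

def segment_lines(binary, min_height=1):
    """Split a binary image (text = 1, background = 0) into line bands.

    Returns a list of (top, bottom) row ranges, with bottom exclusive.
    Hypothetical helper for illustration only.
    """
    # Horizontal projection: number of text pixels in each row.
    profile = binary.sum(axis=1)
    has_ink = profile > 0

    lines = []
    start = None
    for row, ink in enumerate(has_ink):
        if ink and start is None:
            start = row                      # a text band begins here
        elif not ink and start is not None:
            if row - start >= min_height:
                lines.append((start, row))   # band ended on the previous row
            start = None
    # Handle a band that touches the bottom edge of the image.
    if start is not None and len(has_ink) - start >= min_height:
        lines.append((start, len(has_ink)))
    return lines

# Tiny synthetic page: two bands of ink separated by blank rows.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 2:6] = 1
page[6:9, 1:7] = 1
print(segment_lines(page))  # [(1, 3), (6, 9)]
```

The same idea applied along the other axis (column sums instead of row sums) is often used as a starting point for word segmentation, with a gap-width threshold to tell word gaps from character gaps.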

Word Segmentation

Character Segmentation

Testing

NOTE: Make sure you have a folder containing the ground-truth text files, named identically to the predicted text files, so that the two can be compared.

From within the src folder, run:

python edit.py 'output/text' 'truth'
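
This README does not describe how edit.py scores a prediction. Judging only by the script name, one plausible metric is character-level edit distance; the sketch below shows that idea under that assumption. The functions `levenshtein` and `accuracy` are hypothetical and are not the repository's code.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def accuracy(predicted, truth):
    """Assumed score: 1 - distance / len(truth), clamped at 0."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - levenshtein(predicted, truth) / len(truth))

# One missing letter out of four gives an accuracy of 0.75.
print(accuracy("كتب", "كتاب"))  # 0.75
```

Averaging this score over all file pairs in the two folders would give a single figure comparable to the average accuracy reported under Performance, if the same metric is used.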

Performance

Average accuracy: 95%.
Average time per image: 16 seconds.

NOTE

We achieved these results using only the flattened image as the feature.
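
To make "flattened image as the feature" concrete, here is a minimal sketch assuming each segmented character is a 2-D NumPy array. The function name `flatten_feature`, the 25x25 target size, and the nearest-neighbour resizing are illustrative assumptions, not details from this repository.

```python
import numpy as np

def flatten_feature(char_img, size=(25, 25)):
    """Resize a character image to a fixed size and flatten it to 1-D.

    Hypothetical helper: the target size is an assumption chosen so that
    every character yields a feature vector of the same length.
    """
    h, w = char_img.shape
    # Nearest-neighbour sampling: pick evenly spaced source rows and columns.
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = char_img[np.ix_(rows, cols)]
    # Row-major flattening turns the 2-D grid into a single feature vector.
    return resized.astype(np.float32).ravel()

# A 40x30 character crop becomes a vector of 25 * 25 = 625 values.
crop = np.ones((40, 30), dtype=np.uint8)
print(flatten_feature(crop).shape)  # (625,)
```

A fixed-length vector like this can be passed directly to most classifiers, which is what makes the raw flattened image a convenient baseline feature.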

References

An Efficient, Font Independent Word and Character Segmentation Algorithm for Printed Arabic Text.
A Robust Line Segmentation Algorithm for Arabic Printed Text with Diacritics.
Arabic Character Segmentation Using Projection Based Approach with Profile's Amplitude Filter.
