Arabic OCR

An OCR system for the Arabic language that converts images of typed text into machine-encoded text.
The system targets a simplified version of the OCR problem: images that contain only Arabic characters (see the dataset link below for sample images).

Important Note

The system currently supports only letters (29 in total: ا-ى and the ligature لا). Numbers and special symbols are not supported.
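The sketch below makes the supported set concrete. It lists the 29 classes and splits a ground-truth word into one label per character, treating لا as a single unit. It assumes that ا-ى means the 28 standard letters with ى as the final letter. The class ordering and the helper name tokenize are hypothetical and are not taken from the repository.

```python
# Hypothetical label set: the 28 letters from ا to ى plus the لا ligature.
# The order is illustrative only, not the repository's class order.
LETTERS = "ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ى".split()
LAM_ALEF = "\u0644\u0627"  # ل followed by ا, always rendered as one glyph
CLASSES = LETTERS + [LAM_ALEF]
CLASS_TO_INDEX = {c: i for i, c in enumerate(CLASSES)}


def tokenize(word: str) -> list[str]:
    """Split a word into supported classes, treating لا as one token.

    Raises ValueError on anything outside the 29 classes (digits, symbols,
    hamza forms, and so on).
    """
    tokens = []
    i = 0
    while i < len(word):
        if word.startswith(LAM_ALEF, i):
            tokens.append(LAM_ALEF)  # consume both code points at once
            i += 2
        elif word[i] in CLASS_TO_INDEX:
            tokens.append(word[i])
            i += 1
        else:
            raise ValueError(f"unsupported character: {word[i]!r}")
    return tokens
```

For example, tokenize("سلام") yields three labels (س, لا, م) rather than four, which matches how the word looks on the page.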

Setup

Install Python, then run this command:
```
pip install -r requirements.txt
```

Run

Put the images in the src/test directory.
Go to the src directory and run the following command:
```
python OCR.py
```
An output folder will be created containing:
- text: a folder with one text file corresponding to each image.
- running_time: a file with the time taken to process each image.

Pipeline

Dataset

Link to the dataset of images and their corresponding text: here.
We used 1000 of these images to generate the character dataset used for training.
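The README does not say how segmented characters were labeled. One simple approach, sketched here purely as an assumption, is to pair each segmented character image with the corresponding ground-truth letter and to skip any word where the segmenter produced a different number of pieces than the word has letters. The names pair_word and build_dataset are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")  # any image type, e.g. a NumPy array


def pair_word(char_images: Sequence[Image], labels: Sequence[str]) -> List[Tuple[Image, str]]:
    """Attach one ground-truth label to each segmented character image.

    Assumes both sequences are in reading order (right to left for Arabic).
    Returns an empty list when the counts differ, because the alignment is
    then ambiguous and the word is better skipped than mislabeled.
    """
    if len(char_images) != len(labels):
        return []
    return list(zip(char_images, labels))


def build_dataset(words):
    """words: iterable of (char_images, labels) pairs, one per word.

    Returns the labeled samples and the number of words that were skipped.
    """
    samples = []
    skipped = 0
    for char_images, labels in words:
        if len(char_images) != len(labels):
            skipped += 1
            continue
        samples.extend(pair_word(char_images, labels))
    return samples, skipped
```

Skipping mismatched words trades dataset size for label quality: a single missed or extra cut would otherwise shift every label after it.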

Examples

Line Segmentation
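A common way to split a page into lines is a horizontal projection profile: count the ink pixels in each row and treat each run of rows that contain ink as one line. The sketch below shows that technique under the assumption that the input is a binarized NumPy array with text pixels set to 1; it is not necessarily the method used in src, and the name segment_lines and the min_ink threshold are hypothetical.

```python
import numpy as np


def segment_lines(binary: np.ndarray, min_ink: int = 1) -> list[tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    binary: 2-D array where text pixels are 1 and background pixels are 0.
    """
    profile = binary.sum(axis=1)  # ink count per row
    has_ink = profile >= min_ink
    lines = []
    start = None
    for row, ink in enumerate(has_ink):
        if ink and start is None:
            start = row  # a new line begins
        elif not ink and start is not None:
            lines.append((start, row))  # the current line ends
            start = None
    if start is not None:  # a line touching the bottom edge
        lines.append((start, len(has_ink)))
    return lines
```

Arabic dots and diacritics above or below the main text can produce thin extra bands, so a real implementation would likely merge or discard bands much shorter than the typical line height.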

Word Segmentation
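Within a line, words can be separated with a vertical projection profile: find the runs of columns that contain ink, then merge runs whose gap is narrower than a minimum word gap. The sketch below illustrates that technique on a binarized line image; the function name segment_words and the min_gap value are assumptions, not taken from the repository.

```python
import numpy as np


def segment_words(line: np.ndarray, min_gap: int = 3) -> list[tuple[int, int]]:
    """Split a binarized line image into words using a vertical projection.

    Returns (start_col, end_col) pairs, end exclusive, ordered right to left
    to follow Arabic reading order.
    """
    has_ink = line.sum(axis=0) > 0  # does each column contain any ink?
    runs = []
    start = None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col
        elif not ink and start is not None:
            runs.append([start, col])
            start = None
    if start is not None:
        runs.append([start, len(has_ink)])
    # Merge ink runs separated by gaps narrower than min_gap (gaps inside a word).
    words = []
    for run in runs:
        if words and run[0] - words[-1][1] < min_gap:
            words[-1][1] = run[1]
        else:
            words.append(run)
    return [(s, e) for s, e in reversed(words)]
```

min_gap has to be wider than the small gaps inside a single Arabic word (for example after ا or د, which do not connect to the following letter); otherwise one word is split into several.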

Character Segmentation
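Because Arabic is cursive, letters inside a word are joined by a thin baseline stroke. A simplified heuristic, shown here only as an illustration and not as the repository's algorithm, proposes a cut in the middle of each run of columns whose ink count is no larger than the stroke thickness. The name candidate_cuts and the max_stroke parameter are hypothetical.

```python
import numpy as np


def candidate_cuts(word: np.ndarray, max_stroke: int = 1) -> list[int]:
    """Return candidate cut columns in a binarized word image, right to left.

    A column is 'thin' when its ink count is between 1 and max_stroke, i.e.
    only the connecting stroke passes through it. The midpoint of each run of
    thin columns is proposed as a cut.
    """
    profile = word.sum(axis=0)  # ink count per column
    thin = (profile >= 1) & (profile <= max_stroke)
    cuts = []
    start = None
    for col, is_thin in enumerate(thin):
        if is_thin and start is None:
            start = col
        elif not is_thin and start is not None:
            cuts.append((start + col - 1) // 2)  # middle of the thin run
            start = None
    if start is not None:
        cuts.append((start + len(thin) - 1) // 2)
    return sorted(cuts, reverse=True)
```

This heuristic over-segments letters that contain long flat strokes of their own (such as س or ب), so practical systems usually filter the candidates further, for example using the baseline position or the letter shapes on either side of a cut.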

Testing

NOTE: Make sure you have a folder containing the ground-truth text files, using the same file names as the predicted output, so the predicted text can be compared against it.

From within the src folder, run:
```
python edit.py 'output/text' 'truth'
```
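The script name suggests an edit-distance comparison, but the README does not define the accuracy metric. The sketch below assumes character accuracy is computed as 1 minus the Levenshtein distance divided by the length of the ground truth, averaged over files. The function names, the .txt extension, and the choice to score a missing prediction as 0 are all assumptions.

```python
from pathlib import Path


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def char_accuracy(predicted: str, truth: str) -> float:
    """1 - edit_distance / len(truth), clipped at 0."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1 - levenshtein(predicted, truth) / len(truth))


def compare_folders(pred_dir: str, truth_dir: str) -> float:
    """Average character accuracy over all ground-truth files.

    A ground-truth file with no matching prediction counts as 0 accuracy.
    """
    scores = []
    for truth_file in sorted(Path(truth_dir).glob("*.txt")):
        pred_file = Path(pred_dir) / truth_file.name
        predicted = pred_file.read_text(encoding="utf-8").strip() if pred_file.exists() else ""
        truth = truth_file.read_text(encoding="utf-8").strip()
        scores.append(char_accuracy(predicted, truth))
    return sum(scores) / len(scores) if scores else 0.0
```

Called as compare_folders("output/text", "truth"), this mirrors the arguments passed to edit.py above.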

Performance

Average accuracy: 95%.
Average time per image: 16 seconds.

NOTE

We achieved these results using only the flattened character image as the feature vector.
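A flattened-image feature can be built by resizing each segmented character to a fixed size and unrolling its pixels into a single vector. The sketch below assumes a 25x25 target size and nearest-neighbour resizing done with plain NumPy indexing; the size, the function name flatten_feature, and the resizing method are assumptions, and the classifier that consumes these vectors is not specified here.

```python
import numpy as np


def flatten_feature(char_img: np.ndarray, size: tuple[int, int] = (25, 25)) -> np.ndarray:
    """Resize a 2-D character image (nearest neighbour) and flatten it to 1-D."""
    h, w = char_img.shape
    out_h, out_w = size
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    resized = char_img[rows[:, None], cols[None, :]]
    return resized.astype(np.float32).ravel()
```

Because every character maps to a vector of the same length (625 values here), the features can be stacked into a single matrix for training, regardless of the original character sizes.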

HusseinYoussef/Arabic-OCR
