The aim of this repository is to create an OCR model (image-to-text conversion) for Old Persian cuneiform.

This repository is part of the Electronic Old Persian Library organization and is inspired by the eBL project.

Three OCR models have been developed in this repository:

  • yolo_cnn_old_persian
  • tesseract_old_persian
  • easyocr_old_persian

Current status of these three OCR models:

  • yolo_cnn_old_persian is not completed yet.
  • tesseract_old_persian is completed.
  • easyocr_old_persian is completed but needs more optimization and real data.

easyocr_old_persian

This model is built as a custom model on top of the EasyOCR repository. If you run into an error, please check the issues.

Trainer notebook:

https://github.com/Melanee-Melanee/Old-Persian-Cuneiform-OCR/blob/master/easyocr_old_persian/trainer_easyocr_old_persian.ipynb

Using saved model:

https://github.com/Melanee-Melanee/Old-Persian-Cuneiform-OCR/blob/master/easyocr_old_persian/model_easyocr.ipynb

To use the saved model, create the directory structure below on your machine and place the custum_example.pth, custom_example.py and custom_example.yaml files in it. For more detail, please watch this tutorial on YouTube.

/root/

 /EasyOCR/
       /model/
           custum_example.pth
       /user_network/
           custom_example.py
           custom_example.yaml
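As a rough illustration of the layout above, the following sketch checks that the three files are in place and then builds an EasyOCR reader that points at them. The helper names (`missing_model_files`, `load_reader`), the default network name `custom_example`, and the language list `["en"]` are assumptions made for this example, not part of the repository; the language list has to match the `lang_list` in the model's .yaml file. EasyOCR looks up the .pth, .py and .yaml files by the name passed as `recog_network`, so all three file names should share that name.

```python
from pathlib import Path


def missing_model_files(easyocr_root, network_name="custom_example"):
    """Return the expected custom-model files that are absent under easyocr_root."""
    root = Path(easyocr_root)
    expected = [
        root / "model" / f"{network_name}.pth",
        root / "user_network" / f"{network_name}.py",
        root / "user_network" / f"{network_name}.yaml",
    ]
    return [path for path in expected if not path.is_file()]


def load_reader(easyocr_root, network_name="custom_example", languages=("en",)):
    """Build an easyocr.Reader that uses the custom recognition network."""
    missing = missing_model_files(easyocr_root, network_name)
    if missing:
        raise FileNotFoundError(f"Missing model files: {[str(p) for p in missing]}")

    # Imported here so the layout check above works without EasyOCR installed.
    import easyocr

    root = Path(easyocr_root)
    return easyocr.Reader(
        list(languages),  # assumption: must match lang_list in the .yaml file
        recog_network=network_name,
        model_storage_directory=str(root / "model"),
        user_network_directory=str(root / "user_network"),
        gpu=False,
    )
```

With the files in place, `load_reader("/root/EasyOCR").readtext("image.png")` would run recognition on an image; the image path here is only a placeholder.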

tesseract_old_persian

This pre-trained Tesseract OCR model converts Old Persian cuneiform to an English transcription. It was developed by S. Muhammad Hossein Mousavi.

Notebook: https://github.com/Melanee-Melanee/Old-Persian-Cuneiform-OCR/blob/master/tesseract_old_persian/Tesseract_Old_Persian_OCR.ipynb

Please place the peo.traineddata file in this directory: /usr/share/tesseract-ocr/4.00/tessdata
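A minimal sketch of that step, assuming the `tesseract` command-line tool is installed and that the language code is `peo` (taken from the file name peo.traineddata). The helper names `install_traineddata`, `tesseract_command` and `transcribe` are made up for this example; the linked notebook may do this differently.

```python
import shutil
import subprocess
from pathlib import Path

# Directory named in this README; it may differ on other Tesseract versions.
TESSDATA_DIR = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def install_traineddata(src, tessdata_dir=TESSDATA_DIR):
    """Copy the traineddata file into the tessdata directory and return its new path."""
    dest = Path(tessdata_dir) / Path(src).name
    shutil.copyfile(src, dest)
    return dest


def tesseract_command(image_path, lang="peo"):
    """Build the CLI call that prints the recognized text to standard output."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def transcribe(image_path, lang="peo"):
    """Run Tesseract on one image and return the transcription as a string."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Copying into a system directory such as /usr/share usually requires elevated permissions, so `install_traineddata` may need to be run as root or pointed at a writable tessdata directory.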

An example:

The last 12 lines of Darius the Great's DPd inscription at Persepolis:

Input:

[Image: darius2]

Output:

Zittiy ; iaryvuS ; xrSayZiy;

mnc;aurmzia;upstam; rlauv;

hia ; ViZiriS ; rgiriS ; uta;

im am ; i h yaum ; au lm z i a ;

pitTucs;hca;hinaya; hca;

QuSiyala ; hca;iruga;ariy;

imam ;ihyaum;ma; ajMiya; ait;

aim ;yanm;jDiyaMiy;

aitmiy ; iiaTuv
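The raw output can be tidied before further processing. The sketch below assumes that `;` marks word boundaries in the transcription and that spaces inside a word are OCR spacing noise; both are assumptions drawn from the sample above, and `split_words` is a hypothetical helper rather than part of the repository.

```python
def split_words(ocr_line):
    """Split one OCR output line on ';' and drop whitespace inside each word."""
    # Assumption: ';' separates words and inner spaces carry no meaning.
    words = ("".join(part.split()) for part in ocr_line.split(";"))
    return [word for word in words if word]


print(split_words("Zittiy ; iaryvuS ; xrSayZiy;"))
# ['Zittiy', 'iaryvuS', 'xrSayZiy']
```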

In the next stage, we can translate the Old Persian transcription into modern languages with ChatGPT:

Prompt: Can you translate this Old Persian (achaemenid) English transcription to modern English in one paragraph?

“This is me, Dariush king; By the grace of Ahura Mazda, I have built this; I founded this empire and made it strong. May Ahuramazda protect me and my kingdom; may it last forever; and it would be safe from lies; that is what I did; That is what I am saying.”

Translate to modern Persian:

این منم داریوش شاهنشاه؛ به لطف اهورامزدا، من این را بنا کردم؛ من این امپراتوری را بنیان نهادم و آن را نیرومند ساختم. باشد که اهورامزدا من و پادشاهی مرا محافظت کند؛ باشد که برای همیشه پایدار بماند؛ و باشد که از دروغ در امان باشد؛ این است آنچه من انجام دادم؛

این است آنچه من می‌گویم.

Article

I wrote an article as a short report on what I have done for this project so far.

License

This repository is licensed under CC-BY-NC; commercial use is prohibited.

Contact

This repository is still under development. To contribute, contact me by email: melaneepython@gmail.com

Donate (Financial sponsors)

If you would like to support this project financially, you can use this link:

https://github.com/Melanee-Melanee/Old-Persian-Cuneiform-OCR/blob/master/other/Financial.md