Fine-tuning Arabic traineddata to solve extended words issue #362

sifdinNh · 2023-11-28T19:54:21Z

I want to fine-tune ara.traineddata from the tessdata_best repo to handle extended (elongated) words like this:

To do that, I made a list of lines in the same format, like this:

.............
الســــــــيد العضـــــو د. عــــلي العتيبــــــي:
الســــــــيد العضـــــو جــــمال الحــــربي:
الســــــــيد العضـــــو د. خالــــد الفيصـــــل:
الســـــــــيد العضـــــو تركـــــي المطيــــري:
..............
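The elongation in the lines above appears to come from the Arabic tatweel (kashida) character, U+0640. As a minimal sketch, assuming that is the case, the following Python helpers measure how much of a line is tatweel and recover the unextended form, which can help when checking training text or comparing OCR output. The function names are made up for illustration and are not part of Tesseract or tesstrain.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida), the elongation character


def strip_tatweel(line: str) -> str:
    """Return the line with every tatweel character removed."""
    return line.replace(TATWEEL, "")


def tatweel_ratio(line: str) -> float:
    """Return the fraction of characters in the line that are tatweel."""
    if not line:
        return 0.0
    return line.count(TATWEEL) / len(line)


if __name__ == "__main__":
    sample = "\u0627\u0644\u0633\u0640\u0640\u0640\u064a\u062f"  # elongated word
    print(strip_tatweel(sample))
    print(round(tatweel_ratio(sample), 2))
```

A very high tatweel ratio across the whole training text would mean the model sees mostly one character during fine-tuning, which may be worth knowing before training.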

I started by generating ground truth files: .tif images and .box files.

Then I started training with this:

make training MODEL_NAME=ara_new TESSDATA=../tesseract/tessdata START_MODEL=ara MAX_ITERATIONS=10000 LANG_TYPE=RTL

Training started at 99% BCER, and I stopped it when it reached 24% BCER.

When I tested the new traineddata file and compared it against the best ara.traineddata, I got a poor result.

This is the result of the best traineddata for Arabic:

It gives me almost 90% accuracy.

But when I tested the newly trained file, this was the result:

It seems to recognize almost nothing, even though the main reason I started this was to fine-tune for better accuracy.
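To put a number on the comparison between the two models, one option is a plain character error rate over the same test lines. The sketch below is an assumption about how this could be measured, not how the figures in this issue were produced, and it may not match exactly how training computes BCER. The `ignore` parameter is a hypothetical convenience for dropping characters such as tatweel (U+0640) before scoring.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str, ignore: str = "") -> float:
    """Character error rate: edit distance divided by reference length."""
    for ch in ignore:
        reference = reference.replace(ch, "")
        hypothesis = hypothesis.replace(ch, "")
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of four.
    print(cer("abcd", "abxd"))
```

Running this once with `ignore=""` and once with `ignore="\u0640"` on both models' output would show whether the fine-tuned model fails on the letters themselves or only on the elongation.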


sifdinNh · 2023-11-29T20:50:15Z

@zdenop

AhmadHakami · 2024-01-02T20:44:21Z

I'm not sure whether the issue arises because the model was trained on multi-line TIFF images, but have you tried fine-tuning with single-line text images? Give it a try if you haven't yet, and share the results with us.
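For the single-line suggestion, tesstrain's documented layout is one line image per transcription, with the transcription stored as a single-line `.gt.txt` file of the same base name under `data/MODEL_NAME-ground-truth`. A minimal sketch of splitting a multi-line text into such files follows; the function name, the `line` prefix, and the numbering scheme are assumptions for illustration, and rendering the matching `.tif` for each line is not covered here.

```python
from pathlib import Path


def write_line_ground_truth(text: str, out_dir: str, prefix: str = "line") -> list:
    """Write each non-empty line of text to its own .gt.txt file.

    Returns the list of paths written, in order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Skip blank lines and trim surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        path = out / f"{prefix}_{i:05d}.gt.txt"
        path.write_text(ln + "\n", encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical output directory matching MODEL_NAME=ara_new.
    paths = write_line_ground_truth("first line\nsecond line\n",
                                    "data/ara_new-ground-truth")
    print([p.name for p in paths])
```

Each resulting file would then need a line image with the same base name (for example `line_00000.tif`) before running `make training`.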
