[Python] [Pytesseract] [Urdu] [Segmentation fault] [Deserialize header failed] #354
Comments
How is this related to Python and pytesseract? By the way: GitHub allows formatting code sections as code to improve readability (just use the |
Also, it seems you are trying to run training on some hosted platform (Kaggle?). Run it on your local computer instead: Linux/WSL or Mac.
Hi @zdenop, I'm running it in a Jupyter Notebook. I started with a single page that contains only 10 lines.
Hi @stefan6419846, I'm working in a Jupyter Notebook for Python and writing the code in it. I have also made the code more readable, as you suggested. Thanks.
Follow the README instructions; that is the only supported training process. Jupyter Notebook is not covered there.
Hi All,
I'm having trouble running fine-tuning with this repository. Below is the code that I run in my Jupyter Notebook:
Step-6:
I have replaced the /content/tesstrain/data/irt/list.train file with my own file, which contains the text below:
/content/tesstrain/data/irt-ground-truth/page_10_line_1.png نقش فریادی ہے کس کی شوخیٔ تحریر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_2.png کاغذی ہے پیرہن ہر پیکر تصویر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_3.png کاو کاو سخت جانی ہائے تنہائی نہ پوچھ
/content/tesstrain/data/irt-ground-truth/page_10_line_4.png صبح کرنا شام کا لانا ہے جوئے شیر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_5.png جذبۂ بے اختیار شوق دیکھا چاہیے
/content/tesstrain/data/irt-ground-truth/page_10_line_6.png سینۂ شمشیر سے باہر ہے دم شمشیر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_7.png آگہی دام شنیدن جس قدر چاہے بچھائے
/content/tesstrain/data/irt-ground-truth/page_10_line_8.png مدعا عنقا ہے اپنے عالم تقریر کا
/content/tesstrain/data/irt-ground-truth/page_10_line_9.png نبسکہ ہوں غالبؔ اسیری میں بھی آتش زیر پا
/content/tesstrain/data/irt-ground-truth/page_10_line_10.png موئے آتش دیدہ ہے حلقہ مری زنجیر کا
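For context, the tesstrain README describes ground truth as line images, each with a single-line transcription in a file of the same name ending in `.gt.txt`; `list.train` is then generated by the Makefile rather than written by hand. Below is a minimal sketch, assuming lines in the "image path, space, transcription" form shown above, that splits each line and writes the matching `.gt.txt` file. The function names `split_manifest_line` and `write_gt_files` are hypothetical helpers, not part of tesstrain.

```python
from pathlib import Path

# Image extensions assumed here; adjust if your line images use another suffix.
IMAGE_SUFFIXES = (".png", ".tif")


def split_manifest_line(line):
    """Split 'path/to/image.png transcription text' into (path, text)."""
    path, _, text = line.strip().partition(" ")
    if not path.lower().endswith(IMAGE_SUFFIXES) or not text.strip():
        raise ValueError(f"unexpected line: {line!r}")
    return path, text.strip()


def write_gt_files(lines, target_dir=None):
    """Write one <image stem>.gt.txt per non-empty line; return the paths written.

    If target_dir is None, each file is written next to its image.
    """
    written = []
    for line in lines:
        if not line.strip():
            continue
        image_path, text = split_manifest_line(line)
        image = Path(image_path)
        folder = Path(target_dir) if target_dir is not None else image.parent
        gt_path = folder / (image.stem + ".gt.txt")
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

With the ten lines above saved to a separate text file, `write_gt_files(open(path, encoding="utf-8"))` would place `page_10_line_1.gt.txt` and so on beside the PNG files, so that the Makefile can build the training lists itself.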
Step-8 outcome:
You are using make version: 4.3
lstmtraining
--debug_interval 0
--traineddata data/irt/irt.traineddata
--old_traineddata /content/tesstrain/usr/share/tessdata/urd.traineddata
--continue_from data/urd/irt.lstm
--learning_rate 0.0001
--model_output data/irt/checkpoints/irt
--train_listfile data/irt/list.train
--eval_listfile data/irt/list.eval
--max_iterations 10000
--target_error_rate 0.01
Loaded file data/urd/irt.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 129 to 129!
Num (Extended) outputs,weights in Series:
1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys64:64, 20736
Lfx96:96, 61824
Lrx96:96, 74112
Lfx384:384, 738816
Fc129:129, 49665
Total weights = 945313
Previous null char=2 mapped to 128
Continuing from data/urd/irt.lstm
Deserialize header failed: /content/tesstrain/data/irt-ground-truth/page_10_line_1.png نقش فریادی ہے کس کی شوخیٔ تحریر کا
Deserialize header failed: /content/tesstrain/data/irt-ground-truth/page_10_line_2.png کاغذی ہے پیرہن ہر پیکر تصویر کا
Deserialize header failed: /content/tesstrain/data/irt-ground-truth/page_10_line_5.png جذبۂ بے اختیار شوق دیکھا چاہیے
Load of page 0 failed!
Load of images failed!!
make: *** [Makefile:327: data/irt/checkpoints/irt_checkpoint] Segmentation fault (core dumped)
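The "Deserialize header failed" messages print a whole line of `list.train`, image path and Urdu text together. This suggests that `lstmtraining` treats each line of the list file as the path of a single training file. In a standard tesstrain run those entries are `.lstmf` files produced by the Makefile. The following is a rough sketch, under that assumption, of a check that flags entries that do not look like `.lstmf` paths before training starts. `find_bad_listfile_entries` is a hypothetical helper, not a tesstrain or Tesseract function.

```python
def find_bad_listfile_entries(lines):
    """Return (line_number, entry) pairs that do not look like .lstmf paths."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        if not entry.endswith(".lstmf"):
            bad.append((number, entry))
    return bad


# Example use (path taken from the command above):
# with open("data/irt/list.train", encoding="utf-8") as handle:
#     for number, entry in find_bad_listfile_entries(handle):
#         print(f"line {number} is not an .lstmf path: {entry}")
```

Applied to the hand-written `list.train` from Step-6, every line would be reported, because each one ends in transcription text rather than `.lstmf`.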
Please help me work out how to proceed. I'm stuck.
Thank you