Add flag for Page Segmentation Modes control #1601
I added a flag, --psm, for controlling the Page Segmentation Mode (PSM) in Tesseract. The default mode (3) gives me quite poor results; when I use 6, 11, or 12 for Bulgarian, the OCR results are much better. I haven't tested other languages yet, but I expect other modes to improve results for them as well.
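Tesseract numbers its page segmentation modes 0 to 13, so a flag like this mainly needs to accept an integer in that range and hand it to the OCR engine. Below is a minimal sketch of the validation side only, written in Python with `argparse` for illustration. It is not the code from this PR (CCExtractor's option parsers are C and Rust), and the names `psm_value`, `build_parser`, and the program name `ocr-sketch` are hypothetical. The mode descriptions follow the listing printed by `tesseract --help-psm`.

```python
import argparse

# Tesseract page segmentation modes, as listed by `tesseract --help-psm`.
# The keys are the integer values this hypothetical --psm flag accepts.
PSM_DESCRIPTIONS = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD or OCR (not implemented)",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Single column of text of variable sizes",
    5: "Single uniform block of vertically aligned text",
    6: "Single uniform block of text",
    7: "Single text line",
    8: "Single word",
    9: "Single word in a circle",
    10: "Single character",
    11: "Sparse text: as much text as possible, in no particular order",
    12: "Sparse text with OSD",
    13: "Raw line, bypassing Tesseract-specific hacks",
}

DEFAULT_PSM = 3  # the default mode reported in the PR description


def psm_value(text):
    """argparse `type=` callback: accept only a known Tesseract PSM number."""
    try:
        mode = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"PSM must be an integer, got {text!r}"
        ) from None
    if mode not in PSM_DESCRIPTIONS:
        raise argparse.ArgumentTypeError(f"PSM must be between 0 and 13, got {mode}")
    return mode


def build_parser():
    # "ocr-sketch" is a hypothetical program name, not CCExtractor's CLI.
    parser = argparse.ArgumentParser(prog="ocr-sketch")
    parser.add_argument(
        "--psm",
        type=psm_value,
        default=DEFAULT_PSM,
        help="Tesseract page segmentation mode (0-13, default 3)",
    )
    return parser
```

For example, `build_parser().parse_args(["--psm", "6"]).psm` yields 6, omitting the flag keeps mode 3, and `--psm 14` makes argparse report an error and exit. Validating the range at parse time means a typo fails immediately instead of reaching the OCR engine.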
@prateekmedia have you added this flag already?
@PunitLodha Not added yet, will add once this merges.
@prateekmedia could you add it to this PR itself?
@PunitLodha Here is a PR I made to his repo:
feat: add psm for rust parser
@prateekmedia merged
Logic looks good, although I haven't tested it myself!
The failing tests will be resolved by #1635. cc @PunitLodha
@prateekmedia the tests aren't passing yet
@PunitLodha This PR needs a rebase again.
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 1a13bbb...:
All tests passing on the master branch were passed completely. NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
Check the result page for more info.
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 1a13bbb...:
All tests passing on the master branch were passed completely. NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
Check the result page for more info.
P.S. This PR continues #1544, which was closed after the rebase 🥲