Add flag for Page Segmentation Modes control #1601

Merged: 10 commits, Sep 3, 2024

Neo2SHYAlien
Contributor

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

I added a flag, --psm, for controlling the PSM (Page Segmentation Mode) in Tesseract. The default mode (3) gives me quite bad results. When I use 6, 11, or 12 for Bulgarian, I get much better OCR results. I haven't tested other languages yet, but I expect improvements there as well when another mode is used.

P.S. This PR continues #1544, which was closed after the rebase 🥲
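
For reference, the mode numbers mentioned above correspond to the page segmentation modes listed by `tesseract --help-psm`. The following is only a minimal sketch of how such a value could be range-checked before being handed to Tesseract; it is not the code from this PR, and the names `PSM_MODES`, `DEFAULT_PSM`, and `parse_psm` are hypothetical helpers introduced for illustration.

```python
# Tesseract page segmentation modes, as listed by `tesseract --help-psm`.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD or OCR",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    5: "Assume a single uniform block of vertically aligned text",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    9: "Treat the image as a single word in a circle",
    10: "Treat the image as a single character",
    11: "Sparse text: find as much text as possible in no particular order",
    12: "Sparse text with OSD",
    13: "Raw line: treat the image as a single text line",
}

# Tesseract's default mode, per the PR description above.
DEFAULT_PSM = 3


def parse_psm(text: str) -> int:
    """Hypothetical helper: validate the value given to a --psm style flag."""
    try:
        value = int(text)
    except ValueError:
        raise ValueError(f"--psm expects an integer, got {text!r}") from None
    if value not in PSM_MODES:
        raise ValueError(f"--psm must be between 0 and 13, got {value}")
    return value


if __name__ == "__main__":
    # Print the default mode and the modes reported to work better for Bulgarian.
    for candidate in (DEFAULT_PSM, 6, 11, 12):
        print(candidate, PSM_MODES[parse_psm(str(candidate))])
```

Modes 6, 11, and 12 all drop the assumption of a full page layout, which may be why they suit short subtitle bitmaps better than the default; that explanation is a guess, not something measured in this PR.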
@Neo2SHYAlien
Contributor Author

@cfsmp3 After the resync of the main branch, the previous PR #1544 was closed automatically. I hope the code change is good enough; I'm not a daily dev 😊

@PunitLodha
Member

@prateekmedia have you added this flag already?

@prateekmedia
Member

@PunitLodha Not added yet, will add once this merges.

@PunitLodha
Member

@prateekmedia could you add it to this PR itself?

@prateekmedia prateekmedia mentioned this pull request Aug 23, 2024
@prateekmedia
Member

@PunitLodha Here I have made a PR to his repo:
Neo2SHYAlien#1

@Neo2SHYAlien
Contributor Author

@prateekmedia merged

Member

@prateekmedia prateekmedia left a comment

The logic looks good, although I have not tested it myself!

@prateekmedia
Member

The failing tests will be resolved in #1635. cc @PunitLodha

@PunitLodha
Member

@prateekmedia the tests aren't passing yet

@prateekmedia
Member

@PunitLodha This PR needs a rebase again.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 1a13bbb...:

Report Name    Tests Passed
Broken         12/13
CEA-708        9/14
DVB            4/7
DVD            3/3
DVR-MS         2/2
General        15/27
Hauppage       2/3
MP4            3/3
NoCC           10/10
Options        83/86
Teletext       21/21
WTV            9/13
XDS            22/34

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:


Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 1a13bbb...:

Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            7/7
DVD            3/3
DVR-MS         2/2
General        27/27
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        85/86
Teletext       21/21
WTV            13/13
XDS            34/34

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:


Check the result page for more info.

@PunitLodha PunitLodha merged commit 349020e into CCExtractor:master Sep 3, 2024
18 checks passed
4 participants