16 May 17:16

deajan

3cfde86

Committing the git crime again and again, with a spoon

Latest

Limit preprocessor/transform threads to the config-defined NUMBER_OF_PROCESSES
Tesseract PDF intermediary transformation
- Added intermediary transformation suffix to make sure we don't overwrite earlier files
- Fixed intermediary transformation failing
- Disabled intermediary transformation when preprocessor is used
Tesseract preprocessor
- Improved tesseract preprocessor settings
- Made the general preprocessing/transformation DPI a variable
- Always preprocess files to TIFF format so we don't need intermediary transformation with preprocessor
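The thread limit mentioned at the top of this list can be pictured with a small worker pool. This is only a rough Python sketch, not pmOCR's bash implementation: `run_limited` is a hypothetical helper, and the value given to `NUMBER_OF_PROCESSES` here is an assumed placeholder for whatever the config file defines.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUMBER_OF_PROCESSES = 2  # assumed value; pmOCR reads this from its config file


def run_limited(jobs, max_workers=NUMBER_OF_PROCESSES):
    """Run callables with at most max_workers at once.

    Returns the results in submission order and the peak concurrency observed.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(job):
        # Count how many jobs are running so the cap can be observed.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return job()
        finally:
            with lock:
                state["active"] -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, jobs))
    return results, state["peak"]
```

The pool size is the only knob: however many files are queued, no more than `max_workers` preprocessing or transformation jobs run at the same time.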


08 Mar 12:04

deajan

v1.8.1

05a02e8

Committing the git crime again

This release adds some nifty features:

A configurable directory poller interval
Service recovery when the monitored directory is not writable or absent

It also fixes upgrades with newer configuration files and preprocessed image errors when using the new poller.
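As an illustration of the service recovery described above, the following Python sketch waits for a monitored directory to exist and be writable before the service carries on. It is not taken from pmOCR; `wait_until_usable` and its parameters are hypothetical names chosen for this example.

```python
import os
import time


def wait_until_usable(directory, interval=1.0, max_checks=None):
    """Return True once directory exists and is writable.

    Checks every `interval` seconds. Gives up and returns False after
    `max_checks` failed checks, or keeps waiting forever if max_checks is None.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            return True
        checks += 1
        time.sleep(interval)
    return False
```

A service loop could call this before each polling pass, so that a share which is temporarily unmounted or read-only pauses processing rather than ending the service.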

As already said, this should be the last pmOCR v1 release.
It will be maintained until pmOCR v2 shows up, written in Python, which should be considerably easier to maintain than a 2.5K-line bash script ;)


25 Feb 15:20

deajan

v1.8.0

8000237

Committing the git crime

This release adds a new inotifywait emulation that polls the monitored directories instead of waiting for inotify signals from the kernel, which allows pmOCR to be used on Samba / NFS shares.
It also speeds up the file detection process by using pre-determined file lists.
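The general idea of polling instead of relying on inotify can be sketched as comparing two directory snapshots. This Python sketch is an assumption about the approach, not pmOCR's bash code; `snapshot` and `changed_files` are hypothetical helper names.

```python
import os


def snapshot(directory):
    """Map each regular file under directory to its (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def changed_files(previous, current):
    """Return the paths that are new or modified since the previous snapshot."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)
```

Taking a snapshot once per polling interval and diffing it against the previous one needs nothing from the kernel, which is why this style of detection also works on network shares where inotify events are not delivered.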

As we're hitting 2022, this will be the last pmOCR release coded in bash.
Bash is a wonderfully complicated beast that is highly error-prone and was never designed to be used in such complicated ways.

I wish to continue maintaining this wrapper, but I definitely need to shift to a better programming language, and have chosen Python since it allows pmOCR to be built with simple existing tools, without the need to reinvent (recode) the wheel.

Until pmOCR v2 is released, support for pmOCR v1.x is guaranteed.

Happy OCRring


11 Jul 08:43

deajan

v1.6.1

8d9ac25

poor man's 4 tesseracts

pmOCR v1.6.1 maintenance release

This release brings the following features:

Tesseract 4.x support (it actually worked already, but it is now tested and allows selecting different OCR engines)
Files currently in use are deferred in service mode for later OCR processing

Other fixes went into this release:

Fix automatic service shutdown in RHEL / CentOS 6/7 after 10 days (automatic /tmp directory cleanup removed the run file)
Many minor improvements and fixes that came with ofunctions developed on osync/obackup


21 Dec 18:36

deajan

v1.6.0

e31d9ac

Long time no see

A brand new pmocr release with lots of bugfixes and more sanity checks.

IMPORTANT: The configuration file syntax has changed with version 1.6.0 in order to simplify new deployments.
Please make sure to use the new format.

See Changelog for more details


21 Apr 17:53

deajan

v1.5.7

1ae583a

Urgent bugfix release

Bugfix release addressing an issue introduced in the earlier v1.5.6 release, which stopped the service monitor after the first run because of the new cleanup behavior.


20 Apr 19:15

deajan

v1.5.6

7044c42

Bugfix & test framework release

This release mainly introduces some unit and functional tests, which can run on the Travis CI platform and helped resolve a couple of issues:

Service run file was created in root since v1.5.4 because of some merge modifications
CSV transformation didn't work anymore (nasty typo)
Fixed a low severity security issue where log & run files were world readable

For more details, see the changelog file.
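For readers wondering what fixing world-readable log and run files amounts to, one common approach on POSIX systems is to create them with owner-only permissions. The Python sketch below shows that general technique; it is not pmOCR's actual fix, and `open_private` is a hypothetical helper name.

```python
import os


def open_private(path):
    """Open path for writing with owner-only read/write permissions (POSIX)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    # os.open only applies the mode to newly created files,
    # so also tighten a file that already existed.
    os.chmod(path, 0o600)
    return os.fdopen(fd, "w")
```

Passing the mode at creation time avoids a window in which the file briefly exists with looser permissions.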


13 Mar 12:14

deajan

v1.5.4

76cfb63

Small improvement release

The main feature of this release is the ability to move files upon successful / failed OCR recognition in order to keep the folder structure clean.
For other minor fixes see changelog.


06 Feb 16:55

deajan

v1.5.2

bc42c6e

A small improvements & bugfix release :)

New release of the 1.5 branch, including the following:

Improvements

Service now makes a 'forced' run every MAX_WAIT seconds (defaults to an hour)
An OCR run is also made on service start now
Moving files in monitored directories also triggers a run
Improved mail functions, parallel execution and logging
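The run triggers listed above (a run on service start, a run on file events, and a forced run every MAX_WAIT seconds) can be summarised in one small decision function. This is a Python sketch of the described behaviour, not pmOCR's code; `should_run` and its arguments are hypothetical, and only the one-hour default comes from these notes.

```python
MAX_WAIT = 3600  # seconds; these notes give one hour as the default


def should_run(event_seen, last_run, now, max_wait=MAX_WAIT):
    """Decide whether the service should start an OCR run.

    last_run is the timestamp of the previous run, or None on service start.
    """
    if last_run is None:
        return True  # always run once when the service starts
    if event_seen:
        return True  # a file was created or moved into a monitored directory
    return (now - last_run) >= max_wait  # forced run after max_wait seconds
```

The forced run acts as a safety net: even if a file event is missed, the file is picked up within `max_wait` seconds.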

Bugfixes

Prevent failed files from overwriting each other when sources produce the same filename
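One simple way to keep files with identical names from overwriting each other is to add a numeric suffix until the name is free. The Python sketch below illustrates that idea under stated assumptions: the suffix format and the `unique_destination` helper are made up for this example, and the notes do not say how pmOCR names the files.

```python
import os


def unique_destination(directory, filename):
    """Return a path in directory that does not exist yet.

    Appends _1, _2, ... before the extension while the name is taken.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate
```

This check-then-use pattern is not safe against two processes picking the same name at the same moment, so parallel workers would need a lock or an atomic create on top of it.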


21 Oct 13:49

deajan

v1.5

fad0710

poor man's OCR tool just got less poor

This should be a pretty mature release, including the following highlights:

Optional ownership preservation
Parallelization of OCR runs
Support for image preprocessors
New config file support
Better tesseract support

Releases: deajan/pmOCR
