Releases: deajan/pmOCR
Committing the git crime again and again, with a spoon
- Limit preprocessor/transform threads to the config-defined NUMBER_OF_PROCESSES
- Tesseract PDF intermediary transformation
- Added intermediary transformation suffix to make sure we don't overwrite earlier files
- Fixed intermediary transformation failing
- Disabled intermediary transformation when preprocessor is used
- Tesseract preprocessor
- Improved tesseract preprocessor settings
- Made general preprocessing/transformation dpi a variable
- Always preprocess files to TIFF format so we don't need intermediary transformation with preprocessor
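The thread limit mentioned in the list above can be illustrated with a bounded worker pool. This is only a minimal Python sketch, not pmOCR's bash implementation: the value of NUMBER_OF_PROCESSES and the `preprocess` and `run_bounded` helpers are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed value; pmOCR reads NUMBER_OF_PROCESSES from its configuration file.
NUMBER_OF_PROCESSES = 4


def preprocess(path):
    # Hypothetical stand-in for the real preprocessor/transform step.
    return path + ".tif"


def run_bounded(paths, max_workers=NUMBER_OF_PROCESSES):
    # The pool never runs more than max_workers jobs at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the results in the same order as the inputs.
        return list(pool.map(preprocess, paths))
```

Calling `run_bounded(["a.pdf", "b.pdf"])` processes both files while never exceeding the configured number of concurrent workers.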
Committing the git crime again
This release adds some nifty features:
- A configurable directory poller interval
- Service recovery when the monitored directory is not writable or absent
It also fixes upgrades with newer configuration files, as well as preprocessed image errors when using the new poller.
As already said, this should be the last pmOCR v1 release.
It will be maintained until pmOCR v2, written in Python, shows up; that should be considerably easier to maintain than a 2.5K-line bash script ;)
Committing the git crime
This release adds a new inotifywait emulation that polls instead of waiting for inotify signals from the kernel, which allows pmOCR to be used on Samba / NFS shares.
It also speeds up the file detection process by using pre-determined file lists.
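The polling idea can be sketched as comparing two directory snapshots. This is a hedged Python illustration, not the bash code shipped in pmOCR; `snapshot` and `changed_files` are hypothetical helper names, and using the modification time as the change signal is an assumption.

```python
import os


def snapshot(directory):
    # Map each regular file name to its modification time.
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                result[entry.name] = entry.stat().st_mtime
    return result


def changed_files(previous, current):
    # Files that are new, or whose mtime differs, since the previous poll.
    return sorted(
        name for name, mtime in current.items()
        if previous.get(name) != mtime
    )
```

A service loop would take a snapshot, sleep for the poll interval, take another snapshot and hand the result of `changed_files` to the OCR run. Since this only reads directory metadata, it does not depend on kernel inotify events, which are not delivered for changes made remotely on Samba / NFS shares.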
As we're hitting 2022, this will be the last pmOCR release coded in bash.
bash is a wonderful, complicated beast which is heavily error prone and was never designed to be used in such complicated ways.
I wish to continue maintaining this wrapper, but I definitely need to shift to a better programming language, and have chosen Python since it allows pmOCR to be written with simple existing tools, without the need to reinvent (recode) the wheel.
Until pmOCR v2 is released, support for pmOCR v1.x is guaranteed.
Happy OCRring
poor man's 4 tesseracts
pmOCR v1.6.1 maintenance release
This release brings the following features:
- Tesseract 4.x support (it actually already worked, but it is now tested and allows selecting different OCR engines)
- Currently in use files are deferred in service mode for later OCR processing
Other fixes went into this release:
- Fix automatic service shutdown in RHEL / CentOS 6/7 after 10 days (automatic /tmp directory cleanup removed the run file)
- Many minor improvements and fixes that came with ofunctions developed on osync/obackup
Long time no see
A brand new pmOCR release with lots of bugfixes and more sanity checks.
IMPORTANT: Configuration file syntax has changed with version 1.6.0 in order to simplify new deployments.
Please make sure to use the new format.
See Changelog for more details
Urgent bugfix release
Bugfix release addressing an issue introduced with the earlier v1.5.6 release that stopped the service monitor after a first run because of the new cleanup behavior.
Bugfix & test framework release
This release mainly introduces unit and functional testing, which resolved a couple of issues and also allows running on the Travis CI platform:
- Service run file was created in root since v1.5.4 because of some merge modifications
- CSV transformation didn't work anymore (nasty typo)
- Fixed a low severity security issue where log & run files were world readable
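The world-readable file issue in the list above comes down to the permission bits used at creation time. The following is a minimal Python sketch for POSIX systems, not the fix that went into pmOCR; `create_private_file` is a hypothetical helper.

```python
import os


def create_private_file(path):
    # O_EXCL makes the call fail if the file already exists.
    # Mode 0o600 grants read/write to the owner only; the process umask
    # can only remove bits from it, never add group or world access.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    return path
```

Setting the mode when the file is created avoids the short window in which a file created with default permissions and restricted afterwards would still be readable by other users.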
For more details, see the changelog file.
Small improvement release
The main feature of this release is the ability to move files upon successful / failed OCR recognition in order to keep the folder structure clean.
For other minor fixes see changelog.
A small improvements & bugfix release :)
New release of the 1.5 branch including the following
Improvements
- Service now makes a 'forced' run every MAX_WAIT seconds (defaults to an hour)
- An OCR run is also made on service start now
- Moving files in monitored directories also triggers a run
- Improved mail functions, parallel execution and logging
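The forced run described in the list above is essentially a timeout check. This Python sketch is an illustration only, not pmOCR's bash code: the one-hour default comes from these notes, while `forced_run_due` is a hypothetical helper.

```python
MAX_WAIT = 3600  # seconds; the notes above state a default of one hour


def forced_run_due(last_run, now, max_wait=MAX_WAIT):
    # True once at least max_wait seconds have passed since the last run.
    return now - last_run >= max_wait
```

A service loop could call this with timestamps from `time.monotonic()` and start an OCR run whenever it returns true, even if no file event was seen in the meantime.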
Bugfixes
- Prevent multiple failed files from overwriting each other when the source produces the same filename
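One way to keep files with identical names from clobbering each other is to pick an unused name before moving. The sketch below is a hedged Python example, not the naming scheme pmOCR uses; the numeric counter and both helper names are assumptions.

```python
import os
import shutil


def unique_destination(directory, filename):
    # Append a numeric counter until the name is unused in the target directory.
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}.{counter}{ext}")
        counter += 1
    return candidate


def move_without_overwrite(source, directory):
    # Move source into directory under a name that does not exist there yet.
    target = unique_destination(directory, os.path.basename(source))
    shutil.move(source, target)
    return target
```

Note that the existence check and the move are not atomic, so parallel workers writing to the same directory would need extra locking or a per-worker suffix.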
poor man's OCR tool just got less poor
This should be a pretty mature release, including the following highlights:
- Ownership preservation possibility
- Parallelization of OCR runs
- Support for image preprocessors
- New config file support
- Better tesseract support