Enhancements
- Add character-level filtering for tesseract output. It is controllable via
TESSERACT_CHARACTER_CONFIDENCE_THRESHOLD
environment variable.
Features
Fixes
- Fix NLTK Download to use nltk assets in docker image
- removed the ability to automatically download nltk package if missing