Tesseract merge script

This is a Bash script that:

  • extracts and downloads the images referenced in an HTML file
  • merges the images into one
  • enlarges the merged image
  • runs Tesseract OCR to read the whole image
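
The four steps above can be sketched roughly as follows. This is an illustrative outline only, not the contents of ocr-extract.sh: the function names `extract_image_urls` and `ocr_html_images`, the 200% scale factor, and the vertical stacking order are all assumptions. It also assumes `src` attributes are double-quoted absolute URLs, that `curl` is available for downloading, and that ImageMagick is invoked as `convert` (newer ImageMagick releases prefer `magick`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pipeline; names and parameters are assumptions.

# Print the src attribute of every <img> tag in an HTML file, one per line.
# Assumes double-quoted src attributes.
extract_image_urls() {
  grep -o '<img[^>]*[[:space:]]src="[^"]*"' "$1" \
    | sed 's/.*[[:space:]]src="\([^"]*\)"/\1/'
}

# Download, merge, enlarge, and OCR the images referenced by an HTML file.
ocr_html_images() {
  local html="$1" workdir url i=0
  workdir="$(mktemp -d)"

  # Download each image under a zero-padded name so glob order matches page order.
  while IFS= read -r url; do
    i=$((i + 1))
    curl -fsSL "$url" -o "$(printf '%s/img-%04d' "$workdir" "$i")"
  done < <(extract_image_urls "$html")

  if [ "$i" -eq 0 ]; then
    echo "no images found in $html" >&2
    rm -rf "$workdir"
    return 1
  fi

  # -append stacks the images vertically into a single image.
  convert "$workdir"/img-* -append "$workdir/merged.png"
  # Enlarge the merged image; 200% is an assumed factor.
  convert "$workdir/merged.png" -resize 200% "$workdir/enlarged.png"
  # Passing "stdout" as the output base prints the recognized text.
  tesseract "$workdir/enlarged.png" stdout

  rm -rf "$workdir"
}
```

With these definitions loaded, `ocr_html_images /path/to/file.html` would print the recognized text. Enlarging before OCR is a common way to help Tesseract with small glyphs, though the best factor depends on the source images.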

Dependencies

  • tesseract-ocr
  • imagemagick
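
A quick way to confirm the dependencies are installed is to check for their commands on the PATH. The helper below is a minimal sketch; `check_deps` is a hypothetical name, and it assumes the packages provide the `tesseract` and `convert` commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every given command that is not on PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_deps tesseract convert || exit 1` stops early when either tool is missing. On Debian or Ubuntu, the two packages listed above can typically be installed with `sudo apt-get install tesseract-ocr imagemagick`.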

Usage

bash ./ocr-extract.sh /path/to/file.html