Simple-OCR provides a more convenient way of reading PDFs and images using the Tesseract engine.
- Install Tesseract.
- Install ImageMagick.
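To confirm that both prerequisites are installed before using the gem, a quick check along these lines may help. This is a minimal sketch, not part of Simple-OCR: `executable_on_path?` is a hypothetical helper, and it assumes the binaries are named `tesseract` and either `convert` or `magick` (the usual ImageMagick command names).

```ruby
# Hypothetical helper (not part of Simple-OCR): true if an executable
# with the given name is found in one of the PATH directories.
def executable_on_path?(name)
  # On Windows, PATHEXT lists executable extensions; elsewhere try the bare name.
  exts = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]
  dirs = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).reject(&:empty?)
  dirs.any? do |dir|
    exts.any? do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      File.executable?(candidate) && !File.directory?(candidate)
    end
  end
end

# Assumed binary names: "tesseract" for Tesseract, "convert" or "magick"
# for ImageMagick. Adjust if your installation uses different names.
missing = []
missing << "Tesseract" unless executable_on_path?("tesseract")
missing << "ImageMagick" unless executable_on_path?("convert") || executable_on_path?("magick")

puts missing.empty? ? "All prerequisites found." : "Missing: #{missing.join(', ')}"
```

Note that on Windows a system utility named `convert` also exists, so a match for that name there does not by itself prove ImageMagick is installed.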
It's very simple to use Simple-OCR:
# Specify the path of your source image or PDF.
img = OCR::Image.new("source.png")
# Specify the output file name, called "destination" here.
img.scan("destination", "-l eng", :pdf)
You can also pass custom Tesseract command-line options as the second argument.
img.scan("destination", "-l eng -psm 1...", :pdf)
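If you build the option string from several settings, a small helper can keep it tidy. The sketch below is illustrative only: `tesseract_options` and its keyword arguments are hypothetical names, not part of Simple-OCR. It also makes the page segmentation flag configurable, because Tesseract 4 and later spell it `--psm` while older 3.x releases accepted `-psm`.

```ruby
require "shellwords"

# Hypothetical helper (not part of Simple-OCR): build a Tesseract option
# string from a language code and an optional page segmentation mode.
def tesseract_options(lang: "eng", psm: nil, psm_flag: "--psm")
  parts = ["-l", lang]
  parts += [psm_flag, psm.to_s] if psm
  # Shellwords.join escapes each part in case the string reaches a shell.
  Shellwords.join(parts)
end

puts tesseract_options          # prints: -l eng
puts tesseract_options(psm: 6)  # prints: -l eng --psm 6
```

The resulting string can then be passed as the second argument, for example `img.scan("destination", tesseract_options(psm: 6), :pdf)`.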
It is also possible to specify the output file type as the third argument, which can be either:
- txt
- hocr
img.scan("destination", "-l eng", :txt)
img.scan("destination", "-l eng", :hocr)
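To produce several output types from the same source, the calls above can be wrapped in a loop. This is a sketch under assumptions: `scan_all` is a hypothetical helper, and it relies only on the `scan(destination, options, type)` call shape shown in the examples. Whether each output type ends up in its own file depends on how the gem names its output, so check the results on your setup.

```ruby
# Hypothetical helper (not part of Simple-OCR): run one scan per output type.
# `image` is any object that responds to scan(destination, options, type),
# as OCR::Image does in the examples above.
def scan_all(image, destination, options, types = [:txt, :hocr])
  types.each { |type| image.scan(destination, options, type) }
end
```

With the gem loaded, usage would look like `scan_all(OCR::Image.new("source.png"), "destination", "-l eng")`.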
Simple-OCR is maintained and funded by Skcript. The names and logos for Skcript are properties of Skcript.
We love open source, and we have made quite a few contributions to the community. Take a look at them here. We also encourage people around us to get involved in community operations. Join us if you'd like to see the world change from our HQ.