AWS Lambda functions to extract text from various binary formats.
Updated Feb 7, 2018 - Python
Build a RAG preprocessing pipeline
Recognize page content of a PDF as text using Tesseract and Ghostscript.
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.
Simple and reliable script to conduct high-quality fast OCR on a PDF
Example Django/Python project with modules for OCR, PDF to OCR PDF conversion, text similarity/dissimilarity, and PDF to PNG conversion.
PDF OCR service in Docker
A tool to compare and merge PDFs, display their differences, and run OCR on them.
A utility that collects in one place some operations commonly performed on PDF files.
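
Several of the projects listed here pair Ghostscript with Tesseract. The sketch below illustrates that general approach and is not taken from any of the listed repositories: Ghostscript renders each page to a PNG, then Tesseract recognizes the text on each image. The function names, the `page-%03d.png` naming pattern, and the 300 DPI default are assumptions made for illustration. It also assumes the `gs` and `tesseract` executables are on `PATH` (on Windows the Ghostscript binary usually has a different name).

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_cmd(pdf_path, out_dir, dpi=300):
    """Build the Ghostscript command that renders each PDF page to a PNG."""
    # Hypothetical naming pattern: page-001.png, page-002.png, ...
    pattern = str(Path(out_dir) / "page-%03d.png")
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def tesseract_cmd(image_path, lang="eng"):
    """Build the Tesseract command; Tesseract appends .txt to the output base."""
    image_path = Path(image_path)
    out_base = image_path.with_suffix("")
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, out_dir, lang="eng", dpi=300):
    """Rasterize a PDF and OCR every page, returning one text string per page."""
    for tool in ("gs", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_cmd(pdf_path, out_dir, dpi), check=True)
    pages = []
    # Zero-padded names keep the pages in order when sorted.
    for image in sorted(out_dir.glob("page-*.png")):
        subprocess.run(tesseract_cmd(image, lang), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return pages
```

Calling `ocr_pdf("scan.pdf", "out")` with a hypothetical `scan.pdf` would leave the page images and `.txt` files in `out/`. Keeping the command construction in separate functions makes it possible to inspect the arguments without running either tool.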