A Python Script to Excelize Parsed Complex Text, Image, Tables from Bulk PDF
The main goal of this project is to enter data from complex PDFs into Excel. A PDF is considered complex if it spans multiple pages and contains tables of various shapes and dimensions, chemical images, drawings, diagrams, and similar content.
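To give a feel for the table part of that goal, here is a minimal sketch that pulls tables out of one PDF with pdfplumber and writes them to a spreadsheet with pandas. It is not the code in `BurstProcessor.py`; the function names `flatten_tables` and `pdf_to_excel` are hypothetical, images and drawings are not handled, and writing `.xlsx` with pandas assumes an Excel writer engine such as openpyxl is installed, which is not in the dependency list.

```python
from pathlib import Path


def flatten_tables(tables, source, page_number):
    """Turn pdfplumber-style tables into flat records, one per table row.

    `tables` is a list of tables; each table is a list of rows and each
    row is a list of cell strings (or None for an empty cell).
    """
    records = []
    for table_index, table in enumerate(tables, start=1):
        for row in table:
            # Empty cells come back as None; normalise them to "".
            cells = [(cell or "").strip() for cell in row]
            if not any(cells):
                continue  # skip rows that are entirely blank
            records.append({
                "source": source,
                "page": page_number,
                "table": table_index,
                "cells": cells,
            })
    return records


def pdf_to_excel(pdf_path, xlsx_path):
    """Hypothetical helper: write every table row of one PDF to one sheet."""
    # Imported here so flatten_tables can be used without these packages.
    import pandas as pd
    import pdfplumber

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            records.extend(
                flatten_tables(
                    page.extract_tables(),
                    Path(pdf_path).name,
                    page.page_number,
                )
            )

    # One spreadsheet row per table row: metadata columns, then the cells.
    rows = [[r["source"], r["page"], r["table"], *r["cells"]] for r in records]
    pd.DataFrame(rows).to_excel(xlsx_path, index=False, header=False)
```

Calling `pdf_to_excel("sample.pdf", "sample.xlsx")` (both file names are placeholders) would produce one row per extracted table row. Tables with different column counts end up in the same sheet, with shorter rows padded by empty cells.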
- beautifulsoup4==4.11.1
- cryptography==37.0.4
- html5lib==1.1
- lxml==4.9.1
- numpy==1.23.1
- pandas==1.4.3
- pdfminer.six==20220524
- pdfplumber==0.7.4
- Pillow==9.2.0
- pipreqs==0.4.11
- PyMuPDF==1.20.1
- urllib3==1.26.11
- Wand==0.6.9
- xlrd==2.0.1
You need Python 3.7 or later and pip 20.0 or later for this project. I used Python 3.9.13 and pip 22.2.1.
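If you are not sure whether your setup meets those minimums, a small check like the one below can tell you. This is only a sketch and not part of the project: the helper names are made up for illustration, and it assumes `pip --version` prints its usual `pip X.Y.Z from ...` line.

```python
import subprocess
import sys


def parse_version(text):
    """Convert '22.2.1' into (22, 2, 1); stops at the first non-numeric part."""
    numbers = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '20' and '20.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(min_python="3.7", min_pip="20.0"):
    """Return {'python': (version, ok), 'pip': (version, ok)}."""
    python_found = "{}.{}.{}".format(*sys.version_info[:3])
    pip_output = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True, text=True, check=True,
    ).stdout
    pip_found = pip_output.split()[1]  # second word of "pip X.Y.Z from ..."
    return {
        "python": (python_found, meets_minimum(python_found, min_python)),
        "pip": (pip_found, meets_minimum(pip_found, min_pip)),
    }
```

Running `print(check_environment())` shows the detected versions alongside a pass/fail flag for each.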
- Clone the repo
git clone https://github.com/akifislam/SmartDataEntryKiller.git
- Install Dependencies
pip install -r requirements.txt
- Run Script
python3 BurstProcessor.py
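If the script fails on import, one thing worth checking is whether the installed packages match the pinned versions. The sketch below compares a `requirements.txt` with `name==version` lines against what is installed. The function names are hypothetical, and `importlib.metadata` needs Python 3.8 or later, so this check itself will not run on 3.7.

```python
def parse_pins(lines):
    """Read 'name==version' lines into a dict, ignoring comments and blanks."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    from importlib import metadata  # standard library from Python 3.8

    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every package that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


def report(path="requirements.txt"):
    """Print one line per package whose installed version differs from the pin."""
    with open(path) as handle:
        pins = parse_pins(handle)
    for name, (pinned, found) in find_mismatches(pins).items():
        print(f"{name}: pinned {pinned}, installed {found or 'missing'}")
```

Call `report()` from the repository root after `pip install -r requirements.txt`; no output means every pin matches.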
Akif Islam - iamakifislam@gmail.com
Project Link: Smart Data Entry Killer
- Mohammad Ruhul Ameen Bhai for encouraging me to complete this impossible task
- Stack Overflow for saving my life and earning me recognition among outsiders as a Python developer (though I know nothing about it)