QueryPDF is a web application that retrieves the sentences in a document most relevant to a user's query. Traditional keyword search requires exact matches and so often misses nuanced information. QueryPDF instead uses sentence transformer models to identify contextually relevant sentences, even when the query's keywords are absent from the text.
Key features:
- Contextual Search: Search for topics of interest without needing exact keyword matches.
- Intelligent Analysis: Uses sentence transformer models to identify and extract relevant sentences.
- Efficient Retrieval: Presents the most pertinent sentences from uploaded documents as a sorted list.
- User-Friendly Interface: A simple upload form with a query field.
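The contextual search described above can be sketched as ranking sentences by the cosine similarity between their embeddings and the query's embedding. This is a minimal illustration, not the project's actual code: `rank_sentences`, `cosine_similarity`, and the `embed` parameter are hypothetical names, and the embedding function is passed in so that any model can be plugged in.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(
    query: str,
    sentences: Sequence[str],
    embed: Callable[[str], Vector],  # hypothetical hook: text -> embedding vector
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, sentence) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(s)), s) for s in sentences]
    # Sort on the score only, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With the sentence-transformers package installed, `embed` could plausibly be the `encode` method of a `SentenceTransformer` instance. Which model QueryPDF loads is not stated here, so treat any specific model name as an assumption.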
How it works:
- Upload Document: Upload the PDF document you want to analyze.
- Enter Query: Specify your query or topic, such as "sustainable practices".
- Retrieve Results: QueryPDF scans the document, identifies contextually similar sentences, and presents them in a sorted list.
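Before sentences can be compared with a query, the text extracted from the PDF has to be split into sentences. The sketch below shows one simple regex-based way to do that, using only the standard library. It is an illustration under stated assumptions rather than the parser QueryPDF actually uses: `split_sentences` is a hypothetical helper, and the input is assumed to be raw text such as that returned by pypdf's `page.extract_text()`.

```python
import re
from typing import List

# Split after ., ! or ? when followed by whitespace and then something
# that looks like the start of a sentence (capital letter, digit, or quote).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'])")


def split_sentences(text: str) -> List[str]:
    """Split raw extracted text into a list of sentences (hypothetical helper)."""
    # PDF extraction often leaves hard line breaks mid-sentence, so collapse
    # every run of whitespace into a single space first.
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(normalized) if part.strip()]
```

A rule this simple will split wrongly after abbreviations such as "Dr. Smith", so a dedicated sentence tokenizer may be a better fit for real documents.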
The following image illustrates the QueryPDF app in use.
To set up the QueryPDF App on your local machine, follow these steps:
- Clone the repository
    git clone https://github.com/Git-With-Chris/QueryPDF.git
- Change Directory
    cd QueryPDF
- Install dependencies
    pip install -r requirements.txt
- Run the application
    python app.py
Project structure:
.
|-- README.md  # The main documentation file for the project
|-- app.py  # The main Flask application file
|-- images
|   |-- Concept.gif  # Animated GIF illustrating the concept or workflow
|   |-- Preview.png  # Preview image showing the main interface
|   `-- ProposedSolution.png  # Image depicting the proposed solution or architecture
|-- input  # Directory for storing input files, such as PDFs
|-- notebooks
|   |-- MVP_V1_Analysis.ipynb  # Jupyter Notebook for MVP version 1 analysis and development
|   |-- MVP_V2_Analysis.ipynb  # Jupyter Notebook for MVP version 2 analysis and development
|   |-- POC_Analysis.ipynb  # Jupyter Notebook for Proof of Concept (POC) analysis
|   |-- Regex_Analysis.ipynb  # Jupyter Notebook for regex-based analysis
|   |-- Sentence_Parser_Analysis.ipynb  # Jupyter Notebook for sentence parser analysis
|   `-- Validation_Template.ipynb  # Jupyter Notebook for validation template
|-- script.py  # Additional Python script for auxiliary functions
|-- static
|   `-- styles.css  # CSS file for styling the web application
`-- templates
    |-- login.html  # HTML template for the login page
    |-- pdfParser.html  # HTML template for the PDF parser interface
    |-- registration.html  # HTML template for the registration page
    `-- results.html  # HTML template for displaying the results
6 directories, 17 files
Contributions to QueryPDF are welcome! Here's how you can contribute:
- Fork the repository
- Create your feature branch
    git checkout -b feature/YourFeature
- Commit your changes
    git commit -am 'Add some feature'
- Push to the branch
    git push origin feature/YourFeature
- Open a pull request
QueryPDF would not be possible without the contributions of many open-source projects:
- PyPDF
- PyTorch
- Flask
- Tesseract / PyTesseract
- Transformers and many others!
This project is licensed under the MIT License.
It contains code copied and adapted from Transformers, which is licensed under the Apache License 2.0.