QueryPDF 💬

QueryPDF is a web application that retrieves the sentences in a document most relevant to a user's query. Traditional keyword search requires exact matches and so often misses nuanced information. QueryPDF instead uses sentence transformer models to identify contextually relevant sentences, even when the query's keywords do not appear in the text.

Features

Contextual Search: Search for topics of interest without needing exact keyword matches.
Intelligent Analysis: Uses sentence transformer models to identify and extract relevant information.
Efficient Retrieval: Presents a sorted list of the most pertinent sentences from uploaded documents.
User-Friendly Interface: Simple upload interface with query input for seamless operation.

How It Works

Upload Document: Upload your PDF document of interest.
Enter Query: Specify your query or topic, such as "sustainable practices".
Retrieve Results: QueryPDF scans the document, identifies contextually similar sentences, and presents them in a sorted list.
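
The retrieval step above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function names (`split_sentences`, `toy_embed`, `rank_sentences`) are hypothetical, PDF text extraction is assumed to have already happened, and `toy_embed` is a simple word-count stand-in used so the sketch runs with the standard library only. The application itself relies on a sentence transformer model for embeddings, which is what allows matches without shared keywords; such a model could be passed in through the `embed` parameter.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_embed(sentence: str) -> Vector:
    """Stand-in embedding (assumption): lowercase word counts.

    This is NOT a sentence transformer; it only matches shared words.
    """
    return dict(Counter(re.findall(r"[a-z']+", sentence.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(
    text: str,
    query: str,
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the top_k (score, sentence) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(cosine(embed(s), query_vec), s) for s in split_sentences(text)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


document = (
    "The company reported higher revenue this year. "
    "It adopted sustainable practices across all factories. "
    "The board met in March."
)
for score, sentence in rank_sentences(document, "sustainable practices", top_k=2):
    print(f"{score:.3f}  {sentence}")
```

Keeping the embedding behind a callable means the ranking logic stays the same whether the vectors come from this toy function or from a real model.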

Preview

The following image illustrates the real-time functionality of the QueryPDF app.

Installation

To set up the QueryPDF App on your local machine, follow these steps:

Clone the repository

git clone https://github.com/Git-With-Chris/QueryPDF.git

Change Directory

cd QueryPDF

Install dependencies

pip install -r requirements.txt

Run the application

python app.py

Project Structure

.
|-- README.md                                      # The main documentation file for the project
|-- app.py                                         # The main Flask application file
|-- images
|   |-- Concept.gif                                # Animated GIF illustrating the concept or workflow
|   |-- Preview.png                                # Preview image showing the main interface
|   `-- ProposedSolution.png                       # Image depicting the proposed solution or architecture
|-- input                                          # Directory for storing input files, such as PDFs
|-- notebooks
|   |-- MVP_V1_Analysis.ipynb                      # Jupyter Notebook for MVP version 1 analysis and development
|   |-- MVP_V2_Analysis.ipynb                      # Jupyter Notebook for MVP version 2 analysis and development
|   |-- POC_Analysis.ipynb                         # Jupyter Notebook for Proof of Concept (POC) analysis
|   |-- Regex_Analysis.ipynb                       # Jupyter Notebook for regex-based analysis
|   |-- Sentence_Parser_Analysis.ipynb             # Jupyter Notebook for sentence parser analysis
|   `-- Validation_Template.ipynb                  # Jupyter Notebook for validation template
|-- script.py                                      # Additional Python script for auxiliary functions
|-- static
|   `-- styles.css                                 # CSS file for styling the web application
`-- templates
    |-- login.html                                 # HTML template for the login page
    |-- pdfParser.html                             # HTML template for the PDF parser interface
    |-- registration.html                          # HTML template for the registration page
    `-- results.html                               # HTML template for displaying the results

6 directories, 17 files

Contributing

Contributions to QueryPDF are welcome! Here's how you can contribute:

Fork the repository.
Create your feature branch: git checkout -b feature/YourFeature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin feature/YourFeature
Open a pull request.

Acknowledgements

QueryPDF would not be possible without the contributions of many open source projects:

PyPDF
PyTorch
Flask
Tesseract / PyTesseract
Transformers
and many others!

License

This project is licensed under the MIT license.

It contains code that is copied and adapted from transformers, which is Apache 2.0 licensed.
