PDF-Text-Embeddings-Search

Overview

This project loads PDF files and makes their contents searchable. As an example, it uses the uploaded PDF of the book "HeerShneiderman2012-Interactive Dynamics for Visual Analysis". The primary goal is to use LangChain with a large language model (LLM), specifically an OpenAI model, to query the text extracted from the PDF.

Additionally, the project incorporates CassandraDB, specifically Astra from DataStax, to establish a cloud-based VectorDB. This VectorDB enables efficient search and querying based on text embeddings.

Workflow

PDF Processing:
- Load PDF files, such as "HeerShneiderman2012-Interactive Dynamics for Visual Analysis."
- Split the pages of the PDF.
- Extract the text from each page.
Text Embeddings:
- Convert the extracted text into embeddings.
Vector Database (CassandraDB - Astra):
- Store text embeddings as vector data in the CassandraDB database.
- Establish a cloud-based VectorDB for quick search and querying.
Language Models (LLMs):
- Load Language Models.
- Utilize LLMs for querying purposes.
Query Processing:
- Embed each query with the same embedding model used for the PDF text.
Similarity Search:
- Astra DB runs a similarity search over the stored embeddings.
- The stored text closest to the query is retrieved and passed to the LLM to answer it.
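
The workflow above can be sketched in plain Python to show the mechanics of chunking, embedding, and similarity search. This is only an illustration under loud assumptions: the word-count `embed` function stands in for OpenAI embeddings, the `InMemoryVectorStore` class stands in for Astra DB, and every name here (`split_text`, `embed`, `cosine`, `InMemoryVectorStore`) is hypothetical rather than taken from the project or from LangChain. The sample sentences are made up for the demo and are not quotations from the book.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early so the last window is not fully contained in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Stand-in for a vector database: keeps (text, vector) rows in a list."""

    def __init__(self):
        self.rows = []

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((text, embed(text)))

    def similarity_search(self, query, k=2):
        query_vector = embed(query)
        ranked = sorted(
            self.rows,
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


# Made-up sample "pages" for the demo.
pages = [
    "Linked views let a selection in one chart highlight records in another.",
    "Sorting and filtering controls help analysts focus on relevant records.",
    "Annotations make it easier to share findings with collaborators.",
]

store = InMemoryVectorStore()
for page in pages:
    store.add_texts(split_text(page, chunk_size=200, overlap=40))

top = store.similarity_search("How does filtering help?", k=1)
print(top[0])
```

In the real project the retrieved text would then be handed to the LLM as context for the answer. A production setup would swap the toy pieces for an actual embedding model and a hosted vector store.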

Technologies Used

LangChain
OpenAI's LLM
CassandraDB (Astra from DataStax)

How to Use

Clone the repository.
Install the required dependencies listed in requirements.txt.
Follow the provided documentation and the app.ipynb notebook to load PDFs, query them with the LLM, and use the VectorDB.

Repository Files

README.md
app.ipynb
book.pdf
requirements.txt
