DocuMentor is a sophisticated chatbot application designed to assist users in extracting valuable information from uploaded PDF documents. Users can upload PDF files, chat with the AI chatbot to ask questions or seek information related to the document, and receive well-informed responses. This readme provides an overview of the DocuMentor PDF Chatbot, including its features and the technology stack used.
-
PDF Upload: Users can upload PDF documents for analysis and conversation with the AI chatbot.
-
AI Chatbot: Engage in a chat conversation with the AI chatbot to ask questions or discuss the content of the PDF.
-
Document Analysis: The chatbot creates chunks and embeddings to analyze the document and understand its content.
-
Similarity Search: Utilize Langchain for similarity search to find related content within the document.
-
ChromaDB Integration: Store vector searches in ChromaDB for efficient retrieval of similar content.
-
React: The user interface of DocuMentor is built using React, offering a modern and responsive design.
-
Chakra UI: Chakra UI provides a set of accessible and customizable components for creating a visually appealing and user-friendly interface.
- Python Flask: The server-side logic of the chatbot is implemented using Flask, a micro web framework for Python.
-
Langchain: Langchain is used for creating embeddings and performing similarity searches.
-
OpenAI: OpenAI's ChatGPT model 3.5 powers the chatbot, offering natural language understanding and generation capabilities.
-
Embeddings: Embeddings are generated to analyze and represent the content of the PDF.
-
Tiktoken: Tiktoken is used for tokenization and counting words in the text.
-
PyPDF: PyPDF is used for parsing and extracting text from PDF documents.
- ChromaDB: ChromaDB is integrated to store vector searches for efficient retrieval and similarity searching.
-
Clone the repository from GitHub.
-
Navigate to the project directory and install the required dependencies for both the frontend and backend using
npm install
for React andpip install -r requirements.txt
for Python. -
Set up a database connection to ChromaDB, and configure the database settings in the backend.
-
Create environment variables for sensitive information, such as API keys and database connections.
-
Start the frontend and backend servers using
npm start
for React andpython app.py
for Python Flask. -
Access the DocuMentor PDF Chatbot via a web browser by navigating to the specified URL (usually
http://localhost:3000
).
-
Open the DocuMentor PDF Chatbot in your web browser.
-
Upload a PDF document for analysis and conversation.
-
Engage in a chat conversation with the AI chatbot to ask questions or discuss the content of the PDF.
-
The chatbot will analyze the document, create embeddings, and perform similarity searches to provide informed responses.
-
Store vector searches in ChromaDB for efficient retrieval of similar content in the future.
-
Use DocuMentor to unlock valuable insights from your PDF documents.
Contributions to the DocuMentor PDF Chatbot are welcome. Please follow the guidelines outlined in the CONTRIBUTING.md file.
This project is open-source and available under the MIT License.
- Ansh Kathpal
Special thanks to the React, Flask, and OpenAI communities for providing resources and libraries that made this advanced PDF chatbot possible.