DocuMate lets users interactively query multiple documents in natural language. It extracts text from uploaded PDFs with pdfplumber, searches that text with FAISS, and uses Google Generative AI to answer user queries.
- Document Upload: Users can upload documents (PDFs) containing textual information.
- Natural Language Query: Users can ask questions in natural language.
- Interactive Chat Interface: Responses to user queries are displayed in an interactive chat interface.
- Memory: The bot retains chat history to provide contextually relevant responses during the conversation.
- Reset Functionality: Users can reset the uploaded documents and conversation history.
- Source Attribution (planned): Include references to the specific parts of the uploaded documents from which each answer was derived, improving transparency and trustworthiness.
- Support for Multiple Document Formats (planned): Handle additional document formats, including .txt, .doc, and .csv files, so users can upload and query more types of data.
- Integration with Online Resources (planned): Accept online hyperlinks as data sources, giving access to current information from the web.
- Clone the repository:
      git clone https://github.com/hlw-aryan/DocuMate.git
- Create a virtual environment in the project directory:
      python -m venv venv
- Activate the virtual environment:
  - On Windows:
        venv\Scripts\activate
  - On macOS and Linux:
        source venv/bin/activate
- Install dependencies:
      pip install -r requirements.txt
- Set up environment variables:
  - Create a .env file in the root directory.
  - Add the following variables:
        GOOGLE_API_KEY=your_google_api_key
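The environment-variable step implies that the backend needs `GOOGLE_API_KEY` to be available at startup. The following is a minimal sketch of a startup check, not code from the DocuMate repository: `load_api_key` and `PLACEHOLDER` are hypothetical names, and how the project actually reads the .env file (for example, with a dotenv loader) is an assumption not confirmed here.

```python
import os
from typing import Mapping, Optional

# The sample value shown in the .env step; treated as "not configured".
PLACEHOLDER = "your_google_api_key"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the given mapping (default: os.environ).

    Hypothetical helper for illustration; raises RuntimeError when the
    key is missing, empty, or still set to the placeholder value.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GOOGLE_API_KEY is missing or still a placeholder; check your .env file."
        )
    return key
```

Failing early with a clear message like this can be easier to debug than an authentication error surfacing later from the first model call.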
- Run the FastAPI server:
      uvicorn main:app --reload
- Access the application backend in your browser at http://localhost:8000.
- Run the Streamlit app:
      streamlit run app.py
- Access the application in your browser at the provided URL (usually http://localhost:8501).
- Upload PDF documents using the provided interface and click the 'Process' button.
- Enter your query in the text input field and press Enter.
- View the responses in the chat interface.
- Use the "Reset" button to clear uploaded documents and conversation history.
- POST /upload/: Uploads a PDF document.
- GET /query/: Retrieves responses to user queries.
- POST /reset/: Resets uploaded documents and conversation history.
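The endpoints above can also be called without the Streamlit front end. The sketch below builds (but does not send) requests for the query and reset endpoints using only the Python standard library. The base URL is the default uvicorn address from the usage steps; the query-string parameter name `query` is an assumption, since the README does not document the request schema, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Default address of the FastAPI backend when started with uvicorn.
BASE_URL = "http://localhost:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> Request:
    """Build a GET /query/ request.

    ASSUMPTION: the question travels as a query-string parameter named
    "query"; check the backend's route definition for the real name.
    """
    params = urlencode({"query": question})
    return Request(f"{base_url}/query/?{params}", method="GET")


def build_reset_request(base_url: str = BASE_URL) -> Request:
    """Build a POST /reset/ request with an empty body."""
    return Request(f"{base_url}/reset/", data=b"", method="POST")
```

With the server running, a request built this way could be sent with `urllib.request.urlopen(build_reset_request())`. Uploading a PDF to `/upload/` would need a multipart form body, which is omitted here because the expected form field name is not documented.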
- Streamlit: For building the user interface.
- FastAPI: For creating the backend API.
- Google Generative AI: For natural language processing.
- pdfplumber: For extracting text from PDF documents.
- FAISS: For efficient text similarity search.
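To illustrate what "text similarity search" means in this stack, here is a toy sketch of the retrieval idea: score each document chunk against the query and return the closest ones. It is a deliberately simplified stand-in, not DocuMate's code and not the FAISS API: word counts replace real model embeddings, and a brute-force scan replaces a FAISS index. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 1) -> List[Tuple[float, str]]:
    # Brute-force scan: score every chunk, keep the k most similar.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In a pipeline like the one described, the top-scoring chunks would typically be passed to the language model as context for answering the user's question; a vector index such as FAISS makes the search step fast when there are many chunks.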
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
This project is licensed under the Apache License - see the LICENSE file for details.