This project is a Streamlit application for processing multimodal documents and querying a Milvus database. It uses LangChain, Hugging Face Transformers, EasyOCR, and related libraries to process, store, and query text extracted from a range of file types. 🚀

## ✨ **Features**

- Supports multiple file types: `audio`, `video`, `image`, `text`, `csv`, `yaml`, `json`, `docx`, and `pdf`.
- Extracts text content using:
  - 🔊 Audio: `speech_recognition` and `pydub`.
  - 🎥 Video: Custom extraction logic.
  - 🖼️ Image: `EasyOCR`.
  - 📄 Text/Logs/Documents: LangChain loaders.
- 🗃️ Stores processed document embeddings in Milvus for similarity-based querying.
- 🧠 Uses `HuggingFaceEmbeddings` to generate vector representations.
- Natural language query interface.
- Implements a Retrieval-Augmented Generation (RAG) pipeline for AI-driven responses.
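
The per-type extraction listed above implies some routing step that decides which extractor handles an uploaded file. The following is a minimal sketch of one way to do that, not the project's actual code: the `EXTENSION_CATEGORIES` table, the `categorize` function, and the specific extensions chosen for audio, video, and image files are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file extension to processing category.
# The category names come from the feature list; the concrete
# extensions for audio, video, image, and text are assumptions.
EXTENSION_CATEGORIES = {
    ".wav": "audio", ".mp3": "audio",
    ".mp4": "video",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".txt": "text", ".log": "text",
    ".csv": "csv",
    ".yaml": "yaml", ".yml": "yaml",
    ".json": "json",
    ".docx": "docx",
    ".pdf": "pdf",
}


def categorize(filename: str) -> str:
    """Return the processing category for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_CATEGORIES[suffix]
    except KeyError:
        # Fail loudly rather than guessing how to parse an unknown format.
        raise ValueError(f"Unsupported file type: {suffix or filename!r}") from None


if __name__ == "__main__":
    for name in ("meeting.wav", "scan.PNG", "report.pdf"):
        print(name, "->", categorize(name))
```

Each category would then be handed to the matching extractor (audio, video, image, or a LangChain loader). Keeping the mapping in one table makes it easy to see which formats are supported and to reject the rest early.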

## 📋 **Prerequisites**

- Python 3.8+
- `pip` or `conda` package manager
- CUDA-compatible GPU (optional, for faster processing)
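
Because the GPU is optional, code that loads models typically checks for CUDA at startup and falls back to the CPU. The sketch below shows one common way to make that choice. It assumes PyTorch is the backend in use, and the helper name `pick_device` is hypothetical; whether the app performs this check itself is not stated here.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; may be absent on minimal installs
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

The resulting string can be passed to libraries that accept a device setting, for example through the `model_kwargs={"device": device}` argument of `HuggingFaceEmbeddings`.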

## ⚙️ **Installation**

- Fork the repository: navigate to the RAG-Architecture GitHub repository and click **Fork**.
- Clone the forked repository with `git clone https://github.com/<your-username>/RAG-Architecture.git`, then enter it with `cd RAG-Architecture`.
- Install the dependencies with `pip install -r requirements.txt`.

## 🚀 **Usage**

Run the Streamlit app with `streamlit run app.py`, then choose a mode:

- **Upload Files mode:** Upload a file to process it and store its content in Milvus. The app displays the extracted content and stores the embeddings in the database.
- **Query mode:** Enter a question to search the Milvus database for relevant information. The app returns an AI-generated response through LangChain's RAG pipeline.
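
To make the store-then-query flow concrete, here is a toy, dependency-free sketch of the two steps: storing text with a vector, then retrieving the closest entries and assembling them into a prompt. Everything in it is a stand-in chosen for illustration. `embed` is a bag-of-words counter rather than `HuggingFaceEmbeddings`, `ToyVectorStore` is an in-memory list rather than Milvus, and `build_prompt` is a hypothetical helper, not LangChain's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for the vector database."""

    def __init__(self) -> None:
        self._rows = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._rows.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list) -> str:
    """Combine retrieved passages and the question into a single prompt."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


# "Upload" step: store extracted text.
store = ToyVectorStore()
store.add("Milvus stores the document embeddings.")
store.add("EasyOCR extracts text from images.")
store.add("Streamlit renders the upload form.")

# "Query" step: retrieve the closest passage and build the prompt.
question = "Which tool extracts text from images?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real RAG pipeline the prompt would be sent to a language model, and the retrieved passages are what keep the answer grounded in the uploaded documents.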
## 📁 **File Structure**
```bash
project/
│
├── app.py                    # 🎯 Main Streamlit application
├── requirements.txt          # 📦 Python dependencies
├── utils/                    # 🛠️ Utility modules
│   ├── audio_utils.py        # 🎵 Audio file processing
│   ├── video_utils.py        # 📹 Video file processing
│   ├── image_utils.py        # 🖼️ Image file processing
│   ├── document_loaders.py   # 📜 Document processing loaders
│   └── milvus_client.py      # 🗄️ Initializes Milvus database
│
├── milvus_database.db        # 🗃️ Milvus database file (auto-created)
├── Dataset                   # 📂 Folder to store datasets
└── Images                    # 📁 Folder for storing images
```

## 🔑 **Key Modules**

🧩 Main application logic (`app.py`)
- Handles file uploads, document processing, and querying.

🛠️ Utility modules (`utils/`)
- 🎵 Audio: Splits audio into chunks and transcribes each chunk to text.
- 📹 Video: Processes video files to extract and analyze their content.
- 🖼️ Image: Uses EasyOCR to extract text.
- 📜 Logs/Documents: Converts CSV, YAML, JSON, and PDF files into structured LangChain documents.

## 🛠️ **Example Workflow**

- Select "Upload Files" mode.
- Upload a file (e.g., `example.pdf`).
- Process and store the file in the database.
- Select "Query" mode.
- Enter a natural language question.
- Receive a concise, fact-based response.

## 🌟 **Future Improvements**

- 🔍 Add more advanced query capabilities.
- 📂 Enhance support for additional file types and embeddings.
- ⚡ Improve scalability for larger datasets.

## 📜 **License**

This project is licensed under the MIT License.

## 🙌 **Acknowledgments**

- 🌐 Streamlit for the interactive UI.
- 📚 LangChain and Milvus for document processing, retrieval, and vector storage.
- 🤖 Transformers for embedding generation.
- 🖼️ EasyOCR for image text extraction.
- 📹 MoviePy for video processing.