Kotamon: A Comprehensive Guide to an Innovative RAG Framework
-
Introduction
- Kotamon: An open-source RAG framework with 11.5K stars on GitHub.
- Focus on the project's core technologies.
- Structure overview of the article.
-
PDF Parsing
- Parsing via PDFThumbnailReader class.
- Main steps in parsing process:
- Load PDF file using PyPDF2.
- Generate page thumbnails with Pillow.
- Store text and thumbnails in documents list.
-
OCR Integration
- OCRReader: Matplotlib, AdobeDocumentIntelligence, AdobeReader, UnstructuredReader.
- Introduction of OCRReader for OCR-based PDF reading.
- OCRReader uses FullOCR endpoint and unstructured for processing.
-
Document Chunking
- Default class for chunking: TokenSplitter.
- Splits document into chunks with parameters.
-
Re-ranking Methods
- Available Methods:
- CoherentRanking
- LLMRanking
- LLMThematicScoring
- LLMScoring
- Available Methods:
-
GraphRAG
- GraphRAG Indexing Pipeline explained.
- Process: Standard file indexing and creating graph index.
- GraphRAG Retrieval Pipeline explained.
- Builds graph search context, formats context records into documents.
- Plots the knowledge graph and retrieves documents.
-
Reasoning
- Simple Reasoning Pipeline: Handles straightforward question-answering tasks.
- Complex Reasoning Pipeline: Breaks down complex questions into smaller sub-questions.
- Uses DecomposeQuestionPipeline class.
- Supports methods such as ReAct and ReWOO.
- Agent-Based Reasoning: Analysis of ReAct.
-
Query Processing
- Identified query rewriting during reasoning processes.
- Custom pipelines can be implemented for query expansion, classification, or refining.
-
Prompt Compression
- No direct techniques implemented for prompt compression.
- Potential for custom pipelines to reduce token load during processing.
-
LLMs and Embedding Models
- Supports multiple platforms for LLMs: OpenAI, Ollama, Claude, and Groq.
- Embedding models supported: OpenAI, Ollama, FastEmbed, and Cohere.
-
Web Service and Frontend
- Built using Gradio for frontend UI.
- Provides a clean interface for document uploads and QA.
-
Storage
- Supports document stores (Docstores) and vector stores (Vectorstores).
- Efficient retrieval for RAG systems.
-
Conclusion and Insights
- Kotamon as a powerful RAG framework.
- Supports multi-modal document parsing, GraphRAG, and agent-based reasoning.
- Managing unstructured data, including complex images or high-resolution PDFs, remains challenging.
- GraphRAG is a game-changer for knowledge-intensive queries by leveraging entities and relationships.
- Further improvements could focus on more efficient indexing strategies or query refinement to enhance responsiveness.
- The project's popularity and star count matter less than its technical usefulness.