Advanced Retrieval-Augmented Generation (RAG) system for processing and retrieving semi-structured data from PDF documents using state-of-the-art NLP techniques.

SJ9VRF/Semantic-RAG

Semantic Retrieval Augmented Generation (RAG)

Advanced RAG for Semi-structured Data

This repository implements an advanced Retrieval-Augmented Generation (RAG) system for semi-structured data extracted from PDF documents. It combines NLP models with custom preprocessing pipelines to parse, classify, summarize, and retrieve content.

Features

  • PDF Parsing: Leverages the unstructured library to extract diverse elements such as text, tables, and images.
  • Data Processing: Processes extracted elements for optimal formatting and utility.
  • Element Classification: Classifies elements to aid in further processing and retrieval tasks.
  • Content Summarization: Summarizes extracted content with NLP models.
  • Content Retrieval: Uses a multi-vector retrieval system to fetch content relevant to user queries.
  • Storage Management: Manages storage and retrieval of both processed and raw data.
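
To make the multi-vector retrieval idea concrete, here is a minimal sketch, not the repository's actual code. It assumes the common pattern in which a short summary of each element is indexed for search while the raw element is stored separately and returned on a match. The names `Element`, `MultiVectorIndex`, `tokenize`, and `cosine` are hypothetical, and plain word counts stand in for the embedding model and vector store a real system would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    """Hypothetical stand-in for one parsed PDF element."""
    kind: str  # e.g. "text" or "table"
    raw: str   # full extracted content


def tokenize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MultiVectorIndex:
    """Searches over summaries but returns the raw parent elements."""

    def __init__(self) -> None:
        self.docstore: Dict[int, Element] = {}        # id -> raw element
        self.vectors: List[Tuple[int, Counter]] = []  # (id, summary vector)

    def add(self, element: Element, summary: str) -> None:
        doc_id = len(self.docstore)
        self.docstore[doc_id] = element
        self.vectors.append((doc_id, tokenize(summary)))

    def search(self, query: str, k: int = 1) -> List[Element]:
        q = tokenize(query)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, item[1]), reverse=True)
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


# Usage: the query matches the table's summary, and the raw table comes back.
index = MultiVectorIndex()
index.add(
    Element("table", "Region | Revenue\nEMEA | 4.1M\nAPAC | 3.2M"),
    "Table of quarterly revenue by region",
)
index.add(
    Element("text", "The onboarding process takes two weeks."),
    "Paragraph describing employee onboarding",
)
hits = index.search("revenue by region")
print(hits[0].kind)  # table
```

Indexing summaries rather than raw content keeps tables searchable by what they describe, while the generator still receives the full original element.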

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)

Installation

  1. Clone the repository:
    git clone https://github.com/SJ9VRF/Semantic-RAG.git
  2. Navigate to the project directory:
    cd Semantic-RAG
  3. Install the required dependencies:
    pip install -r requirements.txt

Usage

Run the main script from the project root:

    python src/main.py

Documentation

For a detailed usage guide and documentation of the architecture and functionality, see the docs/ directory in this project.
