Layout-Aware RAG (LA-RAG)

This is the source code repository for the Python package la-rag.

Why Layout-Aware RAG?

The impressive abilities of large language models (LLMs) offer exciting possibilities for large-scale document analysis. However, a significant challenge remains in making text from extensive documents, such as large PDFs, accessible to LLMs due to their limited context window, which restricts the amount of text they can process at a time.

Retrieval Augmented Generation (RAG) systems address this challenge by combining LLMs with advanced retrieval techniques. Common chunking tools used in LangChain, such as TextSplitter and RecursiveCharacterTextSplitter, break documents into smaller sections to fit within the LLM’s context window. However, these methods lose the semantic connections carried by layout features such as sections and subsections, tables, lists, and bullet points. For example, in a bullet-pointed list, all the points can be interrelated, and each point is connected to the paragraph, or the last sentence of the paragraph, that precedes the list.

Layout-aware RAG considers the layout features of the document and the semantic connection between them.
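
To make the bullet-list example concrete, here is a minimal sketch of the idea in plain Python. It is not the la-rag API or the package's actual implementation: the function name layout_aware_chunks, the BULLET pattern, and the blank-line block rule are all assumptions made for illustration. The sketch splits text into blocks on blank lines and attaches any block made up only of list items to the chunk that precedes it, so a list stays with the paragraph that introduces it.

```python
import re

# Hypothetical pattern for list items: "-", "*", "•", or "1." / "1)" style markers.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def layout_aware_chunks(text):
    """Illustrative only: keep each bullet list with the block that precedes it."""
    chunks = []
    # Treat blank lines as block boundaries (an assumption about the input).
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        is_list = all(BULLET.match(ln) for ln in lines)
        if is_list and chunks:
            # A pure list block is merged into the previous chunk
            # instead of becoming a chunk of its own.
            chunks[-1] = chunks[-1] + "\n" + "\n".join(lines)
        else:
            chunks.append("\n".join(lines))
    return chunks


if __name__ == "__main__":
    sample = (
        "The device supports three modes:\n\n"
        "- eco\n- normal\n- boost\n\n"
        "Pricing is discussed in the next section."
    )
    for i, chunk in enumerate(layout_aware_chunks(sample)):
        print(f"--- chunk {i} ---")
        print(chunk)
```

A fixed-size character splitter applied to the same sample could cut the list in the middle or separate it from its lead-in sentence, depending on the chunk size. Here the lead-in and its three items always end up in one chunk. A real layout-aware chunker would also need to handle sections, subsections, and tables, which this sketch does not attempt.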

Important Note

Although I have made the repository public and released the Python package, I am still working on this project. I will add more details on installation and on building projects with la-rag soon. In the meantime, see tests/sample.py for sample code showing how to use the package.

If you are interested in collaborating on this project, please email me at muafirathasnikt@gmail.com.
