andyfe76/Page-Layout-LLM-context

Page layout extraction for LLM context

Converts PDF, Excel and HTML files to text preserving layout.

When using RAG with LLMs, the layout position of text extracted from pages is normally not available to the model. With layout-preserving text, the LLM can be instructed to look for specific information using position instructions, e.g. "extract the purchase order number from the top right, after the text 'Order #:'".

res = convert_pdf("sample_table.pdf")

res = convert_xls("sample.xls")
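
To illustrate the general idea of layout-preserving text, here is a minimal sketch that places words on a character grid using their page coordinates. It is not this repository's implementation: `render_layout`, its `char_width` and `line_height` parameters, and the sample words are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def render_layout(words, char_width=6.0, line_height=12.0):
    """Place (x, y, text) words on a character grid.

    Hypothetical helper: x and y are page coordinates measured from the
    top-left corner; char_width and line_height are assumed cell sizes.
    """
    rows = defaultdict(list)
    for x, y, text in words:
        # Quantize page coordinates to grid row and column indices.
        rows[round(y / line_height)].append((round(x / char_width), text))

    if not rows:
        return ""

    lines = []
    for row in range(min(rows), max(rows) + 1):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            # If the target column is already occupied, keep one space
            # between neighbouring words instead of overwriting.
            if line and col <= len(line):
                col = len(line) + 1
            # Pad with spaces up to the target column, then add the word.
            line = line.ljust(col) + text
        lines.append(line)
    return "\n".join(lines)


# Made-up sample data: a header row and a table header two lines below it.
sample_words = [
    (0, 0, "ACME Corp"),
    (300, 0, "Order #:"),
    (354, 0, "PO-1042"),
    (0, 24, "Item"),
    (300, 24, "Qty"),
]
layout_text = render_layout(sample_words)

# The layout text can then be embedded in a prompt with a position instruction.
prompt = (
    "Extract the purchase order number from the top right, "
    "after the text 'Order #:'.\n\n" + layout_text
)
```

Because horizontal and vertical gaps survive as spaces and blank lines, an instruction such as "top right" refers to something the model can actually see in the text.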
