Converts PDF, Excel, and HTML files to plain text while preserving their layout.
When using RAG with LLMs, you do not normally have access to the layout position of text extracted from pages. With layout-preserving text, the LLM can be instructed to look for specific information using positional instructions, e.g. "extract purchase order number from top right, after text 'Order #:'".
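To make "preserving layout" concrete, here is a minimal sketch of one way positioned words could be laid out on a character grid. It is not this project's implementation: the function name `render_layout`, the `(x, y, text)` input format, and the `char_width` / `line_height` values are all assumptions chosen for illustration, with `y` assumed to grow downward from the top of the page.

```python
from typing import Dict, List, Tuple


def render_layout(
    words: List[Tuple[float, float, str]],
    char_width: float = 6.0,    # assumed page units per character column
    line_height: float = 12.0,  # assumed page units per text row
) -> str:
    """Place words on a character grid according to their page coordinates."""
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for x, y, text in words:
        row = int(round(y / line_height))
        col = int(round(x / char_width))
        rows.setdefault(row, []).append((col, text))

    lines: List[str] = []
    last_row = max(rows) if rows else -1
    for row in range(last_row + 1):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            if line and len(line) >= col:
                # Words would collide: keep a single separating space.
                line += " "
            else:
                # Pad with spaces up to the word's target column.
                line = line.ljust(col)
            line += text
        lines.append(line)
    return "\n".join(lines)


# Hypothetical word positions, made up for this example.
sample_words = [
    (0, 0, "Invoice"),
    (300, 0, "Order #:"),
    (360, 0, "12345"),
    (0, 24, "Item"),
]
print(render_layout(sample_words))
```

Because horizontal and vertical gaps survive as spaces and blank lines, a phrase such as "top right" still means something in the resulting text.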
res = convert_pdf("sample_table.pdf")
res = convert_xls("sample.xls")
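Once a document has been converted, the layout-preserving text can be combined with a positional instruction in a prompt. The sketch below is only an illustration: `build_extraction_prompt` is a hypothetical helper, not part of this project, and `sample_text` is made-up data standing in for a conversion result, which is assumed here to be a plain string.

```python
def build_extraction_prompt(layout_text: str, instruction: str) -> str:
    """Combine layout-preserving text with a positional instruction (hypothetical helper)."""
    return (
        "The document below is plain text that preserves the original page layout.\n"
        "Use the position of the text on the page to follow the instruction.\n\n"
        f"Instruction: {instruction}\n\n"
        "Document:\n"
        f"{layout_text}\n"
    )


# Made-up stand-in for converted output; the real result's type and
# contents are assumptions in this sketch.
sample_text = (
    "ACME Corp" + " " * 31 + "Order #: PO-1001\n"
    "\n"
    "Item        Qty    Price\n"
)

prompt = build_extraction_prompt(
    sample_text,
    "Extract the purchase order number from the top right, after the text 'Order #:'.",
)
print(prompt)
```

The resulting string can be sent to whichever LLM client you use; keeping the instruction separate from the document text makes it easy to reuse the same converted output for several extraction questions.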