Enhance document querying: Enable summaries and overviews for uploaded files #2214
The current method of file uploading is confusing for our users. Documents are chunked, vectorized, and placed in a knowledge base, so users can only ask questions about specific passages of text, in line with the known limitations of the "normal" RAG approach.
Users upload a document but cannot request a summary or overview of it, which confuses them: they wonder why they can't get a high-level view of the document they just uploaded.
To address this issue, we propose two potential solution directions:
1. A graph-based approach such as GraphRAG, which can answer global questions about a whole document.
2. Bypassing the chunked index for uploaded files and sending the full document to the LLM, so that summaries and overviews become possible.
This issue is high on our priority list as we scale from the pilot phase to full organizational implementation.
Comments
I was asked about both #1 and #2 in a live stream yesterday. #1 is being discussed in a separate issue specifically about GraphRAG. #2 is an interesting one in terms of design. For example:
I'm curious what you think about those questions. I imagine we can figure out the feature, as it is indeed a popular request, but I think it'll be a fairly different code path than our current flow.
Thank you for the insights. My inclination is to send the entire document to the LLM each time the user interacts with it. Here’s my reasoning:
Regarding GraphRAG, the primary drawback lies in the high upfront costs of constructing the graph, a challenge that LazyGraphRAG, as referenced in #1928, may help mitigate.
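To make "send the entire document each time" concrete, here is a minimal sketch of what that could look like from the client; the `/chat` endpoint and its request/response shapes are assumptions for illustration, not this repo's actual API:

```ts
// Sketch: prepend the full extracted document text to every chat turn.
// The /chat endpoint and payload shape are illustrative assumptions.
async function askAboutDocument(documentText: string, question: string): Promise<string> {
  const response = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        // The whole document rides along as context on every request.
        { role: "system", content: `Answer questions about this document:\n\n${documentText}` },
        { role: "user", content: question },
      ],
    }),
  });
  const { answer } = await response.json();
  return answer;
}
```

The trade-off is higher token cost per turn, but no index ever needs to be built.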
That makes sense. It's also easier to implement, since it means we only need to run part of the ingestion pipeline (the extraction step). Here's how ChatGPT seems to do it:
I think we could potentially send the full data over the wire, for simplicity, but that would incur higher costs, and we'd need to decide whether to use base64 data URIs inside JSON or a multipart request with both JSON and an attachment.
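For comparison, a rough sketch of each wire format; the `/chat` endpoint and payload fields are made up for illustration:

```ts
// Option A: embed the file as a base64 data URI inside the JSON body.
// Note: the spread below can hit call-stack limits on large files, so a
// real version would base64-encode in chunks.
async function sendAsDataUri(file: File, question: string): Promise<Response> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  const base64 = btoa(String.fromCharCode(...bytes));
  return fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      question,
      file: { name: file.name, dataUri: `data:${file.type};base64,${base64}` },
    }),
  });
}

// Option B: multipart/form-data with a JSON part plus the raw attachment;
// the browser sets the multipart boundary header automatically.
function sendAsMultipart(file: File, question: string): Promise<Response> {
  const form = new FormData();
  form.append("json", JSON.stringify({ question }));
  form.append("file", file, file.name);
  return fetch("/chat", { method: "POST", body: form });
}
```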
Addendum: Another option is to store the document text in localStorage and have the client fetch it from there for follow-up questions. That avoids the need for a cloud DB. We could expire it after a while (I have a package called lscache that does time-based expiry for localStorage). That still has the drawback of increasing the size of data sent over the wire, however.
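A minimal sketch of that caching idea with lscache; the key naming and one-hour expiry are arbitrary choices for illustration:

```ts
import lscache from "lscache";

// lscache stores values in localStorage with an expiry, measured in
// minutes by default.
function cacheDocumentText(fileName: string, text: string): void {
  lscache.set(`doc:${fileName}`, text, 60); // keep for an hour (arbitrary)
}

function getCachedDocumentText(fileName: string): string | null {
  // Returns null when the key is missing or has expired.
  return lscache.get(`doc:${fileName}`);
}
```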
@pamelafox - I'm really impressed with how you broke down ChatGPT's file upload process; it clarified a lot, especially since I initially thought the full file was loaded into the context window. That said, I'm curious why we'd send file content as base64 data instead of using our prepdocs functions, which could make the content cleaner and more structured for processing. Wouldn't this approach also help optimize costs and user experience while still keeping things manageable?
I'd still want to use prepdocs for file understanding (the parsing step), but not for indexing, since we wouldn't be doing that step. For example, if I was copying the ChatGPT approach entirely, the flow might look something like the sketch below.
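As a hedged client-side sketch, where the `/extract` and `/chat` endpoints, their payloads, and the server-side `documentId` are all assumptions for illustration:

```ts
// Upload once; the server runs only the parsing step of prepdocs and
// stores the extracted text, returning an id (hypothetical endpoint).
async function uploadForExtraction(file: File): Promise<string> {
  const form = new FormData();
  form.append("file", file, file.name);
  const response = await fetch("/extract", { method: "POST", body: form });
  const { documentId } = await response.json();
  return documentId;
}

// On each chat turn, send only the small id; the server injects the
// stored extracted text into the LLM prompt.
async function chatAboutDocument(documentId: string, question: string): Promise<string> {
  const response = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ documentId, question }),
  });
  const { answer } = await response.json();
  return answer;
}
```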
If I was going to use local storage, the flow might instead look like the following sketch.
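Again a sketch, reusing the same hypothetical `/extract` and `/chat` endpoints; here the extracted text is cached client-side with lscache and resent on every turn:

```ts
import lscache from "lscache";

async function askWithLocalCache(file: File, question: string): Promise<string> {
  // Reuse the cached extraction if we still have it.
  let text: string | null = lscache.get(`doc:${file.name}`);
  if (!text) {
    const form = new FormData();
    form.append("file", file, file.name);
    // Hypothetical parse-only endpoint that returns the extracted text.
    const extractResponse = await fetch("/extract", { method: "POST", body: form });
    ({ text } = await extractResponse.json());
    lscache.set(`doc:${file.name}`, text, 60); // expire after an hour (arbitrary)
  }
  // The full text travels on every turn: no cloud DB, but heavier requests.
  const chatResponse = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ documentText: text, question }),
  });
  const { answer } = await chatResponse.json();
  return answer;
}
```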