Bug fix - `source` field was not copied to chunks #49
This prevented sources from appearing in the final response
The source field was actually not pushed at all into `metadata` at upsert() time, and wasn't copied to the result `KbDocWithScore` at query() time!
This test was missing on multiple levels - chunker tests, KB E2E tests
New feature, added new test
My main question is whether we would like to make the source non-optional for the first release. I think practically there are two options:
Currently we allow the source to be optional, but we make the LLM output empty strings. The minimal fix for that could be to simply tell the LLM in the prompt to ignore empty sources.
As @miararoy pointed out, the empty string is a feature, not a bug.
For demos this is fine, but the question is what happens when users actually want to use it in production. If they don't have any source that makes sense to present to the user, we shouldn't force the model to present one. So what I'm trying to say is that, in some sense, the source is not really optional right now.
So the API will resemble that of `Document`.
Problem
There were actually multiple problems:
- `Chunker` didn't copy the `source` field from the `Document` to all chunks
- In `upsert()`, the `source` field was not pushed into `metadata`
- In `query()`, the `source` field was not popped from `metadata` back into a dedicated field
- `upsert_dataframe()` didn't support the `source` column at all
Solution
Fixed all above.
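For readers following the thread, the round trip described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the `Document` and `Chunk` dataclasses, `chunk_document()`, `to_stored_metadata()` and `from_stored_metadata()` are hypothetical stand-ins, and the fixed-size splitting is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:  # hypothetical stand-in, not the project's class
    id: str
    text: str
    source: str = ""  # optional; empty string means "no source"
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Chunk:  # hypothetical stand-in, not the project's class
    id: str
    document_id: str
    text: str
    source: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_document(doc: Document, chunk_size: int = 20) -> List[Chunk]:
    """Split the text into fixed-size pieces, copying `source` to every chunk."""
    pieces = [doc.text[i:i + chunk_size] for i in range(0, len(doc.text), chunk_size)]
    return [
        Chunk(id=f"{doc.id}_{n}", document_id=doc.id, text=piece,
              source=doc.source, metadata=dict(doc.metadata))
        for n, piece in enumerate(pieces)
    ]


def to_stored_metadata(chunk: Chunk) -> Dict[str, Any]:
    """upsert() side: push the dedicated fields into the stored metadata."""
    return {**chunk.metadata, "text": chunk.text,
            "document_id": chunk.document_id, "source": chunk.source}


def from_stored_metadata(chunk_id: str, stored: Dict[str, Any]) -> Chunk:
    """query() side: pop the dedicated fields back out of the metadata."""
    meta = dict(stored)  # copy so the stored record is left untouched
    return Chunk(id=chunk_id, text=meta.pop("text"),
                 document_id=meta.pop("document_id"),
                 source=meta.pop("source", ""), metadata=meta)


doc = Document(id="doc1", text="a" * 45, source="https://example.com/a",
               metadata={"topic": "demo"})
chunks = chunk_document(doc)
restored = [from_stored_metadata(c.id, to_stored_metadata(c)) for c in chunks]
```

If any of the three steps drops `source`, `restored` no longer equals `chunks`, which is the kind of end-to-end check that was missing before.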
In addition, I added an explicit check for whether docs contain reserved metadata fields (such as `'text'`), raising an error if they do (+ relevant tests).
Type of Change
Test Plan
Added tests for all above + beefed up KB testing
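As an illustration of the reserved-field check and one of its tests, here is a minimal sketch. The names `RESERVED_METADATA_FIELDS` and `check_reserved_fields()` are hypothetical, and the reserved list beyond `'text'` is an assumption; the actual implementation and tests in this PR may differ.

```python
from typing import Any, Dict, Iterable

# Assumed reserved names; only 'text' is named explicitly in the description.
RESERVED_METADATA_FIELDS = ("text", "document_id", "source")


def check_reserved_fields(docs_metadata: Iterable[Dict[str, Any]]) -> None:
    """Raise ValueError if any document's metadata uses a reserved field name."""
    for i, metadata in enumerate(docs_metadata):
        clashes = sorted(set(metadata) & set(RESERVED_METADATA_FIELDS))
        if clashes:
            raise ValueError(f"Document {i} uses reserved metadata fields: {clashes}")


def test_reserved_field_is_rejected() -> None:
    """A document whose metadata contains 'text' should be refused."""
    try:
        check_reserved_fields([{"topic": "demo"}, {"text": "oops"}])
    except ValueError as err:
        assert "text" in str(err)
    else:
        raise AssertionError("expected ValueError for reserved field 'text'")
```

Failing early at upsert time keeps a user-supplied `'text'` key from silently overwriting the chunk text stored alongside it.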