Query by metadata #911
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #911      +/-   ##
==========================================
+ Coverage   58.44%   58.46%   +0.01%
==========================================
  Files         109      109
  Lines       13868    13884      +16
==========================================
+ Hits         8105     8117      +12
- Misses       5763     5767       +4
Sorry I wasn't clear before. I wasn't saying I prefer this approach; I just wanted to weigh the pros and cons of both approaches.
Well, that isn't quite true: I definitely prefer treating the filename as metadata rather than having a distinct row for it, but as we are discovering, it does have drawbacks.
I personally prefer this approach. Perhaps it could be more targeted with
Alternatively, we could go back to the old approach but store the metadata in a separate table that is joined to the main table.
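To make the separate-table option concrete, here is a minimal sketch of what it might look like. It uses the standard-library `sqlite3` module purely as a stand-in for the store's real backend, and the table names, column names, and the `query_by_metadata` helper are all hypothetical, not taken from this PR.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (
        id   INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE metadata (
        doc_id INTEGER NOT NULL REFERENCES documents(id),
        key    TEXT NOT NULL,
        value  TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO documents (id, text) VALUES (?, ?)",
    [(1, "Quarterly revenue rose."), (2, "Install with pip.")],
)
conn.executemany(
    "INSERT INTO metadata (doc_id, key, value) VALUES (?, ?, ?)",
    [(1, "filename", "report.pdf"), (2, "filename", "readme.md")],
)


def query_by_metadata(conn, key, value):
    """Return (id, text) rows whose metadata has an exact key/value match."""
    return conn.execute(
        """
        SELECT d.id, d.text
        FROM documents AS d
        JOIN metadata AS m ON m.doc_id = d.id
        WHERE m.key = ? AND m.value = ?
        ORDER BY d.id
        """,
        (key, value),
    ).fetchall()


matches = query_by_metadata(conn, "filename", "report.pdf")
```

With this layout the metadata never has to be embedded, so filtering by filename becomes an exact match rather than a similarity lookup. The cost is an extra join on every query.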
Looking at ChromaDB, they have explicit filters for querying by metadata
My main question is whether it's really necessary to store both a text column and a metadata column and compute the additional embeddings. Does always including the metadata in the embedding result in any appreciable performance degradation?
Do you mean rather than In this PR, we only look up by text_and_metadata, but return only text |
lumen/ai/vector_store.py
Outdated
    f"({key}: {self._format_metadata_value(value)})"
    for key, value in metadata.items()
]
text_and_metadata = " ".join(metadata_items)
So it's just metadata, not text_and_metadata?
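A small sketch of the naming concern, assuming plain string formatting in place of `_format_metadata_value`. The `format_metadata` helper and the sample values are hypothetical. Joining only the formatted metadata items yields a string with no document text in it, so the combined value would need the text added explicitly:

```python
def format_metadata(metadata):
    """Render each metadata pair as "(key: value)" and join them with spaces."""
    # Assumption: str() formatting stands in for _format_metadata_value.
    items = [f"({key}: {value})" for key, value in metadata.items()]
    return " ".join(items)


text = "Quarterly revenue rose."
metadata_only = format_metadata({"filename": "report.pdf", "page": 3})

# The joined string above holds metadata only; the text is added separately.
text_and_metadata = f"{text} {metadata_only}"
```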
Okay, this is now ready. I also added functionality so that if a user uploads the same file (perhaps with modified contents), it overwrites the existing document.
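For readers skimming the thread, the overwrite behaviour can be sketched roughly as follows. This is a simplified in-memory illustration, not the code in this PR: the `TinyStore` class, its `upsert` method, and the choice of `filename` as the matching key are all assumptions.

```python
class TinyStore:
    """Hypothetical in-memory store; not the project's actual vector store."""

    def __init__(self):
        self._rows = []
        self._next_id = 0

    def upsert(self, text, metadata):
        """Add a document, replacing any existing row with the same filename."""
        filename = metadata.get("filename")
        if filename is not None:
            # Drop earlier uploads of the same file before inserting the new one.
            self._rows = [
                row for row in self._rows
                if row["metadata"].get("filename") != filename
            ]
        row_id = self._next_id
        self._next_id += 1
        self._rows.append({"id": row_id, "text": text, "metadata": dict(metadata)})
        return row_id

    def texts(self):
        return [row["text"] for row in self._rows]


store = TinyStore()
store.upsert("first draft", {"filename": "notes.txt"})
store.upsert("unrelated", {"filename": "other.txt"})
store.upsert("second draft", {"filename": "notes.txt"})  # replaces "first draft"
```

Documents without a `filename` key are always appended in this sketch, since there is nothing to match them against.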
Looks good!
Supersedes #883 based on: