Query by filename #883
lgtm
Would like some discussion around the decision to insert the filename as a separate kind of entry vs. simply adding it as a metadata field (which seems conceptually cleaner to me).
I guess the obvious reason is that the metadata is currently only used for filtering and the embeddings are only computed for the contents? In which case I guess my question is whether we should automatically include the embeddings in the metadata somehow.
If the filename is added only to the metadata, I believe the embeddings can't query it: metadata only supports an exact-match filter, whereas a separate entry can be fuzzy-searched.
I suppose we could have another database column containing the joined text of everything, and a toggle for what to query, e.g. query_metadata = True/False, or a choice of text | metadata | both.
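A rough sketch of what such a toggle might look like, in Python. Everything here is hypothetical: the names QueryTarget and searchable_text are made up for illustration and are not taken from this PR or the project's codebase. It only shows the idea of choosing which joined text gets embedded or searched.

```python
from enum import Enum


class QueryTarget(Enum):
    """Hypothetical toggle: which part of an entry is searchable."""
    TEXT = "text"
    METADATA = "metadata"
    BOTH = "both"


def searchable_text(content: str, metadata: dict[str, str], target: QueryTarget) -> str:
    """Build the string that would be embedded for fuzzy search.

    Metadata is flattened to "key: value" pairs (sorted by key so the
    result is deterministic) and optionally joined with the content.
    """
    joined_meta = " ".join(f"{key}: {value}" for key, value in sorted(metadata.items()))
    if target is QueryTarget.TEXT:
        return content
    if target is QueryTarget.METADATA:
        return joined_meta
    # BOTH: put the metadata first so the filename leads the embedded text.
    return f"{joined_meta}\n{content}"
```

With QueryTarget.BOTH, the filename would become part of the embedded text, so a query mentioning the filename could match even when the contents never do.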
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #883      +/-   ##
==========================================
- Coverage   51.66%   51.59%   -0.07%
==========================================
  Files         108      108
  Lines       13697    13714     +17
==========================================
  Hits         7076     7076
- Misses       6621     6638     +17
Superseded by #911
Closes #873
For context, I uploaded two completely separate documents:
When I asked "what's in the receipt", it referenced contents from the org chart instead of the receipt, because the receipt doesn't mention that it's a receipt in its contents.
This PR adds some weight to the relevancy based on filename.
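To illustrate the idea, here is a minimal sketch of filename-weighted relevancy in Python. It is not the implementation in this PR: the Chunk type, the token-overlap filename score, the additive weight of 0.3, the example filenames, and the content scores are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chunk:
    filename: str
    text: str
    content_score: float  # similarity from the embedding search (assumed in [0, 1])


def _tokens(value: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def filename_score(query: str, filename: str) -> float:
    """Fraction of filename tokens (extension dropped) that appear in the query."""
    name_tokens = _tokens(Path(filename).stem)
    if not name_tokens:
        return 0.0
    return len(name_tokens & _tokens(query)) / len(name_tokens)


def rerank(query: str, chunks: list[Chunk], weight: float = 0.3) -> list[Chunk]:
    """Sort chunks by content score plus a weighted filename bonus (hypothetical formula)."""
    return sorted(
        chunks,
        key=lambda c: c.content_score + weight * filename_score(query, c.filename),
        reverse=True,
    )


# Hypothetical data: the org chart scores higher on content alone.
chunks = [
    Chunk("org_chart.pdf", "CEO, VP Engineering, ...", content_score=0.62),
    Chunk("receipt.pdf", "Total: $12.40, paid by card", content_score=0.55),
]
best = rerank("what's in the receipt", chunks)[0]
print(best.filename)
```

In this made-up case the receipt chunk trails on content similarity alone, and the filename bonus is what moves it ahead of the org chart.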