Query by filename #883

Closed
wants to merge 3 commits into from

Conversation

ahuang11
Contributor

@ahuang11 ahuang11 commented Dec 18, 2024

Closes #873

For context, I uploaded two completely separate documents:

  1. a receipt
  2. an org chart

When I asked what's in the receipt, the answer referenced contents from the org chart instead, because nothing in the receipt's contents says that it is a receipt.

This PR adds some weighting to the relevancy based on the filename.
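
To illustrate the idea, here is a minimal sketch of filename-weighted relevancy. It is not the code in this PR: the token-overlap score is a toy stand-in for embedding similarity, and `tokenize`, `overlap`, `rank`, and `filename_weight` are hypothetical names made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query_tokens, text):
    """Fraction of query tokens found in the text (toy relevance score)."""
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(text)) / len(query_tokens)

def rank(query, documents, filename_weight=1.5):
    """Return filenames ordered by content score plus a weighted filename score."""
    q = tokenize(query)

    def score(doc):
        return overlap(q, doc["text"]) + filename_weight * overlap(q, doc["filename"])

    return [doc["filename"] for doc in sorted(documents, key=score, reverse=True)]

# Hypothetical stand-ins for the two uploaded documents.
documents = [
    {"filename": "receipt.pdf", "text": "Coffee 3.50 Bagel 2.25 Total 5.75"},
    {"filename": "org_chart.png", "text": "Finance team: handles every receipt and the total budget"},
]

# Without the filename term the org chart ranks first, because its text
# happens to mention both query words.
print(rank("receipt total", documents, filename_weight=0.0))
# With the filename term the receipt ranks first.
print(rank("receipt total", documents, filename_weight=1.5))
```

The weight is a tuning knob: too low and the filename never changes the ranking, too high and a matching filename overrides the contents entirely.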

Collaborator

@amaloney amaloney left a comment

lgtm

@philippjfr
Member

Would like some discussion around the decision to insert the filename as a separate kind of entry vs. simply adding it as a metadata field (which seems conceptually cleaner to me).

@philippjfr
Member

philippjfr commented Jan 2, 2025

I guess the obvious reason is that the metadata is currently only used for filtering and the embeddings are only computed for the contents? In which case I guess my question is whether we should automatically include the embeddings in the metadata somehow.

@ahuang11
Contributor Author

ahuang11 commented Jan 2, 2025

If it's added only to the metadata, I believe the embeddings can't query it: metadata only supports an exact-match filter, whereas a separate entry can be fuzzy-searched.
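
A rough sketch of that difference, assuming a toy in-memory list of entries rather than the actual vector store. `filter_by_metadata` and `fuzzy_best_match` are hypothetical helpers, and `difflib.SequenceMatcher` is only a stand-in for embedding similarity.

```python
from difflib import SequenceMatcher

# Hypothetical entries; the real store's schema may differ.
entries = [
    {"text": "Coffee 3.50 Bagel 2.25 Total 5.75", "metadata": {"filename": "receipt.pdf"}},
    {"text": "CEO: Ana. Engineering lead: Bo.", "metadata": {"filename": "org_chart.png"}},
]

def filter_by_metadata(entries, **filters):
    """Keep only entries whose metadata equals every filter value exactly."""
    return [
        e for e in entries
        if all(e["metadata"].get(key) == value for key, value in filters.items())
    ]

def fuzzy_best_match(entries, query):
    """Return the entry whose filename is most similar to the query string."""
    def similarity(entry):
        filename = entry["metadata"]["filename"].lower()
        return SequenceMatcher(None, query.lower(), filename).ratio()
    return max(entries, key=similarity)

# The exact-match filter finds nothing unless the full filename is given.
print(filter_by_metadata(entries, filename="receipt"))
print(filter_by_metadata(entries, filename="receipt.pdf"))
# A similarity-based lookup still finds the receipt from a partial name.
print(fuzzy_best_match(entries, "receipt")["metadata"]["filename"])
```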

@ahuang11
Contributor Author

ahuang11 commented Jan 2, 2025

> In which case I guess my question is whether we should automatically include the embeddings in the metadata somehow.

I suppose we could add another database column containing the joined text of everything, with a toggle such as query_metadata = True/False, or a three-way option:

text | metadata | both
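
One way the three-way option could look, as a sketch only. `build_search_text` and its `query_on` parameter are hypothetical names, and the entry layout is assumed for the example; the function picks which text would be embedded or searched.

```python
def build_search_text(entry, query_on="text"):
    """Return the text to embed or search for an entry.

    query_on is one of "text", "metadata", or "both" (hypothetical option names).
    """
    if query_on not in ("text", "metadata", "both"):
        raise ValueError(f"query_on must be 'text', 'metadata' or 'both', got {query_on!r}")

    # Flatten the metadata dict into "key: value" pairs.
    metadata_text = " ".join(f"{key}: {value}" for key, value in entry["metadata"].items())

    if query_on == "text":
        return entry["text"]
    if query_on == "metadata":
        return metadata_text
    # "both": joined text of everything, metadata first.
    return f"{metadata_text}\n{entry['text']}"

# Hypothetical entry for illustration.
entry = {"text": "Coffee 3.50 Total 5.75", "metadata": {"filename": "receipt.pdf"}}
for mode in ("text", "metadata", "both"):
    print(mode, "->", repr(build_search_text(entry, mode)))
```

Storing the joined text in its own column would let the toggle pick a column at query time instead of recomputing embeddings.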


codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Project coverage is 51.59%. Comparing base (3b580de) to head (e08afca).
Report is 16 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lumen/ai/tools.py | 0.00% | 21 Missing ⚠️ |
| lumen/ai/controls.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #883      +/-   ##
==========================================
- Coverage   51.66%   51.59%   -0.07%     
==========================================
  Files         108      108              
  Lines       13697    13714      +17     
==========================================
  Hits         7076     7076              
- Misses       6621     6638      +17     


@ahuang11 ahuang11 mentioned this pull request Jan 2, 2025
@ahuang11
Contributor Author

ahuang11 commented Jan 2, 2025

Superseded by #911

@ahuang11 ahuang11 closed this Jan 2, 2025
Development

Successfully merging this pull request may close these issues.

RAG should consider filename
3 participants