Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable search across beneficiary, note, reason, account in /credits endpoint #120

Merged
merged 12 commits into from
Sep 11, 2024

Conversation

andersy005
Copy link
Member

@andersy005 andersy005 commented Sep 4, 2024

after discussing with Grayson, we decided that the easiest initial improvement is to implement a basic partial match search. the rationale is that the search functionality doesn't need to be advanced at this point. for instance, if a user searches for apple, it's acceptable for the results to include transactions for snapple. the idea is that offsetsDB acts as a magnifying glass; users who need precise and accurate results can always download the data and analyze it using their preferred tools, such as pandas.

@github-actions github-actions bot added the api label Sep 4, 2024
Copy link
Member

@badgley badgley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments, though we discussed a bunch of this over zoom. looking good!

offsets_db_api/routers/credits.py Outdated Show resolved Hide resolved
@@ -117,10 +119,30 @@ def upgrade() -> None:
op.create_index(
op.f('ix_credit_transaction_date'), 'credit', ['transaction_date'], unique=False
)

# Create GIN indexes for trigram similarity search
op.execute(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be curious to learn about advantages / disadvantages of i) indexing each column separately and ii) indexing all together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the endpoint exposes a search_fields parameter

 search_fields: list[str] = Query(
        default=[
            'retirement_beneficiary',
            'retirement_account',
            'retirement_note',
            'retirement_reason',
        ],
        description='Fields to search in',

with this parameter which allows users to search on a subset of columns, it made sense to create an appropriate index for each column. this potentially improves performance for queries that only search on a subset of columns

offsets_db_api/routers/credits.py Outdated Show resolved Hide resolved
offsets_db_api/routers/credits.py Outdated Show resolved Hide resolved
…e don't need to apply it again in our query
@andersy005 andersy005 marked this pull request as ready for review September 11, 2024 21:05
@andersy005 andersy005 merged commit b6efecb into main Sep 11, 2024
5 checks passed
@andersy005 andersy005 deleted the add-search-across-beneficiary-columns branch September 11, 2024 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants