Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(grouping): migration for hnsw index #870

Merged
merged 8 commits into from
Jul 17, 2024

Conversation

trillville
Copy link
Contributor

based on analysis here

Copy link
Member

@JoshFerge JoshFerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should turn ingestion off before merging this, just to be safe and observe what the database does

@trillville
Copy link
Contributor Author

we should turn ingestion off before merging this, just to be safe and observe what the database does

i've updated my migration so it applies concurrently, we should be good to apply without pausing ingestion

@trillville
Copy link
Contributor Author

I've updated this migration to do the following:

  1. switch ef_construction to 200
  2. apply hash partitioning on project_id with 100 different partitions

I've also made this apply in a single txn rather than concurrently, with the intention being that we run this with ingestion paused.

Copy link
Member

@mrduncan mrduncan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of comments but neither are necessarily blocking, especially since this table won't be used until a followup migration swaps it out

message VARCHAR NOT NULL,
error_type VARCHAR,
stacktrace_embedding VECTOR(768) NOT NULL,
PRIMARY KEY (id, project_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just noticed this - should this be just id? If feels surprising that id wouldn't be unique

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is necessary for hash partitioning AFAIK (cant hash partition on something that isnt PK)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that makes sense.

If we have any places where we're querying with only an id we should see what the query planner does - theoretically the id, project_id index should be able to be applied to just an id query but I'm not super familiar with how partitioning affects that.

src/migrations/versions/d87a6410efe4_migration.py Outdated Show resolved Hide resolved
Co-authored-by: Matt Duncan <14761+mrduncan@users.noreply.github.com>
@trillville trillville merged commit ce03fd0 into main Jul 17, 2024
10 checks passed
@trillville trillville deleted the trillville/grouping-pg-migration branch July 17, 2024 21:51
trillville added a commit that referenced this pull request Jul 17, 2024
DO NOT MERGE/APPLY UNTIL WE HAVE VERIFIED
#870
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants