Fixed BUG: Handling insertion with zero vectors #14

swetavooda
Contributor

What is the Issue?

Pinecone requires every dense vector to have at least one non-zero component. Building an index over a zero vector, or inserting one, causes Pinecone to throw an error.
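For illustration, the condition being discussed can be sketched as a small check. This is a minimal Python sketch, not the code from this PR; the function name `is_zero_vector` is hypothetical.

```python
from typing import Sequence


def is_zero_vector(vec: Sequence[float]) -> bool:
    """Return True when every component of vec is exactly zero.

    Hypothetical helper for illustration only. An empty sequence
    also returns True, because all() over no items is True.
    """
    return all(x == 0 for x in vec)
```

A vector with even one tiny non-zero component, such as `[0.0, 1e-9, 0.0]`, is not a zero vector under this check.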

At Build Time (indexing)

Current behavior
During build, if the table contains any zero vectors, an ERROR is thrown and the values are not pushed to Pinecone.

Possible ways to handle:

  1. Drop the zero vectors and push the remaining vectors to Pinecone. This can be a bad user experience, because vectors are discarded without the user's permission.
  2. Chosen: Do not let the user build the index until the table contains no zero vectors.
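The chosen option can be sketched roughly as follows. This is a simplified Python illustration under assumptions, not the PR's implementation: `ZeroVectorError`, `find_zero_vector_ids`, `build_index`, and the `upsert` callback (a stand-in for whatever pushes vectors to Pinecone) are all hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[int, Sequence[float]]


class ZeroVectorError(ValueError):
    """Raised when a build is attempted over a table containing zero vectors."""


def find_zero_vector_ids(rows: Dict[int, Sequence[float]]) -> List[int]:
    # Collect the ids of rows whose vector is zero in every dimension.
    return [row_id for row_id, vec in rows.items() if all(x == 0 for x in vec)]


def build_index(rows: Dict[int, Sequence[float]],
                upsert: Callable[[List[Row]], None]) -> None:
    # Validate the whole table first, so that nothing is pushed
    # when the build is going to be refused.
    bad_ids = find_zero_vector_ids(rows)
    if bad_ids:
        raise ZeroVectorError(
            f"cannot build: zero vectors found for ids {bad_ids}"
        )
    # `upsert` is a hypothetical stand-in for the remote push.
    upsert(list(rows.items()))
```

Validating before any upload keeps the build all-or-nothing: the user fixes the offending rows and retries, and no vectors are dropped silently.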

During Insertion (Buffering)

Current behavior
The user can insert zero vectors into the indexed table. The error occurs only when the buffer is flushed, and it also blocks the remaining vectors from being flushed to Pinecone.

Problem with this:

  • If the user sees an error message, they need to remove the zero vector manually. This can be prevented at the insertion stage itself.
  • The zero vector is added to the buffer and the value of the id doesn't change. Every subsequent insert throws the same error (bad user experience), and the buffer won't be flushed because the checkpoint bklnumber doesn't change.

Solution

  1. Remove the tuple from the buffer and update the bklnumber counter for flushing.
  2. Better solution: Do not allow users to insert a zero vector, so the tuple is never added to the buffer and these complexities are avoided.
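As a rough illustration of the second option, the sketch below rejects a zero vector before it reaches the buffer. It is a simplified Python model under assumptions, not the PR's code: the `InsertBuffer` class and its `pending` list are hypothetical, and no checkpoint bookkeeping is modelled.

```python
from typing import List, Sequence, Tuple


class InsertBuffer:
    """Hypothetical in-memory model of the insert buffer."""

    def __init__(self) -> None:
        # Tuples waiting to be flushed to the remote index.
        self.pending: List[Tuple[int, List[float]]] = []

    def insert(self, row_id: int, vec: Sequence[float]) -> None:
        # Reject the zero vector up front, so it never enters the
        # buffer and cannot block a later flush.
        if all(x == 0 for x in vec):
            raise ValueError(
                f"zero vector rejected for id {row_id}: "
                "at least one component must be non-zero"
            )
        self.pending.append((row_id, list(vec)))
```

Because the failing tuple is never buffered, the error is reported once, at the statement that caused it, and later inserts are unaffected.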

@Chitti-Ankith
Collaborator

Can you add a test-case for this as well? The logic LGTM

@swetavooda swetavooda force-pushed the feature/remote_indexes branch from 916efc5 to d537532 Compare April 9, 2024 19:38
@swetavooda swetavooda force-pushed the feature/remote_indexes branch from 1458fd8 to da8c4a2 Compare April 9, 2024 20:03
@swetavooda
Contributor Author

Added mock test cases for zero-vector insertions:

  • Build after insertions
  • Insert after build

Additional comments on the logic

Validating every vector on insert, on flush, and at build time would be redundant and would slow down insertion.
[TODO] Alternative approach:

  • Validate only at flush and build time.
  • Skip zero vectors: do not block the user's remaining insertions, and raise a WARN instead of a blocking ERROR.
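The alternative approach might look something like the sketch below. This is a hedged Python illustration only: `flush_buffer` and the `upsert` callback are hypothetical names, and Python's `warnings.warn` merely stands in for a WARN-level message.

```python
import warnings
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, Sequence[float]]


def flush_buffer(pending: List[Row],
                 upsert: Callable[[List[Row]], None]) -> List[int]:
    """Push non-zero vectors and return the ids that were skipped."""
    valid: List[Row] = []
    skipped: List[int] = []
    for row_id, vec in pending:
        if all(x == 0 for x in vec):
            # Warn and move on instead of failing the whole flush.
            warnings.warn(f"skipping zero vector with id {row_id}")
            skipped.append(row_id)
        else:
            valid.append((row_id, vec))
    if valid:
        # `upsert` is a hypothetical stand-in for the remote push.
        upsert(valid)
    return skipped
```

The trade-off is that validation runs once per flush rather than once per insert, but skipped rows stay in the table without being indexed, so the caller would need some way to surface the returned ids.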

@Chitti-Ankith Chitti-Ankith merged commit 9daf93a into georgia-tech-db:feature/remote_indexes Apr 9, 2024
4 checks passed