
Wrong distance value sometimes when using a separate candidates pool #11

Open
maksle opened this issue Apr 18, 2022 · 0 comments
maksle commented Apr 18, 2022

I realize this project is dead, but leaving this here in case it helps someone else.

When using getAllNearestNeighbors with different RDDs for items and candidates, I noticed that the distance column is often incorrect. This is because updateHashBuckets is called with the same itemVectors for both the items and the candidates, and it maintains a mapping from itemId to item vector. If any IDs overlap between the items and candidates RDDs, a candidate's ID resolves to an item vector in that mapping. You then get the distance between an item vector and another item vector that happened to be in the same hash bucket and shares an ID with the candidate, rather than the distance to the candidate vector it was supposed to be matched with.
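
To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the project's code: the dictionaries, `buggy_distance`, and `fixed_distance` are hypothetical stand-ins for an ID-to-vector mapping, and the bucket matching is assumed to have already paired item 1 with candidate 2.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: ID 2 exists in both pools but refers to different vectors.
items = {1: (0.0, 0.0), 2: (1.0, 0.0)}
candidates = {2: (0.0, 5.0), 3: (3.0, 4.0)}

def buggy_distance(item_id, candidate_id):
    # Assumed shape of the bug: one lookup table, built from the item
    # vectors only, is used to resolve both sides of the pair.
    lookup = items
    return euclidean(lookup[item_id], lookup[candidate_id])

def fixed_distance(item_id, candidate_id):
    # Each ID is resolved against the pool it actually came from.
    return euclidean(items[item_id], candidates[candidate_id])

buggy = buggy_distance(1, 2)  # compares item 1 with item 2
fixed = fixed_distance(1, 2)  # compares item 1 with candidate 2
print(buggy, fixed)
```

In this sketch the overlap on ID 2 silently yields the item-to-item distance. Note that with the single lookup table, candidate 3 would raise a KeyError instead, since it has no entry among the items. As a workaround under the same assumption, making IDs disjoint across the two RDDs (for example, by tagging each ID with its pool) would avoid the collision.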
