I realize this project is dead, but leaving this here in case it helps someone else.
When using `getAllNearestNeighbors` with different RDDs for items and candidates, I noticed that the distance column is often incorrect. The cause is that `updateHashBuckets` is called with the same `itemVectors` for both the items and the candidates, and it maintains a mapping from `itemId` to item vector. If any IDs overlap between the item and candidate RDDs, the reported distance is between an item vector and another item vector that happened to be in the same hash bucket and shares an ID with the intended candidate vector, rather than between the item vector and the candidate vector itself.
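To make the failure mode concrete, here is a minimal sketch in plain Python (no Spark, and not the library's actual code). The data, the function names, and the simplification that every item and candidate lands in the same hash bucket are all assumptions made for illustration. It only shows how a single ID-to-vector lookup built from the items gives wrong distances once an ID appears in both collections.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: ID 0 exists in BOTH collections, with different vectors.
items = {0: (0.0, 0.0), 1: (1.0, 0.0)}
candidates = {0: (3.0, 4.0)}

def distances_shared_lookup(items, candidates):
    # Problem pattern: one id -> vector map, built from the items only,
    # is used to resolve both sides of every (item, candidate) pair.
    lookup = dict(items)
    return {(i, c): euclidean(lookup[i], lookup[c])
            for i in items for c in candidates}

def distances_separate_lookups(items, candidates):
    # Intended behaviour: resolve each side in its own collection.
    return {(i, c): euclidean(items[i], candidates[c])
            for i in items for c in candidates}

buggy = distances_shared_lookup(items, candidates)
fixed = distances_separate_lookups(items, candidates)

print(buggy[(0, 0)])  # 0.0: item 0 compared with itself
print(fixed[(0, 0)])  # 5.0: item 0 compared with candidate 0
```

In this sketch the shared lookup silently returns an item-to-item distance for any candidate ID that also exists among the items. If that matches what the library does, one possible workaround (untested against the library) would be to make the two ID ranges disjoint before the call, for example by offsetting the candidate IDs, and to map them back afterwards.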