
Any plans to add cosine similarity to the list of metrics? #670

Open
greenersharp opened this issue Nov 5, 2024 · 0 comments


I'm learning Arraymancer and experimenting with it on text embeddings.

In Python I use SentenceTransformers and scikit-learn's KNeighborsClassifier to find the closest matches, using the cosine metric.

It seems that Arraymancer doesn't support the cosine metric. Are there plans to add it?
I tried kdtree with the Euclidean metric on the raw embeddings and the results were all wrong.

Can Arraymancer help me normalize the text embeddings, so that I can use the Euclidean metric and still get good results?
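
For context on why normalization should help: this is the standard identity relating Euclidean distance and cosine similarity, not anything specific to Arraymancer. It assumes both vectors $a$ and $b$ have been scaled to unit length, with $\theta$ the angle between them:

$$
\lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b = 2 - 2\cos\theta
\qquad \text{when } \lVert a \rVert = \lVert b \rVert = 1
$$

Under that assumption the Euclidean distance shrinks exactly when the cosine similarity grows, so a Euclidean nearest-neighbour search over unit-length vectors should return the same ranking as a cosine search. The cosine distance can be recovered afterwards as $1 - \cos\theta = \lVert a - b \rVert^2 / 2$.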

Here is my code:

import arraymancer

let vectors = read_npy[float64]("title_vectors.txt.npy")

echo vectors.shape
# [1226242, 350]

let kd = kdtree(vectors)
let (dist, ix) = kd.query(vectors[0, _].reshape(350), k = 3)  # find the 3 closest to the first entry
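
A minimal sketch of the normalization step asked about above, using a small made-up matrix in place of the real embeddings. It assumes Arraymancer's broadcasting operators `*.` and `/.`, the axis-wise `sum`, and the element-wise `sqrt` behave as in recent releases. The names `norms` and `unit` are illustrative only.

```nim
import arraymancer

# Toy stand-in for the embedding matrix: 3 rows, 2 dimensions (assumed data).
let vectors = [[3.0, 4.0],
               [0.0, 2.0],
               [1.0, 0.0]].toTensor

# Per-row L2 norm. Summing along axis 1 keeps that axis, so the shape is [3, 1].
let norms = sqrt(sum(vectors *. vectors, axis = 1))

# Broadcasting division scales every row to unit length.
let unit = vectors /. norms

# Same calls as in the snippet above, now on the normalized rows.
let kd = kdtree(unit)
let (dist, ix) = kd.query(unit[0, _].reshape(2), k = 2)
echo ix
```

A row whose norm is zero would divide by zero here, so such rows would need to be filtered out or handled separately first.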

Another thing I am confused about is why I need reshape(350).
When I tried let (dist, ix) = kd.query(vectors[0, _], k = 3), it failed with: Broadcasting error: non-singleton dimensions must be the same in both tensors.
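
A likely explanation, sketched on a toy tensor: slicing a row out of a rank-2 tensor keeps both axes, so `vectors[0, _]` has shape `[1, 350]` rather than `[350]`. The query appears to expect a single point as a rank-1 tensor, which would account for the broadcasting error. That reading of `query` is inferred from the error above rather than confirmed. Assuming `squeeze` is available as in recent Arraymancer releases, it drops the singleton axis without hard-coding the length:

```nim
import arraymancer

# Toy 2 x 3 matrix standing in for the embeddings (assumed data).
let m = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]].toTensor

let row2d = m[0, _]          # slicing keeps both axes: shape [1, 3]
let row1d = m[0, _].squeeze  # singleton axis removed: shape [3]

echo row2d.shape
echo row1d.shape
```

With the real data this would read `kd.query(vectors[0, _].squeeze, k = 3)`, which is equivalent to `reshape(350)` for this shape.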

Thanks
