Relevance score dynamic threshold #788
qixdev started this conversation in Feedback & Feature Proposal
Hi, I like Meilisearch, but unfortunately I see that it is most beneficial when there is a lot of data to search through.
In small setups with only a few thousand unique documents, it tends to append irrelevant search results to the end of the result list.
In one of my projects, I wrote a wrapper that checks whether the first hit scores above 0.99 (which means extremely relevant). If it does, I filter the original results and keep only those whose score matches the first result's score. This way I filter out unnecessary noise from the search results.
But as you are probably thinking, this is not very effective. It cuts the noise when the top results are very relevant, but when the top score is low (0.66, 0.15) the filter never applies and all results are returned.
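To make the current workaround concrete, here is a minimal sketch of the wrapper as I described it. It assumes each hit is a dict carrying a `_rankingScore` field (which Meilisearch returns when `showRankingScore` is enabled in the search request); the function name, the tolerance, and the parameter names are made up for illustration.

```python
import math

def filter_top_matches(hits, min_top_score=0.99, score_key="_rankingScore"):
    """Keep only hits that tie with the first hit, but only when the
    first hit is extremely relevant (score above min_top_score).

    Hypothetical helper: `hits` is assumed to be sorted by score,
    best first, as search results normally are.
    """
    if not hits:
        return []
    top = hits[0][score_key]
    if top <= min_top_score:
        # Top score is too low: the filter does not apply,
        # so everything is returned unchanged.
        return list(hits)
    # Small tolerance (an assumption) to avoid float equality surprises.
    return [h for h in hits if math.isclose(h[score_key], top, abs_tol=1e-9)]
```

With scores 0.995, 0.995, 0.60 this keeps the first two hits. With 0.66, 0.66, 0.36 it returns all three, which is exactly the weakness described above.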
I have noticed a pattern behind this: the differences between the first result's score and the scores of the other results determine relevancy. This may be only a heuristic, but I see that it works.
Imagine this set of result scores:
0.66, 0.66, 0.36, 0.36, 0.15, 0.10
Here, the first two results will probably be the best, and there is no need to return the others.
The same could be applied here:
0.66, 0.33, 0.10
The difference between the first and second result is very large, so only the first result is going to be relevant.
0.33, 0.33, 0.31, 0.31, 0.27
Here, all 5 results are relevant, due to the small rolling difference between consecutive results.
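One possible way to express this heuristic, as a rough sketch and not a proposal for the actual implementation: walk the scores in order and cut at the first place where the score drops by more than some fraction of the previous score. The function name and the 0.25 default are assumptions I picked so that the three examples above behave as described; an absolute gap could be used instead of a relative one.

```python
def count_relevant(scores, max_relative_drop=0.25):
    """Return how many leading results to keep.

    `scores` must be sorted in descending order. The cut happens at the
    first consecutive pair whose relative drop exceeds max_relative_drop
    (a hypothetical tuning parameter).
    """
    if not scores:
        return 0
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        # Guard against division by zero when scores bottom out at 0.
        if prev > 0 and (prev - cur) / prev > max_relative_drop:
            return i
    return len(scores)


print(count_relevant([0.66, 0.66, 0.36, 0.36, 0.15, 0.10]))  # 2
print(count_relevant([0.66, 0.33, 0.10]))                    # 1
print(count_relevant([0.33, 0.33, 0.31, 0.31, 0.27]))        # 5
```

Unlike the 0.99 wrapper, this works no matter how high or low the top score is, because it only looks at the shape of the score list.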
My suggestion would be to add this kind of dynamic thresholding as a feature, since a regular fixed threshold would either miss some relevant results or fail to cut the noise. You could also add some customization, similar to the `mean` and `sigma` values in the `distribution` settings for embeddings in Meilisearch vector search.
Thank you!