Tokenizing strings on digit/word boundaries #789
polyfloyd started this conversation in Feedback & Feature Proposal
Hi!
I am working on integrating Meilisearch into our product and have found that our users often search for numeric terms that are not delimited by separator characters but are embedded inside larger words.
Examples:
- `186941` should find `123XYZ186941`
- `2110063` should find `p2110063`
From my understanding of Meilisearch internals, these queries do not return these documents because the search term does not occur at the start of any of the indexed tokens.
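To make the failure concrete, here is a deliberately simplified model of the behaviour described above. It assumes that only non-alphanumeric characters act as separators and that a document matches when one of its tokens starts with the query term. The functions `tokenize_on_separators` and `matches` are hypothetical and are not Meilisearch's actual matching logic, which also involves typo tolerance and ranking rules.

```python
import re


def tokenize_on_separators(text: str) -> list[str]:
    """Hypothetical tokenizer: split only on runs of non-alphanumeric ASCII
    characters and lowercase the result (a simplification, not Meilisearch)."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def matches(query: str, text: str) -> bool:
    """Hypothetical matcher: true if any token starts with the query term."""
    q = query.lower()
    return any(tok.startswith(q) for tok in tokenize_on_separators(text))


# "123XYZ186941" stays a single token, so "186941" is never at a token start.
print(matches("186941", "123XYZ186941"))  # False
print(matches("2110063", "p2110063"))     # False
print(matches("123", "123XYZ186941"))     # True: "123" is a prefix of the token
```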
The solution I would propose is to treat digit/word boundaries (and possibly other character-class boundaries) as token separators. For example, `123XYZ186941` would be split into `123`, `XYZ` and `186941`, and the last of these tokens would match the query `186941`.

We are currently working around this limitation by inserting known separators into strings before sending them to Meilisearch for indexing, but this has the disadvantage that the returned highlighting information no longer matches the original text.
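A minimal sketch of the proposed splitting, assuming ASCII-only input and a tokenizer that reports character offsets into the original string. The names `tokenize`, `highlight` and `insert_separators` are hypothetical and are not part of Meilisearch or its tokenizer; the sketch only covers digit/letter boundaries. Because the split happens inside the tokenizer, the offsets still point into the original text, which is what highlighting needs, whereas inserting separators beforehand shifts them.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    text: str   # lowercased token text
    start: int  # character offset into the ORIGINAL string (inclusive)
    end: int    # character offset into the ORIGINAL string (exclusive)


# A token is a run of ASCII digits or a run of ASCII letters, so every
# digit/letter boundary ends one token; anything else acts as a separator.
_RUN = re.compile(r"[0-9]+|[A-Za-z]+")


def tokenize(text: str) -> list[Token]:
    return [Token(m.group().lower(), m.start(), m.end()) for m in _RUN.finditer(text)]


def highlight(text: str, query: str) -> str:
    """Wrap every token that starts with the query in <em> tags, using the
    offsets into the original text (hypothetical helper)."""
    q = query.lower()
    out, pos = [], 0
    for tok in tokenize(text):
        if tok.text.startswith(q):
            out.append(text[pos:tok.start])
            out.append("<em>" + text[tok.start:tok.end] + "</em>")
            pos = tok.end
    out.append(text[pos:])
    return "".join(out)


def insert_separators(text: str) -> str:
    """The workaround from the post: insert a space at each digit/letter
    boundary before indexing. Offsets computed on the result are shifted."""
    return re.sub(r"(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])", " ", text)


print(tokenize("123XYZ186941"))
print(highlight("123XYZ186941", "186941"))  # 123XYZ<em>186941</em>
print(highlight("p2110063", "2110063"))     # p<em>2110063</em>
# With the workaround, "186941" starts at offset 8 instead of 6:
print(insert_separators("123XYZ186941"))    # 123 XYZ 186941
```

The comparison at the end illustrates the highlighting problem: positions reported against `123 XYZ 186941` are two characters off when applied to the original `123XYZ186941`.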