Tokenizing strings on digit/word boundaries #789

polyfloyd · 2024-11-18T15:05:44Z


Hi!

I am working on integrating Meilisearch into our product and have concluded that our users frequently search for numeric terms that are not delimited by separator characters but are embedded inside larger words.

Example:

186941 should find 123XYZ186941
2110063 should find p2110063

From my understanding of Meilisearch internals, these queries do not return the expected documents because the search term does not occur at the start of the token it should match.
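To make the failure mode concrete, here is a deliberately simplified model of the matching behaviour described above. It assumes, as the paragraph does, that a query term only matches an indexed token it is a prefix of. The function `prefix_matches` is hypothetical; Meilisearch's real matching also involves normalization and typo tolerance, which this sketch ignores.

```rust
/// Simplified model (an assumption, not Meilisearch's actual code): a query
/// term matches an indexed token only if the token starts with the term.
fn prefix_matches(query: &str, token: &str) -> bool {
    token.starts_with(query)
}

fn main() {
    // Without digit/word splitting, each identifier is indexed as one token.
    let cases = [("186941", "123XYZ186941"), ("2110063", "p2110063")];
    for (query, token) in cases {
        // Prints `false` for both: the term is inside the token, not at its start.
        println!("{query} in {token}: {}", prefix_matches(query, token));
    }
    // A term at the start of the token does match.
    println!("123 in 123XYZ186941: {}", prefix_matches("123", "123XYZ186941"));
}
```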

The solution I would propose is to treat digit/word (and possibly other) character-class boundaries as token separators. For example, 123XYZ186941 would be split into 123, XYZ and 186941, and the last of these tokens would then match the query 186941.
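A minimal sketch of the proposed splitting rule, written in Rust since that is the language Meilisearch is implemented in. The helper `split_digit_word` is hypothetical and is not part of Meilisearch or its tokenizer. It assumes that any change between an ASCII digit and an alphabetic character starts a new token and that every other character acts as a separator. It returns byte offsets alongside each token so that a match could still be highlighted in the original string.

```rust
/// Hypothetical helper: split `text` into tokens wherever the character class
/// changes between ASCII digit and alphabetic; any other character (space,
/// punctuation, non-ASCII digits) is treated as a separator and dropped.
/// Returns `(byte_offset, token)` pairs.
fn split_digit_word(text: &str) -> Vec<(usize, &str)> {
    #[derive(Clone, Copy, PartialEq)]
    enum Class {
        Digit,
        Alpha,
        Other,
    }

    fn class(c: char) -> Class {
        if c.is_ascii_digit() {
            Class::Digit
        } else if c.is_alphabetic() {
            Class::Alpha
        } else {
            Class::Other
        }
    }

    let mut tokens = Vec::new();
    // Start offset and class of the token currently being built, if any.
    let mut current: Option<(usize, Class)> = None;

    for (i, c) in text.char_indices() {
        let cls = class(c);
        match current {
            // Same class as the current token: keep extending it.
            Some((_, prev)) if prev == cls => {}
            // Class changed: close the current token at this boundary.
            Some((start, _)) => {
                tokens.push((start, &text[start..i]));
                current = if cls == Class::Other { None } else { Some((i, cls)) };
            }
            // No open token: start one unless this character is a separator.
            None if cls != Class::Other => current = Some((i, cls)),
            None => {}
        }
    }
    // Close the final token if the text did not end on a separator.
    if let Some((start, _)) = current {
        tokens.push((start, &text[start..]));
    }
    tokens
}

fn main() {
    for text in ["123XYZ186941", "p2110063"] {
        // e.g. 123XYZ186941 -> [(0, "123"), (3, "XYZ"), (6, "186941")]
        println!("{text} -> {:?}", split_digit_word(text));
    }
}
```

With this rule, a prefix query for 186941 lands on the third token of 123XYZ186941, and 2110063 lands on the second token of p2110063. A real tokenizer would likely need a more careful treatment of non-ASCII digits and scripts than this sketch gives them.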

We are currently working around this limitation by inserting known separators into strings before sending them to Meili for indexing, but this has the disadvantage that the returned highlighting information no longer matches the original text.
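The workaround can be sketched as a preprocessing step run before documents are sent for indexing. `insert_separators` is a hypothetical helper written for this illustration; it inserts a space wherever an ASCII digit and an alphabetic character meet. The second `println!` shows why highlighting breaks: the same term sits at a different byte offset in the preprocessed string than in the original.

```rust
/// Hypothetical preprocessing helper (not part of Meilisearch): insert a space
/// wherever an ASCII digit and an alphabetic character meet, so the default
/// tokenizer sees separate words.
fn insert_separators(text: &str) -> String {
    let mut out = String::with_capacity(text.len() * 2);
    let mut prev: Option<char> = None;
    for c in text.chars() {
        if let Some(p) = prev {
            let boundary = (p.is_ascii_digit() && c.is_alphabetic())
                || (p.is_alphabetic() && c.is_ascii_digit());
            if boundary {
                out.push(' ');
            }
        }
        out.push(c);
        prev = Some(c);
    }
    out
}

fn main() {
    let original = "123XYZ186941";
    let indexed = insert_separators(original);
    println!("{indexed}"); // 123 XYZ 186941
    // The term starts at byte 6 in the original but at byte 8 in the indexed
    // string, so highlight positions from the indexed text are shifted.
    println!("{:?} vs {:?}", original.find("186941"), indexed.find("186941")); // Some(6) vs Some(8)
}
```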
