Search in Umbraco doesn't work as expected. #11176
Comments
I previously had a strange issue in Examine with e.g. Danish characters, which I could reproduce with the Vendr demo store. I haven't checked whether this has been fixed in a newer version of Umbraco or Examine. |
@bjarnef Examine doesn't have the problem, as Examine is doing what it's told. The problem is how Umbraco configures the index and the field index. I would like to hear from @UmbracoHQ why they have configured the field to use the CultureInvariantStandardAnalyzer and not the StandardAnalyzer, or whether it is simply a bug that it is configured that way. I can reproduce the problem in Umbraco from version 8.3 up to the newest. |
@lucasmichaelsengorm Well, regarding the issue I linked to, it seemed to come from how Examine constructed the raw Lucene query under the hood (when I restarted the app pool, the Lucene query changed), but you may be seeing a different issue. |
@bjarnef You have a rather funny question which is related to Examine, but I'd say these two cases are two different things. Here is the problem: if I search for that word with a query like +nodeName:løn* against the ExternalIndex, I expect results where Løn is part of the nodeName, but it does not hit any results. If you use this When you see how Umbraco is storing the field If you change the field to be stored with the So the problem here is how Umbraco tells the fields to be stored. |
Hi all, Here is the explanation of the issue and reasons behind it. It was unclear until now that Lucene does not translate a term that is a wildcard query using the analyzer. The issue is purely for wildcard searches. The reasons why Lucene does not translate a term for a wildcard query can be found here: Shazwazza/Examine#244 (comment) |
(oops, sent the last message too soon) ... continued: The reason why this analyzer exists and is used is to simplify searching and indexing across languages for all users. So if you have the word "løn", it will be analyzed and indexed with 'ascii folding' and becomes "lon". Now, if you search on either "løn" or "lon", you will get the result. This can be a friendlier approach for editors if your site has a lot of languages and your editors don't know the nuances of any given language and its accents. The wildcard issue was not apparent until now. I'm unsure of the best way to resolve that particular problem currently. That said, Umbraco Examine is customizable. You are more than welcome to change the default analyzer and field types for any of the indexes, and perhaps that makes sense for your installation. |
@Shazwazza is there a recommended approach to search using the Examine fluent API and wildcards to match many Danish words including æ, ø and å? I have this on a project, but by default it doesn't seem to find results when the search term has æ, ø and å... but when I replace the characters, e.g. æ => ae, ø => o and å => a
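To make the mismatch described above concrete, here is a minimal Python sketch. It is not Lucene or Examine code: `fold`, `analyze`, `term_query` and `wildcard_query` are hypothetical helpers, and the folding rules are a rough stand-in for ASCII folding. It only illustrates that terms are folded at index time, that a normal term query is folded the same way, and that a wildcard term is matched exactly as typed.

```python
import unicodedata

# Assumption: letters without a Unicode decomposition need explicit rules.
SPECIAL = {"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"}


def fold(text):
    """Rough stand-in for ASCII folding, e.g. 'løn' -> 'lon'."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop combining marks such as the ring in 'å' or the dots in 'ü'.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Index-time analysis: split on whitespace, lowercase, then fold."""
    return [fold(token.lower()) for token in text.split()]


def term_query(index_terms, text):
    """A normal term query is analyzed before it is matched."""
    wanted = analyze(text)
    return [term for term in index_terms if term in wanted]


def wildcard_query(index_terms, prefix):
    """A wildcard (prefix) query is matched as typed, without analysis."""
    return [term for term in index_terms if term.startswith(prefix)]


index_terms = analyze("Lønsikring Lønstigning løn")

print(index_terms)                         # folded terms stored in the index
print(term_query(index_terms, "løn"))      # analyzed query: finds the term
print(wildcard_query(index_terms, "løn"))  # raw wildcard term: finds nothing
print(wildcard_query(index_terms, "lon"))  # folded wildcard term: finds all
```

The point of the sketch is only the asymmetry between the two query paths. Which characters the real analyzer folds, and to what, should be checked against the analyzer actually configured on the field.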
|
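The character replacement described in the comment above can be sketched as a small helper that folds the search term before the wildcard query is built. This is a Python illustration only, not Examine's fluent API: `fold_term` and `prefix_query` are hypothetical names, and the mapping is just the three replacements quoted in the comment (æ => ae, ø => o, å => a). Whether these match what the configured analyzer writes to the index is an assumption to verify.

```python
# Replacements quoted in the comment above; assumed to mirror the index.
REPLACEMENTS = {"æ": "ae", "ø": "o", "å": "a"}


def fold_term(term):
    """Lowercase the term and apply the Danish character replacements."""
    lowered = term.lower()
    return "".join(REPLACEMENTS.get(ch, ch) for ch in lowered)


def prefix_query(field, term):
    """Build a raw Lucene-style prefix query from an already folded term.

    Note: this sketch does not escape Lucene special characters.
    """
    return "+{}:{}*".format(field, fold_term(term))


print(prefix_query("nodeName", "Løn"))
print(prefix_query("nodeName", "Ansøg"))
```

Folding the term in application code keeps the index configuration untouched, at the cost of having to keep the replacement table in step with the analyzer.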
@Shazwazza But as far as I know, Lucene runs with the iso-8859-8 charset, which contains æ, ø, å out of the box. The standard solution Umbraco provides out of the box looks like it only supports English 100%, which is kind of sad :( |
Hi all. I will recap the issue - we know the 'key to the evil':
Moving forward: Figure out the best approach now that we know this doesn't work for accented languages + wildcard queries. There are probably a few options, but ultimately there will be no 'perfect' solution for everyone's website. In those cases, folks should configure an appropriate analyzer for what works for them. Possible solutions:
|
Hello @Shazwazza For my code I use the ExternalIndex, which uses the StandardAnalyzer, but I still need to override the field analyzer to be the StandardAnalyzer as well. Is there a simple way to override this so the fields are stored with the StandardAnalyzer? |
Hiya @lucasmichaelsengorm, Just wanted to let you know that we noticed that this issue got a bit stale and might not be relevant any more. We will close this issue for now but we're happy to open it up again if you think it's still relevant (for example: it's a feature request that's not yet implemented, or it's a bug that's not yet been fixed). To open it this issue up again, you can write For example:
This will reopen the issue in the next few hours. Thanks, from your friendly Umbraco GitHub bot 🤖 🙂 |
Which exact Umbraco version are you using? For example: 8.13.1 - don't just write v8
8.14.0
Bug summary
When you have your own search based on the ExternalIndex, and want to search for terms like "Løns" or "Füre" it results in no hits.
Specifics
I have had a small discussion with Shazwazza on the Examine project, because I thought the issue was there, but I managed to narrow it down to being in Umbraco - see the thread here
Steps to reproduce
Add some nodes containing e.g. Danish characters, like "Lønsikring", "Lønstigning", "Ansøgning"
Create a search
Run the search with term of "Løn"
Expected result / actual result
So I create 3 nodes with a nodeName "Lønsikring ...." and try to search with the term "Løn", with a wildcard.
The expected result is that the TotalItemCount should be 3, but the actual result is 0.
What I have discovered is that the ExternalIndex has the StandardAnalyzer, as I expected, but if you look at the field analyzer on the field "nodeName", the field is stored with the CultureInvariantStandardAnalyzer, and the same goes for nearly all the fields. If you change the CultureInvariantStandardAnalyzer to the StandardAnalyzer on the field, you get the expected result.
When you search inside the Umbraco dashboard, you get the expected result of 3 nodes, but the problem is that you actually get unwanted results, because the string "løn" gets parsed. So the search query looks like this
+nodeName:lon*
when it should be like
+nodeName:Løn*
So why could it return unwanted results? Well, if you have a node called something like "long bording day" and search, you get 4 results by searching for "Løn". But try to look into the conversation I had with Shazwazza - see the thread here
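The difference between 3 and 4 results can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not Umbraco or Lucene code: the node names are hypothetical stand-ins for the three "Lønsikring ...." nodes, `lowercase_only` roughly mimics a field that keeps "ø", `lowercase_and_fold` roughly mimics a field that folds it to "o", and the search term is assumed to be lowercased before matching.

```python
# Assumed folding rules for the Danish letters only.
SPECIAL = {"ø": "o", "æ": "ae", "å": "a"}


def lowercase_only(text):
    """Stand-in for a field that keeps 'ø' (lowercasing only)."""
    return text.lower().split()


def lowercase_and_fold(text):
    """Stand-in for a field that folds 'ø' to 'o' at index time."""
    folded = "".join(SPECIAL.get(ch, ch) for ch in text.lower())
    return folded.split()


def count_prefix_hits(node_names, prefix, analyze):
    """Count nodes with at least one indexed token starting with prefix."""
    return sum(
        1
        for name in node_names
        if any(token.startswith(prefix) for token in analyze(name))
    )


# Hypothetical node names: three "Løn..." nodes plus one unrelated node.
nodes = ["Lønsikring A", "Lønsikring B", "Lønsikring C", "long bording day"]

print(count_prefix_hits(nodes, "løn", lowercase_only))      # only Løn nodes
print(count_prefix_hits(nodes, "løn", lowercase_and_fold))  # no hits at all
print(count_prefix_hits(nodes, "lon", lowercase_and_fold))  # also hits "long"
```

With the folded field, "løn*" finds nothing and "lon*" also matches "long", which is the trade-off described above.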