Does not seem to support singular and plural #72

Open
dcai-icfi opened this issue Jan 10, 2017 · 3 comments

Comments

@dcai-icfi

For example, I have "Storm,Hurricane,Tropical Storm" in the synonyms file.
Searching for 'storms' does not return the same set of results as searching for 'storm'. Do I have to add 'storms' to the list?

Thanks,
Dong

@Mykezero
Contributor

@dcai-icfi I'm not sure what you have tried, but might a stemmer help with this problem?

I would give the SnowballPorterFilterFactory filter a try. You can set it up in the schema.xml / managed-schema file. You'd probably need to stem at both index and query time, so something like this might work:

Example text_general field type:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

As I understand it, when a document containing "storms" is indexed, the stemmer stores the term as "storm". Because the stemmer also runs at query time, a user's query for "storms" is likewise reduced to "storm", so the synonyms should match up.

The list of filters can be found here if you would like to try other combinations of them:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-SnowballPorterStemmerFilter

You can view the results of these changes in the Analysis menu:

[Screenshot: Solr Admin Analysis screen]

The left column shows how the value would be stored at index time, and the right column shows how the user's query would be transformed at query time.

It's a great screen for experimenting with filters and seeing what each one does.

Sorry if this is something you already knew, and hopefully it helps out a little bit ^^;

@dcai-icfi changed the title from "Does not seems to support singular and plural" to "Does not seem to support singular and plural" on Jan 10, 2017
@dcai-icfi
Author

Thanks @Mykezero!
I use <filter class="solr.PorterStemFilterFactory"/> for both 'index' and 'query', so 'storms' is stored as 'storm'. See the attachment below.
I believe this plugin works on the query side. My understanding is that 'storms' is not on the list, so no synonyms are applied. I actually made it work by appending 'storms' to the list, but I think the plugin should handle this itself.
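For reference, a minimal sketch of the stemming setup described above. Only the choice of PorterStemFilterFactory at both index and query time comes from this comment; the rest of the chain is assumed to match the earlier text_general example. Per the comment, this normalizes terms but did not by itself make 'storms' pick up the synonyms listed for 'Storm'.

```xml
<!-- Sketch only: the earlier text_general field type with
     PorterStemFilterFactory in place of SnowballPorterFilterFactory.
     Tokenizer, stop-word and lowercase filters are assumed unchanged. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- Porter stemming expects lowercased input, so lowercase first -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: "storms" is stored as "storm" -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: a query for "storms" is reduced to "storm" as well -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr Admin UI is a quick way to confirm that both columns show "storm" for the input "storms".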
[Attachment: solr admin screenshot]

@Mykezero
Contributor

@dcai-icfi Yeah, I couldn't find a built-in way to do that with filters. I usually add the plurals myself (e.g. "table tent, table tents"), but it would be nice if that were handled automatically.
