Bug in length of stem #1

Open
Adelija opened this issue Nov 14, 2017 · 2 comments

Comments

@Adelija
Contributor

Adelija commented Nov 14, 2017

Hello Nikola,
I am using your stemmer and it works quite well. In your paper on the stemmer you wrote that the minimal length of a stem should be more than 2.
However, in both methods, stem_arr(str) and stem_str(str), the condition
if(word.endswith(key) and len(word)>2):
can return a stem of length 2. For example, the words plaše, plovan, and pleva all return the same stem, "pl". Maybe you should replace that line with
if(word.endswith(key) and len(word)-len(key)>2):
Am I right?
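
A minimal sketch of the difference, assuming a simplified suffix-stripping loop. The `SUFFIXES` list and the `strip_suffix` helper are hypothetical stand-ins for illustration, not the stemmer's actual rules or code:

```python
# Hypothetical suffix list for illustration only; the real stemmer has its own rules.
SUFFIXES = ["ovan", "aše", "eva"]


def strip_suffix(word, check_stem_length):
    """Strip the first suffix from SUFFIXES that passes the length check.

    check_stem_length=False mimics the current condition: len(word) > 2.
    check_stem_length=True mimics the proposed one: len(word) - len(key) > 2.
    """
    for key in SUFFIXES:
        if not word.endswith(key):
            continue
        if check_stem_length:
            allowed = len(word) - len(key) > 2
        else:
            allowed = len(word) > 2
        if allowed:
            return word[: len(word) - len(key)]
    # No suffix was stripped, so the word is returned unchanged.
    return word


for w in ["plaše", "plovan", "pleva"]:
    print(w, strip_suffix(w, False), strip_suffix(w, True))
```

Under these assumed suffixes, the current check collapses all three words to "pl", while the proposed check leaves them unchanged.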

@nikolamilosevic86
Owner

Hello Adelija,
I suppose that is right to a degree. However, the comment in the code says that this condition prevents stemming of words whose total length is 2 or less, for example "na", "da", "ja"... These words should not be stemmed, as that would not make sense. A stem could be of length 2 or more.
The correct code should probably be
if(word.endswith(key) and len(word)>2 and len(word)-len(key)>2):

However, that can be simplified to what you said, because len(key) is never negative, so len(word)-len(key)>2 already implies len(word)>2. I see you have already forked the project. Do you want to open a pull request? I will accept it.
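
A quick brute-force check of that simplification, as a sketch only. The helper names `combined` and `simplified` are made up for this illustration and work on lengths rather than on real words:

```python
def combined(word_len, key_len):
    # Both checks together: the word is longer than 2 and the stem is longer than 2.
    return word_len > 2 and word_len - key_len > 2


def simplified(word_len, key_len):
    # Only the stem-length check.
    return word_len - key_len > 2


# A suffix can never be longer than the word it ends, so key_len ranges over 0..word_len.
mismatches = [
    (w, k)
    for w in range(0, 30)
    for k in range(0, w + 1)
    if combined(w, k) != simplified(w, k)
]
print(mismatches)  # → []
```

Two-letter words such as "na" are still skipped by the simplified check, since their length minus any suffix length can never exceed 2.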

@Adelija
Contributor Author

Adelija commented Nov 14, 2017 via email
