Context words are used outside the suffix/prefix window #1444
This looks like a bug. To reproduce:

res = analyzer.analyze(text=text, language='en', return_decision_process=True)
for ress in res:
    print()
    print(f"text: {text[ress.start:ress.end]},"
          f"\nentity: {ress.entity_type}, "
          f"\nscore before: {ress.analysis_explanation.original_score}"
          f"\nscore context improvement: {ress.analysis_explanation.score_context_improvement}"
          f"\nsupporting context word: {ress.analysis_explanation.supportive_context_word}")
Looks like this might be due to the model's part-of-speech tagging rather than a Presidio bug. The above example uses the default NLP spaCy model
This can be seen with the following code:
Interestingly if the As the |
@hhobson thanks for this analysis! I found it surprising that |
I agree, lowercasing the text doesn't feel like the right thing to do, especially as in this case the different-sized spaCy models behaved differently, so things might change in future versions. I think the best approach is to recommend using singular form context words, like |
Would that solve the problem if the sentence has upper case plurals to begin with? We would end up comparing |
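To make the discussion above concrete, here is a minimal, stdlib-only sketch of window-based context matching on lemmas. It is not Presidio's implementation: `context_window`, `naive_lemma`, `find_supporting_word`, and the sample sentence are all hypothetical, and `naive_lemma` is a crude stand-in for a real lemmatizer such as spaCy's. It only illustrates why a singular context word can match a plural token when lemmas are compared, and why a word outside the prefix/suffix window should not count.

```python
from typing import Callable, List, Optional


def context_window(tokens: List[str], start: int, end: int,
                   prefix_count: int, suffix_count: int) -> List[str]:
    """Return the tokens within prefix_count before and suffix_count after
    the entity span tokens[start:end]."""
    before = tokens[max(0, start - prefix_count):start]
    after = tokens[end:end + suffix_count]
    return before + after


def naive_lemma(token: str) -> str:
    """Hypothetical stand-in for a lemmatizer: lowercase, strip a plural 's'."""
    lowered = token.lower()
    if len(lowered) > 3 and lowered.endswith("s") and not lowered.endswith("ss"):
        return lowered[:-1]
    return lowered


def find_supporting_word(tokens: List[str], start: int, end: int,
                         context_words: List[str],
                         prefix_count: int = 5, suffix_count: int = 0,
                         lemmatize: Callable[[str], str] = naive_lemma
                         ) -> Optional[str]:
    """Return the first in-window token whose lemma is a context word."""
    wanted = {word.lower() for word in context_words}
    for token in context_window(tokens, start, end, prefix_count, suffix_count):
        if lemmatize(token) in wanted:
            return token
    return None


# Hypothetical sentence; the entity is "Sixty" at token index 2.
tokens = "I paid Sixty dollars and five CENTS for lunch".split()

# Singular context words match plural tokens once lemmas are compared.
print(find_supporting_word(tokens, 2, 3, ["dollar", "cent"], suffix_count=1))
# With no suffix window, "dollars" is out of range and nothing matches.
print(find_supporting_word(tokens, 2, 3, ["dollar", "cent"], suffix_count=0))
# Comparing lowercased surface forms instead of lemmas misses the plural.
print(find_supporting_word(tokens, 2, 3, ["dollar"], suffix_count=1,
                           lemmatize=str.lower))
```

In this sketch the upper-case "CENTS" is only reachable when the suffix window is widened to four tokens, and it then matches "cent" because the stand-in lemmatizer lowercases before stripping the plural. Whether a real model produces the same lemma for an upper-case plural depends on its tagging, which is the open question in this thread.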
I'm new to Presidio (started working with the code yesterday), but I can't figure out why I'm getting the results I am. Code is below. It doesn't seem to be recognizing "cents" in the context. However, if I change it to 'cent', everything works fine. But that brings up another question: if it's basing the suffix count on "dollars", why is 'Six' (in Sixty) tagged? I assume I'm misunderstanding something. Any help would be appreciated.
Output:
[type: CURRENCY, start: 41, end: 45, score: 1, type: CURRENCY, start: 61, end: 65, score: 1, type: CURRENCY, start: 78, end: 81, score: 1, type: CURRENCY, start: 84, end: 89, score: 0.01]
Originally posted by @mmoody-vv in #1443