Correctly fill nlp.prop_ner even with punctuation #112

Open
ec-m opened this issue Jun 27, 2019 · 2 comments
Labels
bug 🪲 Something isn't working
Milestone

Comments

@ec-m
Collaborator

ec-m commented Jun 27, 2019

If I insert "vanilla and chocolate one each", then nlp.prop_ner is filled correctly with (('one', 'CARDINAL'),). However, if I instead write "vanilla and chocolate, one each" (i.e., simply add punctuation to the sentence), nlp.prop_ner stays empty.

@josephbirkner josephbirkner added this to the Inference milestone Jun 28, 2019
@josephbirkner josephbirkner added the bug 🪲 Something isn't working label Jun 28, 2019
@josephbirkner
Collaborator

josephbirkner commented Jun 28, 2019

Thanks for writing this issue - Named Entity Recognition is definitely a big construction zone. It also mostly fails for NAME/LOCATION/ORGANIZATION if the input is not cased correctly. IMO this is also a big blocker for #96, so we should really fix this ASAP!

@josephbirkner
Collaborator

josephbirkner commented Jun 28, 2019

Fortunately, spaCy provides easy extension mechanisms, especially for named entity recognition. If we use the en_medium NLP model, spaCy provides word vectors, which we can match (with some tolerance) to named entities. For cardinals, we can just detect cardinal words - that one should be easy to implement!
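
The "just detect cardinal words" idea could look roughly like the sketch below. This is not the project's implementation and does not use spaCy: `find_cardinals`, `CARDINAL_WORDS`, and `TOKEN_PATTERN` are hypothetical names, and the word list is an illustrative, incomplete assumption. The output shape only mirrors the (('one', 'CARDINAL'),) tuple quoted in the issue.

```python
import re

# Illustrative and deliberately incomplete list of cardinal words (an assumption,
# not taken from the project or from spaCy).
CARDINAL_WORDS = frozenset({
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "twenty", "hundred",
    "thousand",
})

# Match runs of letters or runs of digits, so that punctuation such as the
# comma in "chocolate, one" never becomes part of a token.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+")


def find_cardinals(text):
    """Return a tuple of (token, 'CARDINAL') pairs found in text.

    Hypothetical helper: a token counts as a cardinal if it is a digit
    string or a (case-insensitive) member of CARDINAL_WORDS.
    """
    entities = []
    for token in TOKEN_PATTERN.findall(text):
        if token.isdigit() or token.lower() in CARDINAL_WORDS:
            entities.append((token, "CARDINAL"))
    return tuple(entities)


# Both inputs from the issue yield the same result, with or without the comma.
print(find_cardinals("vanilla and chocolate one each"))
print(find_cardinals("vanilla and chocolate, one each"))
```

Because the tokenizer drops punctuation before the lookup, the comma no longer affects the result. A pure word list will also tag "one" when it is used as a pronoun, so a real fix would probably need to combine this with the tagger's output.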
