Can't find T790M mutation in civicmine #6

Open
hongiiv opened this issue Feb 17, 2023 · 2 comments
hongiiv commented Feb 17, 2023

Hi jakelever,

Thanks for this wonderful project.

When I used CIViCmine (http://bionlp.bcgsc.ca/civicmine), I couldn't find "T790M" in any sentence. That seemed odd to me, because EGFR T790M is a very well-known biomarker in cancer treatment.

This is a tokenizer problem: the spaCy language model (en_core_web_sm) splits "T790M" into "T790" and "M": (('T790', 'NOUN'), ('M', 'PROPN')).

I changed the kindred package like this (kindred/Parser.py):

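from spacy.symbols import ORTH  # at the top of Parser.py

# Load each spaCy model once and cache it on the class.
if model not in Parser._models:
    Parser._models[model] = spacy.load(model, disable=['ner'])

self.nlp = Parser._models[model]
# Keep "T790M" as a single token instead of splitting it into "T790" and "M".
special_case = [{ORTH: "T790M"}]
self.nlp.tokenizer.add_special_case("T790M", special_case)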

With this change, "T790M" is kept as a single token: ('T790M', 'VERB').
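
A hard-coded special case only covers T790M, so other substitutions written the same way (for example L858R) may still be split. As a rough sketch of a more general workaround, and not something kindred does today, split tokens could be rejoined after tokenization whenever the joined string looks like a simple amino-acid substitution. The names `MUTATION_RE`, `merge_split_mutations` and `find_mutations` are hypothetical, and the pattern is an assumption that only covers the single-letter form (reference residue, position, new residue):

```python
import re

# Assumed pattern: one amino-acid letter, a position, one amino-acid letter.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b[{AMINO_ACIDS}][1-9][0-9]*[{AMINO_ACIDS}]\b")


def merge_split_mutations(tokens):
    """Rejoin adjacent token pairs whose concatenation matches MUTATION_RE."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tokens[i] + tokens[i + 1] if i + 1 < len(tokens) else None
        if pair is not None and MUTATION_RE.fullmatch(pair):
            merged.append(pair)
            i += 2  # both halves were consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def find_mutations(text):
    """List substitution-like strings in raw text (candidates for special cases)."""
    return MUTATION_RE.findall(text)


tokens = merge_split_mutations(["EGFR", "T790", "M", "confers", "resistance"])
candidates = find_mutations("EGFR T790M and L858R")
```

This sketch works on plain token strings and ignores whitespace, so it would also join "T790 M" written with a space; a real fix inside the parser would need to check that the two tokens were adjacent in the original text. The strings returned by `find_mutations` could instead be registered one by one with `add_special_case`, as in the change above.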

best,
jakelever

jakelever (Owner) commented

Hi @hongiiv , thanks for looking into this. I'll have a little dig myself and see what other issues there may be.


stale bot commented Jun 8, 2023

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the wontfix label Jun 8, 2023