This is the official repository for the paper "How Do Your Biomedical Named Entity Recognition Models Generalize to Novel Entities?". The paper has been accepted by IEEE Access and is now available in Early Access (link).
Recent NER models appear reliable because of their strong F1 scores on benchmarks. Nevertheless, they often miss unseen or newly discovered entities, such as COVID-19, in real-world settings. Through a systematic analysis, we found that their generalization abilities are in fact limited.
To complement traditional evaluation (i.e., measuring overall performance), we propose to evaluate models in terms of three recognition abilities that a reliable BioNER model should possess. We briefly describe them below; a minimal partitioning sketch follows the list. Please see our paper for formal definitions and detailed descriptions.
- Memorization (Mem): The most basic ability is to identify entity mentions that were seen during training.
- Synonym generalization (Syn): A BioNER model should be robust to synonyms: mentions whose surface forms are unseen but whose CUIs (concept unique identifiers) were seen during training.
- Concept generalization (Con): While synonym generalization deals with recognizing new surface forms of existing entities, concept generalization refers to recognizing novel entities or concepts that did not exist before. Both their surface forms and their CUIs are unseen.
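The three categories can be determined with two dictionaries built from the training data: a mention dictionary (surface forms seen during training) and a CUI dictionary (CUIs seen during training). The snippet below is a minimal sketch of this bucketing logic under those assumptions; the function name and the use of plain Python sets are illustrative, and it is not the repository's evaluate.py.

```python
# Minimal sketch (illustrative, not evaluate.py) of bucketing test-set
# entities into Mem / Syn / Con with the two training-set dictionaries.
def partition_entities(gold_entities, mention_dict, cui_dict):
    """gold_entities: iterable of (mention, cui) pairs from the test set.
    mention_dict: set of entity surface forms seen during training.
    cui_dict: set of CUIs seen during training."""
    mem, syn, con = [], [], []
    for mention, cui in gold_entities:
        if mention in mention_dict:
            mem.append((mention, cui))   # Memorization: surface form was seen
        elif cui in cui_dict:
            syn.append((mention, cui))   # Synonym gen.: new surface form, known CUI
        else:
            con.append((mention, cui))   # Concept gen.: surface form and CUI unseen
    return mem, syn, con
```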
This repository has been tested in a Python 3 environment.
To measure the generalizability of a model, please install all the requirements by running pip install -r requirements.txt.
Also, you need (1) a mention dictionary, (2) a CUI dictionary, (3) a test set annotated with BIO labels, (4) a test set annotated with CUIs, and (5) model predictions.
We provide dictionaries and preprocessed datasets in the resources directory, but you can replace them with your own datasets or model outputs in the same format.
We follow BioBERT's tokenization scheme, which splits text on whitespace and special characters. Each line contains a single token and its label. An example for (3) and (5) is given below.
# BIO annotations
The O
product O
of O
the O
ataxia B
- I
telangiectasia I
gene O
( O
ATM O
) O
was O
...
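For reference, a minimal sketch of reading this token-per-line format and recovering entity spans from the B/I tags is shown below. The helper names and the assumption that every non-empty line is a whitespace-separated token-label pair are ours, not part of the repository.

```python
# Minimal sketch: parse the token-per-line BIO format above and recover
# entity spans from consecutive B/I tags.
def read_bio(path):
    tokens, labels = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2 or parts[0].startswith("#"):
                continue  # skip blank lines and comment headers
            tokens.append(parts[0])
            labels.append(parts[-1])
    return tokens, labels

def extract_spans(labels):
    """Return [start, end) index pairs of entities marked with B/I tags."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans
```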
In addition, we need another version of the test data annotated with CUIs. As shown in the example below, the CUI is placed on the first token of each entity.
# CUI annotations
The -
product -
of -
the -
ataxia D001260
- -
telangiectasia -
gene -
( -
ATM -
) -
was -
...
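Continuing the sketch above, the BIO spans can be aligned with the CUI column to obtain the gold (mention, CUI) pairs that the dictionary-based partition operates on. The alignment assumption (both files list the same tokens in the same order, with the CUI on the first token of each entity and "-" elsewhere) follows the examples; the helper itself is illustrative.

```python
# Minimal sketch: combine BIO spans with the CUI column. `cuis` can be read
# with the same token-per-line parser as read_bio (second column), and is
# assumed to be aligned token-by-token with `tokens` and `labels`.
def gold_mention_cuis(tokens, labels, cuis):
    pairs = []
    for start, end in extract_spans(labels):
        mention = " ".join(tokens[start:end])
        pairs.append((mention, cuis[start]))  # CUI sits on the first token
    return pairs
```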
When all files are ready, run the evaluation script below.
export DATA=NCBI-disease
python evaluate.py \
--mention_dictionary resources/$DATA/mention_dictionary.txt \
--cui_dictionary resources/$DATA/cui_dictionary.txt \
--gold_labels resources/$DATA/test.txt \
--gold_cuis resources/$DATA/test_cuis.txt \
--predictions resources/$DATA/biobert/predictions.txt
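For intuition, the per-category scores boil down to standard entity-level precision, recall, and F1 computed over the gold and predicted spans restricted to one category. The sketch below is a generic implementation of that metric, not necessarily identical to what evaluate.py does internally.

```python
# Minimal sketch of entity-level precision / recall / F1. gold_spans and
# pred_spans are sets of (sentence_id, start, end) tuples restricted to one
# category (Mem, Syn, or Con).
def prf1(gold_spans, pred_spans):
    tp = len(gold_spans & pred_spans)
    p = tp / len(pred_spans) if pred_spans else 0.0
    r = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```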
In addition to the evaluation script, we provide the code for training NER models, including our debiasing method.
See the training directory.
If you use scientific artifacts (such as code, datasets, or models) related to our work, please cite our paper. Thank you!
@article{kim2022your,
title={How Do Your Biomedical Named Entity Recognition Models Generalize to Novel Entities?},
author={Kim, Hyunjae and Kang, Jaewoo},
journal={IEEE Access},
year={2022},
publisher={IEEE}
}
Feel free to email Hyunjae Kim (hyunjae-kim@korea.ac.kr) if you have any questions.