University of Amsterdam Deep Learning for Natural Language Processing Fall 2020 Mini Project
Part-of-speech (POS) tagging is an important pre-processing step in Natural Language Processing. State-of-the-art neural approaches typically produce rich, context-sensitive word encodings with recurrent networks. A recently proposed and highly successful meta recurrent architecture integrates sentence-level context from both character-based and word-based representations. In this work, we exploit Bayesian model averaging to analyze the uncertainty of the different components of this recurrent meta-architecture in the context of POS tagging. We find that the meta component mediates the signals from the word-based and character-based components. Most importantly, we show that the meta model is highly uncertain when its input signals disagree.
- Leila F.C. Talha
- Michael J. Neely
- Stefan F. Schouten
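For illustration only, the following minimal PyTorch sketch shows one common way to approximate the Bayesian model averaging mentioned above with Monte Carlo dropout and to compute per-token predictive entropy. It is not the project's implementation: the function name, the assumption that `model(inputs)` returns per-token tag logits, and the number of samples are all hypothetical.

```python
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model: torch.nn.Module, inputs, num_samples: int = 20):
    """Approximate Bayesian model averaging with Monte Carlo dropout:
    average the tag distributions over stochastic forward passes and
    report the predictive entropy of the averaged distribution."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        # (num_samples, ..., num_tags): one softmax distribution per pass
        probs = torch.stack(
            [F.softmax(model(inputs), dim=-1) for _ in range(num_samples)]
        )
    mean_probs = probs.mean(dim=0)
    # Per-token entropy of the averaged distribution; higher = more uncertain
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy
```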
Prepare a Python virtual environment and install the necessary packages.
python3 -m venv v-dl4nlp-pos-tagging
source v-dl4nlp-pos-tagging/bin/activate
pip install torch
pip install -r requirements.txt
python -m spacy download en
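As an optional sanity check (not part of the original instructions), you can confirm that the core dependencies import correctly:

python -c "import torch, spacy; print(torch.__version__)"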
Download the train and test sets to the `datasets/conll2000` directory and run the `scripts/split_conll2000_train.py` script. Provide the percentage of the train set to use as the validation set as a positional argument (default: 0.1).
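For example, assuming the script is invoked with the validation fraction as its only positional argument, holding out 20% of the training data would look like:

python scripts/split_conll2000_train.py 0.2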
Train the Meta-BiLSTM morphosyntactic tagger, calculate its uncertainty on the test set, and generate some interesting figures by running:
allennlp uncertainty-experiment experiments/conll2000_meta_tagger_separate_mcdrop.jsonnet
By default, generated artifacts are saved in the `outputs/` directory.