A Survey of NLP Annotation Platforms

This README summarizes the responses to a questionnaire on annotation platforms (https://forms.gle/iZk8kehkjAWmB8xe9). The questionnaire is a short survey of how users use annotation tools, and what they want from them, for text and/or image processing in Natural Language Processing (NLP) tasks.

This summary is based on the results collated on 30 June 2020. We may update the results if a significant number of new responses arrive after that date.

The raw data for the survey results can be found on:

Overview

Population
- Population Breakdown
Annotations Requirements
- Why do you need annotations for your task/data?
- What NLP tasks do you need annotation for?
- Would the annotations you need require domain expertise or can it be handled by crowdsource workers with some minimum requirements?
- Do you have a pool of trusted/expert annotators that can work on your annotation task(s)?
Annotation Platforms
- Have you used open source / commercial annotation platforms before?
- Which of these open source annotation platforms/tools have you used before?
- Which of these commercial annotation platforms/tools have you heard before?
- Which of these commercial annotation platforms/tools have you used before?
- Most Useful Features
- Suggestions to annotation tool creators
- Any other feedback on annotation tools?
Your Dream Annotation Platform
- Notable Mentions about "Your Dream Annotation Platform"
Acknowledgement

Population

There are 78 responses to the questionnaire.
94.9% need annotations for their work
80.8% have used an annotation platform before

Population Breakdown

47.4% from academia
34.6% from industry
6.4% are students
3.8% freelance/independent researchers
1.3% government
the rest come from a mix of the above categories
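
The percentages above can be turned back into approximate respondent counts. The sketch below assumes every percentage was computed over the 78 responses stated in the Population section and rounded to one decimal place; the resulting counts are estimates, not figures taken from the raw data, and the `approx_count` helper is a hypothetical name used only for illustration.

```python
TOTAL_RESPONDENTS = 78  # stated in the Population section

# Percentages as reported in the population breakdown.
reported = {
    "academia": 47.4,
    "industry": 34.6,
    "students": 6.4,
    "freelance/independent researchers": 3.8,
    "government": 1.3,
}


def approx_count(percent: float, total: int = TOTAL_RESPONDENTS) -> int:
    """Return the nearest whole number of respondents for a percentage."""
    return round(percent * total / 100)


counts = {group: approx_count(pct) for group, pct in reported.items()}
# Assumption: whoever is left over belongs to the mixed-category remainder.
counts["mixed categories"] = TOTAL_RESPONDENTS - sum(counts.values())

for group, n in counts.items():
    print(f"{group}: ~{n} respondents")
```

Working with whole counts rather than rounded percentages makes it easier to check that the groups add up to the total number of responses.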

Annotations Requirements

Why do you need annotations for your task/data?

65.3% (of 78 respondents) can't find existing open datasets that fit their needs
53.8% (of 78 respondents) stated that there is no data in the domain they're interested in
47.4% (of 78 respondents) want to explore a new facet of their data/task that requires new annotations
Other reasons include:
- Data for low-resource languages doesn't exist (e.g. marginalised and indigenous languages)
- Security/privacy reasons
- Developing new annotation methods
- Available data size is insufficient for specific phenomena / task
- Biomedical data is inadequate/insufficient
- Academic datasets and pre-trained models are unusable on real data
- "We build datasets in my lab"

What NLP tasks do you need annotation for?

75.6% (of 78 respondents) need Classification annotations
60.0% (of 78 respondents) need Span annotations
38.5% (of 78 respondents) need Entity Linking annotations
Other annotation types include:
- Video captioning (3 out of 78 respondents)
- Bounding boxes (2 out of 78 respondents)
- Word level annotation (includes correction of word tokens)
- Sentiment / Emotions annotation
- Stance annotation
- NER annotation
- Writing paraphrases / Dialog systems
- MCQ answering
- Item/Product relevancy scoring
- Semantic Role Labeling
- Open Information Extraction

Would the annotations you need require domain expertise or can it be handled by crowdsource workers with some minimum requirements?

Do you have a pool of trusted/expert annotators that can work on your annotation task(s)?

Other comments include:
- Had some trusted annotators for last project. Would have to train a new team the next time.
- Often, it is the researchers who want to use the annotations for a specific task annotating the data.
- Kind of - I train university students to annotate as needed for specific projects.
- Sometimes.
- I don't really know.
- Can't say we have trusted or expert annotators but we have annotators.

Annotation Platforms

Have you used open source / commercial annotation platforms before?

79.7% (63 of 78 respondents) have used open source annotation platforms before
58.2% (46 of 78 respondents) have not used commercial annotation platforms before

Which of these open source annotation platforms/tools have you used before?

50.0% (out of 78 respondents) used Brat before
28.2% (out of 78 respondents) used WebAnno before
17.9% (out of 78 respondents) have not used any open source annotation tools before
12 other tools (not listed in the questionnaire) are used by 2 respondents each
40 other tools (not listed in the questionnaire) are used by 1 respondent each

Which of these commercial annotation platforms/tools have you heard before?

51.3% (out of 78 respondents) have heard of Amazon Mechanical Turk (MTurk)
41.0% (out of 78 respondents) have heard of Appen Data Annotation (formerly known as Figure8, and before that as CrowdFlower)
33.3% (out of 78 respondents) have heard of Prodigy
19.2% (out of 78 respondents) have not heard of any of the commercial tools before

(Note: Fiverr and Upwork are generally crowdsourcing sites that provide human annotators but may or may not provide an annotation platform)

Which of these commercial annotation platforms/tools have you used before?

Note: The response rate for this question is very low, so the results may not be representative of all annotation platform users. It also highlights the stark difference in adoption between open source and commercial annotation tools.

Most Useful Features

The table below presents the list of features that respondents find useful in the open source or commercial annotation tools they have previously interacted with.

Feature	Open Source	Commercial
Active Learning	✓	✓
Annotation progress monitor	✓
Annotation shortcuts	✓
Annotation visualization	✓
Automatically suggesting annotations	✓
Audio annotation support	✓
Available Online (No installation)	✓
Connecting to external resources (e.g. storage / knowledge base / dictionary)	✓
Customizable annotation tasks/labels (with extra code/schema)	✓	✓
Customizable annotation view (when annotating)	✓
Drag and drop interactions	✓
Easy setup/installation (e.g. Docker)	✓
Export to multiple formats	✓	✓
Good UI/UX	✓
Interoperability (e.g. load/combine annotations from other tools, integration with other tools)	✓	✓
Multi-annotator agreement mechanisms/metrics / Automatic evaluation of annotations	✓	✓
Post-Annotation curation	✓
Project Management with collaboration features	✓	✓
Python-based	✓
Simple Login/Sign-on	✓
Supports multiple tasks	✓
Communication tools (e.g. annotators interaction with project managers)		✓
Documentation (e.g. forums, example setup)		✓
Built-in quality control (e.g. screening tests, data cleanup/filter)		✓
Demo/test small projects with least setup effort		✓
Access to large/diverse/global pool of annotators		✓
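
As an illustration of the "multi-annotator agreement metrics" row in the table, the sketch below computes Cohen's kappa for two annotators who labeled the same items. It is a generic textbook formulation, not the implementation of any tool named in this survey; the function name `cohen_kappa` and the example labels are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for four items.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly preferred over a plain percentage of matching labels.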

Suggestions to annotation tool creators

The keywords "flexibility" and "ease/easy/simple" appear in many of the comments listing the top features of an annotation platform. We suggest the following to annotation tool creators to accommodate this feedback:

Customizable tasks / labels / schema setup and output formats for annotation project managers
User-friendly and intuitive UI/UX
Customizable shortcuts and/or mobile-friendly features to help annotators

Other recurring themes in the top feature lists include:

Respondents highlighted the ability to annotate (i) overlapping spans, (ii) discontinuous spans, (iii) corrections to initial tokenizations, and (iv) entity relations
In cases where free text annotation is allowed, some respondents highlighted the need for some sort of constraints/limitations on the free text
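
To make the span requirements above concrete, here is a minimal sketch of a data model that can hold overlapping spans, discontinuous spans and entity relations. It assumes character offsets with an exclusive end; the class names `SpanAnnotation` and `Relation`, the labels and the example sentence are hypothetical and not taken from any surveyed tool. Tokenization corrections are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpanAnnotation:
    label: str
    # One (start, end) character-offset pair per fragment; end is exclusive.
    # A discontinuous span simply has more than one fragment.
    fragments: List[Tuple[int, int]]

    def text(self, source: str) -> str:
        """Join the covered fragments of the source text with single spaces."""
        return " ".join(source[start:end] for start, end in self.fragments)

    def overlaps(self, other: "SpanAnnotation") -> bool:
        """True if any fragment of this span shares characters with the other span."""
        return any(
            s1 < e2 and s2 < e1
            for s1, e1 in self.fragments
            for s2, e2 in other.fragments
        )


@dataclass
class Relation:
    label: str
    head: SpanAnnotation
    tail: SpanAnnotation


source = "Alice picked the phone up in New York City."
person = SpanAnnotation("Person", [(0, 5)])
verb = SpanAnnotation("PhrasalVerb", [(6, 12), (23, 25)])  # discontinuous span
city = SpanAnnotation("City", [(29, 42)])
state = SpanAnnotation("State", [(29, 37)])  # overlaps with the city span
located_in = Relation("located_in", head=person, tail=city)

print(verb.text(source))     # the two fragments joined
print(city.overlaps(state))  # overlapping spans are allowed to coexist
```

Storing a list of fragments per span, rather than a single start/end pair, is what lets the same structure cover both contiguous and discontinuous annotations.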

Any other feedback on annotation tools?

Here is some aggregated free-form feedback from our respondents:

Feedback for Open Source annotation tools:

Tokenizations can/should be explicit in the data input format
Pre-definition of tagset is important
Automatic publishing of annotations as open source
Having web version accessible to everyone is preferred
Tools for documentation updates during the annotation project progress
Having an active community to maintain the open source annotation tool
Automatic population of annotations in the tools would be great
Annotating across text segments would be nice

Feedback for annotation tools in general:

Make the platforms open source
Support changes in the annotation schema and reclassification of annotations that are already done.
There's no universal tool and there will never be. So we need to find ways to combine the output of different tools.
1-2 years ago, annotation platforms were all simplistic and difficult to use for non-trivial tasks outside of their tutorials
Most platforms work well for an individual task, but there's an enormous amount of effort needed to add new features or repurpose an existing tool to a more complicated NLP task.
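
One way to approach "combining the output of different tools" is to convert each tool's export into a shared record layout. The sketch below parses the text-bound (`T`) lines of the brat standoff format into plain dictionaries. The target dictionary layout and the function names are assumptions made for illustration; other brat line types (relations, events, attributes, notes) are skipped rather than handled.

```python
from typing import Dict, Iterable, List


def parse_brat_text_bound(line: str) -> Dict[str, object]:
    """Parse one brat text-bound annotation line, e.g. 'T1<TAB>Person 0 5<TAB>Alice'."""
    line = line.rstrip("\n")
    if not line.startswith("T"):
        raise ValueError("not a text-bound annotation line")
    ann_id, type_and_offsets, surface = line.split("\t", 2)
    label, offsets = type_and_offsets.split(" ", 1)
    # Discontinuous annotations list several 'start end' pairs separated by ';'.
    fragments = [
        tuple(int(value) for value in fragment.split())
        for fragment in offsets.split(";")
    ]
    return {"id": ann_id, "label": label, "fragments": fragments, "text": surface}


def load_text_bound(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Keep only text-bound lines and convert them to the shared record layout."""
    return [parse_brat_text_bound(line) for line in lines if line.startswith("T")]


example_lines = [
    "T1\tOrganization 0 4\tSony",
    "T2\tLocation 0 5;16 23\tNorth America",
    "R1\tOrigin Arg1:T1 Arg2:T2",  # relation line, ignored by this sketch
]
records = load_text_bound(example_lines)
for record in records:
    print(record)
```

A converter like this would be written once per tool, so that downstream code only ever sees the shared layout.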

Your Dream Annotation Platform

Summarizing the respondents' dream platforms:

All in one
Quick, easy to use, simple, plug & play
Flexibility (ability to customize annotation project, labels, tasks, schema, in-/output formats)
Interoperability (ability to integrate with other annotation tools and their in-/output formats)
Active and large pool of annotators, annotation project creators and annotation tool management community

Notable Mentions about "Your Dream Annotation Platform"

"Takes over all the server hosting etc. and leaves the researcher with the task of designing experiments."
"Fair treatment of both the annotator and the task creator."

Acknowledgement

We thank all respondents to the questionnaire for the precious insights and feedback they gave!

We also thank Mariana Neves for referring us to the following survey:

An extensive review of tools for manual annotation of documents (published in Dec 2019); list of tools reviewed

Cite

If reference to this report is necessary,

to cite (in-text):

Liling Tan. 2020. A Survey of NLP Annotation Platforms. Retrieved from https://github.com/alvations/annotate-questionnaire

in bibtex:

@misc{survey-annotation-platform,
author =   {Liling Tan},
title =    {A Survey of NLP Annotation Platforms},
howpublished = {https://github.com/alvations/annotate-questionnaire},
year = {2020}
}
