From 538601f950ef2a2e425124187d1ebac038f1b686 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 20:57:08 +0100
Subject: [PATCH 01/15] GH-2132: update readme

---
 README.md | 80 +++++++++++++++++++++++++++++++------------------------
 1 file changed, 45 insertions(+), 35 deletions(-)
diff --git a/README.md b/README.md
index 40b2aff8de..34531f7b9e 100644
--- a/README.md
+++ b/README.md
@@ -7,62 +7,48 @@
 
 A very simple framework for **state-of-the-art NLP**. Developed by [Humboldt University of Berlin](https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/ml-en/) and friends.
 
-* __IMPORTANT: (30.08.2020)__ *We moved our models to a new server. Please update your Flair to the newest version!*
+* __Now on HuggingFace model hub:__ *From 0.8, Flair models are hosted on the HuggingFace model hub!*
 
 ---
 
 Flair is:
 
 * **A powerful NLP library.** Flair allows you to apply our state-of-the-art natural language processing (NLP)
-models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS),
+models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), 
+  special support for [biomedical data](/resources/docs/HUNFLAIR.md),
  sense disambiguation and classification, with support for a rapidly growing number of languages.
 
-* **A biomedical NER library.** Flair has special support for [biomedical data](/resources/docs/HUNFLAIR.md) with
-state-of-the-art models for biomedical NER and support for over 32 biomedical datasets.
-
 * **A text embedding library.** Flair has simple interfaces that allow you to use and combine different word and
 document embeddings, including our proposed **[Flair embeddings](https://www.aclweb.org/anthology/C18-1139/)**, BERT embeddings and ELMo embeddings.
 
 * **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
 train your own models and experiment with new approaches using Flair embeddings and classes.
 
-Now at [version 0.7](https://github.com/flairNLP/flair/releases)!
-
-## Comparison with State-of-the-Art
-
-Flair outperforms the previous best methods on a range of NLP tasks:
-
-| Task | Language | Dataset | Flair | Previous best |
-| -------------------------------  | ---  | ----------- | ---------------- | ------------- |
-| Named Entity Recognition |English | Conll-03    |  **93.18** (F1)  | *92.22 [(Peters et al., 2018)](https://arxiv.org/pdf/1802.05365.pdf)* |
-| Named Entity Recognition |English | Ontonotes   |  **89.3** (F1)  | *86.28 [(Chiu et al., 2016)](https://arxiv.org/pdf/1511.08308.pdf)* |
-| Emerging Entity Detection | English | WNUT-17      |  **49.49** (F1)  | *45.55 [(Aguilar et al., 2018)](http://aclweb.org/anthology/N18-1127.pdf)* |
-| Part-of-Speech tagging |English| WSJ  | **97.85**  | *97.64 [(Choi, 2016)](https://www.aclweb.org/anthology/N16-1031)*|
-| Chunking |English| Conll-2000  |  **96.72** (F1) | *96.36 [(Peters et al., 2017)](https://arxiv.org/pdf/1705.00108.pdf)*
-| Named Entity Recognition | German  | Conll-03    |  **88.27** (F1)  | *78.76 [(Lample et al., 2016)](https://arxiv.org/abs/1603.01360)* |
-| Named Entity Recognition |German  | Germeval    |  **84.65** (F1)  | *79.08 [(Hänig et al, 2014)](http://asv.informatik.uni-leipzig.de/publication/file/300/GermEval2014_ExB.pdf)*|
-| Named Entity Recognition | Dutch  | Conll-02    |  **92.38** (F1)  | *81.74 [(Lample et al., 2016)](https://arxiv.org/abs/1603.01360)* |
-| Named Entity Recognition |Polish  | PolEval-2018    |  **86.6** (F1) <br> [(Borchmann et al., 2018)](https://github.com/applicaai/poleval-2018) | *85.1 [(PolDeepNer)](https://github.com/CLARIN-PL/PolDeepNer/)*|
+Now at [version 0.8](https://github.com/flairNLP/flair/releases)!
 
-Here's how to [reproduce these numbers](/resources/docs/EXPERIMENTS.md) using Flair. You can also find detailed evaluations and discussions in our papers:
+## State-of-the-Art Models
 
-* *[Contextual String Embeddings for Sequence Labeling](https://www.aclweb.org/anthology/C18-1139/).
-Alan Akbik, Duncan Blythe and Roland Vollgraf.
-27th International Conference on Computational Linguistics, **COLING 2018**.*
+Flair ships with state-of-the-art models for a range of NLP tasks. For instance, check out our latest NER models: 
 
-* *[Pooled Contextualized Embeddings for Named Entity Recognition](https://www.aclweb.org/anthology/papers/N/N19/N19-1078/).
-Alan Akbik, Tanja Bergmann and Roland Vollgraf.
-2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, **NAACL 2019**.*
+| Language | Dataset | Flair | Best published | 
+|  ---  | ----------- | ---------------- | ------------- |
+| NER English | Conll-03 (4-class)   |  **94.09** (F1)  | *94.3 [(Yamada et al., 2018)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | 
+| NER English | Ontonotes (18-class)  |  **90.93** (F1)  | *91,3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
+| NER German  | Conll-03 (4-class)   |  **92,31** (F1)  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
+| NER Dutch  | Conll-03  (4-class)  |  **95,25** (F1)  | *93.7 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
+| NER Spanish  | Conll-03 (4-class)   |  **90,54** (F1)  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
 
-* *[FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP](https://www.aclweb.org/anthology/papers/N/N19/N19-4010/).
-Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter and Roland Vollgraf.
-2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), **NAACL 2019**.*
+**New:** Most Flair sequence tagging models (named entity recognition, part-of-speech tagging etc.) are now hosted 
+on the 🤗 HuggingFace model hub! You can: 
+- Search for Flair models on the HF hub [like this](https://huggingface.co/models?filter=flair)! 
+- All models have a "model card" with information, and an online demo where you try each model! 
+  For instance, check out the [English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large)!
 
 ## Quick Start
 
 ### Requirements and Installation
 
-The project is based on PyTorch 1.1+ and Python 3.6+, because method signatures and type hints are beautiful.
+The project is based on PyTorch 1.5+ and Python 3.6+, because method signatures and type hints are beautiful.
 If you do not have Python 3.6, install it first. [Here is how for Ubuntu 16.04](https://vsupalov.com/developing-with-python3-6-on-ubuntu-16-04/).
 Then, in your favorite virtual environment, simply do:
 
@@ -144,7 +130,7 @@ There are also good third-party articles and posts that illustrate how to use Fl
 
 ## Citing Flair
 
-Please cite the following paper when using Flair:
+Please cite [the following paper](https://www.aclweb.org/anthology/C18-1139/) when using Flair embeddings:
 
 ```
 @inproceedings{akbik2018coling,
@@ -156,7 +142,19 @@ Please cite the following paper when using Flair:
 }
 ```
 
-If you use the pooled version of the Flair embeddings (PooledFlairEmbeddings), please cite:
+If you use the Flair framework for your experiments, please cite [this paper](https://www.aclweb.org/anthology/papers/N/N19/N19-4010/):
+
+```
+@inproceedings{akbik2019flair,
+  title={FLAIR: An easy-to-use framework for state-of-the-art NLP},
+  author={Akbik, Alan and Bergmann, Tanja and Blythe, Duncan and Rasul, Kashif and Schweter, Stefan and Vollgraf, Roland},
+  booktitle={{NAACL} 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)},
+  pages={54--59},
+  year={2019}
+}
+```
+
+If you use the pooled version of the Flair embeddings (PooledFlairEmbeddings), please cite [this paper](https://www.aclweb.org/anthology/papers/N/N19/N19-1078/):
 
 ```
 @inproceedings{akbik2019naacl,
@@ -168,6 +166,18 @@ If you use the pooled version of the Flair embeddings (PooledFlairEmbeddings), p
 }
 ```
 
+If you use our new "FLERT" models or approach, please cite [this paper](https://arxiv.org/abs/2011.06993):
+
+```
+@misc{schweter2020flert,
+    title={FLERT: Document-Level Features for Named Entity Recognition},
+    author={Stefan Schweter and Alan Akbik},
+    year={2020},
+    eprint={2011.06993},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}}
+```
+
 ## Contact
 
 Please email your questions or comments to [Alan Akbik](http://alanakbik.github.io/).

From e884e93201295be317025eb214bfcb840b37ce3c Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:04:35 +0100
Subject: [PATCH 02/15] Update README.md

---
 README.md | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 34531f7b9e..b88a50db7f 100644
--- a/README.md
+++ b/README.md
@@ -7,8 +7,6 @@
 
 A very simple framework for **state-of-the-art NLP**. Developed by [Humboldt University of Berlin](https://www.informatik.hu-berlin.de/en/forschung-en/gebiete/ml-en/) and friends.
 
-* __Now on HuggingFace model hub:__ *From 0.8, Flair models are hosted on the HuggingFace model hub!*
-
 ---
 
 Flair is:
@@ -30,19 +28,17 @@ Now at [version 0.8](https://github.com/flairNLP/flair/releases)!
 
 Flair ships with state-of-the-art models for a range of NLP tasks. For instance, check out our latest NER models: 
 
-| Language | Dataset | Flair | Best published | 
-|  ---  | ----------- | ---------------- | ------------- |
-| NER English | Conll-03 (4-class)   |  **94.09** (F1)  | *94.3 [(Yamada et al., 2018)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | 
-| NER English | Ontonotes (18-class)  |  **90.93** (F1)  | *91,3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
-| NER German  | Conll-03 (4-class)   |  **92,31** (F1)  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
-| NER Dutch  | Conll-03  (4-class)  |  **95,25** (F1)  | *93.7 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
-| NER Spanish  | Conll-03 (4-class)   |  **90,54** (F1)  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | 
+| Language | Dataset | Flair | Best published | Model card & demo |
+|  ---  | ----------- | ---------------- | ------------- | ------------- |
+| NER English | Conll-03 (4-class)   |  **94.09** (F1)  | *94.3 [(Yamada et al., 2020)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | [Flair English 4-class NER demo](https://huggingface.co/flair/ner-english-large)  |
+| NER English | Ontonotes (18-class)  |  **90.93** (F1)  | *91.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large) |
+| NER German  | Conll-03 (4-class)   |  **92.31** (F1)  | *90.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair German 4-class NER demo](https://huggingface.co/flair/ner-german-large)  |
+| NER Dutch  | Conll-03 (4-class)   |  **95.25** (F1)  | *93.7 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Dutch 4-class NER demo](https://huggingface.co/flair/ner-dutch-large)  |
+| NER Spanish  | Conll-03 (4-class)   |  **90.54** (F1)  | *90.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Spanish 4-class NER demo](https://huggingface.co/flair/ner-spanish-large)  |
 
 **New:** Most Flair sequence tagging models (named entity recognition, part-of-speech tagging etc.) are now hosted 
-on the 🤗 HuggingFace model hub! You can: 
-- Search for Flair models on the HF hub [like this](https://huggingface.co/models?filter=flair)! 
-- All models have a "model card" with information, and an online demo where you try each model! 
-  For instance, check out the [English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large)!
+on the [__🤗 HuggingFace model hub__](https://huggingface.co/models?filter=flair)! You can browse models, check detailed information on how they were trained, and even try each model out online!
+
 
 ## Quick Start
 

From 3c27a99d462f3e2b3f00b9da6daf6b310725513d Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:17:16 +0100
Subject: [PATCH 03/15] GH-2132: merge English model tables in tagging tutorial

---
 resources/docs/TUTORIAL_2_TAGGING.md | 42 +++++++++++-----------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index 6b8c7986b4..420667253e 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -100,32 +100,22 @@ are provided:
 
 #### English Models
 
-| ID | Task | Training Dataset | Accuracy |
-| -------------    | ------------- |------------- |------------- |
-| 'ner' | 4-class Named Entity Recognition |  Conll-03  |  **93.03** (F1) |
-| 'ner-pooled' | 4-class Named Entity Recognition (memory inefficient) |  Conll-03  |  **93.24** (F1) |
-| 'ner-ontonotes' | [18-class](https://spacy.io/api/annotation#named-entities) Named Entity Recognition |  Ontonotes  |  **89.06** (F1) |
-| 'chunk' |  Syntactic Chunking   |  Conll-2000     |  **96.47** (F1) |
-| 'pos' |  Part-of-Speech Tagging (fine-grained) |  Ontonotes     |  **98.19** (Accuracy) |
-| 'upos' |  Part-of-Speech Tagging (universal) |  Ontonotes     |  **98.6** (Accuracy) |
-| 'keyphrase' |  Methods and materials in science papers (BETA) |  Semeval2017   |  **47.3** (F1)  |
-| 'frame'  |   Semantic Frame Detection |  Propbank 3.0     |  **97.54** (F1) |
-| 'negation-speculation'  |  Negations and speculations in biomedical articles  |  Bioscope  |  **80.2** (F1) |
-
-
-#### Fast English Models
-
-In case you do not have a GPU available, we also distribute smaller models that run faster on CPU.
-
-
-| ID | Task | Training Dataset | Accuracy |
-| -------------    | ------------- |------------- |------------- |
-| 'ner-fast' | 4-class Named Entity Recognition |  Conll-03  |  **92.75** (F1) |
-| 'ner-ontonotes-fast' | [18-class](https://spacy.io/api/annotation#named-entities) Named Entity Recognition |  Ontonotes  |  **89.27** (F1) |
-| 'chunk-fast' |  Syntactic Chunking   |  Conll-2000     |  **96.22** (F1) |
-| 'pos-fast' |  Part-of-Speech Tagging (fine-grained) |  Ontonotes     |  **98.1** (Accuracy) |
-| 'upos-fast' |  Part-of-Speech Tagging (universal) |  Ontonotes     |  **98.47** (Accuracy) |
-| 'frame-fast'  |   Semantic Frame Detection | Propbank 3.0     |  **97.31** (F1) |
+| ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
+| -------------    | ------------- |------------- |------------- | ------------- | ------------- |
+| 'ner' | NER (4-class) |  English | Conll-03  |  **93.03** (F1) |
+| 'ner-fast' | NER (4-class)  |  English  |  Conll-03  |  **92.75** (F1) | (fast model)
+| 'ner-pooled' | NER (4-class)  |  English |  Conll-03  |  **93.24** (F1) | (memory inefficient)
+| 'ner-ontonotes' | NER (18-class) |  English | Ontonotes  |  **89.06** (F1) |
+| 'ner-ontonotes-fast' | NER (18-class) |  English | Ontonotes  |  **89.27** (F1) | (fast model)
+| 'chunk' |  Chunking   |  English | Conll-2000     |  **96.47** (F1) |
+| 'chunk-fast' |   Chunking   |  English | Conll-2000     |  **96.22** (F1) |(fast model)
+| 'pos' |  POS-tagging |   English |  Ontonotes     |**98.19** (Accuracy) |
+| 'pos-fast' |  POS-tagging |   English |  Ontonotes     |  **98.1** (Accuracy) |(fast model)
+| 'upos' |  POS-tagging (universal) | English | Ontonotes     |  **98.6** (Accuracy) |
+| 'upos-fast' |  POS-tagging (universal) | English | Ontonotes     |  **98.47** (Accuracy) | (fast model)
+| 'frame'  |   Frame Detection |  English | Propbank 3.0     |  **97.54** (F1) |
+| 'frame-fast'  |  Frame Detection |  English | Propbank 3.0     |  **97.31** (F1) | (fast model)
+| 'negation-speculation'  | Negation / speculation |English |  Bioscope | **80.2** (F1) |
 
 
 #### Multilingual Models

From 610b00408506eec5cc701652080be8a214978c5f Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:21:06 +0100
Subject: [PATCH 04/15] GH-2132: add large NER models to tagging tutorial

---
 resources/docs/TUTORIAL_2_TAGGING.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index 420667253e..1ffd3d7afd 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -104,9 +104,11 @@ are provided:
 | -------------    | ------------- |------------- |------------- | ------------- | ------------- |
 | 'ner' | NER (4-class) |  English | Conll-03  |  **93.03** (F1) |
 | 'ner-fast' | NER (4-class)  |  English  |  Conll-03  |  **92.75** (F1) | (fast model)
+| 'ner-large' | NER (4-class)  |  English  |  Conll-03  |  **94.09** (F1) | (large model)
 | 'ner-pooled' | NER (4-class)  |  English |  Conll-03  |  **93.24** (F1) | (memory inefficient)
 | 'ner-ontonotes' | NER (18-class) |  English | Ontonotes  |  **89.06** (F1) |
 | 'ner-ontonotes-fast' | NER (18-class) |  English | Ontonotes  |  **89.27** (F1) | (fast model)
+| 'ner-ontonotes-large' | NER (18-class) |  English | Ontonotes  |  **90.93** (F1) | (large model)
 | 'chunk' |  Chunking   |  English | Conll-2000     |  **96.47** (F1) |
 | 'chunk-fast' |   Chunking   |  English | Conll-2000     |  **96.22** (F1) |(fast model)
 | 'pos' |  POS-tagging |   English |  Ontonotes     |**98.19** (Accuracy) |

From 17934c8628f5bed7de7f776aa85f89646739a586 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:28:18 +0100
Subject: [PATCH 05/15] GH-2132: restructure multilingual model table in tutorial

---
 resources/docs/TUTORIAL_2_TAGGING.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index 1ffd3d7afd..5f0d8fff9d 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -126,13 +126,12 @@ We distribute new models that are capable of handling text in multiple languages
 
 The NER models are trained over 4 languages (English, German, Dutch and Spanish) and the PoS models over 12 languages (English, German, French, Italian, Dutch, Polish, Spanish, Swedish, Danish, Norwegian, Finnish and Czech).
 
-| ID | Task | Training Dataset | Accuracy |
-| -------------    | ------------- |------------- |------------- |
-| 'ner-multi' | 4-class Named Entity Recognition |  Conll-03 (4 languages)  |  **89.27**  (average F1) |
-| 'ner-multi-fast' | 4-class Named Entity Recognition |  Conll-03 (4 languages)  |  **87.91**  (average F1) |
-| 'ner-multi-fast-learn' | 4-class Named Entity Recognition |  Conll-03 (4 languages)  |  **88.18**  (average F1) |
-| 'pos-multi' |  Part-of-Speech Tagging   |  Universal Dependency Treebank (12 languages)  |  **96.41** (average acc.) |
-| 'pos-multi-fast' |  Part-of-Speech Tagging |  Universal Dependency Treebank (12 languages)  |  **92.88** (average acc.) |
+| ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
+| -------------    | ------------- |------------- |------------- | ------------- | ------------- |
+| 'ner-multi' | NER (4-class) | Multilingual | Conll-03   |  **89.27**  (average F1) | (4 languages)
+| 'ner-multi-fast' | NER (4-class)|  Multilingual |  Conll-03   |  **87.91**  (average F1) | (4 languages)
+| 'pos-multi' |  POS-tagging   |  Multilingual |  UD Treebanks  |  **96.41** (average acc.) |  (12 languages)
+| 'pos-multi-fast' |  POS-tagging |  Multilingual |  UD Treebanks  |  **92.88** (average acc.) | (12 languages) 
 
 You can pass text in any of these languages to the model. In particular, the NER also kind of works for languages it was not trained on, such as French.
 

From 7fd4490d9d51949df853187f4db18fb148bcd86f Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:36:53 +0100
Subject: [PATCH 06/15] GH-2132: merge German and other-language model tables in tutorial

---
 resources/docs/TUTORIAL_2_TAGGING.md | 54 +++++++++++-----------------
 1 file changed, 21 insertions(+), 33 deletions(-)

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index 5f0d8fff9d..fee9031f1e 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -135,39 +135,27 @@ The NER models are trained over 4 languages (English, German, Dutch and Spanish)
 
 You can pass text in any of these languages to the model. In particular, the NER also kind of works for languages it was not trained on, such as French.
 
-The 'ner-multi-fast-learn' model is an experimental model that accumulates entity representations over time. 
-
-#### German Models
-
-We also distribute German models.
-
-| ID | Task | Training Dataset | Accuracy | Contributor |
-| -------------    | ------------- |------------- |------------- |------------- |
-| 'de-ner' | 4-class Named Entity Recognition |  Conll-03  |  **87.94** (F1) | |
-| 'de-ner-germeval' | 4+4-class Named Entity Recognition |  Germeval  |  **84.90** (F1) | |
-| 'de-ner-legal' | NER for German legal text |  [LER](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset  |  **96.35** (F1) | |
-| 'de-pos' | Part-of-Speech Tagging |  UD German - HDT  |  **98.50** (Accuracy) | |
-| 'de-pos-tweets' | Part-of-Speech Tagging |  German Tweets  |  **93.06** (Accuracy) | [stefan-it](https://github.com/stefan-it/flair-experiments/tree/master/pos-twitter-german) |
-| 'de-historic-indirect' | historical German speech and thought (indirect) |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
-| 'de-historic-direct' | historical German speech and thought (direct) |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
-| 'de-historic-reported' | historical German speech and thought (reported) |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
-| 'de-historic-free-indirect' | historical German speech and thought (de-historic-free-indirect) |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
-
-
-#### Models for other Languages
-
-Thanks to our contributors we are also able to distribute a couple of models for other languages.
-
-| ID | Task | Training Dataset | Accuracy | Contributor |
-| -------------    | ------------- |------------- |------------- |------------- |
-| 'fr-ner' | Named Entity Recognition |  [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **95.57** (F1) | [mhham](https://github.com/mhham) |
-| 'nl-ner' | Named Entity Recognition |  [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **92.58** (F1) |  |
-| 'nl-ner-rnn' | Named Entity Recognition |  [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **90.79** (F1) | |
-| 'da-ner' | Named Entity Recognition |  [Danish NER dataset](https://github.com/alexandrainst/danlp)  |   | [AmaliePauli](https://github.com/AmaliePauli) |
-| 'da-pos' | Named Entity Recognition |  [Danish Dependency Treebank](https://github.com/UniversalDependencies/UD_Danish-DDT/blob/master/README.md)  |  | [AmaliePauli](https://github.com/AmaliePauli) |
-| 'ml-pos' | Part-of-Speech Tagging (fine-grained) |  30000 Malayalam sentences  | **83** | [sabiqueqb](https://github.com/sabiqueqb) |
-| 'ml-upos' | Part-of-Speech Tagging (universal)| 30000 Malayalam sentences | **87** | [sabiqueqb](https://github.com/sabiqueqb) |
-| 'pt-pos-clinical' | Part-of-Speech Tagging (fine-grained) for clinical texts | [PUCPR](https://github.com/HAILab-PUCPR/portuguese-clinical-pos-tagger) | **92.39** | [LucasFerroHAILab](https://github.com/LucasFerroHAILab) |
+#### Models for Other Languages
+
+| ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
+| -------------    | ------------- |------------- |------------- |------------- | ------------ |
+| 'de-ner' | NER (4-class) |  German | Conll-03  |  **87.94** (F1) | |
+| 'de-ner-germeval' | NER (4-class) | German | Germeval  |  **84.90** (F1) | |
+| 'de-ner-legal' | NER (legal text) |  German | [LER](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset  |  **96.35** (F1) | |
+| 'de-pos' | POS-tagging | German | UD German - HDT  |  **98.50** (Accuracy) | |
+| 'de-pos-tweets' | POS-tagging | German | German Tweets  |  **93.06** (Accuracy) | [stefan-it](https://github.com/stefan-it/flair-experiments/tree/master/pos-twitter-german) |
+| 'de-historic-indirect' | historical indirect speech | German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
+| 'de-historic-direct' | historical direct speech |  German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
+| 'de-historic-reported' | historical reported speech | German |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
+| 'de-historic-free-indirect' | historical free-indirect speech | German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
+| 'fr-ner' | NER (4-class) | French | [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **95.57** (F1) | [mhham](https://github.com/mhham) |
+| 'nl-ner' | NER (4-class) | Dutch |  [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **92.58** (F1) |  |
+| 'nl-ner-rnn' | NER (4-class) | Dutch | [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **90.79** (F1) | |
+| 'da-ner' | NER (4-class) | Danish |  [Danish NER dataset](https://github.com/alexandrainst/danlp)  |   | [AmaliePauli](https://github.com/AmaliePauli) |
+| 'da-pos' | POS-tagging | Danish | [Danish Dependency Treebank](https://github.com/UniversalDependencies/UD_Danish-DDT/blob/master/README.md)  |  | [AmaliePauli](https://github.com/AmaliePauli) |
+| 'ml-pos' | POS-tagging | Malayalam | 30000 Malayalam sentences  | **83** | [sabiqueqb](https://github.com/sabiqueqb) |
+| 'ml-upos' | POS-tagging | Malayalam | 30000 Malayalam sentences | **87** | [sabiqueqb](https://github.com/sabiqueqb) |
+| 'pt-pos-clinical' | POS-tagging | Portuguese | [PUCPR](https://github.com/HAILab-PUCPR/portuguese-clinical-pos-tagger) | **92.39** | [LucasFerroHAILab](https://github.com/LucasFerroHAILab) for clinical texts |
 
 
 ### Tagging a German sentence

From 783cc00cdaef501e18e2bd2e6ecaba3d15fb3736 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:42:37 +0100
Subject: [PATCH 07/15] GH-2132: link English model IDs to model hub in tutorial

---
 resources/docs/TUTORIAL_2_TAGGING.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index fee9031f1e..4a80a1b049 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -102,21 +102,21 @@ are provided:
 
 | ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
 | -------------    | ------------- |------------- |------------- | ------------- | ------------- |
-| 'ner' | NER (4-class) |  English | Conll-03  |  **93.03** (F1) |
-| 'ner-fast' | NER (4-class)  |  English  |  Conll-03  |  **92.75** (F1) | (fast model)
-| 'ner-large' | NER (4-class)  |  English  |  Conll-03  |  **94.09** (F1) | (large model)
+| '[ner](https://huggingface.co/flair/ner-english)' | NER (4-class) |  English | Conll-03  |  **93.03** (F1) |
+| '[ner-fast](https://huggingface.co/flair/ner-english-fast)' | NER (4-class)  |  English  |  Conll-03  |  **92.75** (F1) | (fast model)
+| '[ner-large](https://huggingface.co/flair/ner-english-large)' | NER (4-class)  |  English  |  Conll-03  |  **94.09** (F1) | (large model)
 | 'ner-pooled' | NER (4-class)  |  English |  Conll-03  |  **93.24** (F1) | (memory inefficient)
-| 'ner-ontonotes' | NER (18-class) |  English | Ontonotes  |  **89.06** (F1) |
-| 'ner-ontonotes-fast' | NER (18-class) |  English | Ontonotes  |  **89.27** (F1) | (fast model)
-| 'ner-ontonotes-large' | NER (18-class) |  English | Ontonotes  |  **90.93** (F1) | (large model)
-| 'chunk' |  Chunking   |  English | Conll-2000     |  **96.47** (F1) |
-| 'chunk-fast' |   Chunking   |  English | Conll-2000     |  **96.22** (F1) |(fast model)
-| 'pos' |  POS-tagging |   English |  Ontonotes     |**98.19** (Accuracy) |
-| 'pos-fast' |  POS-tagging |   English |  Ontonotes     |  **98.1** (Accuracy) |(fast model)
-| 'upos' |  POS-tagging (universal) | English | Ontonotes     |  **98.6** (Accuracy) |
-| 'upos-fast' |  POS-tagging (universal) | English | Ontonotes     |  **98.47** (Accuracy) | (fast model)
-| 'frame'  |   Frame Detection |  English | Propbank 3.0     |  **97.54** (F1) |
-| 'frame-fast'  |  Frame Detection |  English | Propbank 3.0     |  **97.31** (F1) | (fast model)
+| '[ner-ontonotes](https://huggingface.co/flair/ner-english-ontonotes)' | NER (18-class) |  English | Ontonotes  |  **89.06** (F1) |
+| '[ner-ontonotes-fast](https://huggingface.co/flair/ner-english-ontonotes-fast)' | NER (18-class) |  English | Ontonotes  |  **89.27** (F1) | (fast model)
+| '[ner-ontonotes-large](https://huggingface.co/flair/ner-english-ontonotes-large)' | NER (18-class) |  English | Ontonotes  |  **90.93** (F1) | (large model)
+| '[chunk](https://huggingface.co/flair/chunk-english)' |  Chunking   |  English | Conll-2000     |  **96.47** (F1) |
+| '[chunk-fast](https://huggingface.co/flair/chunk-english-fast)' |   Chunking   |  English | Conll-2000     |  **96.22** (F1) |(fast model)
+| '[pos](https://huggingface.co/flair/pos-english)' |  POS-tagging |   English |  Ontonotes     |**98.19** (Accuracy) |
+| '[pos-fast](https://huggingface.co/flair/pos-english-fast)' |  POS-tagging |   English |  Ontonotes     |  **98.1** (Accuracy) |(fast model)
+| '[upos](https://huggingface.co/flair/upos-english)' |  POS-tagging (universal) | English | Ontonotes     |  **98.6** (Accuracy) |
+| '[upos-fast](https://huggingface.co/flair/upos-english-fast)' |  POS-tagging (universal) | English | Ontonotes     |  **98.47** (Accuracy) | (fast model)
+| '[frame](https://huggingface.co/flair/frame-english)'  |   Frame Detection |  English | Propbank 3.0     |  **97.54** (F1) |
+| '[frame-fast](https://huggingface.co/flair/frame-english-fast)'  |  Frame Detection |  English | Propbank 3.0     |  **97.31** (F1) | (fast model)
 | 'negation-speculation'  | Negation / speculation |English |  Bioscope | **80.2** (F1) |
 
 

From f45d718a2757060f1c468f4298305ebb6724261f Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:49:58 +0100
Subject: [PATCH 08/15] GH-2132: add model hub links to tagging tutorial

---
 resources/docs/TUTORIAL_2_TAGGING.md | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)
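
Reviewer note (outside the diff): a minimal sketch of how the short IDs in the
tutorial tables relate to the hub links added in this series. `HUB_REPOS`,
`model_card_url` and `tag_text` are hypothetical names used only for
illustration. The ID-to-repository pairs are copied from the links in patches
07 and 08. `tag_text` assumes Flair is installed and downloads the model on
first use.

```python
# Hypothetical lookup table: short tutorial IDs -> HuggingFace hub repositories,
# copied from the links added to the tutorial tables (not an official Flair API).
HUB_REPOS = {
    "ner": "flair/ner-english",
    "ner-fast": "flair/ner-english-fast",
    "ner-large": "flair/ner-english-large",
    "ner-ontonotes": "flair/ner-english-ontonotes",
    "ner-multi": "flair/ner-multi",
    "pos-multi": "flair/upos-multi",
    "de-ner": "flair/ner-german",
}


def model_card_url(model_id: str) -> str:
    """Return the model card URL that the tutorial table links for a short ID."""
    return "https://huggingface.co/" + HUB_REPOS[model_id]


def tag_text(text: str, model_id: str = "ner"):
    """Tag `text` with a pre-trained model; requires flair and a model download."""
    # Imported lazily so that model_card_url() works without flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_id)  # short ID from the tutorial tables
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return sentence


if __name__ == "__main__":
    print(model_card_url("ner-large"))
```

Calling `tag_text("George Washington went to Washington.")` would load the
default 'ner' model. The URL helper alone has no dependency on Flair.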

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index 4a80a1b049..87534cd2de 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -95,8 +95,11 @@ The sentence now has two types of annotation: POS and NER.
 ### List of Pre-Trained Sequence Tagger Models
 
 You choose which pre-trained model you load by passing the appropriate
-string to the `load()` method of the `SequenceTagger` class. Currently, the following pre-trained models
-are provided:
+string to the `load()` method of the `SequenceTagger` class. 
+
+You can browse the full list of our current and community-contributed models on the [__model hub__](https://huggingface.co/models?filter=flair).
+The pre-trained models provided include at least the following (click an ID link for more information
+on the model and an online demo):
 
 #### English Models
 
@@ -128,10 +131,10 @@ The NER models are trained over 4 languages (English, German, Dutch and Spanish)
 
 | ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
 | -------------    | ------------- |------------- |------------- | ------------- | ------------- |
-| 'ner-multi' | NER (4-class) | Multilingual | Conll-03   |  **89.27**  (average F1) | (4 languages)
-| 'ner-multi-fast' | NER (4-class)|  Multilingual |  Conll-03   |  **87.91**  (average F1) | (4 languages)
-| 'pos-multi' |  POS-tagging   |  Multilingual |  UD Treebanks  |  **96.41** (average acc.) |  (12 languages)
-| 'pos-multi-fast' |  POS-tagging |  Multilingual |  UD Treebanks  |  **92.88** (average acc.) | (12 languages) 
+| '[ner-multi](https://huggingface.co/flair/ner-multi)' | NER (4-class) | Multilingual | Conll-03   |  **89.27**  (average F1) | (4 languages)
+| '[ner-multi-fast](https://huggingface.co/flair/ner-multi-fast)' | NER (4-class)|  Multilingual |  Conll-03   |  **87.91**  (average F1) | (4 languages)
+| '[pos-multi](https://huggingface.co/flair/upos-multi)' |  POS-tagging   |  Multilingual |  UD Treebanks  |  **96.41** (average acc.) |  (12 languages)
+| '[pos-multi-fast](https://huggingface.co/flair/upos-multi-fast)' |  POS-tagging |  Multilingual |  UD Treebanks  |  **92.88** (average acc.) | (12 languages) 
 
 You can pass text in any of these languages to the model. In particular, the NER also kind of works for languages it was not trained on, such as French.
 
@@ -139,19 +142,22 @@ You can pass text in any of these languages to the model. In particular, the NER
 
 | ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
 | -------------    | ------------- |------------- |------------- |------------- | ------------ |
-| 'de-ner' | NER (4-class) |  German | Conll-03  |  **87.94** (F1) | |
+| '[de-ner](https://huggingface.co/flair/ner-german)' | NER (4-class) |  German | Conll-03  |  **87.94** (F1) | |
+| '[de-ner-large](https://huggingface.co/flair/ner-german-large)' | NER (4-class) |  German | Conll-03  |  **92.31** (F1) | |
 | 'de-ner-germeval' | NER (4-class) | German | Germeval  |  **84.90** (F1) | |
-| 'de-ner-legal' | NER (legal text) |  German | [LER](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset  |  **96.35** (F1) | |
+| '[de-ner-legal](https://huggingface.co/flair/ner-german-legal)' | NER (legal text) |  German | [LER](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset  |  **96.35** (F1) | |
 | 'de-pos' | POS-tagging | German | UD German - HDT  |  **98.50** (Accuracy) | |
 | 'de-pos-tweets' | POS-tagging | German | German Tweets  |  **93.06** (Accuracy) | [stefan-it](https://github.com/stefan-it/flair-experiments/tree/master/pos-twitter-german) |
 | 'de-historic-indirect' | historical indirect speech | German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
 | 'de-historic-direct' | historical direct speech |  German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
 | 'de-historic-reported' | historical reported speech | German |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
 | 'de-historic-free-indirect' | historical free-indirect speech | German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
-| 'fr-ner' | NER (4-class) | French | [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **95.57** (F1) | [mhham](https://github.com/mhham) |
-| 'nl-ner' | NER (4-class) | Dutch |  [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **92.58** (F1) |  |
+| '[fr-ner](https://huggingface.co/flair/ner-french)' | NER (4-class) | French | [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **95.57** (F1) | [mhham](https://github.com/mhham) |
+| '[es-ner-large](https://huggingface.co/flair/ner-spanish-large)' | NER (4-class) | Spanish | [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **90,54** (F1) | [mhham](https://github.com/mhham) |
+| '[nl-ner](https://huggingface.co/flair/ner-dutch)' | NER (4-class) | Dutch |  [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **92.58** (F1) |  |
+| '[nl-ner-large](https://huggingface.co/flair/ner-dutch-large)' | NER (4-class) | Dutch | Conll-03 |  **95,25** (F1) |  |
 | 'nl-ner-rnn' | NER (4-class) | Dutch | [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **90.79** (F1) | |
-| 'da-ner' | NER (4-class) | Danish |  [Danish NER dataset](https://github.com/alexandrainst/danlp)  |   | [AmaliePauli](https://github.com/AmaliePauli) |
+| '[da-ner](https://huggingface.co/flair/ner-danish)' | NER (4-class) | Danish |  [Danish NER dataset](https://github.com/alexandrainst/danlp)  |   | [AmaliePauli](https://github.com/AmaliePauli) |
 | 'da-pos' | POS-tagging | Danish | [Danish Dependency Treebank](https://github.com/UniversalDependencies/UD_Danish-DDT/blob/master/README.md)  |  | [AmaliePauli](https://github.com/AmaliePauli) |
 | 'ml-pos' | POS-tagging | Malayalam | 30000 Malayalam sentences  | **83** | [sabiqueqb](https://github.com/sabiqueqb) |
 | 'ml-upos' | POS-tagging | Malayalam | 30000 Malayalam sentences | **87** | [sabiqueqb](https://github.com/sabiqueqb) |

From f1761b3edef7b9c4f42534fc2a0fc474e6de00a4 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 21:56:02 +0100
Subject: [PATCH 09/15] GH-2132: update readme

---
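Note for reviewers: a minimal sketch of what the renamed `layer_mean` option changes about the embedding size. It assumes that `layer_mean=True` averages the selected layers element-wise and that `layer_mean=False` concatenates them. Plain Python lists stand in for tensors, and the helper `combine_layers` is a hypothetical name, not Flair API.

```python
from typing import List


def combine_layers(layers: List[List[float]], layer_mean: bool) -> List[float]:
    """Combine the per-layer vectors of one token (hypothetical helper, not Flair API)."""
    if layer_mean:
        # Element-wise average: the result has the size of a single layer.
        return [sum(values) / len(layers) for values in zip(*layers)]
    # Concatenation: the result grows with the number of layers.
    return [value for layer in layers for value in layer]


# Three toy "layers" with two dimensions each for a single token.
token_layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(combine_layers(token_layers, layer_mean=True))   # [3.0, 4.0]
print(combine_layers(token_layers, layer_mean=False))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Under these assumptions, averaging keeps the embedding size fixed regardless of how many layers are selected, which is why it pairs naturally with the new default `layers='all'`.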
 .../docs/embeddings/TRANSFORMER_EMBEDDINGS.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/resources/docs/embeddings/TRANSFORMER_EMBEDDINGS.md b/resources/docs/embeddings/TRANSFORMER_EMBEDDINGS.md
index 7aee364de5..19fb32b87d 100644
--- a/resources/docs/embeddings/TRANSFORMER_EMBEDDINGS.md
+++ b/resources/docs/embeddings/TRANSFORMER_EMBEDDINGS.md
@@ -43,12 +43,12 @@ There are several options that you can set when you init the TransformerWordEmbe
 | Argument             | Default             | Description
 | -------------------- | ------------------- | ------------------------------------------------------------------------------
 | `model` | `bert-base-uncased` | The string identifier of the transformer model you want to use (see above)
-| `layers`             | `-1,-2,-3,-4`       | Defines the layers of the Transformer-based model that produce the embedding
-| `pooling_operation`  | `first`             | See [Pooling operation section](#Pooling-operation).
-| `use_scalar_mix`     | `False`             | See [Scalar mix section](#Scalar-mix).
+| `layers`             | `all`       | Defines the layers of the Transformer-based model that produce the embedding
+| `subtoken_pooling`  | `first`             | See [Pooling operation section](#Pooling-operation).
+| `layer_mean`     | `True`             | See [Layer mean section](#Layer-mean).
 | `batch_size`     | 1             | How many sentences to push through transformer simultaneously (high means faster but more memory usage)
 | `fine_tune`     | `False`             | Whether or not embeddings are fine-tuneable.
-| `allow_long_sentences`     | `False`             | Whether or not texts longer than maximal sequence length are supported.
+| `allow_long_sentences`     | `True`             | Whether or not texts longer than maximal sequence length are supported.
 
 
 ### Layers
@@ -109,12 +109,12 @@ You can choose which one to use by passing this in the constructor:
 
 ```python
 # use first and last subtoken for each word
-embeddings = TransformerWordEmbeddings('bert-base-uncased', pooling_operation='first_last')
+embeddings = TransformerWordEmbeddings('bert-base-uncased', subtoken_pooling='first_last')
 embeddings.embed(sentence)
 print(sentence[0].embedding.size())
 ```
 
-### Scalar mix
+### Layer mean
 
 The Transformer-based models have a certain number of layers. [Liu et. al (2019)](https://arxiv.org/abs/1903.08855)
 propose a technique called scalar mix, that computes a parameterised scalar mixture of user-defined layers.
@@ -122,7 +122,7 @@ propose a technique called scalar mix, that computes a parameterised scalar mixt
 This technique is very useful, because for some downstream tasks like NER or PoS tagging it can be unclear which
 layer(s) of a Transformer-based model perform well, and per-layer analysis can take a lot of time.
 
-To use scalar mix, all Transformer-based embeddings in Flair come with a `use_scalar_mix` argument. The following
+To mix layers in this way, all Transformer-based embeddings in Flair come with a `layer_mean` argument, which averages the selected layers. The following
 example shows how to use scalar mix for a base RoBERTa model on all layers:
 
 ```python
@@ -140,11 +140,12 @@ embedding.embed(sentence)
 
 ### Fine-tuneable or not
 
-In some setups, you may wish to fine-tune the transformer embeddings. In this case, set fine_tune=True in the init method: 
+In some setups, you may wish to fine-tune the transformer embeddings. In this case, set `fine_tune=True` in the init method.
+When fine-tuning, you should use only the topmost layer, so it is best to set `layers='-1'`.
 
 ```python
 # use first and last subtoken for each word
-embeddings = TransformerWordEmbeddings('bert-base-uncased', fine_tune=True)
+embeddings = TransformerWordEmbeddings('bert-base-uncased', fine_tune=True, layers='-1')
 embeddings.embed(sentence)
 print(sentence[0].embedding)
 ```

From 8545c416e595086704a5bfe369a599b4835e2669 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Wed, 3 Mar 2021 22:59:45 +0100
Subject: [PATCH 10/15] GH-2132: update readme

---
 flair/datasets/__init__.py          |  8 ++++----
 resources/docs/TUTORIAL_6_CORPUS.md | 11 ++++++++++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/flair/datasets/__init__.py b/flair/datasets/__init__.py
index 5ee979771f..84846dcdac 100755
--- a/flair/datasets/__init__.py
+++ b/flair/datasets/__init__.py
@@ -10,6 +10,7 @@
 from .sequence_labeling import ANER_CORP
 from .sequence_labeling import BIOFID
 from .sequence_labeling import BIOSCOPE
+from .sequence_labeling import BUSINESS_HUN
 from .sequence_labeling import CONLL_03
 from .sequence_labeling import CONLL_03_GERMAN
 from .sequence_labeling import CONLL_03_DUTCH
@@ -55,7 +56,6 @@
 from .sequence_labeling import WSD_UFSAC
 from .sequence_labeling import WNUT_2020_NER
 from .sequence_labeling import XTREME
-from .sequence_labeling import BUSINESS_HUN
 
 # Expose all document classification datasets
 from .document_classification import ClassificationCorpus
@@ -63,6 +63,9 @@
 from .document_classification import CSVClassificationCorpus
 from .document_classification import CSVClassificationDataset
 from .document_classification import AMAZON_REVIEWS
+from .document_classification import COMMUNICATIVE_FUNCTIONS
+from .document_classification import GERMEVAL_2018_OFFENSIVE_LANGUAGE
+from .document_classification import GO_EMOTIONS
 from .document_classification import IMDB
 from .document_classification import NEWSGROUPS
 from .document_classification import SENTIMENT_140
@@ -74,13 +77,10 @@
 from .document_classification import SENTEVAL_SST_GRANULAR
 from .document_classification import TREC_50
 from .document_classification import TREC_6
-from .document_classification import COMMUNICATIVE_FUNCTIONS
 from .document_classification import WASSA_ANGER
 from .document_classification import WASSA_FEAR
 from .document_classification import WASSA_JOY
 from .document_classification import WASSA_SADNESS
-from .document_classification import GO_EMOTIONS
-from .document_classification import GERMEVAL_2018_OFFENSIVE_LANGUAGE
 
 # Expose all treebanks
 from .treebanks import UniversalDependenciesCorpus
diff --git a/resources/docs/TUTORIAL_6_CORPUS.md b/resources/docs/TUTORIAL_6_CORPUS.md
index 9122ba6431..7a6abbac3b 100644
--- a/resources/docs/TUTORIAL_6_CORPUS.md
+++ b/resources/docs/TUTORIAL_6_CORPUS.md
@@ -165,6 +165,7 @@ data the first time you call the corresponding constructor ID. The following dat
 | 'ANER_CORP' | Arabic  |  [Arabic Named Entity Recognition Corpus](http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp) 4-class NER |
 | 'BIOFID' | German  |  [CoNLL-03](https://www.aclweb.org/anthology/K19-1081/) Biodiversity literature NER |
 | 'BIOSCOPE' | English  |  [BioScope](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-S11-S9/) biomedical texts annotated for uncertainty, negation and their scopes |
+| 'BUSINESS_HUN' | Hungarian | NER on Hungarian business news |
 | 'CONLL_03_DUTCH' | Dutch  |  [CoNLL-03](https://www.clips.uantwerpen.be/conll2002/ner/) 4-class NER |
 | 'CONLL_03_SPANISH' | Spanish  |  [CoNLL-03](https://www.clips.uantwerpen.be/conll2002/ner/) 4-class NER |
 | 'DANE' | Danish | [DaNE dataset](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank) | 
@@ -176,6 +177,7 @@ data the first time you call the corresponding constructor ID. The following dat
 | 'NER_BASQUE' | Basque  |  [NER dataset for Basque](http://ixa2.si.ehu.eus/eiec/) |
 | 'NER_FINNISH' | Finnish | [Finer-data](https://github.com/mpsilfve/finer-data) | 
 | 'NER_SWEDISH' | Swedish | [Swedish Spraakbanken NER](https://github.com/klintan/swedish-ner-corpus/) 4-class NER |
+| 'STACKOVERFLOW_NER' | English  | NER on StackOverflow posts |
 | 'TURKU_NER' | Finnish | [TURKU_NER](https://github.com/TurkuNLP/turku-ner-corpus) NER corpus created by the Turku NLP Group, University of Turku, Finland |
 | 'TWITTER_NER' | English  |  [Twitter NER dataset](https://github.com/aritter/twitter_nlp/) |
 | 'WEIBO_NER' | Chinese  | [Weibo NER corpus](https://paperswithcode.com/sota/chinese-named-entity-recognition-on-weibo-ner/).  |
@@ -266,6 +268,8 @@ for the purpose of training multilingual frame detection systems.
 | -------------    | ------------- |------------- |
 | 'AMAZON_REVIEWS' | English |  [Amazon product reviews](https://nijianmo.github.io/amazon/index.html/) dataset with sentiment annotation |
 | 'COMMUNICATIVE_FUNCTIONS' | English |  [Communicative functions](https://github.com/Alab-NII/FECFevalDataset) of sentences in scholarly papers |
+| 'GERMEVAL_2018_OFFENSIVE_LANGUAGE' | German | Offensive language detection for German |
+| 'GO_EMOTIONS' | English | [GoEmotions dataset](https://github.com/google-research/google-research/tree/master/goemotions) Reddit comments labeled with 27 emotions |
 | 'IMDB' | English |  [IMDB](http://ai.stanford.edu/~amaas/data/sentiment/) dataset of movie reviews with sentiment annotation  |
 | 'NEWSGROUPS' | English | The popular [20 newsgroups](http://qwone.com/~jason/20Newsgroups/) classification dataset |
 | 'SENTIMENT_140' | English | [Tweets dataset](http://help.sentiment140.com/for-students/) with sentiment annotation |
@@ -275,7 +279,6 @@ for the purpose of training multilingual frame detection systems.
 | 'SENTEVAL_MPQA' | English | Opinion-polarity dataset of [SentEval](https://github.com/facebookresearch/SentEval) with opinion-polarity annotation |
 | 'SENTEVAL_SST_BINARY' | English | Stanford sentiment treebank dataset of of [SentEval](https://github.com/facebookresearch/SentEval) with sentiment annotation |
 | 'SENTEVAL_SST_GRANULAR' | English | Stanford sentiment treebank dataset of of [SentEval](https://github.com/facebookresearch/SentEval) with fine-grained sentiment annotation |
-| 'GO_EMOTIONS' | English | [GoEmotions dataset](https://github.com/google-research/google-research/tree/master/goemotions) Reddit comments labeled with 27 emotions |
 | 'TREC_6', 'TREC_50' | English | The [TREC](http://cogcomp.org/Data/QA/QC/) question classification dataset |
 
 
@@ -287,6 +290,12 @@ for the purpose of training multilingual frame detection systems.
 | 'WASSA_JOY' | English | The [WASSA](https://competitions.codalab.org/competitions/16380#learn_the_details) emotion-intensity detection challenge (joy) |
 | 'WASSA_SADNESS' | English | The [WASSA](https://competitions.codalab.org/competitions/16380#learn_the_details) emotion-intensity detection challenge (sadness) |
 
+#### Recognizing Textual Entailment
+| ID(s) | Languages | Description |
+| -------------    | ------------- |------------- |
+| 'GLUE_RTE' | English | The RTE task from the GLUE benchmark |
+| 'SUPERGLUE_RTE' | English | The RTE task from the SuperGLUE benchmark |
+
 
 #### Experimental: Similarity Learning
 | ID(s) | Languages | Description |

From 828e2b9adac6ccb75758dfd1e62370b25bdbc2d5 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Thu, 4 Mar 2021 13:15:38 +0100
Subject: [PATCH 11/15] GH-2132: bump Flair version

---
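Note for reviewers: the version string is bumped in two files, so a quick consistency check can catch a missed spot. The sketch below is an assumption about how such a check could look; `find_version` is a hypothetical helper, and the two strings are shortened stand-ins for the contents of `flair/__init__.py` and `setup.py` after this patch.

```python
import re


def find_version(text: str) -> str:
    """Return the first version assigned as `__version__ = "..."` or `version="..."`."""
    match = re.search(r'(?:__version__|version)\s*=\s*"([^"]+)"', text)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


# Shortened stand-ins for the two files touched by this patch.
init_py = '__version__ = "0.8"\n'
setup_py = 'setup(\n    name="flair",\n    version="0.8",\n)\n'

# Both files should report the same version after the bump.
assert find_version(init_py) == find_version(setup_py) == "0.8"
print(find_version(init_py))  # 0.8
```

In a real check, the two strings would be read from the files on disk instead of being written inline.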
 flair/__init__.py | 2 +-
 setup.py          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/flair/__init__.py b/flair/__init__.py
index ecb28ec24a..5702389c59 100644
--- a/flair/__init__.py
+++ b/flair/__init__.py
@@ -25,7 +25,7 @@
 
 import logging.config
 
-__version__ = "0.7"
+__version__ = "0.8"
 
 logging.config.dictConfig(
     {
diff --git a/setup.py b/setup.py
index 8246264559..1bf6d311a5 100644
--- a/setup.py
+++ b/setup.py
@@ -5,7 +5,7 @@
 
 setup(
     name="flair",
-    version="0.7",
+    version="0.8",
     description="A very simple framework for state-of-the-art NLP",
     long_description=open("README.md", encoding="utf-8").read(),
     long_description_content_type="text/markdown",

From 41587455c04066519875d3d1e872de864018f74a Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Thu, 4 Mar 2021 13:27:49 +0100
Subject: [PATCH 12/15] Update README.md

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index b88a50db7f..c5678fc6f3 100644
--- a/README.md
+++ b/README.md
@@ -30,11 +30,11 @@ Flair ships with state-of-the-art models for a range of NLP tasks. For instance,
 
 | Language | Dataset | Flair | Best published | Model card & demo
 |  ---  | ----------- | ---------------- | ------------- | ------------- |
-| NER English | Conll-03 (4-class)   |  **94.09** (F1)  | *94.3 [(Yamada et al., 2018)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | [Flair English 4-class NER demo](https://huggingface.co/flair/ner-english-large)  |
-| NER English | Ontonotes (18-class)  |  **90.93** (F1)  | *91,3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large) |
-| NER German  | Conll-03 (4-class)   |  **92,31** (F1)  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair German 4-class NER demo](https://huggingface.co/flair/ner-german-large)  |
-| NER Dutch  | Conll-03  (4-class)  |  **95,25** (F1)  | *93.7 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Dutch 4-class NER demo](https://huggingface.co/flair/ner-dutch-large)  |
-| NER Spanish  | Conll-03 (4-class)   |  **90,54** (F1)  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Spanish 18-class NER demo](https://huggingface.co/flair/ner-spanish-large)  |
+| NER English | Conll-03 (4-class)   |  **94.09**  | *94.3 [(Yamada et al., 2018)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | [Flair English 4-class NER demo](https://huggingface.co/flair/ner-english-large)  |
+| NER English | Ontonotes (18-class)  |  **90.93**  | *91,3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large) |
+| NER German  | Conll-03 (4-class)   |  **92,31**  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair German 4-class NER demo](https://huggingface.co/flair/ner-german-large)  |
+| NER Dutch  | Conll-03  (4-class)  |  **95,25**  | *93.7 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Dutch 4-class NER demo](https://huggingface.co/flair/ner-dutch-large)  |
+| NER Spanish  | Conll-03 (4-class)   |  **90,54** | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Spanish 18-class NER demo](https://huggingface.co/flair/ner-spanish-large)  |
 
 **New:** Most Flair sequence tagging models (named entity recognition, part-of-speech tagging etc.) are now hosted 
 on the [__🤗 HuggingFace model hub__](https://huggingface.co/models?filter=flair)! You can browse models, check detailed information on how they were trained, and even try each model out online!

From a2f04a498aef4fd379bbf08d769bd2c519453165 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Thu, 4 Mar 2021 13:28:46 +0100
Subject: [PATCH 13/15] Update README.md

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index c5678fc6f3..88f3a36a21 100644
--- a/README.md
+++ b/README.md
@@ -30,11 +30,11 @@ Flair ships with state-of-the-art models for a range of NLP tasks. For instance,
 
 | Language | Dataset | Flair | Best published | Model card & demo
 |  ---  | ----------- | ---------------- | ------------- | ------------- |
-| NER English | Conll-03 (4-class)   |  **94.09**  | *94.3 [(Yamada et al., 2018)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | [Flair English 4-class NER demo](https://huggingface.co/flair/ner-english-large)  |
-| NER English | Ontonotes (18-class)  |  **90.93**  | *91,3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large) |
-| NER German  | Conll-03 (4-class)   |  **92,31**  | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair German 4-class NER demo](https://huggingface.co/flair/ner-german-large)  |
-| NER Dutch  | Conll-03  (4-class)  |  **95,25**  | *93.7 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Dutch 4-class NER demo](https://huggingface.co/flair/ner-dutch-large)  |
-| NER Spanish  | Conll-03 (4-class)   |  **90,54** | *90.3 [(Yu et al., 2016)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Spanish 18-class NER demo](https://huggingface.co/flair/ner-spanish-large)  |
+| English | Conll-03 (4-class)   |  **94.09**  | *94.3 [(Yamada et al., 2020)](https://doi.org/10.18653/v1/2020.emnlp-main.523)* | [Flair English 4-class NER demo](https://huggingface.co/flair/ner-english-large)  |
+| English | Ontonotes (18-class)  |  **90.93**  | *91.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair English 18-class NER demo](https://huggingface.co/flair/ner-english-ontonotes-large) |
+| German  | Conll-03 (4-class)   |  **92.31**  | *90.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair German 4-class NER demo](https://huggingface.co/flair/ner-german-large)  |
+| Dutch  | Conll-03  (4-class)  |  **95.25**  | *93.7 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Dutch 4-class NER demo](https://huggingface.co/flair/ner-dutch-large)  |
+| Spanish  | Conll-03 (4-class)   |  **90.54** | *90.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Spanish 4-class NER demo](https://huggingface.co/flair/ner-spanish-large)  |
 
 **New:** Most Flair sequence tagging models (named entity recognition, part-of-speech tagging etc.) are now hosted 
 on the [__🤗 HuggingFace model hub__](https://huggingface.co/models?filter=flair)! You can browse models, check detailed information on how they were trained, and even try each model out online!

From daa1c02868ebd908cc605cd8bfa0c84b4e050e28 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Thu, 4 Mar 2021 13:39:01 +0100
Subject: [PATCH 14/15] Update TUTORIAL_2_TAGGING.md

---
 resources/docs/TUTORIAL_2_TAGGING.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/resources/docs/TUTORIAL_2_TAGGING.md b/resources/docs/TUTORIAL_2_TAGGING.md
index 87534cd2de..c3cad04b5a 100644
--- a/resources/docs/TUTORIAL_2_TAGGING.md
+++ b/resources/docs/TUTORIAL_2_TAGGING.md
@@ -153,7 +153,7 @@ You can pass text in any of these languages to the model. In particular, the NER
 | 'de-historic-reported' | historical reported speech | German |  @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
 | 'de-historic-free-indirect' | historical free-indirect speech | German | @redewiedergabe project |  **87.94** (F1) | [redewiedergabe](https://github.com/redewiedergabe/tagger) | |
 | '[fr-ner](https://huggingface.co/flair/ner-french)' | NER (4-class) | French | [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **95.57** (F1) | [mhham](https://github.com/mhham) |
-| '[es-ner-large](https://huggingface.co/flair/ner-spanish-large)' | NER (4-class) | Spanish | [WikiNER (aij-wikiner-fr-wp3)](https://github.com/dice-group/FOX/tree/master/input/Wikiner)  |  **90,54** (F1) | [mhham](https://github.com/mhham) |
+| '[es-ner-large](https://huggingface.co/flair/ner-spanish-large)' | NER (4-class) | Spanish | CoNLL-03  |  **90.54** (F1) | [mhham](https://github.com/mhham) |
 | '[nl-ner](https://huggingface.co/flair/ner-dutch)' | NER (4-class) | Dutch |  [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **92.58** (F1) |  |
 | '[nl-ner-large](https://huggingface.co/flair/ner-dutch-large)' | NER (4-class) | Dutch | Conll-03 |  **95,25** (F1) |  |
 | 'nl-ner-rnn' | NER (4-class) | Dutch | [CoNLL 2002](https://www.clips.uantwerpen.be/conll2002/ner/)  |  **90.79** (F1) | |

From 8d5b7e6b240fcd2e1a0c300ed7130b2566468b05 Mon Sep 17 00:00:00 2001
From: Alan Akbik <alan.akbik@gmail.com>
Date: Thu, 4 Mar 2021 14:29:28 +0100
Subject: [PATCH 15/15] Update TUTORIAL_6_CORPUS.md

---
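Note for reviewers: the reworded paragraph says that a label dictionary holds all labels to predict and that a label type must be named when a corpus carries several. The sketch below illustrates that idea on toy data only. It does not use Flair: `build_label_dictionary` is a hypothetical helper, and the real `make_label_dictionary` method may order or store labels differently.

```python
from typing import Dict, List


def build_label_dictionary(
    examples: List[Dict[str, List[str]]], label_type: str
) -> Dict[str, int]:
    """Map each distinct label of `label_type` to an index, in order of first appearance."""
    label_to_index: Dict[str, int] = {}
    for example in examples:
        for label in example.get(label_type, []):
            if label not in label_to_index:
                label_to_index[label] = len(label_to_index)
    return label_to_index


# Toy corpus with two label types per sentence.
examples = [
    {"upos": ["DET", "NOUN", "VERB"], "ner": ["O", "O", "O"]},
    {"upos": ["PROPN", "VERB"], "ner": ["PER", "O"]},
]

print(build_label_dictionary(examples, "upos"))  # {'DET': 0, 'NOUN': 1, 'VERB': 2, 'PROPN': 3}
print(build_label_dictionary(examples, "ner"))   # {'O': 0, 'PER': 1}
```

The same toy corpus yields a different dictionary for each label type, which is the reason the label type has to be specified.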
 resources/docs/TUTORIAL_6_CORPUS.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/resources/docs/TUTORIAL_6_CORPUS.md b/resources/docs/TUTORIAL_6_CORPUS.md
index 7a6abbac3b..cc0af3f4c9 100644
--- a/resources/docs/TUTORIAL_6_CORPUS.md
+++ b/resources/docs/TUTORIAL_6_CORPUS.md
@@ -98,8 +98,9 @@ Corpus: 12543 train + 2002 dev + 2077 test sentences
 Corpus: 1255 train + 201 dev + 208 test sentences
 ```
 
-For many learning tasks you need to create a target dictionary. Thus, the `Corpus` enables you to create your
-tag or label dictionary, depending on the task you want to learn. Simple execute the following code snippet to do so:
+For many learning tasks you need to create a "dictionary" that contains all the labels you want to predict.
+You can generate this dictionary directly out of the `Corpus` by calling the method `make_label_dictionary`. 
+If a corpus has multiple label types, you additionally need to specify the label type for which you want to produce the dictionary:
 
 ```python
 # create tag dictionary for a PoS task
@@ -115,6 +116,8 @@ corpus = flair.datasets.TREC_6()
 print(corpus.make_label_dictionary())
 ```
 
+This should print out different label dictionaries for different datasets and tasks.
+
 Another useful function is `obtain_statistics()` which returns you a python dictionary with useful statistics about your
 dataset. Using it, for example, on the IMDB dataset like this