This is a tool to download the CoNLL Shared Task classification tables and compute metrics on them. It can also compute metrics on local results and display the outliers for a parser.
This script has 8 features:
- Download the parser ranking tables from the CoNLL Shared Task website for a particular year.
- Calculate different metrics on the downloaded data, displaying the results on screen and generating the corresponding graphs.
- Obtain the ranking of the parsers used in the local experiments.
- For a parser, show the outliers observed in the graphs.
- For the outliers of a parser, calculate some statistics on them.
- For a corpus of the SemEval-2022 Shared Task 10, find how many sentences have multiple opinions that share some part.
- For a corpus of the SemEval-2022 Shared Task 10, separate the files by the number of opinions per sentence.
- For a corpus of the SemEval-2022 Shared Task 10, generate graphs with F1 values for each number of opinions.
Note that the script uses the data, experiments, cache and charts folders (relative to the base directory) for some of its features.
Install the necessary dependencies listed in the requirements.txt file:
$ pip3 install -r requirements.txt
To run the script, from a terminal in the root directory, type:
$ ./conllSharedTasks.py
This will show the usage:
usage: conllSharedTasks.py [-h] {rankings,metrics,experiments,outliers,statistics,analise,separate,opinions} ...
Gets the rankings and shows some metrics on the CoNLL Shared Task of the specified year or in custom experiments.
optional arguments:
-h, --help show this help message and exit
Commands:
{rankings,metrics,experiments,outliers,statistics,analise,separate,opinions}
rankings Get the classification results of the specified type and year from the website.
metrics Gets the indicated metric on the selected CoNLL Shared Task dataset.
experiments For a ranking, gets a leaderboard of the parsers used in the customised experiments.
outliers For a given parser, it shows the subsets in which it obtained its best position.
statistics It shows different statistics on the outliers of the language sets.
analise For a given corpus, analyse whether it has sentences with several opinions in which some element is shared between them.
separate Divide the files of each corpus into as many files as the number of opinions.
opinions Generates graphs with F1 values for each number of opinions.
If you want to know how to use a specific command, for example the rankings command, type:
$ ./conllSharedTasks.py rankings --help
And it will show the help:
usage: conllSharedTasks.py rankings [-h] --type {all,las,mlas,blex,uas,clas,upos,xpos,ufeats,alltags,lemmas,sentences,words,tokens} --year {17,18}
optional arguments:
-h, --help show this help message and exit
--type {all,las,mlas,blex,uas,clas,upos,xpos,ufeats,alltags,lemmas,sentences,words,tokens}
The type of the ranking.
--year {17,18} The year of the shared task.
If you get a permissions error when running the script, make it executable:
$ chmod u+x conllSharedTasks.py
Then run the script again.
For each type of metric, the website distinguishes between individual treebank rankings and group rankings. Take this distinction into account when setting the --section parameter.
To download all the rankings for 2018:
$ ./conllSharedTasks.py rankings --type all --year 18
To download a specific ranking for the year 2018:
$ ./conllSharedTasks.py rankings --type blex --year 18
For a specific classification:
$ ./conllSharedTasks.py metrics --ranking blex --section individual --treebank_set_size 10 --sampling_size 100000 --cache_samples yes
To indicate several classifications at the same time, they must be separated by spaces:
$ ./conllSharedTasks.py metrics --ranking las uas blex --section individual --treebank_set_size 10 --sampling_size 100000 --cache_samples yes
Note: Use --cache_samples to store the generated subsets on disk and save time on subsequent invocations of the command.
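The idea behind --cache_samples can be pictured with a small sketch: treebank subsets are sampled once, written to disk, and reloaded on later invocations. This is a hypothetical helper, not the script's actual code, and its cache format is an assumption.

```python
import json
import os
import random


def sample_treebank_sets(treebanks, set_size, n_samples, cache_path=None, seed=0):
    """Sample random treebank subsets, optionally caching them on disk.

    Hypothetical helper: the real script's sampling and cache format
    may differ.
    """
    # Reuse previously generated subsets if a cache file exists.
    if cache_path and os.path.exists(cache_path):
        with open(cache_path) as f:
            return [tuple(s) for s in json.load(f)]
    rng = random.Random(seed)
    samples = [tuple(rng.sample(treebanks, set_size)) for _ in range(n_samples)]
    # Persist the subsets so later invocations can skip the sampling step.
    if cache_path:
        with open(cache_path, "w") as f:
            json.dump(samples, f)
    return samples
```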
This command expects the following directory and file structure:
experiments/
├── metric_1
│ └── section_type
│ ├── treebank.csv
│ └── ...
└── metric_2
└── section_type
├── treebank.csv
└── ...
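Assuming one CSV file per treebank as shown above, the layout could be traversed like this (an illustrative sketch, not the script's actual loader):

```python
import os


def collect_experiment_files(root="experiments"):
    """Map (metric, section) pairs to the treebank CSV paths found under
    the experiments/ layout described above. Illustrative sketch only."""
    files = {}
    for metric in sorted(os.listdir(root)):
        metric_dir = os.path.join(root, metric)
        if not os.path.isdir(metric_dir):
            continue
        for section in sorted(os.listdir(metric_dir)):
            section_dir = os.path.join(metric_dir, section)
            if not os.path.isdir(section_dir):
                continue
            csvs = [os.path.join(section_dir, name)
                    for name in sorted(os.listdir(section_dir))
                    if name.endswith(".csv")]
            files[(metric, section)] = csvs
    return files
```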
With the data attached to this repository, only the LAS and UAS options are available. If you want to add more metrics, modify the argument definition in the conllSharedTasks.py file, adding the new metrics to the choices option:
subparser.add_argument('--ranking', type=str, choices=['las', 'uas', 'new_option_1', 'new_option_2', ...], nargs='+', required=True, help="The ranking(s) to compare scores. To indicate several at the same time, they must be separated by spaces.")
For a specific metric:
$ ./conllSharedTasks.py experiments --ranking las --section individual --treebank_set_size 10 --sampling_size 100000 --cache_samples yes
To indicate several metrics at the same time, they must be separated by spaces:
$ ./conllSharedTasks.py experiments --ranking las uas --section individual --treebank_set_size 10 --sampling_size 100000 --cache_samples yes
Note: Use --cache_samples to store the generated subsets on disk and save time on subsequent invocations of the command.
Note: Because parser names can contain spaces or single quotes, each name must be enclosed in double quotes.
For a specific parser in a specific metric:
$ ./conllSharedTasks.py outliers --parser "HIT-SCIR (Harbin)" --section individual --ranking las --limit 20 --show_best no --treebank_set_size 10 --sampling_size 100000 --cache_samples yes
To indicate several parsers and/or metrics at the same time, they must be separated by spaces:
$ ./conllSharedTasks.py outliers --parser "HIT-SCIR (Harbin)" "AntNLP (Shanghai)" --section individual --ranking las uas --limit 20 --show_best no --treebank_set_size 10 --sampling_size 100000 --cache_samples yes
- --show_best: This parameter controls whether the best outliers (yes) or the worst outliers (no) are displayed.
- Showing the best outliers makes sense in the case of parsers that perform poorly in general, but have subsets where they do much better than expected.
- Showing the worst outliers makes sense in the case of parsers that perform well in general, but have subsets where they do much worse than expected.
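One way to picture this distinction is the following sketch, written under assumed semantics rather than the script's actual logic: given the parser's rank on each subset, keep the subsets whose rank deviates most from the median rank, on the better side for --show_best yes and the worse side for --show_best no.

```python
from statistics import median


def outlier_subsets(ranks_by_subset, limit=3, show_best=False):
    """Return the subsets where the parser's rank deviates most from its
    median rank (lower rank number = better position). Hypothetical sketch."""
    med = median(ranks_by_subset.values())
    if show_best:
        # Subsets where the parser did better than its typical position.
        candidates = {s: r for s, r in ranks_by_subset.items() if r < med}
    else:
        # Subsets where the parser did worse than its typical position.
        candidates = {s: r for s, r in ranks_by_subset.items() if r > med}
    return sorted(candidates, key=lambda s: abs(candidates[s] - med),
                  reverse=True)[:limit]
```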
Note: Use --cache_samples to store the generated subsets on disk and save time on subsequent invocations of the command.
$ ./conllSharedTasks.py statistics --file Desktop/statistics_worst_udpipe-future.csv
The file must be a CSV without spaces between the values, for example:
hr_set,fo_oft,cs_pud,ko_gsd,th_pud,he_htb,cs_pdt,uk_iu,hsb_ufal,pl_sz
pl_sz,la_ittb,pcm_nsc,th_pud,hi_hdtb,hsb_ufal,grc_proiel,ko_kaist,sl_sst,ug_udt
fa_seraji,it_postwita,no_nynorsklia,hsb_ufal,ca_ancora,ru_syntagrus,es_ancora,fo_oft,ur_udtb,nl_alpino
uk_iu,sr_set,hsb_ufal,hi_hdtb,en_ewt,sl_sst,cs_pdt,ja_gsd,pcm_nsc,sv_talbanken
en_gum,es_ancora,cs_pdt,pl_sz,hu_szeged,ja_gsd,gl_ctg,ja_modern,hsb_ufal,ko_kaist
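Each row of that file lists the treebanks of one outlier subset, so a natural statistic is how often each treebank recurs across rows. A minimal sketch follows; it is hypothetical, and the statistics command may compute more than this.

```python
import csv
from collections import Counter


def treebank_frequencies(path):
    """Count how often each treebank appears across all rows of an
    outliers CSV. Illustrative only."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts.update(cell.strip() for cell in row if cell.strip())
    return counts
```

On the example above, hsb_ufal appears in every row, so it would top the count with 5 occurrences.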
$ ./conllSharedTasks.py analise --file Desktop/SemEval-2022_Shared_Task_10/data/opener_es/test.json
This will show a table with the results for each measure:
- For each number of opinions per sentence: how many sentences have that many opinions, and in how many of them an opinion shares some part with the other opinions.
| Number of opinions | Occurrences | Number of times a sentence shares parts |
|----------------------|---------------|-------------------------------------------|
| 1 | 159 | 0 |
| 2 | 77 | 32 |
| 3 | 41 | 31 |
| 4 | 34 | 27 |
| 5 | 17 | 17 |
| 6 | 5 | 5 |
| 7 | 8 | 5 |
| 8 | 6 | 6 |
| 9 | 5 | 5 |
| 10 | 3 | 3 |
| 11 | 2 | 2 |
| 12 | 1 | 1 |
| 13 | 2 | 2 |
| 14 | 1 | 1 |
| 17 | 1 | 1 |
- For each part of an opinion, it shows the number of collisions of each size.
- The "All Cx" column shows how many times a part of an opinion collides with all other opinions in a sentence.
- The column "Not All Cx" shows how many times a part of an opinion does not collide with all other opinions in a sentence. In this case, repetitions are not taken into account, so the numbers show single occurrences.
| | C1 | All C1 | Not All C1 | C2 | All C2 | Not All C2 | C3 | All C3 | C4 | Not All C4 |
|--------|------|----------|--------------|------|----------|--------------|------|----------|------|--------------|
| Source | 17 | 10 | 2 | 9 | 3 | 1 | 1 | 1 | 1 | 1 |
| | C1 | All C1 | Not All C1 | C2 | All C2 | Not All C2 | C3 | All C3 | Not All C3 | C4 | Not All C4 |
|--------|------|----------|--------------|------|----------|--------------|------|----------|--------------|------|--------------|
| Target | 535 | 114 | 5 | 96 | 20 | 3 | 21 | 11 | 2 | 1 | 1 |
| | C1 | All C1 | Not All C1 | C2 | All C2 | Not All C2 | C3 | All C3 | Not All C3 | C4 | All C4 | Not All C4 | C5 | All C5 | Not All C5 | C6 | Not All C6 |
|------------------|------|----------|--------------|------|----------|--------------|------|----------|--------------|------|----------|--------------|------|----------|--------------|------|--------------|
| Polar_expression | 813 | 159 | 4 | 43 | 9 | 3 | 4 | 1 | 1 | 5 | 1 | 1 | 4 | 1 | 2 | 1 | 1 |
$ ./conllSharedTasks.py separate --original data/original --separated data/separated
This will generate files in the following format: name-number_opinions.json
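Assuming each corpus file is a JSON list of sentences with an opinions field, as in the SemEval-2022 Task 10 format, the split can be sketched as follows (illustrative only; the real command processes a whole folder of corpora):

```python
import json
import os
from collections import defaultdict


def separate_by_opinions(src_path, out_dir):
    """Split one corpus file into name-<n>_opinions.json files, one file
    per opinion count. Sketch only; field names follow the SemEval format."""
    with open(src_path) as f:
        sentences = json.load(f)
    # Group the sentences by their number of opinions.
    groups = defaultdict(list)
    for sent in sentences:
        groups[len(sent.get("opinions", []))].append(sent)
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    for n, group in groups.items():
        out_path = os.path.join(out_dir, "%s-%d_opinions.json" % (stem, n))
        with open(out_path, "w") as f:
            json.dump(group, f, ensure_ascii=False, indent=2)
```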
$ ./conllSharedTasks.py opinions --folder data/opinions/ --scores test --display
- folder: Folder where the CSV files are located.
- Note: CSV files must have a header on the first line, as the program interprets the first line as the column names.
- If there is no value for a given number of opinions, leave the cell empty; the program will interpret it as None (see #4). For example:
opinions,dev,test
0,0.000,0.000
1,0.372,0.292
2,0.394,0.343
3,0.400,0.500
4,,0.484
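A CSV in that shape, with empty cells mapped to None, can be read with something like this (an illustrative sketch, not the script's code):

```python
import csv


def read_opinion_scores(path):
    """Read an opinions/dev/test CSV, treating the first line as the
    header and mapping empty cells to None. Sketch only."""
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append({k: int(v) if k == "opinions"
                         else (float(v) if v else None)
                         for k, v in row.items()})
    return rows
```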
- scores: The type of scores to be displayed on the graphs (dev or test).
- display: If specified, displays the values of the points on the graph.
The charts will be generated inside the charts/opinions/ folder.
MIT License
Copyright (c) 2021 Iago Alonso Alonso
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.