This repo contains the code and results for the project Checklisting SRL (the PDF report is available in the repo).
The notebooks contain the code for generating and testing the models on the test set.
Each cell is a test and should be run for the model of interest. Results are automatically saved to the file results+[model]+[task].json.
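As a rough illustration, a saved results file can be loaded and inspected after a run. This is a minimal sketch, not the notebooks' own code: the way the file name is assembled and the structure of the stored JSON are assumptions based on the description above.

```python
import json

# Minimal sketch: "model" and "task" are placeholders for the names used
# in the notebook you ran; the JSON structure depends on that notebook.
model = "model"
task = "task"

# Assumed file name, following the results+[model]+[task].json pattern above.
path = "results" + model + task + ".json"

with open(path) as f:
    results = json.load(f)

# Inspect whatever the notebook stored for this model/task combination.
print(json.dumps(results, indent=2))
```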
Standard automatic evaluations of NLP systems on a test split of data offer the advantage of comfortably comparing the performance between different approaches, but they reveal little about the real-world performance of these models. Studying in depth the behavior of a system given unseen or challenging input is in fact considered good practice and insightful for future improvements. For this reason, Ribeiro et al. (2020) propose a framework called Checklist to help users and developers plan and design a challenge data set that allows the different capabilities required to solve a certain task to be tested individually. In this work, following the Checklist approach, the main capabilities for Semantic Role Labeling are discussed and a set of test data is proposed based on them. The behaviour of two state-of-the-art models is then evaluated on the set and their performance discussed. Results reveal that these models still struggle with some of their core capabilities when tested on these targeted test sentences.