Repository for the DISRPT 2019 shared task on Discourse Unit Segmentation
The DISRPT 2019 workshop introduces the first iteration of a cross-formalism shared task on discourse unit segmentation. Since all major discourse parsing frameworks imply a segmentation of texts into discourse units, learning segmentations for and from diverse resources is a promising area for converging methods and insights. We provide training, development and test datasets from all available languages and treebanks in the RST, SDRT and PDTB formalisms, using a uniform format. Because different corpora, languages and frameworks use different guidelines for segmentation, the shared task is meant to promote the design of flexible methods for dealing with various guidelines, and to help push forward the discussion of standards for discourse units. For datasets which have treebanks, we will evaluate in two different scenarios: with and without gold syntax, or otherwise using provided automatic parses for comparison.
https://sites.google.com/view/disrpt2019/
- Fri, December 28, 2018 - shared task sample data release
- Mon, January 21, 2019 - training data release
- Fri, February 15, 2019 - test data release
- Thu, February 28, 2019 - papers due (shared task & regular workshop papers)
- Wed, March 27, 2019 - notification of acceptance
- Fri, April 5, 2019 - camera-ready papers due
- June 6/7, 2019 (TBD) - workshop
The shared task repository currently comprises the following directories (to be extended as the task progresses):
- sample - sample data to illustrate formats, provided ahead of the release of the training data (this data will be included in the training data)
- utils - scripts for validating, evaluating and generating data formats
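
To give a rough idea of what scoring a segmenter can look like, here is a minimal Python sketch that is not taken from the scripts in utils. It assumes, purely for illustration, that a segmentation is encoded as one label per token (1 if the token begins a new discourse unit, 0 otherwise) and that systems are compared on precision, recall and F1 over those boundary labels. The function name `boundary_prf` and the label encoding are hypothetical; the official data format and metric are defined by the task's own scripts and may differ.

```python
from typing import List, Tuple


def boundary_prf(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over segment-initial tokens.

    Assumption (hypothetical encoding): each list has one entry per token,
    1 if the token begins a new discourse unit and 0 otherwise.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")

    # Count boundary decisions token by token.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # Guard against division by zero when no boundaries are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [1, 0, 0, 1, 0, 1]  # three gold units
    pred = [1, 0, 1, 1, 0, 0]  # one spurious boundary, one missed boundary
    print(boundary_prf(gold, pred))
```

Scoring only the positive (boundary) class keeps the large number of non-boundary tokens from inflating the result, which is why plain token accuracy is usually a poor fit for segmentation.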