Initial Release of HLGD

@tingofurro tingofurro released this 05 May 14:55
· 12 commits to main since this release
65174a1

We release the HLGD dataset and trained models.

Dataset:

  • hlgd_original_annotations.json: A JSON file containing the 10 timelines, each with the original five annotations and the global group (aggregate of the five annotations).
  • hlgd_classification_0.1.zip: A ZIP archive containing three files (train.json, dev.json and test.json), each holding one split of the final classification dataset described in the paper. The files are compatible with Hugging Face's datasets library.
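
As a minimal sketch of reading the splits without extra dependencies, the snippet below opens the archive with Python's standard library. The helper name `load_split` is hypothetical, and the on-disk layout is an assumption: the release notes do not say whether each file is a single JSON array or JSON Lines, so the sketch accepts either. It also assumes nothing about the field names inside each record.

```python
import json
import zipfile
from pathlib import Path


def load_split(zip_path, split):
    """Return the records of one split ("train", "dev" or "test") as a list.

    Hypothetical helper: the record schema is not documented here, so the
    records are returned exactly as parsed.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Match on the base name so the lookup works whether or not the
        # archive nests the files inside a folder.
        matches = [n for n in archive.namelist() if Path(n).name == f"{split}.json"]
        if not matches:
            raise FileNotFoundError(f"{split}.json not found in {zip_path}")
        text = archive.read(matches[0]).decode("utf-8")

    try:
        # Case 1 (assumed): the whole file is one JSON document.
        data = json.loads(text)
    except json.JSONDecodeError:
        # Case 2 (assumed): JSON Lines, one record per non-empty line.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]

    # Normalize to a list so callers can always iterate over records.
    return data if isinstance(data, list) else [data]


if __name__ == "__main__":
    for split in ("train", "dev", "test"):
        records = load_split("hlgd_classification_0.1.zip", split)
        print(split, len(records))
```

With the datasets library installed, the usual alternative is to extract the archive and pass the three files to `load_dataset("json", data_files=...)`; the standard-library version above is only meant for a quick inspection of the splits.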

Models:

  • cls_elec_base_hlgd_0.74f1.bin: The model corresponding to the Electra Finetune on HLGD + Time in the paper. An example use of the model is provided in model_classifier.py.
  • gpt2med_headline_gen_1.645.bin: The headline generator used for the Headline Generator Swap results. An example use of the model is provided in model_generator_swap.py.

Analysis:

We release the underlying annotations for the Analysis section (Section 3.6): headline_grouping_typology_negatives.csv and headline_grouping_typology_positives.csv.