Accompanying source code and data for the following publication:
Yiping Jin, Leo Wanner, and Alexander Shvets. GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? LREC-COLING 2024. Turin, Italy.
The Jupyter notebooks gpt-3.5-data-generation-[IDENTITY].ipynb generate the test cases for each identity. The code is mostly identical across notebooks, differing only in the identities and slur words provided. The data is saved in JSON format and then converted to CSV format in the notebook gpt-3.5-data-postprocessing.ipynb.
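The JSON-to-CSV conversion can be sketched as follows. This is a minimal illustration, assuming a flat list of records; the field names below are placeholders, not the actual dataset schema.

```python
import csv
import io
import json

# Hypothetical records mimicking the raw JSON output of the generation
# notebooks. The keys are illustrative, not the real schema.
records = json.loads("""[
  {"case_id": 1, "functionality": "slur_h", "test_case": "example one"},
  {"case_id": 2, "functionality": "slur_h", "test_case": "example two"}
]""")

# Flatten the records into CSV rows, mirroring the post-processing step.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["case_id", "functionality", "test_case"])
writer.writeheader()
writer.writerows(records)

csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # header row: case_id,functionality,test_case
```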
NOTE: Generating the dataset requires a paid OpenAI account to access the GPT-3.5 endpoint, which is not included in this repo.
The notebook nli_hypothesis_test/hypothesis_testing.ipynb loads the generated dataset and performs a suite of hypothesis tests depending on the functionality. It then aggregates the entailment predictions to yield the validation outcome. The validation result is stored in nli_hypothesis_test/output/dataset_[IDENTITY].csv, where the column nli_pass_test indicates the outcome (1: pass, 0: fail).
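The aggregation step can be sketched as below. The rule shown (every hypothesis must be predicted as entailment) is an assumption for illustration; the notebook's actual aggregation may differ by functionality.

```python
# Aggregate per-hypothesis NLI predictions into the binary nli_pass_test
# label. NOTE: the "all hypotheses entailed" rule is an assumed example,
# not necessarily the exact rule used in the paper.
def nli_pass_test(entailment_labels):
    """Return 1 if every hypothesis is predicted as entailment, else 0."""
    return int(all(label == "entailment" for label in entailment_labels))

print(nli_pass_test(["entailment", "entailment"]))      # 1 (pass)
print(nli_pass_test(["entailment", "contradiction"]))   # 0 (fail)
```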
The datasets/ folder includes both datasets used in this paper:
- HateCheck: Dataset published in Röttger et al. (ACL 2021). [LINK]
- GPT-HateCheck: The new hate speech functionality test dataset we introduced, located in datasets/gpt3.5-generated.
  - gpt3.5_generated_hs_dataset_[IDENTITY].json: The raw JSON files generated by GPT-3.5.
  - functionalities_[IDENTITY].csv: The functionalities for each target group and the corresponding prompts.
  - dataset_[IDENTITY].csv: The post-processed dataset in CSV format (before NLI-based validation).
We ran the crowd-sourced annotation on Toloka.ai. The folder crowd-source-annotation/ contains all the annotation results and the notebooks to prepare and analyze the data.
- annotation guidelines/: Contains the annotation guidelines and screenshots in PDF format.
- result/: The crowd-sourced annotation result.
- prepare_data.ipynb: Prepare the dataset for crowd-sourced annotation.
- annotate_data.ipynb: Notebook used by the author to annotate the data offline using Pigeon.
- trans_annotate_func_gold.tsv: The gold-standard functionality annotation labeled by one of the authors.
- analyze_data.ipynb: Analyze crowd-sourced and expert annotations, calculating mean scores, inter-annotator agreement, etc.
- hatebert-exp.ipynb: Evaluate HateBERT on the two functionality evaluation datasets.
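As a toy illustration of the inter-annotator agreement analysis in analyze_data.ipynb, the following computes Cohen's kappa for two annotators. The labels are invented for the example, and the notebook may use different agreement metrics.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example labels, not from the actual annotation result.
a = ["hate", "hate", "neutral", "hate"]
b = ["hate", "neutral", "neutral", "hate"]
print(round(cohen_kappa(a, b), 3))  # 0.5
```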
The generated dataset contains content that may be offensive, especially to people belonging to the target groups. Therefore, the following folders are compressed into password-protected archives.
- datasets/gpt3.5-generated.zip
- crowd-source-annotation.zip
- nli_hypothesis_test/output/
To access the data, please send me a brief email describing how you intend to use the dataset. I will share the password with you within 3-5 working days.
By accessing the data, you agree not to share the data or the password publicly and to use the data responsibly.
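Once you have the password, the archives can be extracted programmatically, for example as below. Note that Python's zipfile module only decrypts legacy ZipCrypto archives; if the archives use AES encryption, use a tool such as 7-Zip instead. The path and password in the usage comment are placeholders.

```python
import zipfile

def extract_protected(zip_path, dest_dir, password):
    """Extract a (possibly password-protected) zip archive into dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        # pwd is only consulted for encrypted entries; zipfile supports
        # legacy ZipCrypto decryption, not AES.
        zf.extractall(path=dest_dir, pwd=password.encode("utf-8"))

# Usage (placeholders, not the real password):
# extract_protected("datasets/gpt3.5-generated.zip", "datasets/", "<password>")
```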
The folder notebooks_for_manuscript/ contains additional notebooks that produce the results, tables, and graphs in the manuscript.