
GPT-HateCheck: Hate Speech Functionality Tests Generated by GPT-3.5

Accompanying source code and data for the following publication.

Yiping Jin, Leo Wanner, and Alexander Shvets. GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? LREC-COLING 2024. Turin, Italy.

1. Generating the Dataset with GPT-3.5

The Jupyter notebooks gpt-3.5-data-generation-[IDENTITY].ipynb generate the test cases for each identity. Most of the code is identical across notebooks; only the target identities and slur words differ.

The data is saved in JSON format and then converted to CSV in the notebook gpt-3.5-data-postprocessing.ipynb.

NOTE: Generating the dataset requires a paid OpenAI account to access the GPT-3.5 endpoint; no API key is included in this repo.
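
For orientation, the core generation call looks roughly like the sketch below. This is a minimal illustration assuming the official openai Python client (v1+); the prompt shown here is hypothetical, and the actual per-functionality prompts, identities, and slur lists are defined in the notebooks.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the real prompts per functionality are stored in
# functionalities_[IDENTITY].csv and filled in by the generation notebooks.
prompt = (
    "Write 10 short social media posts expressing hate toward [IDENTITY] "
    "through implicit derogation. Return them as a JSON list."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# Save the raw completion; gpt-3.5-data-postprocessing.ipynb converts JSON to CSV.
with open("gpt3.5_generated_hs_dataset_IDENTITY.json", "w") as f:
    json.dump({"prompt": prompt, "completion": response.choices[0].message.content}, f)
```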

2. Validating the Dataset with NLI-Based Filtering

The notebook nli_hypothesis_test/hypothesis_testing.ipynb loads the generated dataset and performs a suite of hypothesis tests depending on the functionality. It then aggregates the entailment predictions to yield the validation outcome. The validation results are stored in nli_hypothesis_test/output/dataset_[IDENTITY].csv, where the column nli_pass_test indicates the outcome (1: pass, 0: fail).
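
Conceptually, the filter treats each generated test case as an NLI premise and checks it against one or more functionality-specific hypotheses, then aggregates the entailment decisions. The sketch below is illustrative only: the model name, hypotheses, and all-must-entail aggregation rule are assumptions, and the actual choices are defined in the notebook.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "roberta-large-mnli"  # assumption: any off-the-shelf NLI model works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def entails(premise: str, hypothesis: str) -> bool:
    """True if the NLI model predicts ENTAILMENT for (premise, hypothesis)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(-1))] == "ENTAILMENT"

# Hypothetical test case and hypotheses; the real ones depend on the functionality.
case = "An example generated test case targeting a protected group."
hypotheses = ["This text is hateful.", "This text targets women."]

# Aggregate: the case passes only if every hypothesis is entailed.
nli_pass_test = int(all(entails(case, h) for h in hypotheses))
```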

Datasets

The datasets/ folder includes both datasets used in this paper:

  1. HateCheck: Dataset published in Röttger et al. (ACL 2021). [LINK].
  2. GPT-HateCheck: The new hate speech functionality test dataset we introduce, located in datasets/gpt3.5-generated/.
    • gpt3.5_generated_hs_dataset_[IDENTITY].json: The raw JSON files generated by GPT-3.5.
    • functionalities_[IDENTITY].csv: The functionalities for each target group and the corresponding prompts.
    • dataset_[IDENTITY].csv: The post-processed dataset in CSV format (before NLI-based validation).
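
Once extracted, the CSV files load directly with pandas. A minimal example (the identity suffix is a placeholder, and the nli_pass_test column is only present in the validated files under nli_hypothesis_test/output/):

```python
import pandas as pd

# Placeholder path: substitute a concrete identity for [IDENTITY].
df = pd.read_csv("nli_hypothesis_test/output/dataset_[IDENTITY].csv")

# Keep only test cases that passed the NLI-based validation (1: pass, 0: fail).
validated = df[df["nli_pass_test"] == 1]
print(len(df), "generated cases,", len(validated), "passed validation")
```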

Crowd-Sourced Annotation

We ran the crowd-sourced annotation on Toloka.ai. The folder crowd-source-annotation/ contains all annotation results and the notebooks used to prepare and analyze the data.

  • annotation guidelines/: The annotation guidelines and screenshots in PDF format.
  • result/: The crowd-sourced annotation results.
  • prepare_data.ipynb: Prepares the dataset for crowd-sourced annotation.
  • annotate_data.ipynb: Notebook used by one of the authors to annotate the data offline using Pigeon.
  • trans_annotate_func_gold.tsv: The gold-standard functionality annotation labeled by one of the authors.
  • analyze_data.ipynb: Analyzes the crowd-sourced and expert annotations, calculating mean scores, inter-annotator agreement, etc. (a minimal agreement sketch follows this list).
  • hatebert-exp.ipynb: Evaluates HateBERT on the two functionality evaluation datasets.
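
As a minimal illustration of the agreement computation, the toy sketch below uses Cohen's kappa from scikit-learn; the actual metrics and data handling live in analyze_data.ipynb, and the labels here are invented.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary labels (1: hateful, 0: not) from two annotators on the same items.
annotator_a = [1, 0, 1, 1, 0, 1]
annotator_b = [1, 0, 1, 0, 0, 1]

print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```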

Important Note on Data Sharing

The generated dataset contains content that may be offensive, especially to people belonging to the target groups. We therefore compressed the following folders with a password:

  • datasets/gpt3.5-generated.zip
  • crowd-source-annotation.zip
  • nli_hypothesis_test/output/

To access the data, please email me a brief description of how you intend to use the dataset. I'll share the password with you within 3-5 working days.

By accessing the data, you agree not to share the data or the password publicly and to use the data responsibly.
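
Once you receive the password, the archives can be extracted with standard tools. A sketch using Python's zipfile (this works only if the archives use classic ZipCrypto encryption; for AES-encrypted zips, use a tool such as 7-Zip):

```python
import zipfile

PASSWORD = b"password-from-author"  # hypothetical; request the real one by email

with zipfile.ZipFile("datasets/gpt3.5-generated.zip") as zf:
    zf.extractall(path="datasets/", pwd=PASSWORD)
```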

Additional Notebooks

The folder notebooks_for_manuscript/ contains additional notebooks used to produce the results, tables, and graphs in the manuscript.
