🕹️ CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech [Accepted at IJCAI 2022: AI for Good (Special Track)]

For more details about our paper

Punyajoy Saha, Kanishk Singh, Adarsh Kumar, Binny Mathew and Animesh Mukherjee: "CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech"

arXiv paper link: https://arxiv.org/abs/2205.04304

Abstract

Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single attribute results when combining complementing attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls.

WARNING: The repository contains content that is offensive and/or hateful in nature.

Please cite our paper in any published work that uses any of these resources.

@misc{https://doi.org/10.48550/arxiv.2205.04304,
  doi = {10.48550/ARXIV.2205.04304}, 
  url = {https://arxiv.org/abs/2205.04304},
  author = {Saha, Punyajoy and Singh, Kanishk and Kumar, Adarsh and Mathew, Binny and Mukherjee, Animesh},
  keywords = {Computation and Language (cs.CL), Computers and Society (cs.CY), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Folder Description 📂


./Discriminator       --> Contains the code for the discriminators used in the GeDi model
./Generation          --> Contains the code for generating results with our proposed model
./Utils               --> Contains utility functions such as preprocessing and data loading

Usage instructions

BaseModel Training for Counterspeech

To train the base model for counterspeech generation, run Generation_training.py after updating the task name and the save-related parameters as required (see the comments in the file for the path variables that need to be updated).

Generation

To generate results, run Generation_gedi.py. To produce the required result file, adjust the entries of the params dictionary in that file as needed. For example:

# To generate sentences controlled for emotion joy + Politeness:
params = {
     ...
     ...
     'disc_weight':[0.5, 0.5],
     ...
     ...
     'task_name':[('Emotion', 'joy'), ('Politeness', 'polite')],
     ...
}

Similarly, you can tweak the other parameters to change the results as required.
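As a rough illustration of what per-attribute weights such as 'disc_weight' could control, the sketch below applies a GeDi-style reweighting: the base model's next-token probability is multiplied by each discriminator's attribute probability raised to that attribute's weight, and the result is renormalised. This is a toy example under stated assumptions, not the code in Generation_gedi.py: the function name reweight_next_token, the dictionary-based inputs, the made-up probabilities, and the assumption that the i-th weight belongs to the i-th entry of 'task_name' are all hypothetical.

```python
import math


def reweight_next_token(lm_probs, disc_probs, disc_weight):
    """Toy GeDi-style reweighting of next-token probabilities.

    lm_probs:    dict token -> base LM probability (must be > 0)
    disc_probs:  list of dicts, one per attribute,
                 token -> P(attribute | context + token) (must be > 0)
    disc_weight: list of exponents, one per attribute (assumed alignment)
    """
    if len(disc_probs) != len(disc_weight):
        raise ValueError("need exactly one weight per discriminator")
    scores = {}
    for token, p_lm in lm_probs.items():
        # Work in log space: log P_LM + sum_i w_i * log P_i(attribute | token)
        log_score = math.log(p_lm)
        for probs, weight in zip(disc_probs, disc_weight):
            log_score += weight * math.log(probs[token])
        scores[token] = math.exp(log_score)
    total = sum(scores.values())
    # Renormalise so the guided scores form a probability distribution
    return {token: score / total for token, score in scores.items()}


# Made-up numbers for a three-token vocabulary (illustration only)
lm_probs = {"thanks": 0.2, "idiot": 0.5, "please": 0.3}
joy = {"thanks": 0.9, "idiot": 0.1, "please": 0.6}
polite = {"thanks": 0.8, "idiot": 0.05, "please": 0.9}

guided = reweight_next_token(lm_probs, [joy, polite], [0.5, 0.5])
print(guided)
```

In this toy setting the token that both discriminators score poorly loses probability mass relative to the base model, which is the intended effect of attribute control. How the repository actually combines its discriminators should be checked in the Generation folder.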

Evaluation instructions

For Generation Metrics:

We evaluate the generated responses on a variety of metrics, including BLEU, METEOR, diversity, and novelty.
The methods used to compute these scores are described in Evaluation notebook.ipynb.
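For orientation, here is a minimal sketch of how diversity and novelty scores can be computed, assuming a Jaccard-similarity definition over whitespace tokens (diversity compares each generated response with the other generated responses, novelty compares it with the training responses). The definitions actually used for the reported numbers are the ones in the evaluation notebook; the function names and tokenisation below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase token sets of two sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def novelty(generated, training):
    """Mean of (1 - max similarity to any training sentence) over generated sentences."""
    scores = [1.0 - max(jaccard(g, t) for t in training) for g in generated]
    return sum(scores) / len(scores)


def diversity(generated):
    """Mean of (1 - max similarity to any other generated sentence)."""
    if len(generated) < 2:
        raise ValueError("diversity needs at least two generated sentences")
    scores = []
    for i, sentence in enumerate(generated):
        others = generated[:i] + generated[i + 1:]
        scores.append(1.0 - max(jaccard(sentence, o) for o in others))
    return sum(scores) / len(scores)


# Tiny made-up example
generated = ["hate has no place here", "please be kind to others"]
training = ["hate has no place here"]
print(novelty(generated, training), diversity(generated))
```

Under this definition, a response copied verbatim from the training data contributes a novelty of 0, and responses that share no tokens with each other yield the maximum diversity of 1.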

For Emotions Evaluation:

Clone the GoEmotions-pytorch repository: git clone https://github.com/monologg/GoEmotions-pytorch
Then move Evaluation notebook-Emotion.ipynb into the GoEmotions-pytorch folder and set the file paths accordingly before running the evaluation.

For Toxicity Evaluation:

Toxicity is calculated using the HateXplain model.
The Colab notebook can be accessed here: CounterGedi_detox_eval.ipynb

For Grammatical Coherence Evaluation:

To evaluate whether the responses are grammatically correct, we use a pretrained model trained on the Corpus of Linguistic Acceptability (CoLA).
The Colab notebook can be accessed here: CounterGedi_COLA_eval.ipynb


Todos

👍 The repo is still in active development. Feel free to create an issue! 👍
