This repository contains the source code for our BlackBoxNLP 2024 @ EMNLP paper:
Enhancing adversarial robustness in Natural Language Inference using explanations
In this work, we investigate whether the use of intermediate explanations in the Natural Language Inference (NLI) task can serve as a model-agnostic defence strategy against adversarial attacks. Our claim is that the intermediate explanation can filter out potential noise superimposed by the adversarial attack on the input pair (premise, hypothesis). Through extensive experimentation, we show that conditioning the output label (entailment, contradiction, neutral) on an intermediate explanation that describes the inference relationship between the input premise and hypothesis does indeed improve adversarial robustness.
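To make the "explain-then-predict" idea concrete, below is a minimal sketch of such a two-stage pipeline, not the exact code used in the paper (see the `fine-tuning` directory for that). The model checkpoints, prompt format and label order are placeholder assumptions for illustration: a seq2seq model first generates a free-text explanation for the (premise, hypothesis) pair, and a classifier then predicts the NLI label conditioned on that explanation.

```python
# Hedged sketch of an explain-then-predict NLI pipeline.
# Model names, prompt format and label order are illustrative assumptions,
# NOT the checkpoints released with this repository.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
)

EXPLAINER = "t5-small"            # placeholder for a fine-tuned explanation generator
CLASSIFIER = "bert-base-uncased"  # placeholder for a fine-tuned explanation-conditioned classifier
LABELS = ["entailment", "neutral", "contradiction"]  # assumed label order

expl_tok = AutoTokenizer.from_pretrained(EXPLAINER)
expl_model = AutoModelForSeq2SeqLM.from_pretrained(EXPLAINER)
cls_tok = AutoTokenizer.from_pretrained(CLASSIFIER)
cls_model = AutoModelForSequenceClassification.from_pretrained(CLASSIFIER, num_labels=len(LABELS))

def explain_then_predict(premise: str, hypothesis: str):
    # Step 1: generate an intermediate explanation of the inference relation.
    prompt = f"premise: {premise} hypothesis: {hypothesis}"
    inputs = expl_tok(prompt, return_tensors="pt")
    expl_ids = expl_model.generate(**inputs, max_new_tokens=64)
    explanation = expl_tok.decode(expl_ids[0], skip_special_tokens=True)

    # Step 2: condition the label prediction on the generated explanation.
    cls_inputs = cls_tok(f"{premise} {hypothesis}", explanation, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = cls_model(**cls_inputs).logits
    return explanation, LABELS[logits.argmax(dim=-1).item()]

explanation, label = explain_then_predict(
    "A man is playing a guitar on stage.",
    "A musician is performing.",
)
print(explanation, label)
```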
The repo is organized into the following core directories:
- `fine-tuning`: Includes the code for training and evaluating all the models that are used in our experiments. See the README file located in the `fine-tuning` directory for more details.
- `adversarial_attacks`: Includes the code for performing adversarial attacks against the aforementioned models. See the README file located in the `adversarial_attacks` directory for more details.
- Create a local copy of the repo:
`git clone https://github.com/alexkoulakos/explain-then-predict.git`
- Navigate to the root directory:
`cd explain-then-predict`
- Create a virtual environment called `venv`:
`virtualenv --system-site-packages venv`
- Activate the virtual environment:
`source venv/bin/activate` (for Linux/MacOS) or `./venv/Scripts/activate.ps1` (for Windows)
- Install the necessary dependencies:
`pip install -r requirements.txt`
If you encounter any issues or bugs, or have questions, please feel free to open an issue on GitHub. When describing the problem, please include:
- A clear description of the issue or bug
- Steps to reproduce the issue (if applicable)
- Any relevant error messages or screenshots
- Details about your environment (Python version, OS, library versions)
We’ll do our best to respond quickly and help resolve any problems.
If you use our findings in your work, don't forget to cite our paper:
@inproceedings{koulakos-etal-2024-enhancing,
title = "Enhancing adversarial robustness in Natural Language Inference using explanations",
author = "Koulakos, Alexandros and Lymperaiou, Maria and Filandrianos, Giorgos and Stamou, Giorgos",
editor = "Belinkov, Yonatan and Kim, Najoung and Jumelet, Jaap and Mohebbi, Hosein and Mueller, Aaron and Chen, Hanjie",
booktitle = "Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP",
month = nov,
year = "2024",
address = "Miami, Florida, US",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.blackboxnlp-1.7",
pages = "105--117"
}