GitHub - Kim-Minseon/TARO: Effective Targeted Attacks for Adversarial Self-Supervised Learning (NeurIPS 2023)

Effective Targeted Attacks for Adversarial Self-Supervised Learning

This is the official PyTorch implementation of the paper Effective Targeted Attacks for Adversarial Self-Supervised Learning (NeurIPS 2023).

Abstract

Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often generates ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive-mining scheme for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements when applied to contrastive SSL frameworks, on the benchmark datasets.
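
To make the target-selection idea in the abstract more concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the scoring rule (cosine similarity to the anchor plus the entropy of the candidate's similarity distribution over the batch), the `temperature` parameter, and the function name `select_target` are all assumptions chosen for illustration. The scoring rule used in the paper may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_target(anchor_idx, embeddings, temperature=0.5):
    """Pick a target for the anchor among the other batch embeddings.

    Hypothetical score (an assumption, not the paper's formula):
        score(j) = cos(anchor, j) + H(softmax(cos(j, k) / temperature), k != j)
    The first term favors candidates similar to the anchor; the second
    favors "confusing" candidates whose similarity distribution is flat.
    """
    n = len(embeddings)
    best_idx, best_score = None, -math.inf
    for j in range(n):
        if j == anchor_idx:
            continue
        similarity = cosine(embeddings[anchor_idx], embeddings[j])
        logits = [
            cosine(embeddings[j], embeddings[k]) / temperature
            for k in range(n)
            if k != j
        ]
        score = similarity + entropy(softmax(logits))
        if score > best_score:
            best_idx, best_score = j, score
    return best_idx


# Toy batch: index 1 is close to the anchor (index 0), index 2 is opposite.
batch = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]
target = select_target(0, batch)
```

In a full pipeline the anchor would then be perturbed towards the selected target, for example with a PGD-style attack that minimizes the distance between the two embeddings; that step is omitted here because it depends on the encoder and loss defined in the repository.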

Pretrain

$ CUDA_VISIBLE_DEVICES=0,1 python taro_pretrain.py

Linear evaluation

$ sh run_lineval.sh ckpt_path epoch

Citation

If you find the provided code useful, please cite our work.

@article{kim2024effective,
  title={Effective Targeted Attacks for Adversarial Self-Supervised Learning},
  author={Kim, Minseon and Ha, Hyeonjeong and Son, Sooel and Hwang, Sung Ju},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

Repository contents: attack, data, models, README.md, argument.py, lin_eval.py, loss.py, run_lineval.sh, taro_pretrain.py, utils.py
