Paper | Primary contact: Yizhou Wang
Short version: Self-supervision Meets Adversarial Perturbation: A Novel Framework for Anomaly Detection (CIKM 2022)[paper][code branch]
Anomaly detection is a fundamental yet challenging problem in machine learning. In this work, we propose a simple and effective framework, dubbed SLA$^2$P, for unsupervised anomaly detection. After extracting representative embeddings from the raw data, we apply random projections to the features and regard features transformed by different projections as belonging to distinct pseudo-classes. We then train a classifier network on these transformed features to perform self-supervised learning. Next, we add adversarial perturbations to the transformed features and devise anomaly scores based on the classifier's predictive uncertainty on these perturbed features. Our design is motivated by the fact that, because anomalies are relatively rare and scattered, 1) training the pseudo-label classifier concentrates on learning the semantics of normal data rather than anomalous data, and 2) the transformed features of normal data are more robust to perturbations than those of anomalies. Consequently, the perturbed transformed features of anomalies cannot be classified well and thus receive lower confidence scores, i.e., higher anomaly scores. Experimental results on image, text, and inherently tabular benchmark datasets show that SLA$^2$P consistently achieves state-of-the-art performance in unsupervised anomaly detection.
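The four stages above (random projections → pseudo-label classification → adversarial perturbation → confidence-based scoring) can be sketched on synthetic data. This is a minimal illustration of the idea only, not the repo's implementation: a linear softmax classifier stands in for the neural network, and all numbers (number of projections, perturbation size, dimensions) are illustrative rather than the paper's hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy features: 200 "normal" points in a tight cluster, 20 scattered anomalies.
normal = 1.0 + rng.normal(0.0, 0.3, size=(200, 16))
anomalies = rng.uniform(-3.0, 3.0, size=(20, 16))
X = np.vstack([normal, anomalies])
n = len(X)

# Step 1: K random projections; features under projection k get pseudo-label k.
K, d_out = 8, 16
P = rng.normal(size=(K, 16, d_out))
Z = np.concatenate([X @ P[k] for k in range(K)])
y = np.repeat(np.arange(K), n)

# Step 2: self-supervised classification of the pseudo-labels.
clf = LogisticRegression(max_iter=500).fit(Z, y)

# Step 3: FGSM-style perturbation. For softmax cross-entropy, the gradient
# of the loss w.r.t. the input is (p - onehot(y)) @ W.
probs = clf.predict_proba(Z)
grad = (probs - np.eye(K)[y]) @ clf.coef_
Z_adv = Z + 0.1 * np.sign(grad)

# Step 4: score each point by the classifier's confidence on its perturbed
# features, summed over the K projections. Low confidence => likely anomaly.
conf = clf.predict_proba(Z_adv)[np.arange(K * n), y].reshape(K, n)
confidence_score = conf.sum(axis=0)
```

On this toy data the normal points keep a clearly higher summed confidence than the anomalies after perturbation, which is exactly the gap the anomaly score exploits.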
conda env create -f env.yaml
Download the processed data (Caltech 101, 20 Newsgroups and Reuters) from [Google Drive Link] and put it into the folder /data. The files are taken from the [official implementation of RSRAE] (Robust Subspace Recovery Layer for Unsupervised Anomaly Detection, ICLR 2020).
To prepare BERT embeddings for the 20 Newsgroups dataset, run
python extract_bert_embeddings_20news.py
The processed embeddings will be saved to 20news_bert.data. We also provide the processed embeddings here.
To prepare BERT embeddings for the Arrhythmia dataset, run
python extract_bert_embeddings_arrhythmia.py
The processed embeddings will be saved to arrhythmia_bert.mat. We also provide the processed embeddings here.
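To sanity-check a saved .mat embedding file, a scipy round-trip works; this is a synthetic illustration only, since the variable key ("X" here) and the array shape (452 Arrhythmia records, 768-d BERT-base vectors) are assumptions about what the extraction script writes.

```python
import numpy as np
from scipy.io import savemat, loadmat

# Synthetic round-trip illustrating the .mat format; the real file is
# arrhythmia_bert.mat. The key "X" and the shape are illustrative assumptions.
emb = np.random.randn(452, 768).astype(np.float32)
savemat("arrhythmia_bert_demo.mat", {"X": emb})

data = loadmat("arrhythmia_bert_demo.mat")
print(data["X"].shape)
```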
To prepare GPT embeddings, you first need to set your OpenAI API key in the environment variable OPENAI_API_KEY via
export OPENAI_API_KEY=your_api_key
Then run
python extract_GPT3_embedding_20news.py
python extract_GPT3_embedding_arrhythmia.py
The corresponding processed embeddings will be saved to 20news_gpt3.data and arrhythmia_gpt3.mat. We also provide the processed embeddings at 20news_gpt3 and arrhythmia_gpt3.
The SLA$^2$P method is implemented in sla2p.py, and the SLA method (w/o adversarial perturbation) is in sla.py.
To reproduce the results reported in the main paper, run the following commands.
# CIFAR-10
python sla2p.py --dataset cifar10 --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
# CIFAR-100
python sla2p.py --dataset cifar100 --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 10000
# Caltech 101
python sla2p.py --dataset caltech --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
# 20 Newsgroups
python sla2p.py --dataset 20news --n_rots 256 --d_out 256 --acc_thres 0.75 --epsilon 10
# Reuters
python sla2p.py --dataset reuters --n_rots 512 --d_out 128 --acc_thres 0.3 --epsilon 100
# Arrhythmia
python sla2p.py --dataset arrhythmia --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
# KDD
python sla2p.py --dataset kdd --n_rots 64 --d_out 128 --acc_thres 0.6 --epsilon 1000
# 20 Newsgroups (BERT)
python sla2p.py --dataset 20news_bert --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
# 20 Newsgroups (GPT3)
python sla2p.py --dataset 20news_gpt3 --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
# Arrhythmia (BERT)
python sla2p.py --dataset arrhythmia_bert --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
# Arrhythmia (GPT3)
python sla2p.py --dataset arrhythmia_gpt3 --n_rots 256 --d_out 256 --acc_thres 0.6 --epsilon 1000
To evaluate unsupervised anomaly detection performance, use evaluate_roc_auc.py for AUROC scores and evaluate_pr_auc.py for AUPR scores.
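If you want to compute these metrics directly from saved anomaly scores, the standard scikit-learn calls look like this. A minimal sketch: the toy labels/scores arrays and the convention that higher scores mean "more anomalous" are assumptions for illustration, not the repo scripts' exact interface.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Toy example: label 1 marks an anomaly; higher score = more anomalous.
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.9, 0.8, 0.22])

auroc = roc_auc_score(labels, scores)           # area under the ROC curve
aupr = average_precision_score(labels, scores)  # area under the PR curve

print(f"AUROC={auroc:.3f}  AUPR={aupr:.3f}")
```

If your scores follow the opposite convention (higher = more normal), negate them before passing to either metric.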
This code relies heavily on the code of E3Outlier. The README format is based on the GitHub repos of my senior colleagues Huan Wang and Xu Ma. Many thanks to them! We also thank the anonymous TKDE and CIKM'22 reviewers for their constructive comments, which helped us improve the paper.
@ARTICLE{10645289,
author={Wang, Yizhou and Qin, Can and Wei, Rongzhe and Xu, Yi and Bai, Yue and Fu, Yun},
journal={IEEE Transactions on Knowledge and Data Engineering},
title={SLA$^2$P: Self-Supervised Anomaly Detection With Adversarial Perturbation},
year={2024},
volume={36},
number={12},
pages={9282-9293},
keywords={Feature extraction;Anomaly detection;Perturbation methods;Task analysis;Training;Uncertainty;Unsupervised learning;Data mining;Machine learning;Deep learning;Representation learning},
doi={10.1109/TKDE.2024.3448473}
}
@inproceedings{wang2022self,
title={Self-supervision Meets Adversarial Perturbation: A Novel Framework for Anomaly Detection},
author={Wang, Yizhou and Qin, Can and Wei, Rongzhe and Xu, Yi and Bai, Yue and Fu, Yun},
booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
pages={4555--4559},
year={2022}
}