This repository contains the code and datasets for the paper ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints (NeurIPS 2022),
by Yinpeng Dong, Shouwei Ruan, Hang Su, Caixin Kang, Xingxing Wei and Jun Zhu.
- Python (3.7.11)
- PyTorch (1.11.0)
- torchvision (0.12.0)
- timm (0.5.4)
- pytorch-lightning (1.5.2)
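To check an environment against the versions listed above, a small script along these lines may help. It is only a sketch: the pip distribution names (`torch`, `pytorch-lightning`) and the helper names `installed_version` and `check_versions` are assumptions, not part of this repository.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions from the list above; the distribution names are assumptions.
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "timm": "0.5.4",
    "pytorch-lightning": "1.5.2",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(pinned=PINNED, lookup=installed_version):
    """Map each package name to (expected, found, matches)."""
    report = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore local build tags so that "1.11.0+cu113" still matches "1.11.0".
        matches = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, matches)
    return report


for name, (expected, found, ok) in check_versions().items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {expected}, found {found} [{status}]")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones stated here.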
ImageNet-V is a new out-of-distribution dataset for benchmarking the viewpoint robustness of visual classifiers. It is generated by ViewFool and contains 10,000 renderings of 100 objects, with images of size 400×400.
We used 100 3D objects from BlenderKit that fall within ImageNet categories, and we have published the data needed to train NeRF, which can be obtained through this link:
The full ImageNet-V renderings can be obtained from the following link:
Using ImageNet-V, we evaluated the viewpoint robustness of 40 classifiers with diverse architectures, objective functions, and data augmentations.
Their accuracy on ImageNet-V, compared with their accuracy on renderings from natural viewpoints, is as follows:
Classifier | Natural accuracy (%) | ImageNet-V (ours) accuracy (%) |
---|---|---|
vgg16 | 60.55 | 13.47 |
vgg19 | 62.81 | 11.83 |
resnet18 | 61.08 | 15.15 |
resnet34 | 68.08 | 14.09 |
resnet50 | 78.11 | 23.56 |
resnet101 | 81.19 | 30.15 |
resnet152 | 82.16 | 30.41 |
inception_v3 | 60.59 | 17.99 |
inception_v4 | 63.65 | 13.08 |
inception_resnet_v2 | 51.85 | 18.00 |
densenet121 | 73.36 | 20.93 |
densenet169 | 70.75 | 19.11 |
densenet201 | 70.05 | 19.83 |
efficientnet_b0 | 70.06 | 17.37 |
efficientnet_b1 | 73.36 | 18.35 |
efficientnet_b2 | 74.88 | 24.54 |
efficientnet_b3 | 75.43 | 25.51 |
efficientnet_b4 | 76.33 | 24.31 |
mobilenetv2_120d | 72.73 | 20.89 |
mobilenetv2_140 | 71.60 | 18.90 |
vit_base | 62.28 | 20.41 |
vit_large | 86.04 | 37.67 |
deit_tiny | 62.94 | 18.88 |
deit_small | 76.20 | 23.51 |
deit_base | 81.31 | 27.00 |
swin_tiny | 78.80 | 26.58 |
swin_small | 82.95 | 30.23 |
swin_base | 88.78 | 40.38 |
swin_large | 89.97 | 47.40 |
mixer_b16 | 49.66 | 9.63 |
mixer_l16 | 44.52 | 8.56 |
resnet50_l2_robust_eps=1.0 | 38.12 | 8.21 |
resnet50_l2_robust_eps=3.0 | 31.72 | 5.78 |
resnet50_l2_robust_eps=5.0 | 26.89 | 6.04 |
mae_vitb | 74.67 | 29.33 |
mae_vitl | 79.00 | 40.10 |
mae_vith | 83.67 | 49.85 |
resnet50_augmix | 73.34 | 18.87 |
resnet50_deepaugment | 71.25 | 19.65 |
resnet50_augmix+deepaugment | 72.98 | 23.10 |
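To make the gap in the table easier to read, the drop can be computed per model. The snippet below is a small sketch using four rows copied from the table; the helper name `accuracy_drop` is hypothetical.

```python
# (natural accuracy %, ImageNet-V accuracy %) copied from the table above
RESULTS = {
    "resnet50": (78.11, 23.56),
    "vit_large": (86.04, 37.67),
    "swin_large": (89.97, 47.40),
    "mae_vith": (83.67, 49.85),
}


def accuracy_drop(natural, imagenet_v):
    """Return the absolute drop (percentage points) and the relative drop (percent)."""
    absolute = natural - imagenet_v
    relative = 100.0 * absolute / natural
    return round(absolute, 2), round(relative, 1)


for name, (natural, imagenet_v) in RESULTS.items():
    absolute, relative = accuracy_drop(natural, imagenet_v)
    print(f"{name}: -{absolute} points ({relative}% relative)")
```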
We provide evaluation scripts for 40 pre-trained models. You can evaluate your own classifier by replacing the relevant weight paths in the code or by defining the model yourself.
To test a classifier with pretrained weights on the ImageNet-V dataset, run the following command:
python ./NeRF/Imagenet_v_benchmark.py --model {classifier_name}
The currently supported classifiers are:
'vgg16', 'vgg19', 'densenet121', 'densenet169', 'densenet201', 'inception_v3', 'inception_v4', 'inception_resnet_v2', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'efficientnet_b3', 'efficientnet_b4', 'mobilenetv2_120d', 'mobilenetv2_140', 'mixer_b16_224', 'mixer_l16_224', 'vit_base_patch16_224', 'vit_large_patch16_224', 'deit_base_distilled_patch16_224', 'deit_base_patch16_224', 'deit_small_patch16_224', 'deit_tiny_patch16_224', 'swin_base_patch4_window7_224', 'swin_large_patch4_window7_224', 'swin_small_patch4_window7_224', 'swin_tiny_patch4_window7_224', 'resnet_augmix', 'resnet_deepaugment', 'resnet_augmix_deepaugment', 'resnet_l2_robust_eps=1.0', 'resnet_l2_robust_eps=3.0', 'resnet_l2_robust_eps=5.0', 'mae_vitb', 'mae_vitl', 'mae_vith'
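To benchmark several classifiers in one go, the command above can be generated per model. This is a minimal sketch: the helper `benchmark_command` is hypothetical, and only the script path and the `--model` flag come from the command shown above.

```python
import shlex
import sys

BENCHMARK_SCRIPT = "./NeRF/Imagenet_v_benchmark.py"  # script path from the command above


def benchmark_command(model_name, python=sys.executable):
    """Build the argument list for one benchmark run (hypothetical helper)."""
    # strip() guards against stray whitespace in a copied model name.
    return [python, BENCHMARK_SCRIPT, "--model", model_name.strip()]


models = ["resnet50", "vit_base_patch16_224", "resnet_l2_robust_eps=1.0"]
for model in models:
    command = benchmark_command(model, python="python")
    print(" ".join(shlex.quote(part) for part in command))
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root to actually launch the evaluation.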
We propose ViewFool, a novel method to find adversarial viewpoints that mislead visual recognition models. By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints under an entropic regularizer.
Therefore, running the ViewFool attack requires first obtaining the NeRF weights of the object.
See nerf_pl for the detailed training process. In general, you can use the following command:
python ./NeRF/train.py --dataset_name blender --root_dir "./training_data/apple_2" --N_importance 64 --img_wh 400 400 --noise_std 0 --num_epochs 30 --batch_size 4096 --optimizer adam --lr 1e-4 --lr_scheduler steplr --decay_step 2 4 8 --decay_gamma 0.5 --exp_name "apple_2"
--root_dir is the path to the training data, which can be downloaded via the link in 2.1.
After training completes, the weight file can be found in ./NeRF/ckpts/{exp_name}.
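To see what the scheduler flags in the training command imply, the learning rate per epoch can be worked out by hand. The sketch below assumes that `--lr_scheduler steplr` multiplies the learning rate by `--decay_gamma` at each epoch listed in `--decay_step`, in the manner of PyTorch's `MultiStepLR`; that behaviour is an assumption about the training script, and `multistep_lr` is a hypothetical helper.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch` when it is multiplied by `gamma` at every milestone."""
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** passed


# Values from the command above: --lr 1e-4 --decay_step 2 4 8 --decay_gamma 0.5
for epoch in (0, 2, 4, 8, 29):
    print(epoch, multistep_lr(1e-4, [2, 4, 8], 0.5, epoch))
```

Under this assumption the rate is halved three times, so the last 22 of the 30 epochs run at one eighth of the initial rate.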
Next, we provide two attack methods:
- Random: Randomly generates renderings of objects at various viewpoints within the angular range
python NeRF/attack_randomsearch.py --dataset_name blender_for_attack --scene_name 'AP_random/apple_2' --img_wh 400 400 --N_importance 64 --ckpt_path './NeRF/ckpts/apple_2/epoch=29.ckpt' --num_sample 100 --optim_method random --search_num 6
- ViewFool: Uses NES with an entropic regularizer to optimize the viewpoint parameters and generates renderings under the adversarial viewpoint distribution
python NeRF/ViewFool.py --dataset_name blender_for_attack --scene_name 'resnet_AP_lamba0.01/apple_2' --img_wh 400 400 --N_importance 64 --ckpt_path './NeRF/ckpts/apple_2/epoch=29.ckpt' --optim_method NES --search_num 6 --popsize 51 --iteration 100 --mu_lamba 0.01 --sigma_lamba 0.01 --num_sample 100 --label_name 'Granny Smith' --label 948
--ckpt_path is the path to the object's NeRF weights, and --label_name / --label give the object's label in ImageNet-1K. You can adjust the strength of the entropy regularization term by modifying --mu_lamba and --sigma_lamba; in the paper we use 0.01.
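For intuition about searching over a distribution with an entropy term, here is a toy sketch in pure Python. It is not the code in `NeRF/ViewFool.py`: the objective is a stand-in quadratic rather than a classifier loss on a NeRF rendering, a single `entropy_weight` replaces the separate `--mu_lamba` and `--sigma_lamba` settings, and the score normalization and step sizes are illustrative choices. The function name `nes_search` is hypothetical.

```python
import math
import random


def nes_search(objective, mu, sigma, entropy_weight=0.01,
               popsize=51, iterations=100, lr=0.1, seed=0):
    """Toy search over a diagonal Gaussian N(mu, sigma^2) that maximizes
    E[objective(u)] plus an entropy bonus (illustrative, not the repo's code)."""
    rng = random.Random(seed)
    dim = len(mu)
    mu = list(mu)
    log_sigma = [math.log(s) for s in sigma]
    for _ in range(iterations):
        sig = [math.exp(v) for v in log_sigma]
        # Sample a population of perturbations and score each candidate.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(popsize)]
        scores = [objective([m + s * e for m, s, e in zip(mu, sig, row)])
                  for row in noise]
        # Normalize scores so the step size does not depend on the objective's scale.
        mean = sum(scores) / popsize
        std = math.sqrt(sum((x - mean) ** 2 for x in scores) / popsize) or 1.0
        fitness = [(x - mean) / std for x in scores]
        for i in range(dim):
            grad_mu = sum(f * row[i] for f, row in zip(fitness, noise)) / popsize
            grad_log_sigma = sum(f * (row[i] ** 2 - 1.0)
                                 for f, row in zip(fitness, noise)) / popsize
            mu[i] += lr * sig[i] * grad_mu
            # The entropy of a diagonal Gaussian is sum(log sigma) + const, so its
            # gradient with respect to each log sigma is 1.
            log_sigma[i] += lr * 0.5 * (grad_log_sigma + entropy_weight)
    return mu, [math.exp(v) for v in log_sigma]


# Stand-in objective with its maximum at u = (2, -1).
def toy_objective(u):
    return -((u[0] - 2.0) ** 2 + (u[1] + 1.0) ** 2)


mu, sigma = nes_search(toy_objective, mu=[0.0, 0.0], sigma=[1.0, 1.0])
print("mean:", mu, "std:", sigma)
```

A larger `entropy_weight` pushes the standard deviations up, which keeps the distribution wider instead of letting it collapse onto a single point; this is the role the entropy term plays in the description above.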
You can choose which parameters are optimized by setting --search_num:
search_num | optimized parameters |
---|---|
6 | both angle and position (ψ, θ, ϕ, ∆x, ∆y, ∆z) |
123 | only angle (ψ, θ, ϕ) |
456 | only position (∆x, ∆y, ∆z) |
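The table can be restated as a lookup, which may be handy when scripting several attack configurations. The parameter names and the helper below are hypothetical labels for illustration; only the three `search_num` values and their meanings come from the table.

```python
# Illustrative labels for (ψ, θ, ϕ, ∆x, ∆y, ∆z); these names are assumptions.
ALL_PARAMETERS = ("psi", "theta", "phi", "dx", "dy", "dz")

# Mapping taken from the table above.
SEARCH_NUM_TO_PARAMETERS = {
    6: ALL_PARAMETERS,        # both angle and position
    123: ALL_PARAMETERS[:3],  # only angle
    456: ALL_PARAMETERS[3:],  # only position
}


def optimized_parameters(search_num):
    """Return the parameters optimized for a given --search_num value."""
    try:
        return SEARCH_NUM_TO_PARAMETERS[search_num]
    except KeyError:
        raise ValueError(f"unsupported search_num: {search_num}") from None


for value in (6, 123, 456):
    print(value, optimized_parameters(value))
```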
With our default parameters (--iteration 100, --popsize 51), attacking one object takes about 4.5 GPU hours on an NVIDIA 3090. While the attack runs, the current average loss and the distribution entropy are printed in real time. When it finishes, the attack angle parameters and the evaluation results on the target model are reported.
If you find our methods useful or use the ImageNet-V dataset, please consider citing:
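As a rough budgeting aid, the figures above can be turned into a back-of-the-envelope estimate. This sketch assumes one rendering-plus-classification evaluation per population member per iteration and ignores everything else the script does (for example, the final `--num_sample` renderings), so the per-evaluation time is only an upper-bound style estimate. The helper `attack_budget` is hypothetical.

```python
ITERATIONS = 100            # --iteration in the attack command
POPSIZE = 51                # --popsize in the attack command
GPU_HOURS_PER_OBJECT = 4.5  # reported above for an NVIDIA 3090
NUM_OBJECTS = 100           # number of objects in ImageNet-V


def attack_budget(iterations, popsize, gpu_hours, num_objects):
    """Return (evaluations per object, seconds per evaluation, total GPU hours)."""
    evaluations = iterations * popsize  # assumes one evaluation per sample per iteration
    seconds_per_evaluation = gpu_hours * 3600 / evaluations
    total_gpu_hours = gpu_hours * num_objects
    return evaluations, seconds_per_evaluation, total_gpu_hours


evaluations, seconds, total = attack_budget(
    ITERATIONS, POPSIZE, GPU_HOURS_PER_OBJECT, NUM_OBJECTS
)
print(f"{evaluations} evaluations per object, about {seconds:.2f} s each")
print(f"about {total:.0f} GPU hours if all {NUM_OBJECTS} objects use these settings")
```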
@article{dong2022viewfool,
title={Viewfool: Evaluating the robustness of visual recognition to adversarial viewpoints},
author={Dong, Yinpeng and Ruan, Shouwei and Su, Hang and Kang, Caixin and Wei, Xingxing and Zhu, Jun},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={36789--36803},
year={2022}
}
This project uses an unofficial implementation of NeRF (Neural Radiance Fields) built with PyTorch (pytorch-lightning):
@misc{queianchen_nerf,
author={Quei-An, Chen},
title={Nerf_pl: a pytorch-lightning implementation of NeRF},
url={https://github.com/kwea123/nerf_pl/},
year={2020},
}
We adopted the NES implementation from estool, with thanks:
@article{ha2017evolving,
title = "Evolving Stable Strategies",
author = "Ha, David",
journal = "blog.otoro.net",
year = "2017",
url = "http://blog.otoro.net/2017/11/12/evolving-stable-strategies/"
}
If you have any questions or suggestions about the paper or code, please feel free to contact us:
- Yinpeng Dong: dyp17@mails.tsinghua.edu.cn
- Shouwei Ruan: shouweiruan@buaa.edu.cn