This repository contains code for the paper "Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior":
Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang
[Arxiv] [Slides] [Demo Video]
ReLER, University of Technology Sydney, NSW; Amazon Web Services
Deep neural networks are known to be susceptible to adversarial noise: tiny, imperceptible perturbations. Most previous work on adversarial attacks has focused on image models, while the vulnerability of video models is less explored. In this paper, we aim to attack video models by utilizing the intrinsic movement pattern and regional relative motion among video frames. We propose an effective motion-excited sampler to obtain a motion-aware noise prior, which we term the sparked prior. Our sparked prior underlines frame correlations and utilizes video dynamics via relative motion. By using the sparked prior in gradient estimation, we can successfully attack a variety of video classification models with fewer queries. Extensive experimental results on four benchmark datasets validate the efficacy of our proposed method.
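To make "using the sparked prior in gradient estimation" concrete, below is a minimal sketch of an NES-style antithetic gradient estimator whose random directions are modulated by a prior. The function name `estimate_grad` and all arguments are illustrative assumptions; the paper's actual sampler derives `prior` from motion vectors between frames, which is not reproduced here.

```python
import torch

def estimate_grad(loss_fn, x, prior, sigma=0.05, n=10):
    """Antithetic finite-difference gradient estimate (sketch).

    Each random direction u is element-wise modulated by `prior`,
    a stand-in for the motion-aware sparked prior from the paper.
    """
    grad = torch.zeros_like(x)
    for _ in range(n):
        u = torch.randn_like(x) * prior              # prior-modulated direction
        delta = loss_fn(x + sigma * u) - loss_fn(x - sigma * u)
        grad += delta / (2 * sigma) * u              # finite-difference term
    return grad / n                                  # average over n samples
```

With a uniform prior this reduces to plain NES; a motion-aware prior concentrates the query budget on directions aligned with video dynamics, which is why fewer queries suffice.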
- python 3.6
- pytorch 1.0.1
- coviar
- mxnet-gpu 1.5.0 (for CUDA 10.0): install it by `pip install mxnet-cu100==1.5.0`
- gluoncv 0.6.0: install it by `pip install gluoncv==0.6.0`
Something-Something v2: split the videos into frames with video2frames.py, then update the data path in run_smth_i3d.sh.
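The repo's video2frames.py is not shown here; as a rough sketch of the frame-splitting step, the snippet below shells out to ffmpeg. The function name, output naming pattern, and `fps` option are all assumptions, not the script's actual interface.

```python
import os
import subprocess

def video_to_frames(video_path, out_dir, fps=None, run=True):
    """Extract JPEG frames from a video with ffmpeg (illustrative sketch).

    Writes out_dir/img_00001.jpg, img_00002.jpg, ...
    With run=False the command is only built, not executed.
    """
    os.makedirs(out_dir, exist_ok=True)
    cmd = ["ffmpeg", "-i", video_path]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]         # optionally subsample frames
    cmd += [os.path.join(out_dir, "img_%05d.jpg")]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```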
We use existing I3D and TSN2D models from gluoncv; download them [here]. You can replace this part with other models.
When using coviar to extract motion vectors, first convert the original videos to MPEG format by running `bash reencode_smth_smth.sh`.
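For readers without the shell script at hand, here is a hedged sketch of what such a re-encoding step typically looks like: ffmpeg transcoding to MPEG-4, the codec whose motion vectors coviar can read. The function name and exact flags are assumptions about reencode_smth_smth.sh, not a copy of it.

```python
import subprocess

def reencode_to_mpeg4(src, dst, run=True):
    """Re-encode a video to MPEG-4 so its motion vectors are readable
    (illustrative; check reencode_smth_smth.sh for the real recipe)."""
    cmd = ["ffmpeg", "-i", src, "-c:v", "mpeg4", "-f", "rawvideo", dst]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```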
To launch the attack, run `bash run_smth_i3d.sh` or `bash run_smth_tsn.sh`.
Reminder: when attacking, we impose noise after normalizing pixels to [0, 1] but before mean/std normalization, so the usual preprocessing pipeline has to be split into these two stages.
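The split described above can be sketched as follows. The helper `attacked_input` and the ImageNet-style mean/std constants are illustrative assumptions; the point is only the ordering: scale to [0, 1], add noise, then normalize.

```python
import torch

# Illustrative ImageNet-style statistics, shaped for (C, H, W) broadcasting.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def attacked_input(frame_uint8, noise):
    """frame_uint8: (3, H, W) uint8 tensor; noise lives in [0, 1] pixel space."""
    x = frame_uint8.float() / 255.0      # stage 1: scale pixels to [0, 1]
    x = (x + noise).clamp(0.0, 1.0)      # impose adversarial noise here
    return (x - MEAN) / STD              # stage 2: mean/std normalization last
```

Adding noise after mean/std normalization instead would change its effective magnitude per channel, which is why the two stages must not be fused.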
This project is licensed under the license found in the LICENSE file in the root directory.