Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time

This is the core implementation of VRifle, from the paper "Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time", published in the Proceedings of the Network and Distributed System Security Symposium 2024 (NDSS 2024).

We would like to thank the author of deepspeech2-pytorch-adversarial-attack for providing an excellent foundation for our code, which targets the DeepSpeech2 model.

We also extend our gratitude to the contributors of deepspeech.pytorch for developing an easy-to-use DeepSpeech framework.

Citation

If you find this repo helpful, please consider citing our paper in the following format.

@inproceedings{li2024vrifle,
  title={Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time},
  author={Li, Xinfeng and Yan, Chen and Lu, Xuancun and Zeng, Zihan and Ji, Xiaoyu and Xu, Wenyuan},
  booktitle={In the 31st Annual Network and Distributed System Security Symposium (NDSS)},
  year={2024}
}

Getting Started

Several dependencies must be installed first. Please follow the instructions in DeepSpeech 2 PyTorch to set up the environment.
We recommend arranging your DeepSpeech 2 PyTorch folders in the following structure.

ROOT_FOLDER/
├── this_repo/
│   ├──main_vrifle.py
│   └──...
├──deepspeech.pytorch/
│   ├──models/
│   │   └──librispeech/
│   │       └──librispeech_pretrained_v2.pth
│   └──...

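Then download the pretrained DeepSpeech model (librispeech_pretrained_v2.pth) from the link provided by DeepSpeech 2 PyTorch and place it under deepspeech.pytorch/models/librispeech/, as shown in the tree above.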
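Before running anything, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of this repo: the helper name missing_paths is hypothetical, and it checks only the paths that appear in the tree.

```python
from pathlib import Path

# Relative paths taken from the folder tree above. ROOT_FOLDER is whichever
# directory contains both this repo and deepspeech.pytorch.
EXPECTED = [
    "deepspeech.pytorch",
    "deepspeech.pytorch/models/librispeech/librispeech_pretrained_v2.pth",
]


def missing_paths(root):
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


# Example: when run from inside this_repo/, ROOT_FOLDER is one level up.
# print(missing_paths(Path.cwd().parent))
```

An empty list means both the deepspeech.pytorch checkout and the pretrained checkpoint are where the tree expects them.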

Introduction

Deep Speech 2 [1] is a state-of-the-art Automatic Speech Recognition (ASR) system, notable for its end-to-end training: spectrograms are fed directly to the model, which outputs the predicted sentences.

In this work, we implement the first completely inaudible (ultrasonic) adversarial perturbation attack against this ASR system. With this formulation, the classical PGD (Projected Gradient Descent) algorithm is sufficient to optimize the perturbation efficiently.
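For readers unfamiliar with PGD, the sketch below shows the generic L-infinity variant on a toy problem. It is not the code in vrifle_attack.py: the function name pgd_linf, the quadratic stand-in loss, and the parameter values are all assumptions made for illustration, and the real attack optimizes an ASR loss through the DeepSpeech2 model instead.

```python
import numpy as np


def pgd_linf(x, grad_fn, epsilon, alpha, n_iter):
    """Generic targeted L-infinity PGD: minimise a loss over a perturbation.

    grad_fn(x_adv) must return d(loss)/d(x_adv) with the same shape as x.
    """
    delta = np.zeros_like(x)
    for _ in range(n_iter):
        g = grad_fn(x + delta)
        delta = delta - alpha * np.sign(g)         # signed gradient descent step
        delta = np.clip(delta, -epsilon, epsilon)  # project back onto the epsilon-ball
    return delta


# Toy stand-in for the ASR loss: squared distance to a target vector.
x = np.zeros(4)
target = np.array([0.5, -0.5, 0.02, 0.0])


def grad(x_adv):
    return 2.0 * (x_adv - target)


delta = pgd_linf(x, grad, epsilon=0.1, alpha=0.01, n_iter=50)
print(delta)
```

Components whose target lies outside the epsilon-ball end up pinned at the bound. This is the role of the projection step: the perturbation never exceeds epsilon in magnitude, however many iterations are run.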

[1] Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., ... & Zhu, Z. (2016, June). Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (pp. 173-182).

Preparation

Download the Fluent Speech Commands dataset.
To speed up the optimization on an RTX 3090 GPU, see the section "Support DeepSpeech on 3090 GPUs (NVIDIA)" below.

Usage

Use main_vrifle.py to perturb an original raw wave file so that it is recognized as the desired sentence:

python main_vrifle.py --attack_type Mute_robust --device 0

python main_vrifle.py --attack_type Universal_robust --device 0

Several parameters are available to improve the attack. You can tune hyperparameters such as epsilon, alpha, and PGD_iter for better results. For details, refer to main_vrifle.py and vrifle_attack.py.
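To run both attack types back to back, a small wrapper can build the two commands shown above. This is a sketch that uses only the --attack_type and --device flags from those commands. The helper names build_command and run_all are hypothetical, and no other command-line flags are assumed to exist.

```python
import subprocess
import sys

# The two attack types used in the commands above.
ATTACK_TYPES = ["Mute_robust", "Universal_robust"]


def build_command(attack_type, device=0):
    """Build the argument list for one main_vrifle.py run."""
    return [
        sys.executable, "main_vrifle.py",
        "--attack_type", attack_type,
        "--device", str(device),
    ]


def run_all(device=0):
    """Run each attack type in turn and stop at the first failure."""
    for attack_type in ATTACK_TYPES:
        subprocess.run(build_command(attack_type, device), check=True)


# Print the commands without launching them; call run_all() from inside
# this_repo/ to actually start the runs.
for attack_type in ATTACK_TYPES:
    print(" ".join(build_command(attack_type)[1:]))
```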

Support DeepSpeech on 3090 GPUs (NVIDIA)

After numerous attempts and extensive research, we arrived at the following setup, which works on 3090 GPUs :)

Install Deepspeech.pytorch

Download deepspeech.pytorch
cd into the folder and then pip install -r requirements.txt
pip install -e . # Dev install
pip install adversarial-robustness-toolbox[pytorch]
pip install torchaudio
git clone https://github.com/SeanNaren/warp-ctc.git
In binding.cpp, replace the #include <THC/THC.h> and extern THCState* state; lines as described in https://blog.csdn.net/weixin_41868417/article/details/123819183 (see "Modifying binding.cpp" below).

Install Warp-CTC

Edit CMakeLists.txt so that nvcc targets compute capability 8.6 (sm_86), the architecture of the RTX 3090:

# Before replacement
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_35,code=sm_35")

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_50,code=sm_50")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_52,code=sm_52")

# After
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_86,code=sm_86")
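If you are building on a different GPU, the same edit applies with that card's compute capability. The sketch below formats the flag and, when PyTorch with CUDA is available, reads the capability of device 0. The helper names gencode_flag and detect_capability are hypothetical, and only the standard torch.cuda calls are used.

```python
def gencode_flag(major, minor):
    """Format an nvcc -gencode flag for compute capability major.minor."""
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


def detect_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


cap = detect_capability()
if cap is None:
    print("No CUDA device visible; an RTX 3090 needs:", gencode_flag(8, 6))
else:
    print("Flag for this GPU:", gencode_flag(*cap))
```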

Compilation

cd warp-ctc
mkdir build
cd build
cmake ..
make
cd ../pytorch_binding

Modifying binding.cpp

## replace
#include <THC/THC.h>
extern THCState* state;
void* gpu_workspace = THCudaMalloc(state, gpu_size_bytes);

## into
#include <c10/cuda/CUDACachingAllocator.h>
void* gpu_workspace = c10::cuda::CUDACachingAllocator::raw_alloc(gpu_size_bytes);


## replace
THCudaFree(state, (void *) gpu_workspace);
## into
c10::cuda::CUDACachingAllocator::raw_delete((void *) gpu_workspace);

Finally, install the PyTorch binding from the pytorch_binding folder:

python setup.py install

Note that --recursive is required to get a working ctcdecode dependency:

git clone --recursive git@github.com:parlance/ctcdecode.git
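Once everything is installed, a quick check can confirm that the packages are visible to Python. The sketch below only looks the modules up and does not import them. The import names in MODULES are assumptions based on how these projects are usually packaged, so adjust them if your install differs.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
MODULES = ["deepspeech_pytorch", "art", "torchaudio", "warpctc_pytorch", "ctcdecode"]


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_modules(MODULES).items():
    print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```

A module reported as MISSING usually means the corresponding install step above was skipped or failed.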
