
[BUG] Applying attacks on 2 instances of the same model gives widely divergent results #178

Open
talrub opened this issue Mar 27, 2024 · 1 comment
Labels
bug Something isn't working

Comments

talrub commented Mar 27, 2024

✨ Short description of the bug [tl;dr]

I have conducted experiments with two instances of the same "4-layer-VanillaRNN-none_activation_between_layers" model, which I'll denote as model1 and model2. Both models were trained on the MNIST dataset, where each input image is treated as a sequence of 28 steps of 28 pixels each.

Model1 was trained with lr=0.001 until reaching train_acc=100% (train_loss=0.0003, val_acc=95.38%).
Model2 was trained with lr=0.005 until reaching train_acc=100% (train_loss=0.00018, val_acc=95.25%).
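For reference, here is a minimal sketch of this kind of model and input handling (assuming PyTorch; the hidden size and the use of nn.RNN with tanh are illustrative only, since my actual model has no activation between layers):

```python
import torch.nn as nn

class VanillaRNN(nn.Module):
    """Illustrative 4-layer vanilla RNN for row-sequential MNIST (not my exact model)."""
    def __init__(self, input_size=28, hidden_size=128, num_layers=4, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers,
                          nonlinearity="tanh", batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 1, 28, 28) MNIST image -> (batch, 28, 28) sequence of rows
        x = x.squeeze(1)
        out, _ = self.rnn(x)            # out: (batch, 28, hidden_size)
        return self.fc(out[:, -1, :])   # classify from the last time step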

Subsequently, I generated 1000 adversarial images for both models using CW and PGD attacks.
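A minimal sketch of this kind of evaluation loop (assuming a torchattacks-style attack API; the attack hyperparameters and test_loader are illustrative placeholders, not necessarily my exact settings):

```python
import torch
import torchattacks  # assuming a torchattacks-style attack API

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()  # model1 or model2, loaded beforehand

# Illustrative attack settings; the real eps/alpha/steps/c may differ.
pgd = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10, random_start=True)
cw = torchattacks.CW(model, c=1, kappa=0, steps=50, lr=0.01)

correct_clean = correct_pgd = correct_cw = total = 0
for images, labels in test_loader:  # loader over the 1000 evaluation samples
    images, labels = images.to(device), labels.to(device)
    adv_pgd = pgd(images, labels)
    adv_cw = cw(images, labels)
    correct_clean += (model(images).argmax(1) == labels).sum().item()
    correct_pgd += (model(adv_pgd).argmax(1) == labels).sum().item()
    correct_cw += (model(adv_cw).argmax(1) == labels).sum().item()
    total += labels.size(0)

print(f"clean: {100*correct_clean/total:.1f}%  "
      f"PGD: {100*correct_pgd/total:.1f}%  CW: {100*correct_cw/total:.1f}%")
```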

For model1 I got:
test_acc_on_1000_real_samples=95.2%
test_acc_on_1000_adversarial_samples_CW=30.9%
test_acc_on_1000_adversarial_samples_PGD=2.2%

For model2 I got:
test_acc_on_1000_real_samples=93.3%
test_acc_on_1000_adversarial_samples_CW=60.9%
test_acc_on_1000_adversarial_samples_PGD=24.10%

Despite both models sharing the same architecture and achieving comparable training and validation accuracies, the significant discrepancies in the results of the same attacks raise concerns.

To delve deeper, I conducted the attacks using five different seeds for each model. The averaged results, along with standard deviations, are as follows:

Model1:
avg_CW_test_acc_robustness=30.9% std_CW_test_acc_robustness=0
avg_PGD_test_acc_robustness=2.6% std_PGD_test_acc_robustness=0.41

Model2:
avg_CW_test_acc_robustness=60.9% std_CW_test_acc_robustness=0
avg_PGD_test_acc_robustness=24.3% std_PGD_test_acc_robustness=0.8

Using multiple seeds does not resolve the discrepancy, and we can also observe that the CW implementation is deterministic (zero std across seeds) while PGD is not.
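For what it's worth, the zero std for CW versus the nonzero std for PGD is consistent with PGD using a random initial perturbation while CW starts from the clean input. A minimal sketch of how that randomness can be pinned or removed (assuming a torchattacks-style PGD with a random_start option; model is whichever instance is under test):

```python
import random
import numpy as np
import torch
import torchattacks  # assuming a torchattacks-style attack API

def set_seed(seed):
    # Pin every RNG the attack and data pipeline might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)
# random_start=False drops PGD's random initial perturbation entirely,
# making it as deterministic as CW (at the cost of a slightly weaker attack).
pgd = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10, random_start=False)
```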

I am attaching below a link to a Google Drive directory containing my Google Colab notebook and the models' weight files.
I would appreciate advice on this issue.

Thanks and sorry for the long description.

💬 Detailed code and results

Link to the Google Drive directory:

https://drive.google.com/drive/folders/1-msQmKOwjEbzHSCRwx7PculSOthxbHLA?usp=sharing

talrub added the bug (Something isn't working) label on Mar 27, 2024

rikonaka (Contributor) commented Mar 30, 2024

Hi @talrub, attack algorithms such as PGD and CW were proposed and evaluated on CNNs, not RNNs, so applying them to RNNs can lead to some unexpected problems.

Many papers about adversarial machine learning have focused on convolutional neural nets (CNNs) as benchmarked in [4], but relatively few have considered RNNs/LSTMs. Though some similarities exist between attacks on CNNs and RNNs, RNNs’ discrete and sequential nature poses added challenges in generating and interpreting adversarial examples. ... This basically considers a linear approximation of the loss function around the inputs. Due to the discrete nature of RNNs, such an approach doesn’t directly work. However, we can use the same intuition to find discrete modifications to the inputs that roughly align with the gradient of the loss function. [15] showed that the Jacobian Saliency Map Approach, though initially developed for feed-forward networks, can be generalized to RNNs. [16] extended this approach to generate adversarial examples using Generative Adversarial Networks (GANs).

These paragraphs are from the related-work section of https://web.stanford.edu/~bartolo/assets/crafting-rnn-attacks.pdf.

So, following the above paper, you can try the JSMA attack instead of PGD or CW for RNNs 😘.
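A minimal sketch of what that swap could look like (assuming a JSMA implementation with a torchattacks-style interface is available; the theta and gamma values are illustrative defaults, not tuned):

```python
import torchattacks  # assuming a JSMA attack with a torchattacks-style interface

# theta: magnitude added to each selected feature; gamma: maximum fraction of
# features allowed to change. Both values here are illustrative only.
jsma = torchattacks.JSMA(model, theta=1.0, gamma=0.1)
adv_images = jsma(images, labels)  # images/labels: a batch from the test set
```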
