
[BUG] Applying attacks on 2 instances of the same model gives widely divergent results #178

Open
talrub opened this issue Mar 27, 2024 · 1 comment
Labels
bug Something isn't working

Comments

talrub commented Mar 27, 2024

✨ Short description of the bug [tl;dr]

I have conducted experiments with two instances of the same "4-layer-VanillaRNN-none_activation_between_layers" model, which I'll denote as model1 and model2. Both models were trained on the MNIST dataset, where each input image is treated as a sequence of 28 steps of 28 pixels each.

Model1 was trained with lr=0.001 until reaching train_acc=100% (train_loss=0.0003, val_acc=95.38%).
Model2 was trained with lr=0.005 until reaching train_acc=100% (train_loss=0.00018, val_acc=95.25%).
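For reference, here is a minimal sketch of this kind of model and input handling (assuming PyTorch; the hidden size and the use of nn.RNN with tanh are illustrative only, since my actual model has no activation between layers):

```python
import torch.nn as nn

class VanillaRNN(nn.Module):
    """Illustrative 4-layer vanilla RNN for row-sequential MNIST (not my exact model)."""
    def __init__(self, input_size=28, hidden_size=128, num_layers=4, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers,
                          nonlinearity="tanh", batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 1, 28, 28) MNIST image -> (batch, 28, 28) sequence of rows
        x = x.squeeze(1)
        out, _ = self.rnn(x)            # out: (batch, 28, hidden_size)
        return self.fc(out[:, -1, :])   # classify from the last time step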

Subsequently, I generated 1000 adversarial images for both models using CW and PGD attacks.
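A minimal sketch of this kind of evaluation loop (assuming a torchattacks-style attack API; the attack hyperparameters and test_loader are illustrative placeholders, not necessarily my exact settings):

```python
import torch
import torchattacks  # assuming a torchattacks-style attack API

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()  # model1 or model2, loaded beforehand

# Illustrative attack settings; the real eps/alpha/steps/c may differ.
pgd = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10, random_start=True)
cw = torchattacks.CW(model, c=1, kappa=0, steps=50, lr=0.01)

correct_clean = correct_pgd = correct_cw = total = 0
for images, labels in test_loader:  # loader over the 1000 evaluation samples
    images, labels = images.to(device), labels.to(device)
    adv_pgd = pgd(images, labels)
    adv_cw = cw(images, labels)
    correct_clean += (model(images).argmax(1) == labels).sum().item()
    correct_pgd += (model(adv_pgd).argmax(1) == labels).sum().item()
    correct_cw += (model(adv_cw).argmax(1) == labels).sum().item()
    total += labels.size(0)

print(f"clean: {100*correct_clean/total:.1f}%  "
      f"PGD: {100*correct_pgd/total:.1f}%  CW: {100*correct_cw/total:.1f}%")
```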

For model1 I got:
test_acc_on_1000_real_samples=95.2%
test_acc_on_1000_adversarial_samples_CW=30.9%
test_acc_on_1000_adversarial_samples_PGD=2.2%

For model2 I got:
test_acc_on_1000_real_samples=93.3%
test_acc_on_1000_adversarial_samples_CW=60.9%
test_acc_on_1000_adversarial_samples_PGD=24.10%

Despite both models sharing the same architecture and achieving comparable training and validation accuracies, the significant discrepancies in the results of the same attacks raise concerns.

To delve deeper, I conducted the attacks using five different seeds for each model. The averaged results, along with standard deviations, are as follows:

Model1:
avg_CW_test_acc_robustness=30.9% std_CW_test_acc_robustness=0
avg_PGD_test_acc_robustness=2.6% std_PGD_test_acc_robustness=0.41

Model2:
avg_CW_test_acc_robustness=60.9% std_CW_test_acc_robustness=0
avg_PGD_test_acc_robustness=24.3% std_PGD_test_acc_robustness=0.8

Using multiple seeds does not resolve the discrepancy, and we can also observe that the CW implementation is deterministic (zero std across seeds) while PGD is not.
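For what it's worth, the zero std for CW versus the nonzero std for PGD is consistent with PGD using a random initial perturbation while CW starts from the clean input. A minimal sketch of how that randomness can be pinned or removed (assuming a torchattacks-style PGD with a random_start option; model is whichever instance is under test):

```python
import random
import numpy as np
import torch
import torchattacks  # assuming a torchattacks-style attack API

def set_seed(seed):
    # Pin every RNG the attack and data pipeline might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)
# random_start=False drops PGD's random initial perturbation entirely,
# making it as deterministic as CW (at the cost of a slightly weaker attack).
pgd = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10, random_start=False)
```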

I am attaching below a link to a Google Drive directory containing my Google Colab notebook and the models' weight files.
I would appreciate advice on this issue.

Thanks and sorry for the long description.

💬 Detailed code and results

Link to the Google Drive directory:

https://drive.google.com/drive/folders/1-msQmKOwjEbzHSCRwx7PculSOthxbHLA?usp=sharing

talrub added the bug (Something isn't working) label on Mar 27, 2024

rikonaka (Contributor) commented Mar 30, 2024

Hi @talrub, attack algorithms such as PGD and CW were proposed and evaluated on CNNs, not RNNs, so applying them to RNNs can lead to some unexpected problems.

Many papers about adversarial machine learning have focused on convolutional neural nets (CNNs) as benchmarked in [4], but relatively few have considered RNNs/LSTMs. Though some similarities exist between attacks on CNNs and RNNs, RNNs’ discrete and sequential nature poses added challenges in generating and interpreting adversarial examples. ... This basically considers a linear approximation of the loss function around the inputs. Due to the discrete nature of RNNs, such an approach doesn’t directly work. However, we can use the same intuition to find discrete modifications to the inputs that roughly align with the gradient of the loss function. [15] showed that the Jacobian Saliency Map Approach, though initially developed for feed-forward networks, can be generalized to RNNs. [16] extended this approach to generate adversarial examples using Generative Adversarial Networks (GANs).

These paragraphs are from the related-work section of https://web.stanford.edu/~bartolo/assets/crafting-rnn-attacks.pdf.

So, following the above paper, you can try the JSMA attack instead of PGD or CW for RNNs 😘.
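A minimal sketch of what that swap could look like (assuming a JSMA implementation with a torchattacks-style interface is available; the theta and gamma values are illustrative defaults, not tuned):

```python
import torchattacks  # assuming a JSMA attack with a torchattacks-style interface

# theta: magnitude added to each selected feature; gamma: maximum fraction of
# features allowed to change. Both values here are illustrative only.
jsma = torchattacks.JSMA(model, theta=1.0, gamma=0.1)
adv_images = jsma(images, labels)  # images/labels: a batch from the test set
```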
