Hi @talrub, attack algorithms such as PGD and CW were originally proposed for and evaluated on CNNs, not RNNs, so applying them to RNNs can lead to some unexpected behavior.
Many papers about adversarial machine learning have focused on convolutional neural nets (CNNs) as benchmarked in [4], but relatively few have considered RNNs/LSTMs. Though some similarities exist between attacks on CNNs and RNNs, RNNs' discrete and sequential nature poses added challenges in generating and interpreting adversarial examples. ... This basically considers a linear approximation of the loss function around the inputs. Due to the discrete nature of RNNs, such an approach doesn't directly work. However, we can use the same intuition to find discrete modifications to the inputs that roughly align with the gradient of the loss function. [15] showed that the Jacobian Saliency Map Approach, though initially developed for feed-forward networks, can be generalized to RNNs. [16] extended this approach to generate adversarial examples using Generative Adversarial Networks (GANs).
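For intuition, the linear-approximation idea in the excerpt above is what a single FGSM-style step implements on a continuous input. A rough sketch, for illustration only (not the exact implementation in this library):

```python
import torch
import torch.nn.functional as F

def fgsm_step(model, x, y, eps):
    # Linearize the loss around x and move every input feature by eps
    # in the direction of the sign of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```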
✨ Short description of the bug [tl;dr]
I have conducted experiments with two instances of the same "4-layer-VanillaRNN-none_activation_between_layers" model, which I'll denote as model1 and model2. Both models were trained on the MNIST dataset, where each input image is treated as a sequence of 28 rows of 28 pixels each.
Model1 was trained with lr=0.001 until reaching train_acc=100% (train_loss=0.0003, val_acc=95.38%).
Model2 was trained with lr=0.005 until reaching train_acc=100% (train_loss=0.00018, val_acc=95.25%).
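For context, the model and training setup look roughly like this. This is only a simplified sketch: the hidden size and optimizer are placeholders, and the exact handling of the "no activation between layers" detail is in the attached notebook.

```python
import torch
import torch.nn as nn

class VanillaRNN4(nn.Module):
    """4 stacked vanilla RNN layers; MNIST rows are fed as a length-28 sequence."""
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=4, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 1, 28, 28) image -> (batch, 28, 28) sequence of rows
        x = x.view(x.size(0), 28, 28)
        out, _ = self.rnn(x)            # (batch, 28, hidden_size)
        return self.fc(out[:, -1, :])   # classify from the last time step

model1 = VanillaRNN4()  # trained with torch.optim.Adam(model1.parameters(), lr=0.001) (optimizer assumed)
model2 = VanillaRNN4()  # same, but lr=0.005
```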
Subsequently, I generated 1000 adversarial images for both models using CW and PGD attacks.
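The adversarial examples are generated with the library's PGD and CW classes, roughly as follows. The attack hyperparameters shown here are placeholders; the exact values are in the notebook.

```python
import torchattacks

# Placeholder hyperparameters - see the attached notebook for the real ones.
pgd = torchattacks.PGD(model1, eps=0.3, alpha=0.01, steps=40)
cw  = torchattacks.CW(model1, c=1, kappa=0, steps=100, lr=0.01)

adv_pgd = pgd(images, labels)   # images: (1000, 1, 28, 28) in [0, 1], labels: (1000,)
adv_cw  = cw(images, labels)

acc_real = (model1(images).argmax(1)  == labels).float().mean().item()
acc_pgd  = (model1(adv_pgd).argmax(1) == labels).float().mean().item()
acc_cw   = (model1(adv_cw).argmax(1)  == labels).float().mean().item()
```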
For model1 I got:
test_acc_on_1000_real_samples=95.2%
test_acc_on_1000_adversarial_samples_CW=30.9%
test_acc_on_1000_adversarial_samples_PGD=2.2%
For model2 I got:
test_acc_on_1000_real_samples=93.3%
test_acc_on_1000_adversarial_samples_CW=60.9%
test_acc_on_1000_adversarial_samples_PGD=24.10%
Despite both models sharing the same architecture and achieving comparable training and validation accuracies, the large discrepancy between their results under the same attacks is concerning.
To delve deeper, I conducted the attacks using five different seeds for each model. The averaged results, along with standard deviations, are as follows:
Model1:
avg_CW_test_acc_robustness=30.9%, std_CW_test_acc_robustness=0
avg_PGD_test_acc_robustness=2.6%, std_PGD_test_acc_robustness=0.41
Model2:
avg_CW_test_acc_robustness=60.9%, std_CW_test_acc_robustness=0
avg_PGD_test_acc_robustness=24.3%, std_PGD_test_acc_robustness=0.8
Using multiple seeds doesn't resolve the discrepancy, and it also shows that the CW implementation is deterministic across seeds (zero standard deviation) while PGD is not.
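For completeness, this is roughly how I vary the seed per run (parameter values are placeholders). As far as I can tell, the remaining run-to-run variation for PGD comes from its random initialization inside the epsilon ball (random_start is enabled by default), while CW always starts from the clean image, which would explain the zero standard deviation.

```python
import random, numpy as np, torch
import torchattacks

def pgd_robust_acc(model, images, labels, seed):
    # Seed every RNG so the only remaining randomness is the attack's own.
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed)
    atk = torchattacks.PGD(model, eps=0.3, alpha=0.01, steps=40)  # placeholder params
    # torchattacks.PGD(..., random_start=False) would make PGD deterministic as well
    adv = atk(images, labels)
    return (model(adv).argmax(1) == labels).float().mean().item()

accs = [pgd_robust_acc(model1, images, labels, seed) for seed in range(5)]
print(f"avg={np.mean(accs):.3f} std={np.std(accs):.3f}")
```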
I am attaching below a link to a Google Drive directory containing my Google Colab notebook and the model weight files.
I would appreciate any advice on this issue.
Thanks and sorry for the long description.
💬 Detailed code and results
Link to the Google Drive directory:
https://drive.google.com/drive/folders/1-msQmKOwjEbzHSCRwx7PculSOthxbHLA?usp=sharing