I have implemented PyTorch versions of two non-targeted, white-box adversarial example attacks and one defense method that counters them:
- Fast Gradient Sign Method (FGSM): Goodfellow, Ian J. et al.
- Iterative Fast Gradient Sign Method (IFGSM): Kurakin, Alexey et al.
- Input Transformations: Guo, Chuan et al.
I ran the attacks on the Imagenette dataset using a pretrained ConvNeXt model (Liu, Zhuang et al.). In the first part, the attacks are applied to the model directly, which causes a sharp drop in accuracy. In the second part, the same attacks are performed, but as a defense mechanism the input images are first transformed, and the drop in accuracy is considerably smaller.
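For reference, here is a minimal sketch of how the two attacks are commonly implemented in PyTorch. It is illustrative only and does not reproduce the code in mechanism.py; the function names, the assumption that inputs lie in [0, 1], and the fixed IFGSM step size and iteration count are all choices made for this sketch.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon):
    """Single-step FGSM: move the input by epsilon in the direction of the sign of the input gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    return torch.clamp(adv, 0, 1).detach()

def ifgsm_attack(model, images, labels, epsilon, alpha=None, num_iter=10):
    """Iterative FGSM (BIM): repeat small FGSM steps, projecting back into the epsilon-ball each time."""
    alpha = alpha if alpha is not None else epsilon / num_iter
    orig = images.clone().detach()
    adv = images.clone().detach()
    for _ in range(num_iter):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        loss.backward()
        adv = adv + alpha * adv.grad.sign()
        # keep the perturbation within the epsilon-ball around the original images
        adv = torch.min(torch.max(adv, orig - epsilon), orig + epsilon)
        adv = torch.clamp(adv, 0, 1).detach()
    return adv
```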
- Run the get_dataset.py file to download the Imagenette dataset (an alternative torchvision-based loading sketch is included after this list).
- The mechanism.py file contains the implementations of both attacks, FGSM and IFGSM.
- Follow the instructions step by step as described in the Jupyter notebook.
- During the FGSM attack, accuracy dropped from 85.29% to 1.35% as epsilon increased from 0 to 0.3 in steps of 0.05 (an epsilon-sweep sketch is included after this list).
- Similarly, for the IFGSM attack, accuracy dropped from 85.29% to 2.19% over the same epsilon range.
- With the Input Transformations defense in place, the accuracy drop was considerably smaller, ranging from 74.42% down to 50.85% over the sweep (a JPEG-compression sketch of one such transformation is included below).
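As an alternative to get_dataset.py (referenced in the first bullet above), recent torchvision releases (0.16 and later) ship a built-in Imagenette loader. The snippet below is an assumed sketch of loading the validation split with it, not the contents of the actual script; the preprocessing values are the usual ImageNet ones and may differ from what the notebook uses.

```python
import torch
import torchvision
from torchvision import transforms

# Standard ImageNet-style preprocessing (assumed; kept un-normalized so pixel
# values stay in [0, 1], matching the clamping used in the attack sketches above).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# torchvision >= 0.16 bundles Imagenette; download=True fetches it on first use.
val_set = torchvision.datasets.Imagenette(
    root="data", split="val", size="320px", download=True, transform=preprocess
)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32, shuffle=False)
```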
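The epsilon sweep behind the reported numbers could look roughly like the sketch below. Here fgsm_attack and val_loader come from the sketches above, and model is assumed to be a pretrained ConvNeXt already on the target device; none of this is the notebook's actual code.

```python
import torch

def accuracy_under_attack(model, loader, attack_fn, epsilon):
    """Top-1 accuracy on adversarial examples generated at a given epsilon."""
    device = next(model.parameters()).device
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv = attack_fn(model, images, labels, epsilon)  # input gradients are needed here
        with torch.no_grad():
            preds = model(adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total

# Sweep epsilon from 0 to 0.3 in steps of 0.05, as in the results above.
for eps in [round(0.05 * i, 2) for i in range(7)]:
    acc = accuracy_under_attack(model, val_loader, fgsm_attack, eps)
    print(f"epsilon={eps:.2f}  accuracy={acc:.2f}%")
```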
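Among the input transformations studied by Guo et al., JPEG compression is one of the simplest to illustrate. The sketch below applies it to a batch of image tensors in [0, 1] before classification; it is an assumed example of the general idea, not the transformation code used to produce the numbers above.

```python
import io
import torch
from PIL import Image
from torchvision import transforms

to_pil = transforms.ToPILImage()
to_tensor = transforms.ToTensor()

def jpeg_compress(images, quality=75):
    """Round-trip each image through JPEG encoding to strip adversarial perturbations."""
    defended = []
    for img in images.cpu():
        buffer = io.BytesIO()
        to_pil(img).save(buffer, format="JPEG", quality=quality)
        buffer.seek(0)
        defended.append(to_tensor(Image.open(buffer).convert("RGB")))
    return torch.stack(defended).to(images.device)

# Usage: transform adversarial inputs before classification, e.g.
# preds = model(jpeg_compress(adv_images)).argmax(dim=1)
```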