
Commit

Release Implementation
Nikolaus Dräger committed Apr 5, 2023
1 parent f61bae6 commit 9ee4257
Showing 54 changed files with 19,114 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
**/fcn8s_from_caffe.pth
**/__pycache__/**
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Nikolaus Dräger

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
175 changes: 171 additions & 4 deletions README.md
@@ -3,13 +3,180 @@
<h3 align="center"> <a href="https://www.linkedin.com/in/nikolaus-dr%C3%A4ger-20826b174/">Nikolaus Dräger</a>, <a href="https://yonghaoxu.github.io/">Yonghao Xu</a>, <a href="https://www.ai4rs.com/">Pedram Ghamisi</a></h3>
<br>


![](figures/flowchart.png)

*This research has been conducted at the [Institute of Advanced Research in Artificial Intelligence (IARAI)](https://www.iarai.ac.at/).*

This is the official PyTorch implementation of the paper **[Backdoor Attacks for Remote Sensing Data with Wavelet Transform](https://arxiv.org/abs/2211.08044)**.


### Implementation
Coming soon!
## Preparation
- Install required packages using conda: `conda env create -f waba.yml`
- Download the [UC Merced Land Use](http://weegee.vision.ucmerced.edu/datasets/landuse.html) / [AID](https://captain-whu.github.io/AID/) datasets for classification tasks.
- Download the [Vaihingen](https://www.isprs.org/education/benchmarks/UrbanSemLab/2d-sem-label-vaihingen.aspx) / [Zurich Summer](https://zenodo.org/record/5914759) datasets for segmentation tasks.
- Download the pretrained model for FCNs and DeepLabV2 [fcn8s_from_caffe.pth](https://drive.google.com/file/d/1PGuOb-ZIOc10aMGOxj5xFSubi8mkVXaq/view) and put it in `segmentation/models/`.

The data folder is structured as follows:
```
├── datadir/
│   ├── pathlists/
│   │   ├── benign/
│   │   ├── poisoned/
│   ├── triggers/
│   ├── UCMerced_LandUse/
│   │   ├── Images/
│   │   ├── ...
│   │   ├── poisoned/
│   ├── AID/
│   │   ├── Airport/
│   │   ├── BareLand/
│   │   ├── ...
│   │   ├── poisoned/
│   ├── Vaihingen/
│   │   ├── img/
│   │   ├── gt/
│   │   ├── ...
│   │   ├── poisoned/
│   ├── Zurich/
│   │   ├── img/
│   │   ├── gt/
│   │   ├── ...
│   │   ├── poisoned/
...
```

The `pathlists` folder contains two subfolders, `benign` and `poisoned`. Pathlist files list the image paths for the training and testing datasets.
A new pathlist is generated in the `poisoned` subfolder whenever a dataset is poisoned with new poisoning parameters.

Please note that the structure of pathlist files differs slightly between the classification and segmentation tasks. In classification pathlists, each image's ground-truth/target label follows its path as an integer at the end of the line, as in the sketch below. Since such a representation is not possible for segmentation, the poisoned labels are stored as images in the `poisoned` subfolder of the respective dataset.
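
For illustration, each line of a classification pathlist pairs a relative image path with an integer class label. The file name and label in the following sketch are made up; the parsing mirrors what the dataset classes in `classification/data/datasets/dataset.py` do:
```python
# Hypothetical classification pathlist line: "<relative/path/to/image> <class index>"
line = "Images/airplane/airplane05.tif 3\n"

tokens = line.rstrip('\n').split()
image_path, label = tokens[0], int(tokens[1])
print(image_path, label)  # -> Images/airplane/airplane05.tif 3
```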

## Executing the Code

Before training and testing your models, you must first poison the dataset. To do this, use the `poison.py` scripts, which are provided for both the classification and segmentation tasks.

The results gathered from testing your models will be written to `.csv` files in the data directory.

### Arguments

The most important arguments when executing your code are the following:

| Argument | Type | Information |
| ------------- | ------------- | ------------- |
| `dataID` | Integer (1 or 2) | Controls the dataset to use. Classification: 1 - UCM, 2 - AID / Segmentation: 1 - Vaihingen, 2 - Zurich Summer |
| `data_dir` | Path to Directory | Path to the directory containing datasets and pathlists |
| `trigger_path` | Path to File | Path to the trigger image to use for poisoning a dataset |
| `alpha(s)` | Float or list of floats between 0 and 1 | Alpha value(s) used for poisoning the dataset. Poisoning supports a list of values; training and testing only support a single alpha value. |
| `level` | Positive Integer | Wavelet decomposition level (depth) used for the decomposition |
| `wavelet` | String | [Wavelet basis](https://pywavelets.readthedocs.io/en/latest/ref/wavelets.html) used for the decomposition, e.g. "bior4.4" |
| `network` | String | Network architecture, e.g. "resnet18" or "fcn8s" |
| `poisoning_rate` | Float between 0 and 1 | Poisoning rate used when training the model |
| `inject` / `no-inject` | Flags | Use `--inject` to include poisoned training data; `--no-inject` trains on clean data only. |
| `clean` | 'Y' or 'N' | Use 'Y' to benchmark the model on both poisoned and clean test data; 'N' benchmarks on clean test data only. |

Additional hyperparameters are available for training, testing, and poisoning; this documentation does not cover them in detail. For further information, please consult the corresponding code.
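
To build intuition for how `alpha`, `level`, and `wavelet` fit together, the following is a minimal, hypothetical sketch of wavelet-domain blending with [PyWavelets](https://pywavelets.readthedocs.io/); it is not the repository's poisoning code, and the function name and blending rule are illustrative only:
```python
import numpy as np
import pywt

def blend_in_wavelet_domain(image, trigger, alpha=0.1, level=2, wavelet="bior4.4"):
    """Illustrative sketch: mix a trigger into a (grayscale) image via a 2D DWT.

    Both inputs are decomposed to the given level, every coefficient band is
    linearly blended with weight `alpha`, and the result is reconstructed.
    For RGB images this would be applied per channel.
    """
    img_coeffs = pywt.wavedec2(image, wavelet, level=level)
    trg_coeffs = pywt.wavedec2(trigger, wavelet, level=level)

    blended = [(1 - alpha) * img_coeffs[0] + alpha * trg_coeffs[0]]  # approximation band
    for (ih, iv, idg), (th, tv, tdg) in zip(img_coeffs[1:], trg_coeffs[1:]):
        blended.append(((1 - alpha) * ih + alpha * th,     # horizontal details
                        (1 - alpha) * iv + alpha * tv,     # vertical details
                        (1 - alpha) * idg + alpha * tdg))  # diagonal details
    return pywt.waverec2(blended, wavelet)

# Hypothetical usage on random arrays of matching shape:
poisoned = blend_in_wavelet_domain(np.random.rand(256, 256), np.random.rand(256, 256))
```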

## Classification

The `dataID` argument can be either `1` or `2`:
- 1: UCMerced LandUse
- 2: AID

### Poisoning your Datasets

From inside the `classification/` folder execute:
```
$ python -m tools.poison --dataID (1|2) \
--data_dir <path> \
--trigger_path <path> \
--alphas [0.0-1.0]+ \
--level <decomposition_depth> \
--wavelet <pywavelet_family>
```

### Training the Model

From inside the `classification/` folder execute:
```
$ python -m tools.train --dataID (1|2) \
--data_dir <path> \
--network <network_identifier> \
--alpha [0.0-1.0] \
--poisoning_rate [0.0-1.0] \
--level <decomposition_depth> \
--wavelet <pywavelet_family> \
(--inject | --no-inject)
```

### Testing the Model

From inside the `classification/` folder execute:
```
$ python -m tools.test --dataID (1|2) \
--data_dir <path> \
--network <network_identifier> \
--model_path <path_to_trained_model> \
--alpha [0.0-1.0] \
--level <decomposition_depth> \
--wavelet <pywavelet_family> \
--clean (Y|N)
```

## Segmentation

The `dataID` argument can be either `1` or `2`:
- 1: Vaihingen
- 2: Zurich Summer

### Poisoning your Datasets

From inside the `segmentation/` folder execute:
```
$ python -m tools.poison --dataID (1|2) \
--data_dir <path> \
--trigger_path <path> \
--alphas [0.0-1.0]+ \
--level <decomposition_depth> \
--wavelet <pywavelet_family>
```

### Training the Model

From inside the `segmentation/` folder execute:
```
$ python -m tools.train --dataID (1|2) \
--data_dir <path> \
--network <network_identifier> \
--alpha [0.0-1.0] \
--poisoning_rate [0.0-1.0] \
--level <decomposition_depth> \
--wavelet <pywavelet_family> \
(--inject | --no-inject)
```

### Testing the Model

From inside the `segmentation/` folder execute:
```
$ python -m tools.test --dataID (1|2) \
--data_dir <path> \
--network <network_identifier> \
--model_path <path_to_trained_model> \
--alpha [0.0-1.0] \
--level <decomposition_depth> \
--wavelet <pywavelet_family> \
--clean (Y|N)
```

## Paper
[Backdoor Attacks for Remote Sensing Data with Wavelet Transform](https://arxiv.org/abs/2211.08044)

Please cite our paper if you find it useful for your research.

```
@article{drager2022backdoor,
title={Backdoor Attacks for Remote Sensing Data with Wavelet Transform},
author={Dr{\"a}ger, Nikolaus and Xu, Yonghao and Ghamisi, Pedram},
journal={arXiv preprint arXiv:2211.08044},
year={2022}
}
```

## License
This repo is distributed under the [MIT License](https://github.com/ndraeger/waba/blob/main/LICENSE). The code can be used for academic purposes only.
Empty file added classification/data/__init__.py
Empty file.
85 changes: 85 additions & 0 deletions classification/data/datasets/dataset.py
@@ -0,0 +1,85 @@
from torch.utils import data
from PIL import Image
import os
import numpy as np

def default_loader(path):
    """Opens an image specified by its path using Pillow and converts the image to RGB color space.
    Used as default loader for dataset classes.
    Args:
        path: path of image to open
    Returns:
        pillow image object
    """
    return Image.open(path).convert('RGB')

class ClassificationDataset(data.Dataset):
    """Base class for dataset classes used in the classification task.
    Implements basic functionality for constructor, getitem and len methods.
    """
    def __init__(self):
        self.imgs = []

    def __getitem__(self, index):
        filepath, label, name = self.imgs[index]
        img = self.loader(filepath)
        if self.transform:
            img = self.transform(img)
        return img, label, name

    def __len__(self):
        return len(self.imgs)

class TrainingClassificationDataset(ClassificationDataset):
    """Dataset class used for the training process in a classification task.
    The poisoning rate specifies the fraction of the total dataset being poisoned.
    To implement this, each file in the dataset is replaced by its poisoned counterpart with a probability of the poisoning rate.
    Note that for small datasets, the exact poisoning rate might not be achieved to great accuracy.
    """
    def __init__(self, data_dir, list_path, transform=None, loader=default_loader, poisoning_rate=0.3, inject=False, poisonous_pathfile=None):
        super().__init__()
        with open(list_path, 'r') as file:
            benign_imgs = [(os.path.join(data_dir, tokens[0]), int(tokens[1]), os.path.splitext(tokens[0])[0]) for tokens in (line.rstrip('\n').split() for line in file)]

        if inject:
            # note that this method only approximates the poisoning rate (with convergence for infinitely large datasets).
            poisoning_vector = np.random.rand(len(benign_imgs)) < poisoning_rate
            with open(poisonous_pathfile) as pfile:
                poisoned_imgs = [(os.path.join(data_dir, tokens[0]), int(tokens[1]), os.path.splitext(tokens[0])[0]) for tokens in (line.rstrip('\n').split() for line in pfile)]
            self.imgs = [poisoned_img if should_poison else benign_img for should_poison, benign_img, poisoned_img in zip(poisoning_vector, benign_imgs, poisoned_imgs)]
        else:
            self.imgs = benign_imgs

        self.transform = transform
        self.loader = loader

class TestingClassificationDataset(ClassificationDataset):
    """Dataset class used for the testing process in a classification task.
    The images are poisoned if the attacked flag is True, and benign otherwise.
    To test the ASR, the original, benign labels are attached to the poisoned images.
    """
    def __init__(self, data_dir, list_path, transform=None, loader=default_loader, attacked=False, poisonous_pathfile=None):
        super().__init__()
        with open(list_path, 'r') as file:
            benign_imgs = [(os.path.join(data_dir, tokens[0]), int(tokens[1]), os.path.splitext(tokens[0])[0]) for tokens in (line.rstrip('\n').split() for line in file)]

        if attacked:
            with open(poisonous_pathfile) as pfile:
                poisoned_imgs = [(os.path.join(data_dir, tokens[0]), int(tokens[1]), os.path.splitext(tokens[0])[0]) for tokens in (line.rstrip('\n').split() for line in pfile)]
            poisoned_imgs = [(poisoned_path, benign_label, poisoned_name) for ((_, benign_label, _), (poisoned_path, _, poisoned_name)) in zip(benign_imgs, poisoned_imgs)]
            self.imgs = poisoned_imgs
        else:
            self.imgs = benign_imgs

        self.transform = transform
        self.loader = loader

    def __getitem__(self, index):
        filepath, label, name = self.imgs[index]
        img = self.loader(filepath)
        if self.transform:
            img = self.transform(img)
        return img, label, name
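
A minimal usage sketch for the training dataset above, assuming the module path implied by this commit's file layout and purely hypothetical pathlist locations:
```python
from torch.utils.data import DataLoader
from torchvision import transforms

# Assumed import path based on the file layout of this commit.
from classification.data.datasets.dataset import TrainingClassificationDataset

# All paths below are hypothetical placeholders; adjust them to your data_dir layout.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

train_set = TrainingClassificationDataset(
    data_dir="datadir/UCMerced_LandUse",
    list_path="datadir/pathlists/benign/train.txt",
    transform=transform,
    poisoning_rate=0.3,
    inject=True,
    poisonous_pathfile="datadir/pathlists/poisoned/train.txt",
)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

for images, labels, names in loader:
    pass  # feed images and labels into your training step here
```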
