SmoothTaylor is a gradient-based attribution method for deep neural networks, derived from Taylor's theorem. It is proposed as a theoretical bridge between SmoothGrad (Smilkov et al.) and Integrated Gradients (Sundararajan et al.).
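
To make the idea concrete, here is a minimal NumPy sketch of one way to read the method: sample roots z around the input x with Gaussian noise, take the first-order Taylor term (x - z) * grad f(z) at each root, and average the terms. The toy function `f`, its analytic gradient `grad_f`, and the name `smooth_taylor` are assumptions made for illustration and are not this repository's API; please refer to the paper for the exact formulation.

```python
import numpy as np


def f(x, w):
    """Toy differentiable 'model': a weighted sum of squares (hypothetical stand-in)."""
    return float(np.sum(w * x ** 2))


def grad_f(x, w):
    """Analytic gradient of f with respect to x."""
    return 2.0 * w * x


def smooth_taylor(x, w, noise_scale=0.5, num_roots=150, seed=0):
    """Average the first-order Taylor term (x - z) * grad f(z) over noisy roots z."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(num_roots):
        z = x + rng.normal(0.0, noise_scale, size=x.shape)  # root sampled near x
        total += (x - z) * grad_f(z, w)
    return total / num_roots


x = np.array([1.0, -2.0, 3.0])
w = np.array([0.5, 1.0, 0.0])
attribution = smooth_taylor(x, w)  # one attribution value per input dimension
```

The defaults mirror the NOISE_SCALE and NUM_ROOTS defaults documented below; in the real code the gradient would come from a PyTorch backward pass through the classifier rather than from a closed form.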


In our paper, we compare SmoothTaylor and Integrated Gradients using two empirical quantitative measures, perturbation scores and average total variation, and show that SmoothTaylor can generate attribution maps that are smoother and more sensitive.
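
As a rough illustration of the second measure, total variation sums the differences between neighbouring heatmap pixels, so a noisier map scores higher. The sketch below is a minimal NumPy version that assumes a 2-D heatmap and an L_p norm over vertical and horizontal neighbours. The function name and the lack of normalisation are illustrative choices and may differ from what the experiment code computes, which also involves downscaling the heatmap.

```python
import numpy as np


def total_variation(heatmap, p=1):
    """L_p total variation of a 2-D heatmap over vertical and horizontal neighbours."""
    dv = np.abs(np.diff(heatmap, axis=0)) ** p  # differences between adjacent rows
    dh = np.abs(np.diff(heatmap, axis=1)) ** p  # differences between adjacent columns
    return float((dv.sum() + dh.sum()) ** (1.0 / p))


smooth = np.ones((4, 4))                       # constant map: no variation
checker = np.indices((4, 4)).sum(axis=0) % 2   # alternating 0/1 pattern: maximal variation
```

With `p=1`, `total_variation(smooth)` is 0.0 and `total_variation(checker)` is 24.0, since every one of the 24 neighbouring pairs in the checkerboard differs by 1.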

This repository includes a PyTorch implementation of SmoothTaylor, SmoothGrad and Integrated Gradients.
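
For readers new to Integrated Gradients, the following is a minimal NumPy sketch of the usual Riemann-sum approximation along the straight path from a baseline to the input. It is not the repository's PyTorch implementation: the toy function `f`, its analytic gradient `grad_f`, and the midpoint rule are assumptions chosen so the example runs without a model.

```python
import numpy as np


def f(x, w):
    """Toy differentiable 'model' (hypothetical stand-in for a classifier logit)."""
    return float(np.sum(w * x ** 2))


def grad_f(x, w):
    """Analytic gradient of f with respect to x."""
    return 2.0 * w * x


def integrated_gradients(x, w, baseline=None, steps=50):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    if baseline is None:
        baseline = np.zeros_like(x)  # zero baseline
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of `steps` equal intervals
    avg_grad = np.zeros_like(x, dtype=float)
    for a in alphas:
        avg_grad += grad_f(baseline + a * (x - baseline), w)
    avg_grad /= steps
    return (x - baseline) * avg_grad


x = np.array([1.0, -2.0, 3.0])
w = np.array([0.5, 1.0, 0.25])
ig = integrated_gradients(x, w)
```

For this toy function the attributions sum to `f(x, w) - f(baseline, w)`, which is the completeness property of Integrated Gradients and a convenient sanity check.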


G. S. W. Goh, S. Lapuschkin, L. Weber, W. Samek, and A. Binder (2021). “Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution”. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4949–4956. DOI:10.1109/ICPR48806.2021.9413242.

Links: Paper, Presentation, Poster



Requires Python 3.7 with the standard library and the following package versions (tested):

  • pytorch 1.4.0
  • torchvision 0.5.0
  • scikit-image 0.16.2
  • pillow 7.0.0
  • numpy 1.7.14
  • scipy 1.4.1
  • tqdm 4.36.1

Tested on Ubuntu with an Intel i7-6700 CPU and an RTX 2080 Ti GPU with CUDA 10.1. CPU-only mode is also possible, but running on a GPU is highly recommended.


The first 1000 images of the ILSVRC2012 ImageNet object recognition validation dataset are used in our paper's experiments. To replicate them with our experiment code, download or place the dataset into a new folder ./data, and put the annotation files (in .xml format) in the subfolder ./data/annotations and the raw images in the subfolder ./data/images. Note: required resource files and pre-processing steps for ImageNet are already provided in ./rsc and in ./attribution/

# the ILSVRC2012 ImageNet validation dataset structure should be placed like this

You may also create your own dataset using PyTorch's dataset wrapper for use in your own code.
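
As a starting point for a custom dataset, the sketch below pairs each raw image with its annotation file using only the standard library. It assumes the ./data/images and ./data/annotations layout described above and that an image and its annotation share the same file stem; the function name `pair_images_with_annotations` is hypothetical and not part of this repository.

```python
from pathlib import Path


def pair_images_with_annotations(data_dir="./data", limit=1000):
    """Return (image_path, annotation_path) pairs matched by file stem, sorted by name."""
    root = Path(data_dir)
    # Index annotation files by stem, e.g. "some_image" -> .../annotations/some_image.xml
    annotations = {p.stem: p for p in (root / "annotations").glob("*.xml")}
    pairs = []
    for image in sorted((root / "images").iterdir()):
        # Keep only images that have a matching annotation file (assumed naming rule).
        if image.is_file() and image.stem in annotations:
            pairs.append((image, annotations[image.stem]))
    return pairs[:limit]
```

Calling `pair_images_with_annotations("./data")` returns at most the first 1000 matched pairs, which a custom PyTorch dataset class could then load and pre-process lazily.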


In our experiment, we applied attribution to the following deep neural image classifiers:

  • DenseNet121
  • ResNet152

Both are pretrained on the ILSVRC2012 ImageNet dataset, and we use the instances from the default torchvision URL paths. You may use other pretrained image classifiers implemented in PyTorch; just remember to add the name and instance to the MODELS dictionary in ./attribution/

# the current default mapping is:
from torchvision import models
MODELS = {'densenet121': models.densenet121,
          'resnet152': models.resnet152}


To replicate our experiments, please follow the steps in this section.

  1. First, save the classification outputs of the images using the pre-trained image classifiers:

    python [-m MODEL_NAME] [-b BATCH_SIZE]


    • -m MODEL_NAME: use densenet121 or resnet152
    • -b BATCH_SIZE (optional): number of images per batch (default: 128)

    The classification output should be saved in a new folder ./output/[MODEL_NAME].

  2. Perform the neural network attribution. We implemented 3 gradient-based attribution methods here:

    1. SmoothTaylor

      python [-m MODEL_NAME] [-b BATCH_SIZE]
                                         [-s NOISE_SCALE] [-r NUM_ROOTS]


      • -m MODEL_NAME: use densenet121 or resnet152
      • -b BATCH_SIZE (optional): number of images per batch (default: 50)
      • -s NOISE_SCALE (optional): magnitude of the noise added to the image (default: 5e-1)
      • -r NUM_ROOTS (optional): number of noise inputs to use (default: 150)
    2. IntegratedGradients

      python [-m MODEL_NAME] [-b BATCH_SIZE] [-k STEPS]
                              [-z BASELINE_TYPE] [-n NUM_NOISE]


      • -m MODEL_NAME: use densenet121 or resnet152
      • -b BATCH_SIZE (optional): number of images per batch (default: 50)
      • -k STEPS (optional): number of steps along path (default: 50)
      • -z BASELINE_TYPE (optional): baseline type [use zero or noise] (default: zero)
      • -n NUM_NOISE (optional): number of noise baselines to use (default: 1)
    3. SmoothGrad

      python [-m MODEL_NAME] [-b BATCH_SIZE] [-s] [-p NOISE_SCALE] [-n NUM_NOISE]


      • -m MODEL_NAME: use densenet121 or resnet152
      • -b BATCH_SIZE (optional): number of images per batch (default: 50)
      • -s (optional): whether to use SmoothGrad (default: False)
      • -p NOISE_SCALE (optional): percentage noise scale (default: 15)
      • -n NUM_NOISE (optional): number of noise inputs to use (default: 50)

    The heatmaps should be saved in a new folder ./heatmaps, with hyperparameter values as subfolder names, e.g. ./heatmaps/[ATTRIBUTION_METHOD]/[MODEL_NAME]/...

  3. Evaluate the attribution methods by comparing their heatmaps, using two quantitative evaluation metrics:

    1. Perturbation Scores for sensitivity

      python [-m MODEL_NAME] [-a ANALYZER] [-b BATCH_SIZE]
                                         [-z BASELINE] [-n NUM_NOISE]
                                         [-s NOISE_SCALE] [-r NUM_ROOTS]
                                         [-k KERNEL_SIZE] [-pt NUM_PERTURBS]
                                         [-l NUM_REGIONS] [-an]
                                         [-af ADAPTIVE_FUNCTION]


      • -m MODEL_NAME: use densenet121 or resnet152
      • -a ANALYZER: attribution method [use grad, smooth-grad, smooth-taylor, or ig]
      • -b BATCH_SIZE (optional): number of images per batch (default: 50)
      • -z BASELINE (optional): IG baseline used [use zero or noise] (default: zero)
      • -n NUM_NOISE (optional): number of noise baselines in IG (default: 1)
      • -s NOISE_SCALE (optional): magnitude of noise scale for smoothing (default: 5e-1)
      • -r NUM_ROOTS (optional): number of noise inputs for smoothing (default: 150)
      • -k KERNEL_SIZE (optional): size of the window of each perturbation (default: 15)
      • -pt NUM_PERTURBS (optional): number of random perturbations to evaluate (default: 50)
      • -l NUM_REGIONS (optional): number of regions to perturb (default: 30)
      • -an (optional): use adaptive noise (default: False)
      • -af ADAPTIVE_FUNCTION (optional): objective function for adaptive noising [use aupc or autvc] (default: aupc)
    2. Average Total Variation for noisiness

      python [-m MODEL_NAME] [-a ANALYZER]
                                           [-z BASELINE] [-n NUM_NOISE]
                                           [-s NOISE_SCALE] [-r NUM_ROOTS]
                                           [-ds DOWNSCALE] [-wms WIDTH_MIN_SIZE]
                                           [-hms HEIGHT_MIN_SIZE] [-lp LP_NORM]
                                           [-an] [-af ADAPTIVE_FUNCTION]


      • -m MODEL_NAME: use densenet121 or resnet152
      • -a ANALYZER: attribution method [use grad, smooth-grad, smooth-taylor, or ig]
      • -z BASELINE (optional): IG baseline used [use zero or noise] (default: zero)
      • -n NUM_NOISE (optional): number of noise baselines in IG (default: 1)
      • -s NOISE_SCALE (optional): magnitude of noise scale for smoothing (default: 5e-1)
      • -r NUM_ROOTS (optional): number of noise inputs for smoothing (default: 150)
      • -ds DOWNSCALE (optional): factor to downscale heatmap (default: 1.5)
      • -wms WIDTH_MIN_SIZE (optional): minimum width for downscale (default: 30)
      • -hms HEIGHT_MIN_SIZE (optional): minimum height for downscale (default: 30)
      • -lp LP_NORM (optional): norm to use to calculate total variation (default: 1)
      • -an (optional): use adaptive noise (default: False)
      • -af ADAPTIVE_FUNCTION (optional): objective function for adaptive noising [use aupc or autvc] (default: aupc)
  4. Generate SmoothTaylor heatmaps with adaptive noising hyperparameter tuning technique:

    python [-m MODEL_NAME] [-b BATCH_SIZE]
                                          [-r NUM_ROOTS] [-f OBJ_FUNCTION]
                                          [-ds DOWNSCALE] [-wms WIDTH_MIN_SIZE]
                                          [-hms HEIGHT_MIN_SIZE] [-lp LP_NORM]
                                          [-k KERNEL_SIZE] [-p NUM_PERTURBS]
                                          [-l NUM_REGIONS] [-lr LEARNING_RATE]
                                          [-y LEARNING_DECAY] [-c MAX_STOP_COUNT]
                                          [-x MAX_ITERATION]


    • -m MODEL_NAME: use densenet121 or resnet152
    • -b BATCH_SIZE (optional): number of images per batch (default: 50)
    • -r NUM_ROOTS (optional): number of noise inputs for smoothing (default: 150)
    • -f OBJ_FUNCTION (optional): objective function for adaptive noising [use aupc or autvc] (default: aupc)
    • -ds DOWNSCALE (optional): factor to downscale heatmap (default: 1.5)
    • -wms WIDTH_MIN_SIZE (optional): minimum width for downscale (default: 30)
    • -hms HEIGHT_MIN_SIZE (optional): minimum height for downscale (default: 30)
    • -lp LP_NORM (optional): norm to use to calculate total variation (default: 1)
    • -k KERNEL_SIZE (optional): size of the window of each perturbation (default: 15)
    • -p NUM_PERTURBS (optional): number of random perturbations to evaluate (default: 50)
    • -l NUM_REGIONS (optional): number of regions to perturb (default: 30)
    • -lr LEARNING_RATE (optional): learning rate for variable update (default: 0.1)
    • -y LEARNING_DECAY (optional): decay rate of learning rate (default: 0.9)
    • -c MAX_STOP_COUNT (optional): maximum stop count to terminate search (default: 3)
    • -x MAX_ITERATION (optional): maximum iterations to search (default: 20)

    Perform evaluation (see Step 3 above) if required.
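
For intuition about what a perturbation-based evaluation measures, the following is a minimal NumPy sketch: rank image windows by their summed attribution, replace the top windows one at a time with random values, and record how the model output changes. Everything here is an assumption made for illustration (non-overlapping windows, uniform random replacement, the trapezoidal area, the toy `model`, and the names `perturbation_curve` and `area_under_curve`); the procedure used in the experiments may differ, so please refer to the paper for the actual definition.

```python
import numpy as np


def perturbation_curve(image, heatmap, model, kernel_size=2, num_regions=3,
                       num_perturbs=5, seed=0):
    """Perturb the highest-attribution windows one at a time and track the model output."""
    rng = np.random.default_rng(seed)
    k = kernel_size
    h, w = heatmap.shape
    # Score each non-overlapping k x k window by its summed attribution.
    windows = [(heatmap[r:r + k, c:c + k].sum(), r, c)
               for r in range(0, h - k + 1, k)
               for c in range(0, w - k + 1, k)]
    windows.sort(key=lambda t: t[0], reverse=True)  # most relevant window first

    curves = []
    for _ in range(num_perturbs):
        perturbed = image.astype(float).copy()
        scores = [model(perturbed)]  # output on the unperturbed image
        for _, r, c in windows[:num_regions]:
            # Replace the window with uniform random values from the image's value range.
            perturbed[r:r + k, c:c + k] = rng.uniform(image.min(), image.max(), size=(k, k))
            scores.append(model(perturbed))
        curves.append(scores)
    return np.mean(curves, axis=0)  # average over the random perturbations


def area_under_curve(curve):
    """Trapezoidal area under the perturbation curve with unit spacing."""
    return float(np.sum((curve[1:] + curve[:-1]) / 2.0))


image = np.arange(16.0).reshape(4, 4)
heatmap = image.copy()                     # toy attribution: brighter pixels matter more
model = lambda img: float(img.sum())       # hypothetical stand-in for a classifier score
curve = perturbation_curve(image, heatmap, model)
```

A more sensitive attribution method should make the curve fall faster as its top-ranked regions are perturbed, which is what an area-under-curve summary captures.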

For a clearer explanation of what each hyperparameter in the arguments means, please refer to our paper.
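
As a small illustration of how two of the SmoothGrad hyperparameters interact, the sketch below averages gradients over NUM_NOISE noisy copies of the input. It assumes the percentage noise scale is taken relative to the input's value range, as in the SmoothGrad paper; whether this repository uses exactly that convention is an assumption. The toy gradient `grad_f` and the name `smooth_grad` are hypothetical.

```python
import numpy as np


def grad_f(x, w):
    """Analytic gradient of the toy function f(x) = sum(w * x**2)."""
    return 2.0 * w * x


def smooth_grad(x, w, noise_percent=15, num_noise=50, seed=0):
    """Average the gradient over `num_noise` Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    # Assumed convention: sigma is a percentage of the input's value range.
    sigma = (noise_percent / 100.0) * (x.max() - x.min())
    grads = [grad_f(x + rng.normal(0.0, sigma, size=x.shape), w)
             for _ in range(num_noise)]
    return np.mean(grads, axis=0)


x = np.array([1.0, -2.0, 3.0])
w = np.array([0.5, 1.0, 0.0])
sg = smooth_grad(x, w)
```

A larger `noise_percent` smooths the map more aggressively, while a larger `num_noise` reduces the sampling variance of the average at a proportional cost in gradient evaluations.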


This work is licensed under the MIT License. See LICENSE for details.

If you find our code or paper useful, please cite our paper:

  author = {Goh, Gary S. W. and Lapuschkin, Sebastian and Weber, Leander and Samek, Wojciech and Binder, Alexander},
  title = {Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution},
  booktitle = {2020 25th International Conference on Pattern Recognition (ICPR)},
  pages = {4949--4956},
  publisher = {IEEE},
  year = {2021},
  address = {Virtual Event / Milan, Italy},
  doi = {10.1109/ICPR48806.2021.9413242},
  arxiv = {2004.10484}


If you find any bugs or have any questions, please email to