Papers on Explainable Artificial Intelligence

This is an ongoing attempt to consolidate interesting efforts in the area of understanding, interpreting, explaining, and visualizing pre-trained ML models.


GUI tools

  • DeepVis: Deep Visualization Toolbox. Yosinski et al. ICML 2015 code | pdf
  • SWAP: Generate adversarial poses of objects in a 3D space. Alcorn et al. CVPR 2019 code | pdf
  • AllenNLP: Query online NLP models with user-provided inputs and observe explanations (Gradient, Integrated Gradient, SmoothGrad). Last accessed 03/2020 demo
  • 3DB: A framework for analyzing computer vision models with simulated data code

Libraries

Surveys

  • Methods for Interpreting and Understanding Deep Neural Networks. Montavon et al. 2017 pdf
  • Visualizations of Deep Neural Networks in Computer Vision: A Survey. Seifert et al. 2017 pdf
  • How convolutional neural network see the world - A survey of convolutional neural network visualization methods. Qin et al. 2018 pdf
  • A brief survey of visualization methods for deep learning models from the perspective of Explainable AI. Chalkiadakis 2018 pdf
  • A Survey Of Methods For Explaining Black Box Models. Guidotti et al. 2018 pdf
  • Understanding Neural Networks via Feature Visualization: A survey. Nguyen et al. 2019 pdf
  • Explaining Explanations: An Overview of Interpretability of Machine Learning. Gilpin et al. 2019 pdf
  • DARPA updates on the XAI program pdf
  • Explainable Artificial Intelligence: a Systematic Review. Vilone et al. 2020 pdf

Opinions

  • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Rudin. Nature Machine Intelligence 2019 pdf
  • Towards falsifiable interpretability research. Leavitt & Morcos 2020 pdf
  • Four principles of Explainable Artificial Intelligence. Phillips et al. 2021 (NIST.gov) pdf

Open research questions

  • Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges. Rudin et al. 2021 pdf

Definitions of Interpretability

  • The Mythos of Model Interpretability. Lipton 2016 pdf
  • Towards A Rigorous Science of Interpretable Machine Learning. Doshi-Velez & Kim. 2017 pdf
  • Interpretable machine learning: definitions, methods, and applications. Murdoch et al. 2019 pdf

Books

  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Molnar 2019 pdf

A. Explaining model inner-workings

A1. Visualizing Preferred Stimuli

Synthesizing images / Activation Maximization

  • AM: Visualizing higher-layer features of a deep network. Erhan et al. 2009 pdf
  • Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan et al. 2013 pdf
  • DeepVis: Understanding Neural Networks through Deep Visualization. Yosinski et al. ICML workshop 2015 pdf | url
  • MFV: Multifaceted Feature Visualization: Uncovering the different types of features learned by each neuron in deep neural networks. Nguyen et al. ICML workshop 2016 pdf | code
  • DGN-AM: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Nguyen et al. NIPS 2016 pdf | code
  • PPGN: Plug and Play Generative Networks. Nguyen et al. CVPR 2017 pdf | code
  • Feature Visualization. Olah et al. 2017 url
  • Diverse feature visualizations reveal invariances in early layers of deep neural networks. Cadena et al. 2018 pdf
  • Computer Vision with a Single (Robust) Classifier. Santurkar et al. NeurIPS 2019 pdf | blog | code
  • BigGAN-AM: A cost-effective method for improving and re-purposing large, pre-trained GANs by fine-tuning their class-embeddings. Li et al. ACCV 2020 pdf | code
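
The methods above share one core loop: start from a random (or real) image and ascend the gradient of a chosen unit's activation with respect to the pixels, under some image prior. Below is a minimal sketch of vanilla activation maximization, assuming a pretrained torchvision classifier and omitting the hand-designed or learned priors used in the papers above:

```python
import torch
import torchvision.models as models

# Hypothetical setup: maximize one ImageNet logit of a pretrained classifier.
model = models.resnet50(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 130                                    # illustrative class index
img = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    logits = model(img)
    # Minimize the negative logit (i.e., maximize the activation); the small L2 term
    # is a crude stand-in for the image priors used in the papers above.
    loss = -logits[0, target_class] + 1e-4 * img.norm()
    loss.backward()
    optimizer.step()

preferred_stimulus = img.detach()
```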

Real images / Segmentation Masks

  • Visualizing and Understanding Recurrent Networks. Karpathy et al. ICLR 2015 pdf
  • Object Detectors Emerge in Deep Scene CNNs. Zhou et al. ICLR 2015 pdf
  • Understanding Deep Architectures by Interpretable Visual Summaries. Godi et al. BMVC 2019 pdf

A2. Inverting Neural Networks

A2.1 Inverting Classifiers

  • Understanding Deep Image Representations by Inverting Them. Mahendran & Vedaldi. CVPR 2015 pdf
  • Inverting Visual Representations with Convolutional Networks. Dosovitskiy & Brox. CVPR 2016 pdf
  • Neural network inversion beyond gradient descent. Wong & Kolter. NIPS workshop 2017 pdf
  • Inverting Adversarially Robust Networks for Image Synthesis. Rojas-Gomez et al. 2021 pdf | code
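
The inversion papers above recover an image from its internal code by optimizing the pixels so that the network's features match a target representation. A minimal sketch, assuming `feature_extractor` is any differentiable PyTorch module (e.g., a truncated classifier) and leaving out the natural-image regularizers the papers rely on:

```python
import torch

def invert_features(feature_extractor, target_code, shape=(1, 3, 224, 224),
                    steps=300, lr=0.05):
    """Feature inversion in the spirit of Mahendran & Vedaldi: find an image x
    whose representation feature_extractor(x) matches target_code."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon_loss = (feature_extractor(x) - target_code).pow(2).mean()
        loss = recon_loss + 1e-4 * x.norm()   # tiny L2 prior in place of TV / jitter terms
        loss.backward()
        opt.step()
    return x.detach()
```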

A2.2 Inverting Generators

  • Image Processing Using Multi-Code GAN Prior. Gu et al. 2019 pdf

A3. Distilling DNNs into more interpretable models

  • Interpreting CNNs via Decision Trees pdf
  • Distilling a Neural Network Into a Soft Decision Tree pdf
  • Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. Tan et al. 2018 pdf
  • Improving the Interpretability of Deep Neural Networks with Knowledge Distillation. Liu et al. 2018 pdf

A4. Quantitatively characterizing hidden features

  • TCAV: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors. Kim et al. 2018 pdf | code
    • DTCAV: Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks. Ghorbani et al. 2019 pdf
  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Raghu et al. 2017 pdf | code
  • A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens. Saini et al. 2018 pdf
  • Network Dissection: Quantifying Interpretability of Deep Visual Representations. Bau et al. CVPR 2017 url | pdf
    • GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. Bau et al. ICLR 2019 pdf
    • Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks. Fong & Vedaldi CVPR 2018 pdf
    • Intriguing generalization and simplicity of adversarially trained neural networks. Chen, Agarwal, Nguyen 2020 pdf
    • Understanding the Role of Individual Units in a Deep Neural Network. Bau et al. PNAS 2020 pdf
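
To make TCAV (listed above) concrete: a concept activation vector (CAV) is the normal of a linear classifier separating a concept's activations from random activations at a chosen layer, and the TCAV score is the fraction of class examples whose class-logit gradient points along that vector. A minimal sketch, assuming the activations and gradients have already been collected as NumPy arrays:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def tcav_score(concept_acts, random_acts, class_grads):
    """TCAV sketch: concept_acts / random_acts are (N, D) layer activations for concept
    and random images; class_grads are (M, D) gradients of the class logit w.r.t. the
    same layer, computed on images of the class being explained."""
    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]   # the concept direction
    # Fraction of class examples whose logit increases along the concept direction.
    return float(np.mean(class_grads @ cav > 0))
```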

A5. Network surgery

  • How Important Is a Neuron? Dhamdhere et al. 2018 pdf

A6. Sensitivity analysis

  • NLIZE: A Perturbation-Driven Visual Interrogation Tool for Analyzing and Interpreting Natural Language Inference Models. Liu et al. 2018 pdf

B. Explaining model decisions

B1. Attribution maps

B1.0 Surveys

  • Feature Removal Is A Unifying Principle For Model Explanation Methods. Covert et al. 2020 pdf

B1.1 White-box / Gradient-based

  • A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks pdf

Gradient

  • Gradient: Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan et al. 2013 pdf
  • Deconvnet: Visualizing and understanding convolutional networks. Zeiler et al. 2014 pdf
  • Guided-backprop: Striving for simplicity: The all convolutional net. Springenberg et al. 2015 pdf
  • SmoothGrad: removing noise by adding noise. Smilkov et al. 2017 pdf
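
All of the maps above start from the same quantity, the gradient of a class score with respect to the input pixels; SmoothGrad simply averages it over noisy copies of the input. A minimal sketch, assuming a PyTorch classifier in eval mode and a preprocessed 1x3xHxW input tensor:

```python
import torch

def gradient_saliency(model, image, target_class):
    """Vanilla gradient (Simonyan et al.): |d score / d pixel|, max over color channels."""
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()
    return image.grad.abs().max(dim=1)[0]             # 1 x H x W heatmap

def smoothgrad(model, image, target_class, n=25, sigma=0.15):
    """SmoothGrad (Smilkov et al.): average the vanilla map over noisy copies of the input."""
    total = torch.zeros_like(image[:, 0])
    for _ in range(n):
        noisy = image + sigma * torch.randn_like(image)
        total += gradient_saliency(model, noisy, target_class)
    return total / n
```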

Input x Gradient

  • DeepLIFT: Learning important features through propagating activation differences. Shrikumar et al. 2017 pdf
  • IG: Axiomatic Attribution for Deep Networks. Sundararajan et al. 2017 pdf | code
    • EG: Learning Explainable Models Using Attribution Priors. Erion et al. 2019 pdf | code
    • I-GOS: Visualizing Deep Networks by Optimizing with Integrated Gradients. Qi et al. 2019 pdf
    • BlurIG: Attribution in Scale and Space. Xu et al. CVPR 2020 pdf | code
    • XRAI: Better Attributions Through Regions. Kapishnikov et al. ICCV 2019 pdf | code
  • LRP: Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation pdf
    • DTD: Explaining Nonlinear Classification Decisions With Deep Taylor Decomposition pdf
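
Integrated Gradients, the anchor of the list above, attributes by averaging gradients along a straight path from a baseline to the input and scaling by the input difference. A minimal sketch, assuming a PyTorch classifier and a black (all-zero) baseline:

```python
import torch

def integrated_gradients(model, image, target_class, baseline=None, steps=50):
    """IG (Sundararajan et al.): (x - x') * average gradient along the baseline-to-input path."""
    if baseline is None:
        baseline = torch.zeros_like(image)            # black image as the baseline
    grads = []
    for alpha in torch.linspace(0, 1, steps):
        point = (baseline + alpha * (image - baseline)).requires_grad_(True)
        score = model(point)[0, target_class]
        score.backward()
        grads.append(point.grad)
    avg_grad = torch.stack(grads).mean(dim=0)
    return (image - baseline) * avg_grad              # per-pixel attribution
```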

Activation map

  • CAM: Learning Deep Features for Discriminative Localization. Zhou et al. 2016 code | web

  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju et al. 2017 pdf

  • Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. Chattopadhyay et al. 2017 pdf | code

  • Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models. Omeiza et al. 2019 pdf

  • NormGrad: There and Back Again: Revisiting Backpropagation Saliency Methods. Rebuffi et al. CVPR 2020 pdf | code

  • Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks. Wang et al. CVPR 2020 workshop pdf | code

  • Relevance-CAM: Your Model Already Knows Where to Look. Lee et al. CVPR 2021 pdf | code

  • LIFT-CAM: Towards Better Explanations of Class Activation Mapping. Jung & Oh ICCV 2021 pdf
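
The CAM family above localizes evidence by weighting a convolutional layer's activation maps; in Grad-CAM the weight of each channel is its spatially averaged gradient. A minimal sketch (not the authors' code) using forward and backward hooks; the example layer `model.layer4` is an assumption that holds for a torchvision ResNet:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, conv_layer):
    """Grad-CAM (Selvaraju et al.): ReLU of the gradient-weighted sum of activation maps."""
    acts, grads = {}, {}
    h_fwd = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h_bwd = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    model(image)[0, target_class].backward()
    h_fwd.remove()
    h_bwd.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)        # one weight per channel
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    return F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)

# e.g. heatmap = grad_cam(model, image, target_class=130, conv_layer=model.layer4)
```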

Learning the heatmap

  • MP: Interpretable Explanations of Black Boxes by Meaningful Perturbation. Fong et al. 2017 pdf
    • MP-G: Removing input features via a generative model to explain their attributions to classifier's decisions. Agarwal & Nguyen ACCV 2020 pdf | code
    • EP: Understanding Deep Networks via Extremal Perturbations and Smooth Masks. Fong et al. ICCV 2019 pdf | code
  • FIDO: Explaining image classifiers by counterfactual generation. Chang et al. ICLR 2019 pdf
  • FG-Vis: Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks. Wagner et al. CVPR 2019 pdf
  • CEM: Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives. Dhurandhar & Chen et al. NeurIPS 2018 pdf | code

Attributions of network biases

  • FullGrad: Full-Gradient Representation for Neural Network Visualization. Srinivas et al. NeurIPS 2019 pdf
  • Bias also matters: Bias attribution for deep neural network explanation. Wang et al. ICML 2019 pdf

Others

  • Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks. Oramas et al. 2019 pdf
  • Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks. Seo et al. 2018 pdf

B1.2 Attention as Explanation

Computer Vision

  • Multimodal explanations: Justifying decisions and pointing to the evidence. Park et al. CVPR 2018 pdf
  • IA-RED2: Interpretability-Aware Redundancy Reduction for Vision Transformers. Pan et al. NeurIPS 2021 pdf
  • Transformer Interpretability Beyond Attention Visualization. Chefer et al. CVPR 2021 pdf | code
  • Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers. Chefer et al. ICCV 2021 pdf | code

NLP

  • Attention is not Explanation. Jain & Wallace. NAACL 2019 pdf
  • Attention is not not Explanation. Wiegreffe & Pinter. EMNLP 2019 pdf
  • Learning to Deceive with Attention-Based Explanations. Pruthi et al. ACL 2020 pdf

B1.3 Black-box / Perturbation-based

  • Sliding-Patch: Visualizing and understanding convolutional networks. Zeiler et al. 2014 pdf
  • PDA: Visualizing deep neural network decisions: Prediction difference analysis. Zintgraf et al. ICLR 2017 pdf
  • RISE: Randomized Input Sampling for Explanation of Black-box Models. Petsiuk et al. BMVC 2018 pdf
  • LIME: Why should I trust you?: Explaining the predictions of any classifier. Ribeiro et al. 2016 pdf | blog
    • LIME-G: Removing input features via a generative model to explain their attributions to classifier's decisions. Agarwal & Nguyen. ACCV 2020 pdf | code
  • SHAP: A Unified Approach to Interpreting Model Predictions. Lundberg et al. 2017 pdf | code
  • OSFT: Interpreting Black Box Models via Hypothesis Testing. Burns et al. 2019 pdf
  • IM: Interpretation of NLP models through input marginalization. Kim et al. EMNLP 2020 pdf
    • Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling. Harbecke et al. 2020 pdf
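
The black-box methods above need only forward passes: perturb the input, record how the prediction changes, and aggregate. A minimal sketch of the sliding-patch occlusion idea from Zeiler et al., assuming a PyTorch classifier and a gray fill value for the occluder:

```python
import torch

def occlusion_map(model, image, target_class, patch=32, stride=16, fill=0.5):
    """Sliding-patch occlusion: the drop in class probability when each patch is covered."""
    _, _, H, W = image.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    with torch.no_grad():
        base = torch.softmax(model(image), dim=1)[0, target_class]
        for i, y in enumerate(range(0, H - patch + 1, stride)):
            for j, x in enumerate(range(0, W - patch + 1, stride)):
                occluded = image.clone()
                occluded[..., y:y + patch, x:x + patch] = fill
                prob = torch.softmax(model(occluded), dim=1)[0, target_class]
                heat[i, j] = base - prob              # larger drop = more important region
    return heat
```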

B1.4 Evaluating feature importance/attribution heatmaps

Metrics

  • Deletion & Insertion: Randomized Input Sampling for Explanation of Black-box Models. Petsiuk et al. BMVC 2018 pdf
  • ROAD: A Consistent and Efficient Evaluation Strategy for Attribution Methods. Rong & Leemann, et al. ICML 2022 pdf | code
  • ROAR: A Benchmark for Interpretability Methods in Deep Neural Networks. Hooker et al. NeurIPS 2019 pdf | code
    • DiffROAR: Do Input Gradients Highlight Discriminative Features? Shah et al. NeurIPS 2021 pdf | code
  • Sanity Checks for Saliency Maps. Adebayo et al. 2018 pdf
  • BIM: Towards Quantitative Evaluation of Attribution Methods with Ground Truth. Yang et al. 2019 pdf
  • SAM: The Sensitivity of Attribution Methods to Hyperparameters. Bansal, Agarwal, Nguyen. CVPR 2020 pdf | code
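
Several of the metrics above (Deletion & Insertion, ROAR, ROAD) score an attribution map by removing the pixels it ranks as most important and measuring the effect on the model, either as dropping confidence or as accuracy after retraining. A minimal sketch of a deletion-style curve, assuming a PyTorch classifier, an HxW saliency map, and zero as the removal value (the cited papers use more careful removal schemes):

```python
import torch

def deletion_curve(model, image, target_class, saliency, n_steps=20, fill=0.0):
    """Deletion-style evaluation: class probability as the pixels ranked most important
    by `saliency` are progressively erased; a faster drop suggests a more faithful map."""
    order = saliency.flatten().argsort(descending=True)       # most important pixels first
    per_step = len(order) // n_steps
    x = image.clone()
    flat = x.view(1, x.shape[1], -1)                          # shares storage with x
    scores = []
    with torch.no_grad():
        scores.append(torch.softmax(model(x), dim=1)[0, target_class].item())
        for step in range(n_steps):
            idx = order[step * per_step:(step + 1) * per_step]
            flat[..., idx] = fill                             # erase those pixels in every channel
            scores.append(torch.softmax(model(x), dim=1)[0, target_class].item())
    return scores   # the area under this curve is the deletion score (lower is better)
```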

Evaluating heatmaps on humans

  • The effectiveness of feature attribution methods and its correlation with automatic evaluation scores. Nguyen, Kim, Nguyen 2021 pdf
  • Debugging Tests for Model Explanations. Adebayo et al. NeurIPS 2020 pdf
  • In Search of Verifiability: Explanations Rarely Enable Complementary Performance in AI-Advised Decision Making. Fok & Weld. 2023 pdf

Computer Vision

  • The (Un)reliability of saliency methods. Kindermans et al. 2018 pdf
  • A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. Nie et al. 2018 pdf
  • On the (In)fidelity and Sensitivity for Explanations. Yeh et al. 2019 pdf

NLP

  • Deletion_BERT: Double Trouble: How to not explain a text classifier’s decisions using counterfactuals synthesized by masked language models. Pham et al. 2022 pdf | code

  • Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? Hase & Bansal ACL 2020 pdf | code

  • Teach Me to Explain: A Review of Datasets for Explainable NLP. Wiegreffe & Marasović 2021 pdf | web

Tabular data

  • Challenging common interpretability assumptions in feature attribution explanations? Dinu et al. NeurIPS workshop 2020 pdf

Many domains

  • How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods. Jeyakumar et al. NeurIPS 2020 pdf | code

B1.5 Explaining image-image similarity

  • BiLRP: Building and Interpreting Deep Similarity Models. Eberle et al. TPAMI 2020 pdf
  • SANE: Why do These Match? Explaining the Behavior of Image Similarity Models. Plummer et al. ECCV 2020 pdf
  • Visualizing Deep Similarity Networks. Stylianou et al. WACV 2019 pdf | code
  • Visual Explanation for Deep Metric Learning. Zhu et al. 2019 pdf | code

Face verification

  • DISE: Explainable Face Recognition. Williford et al. ECCV 2020 pdf | code
  • xCos: An explainable cosine metric for face verification task. Lin et al. 2021 pdf | code
  • DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification. Phan & Nguyen. CVPR 2022 pdf | code

B2. Learning to explain

B2.1 Regularizing attribution maps

  • Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. Ross et al. IJCAI 2017 pdf
  • Learning Explainable Models Using Attribution Priors. Erion et al. 2019 pdf
  • Interpretations are useful: penalizing explanations to align neural networks with prior knowledge. Rieger et al. 2019 pdf

B2.2 Training deep nets to approximate expensive, posthoc attribution methods

  • L2E: Learning to Explain: Generating Stable Explanations Fast. Situ et al. ACL 2021 pdf | code
  • Efficient Explanations from Empirical Explainers. Schwarzenberg et al. 2021 pdf

B2.3 Explaining by prototypes

  • ProtoPNet: This Looks Like That: Deep Learning for Interpretable Image Recognition. Chen et al. NeurIPS 2019 pdf | code
    • This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition. Nauta et al. 2020 pdf | code
    • NP-ProtoPNet: These do not Look Like Those. Singh et al. 2021 pdf
  • ProtoTree: Neural Prototype Trees for Interpretable Fine-grained Image Recognition. Nauta et al. CVPR 2021 pdf | code

B2.4 Explaining by retrieving supporting examples

  • EMD-Corr & CHM-Corr: Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. Nguyen, Taesiri, Nguyen 2022. pdf | code

B2.5 Adversarial attacks on XAI systems with humans in the loop

  • When and How to Fool Explainable Models (and Humans) with Adversarial Examples. Vadillo et al. 2021 pdf
  • The effectiveness of feature attribution methods and its correlation with automatic evaluation scores. Nguyen, Kim, Nguyen 2021 pdf

B2.6 Others

  • Learning how to explain neural networks: PatternNet and PatternAttribution pdf
  • Deep Learning for Case-Based Reasoning through Prototypes pdf
  • Unsupervised Learning of Neural Networks to Explain Neural Networks pdf
  • Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions pdf
    • Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations pdf
  • Towards robust interpretability with self-explaining neural networks. Alvarez-Melis and Jaakkola 2018 pdf

C. Counterfactual explanations

  • Counterfactual Explanations for Machine Learning: A Review. Verma et al. 2020 pdf
  • Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections. Zhang et al. 2018 pdf
  • Counterfactual Visual Explanations. Goyal et al. 2019 pdf
  • Generative Counterfactual Introspection for Explainable Deep Learning. Liu et al. 2019 pdf

Generative models

  • Generative causal explanations of black-box classifiers. O’Shaughnessy et al. 2020 pdf
  • Removing input features via a generative model to explain their attributions to classifier's decisions. Agarwal et al. 2019 pdf | code

D. Explainable AI in the real world

Medical domains

  • A systematic review on the use of explainability in deep learning systems for computer aided diagnosis in radiology: Limited use of explainable AI? Groen et al. European Journal of Radiology 2022 pdf
  • “Help Me Help the AI”: Understanding How Explainability Can Support Human-AI Interaction. Kim et al. 2022 [pdf](https://arxiv.org/abs/2210.03735 "Practical recommendations and feedback for human-AI explanation designs from interviews with 20 end-users of Merlin, a bird-identification app.")

E. Human-AI collaboration

Computer vision

  • Human-AI Collaboration: The Effect of AI Delegation on Human Task Performance and Task Satisfaction. Hemmer et al. IUI 2023 [pdf](https://arxiv.org/abs/2303.09224 "Letting AIs handle most images in image classification and leaving the harder ones to humans results in higher overall classification accuracy than humans alone.")

F. Others

  • Explainable Artificial Intelligence via Bayesian Teaching. Yang & Shafto. NIPS 2017 pdf
  • Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation pdf
  • ICADx: Interpretable computer aided diagnosis of breast masses. Kim et al. 2018 pdf
  • Neural Network Interpretation via Fine Grained Textual Summarization. Guo et al. 2018 pdf
  • LS-Tree: Model Interpretation When the Data Are Linguistic. Chen et al. 2019 pdf
