This repository is a knowledge base covering different areas of using and developing AI in a responsible way :heart:. Responsible AI spans explainable and interpretable machine learning, fairness and bias in machine learning, law and regulation, as well as user experience and human-centered AI. It is therefore a cross-disciplinary field that draws on both computer science and social science. The aim is to achieve systems that are trustworthy, accountable and fair. Responsible AI should therefore interest both researchers and practitioners, including developers, system owners/buyers and users :family:.
This repo is a collection of links to research papers, blog posts, tools, tutorials, videos and books. The references are divided into different areas as listed in the table of contents.
| Explainable AI | Fairness | Guidelines & principles |
| --- | --- | --- |
| People & Tech | Policy & Regulation | User Experience |
We really welcome and appreciate 🙏 contributions to make sure this knowledge base stays relevant. If you have a link or reference you think should be included, please create a pull request. You can also open an issue if you find that easier.
The Responsible AI repository is maintained by the Alexandra Institute which is a Danish non-profit company with a mission to create value, growth and welfare in society. The Alexandra Institute is a member of GTS, a network of independent Danish research and technology organisations.
The initial work on this repository was conducted under a performance contract allocated to the Alexandra Institute by the Danish Ministry of Higher Education and Science. The project ran over the two years 2019 and 2020.
- InterpretML - Open source Python framework that combines local and global explanation methods, as well as transparent models, like decision trees, rule-based models, and GAMs (Generalized Additive Models), into a common API and dashboard.
- AI Explainability 360 - Open source Python XAI framework developed by IBM researchers combining different data, local and global explanation methods. Also see their GitHub page.
- explainX.ai - Open source Python framework that launches an interactive dashboard for a model in a single line of code in which a model can be investigated using different XAI methods.
- Alibi Explain - Open source Python XAI framework combining different methods. Main focus on counterfactual explanations and SHAP for classification tasks on tabular data or images.
- SHAP - The open source Python framework for generating SHAP explanations. Focused on tree-based models, but contains the model-agnostic KernelSHAP and an implementation for deep neural networks. See the usage sketch at the end of this list.
- Lucid - Open source Python framework to explain deep convolutional neural networks used on image data (currently only supports TensorFlow 1). Focuses on understanding the representations the network has learned.
- DeepLIFT - Open source implementation of the DeepLIFT methods for generating local feature attributions for deep neural networks.
- iNNvestigate - Github repository collecting implementations of different feature attribution and gradient based explanation methods for deep neural networks.
- Skope-rules - Open source Python framework for building rule based models.
- Yellowbrick - Open source Python framework to create different visualizations of data and ML models.
- Captum - Open source framework to explain deep learning models created with PyTorch. Includes many known XAI algorithms for deep neural networks.
- What-If Tool - Open source framework from Google to probe the behaviour of a trained model.
- AllenNLP Interpret - Python framework for explaining deep neural networks for language processing developed by the Allen Institute for AI.
- Dalex - Part of the DrWhy.AI universe of packages for interpretable and responsible ML.
- RuleFit - Open source python implementation of an interpretable rule ensemble model.
- SkopeRules - Open source python package for fitting a rule based model.
- ELI5 - Open source python package that implements LIME local explanations and permutation explanations.
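As a quick illustration of how such libraries are typically used, here is a minimal sketch of generating SHAP explanations for a scikit-learn tree ensemble. The dataset and model below are placeholders, and output shapes can vary with the SHAP version and model type:

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Placeholder data and model; any tree ensemble works with TreeExplainer.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)     # model-specific explainer for tree-based models
shap_values = explainer.shap_values(X)    # local feature attributions, one row per sample
shap.summary_plot(shap_values, X)         # aggregates the local explanations into a global overview
```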
- Ansvarlig AI - Cross-disciplinary medium blog about XAI, fairness and responsible AI (in Danish)
- Introducing the Model Card Toolkit - Google blog post about the Model Card Toolkit, a framework for reporting on an ML model.
- Interpreting Decision Trees and Random Forests - Blog post about how to interpret and visualize tree based models.
- Introducing PDPbox - Blog post about a python package for generating partial dependence plots.
- Use SHAP loss values to debug/monitor your model - Blog post about how to use SHAP explanations for debugging and monitoring.
- Be careful what you SHAP for… - Blog post about the assumptions behind how and when to use SHAP explanations.
- Awesome Interpretable Machine Learning - Collection of resources (articles, conferences, frameworks, software, etc.) about interpretable ML.
- http://heatmapping.org/ - Homepage of the lab behind the LRP (layer-wise relevance propagation) method with links to tutorials and research articles.
- Interpretable Machine Learning - E-book by Christoph Molnar describing and explaining different XAI methods and ways to build interpretable models or methods to interpret them, including examples on openly available datasets.
- Can A.I. Be Taught to Explain Itself? - The New York Times Magazine article about the need for explainable models.
- Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention - Blog post about how to interpret a BERT model.
- AI Explanations Whitepaper - Google's whitepaper about Explainable AI.
- Robust-and-Explainable-machine-learning - Collection of links and articles with respect to robust and explainable machine learning, containing mostly deep learning related resources.
- Kaggle - Machine Learning Explainability - Kaggle course about the basics of XAI with example notebooks and exercises.
In this section we list research articles related to interpretable ML and explainable AI.
- A. Weller, "Transparency: Motivations and Challenges", arXiv:1708.01870 [cs.CY]
- J. Chang et al., "Reading Tea Leaves: How Humans Interpret Topic Models", NIPS 2009
- Z. C. Lipton, "The Mythos of Model Interpretability", arXiv:1606.03490 [cs.LG]
- F. Doshi-Velez and B. Kim, "Towards A Rigorous Science of Interpretable Machine Learning", arXiv:1702.08608 [stat.ML]
- G. Vilone and L. Longo, "Explainable Artificial Intelligence: a Systematic Review", arXiv:2006.00093 [cs.AI]
- U. Bhatt et al., "Explainable Machine Learning in Deployment", FAT*'20 648-657, 2020 - Survey about how XAI is used in practice. The key results are:
- XAI methods are mainly used by ML engineers / designers for debugging.
- Limitations of the methods are often unclear to those using them.
- The goal of why XAI is used in the first place is often unclear or not well defined, which could potentially lead to using the wrong method.
- L. H. Gilpin, "Explaining Explanations: An Overview of Interpretability of Machine Learning", IEEE 5th DSAA 80-89, 2019
- S. T. Mueller, "Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI", arXiv:1902.01876 [cs.AI]
- R. Guidotti et al., "A Survey of Methods for Explaining Black Box Models", ACM Computing Surveys, 2018 - Overview of different interpretability methods, grouping them by type of method, the model they explain and the type of explanation.
- M. Du et al., "Techniques for interpretable machine learning", Communications of the ACM, 2019
- I. C. Covert et al., "Explaining by Removing: A Unified Framework for Model Explanation", arXiv:2011.14878 [cs.LG] - (Mathematical) framework that summarizes 25 feature influence methods.
- A. Adadi and M. Berrada, "Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)", IEEE Access (6) 52138-52160, 2018
- A. Abdul et al., "Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda", CHI'18 582 1-18, 2018
- A. Preece, "Asking ‘Why’ in AI: Explainability of intelligent systems – perspectives and challenges", Intell Sys Acc Fin Mgmt (25) 63-72, 2018
- Q. Zhang and S.-C. Zhu, "Visual Interpretability for Deep Learning: a Survey", Technol. Electronic Eng. (19) 27–39, 2018
- B. Mittelstadt et al., "Explaining Explanations in AI", FAT*'19 279–288, 2019
This section contains articles that describe ways to evaluate explanations and explainable models.
- S. Mohseni et al., "A Human-Grounded Evaluation Benchmark for Local Explanations of Machine Learning", arXiv:1801.05075 [cs.HC]
- J. Huysmans et al., "An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models", Decision Support Systems (51:1) 141-154, 2011
- F. Poursabzi-Sangdeh et al., "Manipulating and Measuring Model Interpretability", arXiv:1802.07810 [cs.AI]
- C. J. Cai et al., "The Effects of Example-Based Explanations in a Machine Learning Interface", IUI'19 258-262, 2019
- L. Sixt et al., "When Explanations Lie: Why Many Modified BP Attributions Fail", arXiv:1912.09818 [cs.LG]
- Y. Zhang et al., "Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making", FAT*'20 295-305, 2020 - Analyses the effect of LIME explanation and confidence score as explanation on trust and human decision performance.
- K. Sokol and P. Flach, "Explainability fact sheets: a framework for systematic assessment of explainable approaches", FAT*'20 56-67, 2020 - Framework (essentially a list of questions or checklist) to evaluate and document XAI methods. Also includes questions that are relevant to the context in which the XAI method is to be employed, i.e. the outcome of the assessment changes with the context.
- E. S. Jo and T. Gebru, "Lessons from archives: strategies for collecting sociocultural data in machine learning", FAT*'20 306-316, 2020 - Uses archives as inspiration for how to collect, curate and annotate data.
This section contains articles that explain datasets, for example by finding representative examples.
- B. Kim et al., "Examples are not Enough, Learn to Criticize! Criticism for Interpretability", NIPS, 2016 - Code can be found on GitHub.
This section contains articles that describe models that are explainable or transparent by design.
- X. Zhang et al., "Axiomatic Interpretability for Multiclass Additive Models", KDD'19 226–234, 2019
- T. Kulesza et al., "Principles of Explanatory Debugging to Personalize Interactive Machine Learning", IUI'15 126–137, 2015 - Framework showing how a Naive Bayes method can be trained with user interaction and how to generate explanations for these kinds of models.
- M. Hind et al., "TED: Teaching AI to Explain its Decisions", AIES'19 123–129, 2019
- Y. Lou et al., "Accurate Intelligible Models with Pairwise Interactions", KDD'13 623–631, 2013
- C. Chen et al., "An Interpretable Model with Globally Consistent Explanations for Credit Risk", arXiv:1811.12615 [cs.LG]
- C. Chen and C. Rudin, "An Optimization Approach to Learning Falling Rule Lists", PMLR (84) 604-612, 2018
- F. Wang and C. Rudin, "Falling Rule Lists", arXiv:1411.5899 [cs.AI]
- B. Ustun and C. Rudin, "Supersparse Linear Integer Models for Optimized Medical Scoring Systems", arXiv:1502.04269 [stat.ML]
- E. Angelino et al., "Learning Certifiably Optimal Rule Lists for Categorical Data", JMLR (18:234) 1-78, 2018
- H. Lakkaraju et al., "Interpretable Decision Sets: A Joint Framework for Description and Prediction", KDD'16 1675–1684, 2016
- K. Shu et al., "dEFEND: Explainable Fake News Detection", KDD'19 395–405, 2019
- J. Jung et al., "Simple Rules for Complex Decisions", arXiv:1702.04690 [stat.AP]
This section contains articles that describe methods to globally explain a model. Typically, this is done by generating visualizations in one form or another.
- B. Ustun et al., "Actionable Recourse in Linear Classification", FAT*'19 10–19, 2019 - Article describing a method to evaluate actionable variables, i.e. variables a person can impact to change the outcome of a model, for linear classification models.
- A Datta et al., "Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems", IEEE SP 598-617, 2016
- P. Adler et al., "Auditing black-box models for indirect influence", Knowl. Inf. Syst. (54) 95–122, 2018
- A. Lucic et al., "Why Does My Model Fail? Contrastive Local Explanations for Retail Forecasting", FAT*'20 90–98, 2020 - Presents an explanation method for failure cases of an ML/AI model. The explanation is presented in the form of a feasible range of feature values in which the model works and a trend for each feature. Code for the method is available on GitHub.
- J. Krause et al., "Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models", CHI'16 5686–5697, 2016
- B. Kim et al., "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)", ICML, PMLR (80) 2668-2677, 2018 - Code for the method can be found on github.
- A. Goldstein et al., "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation", Journal of Computational and Graphical Statistics (24:1) 44-65, 2015 - A sketch of drawing such ICE/partial dependence plots follows this list.
- J. Wang et al., "Shapley Flow: A Graph-based Approach to Interpreting Model Predictions", arXiv:2010.14592 [cs.LG]
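For the partial dependence and ICE plots discussed in the Goldstein et al. entry above, scikit-learn ships a ready-made display. A minimal sketch, with dataset, model and features chosen arbitrarily:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Arbitrary example data and model.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor().fit(X, y)

# kind="both" overlays the average partial dependence curve on the individual ICE curves.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"], kind="both")
plt.show()
```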
This section contains articles that describe methods to explain a model by constructing an inherently transparent model that mimics the behaviour of the black-box model. A minimal sketch of the idea follows the list below.
- S. Tan et al., "Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation", AIES'18 303–310, 2018
- L. Chu et al., "Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution", arXiv:1802.06259 [cs.CV]
- C. Yang et al., "Global Model Interpretation via Recursive Partitioning", arXiv:1802.04253 [cs.LG]
- H. Lakkaraju et al., "Interpretable & Explorable Approximations of Black Box Models", arXiv:1707.01154 [cs.AI]
- Y. Hayashi, "Synergy effects between grafting and subdivision in Re-RX with J48graft for the diagnosis of thyroid disease", Knowledge-Based Systems (131) 170-182, 2017
- H. F. Tan et al., "Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable", arXiv:1611.07115 [stat.ML]
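A minimal sketch of the surrogate idea: fit an inherently transparent model (here a shallow decision tree) to the predictions of a black-box model and inspect the surrogate instead. The dataset and models are placeholders, not taken from any of the papers above:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder black-box model.
X, y = load_wine(return_X_y=True, as_frame=True)
black_box = RandomForestClassifier(n_estimators=200).fit(X, y)

# Train the surrogate on the black box's *predictions*, not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))
print(export_text(surrogate, feature_names=list(X.columns)))
print("fidelity:", surrogate.score(X, black_box.predict(X)))  # how closely the tree mimics the black box
```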
This section contains articles that describe local explanation methods, i.e. methods that generate an explanation for a specific outcome of a model.
- M. T. Ribeiro et al., "Anchors: High-Precision Model-Agnostic Explanations", AAAI Conference on Artificial Intelligence, 2018 - The implementation of the method can be found on github.
- A. Shrikumar et al., "Learning Important Features Through Propagating Activation Differences", ICML'17 3145–3153, 2017 - DeepLIFT method for local explanations of deep neural networks.
- S. M. Lundberg et al., "Explainable AI for Trees: From Local Explanations to Global Understanding", arXiv:1905.04610 [stat.ML]
- S. M. Lundberg et al., "From local explanations to global understanding with explainable AI for trees", Nat. Mach. Intell. (2) 56–67, 2020
- M. T. Ribeiro et al., “Why Should I Trust You?” Explaining the Predictions of Any Classifier, KDD'16 1135–1144, 2016 - The LIME method; a usage sketch follows this list.
- D. Slack et al., "How Much Should I Trust You? Modeling Uncertainty of Black Box Explanations", arXiv:2008.05030 [cs.LG]
- S. M. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions", NIPS, 2017
- M. Sundararajan and A. Najmi, "The Many Shapley Values for Model Explanation", ICML (119) 9269-9278, 2020
- I. E. Kumar et al., "Problems with Shapley-value-based explanations as feature importance measures", arXiv:2002.11097 [cs.AI]
- P. W. Koh and P. Liang, "Understanding Black-box Predictions via Influence Functions", arXiv:1703.04730 [stat.ML]
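A usage sketch for the LIME method from the Ribeiro et al. “Why Should I Trust You?” entry above, using the open source `lime` package. The dataset and model are arbitrary placeholders:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Placeholder data and model.
data = load_iris()
model = RandomForestClassifier().fit(data.data, data.target)

explainer = LimeTabularExplainer(data.data,
                                 feature_names=data.feature_names,
                                 class_names=list(data.target_names),
                                 mode="classification")
# Explain one individual prediction with a local, interpretable surrogate.
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(explanation.as_list())   # feature weights for this single prediction
```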
This section contains articles that describe methods for counterfactual explanations. A toy search sketch follows the list.
- S. Sharma et al., "CERTIFAI: A Common Framework to Provide Explanations and Analyse the Fairness and Robustness of Black-box Models", AIES'20 166–172, 2020
- C. Russell, "Efficient Search for Diverse Coherent Explanations", FAT*'19 20–28, 2019
- R. K. Mothilal et al., "Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations", FAT*'20 607–617, 2020 - Code for the method is available on github.
- S. Barocas et al., "The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons", FAT*'20 80–89, 2020 - Raises some questions with respect to the use of counterfactual examples as a form of explanation:
- Are the changes proposed by the counterfactual example feasible (actionable) for a person to change their outcome?
- If the changes are performed, what else do they affect, i.e. might they be unfavorable in other contexts?
- Changing one factor might inherently change another factor that actually negatively affects the outcome (counterfactual examples cannot describe complex relationships between variables).
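To make the idea concrete, here is a toy brute-force counterfactual search. It is an illustration only, not an implementation of any of the methods above; `model` is assumed to be a fitted scikit-learn-style classifier:

```python
import numpy as np

def nearest_flip(model, x, feature, grid):
    """Vary a single feature of instance x (1-D numpy array) over a grid of candidate
    values and return the modified instance closest to the original that flips the
    model's prediction, or None if no candidate flips it."""
    original = model.predict(x.reshape(1, -1))[0]
    flips = []
    for value in grid:
        x_cf = x.copy()
        x_cf[feature] = value
        if model.predict(x_cf.reshape(1, -1))[0] != original:
            flips.append((abs(value - x[feature]), x_cf))
    return min(flips, key=lambda f: f[0])[1] if flips else None
```

Real counterfactual methods add exactly the constraints the papers above discuss, such as feasibility, sparsity and diversity of the proposed changes.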
This section contains research articles that are looking at the interaction of users with explanations or interpretable models.
- B. Y. Lim and A. K. Dey, "Assessing Demand for Intelligibility in Context-Aware Applications", UbiComp'09 195–204, 2009
- D. Wang et al., "Designing Theory-Driven User-Centric Explainable AI", CHI'19 (601) 1–15, 2019
- M. Narayanan et al., "How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation", arXiv:1802.00682 [cs.AI]
- U. Bhatt et al., "Machine Learning Explainability for External Stakeholders", arXiv:2007.05408 [cs.CY]
- V. Lai and C. Tan, "On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection", FAT*'19 29–38, 2019
- C. Molnar et al., "Pitfalls to Avoid when Interpreting Machine Learning Models", arXiv:2007.04131 [stat.ML]
- A. Preece et al., "Stakeholders in Explainable AI", arXiv:1810.00184 [cs.AI]
- M. Katell et al., "Toward Situated Interventions for Algorithmic Equity: Lessons from the Field", FAT*'20 45–55, 2020 - Presents a framework for designing ML/AI solutions based on participatory design and co-design methods, with a special focus on solutions that affect communities, e.g. models employed by municipalities. The framework is applied to an example case in which a surveillance tool with an automatic decision system is designed.
- M. Eiband et al., "Bringing Transparency Design into Practice", IUI'18 211–223, 2018
This section contains research articles where XAI was used as part of an application or used for validation on a system deployed in practice.
- S. Coppers et al., "Intellingo: An Intelligible Translation Environment", CHI'18 (524) 1–13, 2018
- H. Tang and P. Eratuuli, "Package and Classify Wireless Product Features to Their Sales Items and Categories Automatically", Machine Learning and Knowledge Extraction. CD-MAKE 2019. LNCS (11713), 2019
This section focuses on explainability with respect to deep neural networks (DNNs). This can be methods to explain DNNs or methods to build DNNs that can explain themselves.
- Y. Goyal et al., "Counterfactual Visual Explanations", 36th ICML, PMLR (97) 2376-2384, 2019 - Describing a method to construct a DNN for image classification that provides counterfactual explanations.
- K. Simonyan et al., "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", arXiv:1312.6034 [cs.CV] - A minimal gradient-saliency sketch is given after this list.
- A. Tavanaei, "Embedded Encoder-Decoder in Convolutional Networks Towards Explainable AI", arXiv:2007.06712 [cs.CV] - DNN with a built-in encoder-decoder that generates explanations.
- S. Bach et al., "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation", PLOS ONE (10:7) e0130140, 2015 - Description of the LRP method for DNNs. Code for playing around with the LRP method can be found on github.
- W. Samek et al., "Evaluating the Visualization of What a Deep Neural Network Has Learned", IEEE Trans. Neural Netw. Learn. Syst. (28:11) 2660-2673, 2017
- G. Montavon et al., "Explaining nonlinear classification decisions with deep Taylor decomposition", Pattern Recognition (65) 211-222, 2017
- G. Montavon et al., "Methods for Interpreting and Understanding Deep Neural Networks", Digital Signal Processing (73) 1-15, 2018
- S. Lapuschkin et al., "Unmasking Clever Hans predictors and assessing what machines really learn", Nat. Commun. 10 1096, 2019 - Using LRP, the authors find "cheating" strategies of DNNs in varying tasks. We recommend also checking the supplementary material, which contains more experiments and insights.
- M. Sundararajan et al., "Exploring Principled Visualizations for Deep Network Attributions", IUI Workshops, 2019
- R. R. Selvaraju, "Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization", IEEE ICCV 618-626, 2017
- Q. Zhang, "Interpretable CNNs", IEEE/CVF CVPR 8827-8836, 2018
- R. C. Fong and A. Vedaldi, "Interpretable Explanations of Black Boxes by Meaningful Perturbation", IEEE ICCV 3449-3457, 2017 - A PyTorch implementation can be found on github.
- R. Fong and A. Vedaldi, "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks", 018 IEEE/CVF CVPR 8730-8738, 2018
- R. Hu et al., "Learning to Reason: End-to-End Module Networks for Visual Question Answering", IEEE ICCV 804-813, 2017
- A. Nguyen, "Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks", arXiv:1602.03616 [cs.CV]
- S. O. Arik and T. Pfister, "ProtoAttend: Attention-Based Prototypical Learning", arXiv:1902.06292 [cs.CV]
- A. Ghorbani et al., "Towards Automatic Concept-based Explanations", NeurIPS, 2019
- M. Ancona et al., "Towards better understanding of gradient-based attribution methods for deep neural networks", arXiv:1711.06104 [cs.LG]
- A. Mahendran and A. Vedaldi, "Understanding deep image representations by inverting them", IEEE CVPR 5188-5196, 2015
- A. Kapishnikov et al., "XRAI: Better Attributions Through Regions", IEEE ICCV 4947-4956, 2019
- B. Alsallakh et al., "Do Convolutional Neural Networks Learn Class Hierarchy?", arXiv:1710.06501 [cs.CV]
- S. Wang et al., "Bias Also Matters: Bias Attribution for Deep Neural Network Explanation", 36th ICML, PMLR (97) 6659-6667, 2019 - Describing the effect of the bias parameter on XAI methods using the gradient.
- N. Papernot and P. McDaniel, "Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning", arXiv:1803.04765 [cs.LG] - A DNN using KNN in the representation space to ensure consistency in the predictions.
- O. Li et al., "Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions", arXiv:1710.04806 [cs.AI]
- A. Wan et al., "NBDT: Neural-Backed Decision Trees", arXiv:2004.00221 [cs.CV] - An approach that combines DNN with decision trees in cases where there is a "natural" hierarchy of classes. See also their homepage.
- K. Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", PMLR (37) 2048-2057, 2015 - DNN that generates text explanation together with highlights within the image. Code can be found on github.
- C. Chen et al., "This Looks Like That: Deep Learning for Interpretable Image Recognition", NeurIPS, 2019
- V. Petsiuk et al., "RISE: Randomized Input Sampling for Explanation of Black-box Models", arXiv:1806.07421 [cs.CV]
- P. Sturmfels et al., "Visualizing the Impact of Feature Attribution Baselines", Distill, 2020.
- D. Bau et al., "Understanding the role of individual units in a deep neural network", PNAS (117:48) 30071-30078, 2020 - All links and material regarding the article are summarized by the authors on their website.
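A minimal sketch of the vanilla gradient saliency maps from the Simonyan et al. entry above. A tiny stand-in CNN and a random input are used so the snippet is self-contained; in practice one would load a trained model and a real image:

```python
import torch
import torch.nn as nn

# Tiny stand-in classifier so the example runs end to end.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

x = torch.rand(1, 3, 64, 64, requires_grad=True)  # stand-in for a preprocessed input image
scores = model(x)
scores[0, scores.argmax()].backward()             # gradient of the top-class score w.r.t. the input
saliency = x.grad.abs().max(dim=1)[0]             # per-pixel importance: max over colour channels
print(saliency.shape)                             # torch.Size([1, 64, 64])
```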
This section contains papers in which XAI methods are used or developed for NLP tasks and models.
- S. Jain and B. C. Wallace, "Attention is not Explanation", arXiv:1902.10186 [cs.CL]
- W. J. Murdoch and A. Szlam, "Automatic Rule Extraction from Long Short Term Memory Networks", arXiv:1702.02540 [cs.CL]
- W. J. Murdoch et al., "Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs", arXiv:1801.05453 [cs.CL]
- L. Arras et al., "Explaining Recurrent Neural Network Predictions in Sentiment Analysis", arXiv:1706.07206 [cs.CL]
- T. Guo et al., "Exploring Interpretable LSTM Neural Networks over Multi-Variable Data", 36th ICML (97) 2494-2504, 2019
- F. Liu and B. Avci, "Incorporating Priors with Feature Attribution on Text Classification", 57th ACL (P19-1631) 6274–6283, 2019
- A. Radford et al., "Learning to Generate Reviews and Discovering Sentiment", arXiv:1704.01444 [cs.LG]
- H. Strobelt et al., "LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks", IEEE Trans. Vis. Comput. Graph (24:1) 667-676, 2018
- T. Lei et al., "Rationalizing Neural Predictions", EMNLP (D16-1011) 107–117, 2016
- M. T. Ribeiro et al., "Semantically Equivalent Adversarial Rules for Debugging NLP Models", 56th ACL (P18-1079) 856–865, 2018
- C. Guan et al., "Towards a Deep and Unified Understanding of Deep Neural Models in NLP", 36th ICML (97) 2454-2463, 2019
- J. Li et al., "Visualizing and Understanding Neural Models in NLP", NAACL (N16-1082) 681–691, 2016
- A. Karpathy et al., "Visualizing and Understanding Recurrent Networks", arXiv:1506.02078 [cs.LG]
- L. Arras et al., "What is Relevant in a Text Document?": An Interpretable Machine Learning Approach, arXiv:1612.07843 [cs.CL]
This section contains papers describing explainability with respect to recommender systems.
- I. Nunes and D. Jannach, "A systematic review and taxonomy of explanations in decision support and recommender systems", User Model User-Adap. Inter. (27) 393–444, 2017
- J. L. Herlocker et al., "Explaining Collaborative Filtering Recommendations", CSCW'00 241–250, 2000
- D. Mcsherry, "Explanation in Recommender Systems", Artif. Intell. Rev. 24 179–197, 2005
This section contains papers describing explainability with respect to reinforcement learning.
- L. She and J. Y. Chai, "Interactive Learning of Grounded Verb Semantics towards Human-Robot Communication", 55th ACL (P17-1150) 1634–1644, 2017
- S. Krening et al., "Learning From Explanations Using Sentiment and Advice in RL", TCDS (9:1) 44-55, 2017
This section contains papers in which XAI models or methods were used on medical data.
- S. Meyer Lauritsen et al., "Explainable artificial intelligence model to predict acute critical illness from electronic health records", Nat. Commun. 11 3852, 2020
- S. M. Lundberg et al., "Explainable machine-learning predictions for the prevention of hypoxaemia during surgery" Nat. Biomed. Eng. (2:10) 749-760, 2018
- Z. Che et al., "Interpretable Deep Models for ICU Outcome Prediction", AMIA Annu. Symp. Proc. (2016) 371-380, 2017
- R. Sayres et al., "Using a Deep Learning Algorithm and Integrated Gradients Explanation to Assist Grading for Diabetic Retinopathy", Ophthalmology (126:4), 2019
- J. Ma et al., "Using deep learning to model the hierarchical structure and function of a cell", Nat. Methods (15) 290–298, 2018
- R. Caruana et al., "Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission", KDD'15 1721–1730, 2015
- B. Letham et al., "Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model", arXiv:1511.01644 [stat.AP]
- E. Choi et al., "RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism", NIPS, 2016
- Explainable AI: Interpreting, Explaining and Visualizing Deep Learning - Explainability with respect to deep learning, with a focus on convolutional neural networks used for image data. The editors of the book are also behind the layer-wise relevance propagation (LRP) method.
- Explainable and Interpretable Models in Computer Vision and Machine Learning - More general book about explainability in machine learning, but also with a focus on deep learning in computer vision.
- AI Fairness 360 Toolkit from IBM, in both Python and R, to examine, report and mitigate bias and discrimination in data and machine learning models.
- What-if-tool from Google's PAIR (People and AI Research) allows you to play around with different fairness metrics.
- FAT Forensics is a Python toolbox for evaluating fairness, accountability and transparency of predictive systems.
- Fairlearn is a Python package for assessing and mitigating bias in machine learning systems. The repo contains both implemented algorithms and Jupyter notebooks with examples of use. See the sketch after this list.
- LiFT - The LinkedIn Fairness Toolkit (LiFT)
- Aequitas - Bias and Fairness Audit Toolkit
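A minimal sketch of a group-fairness audit with Fairlearn's `MetricFrame`; the labels, predictions and sensitive feature below are made-up toy data:

```python
import pandas as pd
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

# Made-up toy labels, predictions and sensitive feature.
y_true = pd.Series([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0])
sex    = pd.Series(["f", "f", "f", "f", "m", "m", "m", "m"])

mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred, sensitive_features=sex)
print(mf.by_group)                                                             # accuracy per group
print(demographic_parity_difference(y_true, y_pred, sensitive_features=sex))   # gap in selection rates
```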
- What is bias? - Towards data science blogpost about bias.
- Explaining Measures of Fairness, Scott Lundberg, 2020, Medium, Towards Data Science - Blogpost describing how to use XAI methods to explain features' contributions to fairness metrics.
- Algorithmic Solutions to Algorithmic Bias: A Technical Guide - Towards data science blogpost describing different methods and techniques to avoid or correct for bias.
- Fairness Metrics Won’t Save You from Stereotyping, Valerie Carey, 2020, Medium, Towards Data Science - Blogpost pointing out that different models with different "bias" can have the same performance on fairness metrics.
- A Tutorial on Fairness in Machine Learning, Ziyuan Zhong, 2018, Medium, Towards Data Science
- Racial Bias in BERT, Gergely D. Németh, 2020, Medium, Towards Data Science
- The Trouble with Bias, Kate Crawford, NIPS 2017 Keynote
- S. Mitchell et al., "Prediction-based decisions and fairness: A catalogue of choices, assumptions, and definitions", arXiv:1811.07867 [stat.AP]
- P. Gajane and M. Pechenizkiy, "On formalizing fairness in prediction with machine learning", arXiv:1710.03184 [cs.LG]
- N. Mehrabi et al., "A survey on bias and fairness in machine learning", arXiv:1908.09635 [cs.LG]
- A. Chouldechova and A. Roth, "A snapshot of the frontiers of fairness in machine learning." Communications of the ACM, 2020
- K. Holstein et al, "Improving fairness in machine learning systems: What do industry practitioners need?." CHI'19 (600) 1–16, 2019
- S. Corbett-Davies and S. Goel, "The measure and mismeasure of fairness: A critical review of fair machine learning", arXiv:1808.00023 [cs.CY]
- A. D. Selbst et al., "Fairness and abstraction in sociotechnical systems", FAT*'19 59-68, 2019.
- B. Lepri et al., "Fair, transparent and accountable algorithmic decision-making processes", Philos. Technol. (31) 611–627, 2018
- A. L. Hoffmann, "Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse", Communication & Society (22:7) 900-915, 2019
This section includes critiques of and challenges with existing definitions.
Static fairness metrics
- M. Hardt et al., "Equality of Opportunity in Supervised Learning", arXiv:1610.02413 [cs.LG] - The paper defines the Equalized Odds fairness metric and criticizes Demographic Parity. The authors also provide an interactive loan application example.
- S. Verma and J. Rubin, "Fairness definitions explained", IEEE/ACM FairWare 1-7, 2018 - This paper explains and demonstrates different statistical fairness metrics, which require achieving parity for a metric between groups. See the sketch after this list.
- R. Berk et al., "Fairness in criminal justice risk assessments: The state of the art", Sociological Methods & Research, 2018 - The paper discusses trade-offs between different fairness metrics and accuracy for criminal risk assessment, and shows that some metrics and accuracy are incompatible.
- J. Kleinberg et al., "Inherent trade-offs in the fair determination of risk scores", arXiv:1609.05807 [cs.LG] - The authors examine three definitions of fairness metrics and show that, except in special cases, the metrics are incompatible and cannot be achieved simultaneously.
- M. Kearns et al., "Preventing fairness gerrymandering: Auditing and learning for subgroup fairness." ICML (80) 2564-2572, 2018 - The paper highlights that using statistical fairness metrics for ensuring parity between groups does not give any guarantee for subgroups.
- A. Chouldechova, "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments", arXiv:1703.00056 [stat.AP]
Dynamic fairness definitions
- L. T. Liu et al., "Delayed impact of fair machine learning", arXiv:1803.04383 [cs.LG] - Demonstrates through a one-step simulation that achieving a fairness metric such as Demographic Parity or Equalized Odds can leave the protected group worse off one step in the "future".
- A. D'Amour et al., "Fairness is not static: deeper understanding of long term fairness via simulation studies." FAT*'20' 525–534, 2020.
Individual and preference fairness
- C. Dwork et al., "Fairness through awareness", ITCS'12 214–226, 2012 - The paper formulates the ideas behind individual fairness (similar individuals should be treated similarly).
- M. Kim et al., "Fairness through computationally-bounded awareness." 31st NIPS 4842-4852, 2018
- M. B. Zafar et al., "From parity to preference-based notions of fairness in classification", 30th NIPS, 2017 - Defines preference-based fairness, which carries the idea that each individual should prefer the outcome received from its own group-dependent classifier. This leaves room for optimizing the classifiers within each group.
- M. P. Kim et al., "Preference-informed fairness", arXiv:1904.01793 [cs.LG] - Combines the ideas between individual and preference fairness.
- T. Speicher et al., "A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices", KDD'18 2239–2248, 2018
- A. Agarwal et al., "Automated Test Generation to Detect Individual Discrimination in AI Models", arXiv:1809.03260 [cs.AI]
- E. Black et al., "FlipTest: Fairness Testing via Optimal Transport", FAT*'20 111–121, 2020
- R. Binns, "On the Apparent Conflict Between Individual and Group Fairness", FAT*'20 514–524, 2020 - Discussing the difference between individual and group fairness and why there does not have to be a trade-off.
Causal reasoning based fairness
- M. J. Kusner et al., "Counterfactual fairness", 30th NIPS, 2017 - Definition of counterfactual fairness. The idea is that fairness is achieved if an individual will receive the same outcome both in the actual world and in the counterfactual. The code to the paper can be found on github.
- S. Chiappa, "Path-specific counterfactual fairness", Proceedings of the AAAI Conference on Artificial Intelligence (33:01) 7801-7808, 2019 - Formulates a counterfactual fairness that follows different paths of sensitive attributes within a causal model.
- S. Garg et al., "Counterfactual fairness in text classification through robustness", AIES'19 219–226, 2019 - Counterfactual method for text classification, e.g. for finding toxic comments, where the aim is that references to the sensitive attribute should not affect the classification.
- S. Chiappa and W. S. Isaac, "A Causal Bayesian Networks Viewpoint on Fairness", arXiv:1907.06430 [stat.ML]
- N. Kilbertus et al., "Avoiding Discrimination through Causal Reasoning", 30th NIPS, 2017
- J. R. Loftus et al., "Causal Reasoning for Algorithmic Fairness", arXiv:1805.05859 [cs.AI]
Procedural fairness
- N. Grgić-Hlača et al., "Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning", AAAI (18), 2018 - Proposes to shift the focus for outcome fairness to procedurally fairness where there instead should be a focus of how the outcome is concluded instead of what it actually is. The paper includes a survey to examine people's perception of using different input features in different settings.
- N. Grgić-Hlača et al., "The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making", Symposium on Machine Learning and the Law at the 29th NIPS, 2016
Fairness through explanations
- J. Cesaro and F. G. Cozman, "Measuring Unfairness Through Game-Theoretic Interpretability", ECML PKDD (1167) 253-264, 2019 - Presents the idea that fairness can be assessed by looking at the "global" feature attributions on a test set for different protected groups using, e.g., the SHAP framework. See the sketch below.
- J. M. Hickey et al., "Fairness by Explicability and Adversarial SHAP Learning", arXiv:2003.05330 [cs.LG] - The authors assess fairness through explanations (SHAP), compare them to other statistical measures, and propose an in-processing algorithm for mitigating bias.
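A rough sketch of that idea: compute SHAP attributions on a test set and compare their means across a protected group. The data, model and the `group` column are made-up placeholders, and output shapes may differ with other model types or SHAP versions:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Made-up data with a protected attribute `group`.
rng = np.random.default_rng(0)
X = pd.DataFrame({"income": rng.normal(size=500),
                  "age": rng.normal(size=500),
                  "group": rng.integers(0, 2, size=500)})
y = (X["income"] + 0.3 * X["group"] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)   # (n_samples, n_features) for a binary GBM

# Mean attribution per feature, split by the protected attribute.
attributions = pd.DataFrame(shap_values, columns=X.columns)
print(attributions.groupby(X["group"].values).mean())
```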
- F. Kamiran and T. Calders, "Data preprocessing techniques for classification without discrimination", Knowledge and Information Systems (33:1) 1-33, 2012
- R. Zemel et al. "Learning fair representations", ICML (28:3) 325-333, 2013
- F. Calmon et al., "Optimized pre-processing for discrimination prevention", 30th NIPS, 2017
- M. Feldman et al., "Certifying and removing disparate impact", KDD'15 259–268, 2015
- B. H. Zhang et al., "Mitigating unwanted biases with adversarial learning", AIES'18 335–340, 2018
- T. Kamishima et al., "Fairness-aware classifier with prejudice remover regularizer", ECML PKDD 35-50, 2012
- G. Pleiss et al., "On fairness and calibration", 30th NIPS, 2017.
- V. Perrone et al., "Fair Bayesian Optimization", arXiv:2006.05109 [stat.ML]
- P. Lahoti et al., "Fairness without Demographics through Adversarially Reweighted Learning", 33rd NeurIPS, 2020
- I. Y. Chen et al., "Why Is My Classifier Discriminatory?", 31st NeurIPS, 2018
- L. Dixon et al., "Measuring and Mitigating Unintended Bias in Text Classification", AIES'18 67–73, 2018
- A. Amini et al., "Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure", AIES'19 289–295, 2019
- M. Srivastava et al., "Mathematical notions vs. human perception of fairness: A descriptive approach to fairness for machine learning", KDD'19 2459–2468, 2019 - Attempts to measure people's perception of different statistical fairness metrics through an Amazon Mechanical Turk survey.
- G. Harrison et al., "An empirical study on the perceived fairness of realistic, imperfect machine learning models", FAT*'20 392–402, 2020 - Examines people's perception of trade-offs between models that satisfy different statistical fairness measures or accuracy, through an Amazon Mechanical Turk survey.
- D. Saha et al., "Measuring non-expert comprehension of machine learning fairness metrics", ICML (119) 8377-8387, 2020 - Examines people's comprehension of statistical fairness metrics and shows that comprehension can be measured through a multiple-choice survey. Furthermore, the authors find that comprehension is correlated with education and that higher comprehension is correlated with a more negative perception of the metrics.
- J. Dodge et al., "Explaining models: an empirical study of how explanations impact fairness judgment", IUI'19 275–285, 2019
- N. Grgić-Hlača et al., "Human Perceptions of Fairness in Algorithmic Decision Making: A Case Study of Criminal Risk Prediction", WWW'18 903–912, 2018
- R. Binns et al., "‘It’s Reducing a Human Being to a Percentage’; Perceptions of Justice in Algorithmic Decisions", CHI'18 (377) 1–14, 2018
Natural Language Processing
- T. Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", 29th NIPS, 2016 - The paper examines gender stereotypes for occupations in word embeddings, which the authors identify through a survey of people's perception of gender stereotypes. The paper proposes a technique to mitigate such identified bias in word embeddings.
- H. Gonen and Y. Goldberg, "Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them", arXiv:1903.03862 [cs.CL] - This paper criticizes the debiasing method for word embeddings presented in the paper above.
- M. Nissim et al., "Fair is better than sensational: Man is to doctor as woman is to doctor", Computational Linguistics (46:2) 487-497, 2020 - This paper criticizes using word analogies for concluding bias in word embeddings.
- C. Basta et al., "Evaluating the underlying gender bias in contextualized word embeddings", arXiv:1904.08783 [cs.CL]
- J. Zhao et al., "Learning gender-neutral word embeddings", arXiv:1809.01496 [cs.CL]
- S. Kiritchenko and S. M. Mohammad, "Examining gender and race bias in two hundred sentiment analysis systems", arXiv:1805.04508 [cs.CL]
- M. Sap et al., "The Risk of Racial Bias in Hate Speech Detection", ACL (P19-1163) 1668–1678, 2019
- J. Zhao et al., "Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints", EMNLP (D17-1323) 2979–2989, 2017
- M.-E. Brunet et al., "Understanding the Origins of Bias in Word Embeddings", ICML (97) 803-811, 2019
Recidivism
- J. Dressel and H. Farid, "The accuracy, fairness, and limits of predicting recidivism", Science Advances (4:1) eaao5580, 2018
- A. Chouldechova et al., "A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions", ICML (81) 134-148, 2018
- A. Chouldechova, "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments", Big Data (5:2) 153-163, 2017
Recommender systems
- A. Beutel et al., "Fairness in Recommendation Ranking through Pairwise Comparisons", KDD'19 2212–2220, 2019
- S. C. Geyik et al., "Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search", KDD'19 2221–2231, 2019
Different cases
- A. Mukerjee et al., "Multi-objective evolutionary algorithms for the risk-return trade-off in bank loan management", International Transactions in Operational Research (9:5) 583-597, 2002
- I. D. Raji and J. Buolamwini, "Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products", AIES'19 429–435, 2019
- Z. Obermeyer et al., "Dissecting racial bias in an algorithm used to manage the health of populations", Science (366:6464) 447-453, 2019
- A. D. Selbst et al., "Fairness and Abstraction in Sociotechnical Systems", FAT*'19 59–68, 2019
- M. Veale et al., "Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making", CHI'18 (440) 1–14, 2018
- C. Barabas et al., "Studying Up: Reorienting the study of algorithmic fairness around issues of power", FAT*'20 167–176, 2020
- S. Milli et al., "The Social Cost of Strategic Classification", FAT*'19 230–239, 2019
- Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O'Neil, 2016, Broadway Books
- Invisible Women - Exposing Data Bias in a World Designed for Men, Caroline Criado Perez, 2020, Vintage Publishing
- Data Feminism, Lauren F. Klein & Catherine D'Ignazio, 2020, Mit Press Ltd
- Fairness and machine learning - Limitations and Opportunities, Solon Barocas, Moritz Hardt, Arvind Narayanan, in process, https://fairmlbook.org/
- Practical Fairness, Aileen Nielsen, 2020, O'Reilly Media
In this section we list research articles related to guidelines and principles regarding responsible AI.
- A. Jobin et al., "Artificial Intelligence: the global landscape of ethics guidelines", Nat. Mach. Intell. (1) 389–399, 2019
- T. Miller, "Explanation in Artificial Intelligence: Insights from the Social Sciences", Artificial Intelligence (267) 1-38, 2019
- C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead", Nat. Mach. Intell. (1) 206–215, 2019
- E. Toreini et al., "The relationship between trust in AI and trustworthy machine learning technologies", FAT*'20 272–283, 2020
- F. Pinto et al., "Automatic Model Monitoring for Data Streams", arXiv:1908.04240 [cs.LG] - Describes a method to monitor models that predict on data streams for detecting model drift.
- T. Gebru et al., "Datasheets for Datasets", arXiv:1803.09010 [cs.DB] - Describes a framework for how to document datasets used for building machine learning models.
- E. M. Bender and B. Friedman, "Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science", Transactions of ACL (6), 2018 - Describes a framework for how to document datasets used for NLP tasks.
- M. Mitchell, "Model Cards for Model Reporting", FAT*'19 220-229, 2019 - Describes a framework for how to document ML models. The model card toolkit can be found on github released under the tensorflow repository.
- I. D. Raji et al., "Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing", FAT*'20 33-44, 2020 - Presents a framework for auditing AI/ML-based systems. The idea is to take auditing concepts (risk assessment and documentation) known from other industries, like aerospace or finance, and adjust them to AI/ML. One example is the "Failure Modes and Effects Analysis" (FMEA).
- Google PAIR: People + AI guidebook for UX professionals and product managers to follow a human-centered approach to AI
- Google’s medical AI was super accurate in a lab. Real life was a different story
- AI Now Institute reports - Publications of the AI Now Institute
- F. Doshi-Velez and M. Kortz, "Accountability of AI Under the Law: The Role of Explanation", arXiv:1711.01134 [cs.AI]
- B. Goodman and S. Flaxman, "European Union regulations on algorithmic decision-making and a “right to explanation”", AI Magazine (38:3) 50-57, 2017
- A. D. Selbst and J. Powles, "Meaningful information and the right to explanation", Proceedings of the 1st FAT (81) 48-48, 2018
- M. E. Kaminski and G. Malgieri, "Multi-layered Explanations from Algorithmic Impact Assessments in the GDPR", FAT*'20 68–79, 2020
- L. Edwards and M. Veale, "Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For", 16 Duke Law & Technology Review (18), 2017
- S. Wachter et al., "Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation", International Data Privacy Law, 2017