adds criteria of interpretability
christophM committed May 22, 2018
1 parent 5b91b17 commit a241056
Showing 2 changed files with 22 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -51,6 +51,8 @@ All notable changes to the book will be documented here.
- Added feature interaction chapter
- Improved example in partial dependence plot chapter
- The weights in the LIME text chapter were shown with the wrong words. This has been fixed.
- Added Criteria for Interpretability Methods


### v0.3 (2018-04-24)
- Reworked the Feature Importance Chapter
21 changes: 20 additions & 1 deletion manuscript/02-interpretability.Rmd
@@ -132,6 +132,25 @@ An interpretable model can tell you why it decided that a certain person is not
- Causality: Check whether only causal relationships are picked up, meaning that a predicted change in a decision due to changes in the input values also happens in reality.
- Trust: It is easier for humans to trust a system that explains its decisions compared to a black box.

## Criteria for Interpretability Methods

Methods for machine learning interpretability can be classified according to different criteria:

- **Intrinsic or post hoc?**
Intrinsic interpretability means selecting and training a machine learning model that is considered intrinsically interpretable (for example, short decision trees).
Post hoc interpretability means selecting and training a black box model (for example, a neural network) and applying interpretability methods after the training (for example, measuring feature importance).
A minimal code sketch contrasting the two approaches follows this list.
The "intrinsic or post hoc" criterion determines the layout of the chapters in the book:
The two main chapters are the [intrinsically interpretable models chapter](#simple) and the [post hoc (and model-agnostic) interpretability methods chapter](#agnostic).
- **Model-specific or model-agnostic?**
Model-specific interpretation tools are limited to specific model classes.
The interpretation of regression weights in a linear model is a model-specific interpretation, since, by definition, the interpretation of intrinsically interpretable models is always model-specific.
Any tool that only works for interpreting, for example, neural networks is model-specific.
Model-agnostic tools can be used on any machine learning model and are usually applied post hoc.
These agnostic methods usually operate by analysing pairs of feature inputs and outputs.
By definition, these methods cannot have access to model internals such as weights or structural information.
- **Local or global?**
Does the interpretation method explain a single prediction or the entire model behavior? Or is the scope somewhere in between?
Read more about the scope criterion in the next section.
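
To make the first two criteria concrete, here is a minimal R sketch (not from the book; the `rpart` and `randomForest` packages and the iris data are assumed only for illustration). A short decision tree is interpreted intrinsically by reading its splits, while a random forest is explained post hoc with a simple model-agnostic permutation importance that relies only on input and output pairs.

```r
# Minimal sketch (illustration only, not the book's code):
# an intrinsically interpretable model vs. a post hoc, model-agnostic explanation.
library(rpart)         # short decision trees (intrinsically interpretable)
library(randomForest)  # a black box model

data(iris)

# Intrinsic: a short decision tree whose splits can be read directly.
tree <- rpart(Species ~ ., data = iris, control = rpart.control(maxdepth = 2))
print(tree)

# Post hoc and model-agnostic: permutation feature importance for a random forest.
# Only input/output pairs are used; no access to weights or other model internals.
rf <- randomForest(Species ~ ., data = iris)
baseline_acc <- mean(predict(rf, iris) == iris$Species)
features <- setdiff(names(iris), "Species")
importance <- sapply(features, function(feature) {
  permuted <- iris
  permuted[[feature]] <- sample(permuted[[feature]])          # break the feature-outcome link
  baseline_acc - mean(predict(rf, permuted) == iris$Species)  # drop in accuracy
})
print(sort(importance, decreasing = TRUE))
```

The importance is computed in-sample here to keep the sketch short; measuring it on held-out data would better reflect how much the model relies on each feature.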

## Scope of Interpretability
An algorithm trains a model, which produces the predictions. Each step can be evaluated in terms of transparency or interpretability.
@@ -413,4 +432,4 @@ Generality is easily measured by a feature's support, which is the number of instances

[^Strumbelj2011]: Štrumbelj, Erik, and Igor Kononenko. 2011. "A General Method for Visualizing and Explaining Black-Box Regression Models." In International Conference on Adaptive and Natural Computing Algorithms, 21–30. Springer.

[^Nickerson]: Nickerson, Raymond S. 1998. "Confirmation Bias: A Ubiquitous Phenomenon in Many Guises." Review of General Psychology 2 (2). Educational Publishing Foundation: 175.
