adds criteria of interpretability
christophM committed May 22, 2018
1 parent 5b91b17 commit a241056
Showing 2 changed files with 22 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -51,6 +51,8 @@ All notable changes to the book will be documented here.
- Added feature interaction chapter
- Improved example in partial dependence plot chapter
- The weights in the LIME text chapter were shown with the wrong words. This has been fixed.
- Added Criteria for Interpretability Methods


### v0.3 (2018-04-24)
- Reworked the Feature Importance Chapter
21 changes: 20 additions & 1 deletion manuscript/02-interpretability.Rmd
@@ -132,6 +132,25 @@ An interpretable model can tell you why it decided that a certain person is not
- Causality: Check whether only causal relationships are picked up, meaning that a predicted change in a decision due to changes in the input values also happens in reality.
- Trust: It is easier for humans to trust a system that explains its decisions compared to a black box.

## Criteria for Interpretability Methods

Methods for machine learning interpretability can be classified according to different criteria:

- **Intrinsic or post hoc?**
Intrinsic interpretability means selecting and training a machine learning model that is considered intrinsically interpretable (for example, short decision trees).
Post hoc interpretability means selecting and training a black box model (for example, a neural network) and applying interpretability methods after the training (for example, measuring feature importance).
A minimal code sketch contrasting the two approaches follows this list.
The "intrinsic or post hoc" criterion determines the layout of the chapters in the book:
The two main chapters are the [intrinsically interpretable models chapter](#simple) and the [post hoc (and model-agnostic) interpretability methods chapter](#agnostic).
- **Model-specific or model-agnostic?**
Model-specific interpretation tools are limited to specific model classes.
The interpretation of regression weights in a linear model is a model-specific interpretation, since, by definition, the interpretation of intrinsically interpretable models is always model-specific.
Any tool that only works for interpreting, for example, neural networks is model-specific.
Model-agnostic tools can be used on any machine learning model and are usually applied post hoc.
These agnostic methods usually operate by analysing pairs of feature inputs and outputs.
By definition, these methods cannot have access to model internals such as weights or structural information.
- **Local or global?**
Does the interpretation method explain a single prediction or the entire model behavior? Or is the scope somewhere in between?
Read more about the scope criterion in the next section.
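
To make the first two criteria concrete, here is a minimal R sketch (not from the book; the `rpart` and `randomForest` packages and the iris data are assumed only for illustration). A short decision tree is interpreted intrinsically by reading its splits, while a random forest is explained post hoc with a simple model-agnostic permutation importance that relies only on input and output pairs.

```r
# Minimal sketch (illustration only, not the book's code):
# an intrinsically interpretable model vs. a post hoc, model-agnostic explanation.
library(rpart)         # short decision trees (intrinsically interpretable)
library(randomForest)  # a black box model

data(iris)

# Intrinsic: a short decision tree whose splits can be read directly.
tree <- rpart(Species ~ ., data = iris, control = rpart.control(maxdepth = 2))
print(tree)

# Post hoc and model-agnostic: permutation feature importance for a random forest.
# Only input/output pairs are used; no access to weights or other model internals.
rf <- randomForest(Species ~ ., data = iris)
baseline_acc <- mean(predict(rf, iris) == iris$Species)
features <- setdiff(names(iris), "Species")
importance <- sapply(features, function(feature) {
  permuted <- iris
  permuted[[feature]] <- sample(permuted[[feature]])          # break the feature-outcome link
  baseline_acc - mean(predict(rf, permuted) == iris$Species)  # drop in accuracy
})
print(sort(importance, decreasing = TRUE))
```

The importance is computed in-sample here to keep the sketch short; measuring it on held-out data would better reflect how much the model relies on each feature.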

## Scope of Interpretability
An algorithm trains a model, which produces the predictions. Each step can be evaluated in terms of transparency or interpretability.
@@ -413,4 +432,4 @@ Generality is easily measured by a feature's support, which is the number of instances

[^Strumbelj2011]: Štrumbelj, Erik, and Igor Kononenko. 2011. "A General Method for Visualizing and Explaining Black-Box Regression Models." In International Conference on Adaptive and Natural Computing Algorithms, 21–30. Springer.

[^Nickerson]: Nickerson, Raymond S. 1998. "Confirmation Bias: A Ubiquitous Phenomenon in Many Guises." Review of General Psychology 2 (2). Educational Publishing Foundation: 175.
