fixes #211 and #330
christophM committed Nov 28, 2022
1 parent 98e1712 commit a73f823
Showing 1 changed file with 4 additions and 4 deletions.
manuscript/05.9b-agnostic-shap.Rmd (8 changes: 4 additions & 4 deletions)
@@ -126,9 +126,9 @@
For tabular data, the following figure visualizes the mapping from coalitions to feature values:
knitr::include_graphics("images/shap-simplified-features.jpg")
```

-$h_x$ for tabular data treats $X_C$ and $X_S$ as independent and integrates over the marginal distribution:
+$h_x$ for tabular data treats feature $X_j$ and $X_{-j}$ (the other features) as independent and integrates over the marginal distribution:

-$$\hat{f}(h_x(z'))=E_{X_C}[\hat{f}(x)]$$
+$$\hat{f}(h_x(z'))=E_{X_{-j}}[\hat{f}(x)]$$

Sampling from the marginal distribution means ignoring the dependence structure between present and absent features.
KernelSHAP therefore suffers from the same problem as all permutation-based interpretation methods.
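
(An aside, not part of the commit: the marginal expectation above can be estimated by plain sampling. Below is a minimal R sketch; the function name and data structures are invented for illustration and are not the book's or any package's actual code. Features present in the coalition keep the values of the instance x, absent features are filled in from randomly drawn background rows.)

```r
# Minimal sketch (illustrative names, not the book's code):
# estimate f(h_x(z')) for one coalition by marginal sampling.
marginal_value <- function(model, x, coalition, background, n_samples = 100) {
  # Fill the "absent" features with randomly drawn background rows
  idx <- sample(nrow(background), n_samples, replace = TRUE)
  synthetic <- background[idx, , drop = FALSE]
  # Features present in the coalition keep the values of instance x
  if (length(coalition) > 0) {
    synthetic[, coalition] <- x[rep(1, n_samples), coalition]
  }
  # The average prediction approximates the marginal expectation
  mean(predict(model, synthetic))
}
```

With all features in the coalition this returns $\hat{f}(x)$; with an empty coalition it returns the average prediction over the background data. Because the background rows are drawn without regard to the values of the present features, the sketch also makes the independence assumption criticized above visible.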
@@ -198,15 +198,15 @@
If we add an L1 penalty to the loss L, we can create sparse explanations.
Lundberg et al. (2018)[^treeshap] proposed TreeSHAP, a variant of SHAP for tree-based machine learning models such as decision trees, random forests and gradient boosted trees.
TreeSHAP was introduced as a fast, model-specific alternative to KernelSHAP, but it turned out that it can produce unintuitive feature attributions.

-TreeSHAP defines the value function using the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ instead of the marginal expectation.
+TreeSHAP defines the value function using the conditional expectation $E_{X_j|X_{-j}}(\hat{f}(x)|x_j)$ instead of the marginal expectation.
The problem with the conditional expectation is that features that have no influence on the prediction function $\hat{f}$ can get a TreeSHAP estimate different from zero, as shown by Sundararajan et al. (2019)[^cond1] and Janzing et al. (2019)[^cond2].
The non-zero estimate can happen when the feature is correlated with another feature that actually has an influence on the prediction.

How much faster is TreeSHAP?
Compared to exact KernelSHAP, it reduces the computational complexity from $O(TL2^M)$ to $O(TLD^2)$, where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree.
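
(As a rough illustration with made-up numbers: an ensemble with $T=100$ trees, $L=64$ leaves per tree, depth $D=6$ and $M=20$ features would need on the order of $100 \cdot 64 \cdot 2^{20} \approx 7 \cdot 10^9$ operations for exact coalition enumeration, but only about $100 \cdot 64 \cdot 6^2 \approx 2 \cdot 10^5$ with TreeSHAP.)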

<!-- To explain an individual prediction with exact Shapley values, we have to estimate $E(\hat{f}(x)|x_S)$ for all possible feature value subsets S.-->
-TreeSHAP uses the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ to estimate effects.
+TreeSHAP uses the conditional expectation $E_{X_j|X_{-j}}(\hat{f}(x)|x_j)$ to estimate effects.
I will give you some intuition on how we can compute the expected prediction for a single tree, an instance x and feature subset S.
If we conditioned on all features -- if S was the set of all features -- then the prediction from the node in which the instance x falls would be the expected prediction.
If we did not condition the prediction on any feature -- if S was empty -- we would use the weighted average of the predictions of all terminal nodes.
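
(Aside, not part of the commit: the two boundary cases above suggest a short recursive sketch in R. The flat tree representation and field names such as `split_feature` and `cover` are invented for illustration; this shows only the conditional expectation TreeSHAP builds on, not its actual polynomial-time algorithm.)

```r
# Sketch of the intuition above: expected prediction of a single tree
# for instance x, conditioning only on the features in subset S.
# The tree representation is hypothetical, not shap's data structure.
expected_prediction <- function(tree, node, x, S) {
  # Leaf: return the leaf's prediction
  if (tree$is_leaf[node]) {
    return(tree$value[node])
  }
  j <- tree$split_feature[node]
  left  <- tree$left[node]
  right <- tree$right[node]
  if (j %in% S) {
    # Conditioned-on feature: follow the decision path of x
    branch <- if (x[[j]] <= tree$threshold[node]) left else right
    expected_prediction(tree, branch, x, S)
  } else {
    # "Absent" feature: average both subtrees, weighted by the
    # number of training samples (cover) that flow into each
    w_l <- tree$cover[left]
    w_r <- tree$cover[right]
    (w_l * expected_prediction(tree, left,  x, S) +
     w_r * expected_prediction(tree, right, x, S)) / (w_l + w_r)
  }
}
```

If S contains all features, the recursion simply follows x to its leaf; if S is empty, it returns the cover-weighted average over all terminal nodes, matching the two boundary cases described above.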
