fixes #211 and #330
christophM committed Nov 28, 2022
1 parent 98e1712 commit a73f823
Showing 1 changed file with 4 additions and 4 deletions.
manuscript/05.9b-agnostic-shap.Rmd (8 changes: 4 additions & 4 deletions)
@@ -126,9 +126,9 @@
For tabular data, the following figure visualizes the mapping from coalitions to feature values:
knitr::include_graphics("images/shap-simplified-features.jpg")
```

-$h_x$ for tabular data treats $X_C$ and $X_S$ as independent and integrates over the marginal distribution:
+$h_x$ for tabular data treats feature $X_j$ and $X_{-j}$ (the other features) as independent and integrates over the marginal distribution:

-$$\hat{f}(h_x(z'))=E_{X_C}[\hat{f}(x)]$$
+$$\hat{f}(h_x(z'))=E_{X_{-j}}[\hat{f}(x)]$$

Sampling from the marginal distribution means ignoring the dependence structure between present and absent features.
KernelSHAP therefore suffers from the same problem as all permutation-based interpretation methods.
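
(An aside, not part of the commit: the marginal expectation above can be estimated by plain sampling. Below is a minimal R sketch; the function name and data structures are invented for illustration and are not the book's or any package's actual code. Features present in the coalition keep the values of the instance x, absent features are filled in from randomly drawn background rows.)

```r
# Minimal sketch (illustrative names, not the book's code):
# estimate f(h_x(z')) for one coalition by marginal sampling.
marginal_value <- function(model, x, coalition, background, n_samples = 100) {
  # Fill the "absent" features with randomly drawn background rows
  idx <- sample(nrow(background), n_samples, replace = TRUE)
  synthetic <- background[idx, , drop = FALSE]
  # Features present in the coalition keep the values of instance x
  if (length(coalition) > 0) {
    synthetic[, coalition] <- x[rep(1, n_samples), coalition]
  }
  # The average prediction approximates the marginal expectation
  mean(predict(model, synthetic))
}
```

With all features in the coalition this returns $\hat{f}(x)$; with an empty coalition it returns the average prediction over the background data. Because the background rows are drawn without regard to the values of the present features, the sketch also makes the independence assumption criticized above visible.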
@@ -198,15 +198,15 @@
If we add an L1 penalty to the loss L, we can create sparse explanations.
Lundberg et al. (2018)[^treeshap] proposed TreeSHAP, a variant of SHAP for tree-based machine learning models such as decision trees, random forests and gradient boosted trees.
TreeSHAP was introduced as a fast, model-specific alternative to KernelSHAP, but it turned out that it can produce unintuitive feature attributions.

-TreeSHAP defines the value function using the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ instead of the marginal expectation.
+TreeSHAP defines the value function using the conditional expectation $E_{X_j|X_{-j}}(\hat{f}(x)|x_j)$ instead of the marginal expectation.
The problem with the conditional expectation is that features that have no influence on the prediction function $\hat{f}$ can get a TreeSHAP estimate different from zero, as shown by Sundararajan et al. (2019)[^cond1] and Janzing et al. (2019)[^cond2].
The non-zero estimate can happen when the feature is correlated with another feature that actually has an influence on the prediction.

How much faster is TreeSHAP?
Compared to exact KernelSHAP, it reduces the computational complexity from $O(TL2^M)$ to $O(TLD^2)$, where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree.
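
(As a rough illustration with made-up numbers: an ensemble with $T=100$ trees, $L=64$ leaves per tree, depth $D=6$ and $M=20$ features would need on the order of $100 \cdot 64 \cdot 2^{20} \approx 7 \cdot 10^9$ operations for exact coalition enumeration, but only about $100 \cdot 64 \cdot 6^2 \approx 2 \cdot 10^5$ with TreeSHAP.)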

<!-- To explain an individual prediction with exact Shapley values, we have to estimate $E(\hat{f}(x)|x_S)$ for all possible feature value subsets S.-->
-TreeSHAP uses the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ to estimate effects.
+TreeSHAP uses the conditional expectation $E_{X_j|X_{-j}}(\hat{f}(x)|x_j)$ to estimate effects.
I will give you some intuition on how we can compute the expected prediction for a single tree, an instance x and feature subset S.
If we conditioned on all features -- if S was the set of all features -- then the prediction from the node in which the instance x falls would be the expected prediction.
If we did not condition the prediction on any feature -- if S was empty -- we would use the weighted average of the predictions of all terminal nodes.
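
(Aside, not part of the commit: the two boundary cases above suggest a short recursive sketch in R. The flat tree representation and field names such as `split_feature` and `cover` are invented for illustration; this shows only the conditional expectation TreeSHAP builds on, not its actual polynomial-time algorithm.)

```r
# Sketch of the intuition above: expected prediction of a single tree
# for instance x, conditioning only on the features in subset S.
# The tree representation is hypothetical, not shap's data structure.
expected_prediction <- function(tree, node, x, S) {
  # Leaf: return the leaf's prediction
  if (tree$is_leaf[node]) {
    return(tree$value[node])
  }
  j <- tree$split_feature[node]
  left  <- tree$left[node]
  right <- tree$right[node]
  if (j %in% S) {
    # Conditioned-on feature: follow the decision path of x
    branch <- if (x[[j]] <= tree$threshold[node]) left else right
    expected_prediction(tree, branch, x, S)
  } else {
    # "Absent" feature: average both subtrees, weighted by the
    # number of training samples (cover) that flow into each
    w_l <- tree$cover[left]
    w_r <- tree$cover[right]
    (w_l * expected_prediction(tree, left,  x, S) +
     w_r * expected_prediction(tree, right, x, S)) / (w_l + w_r)
  }
}
```

If S contains all features, the recursion simply follows x to its leaf; if S is empty, it returns the cover-weighted average over all terminal nodes, matching the two boundary cases described above.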
