From a73f8238a5b299724f284333a8851b3adc81d3a8 Mon Sep 17 00:00:00 2001
From: Christoph Molnar
Date: Mon, 28 Nov 2022 16:58:25 +0100
Subject: [PATCH] fixes #211 and #330

---
 manuscript/05.9b-agnostic-shap.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/manuscript/05.9b-agnostic-shap.Rmd b/manuscript/05.9b-agnostic-shap.Rmd
index 7f09e058..38b3bcfc 100644
--- a/manuscript/05.9b-agnostic-shap.Rmd
+++ b/manuscript/05.9b-agnostic-shap.Rmd
@@ -126,9 +126,9 @@ For tabular data, the following figure visualizes the mapping from coalitions to
 knitr::include_graphics("images/shap-simplified-features.jpg")
 ```
 
-$h_x$ for tabular data treats $X_C$ and $X_S$ as independent and integrates over the marginal distribution:
+$h_x$ for tabular data treats the feature of interest $X_j$ and the other features $X_{-j}$ as independent and integrates over the marginal distribution:
 
-$$\hat{f}(h_x(z'))=E_{X_C}[\hat{f}(x)]$$
+$$\hat{f}(h_x(z'))=E_{X_{-j}}[\hat{f}(x)]$$
 
 Sampling from the marginal distribution means ignoring the dependence structure between present and absent features.
 KernelSHAP therefore suffers from the same problem as all permutation-based interpretation methods.
@@ -198,7 +198,7 @@ If we add an L1 penalty to the loss L, we can create sparse explanations.
 Lundberg et al. (2018)[^treeshap] proposed TreeSHAP, a variant of SHAP for tree-based machine learning models such as decision trees, random forests and gradient boosted trees.
 TreeSHAP was introduced as a fast, model-specific alternative to KernelSHAP, but it turned out that it can produce unintuitive feature attributions.
 
-TreeSHAP defines the value function using the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ instead of the marginal expectation.
+TreeSHAP defines the value function using the conditional expectation $E_{X_{-j}|X_j}(\hat{f}(x)|x_j)$ instead of the marginal expectation.
 The problem with the conditional expectation is that features that have no influence on the prediction function f can get a TreeSHAP estimate different from zero, as shown by Sundararajan et al. (2019)[^cond1] and Janzing et al. (2019)[^cond2].
 The non-zero estimate can happen when the feature is correlated with another feature that actually has an influence on the prediction.
 
@@ -206,7 +206,7 @@ How much faster is TreeSHAP?
 Compared to exact KernelSHAP, it reduces the computational complexity from $O(TL2^M)$ to $O(TLD^2)$, where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree.
 
-TreeSHAP uses the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ to estimate effects.
+TreeSHAP uses the conditional expectation $E_{X_{-j}|X_j}(\hat{f}(x)|x_j)$ to estimate effects.
 I will give you some intuition on how we can compute the expected prediction for a single tree, an instance x and feature subset S.
 If we conditioned on all features -- if S were the set of all features -- then the prediction from the node in which the instance x falls would be the expected prediction.
 If we did not condition the prediction on any feature -- if S were empty -- we would use the weighted average of the predictions of all terminal nodes.
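
To make that recursion concrete, here is a minimal R sketch of the expected prediction of a single tree for a feature subset S, following the intuition in the last hunk. The hand-rolled tree structure and all names (`expected_prediction`, `n` as node coverage) are illustrative assumptions, not code from the book or this patch; exact TreeSHAP handles all subsets simultaneously and is considerably more involved.

```r
# Expected prediction of one tree, conditioning only on the features in S.
# Illustrative tree format: an internal node is
# list(feature, threshold, n, left, right); a leaf is list(prediction, n);
# n is the number of training samples that reach the node (its coverage).
expected_prediction <- function(node, x, S) {
  if (!is.null(node$prediction)) {
    return(node$prediction)  # terminal node: return its prediction
  }
  if (node$feature %in% S) {
    # Conditioned-on feature: follow the branch that instance x takes.
    if (x[[node$feature]] <= node$threshold) {
      expected_prediction(node$left, x, S)
    } else {
      expected_prediction(node$right, x, S)
    }
  } else {
    # Unconditioned feature: average both branches, weighted by coverage.
    w <- node$left$n / node$n
    w * expected_prediction(node$left, x, S) +
      (1 - w) * expected_prediction(node$right, x, S)
  }
}

# The two boundary cases described in the text:
tree <- list(feature = "x1", threshold = 0.5, n = 100,
             left  = list(prediction = 10, n = 60),
             right = list(feature = "x2", threshold = 3, n = 40,
                          left  = list(prediction = 20, n = 30),
                          right = list(prediction = 30, n = 10)))
x <- list(x1 = 0.7, x2 = 5)
expected_prediction(tree, x, S = c("x1", "x2"))  # leaf that x falls into: 30
expected_prediction(tree, x, S = character(0))   # coverage-weighted leaf average: 15
```

The key point is that a split on an unconditioned feature distributes weight to both children in proportion to the training samples flowing down each branch, which reduces to the weighted average over all terminal nodes when S is empty.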
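For contrast, the marginal expectation $E_{X_{-j}}[\hat{f}(x)]$ from the first hunk can be estimated by Monte Carlo: features in the coalition S keep the values of the instance x, while the absent features are drawn from background data, ignoring their dependence on the present ones. A minimal sketch under the same caveats (`value_function`, `f_hat`, and the background data `X` are assumed names):

```r
# Monte Carlo estimate of the marginal-expectation value function used
# by KernelSHAP: present features are fixed at x, absent features are
# sampled from their marginal distribution via background data rows.
value_function <- function(f_hat, X, x, S, n_samples = 100) {
  idx <- sample(nrow(X), n_samples, replace = TRUE)
  X_synth <- X[idx, , drop = FALSE]        # absent features: marginal draws
  X_synth[, S] <- x[rep(1, n_samples), S]  # present features: values of x
  mean(f_hat(X_synth))                     # expectation over absent features
}
```

With the per-feature notation introduced by this patch, S is the single feature $j$ and the sampled features are $X_{-j}$; sampling from the marginal rather than the conditional distribution is exactly why KernelSHAP can evaluate the model on unrealistic data points.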