From a73f8238a5b299724f284333a8851b3adc81d3a8 Mon Sep 17 00:00:00 2001
From: Christoph Molnar
Date: Mon, 28 Nov 2022 16:58:25 +0100
Subject: [PATCH] fixes #211 and #330

---
 manuscript/05.9b-agnostic-shap.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/manuscript/05.9b-agnostic-shap.Rmd b/manuscript/05.9b-agnostic-shap.Rmd
index 7f09e058..38b3bcfc 100644
--- a/manuscript/05.9b-agnostic-shap.Rmd
+++ b/manuscript/05.9b-agnostic-shap.Rmd
@@ -126,9 +126,9 @@ For tabular data, the following figure visualizes the mapping from coalitions to
 knitr::include_graphics("images/shap-simplified-features.jpg")
 ```
 
-$h_x$ for tabular data treats $X_C$ and $X_S$ as independent and integrates over the marginal distribution:
+$h_x$ for tabular data treats the feature of interest $X_j$ and the other features $X_{-j}$ as independent and integrates over the marginal distribution:
 
-$$\hat{f}(h_x(z'))=E_{X_C}[\hat{f}(x)]$$
+$$\hat{f}(h_x(z'))=E_{X_{-j}}[\hat{f}(x)]$$
 
 Sampling from the marginal distribution means ignoring the dependence structure between present and absent features.
 KernelSHAP therefore suffers from the same problem as all permutation-based interpretation methods.
@@ -198,7 +198,7 @@ If we add an L1 penalty to the loss L, we can create sparse explanations.
 Lundberg et al. (2018)[^treeshap] proposed TreeSHAP, a variant of SHAP for tree-based machine learning models such as decision trees, random forests and gradient boosted trees.
 TreeSHAP was introduced as a fast, model-specific alternative to KernelSHAP, but it turned out that it can produce unintuitive feature attributions.
 
-TreeSHAP defines the value function using the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ instead of the marginal expectation.
+TreeSHAP defines the value function using the conditional expectation $E_{X_{-j}|X_j}(\hat{f}(x)|x_j)$ instead of the marginal expectation.
 The problem with the conditional expectation is that features that have no influence on the prediction function f can get a TreeSHAP estimate different from zero, as shown by Sundararajan et al. (2019)[^cond1] and Janzing et al. (2019)[^cond2].
 The non-zero estimate can happen when the feature is correlated with another feature that actually has an influence on the prediction.
 
@@ -206,7 +206,7 @@ How much faster is TreeSHAP?
 Compared to exact KernelSHAP, it reduces the computational complexity from $O(TL2^M)$ to $O(TLD^2)$, where T is the number of trees, L is the maximum number of leaves in any tree and D the maximal depth of any tree.
 
-TreeSHAP uses the conditional expectation $E_{X_S|X_C}(\hat{f}(x)|x_S)$ to estimate effects.
+TreeSHAP uses the conditional expectation $E_{X_{-j}|X_j}(\hat{f}(x)|x_j)$ to estimate effects.
 I will give you some intuition on how we can compute the expected prediction for a single tree, an instance x and feature subset S.
 If we conditioned on all features -- if S were the set of all features -- then the prediction from the node in which the instance x falls would be the expected prediction.
 If we did not condition the prediction on any feature -- if S were empty -- we would use the weighted average of the predictions of all terminal nodes.
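
To make that recursion concrete, here is a minimal R sketch of the expected prediction of a single tree for a feature subset S, following the intuition in the last hunk. The hand-rolled tree structure and all names (`expected_prediction`, `n` as node coverage) are illustrative assumptions, not code from the book or this patch; exact TreeSHAP handles all subsets simultaneously and is considerably more involved.

```r
# Expected prediction of one tree, conditioning only on the features in S.
# Illustrative tree format: an internal node is
# list(feature, threshold, n, left, right); a leaf is list(prediction, n);
# n is the number of training samples that reach the node (its coverage).
expected_prediction <- function(node, x, S) {
  if (!is.null(node$prediction)) {
    return(node$prediction)  # terminal node: return its prediction
  }
  if (node$feature %in% S) {
    # Conditioned-on feature: follow the branch that instance x takes.
    if (x[[node$feature]] <= node$threshold) {
      expected_prediction(node$left, x, S)
    } else {
      expected_prediction(node$right, x, S)
    }
  } else {
    # Unconditioned feature: average both branches, weighted by coverage.
    w <- node$left$n / node$n
    w * expected_prediction(node$left, x, S) +
      (1 - w) * expected_prediction(node$right, x, S)
  }
}

# The two boundary cases described in the text:
tree <- list(feature = "x1", threshold = 0.5, n = 100,
             left  = list(prediction = 10, n = 60),
             right = list(feature = "x2", threshold = 3, n = 40,
                          left  = list(prediction = 20, n = 30),
                          right = list(prediction = 30, n = 10)))
x <- list(x1 = 0.7, x2 = 5)
expected_prediction(tree, x, S = c("x1", "x2"))  # leaf that x falls into: 30
expected_prediction(tree, x, S = character(0))   # coverage-weighted leaf average: 15
```

The key point is that a split on an unconditioned feature distributes weight to both children in proportion to the training samples flowing down each branch, which reduces to the weighted average over all terminal nodes when S is empty.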
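For contrast, the marginal expectation $E_{X_{-j}}[\hat{f}(x)]$ from the first hunk can be estimated by Monte Carlo: features in the coalition S keep the values of the instance x, while the absent features are drawn from background data, ignoring their dependence on the present ones. A minimal sketch under the same caveats (`value_function`, `f_hat`, and the background data `X` are assumed names):

```r
# Monte Carlo estimate of the marginal-expectation value function used
# by KernelSHAP: present features are fixed at x, absent features are
# sampled from their marginal distribution via background data rows.
value_function <- function(f_hat, X, x, S, n_samples = 100) {
  idx <- sample(nrow(X), n_samples, replace = TRUE)
  X_synth <- X[idx, , drop = FALSE]        # absent features: marginal draws
  X_synth[, S] <- x[rep(1, n_samples), S]  # present features: values of x
  mean(f_hat(X_synth))                     # expectation over absent features
}
```

With the per-feature notation introduced by this patch, S is the single feature $j$ and the sampled features are $X_{-j}$; sampling from the marginal rather than the conditional distribution is exactly why KernelSHAP can evaluate the model on unrealistic data points.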