Add SHAP calculation to GBT regression #2460
Conversation
Resolved review threads on outdated diffs:
cpp/daal/include/algorithms/decision_forest/decision_forest_classification_model_builder.h
cpp/daal/src/algorithms/dtrees/gbt/regression/gbt_regression_predict_dense_default_batch_impl.i (multiple threads)
Force-pushed from e2dc13e to cc21b23
Force-pushed from 88b770b to 00d8087
/intelci: run
@Alexsandruss please use this CI: http://intel-ci.intel.com/ee5dea85-3976-f168-8976-a4bf010d0e2e
@ahuber21 - please rebase/merge from master. Currently the PR shows changes that are already in master as a diff, which makes it hard to review.
add weights to GbtDecisionTree
Include TreeShap recursion steps
fix buffer overflow in memcpy
Add cover to GbtDecisionTree from model builder
fix some index offsets, correct results for trees up to depth=5
fix: nodeIsDummyLeaf is supposed to check left child
remove some debug statements
chore: apply oneDAL code style
predictContribution wrapper with template dispatching
increase speed by reducing number of cache misses
use thread-local result accessor
backup commit with 13% speedup wrt xgboost
add preShapContributions/predShapInteractions as function parameter
Revert "introduce pred_contribs and pred_interactions SHAP options" (this reverts commit 483aa5b)
remove some debug content
reset env_detect.cpp to origin/master
remove std::vector<float> test by introducing thread-local NumericTable
Move treeshap into separate translation unit - caution: treeShap undefined in libonedal builds but segfaults
Fix function arguments
respect predShapContributions and predShapInteractions options and check for legal combinations
tmp: work on pred_interactions
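One of the commits above mentions a "predictContribution wrapper with template dispatching". As a rough, hypothetical illustration (the names and signatures below are not taken from the oneDAL sources), such a wrapper maps the runtime SHAP options onto compile-time template parameters so the prediction kernel can be specialized:

```cpp
#include <cstddef>

// Hypothetical sketch only: specialize the prediction kernel at compile time
// so the SHAP branches cost nothing when the corresponding option is disabled.
template <bool hasContributions, bool hasInteractions>
static void predictImpl(const float * x, std::size_t nRows, float * result)
{
    for (std::size_t i = 0; i < nRows; ++i)
    {
        // ... regular tree traversal and prediction for row i ...
        if (hasContributions)
        {
            // ... accumulate per-feature SHAP contributions into result ...
        }
        if (hasInteractions)
        {
            // ... accumulate SHAP interaction values into result ...
        }
    }
    (void)x;
    (void)result;
}

// Runtime options are mapped onto the template parameters exactly once.
static void predictContribution(const float * x, std::size_t nRows, float * result,
                                bool predShapContributions, bool predShapInteractions)
{
    // The PR also adds a check that the requested combination of options is legal.
    if (predShapContributions)
        predictImpl<true, false>(x, nRows, result);
    else if (predShapInteractions)
        predictImpl<false, true>(x, nRows, result);
    else
        predictImpl<false, false>(x, nRows, result);
}
```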
Pushed updates after applying review comments. Only the usage of …
Resolved review thread on an outdated diff:
cpp/daal/include/algorithms/gradient_boosted_trees/gbt_regression_predict_types.h
Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>
Evaluation of CI failures:
LinuxDaal4py: expected to fail; we depend on changes from intel/scikit-learn-intelex#1399.
LinuxMakeDPCPP: happens in other pipelines as well; needs to be investigated separately.
As an initial implementation of treeSHAP in oneDAL, this code looks good.
LGTM.
The other changes requested by the reviewers can be done in subsequent PRs.
Add SHAP calculation to GBT regression (oneapi-src#2460)
This PR explores SHAP values (originally added in XGBoost here: dmlc/xgboost#2438). I have taken the same algorithm and modified it to work with oneDAL data structures. Most performance bottlenecks were removed, but there is still room for improvement, for instance a more efficient use of vector instructions in the unwoundPathSum() and unwindPath() helper functions.
I am creating this as a draft because I am aware of a few things that must be changed. Feel free to review regardless; all comments are helpful at this point.
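For readers unfamiliar with the algorithm, here is a minimal sketch of the path bookkeeping those helpers operate on, paraphrased from the publicly available TreeSHAP reference implementation introduced in dmlc/xgboost#2438; the names and exact arithmetic in the oneDAL port may differ:

```cpp
// Sketch of the TreeSHAP decision-path bookkeeping (paraphrased from the
// public reference implementation; not the oneDAL code itself).
struct PathElement
{
    int featureIndex;    // feature split on at this step of the decision path
    float zeroFraction;  // fraction of training cover flowing through this split
    float oneFraction;   // 1 if the observed row follows this branch, else 0
    float pweight;       // accumulated permutation weight for this path prefix
};

// Extend the decision path with one more split and update the permutation
// weights of all prefixes. unwindPath()/unwoundPathSum() do the inverse
// bookkeeping when a feature's effect has to be removed from the path again.
inline void extendPath(PathElement * uniquePath, unsigned uniqueDepth,
                       float zeroFraction, float oneFraction, int featureIndex)
{
    uniquePath[uniqueDepth].featureIndex = featureIndex;
    uniquePath[uniqueDepth].zeroFraction = zeroFraction;
    uniquePath[uniqueDepth].oneFraction  = oneFraction;
    uniquePath[uniqueDepth].pweight      = (uniqueDepth == 0) ? 1.0f : 0.0f;
    for (int i = int(uniqueDepth) - 1; i >= 0; --i)
    {
        uniquePath[i + 1].pweight += oneFraction * uniquePath[i].pweight * (i + 1) / float(uniqueDepth + 1);
        uniquePath[i].pweight = zeroFraction * uniquePath[i].pweight * (uniqueDepth - i) / float(uniqueDepth + 1);
    }
}
```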
Things to be addressed (incomplete list, likely to be extended):
- cover was added for regression, since it is needed to calculate SHAP values. I have added a dummy cover = 0 in some places to fix compilation errors, but that obviously needs improvement. Moving treeShap() to a common base class and enabling it for both classification and regression is probably the best solution.
- predShapContributions and predShapInteractions model parameters are available. The code in PredictRegressionTask::run() must be updated accordingly. Additionally, a fitting amount of result memory must be allocated in gbt_regression_predict_result_fpt.cpp (currently it is hardcoded to accommodate SHAP contributions); see the sketch after this list.
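For context on the result layout (this mirrors the XGBoost convention from dmlc/xgboost#2438 that the PR is based on, not a finalized oneDAL API): SHAP contributions add one bias column to the per-feature attributions, and the values in each row sum to the raw prediction, which is why the result needs nRows * (nFeatures + 1) entries:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int main()
{
    // Hypothetical numbers for a single observation with three features:
    // phi_1, phi_2, phi_3 and the trailing bias (expected value) column.
    const std::size_t nFeatures = 3;
    const std::vector<float> contribs = { 0.2f, -0.5f, 1.1f, 0.7f };
    assert(contribs.size() == nFeatures + 1);

    // Local accuracy: the contributions plus the bias reproduce the raw
    // (untransformed) model prediction for this row.
    float rawPrediction = 0.0f;
    for (const float c : contribs) rawPrediction += c;
    (void)rawPrediction;

    // A full contributions buffer therefore needs nRows * (nFeatures + 1) values.
    return 0;
}
```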