Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHAP calculation to GBT regression #2460

Merged

Conversation

ahuber21
Copy link
Contributor

@ahuber21 ahuber21 commented Aug 11, 2023

This PR explores SHAP values (originally added in XGBoost here: dmlc/xgboost#2438). I have taken the same algorithm and modified it to work with oneDAL datastructures. Most performance bottlenecks were removed. However, there is still room for improvement. (For instance a more efficient use of vector instructions in the unwoundPathSum() and unwindPath() helper functions.

I am creating this as a draft because I am aware of a few things that must be changed. Feel free to review regardless. All comments are helpful at this point.

Things to be addressed (incomplete list, likely to be extended)

  • Remove std entities (like std::vector in treeShap).
  • Align model builder classification and regression APIs. The property cover was added for regression, since it is needed to calculate SHAP values. I have added a dummy cover = 0 in some places to fix compilation errors, but that obviously needs improvement. Moving treeShap() to a common base class and enabling it for both classification and regression is probably the best solution.
  • Right now, treeShap execution is hardcoded, even though predShapContributions and predShapInteractions model parameters are available. The code in PredictRegressionTask::run() must be updated accordingly. Additionally, the fitting size of result memory must be allocated in gbt_regression_predict_result_fpt.cpp (also hardcoded to accommodate SHAP contributions).
  • It is only possible to calculate either SHAP contributions or interactions - this must be addressed in the constructor of the model parameters.
  • Validate changes don't affect performance / provide benchmarks
  • Rebase to current master to fix conflicts.

@ahuber21 ahuber21 force-pushed the dev/ahuber/shap-values-DecisionTreeTable branch from e2dc13e to cc21b23 Compare September 18, 2023 14:31
@ahuber21 ahuber21 force-pushed the dev/ahuber/shap-values-DecisionTreeTable branch from 88b770b to 00d8087 Compare September 27, 2023 16:25
@ahuber21 ahuber21 marked this pull request as ready for review September 28, 2023 10:12
@ahuber21
Copy link
Contributor Author

/intelci: run

@ahuber21
Copy link
Contributor Author

ahuber21 commented Sep 28, 2023

@Alexsandruss please use this CI http://intel-ci.intel.com/ee5dea85-3976-f168-8976-a4bf010d0e2e
It is running against the recommended changes in intel/scikit-learn-intelex#1399 which are required for SHAP to work

@napetrov
Copy link
Contributor

napetrov commented Oct 6, 2023

@ahuber21 - please rebase/merge from master - currently pr shows changes that already in master as a diff , hard to review

@ahuber21
Copy link
Contributor Author

add weights to GbtDecisionTree

Include TreeShap recursion steps

fix buffer overflow in memcpy

Add cover to GbtDecisionTree from model builder

fix some index offsets, correct results for trees up to depth=5

fix: nodeIsDummyLeaf is supposed to check left child

remove some debug statements

chore: apply oneDAL code style

predictContribution wrapper with template dispatching

increase speed by reducing number of cache misses

use thread-local result accessor

backup commit with 13% speedup wrt xgboost

add preShapContributions/predShapInteractions as function parameter

Revert "introduce pred_contribs and pred_interactions SHAP options"

This reverts commit 483aa5b.

remove some debug content

reset env_detect.cpp to origin/master

remove std::vector<float> test by introducing thread-local NumericTable

Move treeshap into separate translation unit - caution: treeShap undefined in libonedal

builds but segfaults

Fix function arguments

respect predShapContributions and predShapInteractions options and check for legal combinations

tmp: work on pred_interactions
@ahuber21
Copy link
Contributor Author

Pushed updates after applying review comments. Only the usage of TArray / service_memset and the dispatching logic is missing.

@ahuber21 ahuber21 requested a review from Vika-F October 20, 2023 12:36
@ahuber21
Copy link
Contributor Author

ahuber21 commented Oct 20, 2023

@ahuber21
Copy link
Contributor Author

ahuber21 commented Oct 25, 2023

@ahuber21
Copy link
Contributor Author

Evaluation of CI failures: LinuxDaal4py is expected to fail, we depend on changes from intel/scikit-learn-intelex#1399

LinuxMakeDPCPP: Happens in other pipelines as well. Needs to be investigated separately.

Copy link
Contributor

@Vika-F Vika-F left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an initial implementation of treeSHAP in oneDAL this code looks good.
LGTM
And the other changes requested by the reviewers can be done in the subsequent PRs.

@ahuber21 ahuber21 merged commit 25602a5 into oneapi-src:master Oct 27, 2023
10 of 13 checks passed
Vika-F added a commit to Vika-F/daal that referenced this pull request Oct 27, 2023
Add SHAP calculation to GBT regression (oneapi-src#2460)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants