Add SHAP calculation to GBT regression #2460
Conversation
Resolved review threads on outdated diffs:
cpp/daal/include/algorithms/decision_forest/decision_forest_classification_model_builder.h
cpp/daal/src/algorithms/dtrees/gbt/regression/gbt_regression_predict_dense_default_batch_impl.i (multiple threads)
Force-pushed from e2dc13e to cc21b23
Force-pushed from 88b770b to 00d8087
/intelci: run
@Alexsandruss please use this CI: http://intel-ci.intel.com/ee5dea85-3976-f168-8976-a4bf010d0e2e
@ahuber21 - please rebase/merge from master. Currently the PR shows changes that are already in master as a diff, which makes it hard to review.
add weights to GbtDecisionTree
Include TreeShap recursion steps
fix buffer overflow in memcpy
Add cover to GbtDecisionTree from model builder
fix some index offsets, correct results for trees up to depth=5
fix: nodeIsDummyLeaf is supposed to check left child
remove some debug statements
chore: apply oneDAL code style
predictContribution wrapper with template dispatching
increase speed by reducing number of cache misses
use thread-local result accessor
backup commit with 13% speedup wrt xgboost
add preShapContributions/predShapInteractions as function parameter
Revert "introduce pred_contribs and pred_interactions SHAP options" (this reverts commit 483aa5b)
remove some debug content
reset env_detect.cpp to origin/master
remove std::vector<float> test by introducing thread-local NumericTable
Move treeshap into separate translation unit - caution: treeShap undefined in libonedal builds but segfaults
Fix function arguments
respect predShapContributions and predShapInteractions options and check for legal combinations
tmp: work on pred_interactions
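One of the commits above mentions a "predictContribution wrapper with template dispatching". As a rough, hypothetical illustration (the names and signatures below are not taken from the oneDAL sources), such a wrapper maps the runtime SHAP options onto compile-time template parameters so the prediction kernel can be specialized:

```cpp
#include <cstddef>

// Hypothetical sketch only: specialize the prediction kernel at compile time
// so the SHAP branches cost nothing when the corresponding option is disabled.
template <bool hasContributions, bool hasInteractions>
static void predictImpl(const float * x, std::size_t nRows, float * result)
{
    for (std::size_t i = 0; i < nRows; ++i)
    {
        // ... regular tree traversal and prediction for row i ...
        if (hasContributions)
        {
            // ... accumulate per-feature SHAP contributions into result ...
        }
        if (hasInteractions)
        {
            // ... accumulate SHAP interaction values into result ...
        }
    }
    (void)x;
    (void)result;
}

// Runtime options are mapped onto the template parameters exactly once.
static void predictContribution(const float * x, std::size_t nRows, float * result,
                                bool predShapContributions, bool predShapInteractions)
{
    // The PR also adds a check that the requested combination of options is legal.
    if (predShapContributions)
        predictImpl<true, false>(x, nRows, result);
    else if (predShapInteractions)
        predictImpl<false, true>(x, nRows, result);
    else
        predictImpl<false, false>(x, nRows, result);
}
```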
Pushed updates after applying review comments. Only the usage of …
Resolved review thread on an outdated diff:
cpp/daal/include/algorithms/gradient_boosted_trees/gbt_regression_predict_types.h
Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>
Evaluation of CI failures:
LinuxDaal4py: expected to fail; we depend on changes from intel/scikit-learn-intelex#1399.
LinuxMakeDPCPP: happens in other pipelines as well; needs to be investigated separately.
As an initial implementation of treeSHAP in oneDAL, this code looks good.
LGTM.
The other changes requested by the reviewers can be done in subsequent PRs.
Add SHAP calculation to GBT regression (oneapi-src#2460)
This PR explores SHAP values (originally added in XGBoost here: dmlc/xgboost#2438). I have taken the same algorithm and modified it to work with oneDAL data structures. Most performance bottlenecks were removed, but there is still room for improvement, for instance a more efficient use of vector instructions in the unwoundPathSum() and unwindPath() helper functions.
I am creating this as a draft because I am aware of a few things that must be changed. Feel free to review regardless; all comments are helpful at this point.
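For readers unfamiliar with the algorithm, here is a minimal sketch of the path bookkeeping those helpers operate on, paraphrased from the publicly available TreeSHAP reference implementation introduced in dmlc/xgboost#2438; the names and exact arithmetic in the oneDAL port may differ:

```cpp
// Sketch of the TreeSHAP decision-path bookkeeping (paraphrased from the
// public reference implementation; not the oneDAL code itself).
struct PathElement
{
    int featureIndex;    // feature split on at this step of the decision path
    float zeroFraction;  // fraction of training cover flowing through this split
    float oneFraction;   // 1 if the observed row follows this branch, else 0
    float pweight;       // accumulated permutation weight for this path prefix
};

// Extend the decision path with one more split and update the permutation
// weights of all prefixes. unwindPath()/unwoundPathSum() do the inverse
// bookkeeping when a feature's effect has to be removed from the path again.
inline void extendPath(PathElement * uniquePath, unsigned uniqueDepth,
                       float zeroFraction, float oneFraction, int featureIndex)
{
    uniquePath[uniqueDepth].featureIndex = featureIndex;
    uniquePath[uniqueDepth].zeroFraction = zeroFraction;
    uniquePath[uniqueDepth].oneFraction  = oneFraction;
    uniquePath[uniqueDepth].pweight      = (uniqueDepth == 0) ? 1.0f : 0.0f;
    for (int i = int(uniqueDepth) - 1; i >= 0; --i)
    {
        uniquePath[i + 1].pweight += oneFraction * uniquePath[i].pweight * (i + 1) / float(uniqueDepth + 1);
        uniquePath[i].pweight = zeroFraction * uniquePath[i].pweight * (uniqueDepth - i) / float(uniqueDepth + 1);
    }
}
```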
Things to be addressed (incomplete list, likely to be extended):
- cover was added for regression, since it is needed to calculate SHAP values. I have added a dummy cover = 0 in some places to fix compilation errors, but that obviously needs improvement. Moving treeShap() to a common base class and enabling it for both classification and regression is probably the best solution.
- predShapContributions and predShapInteractions model parameters are available. The code in PredictRegressionTask::run() must be updated accordingly. Additionally, a fitting amount of result memory must be allocated in gbt_regression_predict_result_fpt.cpp (currently it is hardcoded to accommodate SHAP contributions); see the sketch after this list.
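For context on the result layout (this mirrors the XGBoost convention from dmlc/xgboost#2438 that the PR is based on, not a finalized oneDAL API): SHAP contributions add one bias column to the per-feature attributions, and the values in each row sum to the raw prediction, which is why the result needs nRows * (nFeatures + 1) entries:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int main()
{
    // Hypothetical numbers for a single observation with three features:
    // phi_1, phi_2, phi_3 and the trailing bias (expected value) column.
    const std::size_t nFeatures = 3;
    const std::vector<float> contribs = { 0.2f, -0.5f, 1.1f, 0.7f };
    assert(contribs.size() == nFeatures + 1);

    // Local accuracy: the contributions plus the bias reproduce the raw
    // (untransformed) model prediction for this row.
    float rawPrediction = 0.0f;
    for (const float c : contribs) rawPrediction += c;
    (void)rawPrediction;

    // A full contributions buffer therefore needs nRows * (nFeatures + 1) values.
    return 0;
}
```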