
Does adanet support GBDT as subnetwork? #121

Open
fangkuann opened this issue Sep 5, 2019 · 3 comments
Labels: question (Further information is requested)

No description provided.

cweill added the "question (Further information is requested)" label Sep 9, 2019

cweill (Contributor) commented Sep 9, 2019

@fangkuann: Yes it does! You can see how we use it here. Note that this is the old contrib version of GBDT; there is a new one called tf.estimator.BoostedTreesEstimator which should also work, though we haven't tried it.

Please give it a shot and let us know how it works for you.
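
For reference, a minimal sketch of constructing the newer estimator as a candidate (untested on our end; the feature columns, head, and hyperparameters below are placeholders, not values we have validated):

```python
import tensorflow as tf

# Illustrative placeholders; substitute your real feature columns and head.
feature_columns = [
    tf.feature_column.numeric_column("f%d" % i) for i in range(10)
]
head = tf.estimator.BinaryClassHead()  # any tf.estimator head for your task

tree_estimator = tf.estimator.BoostedTreesEstimator(
    feature_columns=feature_columns,
    n_batches_per_layer=100,  # batches consumed to build each tree layer
    head=head,
    n_trees=100,
    max_depth=6,
)
```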

fangkuann (Author) commented Sep 10, 2019

@cweill thanks for your answer!
In this paper https://ai.google/research/pubs/pub48133/, the authors combine a GBDT-based model and a DNN-based model to improve model performance.
I want to reproduce this in the product I work on using AdaNet, since I want to train both models with a single training tool (training the GBDT with LightGBM and the NN with TensorFlow makes the system complex and hard to maintain).

According to the tutorial https://medium.com/tensorflow/combining-multiple-tensorflow-hub-modules-into-one-ensemble-network-with-adanet-56fa73588bb0, using BoostedTreesEstimator here seems almost the same as using LinearEstimator: just pass tree_estimator in place of linear_estimator, as in the code below.

```python
import adanet

# tree_estimator, dnn_estimator, ranking_head, and run_config are defined
# elsewhere; the only change from the tutorial is swapping the tree
# candidate in for the linear one.
estimator = adanet.AutoEnsembleEstimator(
    head=ranking_head,
    candidate_pool=[
        tree_estimator,
        dnn_estimator,
    ],
    config=run_config,
    max_iteration_steps=50000,
)
```

However, our DNN-based model has already been trained and deployed using the TF-Ranking library, so the training input_fn produces each feature with shape [None, list_size, 1].

When running, the GBDT throws an exception while creating buckets for each feature:

ValueError: List argument 'bucket_boundaries' to 'boosted_trees_bucketize' Op with length 1 must match length 64 of argument 'float_values'.

where 64 is the list_size in our training config. It seems the GBDT expects shape [None, 1] for each feature.

I don't know whether it is still possible to combine TF-Ranking and AdaNet. If it is, what is the recommended way to do it?
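
For what it's worth, one workaround I have considered (purely a sketch; the helper name is mine, and it assumes treating each list item as an independent example is acceptable for the tree model):

```python
import tensorflow as tf

def flatten_listwise_features(features, labels):
    """Sketch: reshape listwise [batch, list_size, 1] tensors to
    [batch * list_size, 1], so each list item becomes its own example,
    the [None, 1] shape the boosted-trees bucketizer appears to expect."""
    flat_features = {
        name: tf.reshape(value, [-1, 1]) for name, value in features.items()
    }
    flat_labels = tf.reshape(labels, [-1])  # one label per list item
    return flat_features, flat_labels

# Would be applied inside the tree candidate's input pipeline, e.g.:
# dataset = dataset.map(flatten_listwise_features)
```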

cweill (Contributor) commented Sep 12, 2019

@fangkuann Thanks for sharing that paper; I wasn't aware of it and will share it with our team. As for the Core GBDT in AdaNet: your sample code looks fine. However, I have not used TF-Ranking.

One thing you could try is using the same TF-Ranking head for both the NN and the GBDT, as well as for the AutoEnsembleEstimator. What does dnn_estimator look like? Is it using ranking_head too?
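
A rough, untested sketch of that wiring (every feature column and hyperparameter below is a placeholder, and whether the ranking head is compatible with the boosted-trees internals is exactly the open question):

```python
import adanet
import tensorflow as tf
import tensorflow_ranking as tfr

# Placeholder inputs; replace with your real columns and config.
feature_columns = [
    tf.feature_column.numeric_column("f%d" % i) for i in range(10)
]
run_config = tf.estimator.RunConfig()

# One shared TF-Ranking head for both candidates and the ensemble.
ranking_head = tfr.head.create_ranking_head(
    loss_fn=tfr.losses.make_loss_fn(tfr.losses.RankingLossKey.SOFTMAX_LOSS),
    eval_metric_fns={
        "metric/ndcg@5": tfr.metrics.make_ranking_metric_fn(
            tfr.metrics.RankingMetricKey.NDCG, topn=5),
    },
    optimizer=tf.compat.v1.train.AdagradOptimizer(learning_rate=0.1),
)

dnn_estimator = tf.estimator.DNNEstimator(
    head=ranking_head,
    hidden_units=[64, 32],
    feature_columns=feature_columns,
)

tree_estimator = tf.estimator.BoostedTreesEstimator(
    feature_columns=feature_columns,
    n_batches_per_layer=100,
    head=ranking_head,
)

estimator = adanet.AutoEnsembleEstimator(
    head=ranking_head,
    candidate_pool=[tree_estimator, dnn_estimator],
    config=run_config,
    max_iteration_steps=50000,
)
```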

cweill self-assigned this Sep 12, 2019
cweill pushed a commit that referenced this issue Sep 19, 2019
…#1 #121

In estimator_distributed_test_runner.py.

PiperOrigin-RevId: 270085601