[QST] When to expect speedups for RF on CUDA? #6191
Comments
My specialty is more on the inference side of this question than on training, but let me see how much I can answer for you until the folks who wrote more of the training code are back in the office.
In general, RandomForest follows most other ML algorithms in terms of its GPU acceleration characteristics. The larger the dataset or the larger the model, the greater benefit that GPUs tend to offer. The exact cutoff is hardware dependent, but you can see some example benchmarks here. After the holidays, the folks primarily responsible for the training code can give you much more detailed answers, and if you have questions about inference in the meantime, I can answer that in as much detail as you like. Hope this at least gives you a start on what you need!
Happy New Year! Thank you for this detailed response. Looking forward to additional responses from the training team! Here are some follow-up questions after reading through the benchmarks you linked. No rush in answering if the training team is still out of office.
Is there any heuristic for how large max_depth can be set before the model no longer fits on a GPU? Additionally, I understand there is no loss of accuracy on these benchmarks at max_depth compared to sklearn, but I think a fairer comparison would be against an sklearn RF trained to purity (i.e. with max_depth unconstrained)?
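To make the max_depth question concrete, here is a rough back-of-the-envelope sketch of how the node count of a forest can grow with depth. It is only an upper bound from binary-tree arithmetic, not cuML's actual memory model; the function names and the `bytes_per_node` figure are hypothetical and chosen purely for illustration.

```python
def max_tree_nodes(max_depth: int, n_samples: int) -> int:
    """Upper bound on the node count of one binary decision tree."""
    # A complete binary tree of depth d (root at depth 0) has 2**(d+1) - 1 nodes.
    depth_bound = 2 ** (max_depth + 1) - 1
    # Every leaf holds at least one training sample, so there are at most
    # n_samples leaves, and a binary tree with L leaves has 2*L - 1 nodes.
    sample_bound = 2 * n_samples - 1
    return min(depth_bound, sample_bound)


def forest_node_bytes(n_trees: int, max_depth: int, n_samples: int,
                      bytes_per_node: int = 32) -> int:
    """Worst-case bytes for node storage (bytes_per_node is an assumption)."""
    return n_trees * max_tree_nodes(max_depth, n_samples) * bytes_per_node


if __name__ == "__main__":
    for depth in (8, 16, 24, 32):
        size = forest_node_bytes(n_trees=100, max_depth=depth, n_samples=1_000_000)
        print(f"max_depth={depth:2d}: up to {size / 1e6:.1f} MB of nodes")
```

The bound grows exponentially with depth until it is capped by the number of training samples, which is why a tree trained to purity on a large dataset can be far bigger than a depth-limited one.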
From what I understand, this reduces the search space of a split value from [...]. I.e., if one runs the cuML RF on a CPU with the binning and/or quantization strategy, is there still a speedup compared to sklearn? I'm asking because one could imagine a significant amount of the training speedup being due to this algorithmic change.
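As a small illustration of why binning shrinks the split search, the sketch below counts candidate thresholds for a single feature with and without quantile binning. It uses only the Python standard library and is not how cuML or sklearn implement split finding; the helper names and the choice of 128 bins are assumptions made for the example.

```python
import random
import statistics


def exact_candidates(values):
    """Exact search: midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binned_candidates(values, n_bins):
    """Binned search: at most n_bins - 1 quantile cut points."""
    return sorted(set(statistics.quantiles(values, n=n_bins)))


random.seed(0)
feature = [random.gauss(0.0, 1.0) for _ in range(10_000)]

exact = exact_candidates(feature)
binned = binned_candidates(feature, n_bins=128)
print(f"exact thresholds:  {len(exact)}")
print(f"binned thresholds: {len(binned)}")
```

With an exact search the number of thresholds grows with the number of distinct values, while the binned search is capped by the bin count regardless of dataset size. That reduction applies on a CPU as well, which is the crux of the question above.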
What is your question?
Hi, Thanks for the package!
I am browsing https://docs.rapids.ai/api/cuml/stable/api/#random-forest, and am wondering when one might expect speedups for the RF model over the model say in scikit-learn?
I am trying to understand where the GPU parallelization comes into play.
max_depth is constrained. Is this in part due to the GPU implementation? I am also trying to understand if there are known limitations and possible performance bottlenecks. Are there any docs or links to benchmark experiments that can help a user understand this better?