Dataloader single worker and default batch_size makes R tabnet 4-15x slower than pytorch tabnet #37
Comments
We still haven't done any profiling of the tabnet code to detect bottlenecks, so this is expected in general. It's likely that future improvements in the torch package will make it faster and comparable to Python's implementation.
I thought most of the time should be spent in C code, and that should be the same in R and Python. However, if the R implementation takes 1500 sec and the Python one 100 sec, then at most 100 sec is spent in C and 1400 sec somewhere else. I don't know where; Rprof is not giving me anything that lets me figure it out:
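The back-of-envelope split in the comment above can be written out explicitly (a trivial check using the 1500 sec / 100 sec figures quoted in the comment):

```python
r_total = 1500   # seconds, R tabnet run (figure from the comment)
py_total = 100   # seconds, Python tabnet run (figure from the comment)

# If both implementations spent their time in the *same* C code,
# the shared C portion can be at most the faster run's total:
shared_c_max = min(r_total, py_total)

# Everything beyond that in the R run must be spent elsewhere
# (R-level code, binding overhead, data loading, ...):
r_elsewhere_min = r_total - shared_c_max

print(shared_c_max)     # 100
print(r_elsewhere_min)  # 1400
```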
... and if that's the case, improvements in torch will decrease the 100 sec, but not the rest.
All of the Rprof output is showing torch code. Why do you state that improvements in torch will decrease the 100 sec and not the rest?
My "statement" was conditional on my belief that both the R and Python implementations would call the same C code ("if that's the case"). Maybe that's not the case.
I don't know exactly what you mean by the same C code. It's indeed different from xgboost, where there's a single C++ implementation and both the Python and R packages bind to that lib. In tabnet's case we have re-implemented the algorithm in R, using torch, which binds to libtorch, so any improvements in torch are likely to have a great impact on downstream libraries.
OK, thanks for clarifying.
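As an aside on the profiling exchange above: on the Python side, the standard library's cProfile gives the analogous breakdown, since `tottime` (time in a function's own body, excluding sub-calls) versus `cumtime` (including sub-calls) shows where time actually accumulates, and C-implemented builtins appear as their own rows. This is a generic sketch, not the profiling anyone in this thread ran; `inner` and `train_step` are invented stand-ins for a compute kernel and per-batch overhead.

```python
import cProfile
import io
import pstats

def inner():
    # stand-in for a compute-heavy kernel call
    return sum(i * i for i in range(10_000))

def train_step():
    # stand-in for per-batch framework overhead around the kernel
    return inner() + 1

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    train_step()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by `tottime` ranks functions by their own time, which is how one would tell "time in the shared kernels" apart from "time in wrapper code".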
Hello @szilard,
which brings a 40%+ time improvement (one epoch, my laptop, CPU).
Thanks @cregouby for looking into this and for the remarks. Even with the 40% improvement it would still be a 10x slowdown, which to my current understanding, based on the above, is happening somewhere in the
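The hypothesis running through this thread, that fixed per-batch overhead outside the shared libtorch kernels dominates at small batch sizes, can be sketched with a toy cost model. All constants below are invented for illustration, not measured from tabnet:

```python
def epoch_time(n_samples, batch_size, per_batch_overhead, per_sample_compute):
    """Toy model: one epoch = per-batch overhead plus per-sample compute."""
    n_batches = -(-n_samples // batch_size)  # ceiling division
    return n_batches * per_batch_overhead + n_samples * per_sample_compute

n = 100_000
compute = 1e-5    # seconds per sample (shared C/libtorch work, illustrative)
overhead = 5e-3   # seconds of per-batch wrapper-side overhead (illustrative)

small = epoch_time(n, 256, overhead, compute)
large = epoch_time(n, 16_384, overhead, compute)
print(f"batch_size=256:   {small:.3f}s")
print(f"batch_size=16384: {large:.3f}s")
```

The per-sample compute term is identical in both cases; only the number of batches, and hence the wrapper overhead, changes, which is why raising `batch_size` (or parallelizing the dataloader) narrows the gap without touching the kernels.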
To prepare for taking part in the R TabNet Online Meetup in Brussels, I set up my environments for both the R and Python TabNet. Sadly, the performance differences are as of today (2022-01-30) quite similar to the original post. I did several runs on my local machine, CPU and GPU (nothing fancy, a 4+ years old Dell laptop).

Python & GPU

Same code for Python as in the initial post, just split into 3 chunks in a Jupyter notebook and run chunk by chunk.
R & GPU

The timing was done as suggested by @cregouby for comparability. For R I first ran with my then current, fully refreshed setup for
The final clean run:
Hello @gsgxnet,

R implementation

Only one CPU thread is running for preprocessing.

Python implementation

At the same time, with Python, the CPU profile is using the 4 CPUs. I've tried to add the
Hi @cregouby, thanks for looking further into it. I am trying to figure out what's going on myself, but did not make any real progress. What I have:

PyTorch run on GPU, display of CPU core loads: nothing unexpected, Python is not multi-threading.

R: a similar upper limit of about 100% total, R being in general single-threaded. Not all of it though! The peak at the end is the evaluation and

Even just visually comparing the two graphs shows there is nearly an order of magnitude (at least 5x) more time spent running some tasks on the CPU in the R setup. @dfalbel, I will try to find out where in R most of the CPU time seems to be spent. In the past
Using
I get:

As far as I know, that
Thanks a lot @gsgxnet for this investigation, very helpful.
Yes, please do.
R code:
Python code:
m5.2xlarge (8 cores):
R:
Python:
Some of the parameter values have different defaults in the R and Python libs, but even so, the difference in runtime is too large. More details of my experiments here: szilard/GBM-perf#52