cifar10_train.py on AMD SoC GPU (Kalindi) is 4 times slower than its SoC CPU (Kabini) #239
Hi @enihcam, thanks for the report. I will do my best to help. Also, do you know whether the device you are using has physical local memory, or whether it uses global memory to emulate it?
Thank you @lukeiwanski. Sorry, what do you mean by "global memory to simulate it"? Since it is an integrated GPU, it uses system RAM (DDR3) shared with the CPU.
Hi @enihcam. More specifically, it seems likely to me that there will be some redundant copies on APU hardware, since the memory is shared between the CPU and GPU. For these reasons, I don't think you will get good performance on this hardware, even though (as is likely) there are still optimisations we could make to our TensorFlow work.
Are you using the latest opencl-amd?
Putting aside any specific low-end considerations for now (his GPU should manage just short of 150 GFLOPS, by the way), shouldn't you look into zero-copy if that is what's happening?
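For context on the "just short of 150 GFLOPS" figure: a GCN GPU's theoretical single-precision peak is commonly estimated as compute units × 64 lanes × 2 FLOPs per fused multiply-add × clock. The sketch below assumes the Kalindi iGPU has 2 compute units (128 shaders) and takes the clock as a parameter; the clock values used are illustrative assumptions, since the actual GPU clock depends on the specific Kabini part.

```python
def gcn_peak_gflops(compute_units: int, clock_mhz: float, lanes_per_cu: int = 64) -> float:
    """Theoretical FP32 peak of a GCN-style GPU, in GFLOPS.

    Assumes each lane retires one fused multiply-add (2 FLOPs) per clock.
    """
    flops_per_clock = compute_units * lanes_per_cu * 2
    return flops_per_clock * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    # Assumed: 2 compute units; the clocks are illustrative, not measured values.
    for mhz in (500, 600):
        print(f"{mhz} MHz -> {gcn_peak_gflops(2, mhz):.1f} GFLOPS")
```

This is a ceiling, not an expectation: a memory-bound workload on shared DDR3 will typically sit well below it.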
That's certainly a possibility, but I don't imagine that this is an interesting optimisation target for us right now. That said, I might be wrong: CodeXL might be able to provide traces showing whether excessive time is being spent copying buffers around.
Thank you @mirh @DuncanMcBain. Yes, I'm using the latest opencl-amd (version 18.10.572953). How do I enable zero-copy?
It would be more instructive to confirm that this is the issue first than to delve into the internals when this optimisation might, in fact, already be in effect. As I said, however, this hardware isn't currently an interesting target for us.
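One way to confirm whether copies dominate before digging into the internals is to collect a trace and measure how much busy time falls in transfer events. The sketch below assumes the trace is in the Chrome Trace Event Format (a top-level `traceEvents` list whose complete events have `ph == "X"`, a `name` and a `dur`), which is what TensorFlow's timeline module emits, rather than CodeXL's own format. The name-based copy filter is a hypothetical placeholder: actual event names depend on the backend and profiler.

```python
from typing import Callable


def copy_time_fraction(trace: dict, is_copy: Callable[[str], bool]) -> float:
    """Fraction of summed complete-event duration spent in events matching is_copy.

    Durations are summed across all threads/streams, so this is a ratio of busy
    time, not of wall-clock time.
    """
    total = 0.0
    copies = 0.0
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") != "X":  # only "complete" events carry a duration
            continue
        dur = float(ev.get("dur", 0))
        total += dur
        if is_copy(ev.get("name", "")):
            copies += dur
    return copies / total if total else 0.0


# Hypothetical filter: treat any event whose name mentions "memcpy" as a copy.
looks_like_copy = lambda name: "memcpy" in name.lower()
```

In practice the trace would be loaded with `json.load` from the file the profiler wrote; a fraction close to 1 would support the redundant-copy theory, while a small one would point elsewhere.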
Also, I would like to know how tensorflow-computecpp performs on Intel GPUs. Is it also slower than the CPU?
I don't believe it is, though I don't have any numbers to hand at the moment (I don't have that hardware and we don't test it internally, but I think we've done some ad-hoc tests).
@enihcam / @DuncanMcBain since the Neo driver was released, we have had Skylake-series SoCs available for tests and benchmarks, so there is nothing ad-hoc about this ;)
@lukeiwanski yes, I'm going to install it on Kaby Lake (i5-7200U) :D For the AMD SoC, I'm wondering: are there any flags that need to be turned on (or off) in the kernel config? I ask because all my Linux boxes use a customized kernel (config attached: config.txt).
As long as AMDGPU and AMDKFD are there, I don't think there's any other particular requirement for it to perform "properly". I'm not sure how much of ROCm or HSA Kabini supports; in any case, many features should already be exposed via OpenCL. EDIT: also, as a fun fact, fglrx used to support OpenCL 2.0 there once upon a time.
@mirh Aha! That explains why the Kabini GPU is slow. It does NOT support HSA (i.e. AMDKFD)!!
None of that is used here in the first place.
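To check a custom kernel `.config` for the drivers mentioned above, a small parser is enough. This is a sketch that assumes the usual Kconfig symbol names: `CONFIG_DRM_AMDGPU` for amdgpu and `CONFIG_HSA_AMD` for amdkfd. `CONFIG_DRM_AMDGPU_CIK` is included on the assumption that Kabini (a Sea Islands/CIK part) needs it for amdgpu on kernels of this era; otherwise the older radeon driver binds to it.

```python
import re

# Assumed relevant Kconfig symbols (amdgpu, amdgpu CIK support, amdkfd).
WANTED = ("CONFIG_DRM_AMDGPU", "CONFIG_DRM_AMDGPU_CIK", "CONFIG_HSA_AMD")

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> dict:
    """Map each CONFIG_ symbol to its value ('y', 'm', ...), or 'n' if explicitly unset."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            values[m.group(1)] = "n"
    return values


def report(text: str, wanted=WANTED) -> dict:
    """Return the setting of each wanted symbol, or 'absent' if it is not mentioned."""
    cfg = parse_kconfig(text)
    return {sym: cfg.get(sym, "absent") for sym in wanted}
```

Usage: `report(open("config.txt").read())`. A value of `"n"` or `"absent"` for any of these symbols would be worth fixing before blaming the OpenCL stack.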
System information
python ./models/tutorials/image/cifar10/cifar10_train.py
Describe the problem
For CPU-based TensorFlow, it was around 80 examples/sec.
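The tutorial reports throughput as examples/sec alongside sec/batch, and the two are related through the batch size. The sketch below assumes the tutorial's default batch size of 128 and shows what the title's "4 times slower" implies for the GPU; the GPU figure is derived from the title, not a measured value.

```python
def examples_per_sec(batch_size: int, steps: int, seconds: float) -> float:
    """Throughput: examples processed per wall-clock second."""
    return batch_size * steps / seconds


def sec_per_batch(batch_size: int, eps: float) -> float:
    """Time per training step implied by a given examples/sec figure."""
    return batch_size / eps


if __name__ == "__main__":
    BATCH = 128            # assumed default batch size of the CIFAR-10 tutorial
    cpu_eps = 80.0         # reported CPU figure
    gpu_eps = cpu_eps / 4  # implied by the "4 times slower" title, not measured
    print(f"CPU: {cpu_eps:.0f} examples/sec, {sec_per_batch(BATCH, cpu_eps):.2f} sec/batch")
    print(f"GPU: {gpu_eps:.0f} examples/sec, {sec_per_batch(BATCH, gpu_eps):.2f} sec/batch")
```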
Source code / logs
Build configuration:
https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=tensorflow-computecpp