This repository has been archived by the owner on Mar 30, 2022. It is now read-only.

Eager tensors always report being on CPU Device despite documentation #524

Open
garymm opened this issue Aug 26, 2020 · 5 comments

@garymm
Contributor

garymm commented Aug 26, 2020

I'm playing with https://www.tensorflow.org/swift/tutorials/introducing_x10. Both locally and on Colab, the eager tensor shows up on the CPU. The text says, "If you are running this notebook on a GPU-enabled instance, you should see that hardware reflected in the device description above."

Even if I try to force it to the GPU, it seems to stay on the CPU:

let eagerGPU = Device(kind: .GPU, ordinal: 0, backend: .TF_EAGER)
let eagerTensor1 = Tensor([0.0, 1.0, 2.0], on: eagerGPU)
let eagerTensor2 = Tensor([1.5, 2.5, 3.5], on: eagerGPU)
let eagerTensorSum = eagerTensor1 + eagerTensor2
eagerTensor1.device

Output:

▿ Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
  - kind : TensorFlow.Device.Kind.CPU
  - ordinal : 0
  - backend : TensorFlow.Device.Backend.TF_EAGER

So I'd say there may be two bugs here:

  1. Either the documentation is wrong and eager tensors are only supposed to run on the CPU, or the documentation is right and the code is buggy and doesn't use the GPU; and
  2. If the documentation is wrong, creating a tensor on an eager GPU device should fail rather than silently running on the CPU.
@BradLarson
Contributor

I believe this is due to a bug in the way that eager tensors report their device location. The eager tensors have their operations dispatched on the default accelerator, but always report themselves as being located on the CPU. If you run operations using them on your local machine, you can verify that they're running on the GPU by monitoring GPU activity via nvidia-smi or similar tools.

Likewise, eager tensors currently ignore the device you specify for them, so if you tell them to run on the CPU when there's a GPU available, they'll still run on the GPU.

X10 tensors accurately report which device they're attached to and respect manual device placement; only eager tensors have this problem.
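
For concreteness, here is a minimal sketch of the behavior being described, based on the snippet from the original report (the GPU result in the X10 comment assumes a GPU-enabled instance; it is illustrative, not verified output):

import TensorFlow

// Eager tensor: ops are dispatched on the default accelerator, but the
// reported device is currently always the CPU eager device.
let eagerGPU = Device(kind: .GPU, ordinal: 0, backend: .TF_EAGER)
let eagerTensor = Tensor([0.0, 1.0, 2.0], on: eagerGPU)
print(eagerTensor.device)  // Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)

// X10 tensor: reports its device accurately and respects manual placement.
let x10GPU = Device(kind: .GPU, ordinal: 0, backend: .XLA)
let x10Tensor = Tensor([0.0, 1.0, 2.0], on: x10GPU)
print(x10Tensor.device)    // Device(kind: .GPU, ordinal: 0, backend: .XLA) on a GPU instance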

@garymm garymm changed the title Impossible to do eager execution GPU Device despite documentation Eager tensors always report being on CPU Device despite documentation Aug 26, 2020
@texasmichelle
Member

texasmichelle commented Sep 4, 2020

It looks like this line is always returning the CPU device. I'll figure out how to surface the actual device being used.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Oct 18, 2020
This addition enables more efficient device handling in S4TF without needing to parse the full device string. As support for devices beyond TF eager is added, this info is needed more often and has a bigger impact on performance.

Partial fix for tensorflow/swift#524.

PiperOrigin-RevId: 337696655
Change-Id: Ifb576d37c765cced2329b77e0cebb591d8d3a46c
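
For context on "parse the full device string": the eager C API previously exposed placement only as a full device name such as /job:localhost/replica:0/task:0/device:GPU:0, which the Swift side would have to take apart itself. A rough sketch of what that parsing looks like (the helper name and exact format handling are illustrative assumptions, not code from this commit):

// Hypothetical helper showing what extracting kind and ordinal from a full
// TF device name involves, e.g.
// "/job:localhost/replica:0/task:0/device:GPU:0" -> ("GPU", 0).
func parseDeviceName(_ name: String) -> (kind: String, ordinal: Int)? {
  // The kind and ordinal live in the last path component, "device:GPU:0".
  guard let last = name.split(separator: "/").last else { return nil }
  let parts = last.split(separator: ":")
  guard parts.count == 3, parts[0] == "device",
        let ordinal = Int(parts[2]) else { return nil }
  return (kind: String(parts[1]), ordinal: ordinal)
}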
@texasmichelle
Member

Once swift-apis#1156 is merged, TFE_TensorHandleDeviceType and TFE_TensorHandleDeviceID will be available, making this a straightforward fix.
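
A minimal sketch of how those two calls could back the device property once they are importable (the CTensorFlow bridging, the handle type, and the kind mapping below are assumptions for illustration, not the actual patch):

import TensorFlow
import CTensorFlow

// Sketch: build a Device for an eager tensor from the new C API accessors.
// `handle` is assumed to be the tensor's underlying TFE_TensorHandle*.
func eagerDevice(of handle: OpaquePointer) -> Device {
  let status = TF_NewStatus()
  defer { TF_DeleteStatus(status) }

  // e.g. "GPU" and 0 for the first GPU.
  let deviceType = String(cString: TFE_TensorHandleDeviceType(handle, status))
  let ordinal = Int(TFE_TensorHandleDeviceID(handle, status))

  let kind: Device.Kind
  switch deviceType {
  case "GPU": kind = .GPU
  case "TPU": kind = .TPU
  default:    kind = .CPU
  }
  return Device(kind: kind, ordinal: ordinal, backend: .TF_EAGER)
}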

@texasmichelle
Member

Tentative changes here.

@texasmichelle
Member

I ran into a problem adding eager/c_api_experimental.h since it contains C++-only syntax in the initialization of the TFE_CustomDevice struct (a default member initializer, which C does not accept).

/home/michellecasbon/repos/out/libtensorflow-prefix/src/libtensorflow/tensorflow/c/eager/c_api_experimental.h:446:14: error: expected ';' at end of declaration list
  int version = TFE_CUSTOM_DEVICE_VERSION;
             ^

It's unclear how to get around this without pursuing custom import rules or modifying upstream.
