This repository has been archived by the owner on May 28, 2024. It is now read-only.
I know that your project description says one has to use the provided Docker image to build the custom op successfully.
I first tried to build the op with 'make'. That only works for the ZeroOut CPU op on my own machine with tf 1.15, but what I really want is to build a GPU op on my own machine with tf 1.15. I also tried 'make' inside your provided Docker image to build the TimeTwo GPU op, and it failed with an error about an unknown -fPIC flag.
So I removed the -fPIC flag. (Note that there are two -fPIC flags in the nvcc command; I had to remove both to make this error disappear.)
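nvcc generally rejects a bare GCC flag such as -fPIC, but accepts it when it is forwarded to the host compiler as -Xcompiler -fPIC. Deleting the flag silences the error, but the resulting object is later linked into a shared library, which normally needs position-independent code. A minimal sketch of a hypothetical helper (forward_host_flags is not part of the custom-op repo, and the set of host-only flags is an assumption) that rewrites bare host flags instead of dropping them:

```python
# Hypothetical helper: forward host-only flags to the host compiler through
# nvcc's -Xcompiler option instead of deleting them from the command.
HOST_ONLY_FLAGS = {"-fPIC"}  # assumption: extend with other GCC-only flags


def forward_host_flags(flags):
    """Return a copy of `flags` with bare host-only flags wrapped in -Xcompiler."""
    result = []
    i = 0
    while i < len(flags):
        flag = flags[i]
        if flag == "-Xcompiler" and i + 1 < len(flags):
            # Already forwarded: keep the pair exactly as it is.
            result.extend(flags[i:i + 2])
            i += 2
            continue
        if flag in HOST_ONLY_FLAGS:
            result.extend(["-Xcompiler", flag])
        else:
            result.append(flag)
        i += 1
    return result
```

Applied to a command containing both a bare -fPIC and an already forwarded one, only the bare copy is rewritten; passing -fPIC twice to the host compiler is harmless.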
I copied the command, removed the -fPIC flags, and ran it, which produced another error:
root@5f0d055f7746:/custom-op# nvcc -std=c++11 -c -o tensorflow_time_two/python/ops/_time_two_ops.cu.o tensorflow_time_two/cc/kernels/time_two_kernels.cu.cc -I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -std=c++11 -L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.2 -D GOOGLE_CUDA=1 -x cu -Xcompiler -DNDEBUG --expt-relaxed-constexpr
In file included from tensorflow_time_two/cc/kernels/time_two_kernels.cu.cc:21:0:
/usr/local/lib/python3.6/dist-packages/tensorflow/include/tensorflow/core/util/gpu_kernel_helper.h:22:53: fatal error: third_party/gpus/cuda/include/cuda_fp16.h: No such file or directory
compilation terminated.
It seems that gpu_kernel_helper.h includes the cuda_fp16.h header. I couldn't find this file in the Docker image, but I found copies on my host machine, so I copied one into the directory third_party/gpus/cuda/include (originally there was no gpus/cuda/include directory under third_party).
root@5f0d055f7746:/usr/local/lib/python3.6/dist-packages/tensorflow/include/third_party/gpus/cuda/include# ll
total 112
drwxr-sr-x 2 root staff 25 Aug 7 06:42 ./
drwxr-sr-x 3 root staff 21 Aug 7 06:42 ../
-rw-r--r-- 1 root staff 114479 Aug 7 06:42 cuda_fp16.h
Now running the nvcc command again:
root@5f0d055f7746:/custom-op# nvcc -std=c++11 -c -o tensorflow_time_two/python/ops/_time_two_ops.cu.o tensorflow_time_two/cc/kernels/time_two_kernels.cu.cc -I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -std=c++11 -L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.2 -D GOOGLE_CUDA=1 -x cu -Xcompiler -DNDEBUG --expt-relaxed-constexpr
In file included from /usr/local/lib/python3.6/dist-packages/tensorflow/include/tensorflow/core/util/gpu_kernel_helper.h:25:0,
from tensorflow_time_two/cc/kernels/time_two_kernels.cu.cc:21:
/usr/local/lib/python3.6/dist-packages/tensorflow/include/tensorflow/core/util/gpu_device_functions.h:34:53: fatal error: third_party/gpus/cuda/include/cuComplex.h: No such file or directory
compilation terminated.
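Both missing headers are requested under the same third_party/gpus/cuda/include prefix, which, as far as I know, TensorFlow's own Bazel CUDA configuration generates to point at the local CUDA toolkit headers. One possible workaround, sketched below under the assumption that the container's toolkit headers are in /usr/local/cuda/include, is to expose that directory under the expected prefix with a symlink in a separate shim directory, instead of copying headers from another machine whose CUDA version may differ. make_cuda_include_shim and the shim location are hypothetical names for illustration:

```python
import os
from pathlib import Path


def make_cuda_include_shim(shim_root, cuda_include="/usr/local/cuda/include"):
    """Create <shim_root>/third_party/gpus/cuda/include -> cuda_include.

    Returns the -I flag to add to the nvcc command. Both shim_root and the
    default CUDA include path are assumptions; adjust them to your setup.
    """
    link = Path(shim_root) / "third_party" / "gpus" / "cuda" / "include"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        # Reuse a correct existing symlink, but never clobber anything else
        # (for example a directory full of copied headers).
        if not (link.is_symlink() and os.readlink(link) == str(cuda_include)):
            raise FileExistsError(f"{link} already exists; remove it first")
    else:
        link.symlink_to(cuda_include, target_is_directory=True)
    return f"-I{shim_root}"
```

The returned flag would be appended to the nvcc command. Since the TensorFlow include directory comes first on the command line, any headers already copied into its third_party/gpus directory would need to be removed so that the includes resolve to the container's own toolkit.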
The cuComplex.h file is also in the same directory on my host machine, so, in case other headers are needed as well, I copied the whole directory's contents into the required directory in the Docker image.
Now I get at least 100 errors (too many to paste here, so I've included some of them for reference):
root@5f0d055f7746:/custom-op# nvcc -std=c++11 -c -o tensorflow_time_two/python/ops/_time_two_ops.cu.o tensorflow_time_two/cc/kernels/time_two_kernels.cu.cc -I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -std=c++11 -L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.2 -D GOOGLE_CUDA=1 -x cu -Xcompiler -DNDEBUG --expt-relaxed-constexpr
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.h(129): error: invalid redeclaration of type name "__half"
/usr/local/lib/python3.6/dist-packages/tensorflow/include/third_party/gpus/cuda/include/cuda_fp16.h(96): here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.h(140): error: invalid redeclaration of type name "__half2"
/usr/local/lib/python3.6/dist-packages/tensorflow/include/third_party/gpus/cuda/include/cuda_fp16.h(100): here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.h(155): error: cannot overload functions distinguished by return type alone
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.h(183): error: cannot overload functions distinguished by return type alone
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.h(198): error: cannot overload functions distinguished by return type alone
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.h(213): error: cannot overload functions distinguished by return type alone
...
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.hpp(166): error: class "__half" has no member "__x"
...
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.hpp(360): error: cannot overload functions distinguished by return type alone
...
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.hpp(552): error: no suitable user-defined conversion from "__half2" to "__half2" exists
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.hpp(2053): error: invalid redeclaration of type name "half"
/usr/local/lib/python3.6/dist-packages/tensorflow/include/third_party/gpus/cuda/include/cuda_fp16.h(103): here
/usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_fp16.hpp(2054): error: invalid redeclaration of type name "half2"
/usr/local/lib/python3.6/dist-packages/tensorflow/include/third_party/gpus/cuda/include/cuda_fp16.h(104): here
/usr/local/lib/python3.6/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/arch/Default/TypeCasting.h(25): error: no suitable user-defined conversion from "__half" to "Eigen::half" exists
/usr/local/lib/python3.6/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/arch/GPU/PacketMath.h(1005): error: no suitable user-defined conversion from "__half2" to "half2" exists
Error limit reached.
100 errors detected in the compilation of "/tmp/tmpxft_000000d7_00000000-6_time_two_kernels.cu.cpp1.ii".
Compilation terminated.
All the errors seem to involve the files I just copied, but I don't know what's wrong or how to fix it.
Then I switched to Bazel, and I finally built the TimeTwo GPU op successfully in the tensorflow/tensorflow:custom-op-gpu-ubuntu16 Docker image. However, I want to modify the Bazel build setup so that I can build a simple GPU custom op on Ubuntu 16.0 with tf==1.15. I'm not very familiar with Bazel and am still learning it. I don't understand why building a simple GPU op is so complicated. :(
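The log shows the same types (__half, __half2, half, half2) declared once by the copied cuda_fp16.h under the TensorFlow include tree and again by the toolkit's cuda_fp16.h under /usr/local/cuda. That pattern suggests, although the log alone does not prove it, that two different versions of the header end up in one translation unit, presumably because the copied files come from the host's CUDA version. A small hypothetical diagnostic (find_shadowed_headers is not an existing tool) lists header names that exist as more than one physical file across a set of include directories:

```python
from pathlib import Path


def find_shadowed_headers(include_dirs, names=("cuda_fp16.h", "cuComplex.h")):
    """Map each header name to all distinct copies found under include_dirs.

    Only names with more than one physical copy are returned. Paths are
    resolved, so a symlink back to the toolkit headers is not counted twice.
    """
    found = {}
    for root in include_dirs:
        root = Path(root)
        if not root.is_dir():
            continue  # silently skip include directories that do not exist
        for name in names:
            for path in root.rglob(name):
                found.setdefault(name, set()).add(path.resolve())
    return {name: sorted(paths) for name, paths in found.items() if len(paths) > 1}
```

Called with the TensorFlow include directory from the log and /usr/local/cuda/include (assumed to be the toolkit's header directory), it would be expected to report cuda_fp16.h twice while the copied headers are in place.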
In your Makefile, building the TimeTwo GPU op takes just a single nvcc command. Building with Bazel, on the other hand, needs several additional folders (e.g. 'gpu', 'tf', 'third_party'), which makes learning to build with Bazel on a different machine quite difficult. Is there a simple way to build the GPU op with nvcc?
I have the same problem: the cuda_fp16.h file is missing (I installed TensorFlow via pip). I also checked the TensorFlow master branch, and there is no such file in its third_party directory either.
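A GPU op built without Bazel generally takes two steps: nvcc compiles only the .cu.cc kernel file into an object, and the host compiler builds the remaining .cc files and links everything into one shared library. A minimal Python sketch of that sequence, assuming the TensorFlow flags come from tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags() (both available in TF 1.15) and that CUDA is installed in /usr/local/cuda; build_commands, as_shell, and the .cu.o naming are hypothetical:

```python
import shlex


def build_commands(tf_cflags, tf_lflags, cu_src, cc_srcs, out_so,
                   cuda_home="/usr/local/cuda", nvcc="nvcc", cxx="g++"):
    """Return (compile_cmd, link_cmd) argument lists for a GPU custom op.

    cuda_home is an assumed toolkit location; extra -I flags (for example
    for the third_party/gpus prefix) can be appended to tf_cflags.
    """
    cu_obj = out_so + ".cu.o"  # hypothetical name for the intermediate object
    # Step 1: nvcc compiles only the CUDA kernel file into an object file.
    # The host-only -fPIC flag is forwarded with -Xcompiler.
    compile_cmd = ([nvcc, "-std=c++11", "-O2", "-c", "-o", cu_obj, cu_src]
                   + list(tf_cflags)
                   + ["-D", "GOOGLE_CUDA=1", "-x", "cu",
                      "-Xcompiler", "-fPIC", "-DNDEBUG",
                      "--expt-relaxed-constexpr"])
    # Step 2: the host compiler builds the op/kernel .cc files and links them
    # with the CUDA object and the CUDA runtime into one shared library.
    link_cmd = ([cxx, "-std=c++11", "-O2", "-shared", "-fPIC", "-o", out_so]
                + list(cc_srcs) + [cu_obj]
                + list(tf_cflags)
                + ["-D", "GOOGLE_CUDA=1", "-I" + cuda_home + "/include"]
                + list(tf_lflags)
                + ["-L" + cuda_home + "/lib64", "-lcudart"])
    return compile_cmd, link_cmd


def as_shell(cmd):
    """Quote an argument list for display or copy-pasting into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

Each returned list could be run with subprocess.run(cmd, check=True). Because get_link_flags() reports the framework library of the installed TensorFlow, on TF 1.15 it should name libtensorflow_framework.so.1 rather than the .so.2 seen in the log above.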