[SYCL] Placeholder Accessor #89

Open
lukeiwanski opened this issue Jun 16, 2017 · 0 comments

This issue aims to give a concrete example and use case for the Placeholder Accessor proposal (https://github.com/codeplaysoftware/standards-proposals/blob/master/placeholder_accessors/index.md).

TensorFlow currently makes extensive use of functors that hold a pointer to a scalar (https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/kernels/constant_op_gpu.cu.cc#L29).
This pattern is used to reduce the memory footprint on devices with limited resources (for more complex models with multiple data batches, 4 GB of memory is not all that much).

Replacing the pointers with accessors would solve this issue. However, an accessor has to be created inside a queue submit scope, where a command group handler is available: section 3.4.6 of the SYCL 1.2 specification (https://www.khronos.org/registry/SYCL/specs/sycl-1.2.pdf) states that "Device accessors can only be constructed within command groups and provide access to the underlying data in a queue". This restriction is inflexible, and the TensorFlow use case exposes it: the queue submit scope is located inside the Eigen library, where the expression tree is reconstructed.
TensorFlow, however, allows the creation of custom functors that contain a device pointer, and such functors can be created outside the queue submit scope. Because an accessor cannot be created outside that scope, these functors cannot be ported to SYCL, and consequently none of the TensorFlow graph nodes that use them can be used either.
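To make the restriction concrete, the lifecycle that placeholder accessors aim to allow can be sketched with host-only mock types: the accessor is built from a buffer alone, stored in a functor outside any submit scope, and only becomes usable once a command group registers it. `MockBuffer`, `MockPlaceholderAccessor`, `MockHandler`, `mock_submit` and `FillValue` are hypothetical names, not SYCL or TensorFlow APIs. The `require` call is loosely modeled on the `handler::require` registration step that SYCL 1.2.1 adopted for placeholder accessors; this is a simplified illustration, not an implementation of it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

class MockHandler;  // forward declaration so it can be befriended below

// Hypothetical stand-in for a SYCL buffer: copies share the same storage.
template <typename T>
struct MockBuffer {
  std::shared_ptr<std::vector<T>> storage;
  explicit MockBuffer(std::vector<T> data)
      : storage(std::make_shared<std::vector<T>>(std::move(data))) {}
};

// Hypothetical placeholder-style accessor: constructible from a buffer alone,
// but only usable after a command group has registered it via require().
template <typename T>
class MockPlaceholderAccessor {
 public:
  explicit MockPlaceholderAccessor(MockBuffer<T> buf) : buf_(std::move(buf)) {}

  T& operator[](std::size_t i) const {
    if (!bound_) throw std::logic_error("placeholder accessor not registered");
    return (*buf_.storage)[i];
  }

 private:
  friend class MockHandler;
  MockBuffer<T> buf_;
  bool bound_ = false;
};

// Hypothetical command group handler; require() registers an accessor.
// Simplification: the mock never unregisters the accessor afterwards.
class MockHandler {
 public:
  template <typename T>
  void require(MockPlaceholderAccessor<T>& acc) { acc.bound_ = true; }
};

// Hypothetical submit: runs the command group function with a fresh handler.
template <typename CommandGroup>
void mock_submit(CommandGroup cgf) {
  MockHandler cgh;
  cgf(cgh);
}

// Functor that can be created outside any submit scope because it holds
// a placeholder accessor instead of a raw pointer.
template <typename Scalar>
struct FillValue {
  MockPlaceholderAccessor<Scalar> val;
  Scalar operator()() const { return val[0]; }
};
```

A `FillValue` can be constructed anywhere, but calling it before `cgh.require(f.val)` throws; inside `mock_submit`, after registration, it returns the stored scalar.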

Since the SYCL specification currently lacks this feature, a workaround had to be developed (https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/kernels/constant_op.cc#L186).

As an example, consider a TensorFlow functor that assigns a scalar value to each element of a tensor: with the workaround, the scalar input has to match the dimensionality of the output tensor in order to perform component-wise assignment. The drawback is the allocation of another tensor that matches the size of the output tensor, which doubles the size of each such tensor used in the graph.
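The memory cost of this workaround can be sketched on the host, with `std::vector` standing in for tensors. `fill_via_broadcast` is a hypothetical name and not TensorFlow's actual workaround code; it only illustrates that the expanded temporary is as large as the output itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Workaround sketch: the scalar is first expanded into a tensor with the same
// number of elements as `out`, then copied component-wise.
// Returns the number of bytes occupied by the temporary tensor.
template <typename Scalar>
std::size_t fill_via_broadcast(std::vector<Scalar>& out, Scalar value) {
  std::vector<Scalar> expanded(out.size(), value);  // extra tensor, same size as out
  std::copy(expanded.begin(), expanded.end(), out.begin());  // component-wise assignment
  return expanded.size() * sizeof(Scalar);
}
```

For a 256-element `double` output the temporary occupies `256 * sizeof(double)` bytes, the same as the output, whereas a functor that reads a single scalar needs only `sizeof(double)` for its input.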

This issue can be easily fixed with the Placeholder Accessor specification change.

All kernels that follow the pattern below:

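```cpp
// Primary template, needed for the partial specialisation below to compile.
template <typename Device, typename T>
struct bar;

// Functor that stores only a device pointer to the scalar value.
template <typename Scalar>
struct foo {
  const Scalar* val;

  foo(const Scalar* v) : val(v) {}

  // Every element of the output receives the value pointed to by val.
  Scalar operator()() const { return *val; }
};

// Partial specialisation for GPUDevice (CUDA)
template <typename T>
struct bar<GPUDevice, T> {
  void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
                  typename TTypes<T>::ConstScalar in) {
    // The functor is created here, outside any queue submit scope.
    foo<T> f(in.data());
    out.device(d) = out.nullaryExpr(f);
  }
};
```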
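The same pattern can be tried on the host without a GPU. In this sketch `nullary_fill` is a hypothetical stand-in for Eigen's `out.nullaryExpr(f)` that simply calls the functor once per output element; the functor keeps only a pointer, so the scalar is read at evaluation time and no tensor the size of the output is allocated.

```cpp
#include <cassert>
#include <vector>

// Same shape as the foo functor above: it stores only a pointer to the scalar.
template <typename Scalar>
struct foo {
  const Scalar* val;
  explicit foo(const Scalar* v) : val(v) {}
  Scalar operator()() const { return *val; }
};

// Hypothetical stand-in for out.nullaryExpr(f): invoke the nullary functor
// once for every element of the output.
template <typename Scalar, typename NullaryOp>
void nullary_fill(std::vector<Scalar>& out, const NullaryOp& op) {
  for (Scalar& x : out) x = op();
}
```

Because the functor dereferences the pointer on every call, changing the scalar and filling again picks up the new value.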

can be written for SYCL like this:

```cpp
// Functor that stores a placeholder accessor instead of a raw pointer.
template <typename Scalar, typename Accessor>
struct foo {
  Accessor val;

  foo(Accessor v) : val(v) {}

  Scalar operator()() const {
    // The accessor is over uint8_t, so the underlying pointer has to be
    // cast back to the actual Scalar type before reading.
    auto ptr = ConvertToActualTypeSycl(Scalar, val);
    return *ptr;
  }
};

// Partial specialisation for SYCLDevice
template <typename Scalar>
struct FillFunctor<SYCLDevice, Scalar> {
  void operator()(const SYCLDevice& d, typename TTypes<Scalar>::Flat out,
                  typename TTypes<Scalar>::ConstScalar in) {
    // Get the underlying buffer from the virtual pointer.
    auto buffer = d.get_sycl_buffer(in.data());

    // Wrap it in a placeholder accessor over uint8_t, the equivalent of a
    // void type, so the accessor type does not depend on Scalar.
    typedef cl::sycl::accessor<uint8_t, 1, cl::sycl::access::mode::read_write,
                               cl::sycl::access::target::global_buffer,
                               cl::sycl::codeplay::access::placeholder::true_t>
        Accessor;

    // Brace-initialise: `foo<Scalar, Accessor> f(Accessor(buffer));` would be
    // parsed as a function declaration (the "most vexing parse").
    foo<Scalar, Accessor> f{Accessor(buffer)};
    out.device(d) = out.nullaryExpr(f);
  }
};
```
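The type-erasure idea in the SYCL version, where the accessor is over `uint8_t` and the element type is recovered inside the functor, can also be sketched on the host. `ByteView`, `read_as`, `foo_bytes` and `to_bytes` are hypothetical names; `read_as` plays the role of `ConvertToActualTypeSycl` but uses `std::memcpy` instead of a pointer cast, which avoids strict-aliasing and alignment concerns in host code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a uint8_t accessor: it only exposes raw bytes,
// not the element type that was stored in the buffer.
struct ByteView {
  const std::vector<std::uint8_t>* bytes;
  const std::uint8_t* get_pointer() const { return bytes->data(); }
};

// Hypothetical analogue of ConvertToActualTypeSycl: recover a Scalar from
// the byte storage by copying its bytes into a properly typed object.
template <typename Scalar, typename Accessor>
Scalar read_as(const Accessor& acc) {
  Scalar value;
  std::memcpy(&value, acc.get_pointer(), sizeof(Scalar));
  return value;
}

// Mirrors foo<Scalar, Accessor>: the stored accessor is byte-level, and the
// element type is supplied separately as a template parameter.
template <typename Scalar, typename Accessor>
struct foo_bytes {
  Accessor val;
  Scalar operator()() const { return read_as<Scalar>(val); }
};

// Helper that simulates a buffer by serialising a typed value into bytes.
template <typename Scalar>
std::vector<std::uint8_t> to_bytes(Scalar value) {
  std::vector<std::uint8_t> bytes(sizeof(Scalar));
  std::memcpy(bytes.data(), &value, sizeof(Scalar));
  return bytes;
}
```

The same `ByteView` type serves both a `float` and an `int32_t` functor, which is the point of keeping the accessor type independent of `Scalar`.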

With minor modifications to Eigen, this can be simplified and generalized further.
