ETI System and file structure

Reference

This wiki page is based on the following issue: https://github.com/kokkos/kokkos-kernels/issues/31.

KokkosKernels ETI Requirements

- Pre-compile functions for a set of types and prevent those types from being implicitly instantiated in user code (ETI)
- Even with ETI on, allow other input types (for example extended precision or nonstandard data layouts)
- Call TPLs (MKL, cuBLAS, etc.) for input types that allow it
- Disallow anything other than ETI types if requested (ETI-only)
- Check which type of instantiation gets hit in apps (ETI, non-ETI, TPL)

To meet these requirements, the design has three functionality layers (details below):

- User Interface: void foo(ViewType a, Scalar alpha): accepts views of all kinds and combinations, and calls the specialization layer
- Specialization Layer: struct Foo { static void foo(ViewInternalType a, Scalar alpha); }; ensures that only the minimal necessary number of instantiations exists, and serves as both the ETI specialization layer and the TPL specialization layer
- Implementation Layer: called by the specialization layer; contains the actual functors etc.
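The call chain through the three layers can be sketched in a few lines. This is a minimal, self-contained illustration, not KokkosKernels code. `View` is a hypothetical stand-in for `Kokkos::View` (a non-owning handle to a `std::vector`). The kernel is a plain loop instead of a `Kokkos::parallel_for`. The TPL/ETI bool parameters of the real specialization layer are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-1 Kokkos::View: it refers to (but does not
// own) a std::vector, so copies and const references still allow writes.
template <class T>
struct View {
  using value_type = T;
  std::vector<T>* storage;
  std::size_t extent(int) const { return storage->size(); }
  T& operator()(std::size_t i) const { return (*storage)[i]; }
};

namespace Impl {
// Implementation layer: the actual kernel (a plain loop instead of a functor).
template <class ViewType>
void execute_foo(const ViewType& a) {
  for (std::size_t i = 0; i < a.extent(0); ++i)
    a(i) = static_cast<typename ViewType::value_type>(i);
}

// Specialization layer: a thin struct that only forwards to the implementation.
// The real layer also carries the TPL/ETI bool parameters, omitted here.
template <class ViewType>
struct Foo {
  static void foo(const ViewType& a) { execute_foo(a); }
};
}  // namespace Impl

// User interface: accepts any view type and calls the specialization layer.
template <class ViewType>
void foo(const ViewType& a) {
  Impl::Foo<ViewType>::foo(a);
}
```

Calling `foo(v)` on a `View<double>` over four elements fills them with 0, 1, 2, 3.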

File Structure Overview

src/blas:
  KokkosBlas.hpp: includes all the KokkosBlas function header files
  KokkosBlas1_foo.hpp: contains the user interface functions for foo
src/blas/impl:
  KokkosBlas1_foo_impl.hpp: the actual implementation of the functions (functors etc.)
  KokkosBlas1_foo_spec.hpp: the specialization layer
src/impl/tpls:
  KokkosBlas1_foo_tpl_spec_avail.hpp: availability of TPLs for particular types
  KokkosBlas1_foo_tpl_spec_decl.hpp: the specialization declarations for calling TPLs
src/impl/generated_specializations_hpp:
  KokkosBlas1_foo_eti_spec_avail.hpp: availability declarations for ETI types; generated during configure from a template .in file
  KokkosBlas1_foo_eti_spec_decl.hpp: specialization declarations for ETI types; generated during configure from a template .in file
src/impl/generated_specializations_cpp/foo:
  KokkosBlas1_foo_eti_spec_inst.cpp.in: a template file used to create one source file for each enabled extern template instantiation

Updating existing code

Add a new function
- Add all of the files listed above, based on the provided templates
Modify the implementation of a function
- Only src/{blas,sparse,graph,...}/impl/KokkosBlas1_foo_impl.hpp needs to be modified
Add a new ETI type
- Modify src/CMakeLists.txt to generate the required header and source files from the ETI templates. For example:

KOKKOSKERNELS_GENERATE_ETI(Blas1_dot_mv dot
  HEADER_LIST HEADERS
  SOURCE_LIST SOURCES
  TYPE_LISTS FLOATS LAYOUTS DEVICES
)

The first argument is the full kernel name, which results in generated files such as KokkosBlas1_dot_mv_eti_spec_avail.hpp and KokkosBlas1_dot_mv_eti_spec_decl.hpp. The second argument is the name of the subfolder (in src and src/impl) where the template files live. The KOKKOSKERNELS_GENERATE_ETI function takes the TYPE_LISTS keyword argument, which defines the template arguments for which ETI versions are generated. FLOATS and LAYOUTS each contribute one template argument, and DEVICES contributes two: an execution space and a corresponding valid memory space. The dot function therefore expects 4 ETI parameters (scalar, layout, execution space, memory space). Other functions may require more. This number must match the number of arguments of macros such as KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL (see below).
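Conceptually, the generator forms the cartesian product of the type lists and emits one macro invocation per combination. That is why the argument count of the macros has to equal the combined width of the lists. The C++ function below only models that idea. It is a hypothetical illustration, not the CMake implementation, and the type names passed to it are examples.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of what the ETI generator does conceptually; the real
// generator is CMake code (KOKKOSKERNELS_GENERATE_ETI), not this function.
// FLOATS and LAYOUTS contribute one macro argument each, DEVICES contributes
// two (execution space, memory space): 4 arguments per generated line.
std::vector<std::string> generate_eti_lines(
    const std::string& macro, const std::vector<std::string>& floats,
    const std::vector<std::string>& layouts,
    const std::vector<std::pair<std::string, std::string>>& devices) {
  std::vector<std::string> lines;
  for (const auto& scalar : floats)
    for (const auto& layout : layouts)
      for (const auto& [exec_space, mem_space] : devices)
        lines.push_back(macro + "(" + scalar + ", " + layout + ", " +
                        exec_space + ", " + mem_space + ")");
  return lines;
}
```

With two scalars, two layouts and one device, this yields 2 × 2 × 1 = 4 lines. Each line has exactly four arguments.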

Add a new TPL variant
- Modify the files in src/impl/tpls/ to add the new TPL (declare its availability and provide the implementation of how to call it)

File Structure Details

Public API in `src/blas/KokkosBlas1_foo.hpp`

This file provides the public API for the function foo. The function internally calls the specialization layer after explicitly filling in all the necessary template arguments for the ViewTypes etc. For example, for a dot(a,b) product, const should be added to the scalar type of both views if it is not already there. Otherwise the same code might be compiled up to four times:

dot(View<double*>, View<double*>);
dot(View<double*>, View<const double*>);
dot(View<const double*>, View<double*>);
dot(View<const double*>, View<const double*>);

If you then factor in explicit versus implicit specification of the layout, memory space, and memory traits, we end up with over 100 possible instantiations for something that is technically the exact same thing!
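Adding const in the public layer can be sketched as follows. `View` is a hypothetical, minimal stand-in for a rank-1 `Kokkos::View`: a pointer plus a length, with a `View<T>` to `View<const T>` conversion similar to what Kokkos allows. `ConstView` and the loop body of `Impl::Dot` are illustrative, not the library's code. All four argument combinations listed above end up in the single instantiation `Impl::Dot<View<const double>, View<const double>>`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a rank-1 Kokkos::View (pointer + length).
template <class T>
struct View {
  using value_type = T;
  T* ptr = nullptr;
  std::size_t n = 0;
  View(T* p, std::size_t len) : ptr(p), n(len) {}
  // Like Kokkos, allow View<double> to convert to View<const double>.
  template <class U, class = std::enable_if_t<std::is_convertible_v<U*, T*>>>
  View(const View<U>& other) : ptr(other.ptr), n(other.n) {}
};

namespace Impl {
// Specialization layer: each distinct <XV, YV> pair is one instantiation.
template <class XV, class YV>
struct Dot {
  static double dot(const XV& x, const YV& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.n; ++i) sum += x.ptr[i] * y.ptr[i];
    return sum;
  }
};
}  // namespace Impl

// Adds const to the value type if it is not already there (hypothetical helper).
template <class V>
using ConstView = View<const std::remove_const_t<typename V::value_type>>;

// Public API: every const/non-const combination is funneled into
// Impl::Dot<View<const double>, View<const double>>.
template <class XV, class YV>
double dot(const XV& x, const YV& y) {
  return Impl::Dot<ConstView<XV>, ConstView<YV>>::dot(x, y);
}
```

For example, `dot` of {1, 2, 3} and {4, 5, 6} is 32, whichever argument is declared const.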

Furthermore, this function should perform static asserts on things that are not allowed (for example, the wrong rank of a view), so that users get an early compile-time error in a function they can directly associate with the code they wrote.

Here is an example for a function foo:

// Include the specialization layer, which defines the Impl::Foo struct
#include<impl/KokkosBlas1_foo_spec.hpp>

namespace KokkosBlas1 {
// User facing function accepts any ViewType
template<class ViewType>
void foo(const ViewType& a) {

  // Static assert on prohibited types
  static_assert(ViewType::rank==1, "Trying to call foo with View of rank other than 1");

  // Convert ViewType to an internal ViewType to reduce instantiations.
  // Without this, explicitly specifying a Layout or not would result in
  // two different instantiations, since Views have variadic template parameters.
  // Furthermore, this is the place to add missing const etc.
  typedef Kokkos::View<typename ViewType::data_type,
                       typename ViewType::array_layout,
                       typename ViewType::device_type>
          ViewTypeInternal;

  // Call the specialization layer, which dispatches to the implementation
  Impl::Foo<ViewTypeInternal>::foo(a);
}
}
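The effect of the `ViewTypeInternal` typedef can be checked at compile time with mock types. In this sketch `View`, `LayoutLeft`, `LayoutRight` and `HostDevice` are hypothetical stand-ins. The real `Kokkos::View` derives its defaults from its traits; for example, the default layout depends on the execution space. Here that is simplified to a fixed `LayoutRight` and a single device.

```cpp
#include <type_traits>

// Hypothetical stand-ins for Kokkos layout and device tags.
struct LayoutLeft {};
struct LayoutRight {};
struct HostDevice {};

// Hypothetical stand-in for Kokkos::View with variadic properties:
// View<T>, View<T, Layout> and View<T, Layout, Device> are distinct types
// even when they describe the same data.
template <class DataType, class... Props>
struct View;

template <class DataType>
struct View<DataType> {
  using data_type = DataType;
  using array_layout = LayoutRight;  // default layout (assumed for this sketch)
  using device_type = HostDevice;    // default device (assumed for this sketch)
};

template <class DataType, class Layout>
struct View<DataType, Layout> {
  using data_type = DataType;
  using array_layout = Layout;
  using device_type = HostDevice;
};

template <class DataType, class Layout, class Device>
struct View<DataType, Layout, Device> {
  using data_type = DataType;
  using array_layout = Layout;
  using device_type = Device;
};

// Same normalization as the ViewTypeInternal typedef in foo(): always spell
// out all three template arguments so equivalent views collapse to one type.
template <class ViewType>
using ViewTypeInternal = View<typename ViewType::data_type,
                              typename ViewType::array_layout,
                              typename ViewType::device_type>;
```

`View<double*>` and `View<double*, LayoutRight>` are different types. They map to the same internal type, so the specialization layer is instantiated only once.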

The Specialization Layer

This layer not only serves as the focal point for the unified instantiations that the public layer requires; it is also the layer that allows specialization for third-party libraries (such as MKL and cuBLAS) and explicit template instantiation (ETI).

Like the public API, this layer is generally very thin and mostly just passes arguments through.

The basic mechanism for ETI is the extern template mechanism of C++11. Unfortunately, it has somewhat surprising semantics with respect to classes. Member functions defined inside a class body are implicitly inline, and an explicit instantiation declaration (extern template) does not suppress the implicit instantiation of inline functions. The compiler can therefore still choose to inline the implementation of the class if it is visible in the same translation unit, instead of calling the externally available instantiation. The exact behavior may also be compiler dependent.
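The two halves of the extern template mechanism look like this in isolation. This is a generic C++ sketch with a hypothetical `Foo<T>`. In KokkosKernels the explicit instantiation declaration comes from the generated `*_eti_spec_decl.hpp` header, and the explicit instantiation definition comes from a generated `*_eti_spec_inst.cpp` file. Here both are placed in one file so that it compiles on its own, with the declaration first, as the standard requires.

```cpp
#include <cassert>

// Specialization-layer style class template. twice() is defined inside the
// class body and is therefore implicitly inline.
template <class T>
struct Foo {
  static T twice(T x) { return x + x; }
};

// Explicit instantiation declaration (what an ETI decl header provides):
// Foo<double> is instantiated elsewhere. Because twice() is inline, the
// compiler may still instantiate and inline it here; it is only not required
// to emit an out-of-line copy in this translation unit.
extern template struct Foo<double>;

// Explicit instantiation definition (what an ETI inst .cpp provides). It
// normally lives in a separate library source file.
template struct Foo<double>;
```

In a real build, user translation units only see the `extern template` line, and the library object file supplies the instantiated code.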

To enable both TPL and ETI specialization, two additional bool template parameters are added to the specialization layer. Their default values reflect whether the respective specializations are available:

From src/blas/impl/KokkosBlas1_foo_spec.hpp:

template<class ViewType>
struct foo_eti_spec_avail {
  enum : bool { value = false };
};

template<class ViewType, bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
                         bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(const ViewType& a);
};

To declare a specialization available, a full specialization of foo_tpl_spec_avail or foo_eti_spec_avail must be provided. These traits live in src/impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp and src/impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_avail.hpp respectively, the latter being auto-generated. We come back to those files in a bit.
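How the defaulted bool parameters resolve can be verified with `std::is_same`. The sketch reuses the trait and struct shapes shown above but replaces the view types with two hypothetical tag types. `DoubleView` is declared ETI-available and `FloatView` has nothing available. It is an illustration, not generated code.

```cpp
#include <type_traits>

struct DoubleView {};  // hypothetical stand-in for an ETI'd view type
struct FloatView {};   // hypothetical stand-in for a view type with no ETI/TPL

// Defaults: nothing is available unless a full specialization says so.
template <class ViewType>
struct foo_tpl_spec_avail { enum : bool { value = false }; };
template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

// What a generated availability header contributes for one ETI type.
template <>
struct foo_eti_spec_avail<DoubleView> { enum : bool { value = true }; };

// Specialization layer: the two bools default to the availability traits.
template <class ViewType,
          bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
          bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(const ViewType& a);
};
```

`Foo<DoubleView>` is therefore the same type as `Foo<DoubleView, false, true>`, and `Foo<FloatView>` is `Foo<FloatView, false, false>`.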

The next part of the specialization layer is its definition for the case where no TPL is used. It calls the actual implementation provided in src/blas/impl/KokkosBlas1_foo_impl.hpp. Note that the TPL bool is set to false, while the ETI bool is set to KOKKOSKERNELS_IMPL_COMPILE_LIBRARY. The latter is only true while compiling the KokkosKernels library with its explicit template instantiations. Consequently, in user code this specialization matches non-ETI types (false, false). ETI types (false, true) fall back to the primary template, whose foo is only declared and is resolved at link time against the library.

template<class ViewType>
struct Foo<ViewType,false,KOKKOSKERNELS_IMPL_COMPILE_LIBRARY> {
  static void foo(const ViewType& a) {
    execute_foo(a);
  }
};
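The trick of using `KOKKOSKERNELS_IMPL_COMPILE_LIBRARY` as a template argument can be reproduced with a stand-in macro. In this hypothetical sketch `MOCK_IMPL_COMPILE_LIBRARY` plays that role and is set to `false`, as in user code. The partial specialization then matches `Foo<V, false, false>`, so non-ETI types get an inline definition. `Foo<V, false, true>` falls back to the declaration-only primary template, which would be resolved against the compiled library. `HostView` is a mock pointer-plus-length view.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
// (false in user code, true only inside the generated ETI .cpp files).
#define MOCK_IMPL_COMPILE_LIBRARY false

// Hypothetical stand-in for a rank-1 Kokkos::View of double.
struct HostView {
  double* data;
  std::size_t n;
  std::size_t extent(int) const { return n; }
  double& operator()(std::size_t i) const { return data[i]; }
};

// Implementation layer.
template <class ViewType>
void execute_foo(const ViewType& a) {
  for (std::size_t i = 0; i < a.extent(0); ++i) a(i) = static_cast<double>(i);
}

// Primary template: foo is only declared. For ETI types the definition
// comes from the compiled library.
template <class ViewType, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo {
  static void foo(const ViewType& a);
};

// Non-TPL path. With the macro false this matches Foo<V, false, false>;
// inside the library build (true) it matches Foo<V, false, true> instead.
template <class ViewType>
struct Foo<ViewType, false, MOCK_IMPL_COMPILE_LIBRARY> {
  static void foo(const ViewType& a) { execute_foo(a); }
};
```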

In this file we also need to define the macros which are later used in the auto generated files:

// Availability Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template<> \
struct foo_eti_spec_avail<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE> > > { \
  enum : bool { value = true }; \
}; 

// Declaration Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_DECL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
extern template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Instantiation Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_INST( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Include the actual declarations for TPLs and ETI
#if !KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
#include<impl/tpls/KokkosBlas1_foo_tpl_spec_decl.hpp>
#include<impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_decl.hpp>
#endif

Note how the actual declarations of those classes are only included when we are NOT compiling the library.
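To show what the availability macro expands to, here is a self-contained version with mock types. `MOCK_FOO_ETI_SPEC_AVAIL`, `View`, `Device`, `LayoutLeft`, `Serial` and `HostSpace` are hypothetical stand-ins for the `KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL` macro and the Kokkos types. A generated `*_eti_spec_avail.hpp` file would consist of invocation lines like the two at the bottom.

```cpp
#include <type_traits>

// Hypothetical stand-ins for Kokkos types.
struct LayoutLeft {};
struct Serial {};
struct HostSpace {};
template <class ExecSpace, class MemSpace> struct Device {};
template <class DataType, class Layout, class DeviceType> struct View {};

template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

// Same shape as KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL, but with mock types.
#define MOCK_FOO_ETI_SPEC_AVAIL(SCALAR, LAYOUT, EXECSPACE, MEMSPACE)              \
  template <>                                                                     \
  struct foo_eti_spec_avail<View<SCALAR*, LAYOUT, Device<EXECSPACE, MEMSPACE>>> { \
    enum : bool { value = true };                                                 \
  };

// One invocation per enabled type combination, as in a generated header.
MOCK_FOO_ETI_SPEC_AVAIL(double, LayoutLeft, Serial, HostSpace)
MOCK_FOO_ETI_SPEC_AVAIL(float, LayoutLeft, Serial, HostSpace)
```

Type combinations that were not listed keep the default `false`.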

The Implementation Layer

The implementation layer in src/blas/impl/KokkosBlas1_foo_impl.hpp is pretty much whatever we need it to be. In this case it's just a simple function:

  template<class ViewType>
  void execute_foo(const ViewType& a) {
    Kokkos::parallel_for("KokkosBlas1::foo",a.extent(0), KOKKOS_LAMBDA (const int& i) {
      a(i) = i;
    });
  }

If we want to distinguish between the multivector and single-vector cases, the implementation layer is one possible place to do so.
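One way to make that distinction in the implementation layer is a compile-time branch on the view rank. The sketch assumes hypothetical `Vec` and `MultiVec` stand-ins and treats a column-major multivector as a set of columns. It uses plain loops and illustrates the dispatch only. It is not how KokkosKernels implements any particular kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for rank-1 and rank-2 (column-major) views.
struct Vec {
  static constexpr int rank = 1;
  double* data;
  std::size_t n;
};
struct MultiVec {
  static constexpr int rank = 2;
  double* data;
  std::size_t n, k;  // n rows, k columns, column-major storage
};

// Single-vector kernel.
inline void foo_single(double* x, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) x[i] = static_cast<double>(i);
}

// Implementation layer: choose the single-vector or multivector path by rank.
template <class ViewType>
void execute_foo(const ViewType& a) {
  if constexpr (ViewType::rank == 1) {
    foo_single(a.data, a.n);
  } else {
    // Multivector: apply the single-vector kernel to each column.
    for (std::size_t j = 0; j < a.k; ++j) foo_single(a.data + j * a.n, a.n);
  }
}
```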

The TPL Layer

The TPL layer consists of two files: the one which declares the availability of a specialization and the one which provides the specialization. The first one is src/impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp:

template<class ViewType>
struct foo_tpl_spec_avail {
  enum : bool { value = false };
};

#ifdef KOKKOSKERNELS_ENABLE_MKL
template<>
struct foo_tpl_spec_avail<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>> {
  enum : bool { value = true };
};
#endif

For every new TPL (and type combination) we want to support, we add another full specialization of this trait.

The second file, KokkosBlas1_foo_tpl_spec_decl.hpp, is the counterpart: it provides the specializations themselves. Note that the implementation can decide, based on input parameters, whether to call our own code or the TPL code. We also need two full specializations here, depending on whether ETI is available for the same type combination.

#ifdef KOKKOSKERNELS_ENABLE_MKL
#include<mkl_foo.hpp>
namespace KokkosBlas1 {
namespace Impl {

// Only a TPL specialization is available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,false> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    #if (KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION)
    printf("Calling MKL Specialization\n");
    #endif
    mkl_foo(a.data(),a.extent(0));
  }
};

// Both a TPL specialization and an ETI instantiation are available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,true> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    // Our code is better for large number of entries, so only use TPL for small lengths
    if(a.extent(0) < 100000)
      Foo<ViewType,true,false>::foo(a);
    else
      Foo<ViewType,false,true>::foo(a);
  }
};
}
}
#endif
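The three TPL-related specializations and the run-time choice between them can be exercised without MKL. In this hypothetical sketch the specializations are partial rather than full, to keep it short. `HostView` only carries a length, and the TPL and ETI paths just record their name in `last_path` instead of doing work. The 100000 threshold is copied from the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical view stand-in that only carries a length.
struct HostView {
  std::size_t n;
  std::size_t extent(int) const { return n; }
};

inline std::string last_path;  // records which path ran (sketch only)

template <class ViewType, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo;

// Only a TPL specialization is available (stand-in for the mkl_foo call).
template <class ViewType>
struct Foo<ViewType, true, false> {
  static void foo(const ViewType&) { last_path = "TPL"; }
};

// Only the ETI instantiation is available (stand-in for our own code).
template <class ViewType>
struct Foo<ViewType, false, true> {
  static void foo(const ViewType&) { last_path = "ETI"; }
};

// Both available: decide at run time based on the problem size.
template <class ViewType>
struct Foo<ViewType, true, true> {
  static void foo(const ViewType& a) {
    if (a.extent(0) < 100000)
      Foo<ViewType, true, false>::foo(a);
    else
      Foo<ViewType, false, true>::foo(a);
  }
};
```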

The Auto-Generated ETI files

Last but not least, there are three auto-generated files, analogous to the TPL files. One declares an ETI specialization available, one provides the extern template declarations of those ETI specializations, and one instantiates them in .cpp files. They simply invoke the previously defined macros with the right type combinations.

Two additional macros complete the picture:

- KOKKOSKERNELS_ENABLE_ETI_ONLY: prevents instantiation of types that are neither ETI nor TPL types. It does so by hiding the actual definition of the specialization layer when not compiling the library .cpp files.
- KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION: a debug option that enables print statements stating which specialization (ETI, non-ETI, TPL) was called. This is useful for making sure nothing gets instantiated in user code in cases where full ETI_ONLY cannot be turned on.
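The ETI_ONLY mechanism can be sketched with stand-in macros. The sketch assumes hypothetical `MOCK_ENABLE_ETI_ONLY` and `MOCK_ENABLE_CHECK_SPECIALIZATION` in place of the real macros. When ETI_ONLY is on, the non-ETI specialization disappears. A call for a non-ETI type then resolves to the declaration-only primary template and fails at link time. With ETI_ONLY off, as set below, the non-ETI path runs and reports itself.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for KOKKOSKERNELS_ENABLE_ETI_ONLY and
// KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION.
#define MOCK_ENABLE_ETI_ONLY 0
#define MOCK_ENABLE_CHECK_SPECIALIZATION 1

struct HostView {};  // hypothetical stand-in for a non-ETI view type

int non_eti_calls = 0;  // counts calls of the non-ETI path in this sketch

// Primary template: only declared, so a call links against the library's
// ETI instantiation (or fails to link if the type was never instantiated).
template <class ViewType, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo {
  static void foo(const ViewType& a);
};

#if !MOCK_ENABLE_ETI_ONLY
// Non-ETI path: hidden entirely in ETI-only mode, so Foo<V, false, false>
// then falls back to the declaration-only primary template above.
template <class ViewType>
struct Foo<ViewType, false, false> {
  static void foo(const ViewType&) {
#if MOCK_ENABLE_CHECK_SPECIALIZATION
    std::printf("Calling Non-ETI Specialization\n");
#endif
    ++non_eti_calls;
  }
};
#endif
```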

Finally, a note on KOKKOSKERNELS_IMPL_COMPILE_LIBRARY: this macro is always defined as false, except inside the auto-generated ETI .cpp files.

The Auto-Generation Scripts

TODO: Generation scripts for blas, sparse and graph. Their ETI template parameters:
- blas: scalar_t only
- sparse: scalar_t, ordinal_t, offset_t
- graph: ordinal_t, offset_t
