Add chapter about CPU features dispatching into docs #2945

Merged · 15 commits · Oct 29, 2024
Changes from 4 commits
5 changes: 5 additions & 0 deletions CONTRIBUTING.md
@@ -85,6 +85,11 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

## Custom Components

### CPU Features Dispatching

oneDAL provides multiarchitecture binaries that contain code for multiple variants of CPU instruction set architectures. When run on a certain hardware type, oneDAL chooses the code path that is most suitable for this particular hardware to achieve better performance.

Contributor:
nit: The term architecture is overloaded. Can we find more precise language here? Different ISA extensions (e.g. avx2, avx512) can be supported in the same binary, but it should be made clear that it's only variations on the same base ISA that are allowed. That is to cover adding documentation for Arm and RISC-V support in the future.

What do you think for the following phrasing?

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for SSE2, AVX2, AVX512, etc, on top of the x86-64 base architecture. Specialisations can exist for specific implementations (e.g. skylake-x, nehalem, etc). When run on a specific hardware implementation, oneDAL chooses the code path which is most suitable for that implementation.

I still don't think that is ideal, but I hope it illustrates the differentiation between ISA extension and ISA that I want to make clearer

Contributor Author:
This is a good observation. Currently in the chapter I do not make the distinction between the ISA in broader meaning (like x86, RISC-V, ARM, ...) and ISA extensions.
I will update the docs in accordance with your suggestion. It is hard for me to come up with a better wording for ISA and ISA extensions as well.

Contributors should leverage the [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement algorithm code that performs well on various hardware types.

### Threading Layer

In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form the [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from this layer to implement parallel algorithms.
214 changes: 214 additions & 0 deletions docs/source/contribution/cpu_features.rst
@@ -0,0 +1,214 @@
.. ******************************************************************************
.. * Copyright contributors to the oneDAL project
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

.. highlight:: cpp

CPU Features Dispatching
^^^^^^^^^^^^^^^^^^^^^^^^

For each algorithm, oneDAL provides several code paths for x86-64-compatible instruction
set architectures.

The following instruction set architectures are currently supported:

- Intel |reg| Streaming SIMD Extensions 2 (Intel |reg| SSE2)
- Intel |reg| Streaming SIMD Extensions 4.2 (Intel |reg| SSE4.2)
- Intel |reg| Advanced Vector Extensions 2 (Intel |reg| AVX2)
- Intel |reg| Advanced Vector Extensions 512 (Intel |reg| AVX-512)

The particular code path is chosen at runtime based on the underlying hardware characteristics.

Contributor:

Suggested change, from:

  The particular code path is chosen at runtime based on the underlying hardware characteristics.

to:

  The particular code path is chosen at runtime based on underlying hardware properties.


This chapter describes how the code is organized to support this variety of instruction sets.
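
To give a general idea only (this is not oneDAL's actual dispatching code; the helper below uses the
GCC/Clang ``__builtin_cpu_supports`` builtin, and all names are hypothetical), runtime dispatching amounts
to detecting the supported CPU features once and routing each call to the most capable implementation available:

::

    /* Hypothetical sketch: pick the widest instruction set extension supported at runtime */
    enum class CpuPath { sse2, sse42, avx2, avx512 };

    static CpuPath detectCpuPath()
    {
        /* GCC/Clang builtin that queries CPU feature bits at runtime */
        if (__builtin_cpu_supports("avx512f")) return CpuPath::avx512;
        if (__builtin_cpu_supports("avx2")) return CpuPath::avx2;
        if (__builtin_cpu_supports("sse4.2")) return CpuPath::sse42;
        return CpuPath::sse2;
    }

    void compute()
    {
        /* Route the call to the most suitable code path for the detected hardware */
        switch (detectCpuPath())
        {
        case CpuPath::avx512: /* call the AVX-512 kernel */ break;
        case CpuPath::avx2: /* call the AVX2 kernel */ break;
        case CpuPath::sse42: /* call the SSE4.2 kernel */ break;
        default: /* call the SSE2 baseline kernel */ break;
        }
    }

oneDAL follows the same idea: each kernel is compiled several times with different instruction set options,
and the best available variant is selected at runtime.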

Algorithm Implementation Options
********************************

In addition to the instruction set architectures, an algorithm in oneDAL may have various
implementation options. Below is a description of these options to help you better understand
the oneDAL code structure and conventions.

Computational Tasks
-------------------

An algorithm might have various tasks to compute. The most common options are:

- `Classification <https://oneapi-src.github.io/oneDAL/onedal/glossary.html#term-Classification>`_,
- `Regression <https://oneapi-src.github.io/oneDAL/onedal/glossary.html#term-Regression>`_.

Computational Stages
--------------------

An algorithm might have ``training`` and ``inference`` computation stages, aimed at training a model
on the input dataset and at computing the inference results, respectively.

Computational Methods
---------------------

An algorithm can support several methods for the same type of computations.
For example, the kNN algorithm supports
`brute_force <https://oneapi-src.github.io/oneDAL/onedal/algorithms/nearest-neighbors/knn.html#knn-t-math-brute-force>`_
and `kd_tree <https://oneapi-src.github.io/oneDAL/onedal/algorithms/nearest-neighbors/knn.html#knn-t-math-kd-tree>`_
methods for algorithm training and inference.

Computational Modes
-------------------

oneDAL can provide several computational modes for an algorithm.
See the `Computational Modes <https://oneapi-src.github.io/oneDAL/onedal/programming-model/computational-modes.html>`_
chapter for details.

Folders and Files
*****************

Consider that you are working on an algorithm ``Abc`` in oneDAL.

The part of the implementation of this algorithm that runs on CPU should be located in
the `cpp/daal/src/algorithms/abc` folder.

Consider that it provides:

- ``classification`` and ``regression`` learning tasks;
- ``training`` and ``inference`` stages;
- ``method1`` and ``method2`` for the ``training`` stage and only ``method1`` for the ``inference`` stage;
- only the batch computational mode.

Then the `cpp/daal/src/algorithms/abc` folder should contain at least the following files:

::

cpp/daal/src/algorithms/abc/
|-- abc_classification_predict_method1_batch_fpt_cpu.cpp
|-- abc_classification_predict_method1_impl.i
|-- abc_classification_predict_kernel.h
|-- abc_classification_train_method1_batch_fpt_cpu.cpp
|-- abc_classification_train_method2_batch_fpt_cpu.cpp
|-- abc_classification_train_method1_impl.i
|-- abc_classification_train_method2_impl.i
|-- abc_classification_train_kernel.h
|-- abc_regression_predict_method1_batch_fpt_cpu.cpp
|-- abc_regression_predict_method1_impl.i
|-- abc_regression_predict_kernel.h
|-- abc_regression_train_method1_batch_fpt_cpu.cpp
|-- abc_regression_train_method2_batch_fpt_cpu.cpp
|-- abc_regression_train_method1_impl.i
|-- abc_regression_train_method2_impl.i
|-- abc_regression_train_kernel.h

An alternative variant of the folder structure that avoids storing too many files within a single folder
can be:

::

cpp/daal/src/algorithms/abc/
|-- classification/
| |-- abc_classification_predict_method1_batch_fpt_cpu.cpp
| |-- abc_classification_predict_method1_impl.i
| |-- abc_classification_predict_kernel.h
| |-- abc_classification_train_method1_batch_fpt_cpu.cpp
| |-- abc_classification_train_method2_batch_fpt_cpu.cpp
| |-- abc_classification_train_method1_impl.i
| |-- abc_classification_train_method2_impl.i
| |-- abc_classification_train_kernel.h
|-- regression/
| |-- abc_regression_predict_method1_batch_fpt_cpu.cpp
| |-- abc_regression_predict_method1_impl.i
| |-- abc_regression_predict_kernel.h
| |-- abc_regression_train_method1_batch_fpt_cpu.cpp
| |-- abc_regression_train_method2_batch_fpt_cpu.cpp
| |-- abc_regression_train_method1_impl.i
| |-- abc_regression_train_method2_impl.i
| |-- abc_regression_train_kernel.h


The names of the files stay the same in this case; only the folder layout differs.

The purpose and contents of each file are described below using the classification training task as an example.
For other types of tasks, the structure of the code is similar.

\*_kernel.h
-----------

Those files contain the definitions of one or several template classes that define member functions that
do the actual computations. Here is a variant of the ``Abc`` training algorithm kernel definition in the file
`abc_classification_train_kernel.h`:

Contributor:

nit: Don't start the section with a pronoun. Put a full description of what you are describing. Maybe:

In the directory structure introduced in the last section, there are files with a `_kernel.h` suffix. These contain the definitions of ...

.. include:: ../includes/cpu_features/abc-classification-train-kernel.rst

Typical template parameters are:

- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
``float`` or ``double``.
- ``method`` Computational method of the algorithm: ``method1`` or ``method2`` in the case of ``Abc``.
- ``cpu`` Version of the CPU-specific implementation of the algorithm, ``daal::CpuType``.

Contributor (on the ``algorithmFPType`` item):

Suggested change, from:

  - ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
    ``float`` or ``double``.

to:

  - ``algorithmFPType`` Data type to use in intermediate computations for the algorithm.
    Must be one of ``float`` or ``double``.

Implementations for different methods are usually defined using partial class template specialization.

\*_impl.i
---------

Those files contain the implementations of the computational functions defined in `*_kernel.h` files.

Contributor:

nit: Don't start the section with a pronoun. See the similar comment at the start of the \*_kernel.h section.

Here is a variant of the ``method1`` implementation for the ``Abc`` training algorithm that does not contain any
instruction set specific code. The implementation is located in the file `abc_classification_train_method1_impl.i`:

.. include:: ../includes/cpu_features/abc-classification-train-method1-impl.rst

Although the implementation of ``method1`` does not contain any instruction set specific code, it is
expected that developers leverage the SIMD-related macros available in oneDAL.
For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in
`service_defines.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h>`_.
This will guide the compiler to generate more efficient code for the target architecture.

Contributor:

Suggested change, from:

  For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in

to:

  For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and other pragmas defined in
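
As a minimal sketch of how these macros are typically used inside an `_impl.i` file (the ``addVectors``
helper below is hypothetical; only the pragma macros themselves come from `service_defines.h`):

::

    #include "src/services/service_defines.h" /* PRAGMA_IVDEP, PRAGMA_VECTOR_ALWAYS */

    template <typename algorithmFPType, CpuType cpu>
    void addVectors(const algorithmFPType * a, const algorithmFPType * b, algorithmFPType * c, size_t n)
    {
        /* Hint to the compiler that the iterations are independent and can be vectorized */
        PRAGMA_IVDEP
        PRAGMA_VECTOR_ALWAYS
        for (size_t i = 0; i < n; ++i)
        {
            c[i] = a[i] + b[i];
        }
    }

Because the same source is compiled once per supported instruction set, the compiler can vectorize
such loops with the widest vector width available for each ``cpu`` variant.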

Consider that the implementation of ``method2`` for the same algorithm is different and contains
AVX-512-specific code located in the ``cpuSpecificCode`` function.
Then the implementation of ``method2`` in the file `abc_classification_train_method2_impl.i` will look like this:

.. include:: ../includes/cpu_features/abc-classification-train-method2-impl.rst

CPU-specific code needs to be placed under compiler-specific and CPU-specific defines because it usually
contains intrinsics that cannot be compiled on other architectures.
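
A minimal sketch of that guarding pattern, assuming the standard ``__AVX512F__`` compiler macro (the exact
compiler- and CPU-specific defines used in oneDAL may differ, and the ``cpuSpecificCode`` body below is only
schematic, not the actual contents of the ``method2`` file):

::

    #include <cstddef>

    #if defined(__AVX512F__)

    #include <immintrin.h>

    /* AVX-512 path: the intrinsics below compile only when AVX-512 code generation is enabled */
    static void cpuSpecificCode(const float * in, float * out, std::size_t n)
    {
        std::size_t i = 0;
        for (; i + 16 <= n; i += 16)
        {
            const __m512 v = _mm512_loadu_ps(in + i);
            _mm512_storeu_ps(out + i, _mm512_mul_ps(v, v));
        }
        for (; i < n; ++i) out[i] = in[i] * in[i]; /* scalar tail */
    }

    #else

    /* Generic fallback compiled for all other instruction sets */
    static void cpuSpecificCode(const float * in, float * out, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * in[i];
    }

    #endif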

\*_fpt_cpu.cpp
--------------

Those files contain the instantiations of the template classes defined in `*_kernel.h` files.

Contributor:

nit: Don't start a section with a pronoun. See the similar comment at the start of the \*_kernel.h section.

The instantiation of the ``Abc`` training algorithm kernel for ``method1`` is located in the file
`abc_classification_train_method1_batch_fpt_cpu.cpp`:

.. include:: ../includes/cpu_features/abc-classification-train-method1-fpt-cpu.rst

`_fpt_cpu.cpp` files are not compiled directly into object files. First, multiple copies of those files
are made, replacing the ``fpt`` and ``cpu`` parts of the file name as well as the corresponding ``DAAL_FPTYPE`` and
``DAAL_CPU`` macros with the actual data type and CPU type values. Then the resulting files are compiled
with the appropriate CPU-specific optimization compiler options (an example of this expansion is shown after the lists below).

The values for ``fpt`` file name part replacement are:
- ``flt`` for ``float`` data type, and
- ``dbl`` for ``double`` data type.

The values for ``DAAL_FPTYPE`` macro replacement are ``float`` and ``double`` respectively.

The values for ``cpu`` file name part replacement are:
- ``nrh`` for Intel |reg| SSE2 architecture, which stands for Northwood,
- ``neh`` for Intel |reg| SSE4.2 architecture, which stands for Nehalem,
- ``hsw`` for Intel |reg| AVX2 architecture, which stands for Haswell,
- ``skx`` for Intel |reg| AVX-512 architecture, which stands for Skylake-X.

The values for ``DAAL_CPU`` macro replacement are:
- ``sse2`` for Intel |reg| SSE2 architecture,
- ``sse42`` for Intel |reg| SSE4.2 architecture,
- ``avx2`` for Intel |reg| AVX2 architecture,
- ``avx512`` for Intel |reg| AVX-512 architecture.
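
For example, the ``double``/AVX-512 copy of `abc_classification_train_method1_batch_fpt_cpu.cpp` would be
generated as `abc_classification_train_method1_batch_dbl_skx.cpp`, and its content would effectively expand
to the following (a schematic illustration of the substitution, not an actual generated file):

::

    #include "src/algorithms/abc/abc_classification_train_kernel.h"
    #include "src/algorithms/abc/abc_classification_train_method1_impl.i"

    namespace daal::algorithms::abc::training::internal
    {
    /* DAAL_FPTYPE replaced with double, DAAL_CPU replaced with avx512 */
    template class DAAL_EXPORT AbcClassificationTrainingKernel<double, method1, avx512>;
    } // namespace daal::algorithms::abc::training::internal

This copy is then compiled with AVX-512 optimization options enabled.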
@@ -0,0 +1,53 @@
.. ******************************************************************************
.. * Copyright contributors to the oneDAL project
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

::

#ifndef __ABC_CLASSIFICATION_TRAIN_KERNEL_H__
#define __ABC_CLASSIFICATION_TRAIN_KERNEL_H__

#include "src/algorithms/kernel.h"
#include "data_management/data/numeric_table.h" // NumericTable class
/* Other necessary includes go here */

using namespace daal::data_management; // NumericTable class

namespace daal::algorithms::abc::training::internal
{
/* Dummy base template class */
template <typename algorithmFPType, Method method, CpuType cpu>
class AbcClassificationTrainingKernel : public Kernel
{};

/* Computational kernel for 'method1' of the Abc training algorithm */
template <typename algorithmFPType, CpuType cpu>
class AbcClassificationTrainingKernel<algorithmFPType, method1, cpu> : public Kernel
{
public:
services::Status compute(/* Input and output arguments for the 'method1' */);
};

/* Computational kernel for 'method2' of the Abc training algorithm */
template <typename algorithmFPType, CpuType cpu>
class AbcClassificationTrainingKernel<algorithmFPType, method2, cpu> : public Kernel
{
public:
services::Status compute(/* Input and output arguments for the 'method2' */);
};

} // namespace daal::algorithms::abc::training::internal

#endif // __ABC_CLASSIFICATION_TRAIN_KERNEL_H__
@@ -0,0 +1,31 @@
.. ******************************************************************************
.. * Copyright contributors to the oneDAL project
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

::

/*
//++
// instantiations of method1 of the Abc training algorithm.
//--
*/

#include "src/algorithms/abc/abc_classification_train_kernel.h"
#include "src/algorithms/abc/abc_classification_train_method1_impl.i"

namespace daal::algorithms::abc::training::internal
{
template class DAAL_EXPORT AbcClassificationTrainingKernel<DAAL_FPTYPE, method1, DAAL_CPU>;
} // namespace daal::algorithms::abc::training::internal
@@ -0,0 +1,42 @@
.. ******************************************************************************
.. * Copyright contributors to the oneDAL project
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

::

/*
//++
// Implementation of Abc training algorithm.
//--
*/

#include "src/algorithms/service_error_handling.h"
#include "src/data_management/service_numeric_table.h"

namespace daal::algorithms::abc::training::internal
{

template <typename algorithmFPType, CpuType cpu>
services::Status AbcClassificationTrainingKernel<algorithmFPType, method1, cpu>::compute(/* ... */)
{
services::Status status;

/* Implementation that does not contain instruction set specific code */

return status;
}


} // namespace daal::algorithms::abc::training::internal