Add chapter about CPU features dispatching into docs #2945
@@ -0,0 +1,214 @@
.. ******************************************************************************
.. * Copyright contributors to the oneDAL project
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

.. highlight:: cpp

CPU Features Dispatching
^^^^^^^^^^^^^^^^^^^^^^^^

For each algorithm, oneDAL provides several code paths for x86-64-compatible instruction
set architectures.

The following architectures are currently supported:

- Intel |reg| Streaming SIMD Extensions 2 (Intel |reg| SSE2)
- Intel |reg| Streaming SIMD Extensions 4.2 (Intel |reg| SSE4.2)
- Intel |reg| Advanced Vector Extensions 2 (Intel |reg| AVX2)
- Intel |reg| Advanced Vector Extensions 512 (Intel |reg| AVX-512)

The particular code path is chosen at runtime based on the underlying hardware characteristics.

This chapter describes how the code is organized to support this variety of instruction sets.

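
To make the idea of runtime selection concrete, here is a minimal sketch. It is not the oneDAL dispatcher: the ``SketchCpu`` enumeration and the ``detectCpu``, ``codePath`` and ``dispatch`` functions are hypothetical names introduced only for this illustration, and the detection relies on the GCC/Clang ``__builtin_cpu_supports`` builtin (other compilers fall back to the SSE2 baseline here). A real implementation would likely check more feature bits than ``avx512f`` alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a library CPU type enumeration (illustration only).
enum class SketchCpu { sse2, sse42, avx2, avx512 };

// Ask the hardware at run time which of the four levels it supports.
// Uses GCC/Clang builtins on x86; everything else gets the SSE2 baseline.
inline SketchCpu detectCpu()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return SketchCpu::avx512;
    if (__builtin_cpu_supports("avx2")) return SketchCpu::avx2;
    if (__builtin_cpu_supports("sse4.2")) return SketchCpu::sse42;
#endif
    return SketchCpu::sse2;
}

// One code path per level. Here each path only reports its own name;
// a real kernel would be compiled with matching optimization options.
template <SketchCpu cpu>
std::string codePath();

template <> inline std::string codePath<SketchCpu::sse2>() { return "sse2"; }
template <> inline std::string codePath<SketchCpu::sse42>() { return "sse42"; }
template <> inline std::string codePath<SketchCpu::avx2>() { return "avx2"; }
template <> inline std::string codePath<SketchCpu::avx512>() { return "avx512"; }

// Map the run-time value onto the compile-time template argument.
inline std::string dispatch(SketchCpu cpu)
{
    switch (cpu)
    {
    case SketchCpu::avx512: return codePath<SketchCpu::avx512>();
    case SketchCpu::avx2: return codePath<SketchCpu::avx2>();
    case SketchCpu::sse42: return codePath<SketchCpu::sse42>();
    case SketchCpu::sse2: return codePath<SketchCpu::sse2>();
    }
    assert(false && "unknown SketchCpu value");
    return codePath<SketchCpu::sse2>();
}
```

Calling ``dispatch(detectCpu())`` selects the most capable path the current machine supports, while the rest of the program stays independent of the instruction set.
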

Algorithm Implementation Options
********************************

In addition to the instruction set architectures, an algorithm in oneDAL may have various
implementation options. Below is a description of these options to help you better understand
the oneDAL code structure and conventions.

Computational Tasks
-------------------

An algorithm might have various tasks to compute. The most common options are:

- `Classification <https://oneapi-src.github.io/oneDAL/onedal/glossary.html#term-Classification>`_,
- `Regression <https://oneapi-src.github.io/oneDAL/onedal/glossary.html#term-Regression>`_.

Computational Stages
--------------------

An algorithm might have ``training`` and ``inference`` computation stages, which
train a model on the input dataset and compute the inference results, respectively.

Computational Methods
---------------------

An algorithm can support several methods for the same type of computations.
For example, the kNN algorithm supports
`brute_force <https://oneapi-src.github.io/oneDAL/onedal/algorithms/nearest-neighbors/knn.html#knn-t-math-brute-force>`_
and `kd_tree <https://oneapi-src.github.io/oneDAL/onedal/algorithms/nearest-neighbors/knn.html#knn-t-math-kd-tree>`_
methods for algorithm training and inference.

Computational Modes
-------------------

oneDAL can provide several computational modes for an algorithm.
See the `Computational Modes <https://oneapi-src.github.io/oneDAL/onedal/programming-model/computational-modes.html>`_
chapter for details.


Folders and Files
*****************

Suppose you are working on some algorithm ``Abc`` in oneDAL.

The part of the implementation of this algorithm that runs on CPU should be located in the
`cpp/daal/src/algorithms/abc` folder.

Suppose it provides:

- ``classification`` and ``regression`` learning tasks;
- ``training`` and ``inference`` stages;
- ``method1`` and ``method2`` for the ``training`` stage and only ``method1`` for the ``inference`` stage;
- only batch computational mode.

Then the `cpp/daal/src/algorithms/abc` folder should contain at least the following files:

::

  cpp/daal/src/algorithms/abc/
    |-- abc_classification_predict_method1_batch_fpt_cpu.cpp
    |-- abc_classification_predict_method1_impl.i
    |-- abc_classification_predict_kernel.h
    |-- abc_classification_train_method1_batch_fpt_cpu.cpp
    |-- abc_classification_train_method2_batch_fpt_cpu.cpp
    |-- abc_classification_train_method1_impl.i
    |-- abc_classification_train_method2_impl.i
    |-- abc_classification_train_kernel.h
    |-- abc_regression_predict_method1_batch_fpt_cpu.cpp
    |-- abc_regression_predict_method1_impl.i
    |-- abc_regression_predict_kernel.h
    |-- abc_regression_train_method1_batch_fpt_cpu.cpp
    |-- abc_regression_train_method2_batch_fpt_cpu.cpp
    |-- abc_regression_train_method1_impl.i
    |-- abc_regression_train_method2_impl.i
    |-- abc_regression_train_kernel.h

An alternative folder structure that avoids storing too many files in a single folder
is:

::

  cpp/daal/src/algorithms/abc/
    |-- classification/
    |   |-- abc_classification_predict_method1_batch_fpt_cpu.cpp
    |   |-- abc_classification_predict_method1_impl.i
    |   |-- abc_classification_predict_kernel.h
    |   |-- abc_classification_train_method1_batch_fpt_cpu.cpp
    |   |-- abc_classification_train_method2_batch_fpt_cpu.cpp
    |   |-- abc_classification_train_method1_impl.i
    |   |-- abc_classification_train_method2_impl.i
    |   |-- abc_classification_train_kernel.h
    |-- regression/
        |-- abc_regression_predict_method1_batch_fpt_cpu.cpp
        |-- abc_regression_predict_method1_impl.i
        |-- abc_regression_predict_kernel.h
        |-- abc_regression_train_method1_batch_fpt_cpu.cpp
        |-- abc_regression_train_method2_batch_fpt_cpu.cpp
        |-- abc_regression_train_method1_impl.i
        |-- abc_regression_train_method2_impl.i
        |-- abc_regression_train_kernel.h

The names of the files stay the same in this case, just the folder layout differs.

The purpose and contents of each file are described below, using the classification
training task as an example. For other task types, the structure of the code is similar.


\*_kernel.h
-----------

The `*_kernel.h` files contain the definitions of one or several template classes whose member functions
do the actual computations. Here is a variant of the ``Abc`` training algorithm kernel definition in the file
`abc_classification_train_kernel.h`:

.. include:: ../includes/cpu_features/abc-classification-train-kernel.rst

Typical template parameters are:

- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
  ``float`` or ``double``.
- ``method`` Computational method of the algorithm, ``method1`` or ``method2`` in the case of ``Abc``.
- ``cpu`` Version of the CPU-specific implementation of the algorithm, a ``daal::CpuType`` value.

Implementations for different methods are usually defined using partial class template specialization.

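
The partial specialization pattern can be tried out in isolation with the self-contained sketch below. All names in it (``AbcKernelSketch``, the local ``Method`` and ``CpuType`` enumerations, ``runOnSample``) are hypothetical stand-ins rather than oneDAL declarations, and the two "methods" are toy computations (a sum and a sum of squares) chosen only so that the difference between the specializations is visible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the oneDAL enumerations (illustration only).
enum Method { method1, method2 };
enum CpuType { sse2, sse42, avx2, avx512 };

// Primary template: intentionally empty, every method gets its own specialization.
template <typename algorithmFPType, Method method, CpuType cpu>
class AbcKernelSketch
{};

// Partial specialization for method1: 'method' is fixed,
// the floating-point type and the CPU type stay open.
template <typename algorithmFPType, CpuType cpu>
class AbcKernelSketch<algorithmFPType, method1, cpu>
{
public:
    // Toy computation: sum of the elements.
    algorithmFPType compute(const algorithmFPType * x, std::size_t n) const
    {
        assert(n == 0 || x != nullptr);
        algorithmFPType s = 0;
        for (std::size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }
};

// Partial specialization for method2.
template <typename algorithmFPType, CpuType cpu>
class AbcKernelSketch<algorithmFPType, method2, cpu>
{
public:
    // Toy computation: sum of the squared elements.
    algorithmFPType compute(const algorithmFPType * x, std::size_t n) const
    {
        assert(n == 0 || x != nullptr);
        algorithmFPType s = 0;
        for (std::size_t i = 0; i < n; ++i) s += x[i] * x[i];
        return s;
    }
};

// Usage example: run the chosen method on a tiny fixed input.
template <Method method>
double runOnSample()
{
    const double x[] = { 1.0, 2.0, 3.0 };
    return AbcKernelSketch<double, method, avx2>().compute(x, 3);
}
```

The caller picks the method as a template argument, for example ``runOnSample<method2>()``, and the compiler selects the matching specialization without any run-time branching.
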

\*_impl.i
---------

The `*_impl.i` files contain the implementations of the computational functions declared in `*_kernel.h` files.
Here is a variant of the ``method1`` implementation for the ``Abc`` training algorithm that does not contain any
instruction set specific code. The implementation is located in the file `abc_classification_train_method1_impl.i`:

.. include:: ../includes/cpu_features/abc-classification-train-method1-impl.rst

Although the implementation of ``method1`` does not contain any instruction set specific code, it is
expected that the developers leverage SIMD related macros available in oneDAL.
Examples are ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and other pragmas defined in
`service_defines.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h>`_.
These macros guide the compiler to generate more efficient code for the target architecture.

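
As an illustration of how such a vectorization hint can be wrapped in a portable macro, consider the sketch below. ``SKETCH_PRAGMA_IVDEP`` is a hypothetical macro written for this example; it is not the definition of ``PRAGMA_IVDEP`` from `service_defines.h`, and the compiler-to-pragma mapping shown is an assumption that should be checked against the compilers actually in use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro (NOT the oneDAL definition): expands to a compiler-specific
// "no loop-carried dependencies" hint, or to nothing for unknown compilers.
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
    #define SKETCH_PRAGMA_IVDEP _Pragma("ivdep")
#elif defined(__clang__)
    #define SKETCH_PRAGMA_IVDEP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
    #define SKETCH_PRAGMA_IVDEP _Pragma("GCC ivdep")
#else
    #define SKETCH_PRAGMA_IVDEP
#endif

// y[i] += a * x[i]. The caller guarantees that x and y do not overlap,
// which is what makes the hint on the loop valid.
inline void axpy(std::size_t n, float a, const float * x, float * y)
{
    assert(n == 0 || (x != nullptr && y != nullptr));
    SKETCH_PRAGMA_IVDEP
    for (std::size_t i = 0; i < n; ++i)
    {
        y[i] += a * x[i];
    }
}

// Usage example on a tiny fixed input; returns the sum of the updated y.
inline float axpySample()
{
    const float x[] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float y[]       = { 10.0f, 20.0f, 30.0f, 40.0f };
    axpy(4, 2.0f, x, y);
    return y[0] + y[1] + y[2] + y[3];
}
```

The loop body itself stays free of instruction set specific code; only the hint changes with the compiler, so the same source can be built for every supported CPU type.
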

Suppose the implementation of ``method2`` for the same algorithm is different and contains
AVX-512-specific code located in the ``cpuSpecificCode`` function.
Then the implementation of ``method2`` in the file `abc_classification_train_method2_impl.i` looks like:

.. include:: ../includes/cpu_features/abc-classification-train-method2-impl.rst

CPU-specific code needs to be placed under compiler-specific and CPU-specific defines because it usually
contains intrinsics that cannot be compiled on other architectures.

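
The following sketch shows the general shape of such a guard. It is an assumption-laden illustration, not the oneDAL code: it keys on the ``__AVX512F__`` macro that common compilers predefine when AVX-512F code generation is enabled, whereas the library uses its own compiler-specific and CPU-specific defines. The ``sumSketch`` and ``sumSample`` functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// The intrinsics header is only needed, and the intrinsics only used,
// when the compiler is generating AVX-512F code.
#if defined(__AVX512F__)
    #include <immintrin.h>
#endif

// Sum of n floats. The AVX-512 block handles 16 elements per iteration;
// the scalar loop handles the tail, or everything when the block is compiled out.
inline float sumSketch(const float * x, std::size_t n)
{
    assert(n == 0 || x != nullptr);
    float total   = 0.0f;
    std::size_t i = 0;
#if defined(__AVX512F__)
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16)
    {
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i));
    }
    total = _mm512_reduce_add_ps(acc);
#endif
    for (; i < n; ++i) total += x[i];
    return total;
}

// Usage example: sum of the integers 1..40, exactly representable in float.
inline float sumSample()
{
    float x[40];
    for (int i = 0; i < 40; ++i) x[i] = static_cast<float>(i + 1);
    return sumSketch(x, 40);
}
```

Without the guard, the intrinsic calls would fail to compile in the copies of the file that are built for SSE2, SSE4.2 or AVX2.
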

\*_fpt_cpu.cpp
--------------

The `*_fpt_cpu.cpp` files contain the instantiations of the template classes defined in `*_kernel.h` files.
The instantiation of the ``Abc`` training algorithm kernel for ``method1`` is located in the file
`abc_classification_train_method1_batch_fpt_cpu.cpp`:

.. include:: ../includes/cpu_features/abc-classification-train-method1-fpt-cpu.rst

`*_fpt_cpu.cpp` files are not compiled directly into object files. First, multiple copies of those files
are made, replacing the ``fpt`` and ``cpu`` parts of the file name as well as the corresponding ``DAAL_FPTYPE`` and
``DAAL_CPU`` macros with the actual data type and CPU type values. Then the resulting files are compiled
with appropriate CPU-specific optimization compiler options.

The values for ``fpt`` file name part replacement are:

- ``flt`` for ``float`` data type, and
- ``dbl`` for ``double`` data type.

The values for ``DAAL_FPTYPE`` macro replacement are ``float`` and ``double`` respectively.

The values for ``cpu`` file name part replacement are:

- ``nrh`` for Intel |reg| SSE2 architecture, which stands for Northwood,
- ``neh`` for Intel |reg| SSE4.2 architecture, which stands for Nehalem,
- ``hsw`` for Intel |reg| AVX2 architecture, which stands for Haswell,
- ``skx`` for Intel |reg| AVX-512 architecture, which stands for Skylake-X.

The values for ``DAAL_CPU`` macro replacement are:

- ``sse2`` for Intel |reg| SSE2 architecture,
- ``sse42`` for Intel |reg| SSE4.2 architecture,
- ``avx2`` for Intel |reg| AVX2 architecture,
- ``avx512`` for Intel |reg| AVX-512 architecture.

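
The effect of the macro replacement can be reproduced in a single translation unit. In the sketch below the two macros get default values only so that the example compiles on its own; in the flow described above they would be supplied for each generated copy of the file. ``KernelSketch``, ``InstantiatedKernel``, ``instantiationSample`` and the local enumerations are hypothetical names, not oneDAL declarations.

```cpp
#include <type_traits>

// Hypothetical defaults so the sketch is standalone; normally each generated
// copy of the file would see its own values for these two macros.
#ifndef DAAL_FPTYPE
    #define DAAL_FPTYPE float
#endif
#ifndef DAAL_CPU
    #define DAAL_CPU avx2
#endif

// Hypothetical stand-ins for the oneDAL enumerations (illustration only).
enum Method { method1, method2 };
enum CpuType { sse2, sse42, avx2, avx512 };

// Plays the role of a *_kernel.h file: declaration only.
template <typename algorithmFPType, Method method, CpuType cpu>
struct KernelSketch
{
    using FPType = algorithmFPType;
    algorithmFPType twice(algorithmFPType v) const;
};

// Plays the role of a *_impl.i file: out-of-class definition.
template <typename algorithmFPType, Method method, CpuType cpu>
algorithmFPType KernelSketch<algorithmFPType, method, cpu>::twice(algorithmFPType v) const
{
    return v + v;
}

// Plays the role of a *_fpt_cpu.cpp file: one explicit instantiation
// for the data type and CPU type selected by the macros.
template struct KernelSketch<DAAL_FPTYPE, method1, DAAL_CPU>;

using InstantiatedKernel = KernelSketch<DAAL_FPTYPE, method1, DAAL_CPU>;

// Usage example: the instantiated kernel works with the macro-selected type.
inline DAAL_FPTYPE instantiationSample()
{
    static_assert(std::is_same<InstantiatedKernel::FPType, DAAL_FPTYPE>::value,
                  "kernel must use the macro-selected floating-point type");
    return InstantiatedKernel().twice(static_cast<DAAL_FPTYPE>(21));
}
```

Building the same source once per combination, for example with ``-DDAAL_FPTYPE=double -DDAAL_CPU=avx512``, yields one object file per data type and CPU type, which mirrors the copy-and-replace step described above.
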

@@ -0,0 +1,53 @@

::

  #ifndef __ABC_CLASSIFICATION_TRAIN_KERNEL_H__
  #define __ABC_CLASSIFICATION_TRAIN_KERNEL_H__

  #include "src/algorithms/kernel.h"
  #include "data_management/data/numeric_table.h" // NumericTable class
  /* Other necessary includes go here */

  using namespace daal::data_management; // NumericTable class

  namespace daal::algorithms::abc::training::internal
  {
  /* Dummy base template class */
  template <typename algorithmFPType, Method method, CpuType cpu>
  class AbcClassificationTrainingKernel : public Kernel
  {};

  /* Computational kernel for 'method1' of the Abc training algorithm */
  template <typename algorithmFPType, CpuType cpu>
  class AbcClassificationTrainingKernel<algorithmFPType, method1, cpu> : public Kernel
  {
  public:
      services::Status compute(/* Input and output arguments for the 'method1' */);
  };

  /* Computational kernel for 'method2' of the Abc training algorithm */
  template <typename algorithmFPType, CpuType cpu>
  class AbcClassificationTrainingKernel<algorithmFPType, method2, cpu> : public Kernel
  {
  public:
      services::Status compute(/* Input and output arguments for the 'method2' */);
  };

  } // namespace daal::algorithms::abc::training::internal

  #endif // __ABC_CLASSIFICATION_TRAIN_KERNEL_H__


@@ -0,0 +1,31 @@

::

  /*
  //++
  //  Instantiations of method1 of the Abc training algorithm.
  //--
  */

  #include "src/algorithms/abc/abc_classification_train_kernel.h"
  #include "src/algorithms/abc/abc_classification_train_method1_impl.i"

  namespace daal::algorithms::abc::training::internal
  {
  template class DAAL_EXPORT AbcClassificationTrainingKernel<DAAL_FPTYPE, method1, DAAL_CPU>;
  } // namespace daal::algorithms::abc::training::internal


@@ -0,0 +1,42 @@

::

  /*
  //++
  //  Implementation of Abc training algorithm.
  //--
  */

  #include "src/algorithms/service_error_handling.h"
  #include "src/data_management/service_numeric_table.h"

  namespace daal::algorithms::abc::training::internal
  {

  template <typename algorithmFPType, CpuType cpu>
  services::Status AbcClassificationTrainingKernel<algorithmFPType, method1, cpu>::compute(/* ... */)
  {
      services::Status status;

      /* Implementation that does not contain instruction set specific code */

      return status;
  }

  } // namespace daal::algorithms::abc::training::internal


Review comment: nit: The term architecture is overloaded. Can we find more precise language here? Different ISA extensions (e.g. avx2, avx512) can be supported in the same binary, but it should be made clear that it's only variations on the same base ISA that are allowed. That is to cover adding documentation for Arm and RISC-V support in the future.
What do you think for the following phrasing?
I still don't think that is ideal, but I hope it illustrates the differentiation between ISA extension and ISA that I want to make clearer

Reply: This is a good observation. Currently in the chapter I do not make the distinction between the ISA in broader meaning (like x86, RISC-V, ARM, ...) and ISA extensions.
I will update the docs in accordance with your suggestion. It is hard for me to come up with a better wording for ISA and ISA extensions as well.