Add chapter about CPU features dispatching into docs #2945
@@ -85,6 +85,12 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

## Custom Components

### CPU Features Dispatching

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc. extensions, on top of the x86-64 base architecture.
When run on a specific hardware implementation such as Haswell, Skylake-X, etc., oneDAL chooses the code path that is most suitable for that implementation.
Contributors should leverage the [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement algorithm code that performs well on various hardware implementations.

### Threading Layer

In the source code of the algorithms, oneDAL does not use threading primitives directly. Together, the threading primitives used within oneDAL form the [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the layer to implement parallel algorithms.
@@ -0,0 +1,289 @@
.. ******************************************************************************
.. * Copyright contributors to the oneDAL project
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

.. |32e_make| replace:: 32e.mk
.. _32e_make: https://github.com/oneapi-src/oneDAL/blob/main/dev/make/function_definitions/32e.mk
.. |riscv_make| replace:: riscv64.mk
.. _riscv_make: https://github.com/oneapi-src/oneDAL/blob/main/dev/make/function_definitions/riscv64.mk
.. |arm_make| replace:: arm.mk
.. _arm_make: https://github.com/oneapi-src/oneDAL/blob/main/dev/make/function_definitions/arm.mk

.. highlight:: cpp

CPU Features Dispatching
^^^^^^^^^^^^^^^^^^^^^^^^

For each algorithm, |short_name| provides several code paths for x86-64-compatible architectural extensions.

The following extensions are currently supported:

- Intel\ |reg|\ Streaming SIMD Extensions 2 (Intel\ |reg|\ SSE2)
- Intel\ |reg|\ Streaming SIMD Extensions 4.2 (Intel\ |reg|\ SSE4.2)
- Intel\ |reg|\ Advanced Vector Extensions 2 (Intel\ |reg|\ AVX2)
- Intel\ |reg|\ Advanced Vector Extensions 512 (Intel\ |reg|\ AVX-512)

The particular code path is chosen at runtime based on the properties of the underlying hardware.

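The following self-contained C++ sketch illustrates the idea of choosing a code path at runtime. It is not |short_name| code: the ``CpuType`` enum, the ``Kernel`` template and the ``dispatch`` function are hypothetical, and the detected CPU type is passed in as a parameter rather than queried from the hardware. Each ``Kernel`` specialization simply returns the file name suffix that this chapter uses for the corresponding extension.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for daal::CpuType: one value per supported extension.
enum class CpuType { sse2, sse42, avx2, avx512 };

// One kernel template; in a real build each specialization would be
// compiled with its own CPU-specific compiler options.
template <CpuType cpu>
struct Kernel {
    static std::string name();
};

template <> std::string Kernel<CpuType::sse2>::name() { return "nrh"; }
template <> std::string Kernel<CpuType::sse42>::name() { return "neh"; }
template <> std::string Kernel<CpuType::avx2>::name() { return "hsw"; }
template <> std::string Kernel<CpuType::avx512>::name() { return "skx"; }

// Picks the code path matching the detected CPU. The detected value is a
// parameter here; a real implementation would query the hardware (CPUID).
std::string dispatch(CpuType detected) {
    switch (detected) {
        case CpuType::avx512: return Kernel<CpuType::avx512>::name();
        case CpuType::avx2:   return Kernel<CpuType::avx2>::name();
        case CpuType::sse42:  return Kernel<CpuType::sse42>::name();
        case CpuType::sse2:   break;
    }
    return Kernel<CpuType::sse2>::name();  // baseline fallback
}
```

For example, ``dispatch(CpuType::avx2)`` returns ``"hsw"``. Listing the cases from the newest extension down to the baseline mirrors the idea of running the most capable code path that the hardware supports.
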
This chapter describes how the code is organized to support this variety of extensions.

Algorithm Implementation Options
********************************

In addition to the architectural extensions, an algorithm in |short_name| may have various
implementation options. Below is a description of these options to help you better understand
the |short_name| code structure and conventions.

Computational Tasks
-------------------

An algorithm might have various tasks to compute. The most common options are:

- `Classification <https://oneapi-src.github.io/oneDAL/onedal/glossary.html#term-Classification>`_,
- `Regression <https://oneapi-src.github.io/oneDAL/onedal/glossary.html#term-Regression>`_.

Computational Stages
--------------------

An algorithm might have ``training`` and ``inference`` computation stages aimed
at training a model on the input dataset and computing the inference results, respectively.

Computational Methods
---------------------

An algorithm can support several methods for the same type of computations.
For example, the kNN algorithm supports
`brute_force <https://oneapi-src.github.io/oneDAL/onedal/algorithms/nearest-neighbors/knn.html#knn-t-math-brute-force>`_
and `kd_tree <https://oneapi-src.github.io/oneDAL/onedal/algorithms/nearest-neighbors/knn.html#knn-t-math-kd-tree>`_
methods for training and inference.

Computational Modes
-------------------

|short_name| can provide several computational modes for an algorithm.
See the `Computational Modes <https://oneapi-src.github.io/oneDAL/onedal/programming-model/computational-modes.html>`_
chapter for details.

Folders and Files
*****************

Suppose that you are working on an algorithm ``Abc`` in |short_name|.

The part of this algorithm's implementation that runs on CPU should be located in the
``cpp/daal/src/algorithms/abc`` folder.

Suppose that the algorithm provides:

- ``classification`` and ``regression`` learning tasks;
- ``training`` and ``inference`` stages;
- ``method1`` and ``method2`` for the ``training`` stage and only ``method1`` for the ``inference`` stage;
- only the ``batch`` computational mode.

Then the ``cpp/daal/src/algorithms/abc`` folder should contain at least the following files:

::

    cpp/daal/src/algorithms/abc/
    |-- abc_classification_predict_method1_batch_fpt_cpu.cpp
    |-- abc_classification_predict_method1_impl.i
    |-- abc_classification_predict_kernel.h
    |-- abc_classification_train_method1_batch_fpt_cpu.cpp
    |-- abc_classification_train_method2_batch_fpt_cpu.cpp
    |-- abc_classification_train_method1_impl.i
    |-- abc_classification_train_method2_impl.i
    |-- abc_classification_train_kernel.h
    |-- abc_regression_predict_method1_batch_fpt_cpu.cpp
    |-- abc_regression_predict_method1_impl.i
    |-- abc_regression_predict_kernel.h
    |-- abc_regression_train_method1_batch_fpt_cpu.cpp
    |-- abc_regression_train_method2_batch_fpt_cpu.cpp
    |-- abc_regression_train_method1_impl.i
    |-- abc_regression_train_method2_impl.i
    |-- abc_regression_train_kernel.h

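The file names above follow a regular pattern built from the algorithm, task, stage, method and mode names. The helper functions below are hypothetical and only spell out that naming convention as it appears in this example; they are not part of the |short_name| build system.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: name of a *_fpt_cpu.cpp file, e.g.
// abc_classification_train_method1_batch_fpt_cpu.cpp
std::string fptCpuFileName(const std::string& algorithm, const std::string& task,
                           const std::string& stage,  // "train" or "predict"
                           const std::string& method, const std::string& mode) {
    return algorithm + "_" + task + "_" + stage + "_" + method + "_" + mode + "_fpt_cpu.cpp";
}

// Hypothetical helper: name of a *_impl.i file (one per method).
std::string implFileName(const std::string& algorithm, const std::string& task,
                         const std::string& stage, const std::string& method) {
    return algorithm + "_" + task + "_" + stage + "_" + method + "_impl.i";
}

// Hypothetical helper: name of a *_kernel.h file (one per task and stage).
std::string kernelFileName(const std::string& algorithm, const std::string& task,
                           const std::string& stage) {
    return algorithm + "_" + task + "_" + stage + "_kernel.h";
}
```

For instance, ``fptCpuFileName("abc", "regression", "train", "method2", "batch")`` yields ``abc_regression_train_method2_batch_fpt_cpu.cpp``, one of the entries in the tree.
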
An alternative folder structure, which avoids storing too many files within a single folder,
could be:

::

    cpp/daal/src/algorithms/abc/
    |-- classification/
    |   |-- abc_classification_predict_method1_batch_fpt_cpu.cpp
    |   |-- abc_classification_predict_method1_impl.i
    |   |-- abc_classification_predict_kernel.h
    |   |-- abc_classification_train_method1_batch_fpt_cpu.cpp
    |   |-- abc_classification_train_method2_batch_fpt_cpu.cpp
    |   |-- abc_classification_train_method1_impl.i
    |   |-- abc_classification_train_method2_impl.i
    |   |-- abc_classification_train_kernel.h
    |-- regression/
        |-- abc_regression_predict_method1_batch_fpt_cpu.cpp
        |-- abc_regression_predict_method1_impl.i
        |-- abc_regression_predict_kernel.h
        |-- abc_regression_train_method1_batch_fpt_cpu.cpp
        |-- abc_regression_train_method2_batch_fpt_cpu.cpp
        |-- abc_regression_train_method1_impl.i
        |-- abc_regression_train_method2_impl.i
        |-- abc_regression_train_kernel.h

In this case, the file names stay the same; only the folder layout differs.

The following sections describe the purpose and contents of each file using the classification
training task as an example. The code for other tasks is structured similarly.

\*_kernel.h
-----------

In the directory structure of the ``Abc`` algorithm, there are files with a ``_kernel.h`` suffix.
These files contain the definitions of one or several template classes that define member functions that
do the actual computations. Here is a variant of the ``Abc`` training algorithm kernel definition in the file
``abc_classification_train_kernel.h``:

.. include:: ../includes/cpu_features/abc-classification-train-kernel.rst

Typical template parameters are:

- ``algorithmFPType``: data type used in the intermediate computations of the algorithm,
  ``float`` or ``double``.
- ``method``: computational method of the algorithm, ``method1`` or ``method2`` in the case of ``Abc``.
- ``cpu``: version of the CPU-specific implementation of the algorithm, a value of ``daal::CpuType``.

Implementations for different methods are usually defined using partial class template specialization.

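To illustrate the partial specialization approach, here is a minimal, self-contained sketch. The ``Method`` and ``CpuType`` enums, the ``AbcClassificationTrainKernel`` name and its ``compute`` signature are hypothetical, and the "training" is a toy computation on a raw array; real |short_name| kernels operate on the library's own data structures. The primary template is only declared, so each supported method has to provide its own partial specialization, while ``algorithmFPType`` and ``cpu`` remain template parameters.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the algorithm's method enum and daal::CpuType.
enum Method { method1, method2 };
enum CpuType { sse2, sse42, avx2, avx512 };

// Primary template: declared only, so an unsupported method fails to compile.
template <typename algorithmFPType, Method method, CpuType cpu>
class AbcClassificationTrainKernel;

// Partial specialization for method1: generic code shared by every cpu value.
template <typename algorithmFPType, CpuType cpu>
class AbcClassificationTrainKernel<algorithmFPType, method1, cpu> {
public:
    // Toy "training": the mean of the input values.
    algorithmFPType compute(const algorithmFPType* x, std::size_t n) const {
        assert(n > 0);
        algorithmFPType sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += x[i];
        return sum / static_cast<algorithmFPType>(n);
    }
};

// Partial specialization for method2: a different computation for the same task.
template <typename algorithmFPType, CpuType cpu>
class AbcClassificationTrainKernel<algorithmFPType, method2, cpu> {
public:
    // Toy "training": the maximum of the input values.
    algorithmFPType compute(const algorithmFPType* x, std::size_t n) const {
        assert(n > 0);
        algorithmFPType best = x[0];
        for (std::size_t i = 1; i < n; ++i)
            if (x[i] > best) best = x[i];
        return best;
    }
};
```

Because ``cpu`` is still a template parameter in both specializations, the same source can be instantiated once per CPU type, which is what the ``*_fpt_cpu.cpp`` files described below are for.
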
\*_impl.i
---------

In the directory structure of the ``Abc`` algorithm, there are files with a ``_impl.i`` suffix.
These files contain the implementations of the computational functions defined in the files with a ``_kernel.h`` suffix.
Here is a variant of the ``method1`` implementation of the ``Abc`` training algorithm that does not contain any
instruction-set-specific code. The implementation is located in the file ``abc_classification_train_method1_impl.i``:

.. include:: ../includes/cpu_features/abc-classification-train-method1-impl.rst

Although the implementation of ``method1`` does not contain any instruction-set-specific code, developers
are expected to leverage the SIMD-related macros available in |short_name|, for example, ``PRAGMA_IVDEP``,
``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED``, and other pragmas defined in
`service_defines.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h>`_.
These macros guide the compiler to generate more efficient code for the target architecture.

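The sketch below shows how such a macro is typically placed right before a loop. ``EXAMPLE_PRAGMA_IVDEP`` is a hypothetical macro modeled on ``PRAGMA_IVDEP``, not its definition from ``service_defines.h``: it expands to the ``ivdep`` hint for the Intel compilers, to ``GCC ivdep`` for GCC, and to nothing for other compilers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro modeled on PRAGMA_IVDEP: asks the compiler to ignore
// assumed loop-carried dependencies. _Pragma lets a macro emit a pragma.
#if defined(__INTEL_LLVM_COMPILER) || defined(__INTEL_COMPILER)
    #define EXAMPLE_PRAGMA_IVDEP _Pragma("ivdep")
#elif defined(__GNUC__) && !defined(__clang__)
    #define EXAMPLE_PRAGMA_IVDEP _Pragma("GCC ivdep")
#else
    #define EXAMPLE_PRAGMA_IVDEP  // no equivalent hint: expands to nothing
#endif

// y[i] += a * x[i]. The caller guarantees that x and y do not overlap,
// which is what makes the ivdep hint safe here.
void axpy(std::size_t n, double a, const double* x, double* y) {
    EXAMPLE_PRAGMA_IVDEP
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

The hint is only valid because ``x`` and ``y`` are guaranteed not to overlap; without that guarantee, telling the compiler to ignore assumed dependencies could produce wrong results.
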
Now suppose that the implementation of ``method2`` for the same algorithm is different and contains
AVX-512-specific code located in the ``cpuSpecificCode`` function. Note that all compiler-specific code should
be placed under compiler-specific defines. For example, code specific to the Intel\ |reg|\ oneAPI DPC++/C++ Compiler
should be placed under the ``DAAL_INTEL_CPP_COMPILER`` define. Likewise, all CPU-specific code should be placed under
CPU-specific defines. For example, AVX-512-specific code should be placed under
``__CPUID__(DAAL_CPU) == __avx512__``.

Then the implementation of ``method2`` in the file ``abc_classification_train_method2_impl.i`` may look like this:

.. include:: ../includes/cpu_features/abc-classification-train-method2-impl.rst

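The guard pattern can also be shown in a self-contained form. Since ``DAAL_INTEL_CPP_COMPILER``, ``__CPUID__`` and ``DAAL_CPU`` are only available inside the |short_name| build, this sketch uses the standard ``__AVX512F__`` macro as a stand-in; GCC, Clang and the Intel compilers define it when AVX-512F code generation is enabled. The ``cpuSpecificCode`` body (a sum of ``double`` values) is hypothetical. When the file is compiled without AVX-512 options, only the generic loop is compiled.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX512F__)
    #include <immintrin.h>
#endif

// Hypothetical cpuSpecificCode: sums n doubles.
double cpuSpecificCode(const double* x, std::size_t n) {
    double sum = 0.0;
    std::size_t i = 0;
#if defined(__AVX512F__)
    // AVX-512 path, 8 doubles per iteration. In oneDAL this branch would be
    // guarded by DAAL_INTEL_CPP_COMPILER and __CPUID__(DAAL_CPU) == __avx512__.
    __m512d acc = _mm512_setzero_pd();
    for (; i + 8 <= n; i += 8) {
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(x + i));
    }
    sum = _mm512_reduce_add_pd(acc);
#endif
    // Generic path: the whole array, or the tail left over by the AVX-512 loop.
    for (; i < n; ++i) {
        sum += x[i];
    }
    return sum;
}
```

Both branches compute the same sum (up to floating-point rounding order), so the generic loop doubles as the tail handler for the elements left over by the 8-wide AVX-512 loop.
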
\*_fpt_cpu.cpp
--------------

In the directory structure of the ``Abc`` algorithm, there are files with a ``_fpt_cpu.cpp`` suffix.
These files contain the instantiations of the template classes defined in the files with a ``_kernel.h`` suffix.
The instantiation of the ``Abc`` training algorithm kernel for ``method1`` is located in the file
``abc_classification_train_method1_batch_fpt_cpu.cpp``:

.. include:: ../includes/cpu_features/abc-classification-train-method1-fpt-cpu.rst

The ``_fpt_cpu.cpp`` files are not compiled into object files directly. First, multiple copies of each file
are made, replacing the ``fpt`` (which stands for "floating-point type") and ``cpu`` parts of the file name,
as well as the corresponding ``DAAL_FPTYPE`` and ``DAAL_CPU`` macros, with the actual data type and CPU type values.
The resulting files are then compiled with the appropriate CPU-specific optimization options.

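In essence, each generated copy fixes the data type and the CPU type and then explicitly instantiates the kernel for them. The sketch below imitates one such copy: ``EXAMPLE_FPTYPE`` and ``EXAMPLE_CPU`` are hypothetical stand-ins for ``DAAL_FPTYPE`` and ``DAAL_CPU``, given default values here and overridable with ``-D`` compiler options, and ``ScaleKernel`` is a toy kernel. See ``abc_classification_train_method1_batch_fpt_cpu.cpp`` above for the |short_name| form.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the values fixed in one generated copy
// (for example, the "dbl"/"hsw" copy). Overridable with -D options.
#ifndef EXAMPLE_FPTYPE
    #define EXAMPLE_FPTYPE double
#endif
#ifndef EXAMPLE_CPU
    #define EXAMPLE_CPU avx2
#endif

enum CpuType { sse2, sse42, avx2, avx512 };

// Toy kernel template; in oneDAL it would be declared in *_kernel.h
// and defined in *_impl.i.
template <typename algorithmFPType, CpuType cpu>
struct ScaleKernel {
    void compute(algorithmFPType* x, std::size_t n, algorithmFPType factor) const {
        for (std::size_t i = 0; i < n; ++i) x[i] *= factor;
    }
};

// What the copy contributes: an explicit instantiation for exactly the
// data type and CPU type fixed by the macros above.
template struct ScaleKernel<EXAMPLE_FPTYPE, EXAMPLE_CPU>;
```

Compiling the same source several times with different macro values and CPU-specific compiler options yields one object file per combination, each containing only its own instantiation.
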
The replacement values for the ``fpt`` part of the file name are:

- ``flt`` for the ``float`` data type, and
- ``dbl`` for the ``double`` data type.

The replacement values for the ``DAAL_FPTYPE`` macro are ``float`` and ``double``, respectively.

The replacement values for the ``cpu`` part of the file name are:

- ``nrh`` for the Intel\ |reg|\ SSE2 architecture, which stands for Northwood,
- ``neh`` for the Intel\ |reg|\ SSE4.2 architecture, which stands for Nehalem,
- ``hsw`` for the Intel\ |reg|\ AVX2 architecture, which stands for Haswell,
- ``skx`` for the Intel\ |reg|\ AVX-512 architecture, which stands for Skylake-X.

The replacement values for the ``DAAL_CPU`` macro are:

- ``__sse2__`` for the Intel\ |reg|\ SSE2 architecture,
- ``__sse42__`` for the Intel\ |reg|\ SSE4.2 architecture,
- ``__avx2__`` for the Intel\ |reg|\ AVX2 architecture,
- ``__avx512__`` for the Intel\ |reg|\ AVX-512 architecture.

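The renaming rule can be sketched as follows. The hypothetical ``expandFptCpu`` function only reproduces the substitutions listed above (file name parts and macro values); it does not show how the Makefile or Bazel build actually performs them.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// (file name part, DAAL_FPTYPE value) pairs from the lists above.
const std::vector<std::pair<std::string, std::string>> kFpTypes = {
    {"flt", "float"}, {"dbl", "double"}};

// (file name part, DAAL_CPU value) pairs from the lists above.
const std::vector<std::pair<std::string, std::string>> kCpus = {
    {"nrh", "__sse2__"}, {"neh", "__sse42__"}, {"hsw", "__avx2__"}, {"skx", "__avx512__"}};

struct GeneratedCopy {
    std::string fileName;  // e.g. abc_..._batch_dbl_hsw.cpp
    std::string fpType;    // value substituted for DAAL_FPTYPE
    std::string cpu;       // value substituted for DAAL_CPU
};

// Hypothetical helper: expands one *_fpt_cpu.cpp name into all its copies.
std::vector<GeneratedCopy> expandFptCpu(const std::string& fileName) {
    const std::string suffix = "_fpt_cpu.cpp";
    assert(fileName.size() > suffix.size() &&
           fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) == 0);
    const std::string stem = fileName.substr(0, fileName.size() - suffix.size());

    std::vector<GeneratedCopy> copies;
    for (const auto& fpt : kFpTypes) {
        for (const auto& cpu : kCpus) {
            copies.push_back({stem + "_" + fpt.first + "_" + cpu.first + ".cpp",
                              fpt.second, cpu.second});
        }
    }
    return copies;
}
```

For ``abc_classification_train_method1_batch_fpt_cpu.cpp`` it produces eight copies, from ``abc_classification_train_method1_batch_flt_nrh.cpp`` to ``abc_classification_train_method1_batch_dbl_skx.cpp``.
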
Build System Configuration
**************************

This section describes which parts of the build system need to be modified to add a new architectural
extension or to remove an outdated one.

Makefile
--------

The most important definitions and functions for CPU features dispatching are located in the files
|32e_make|_ for the x86-64 architecture, |riscv_make|_ for the RISC-V 64-bit architecture, and |arm_make|_
for the ARM architecture.
Those files are included into operating-system-specific files.
For example, the |32e_make| file is included into the ``lnx32e.mk`` file:

::

    include dev/make/function_definitions/32e.mk

In turn, ``lnx32e.mk`` and similar files are included into the main Makefile:

::

    include dev/make/function_definitions/$(PLAT).mk

Here, ``$(PLAT)`` is the platform name, for example, ``lnx32e``, ``win32e``, ``lnxriscv64``, etc.

To add a new architectural extension to the |32e_make| file, update the ``CPUs`` and ``CPUs.files`` lists.
Functions such as ``set_uarch_options_for_compiler`` should also be updated accordingly.

The compiler options for the new architectural extension should be added to the respective file in the
`compiler_definitions <https://github.com/oneapi-src/oneDAL/tree/main/dev/make/compiler_definitions>`_ folder.

For example, the `gnu.32e.mk <https://github.com/oneapi-src/oneDAL/blob/main/dev/make/compiler_definitions/gnu.32e.mk>`_
file contains the GNU compiler options for the x86-64 architecture in the form
``option_name.compiler_name``:

::

    p4_OPT.gnu = $(-Q)march=nocona
    mc3_OPT.gnu = $(-Q)march=corei7
    avx2_OPT.gnu = $(-Q)march=haswell
    skx_OPT.gnu = $(-Q)march=skylake

Bazel
-----

Currently, the Bazel build is supported only for the Linux x86-64 platform.
It provides the ``cpu`` `option <https://github.com/oneapi-src/oneDAL/tree/main/dev/bazel#bazel-options>`_
that allows you to specify the list of target architectural extensions.

To add a new architectural extension to the Bazel configuration, perform the following steps:

- Add the new extension to the list of allowed values in the ``_ISA_EXTENSIONS`` variable in the
  `config.bzl <https://github.com/oneapi-src/oneDAL/blob/main/dev/bazel/config/config.bzl>`_ file;
- Update the ``get_cpu_flags`` function in the
  `flags.bzl <https://github.com/oneapi-src/oneDAL/blob/main/dev/bazel/flags.bzl>`_
  file to provide the compiler flags for the new extension;
- Update the ``cpu_defines`` dictionaries in
  `dal.bzl <https://github.com/oneapi-src/oneDAL/blob/main/dev/bazel/dal.bzl>`_ and
  `daal.bzl <https://github.com/oneapi-src/oneDAL/blob/main/dev/bazel/daal.bzl>`_ files accordingly.
@@ -19,20 +19,20 @@
Threading Layer
^^^^^^^^^^^^^^^

|short_name| uses Intel\ |reg|\ oneAPI Threading Building Blocks (Intel\ |reg|\ oneTBB) to do parallel
computations on CPU.

However, oneTBB is not used directly in the code of |short_name| algorithms. Instead, the algorithms
use custom primitives that either wrap oneTBB functionality or are developed in-house.
Those primitives form |short_name|'s threading layer.

This is done to avoid depending on possible oneTBB API changes, or even
on a particular threading technology such as oneTBB, C++11 standard threads, etc.

The API of the layer is defined in
`threading.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/threading/threading.h>`_.
Please be aware that the threading API is not a part of the |short_name| product API.
It is an internal product API intended to be used only by |short_name| developers, and it can be changed at any time
without any prior notification.

This chapter describes common parallel patterns and primitives of the threading layer.
@@ -46,7 +46,7 @@ Here is a variant of sequential implementation:

.. include:: ../includes/threading/sum-sequential.rst

There are several options available in the threading layer of |short_name| to let the iterations of this code
run in parallel.
One of the options is to use ``daal::threader_for`` as shown here:

@@ -59,10 +59,10 @@ Blocking
--------

To have more control over the parallel execution and to increase
`cache locality <https://en.wikipedia.org/wiki/Locality_of_reference>`_, |short_name| usually splits
the data into blocks and then processes those blocks in parallel.

This code shows what a typical parallel loop in |short_name| looks like:

.. include:: ../includes/threading/sum-parallel-by-blocks.rst

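Because the threading layer is internal to |short_name|, the blocking pattern is sketched here with standard C++ threads instead. ``parallelForSketch`` is a hypothetical stand-in for ``daal::threader_for`` that simply starts one ``std::thread`` per block, whereas |short_name|'s threader relies on oneTBB. Each block writes only its own slot of ``partial``, so no synchronization is needed before the final sequential reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for daal::threader_for: calls body(iBlock) for every
// block index, one std::thread per block (fine for a sketch, not for production).
template <typename F>
void parallelForSketch(std::size_t nBlocks, const F& body) {
    std::vector<std::thread> workers;
    workers.reserve(nBlocks);
    for (std::size_t iBlock = 0; iBlock < nBlocks; ++iBlock) {
        workers.emplace_back([&body, iBlock] { body(iBlock); });
    }
    for (auto& w : workers) w.join();
}

// Sum of an array computed block by block.
double sumByBlocks(const std::vector<double>& x, std::size_t blockSize) {
    assert(blockSize > 0);
    const std::size_t n = x.size();
    const std::size_t nBlocks = (n + blockSize - 1) / blockSize;  // last block may be shorter
    std::vector<double> partial(nBlocks, 0.0);

    parallelForSketch(nBlocks, [&](std::size_t iBlock) {
        const std::size_t begin = iBlock * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        double s = 0.0;
        for (std::size_t i = begin; i < end; ++i) s += x[i];
        partial[iBlock] = s;  // each block owns one slot: no data race
    });

    double total = 0.0;
    for (double s : partial) total += s;  // sequential reduction
    return total;
}
```

The block size is the main tuning knob: larger blocks reduce scheduling overhead, while smaller blocks give better load balancing.
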
@@ -92,7 +92,7 @@ Checking the status right after the initialization code won't show the allocatio
because oneTBB uses lazy evaluation and the lambda function passed to the constructor of the TLS
is evaluated on first use of the thread-local storage (TLS).

There are several options available in the threading layer of |short_name| to compute the partial
dot product results in each thread.
One of the options is to use the already mentioned ``daal::threader_for`` together with the blocking approach
as shown here:
@@ -126,7 +126,7 @@ is more performant to use predefined mapping of the loop's iterations to threads
This is what static work scheduling does.

``daal::static_threader_for`` and ``daal::static_tls`` allow implementation of static
work scheduling within |short_name|.
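
As a rough, self-contained illustration of static scheduling (not the implementation of ``daal::static_threader_for``), the hypothetical ``staticDotProduct`` below fixes the mapping of iterations to threads before the loop starts: thread ``t`` always processes the same contiguous chunk and accumulates into ``partial[t]``, an array that plays a role similar to per-thread storage such as ``daal::static_tls``.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical static scheduling sketch: iterations [0, n) are split into
// nThreads contiguous chunks before the loop starts; chunk t always goes to
// thread t, and thread t writes only partial[t].
double staticDotProduct(const std::vector<double>& x, const std::vector<double>& y,
                        std::size_t nThreads) {
    assert(x.size() == y.size() && nThreads > 0);
    const std::size_t n = x.size();
    const std::size_t chunk = (n + nThreads - 1) / nThreads;
    std::vector<double> partial(nThreads, 0.0);  // one accumulator per thread

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, n);
            const std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i < end; ++i) partial[t] += x[i] * y[i];
        });
    }
    for (auto& w : workers) w.join();

    double result = 0.0;
    for (double p : partial) result += p;  // combine per-thread results
    return result;
}
```

Since the mapping does not depend on timing, repeated runs with the same ``nThreads`` combine the partial results in the same order.
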

Here is a variant of parallel dot product computation with static scheduling:

@@ -135,7 +135,7 @@ Here is a variant of parallel dot product computation with static scheduling:
Nested Parallelism
******************

|short_name| supports nested parallel loops.
It is important to know that:

"when a parallel construct calls another parallel construct, a thread can obtain a task
@@ -154,13 +154,13 @@ oneTBB provides ways to isolate execution of a parallel construct, for its tasks
to not interfere with other simultaneously running tasks.

Those options are preferred when the parallel loops are initially written as nested.
However, in |short_name| there are cases when one parallel algorithm, the outer one,
calls another parallel algorithm, the inner one, within a parallel region.

The inner algorithm in this case can also be called on its own, without additional nesting,
and we do not always want to make it isolated.

For cases like that, |short_name| provides ``daal::ls``. Its ``local()`` method always
returns the same value for the same thread, regardless of the nested execution:

.. include:: ../includes/threading/nested-parallel-ls.rst