Commit
Replace oneDAL with |short_name| to align with other .rst files
Vika-F committed Oct 23, 2024
1 parent af7669a commit 1ac2d3b
Showing 2 changed files with 32 additions and 32 deletions.
38 changes: 19 additions & 19 deletions docs/source/contribution/cpu_features.rst
@@ -19,15 +19,15 @@
CPU Features Dispatching
^^^^^^^^^^^^^^^^^^^^^^^^

-For each algorithm oneDAL provides several code paths for x86-64-compatible instruction
+For each algorithm |short_name| provides several code paths for x86-64-compatible instruction
set architectures.

The following architectures are currently supported:

-- Intel\ |reg| Streaming SIMD Extensions 2 (Intel\ |reg| SSE2)
-- Intel\ |reg| Streaming SIMD Extensions 4.2 (Intel\ |reg| SSE4.2)
-- Intel\ |reg| Advanced Vector Extensions 2 (Intel\ |reg| AVX2)
-- Intel\ |reg| Advanced Vector Extensions 512 (Intel\ |reg| AVX-512)
+- Intel\ |reg|\ Streaming SIMD Extensions 2 (Intel\ |reg|\ SSE2)
+- Intel\ |reg|\ Streaming SIMD Extensions 4.2 (Intel\ |reg|\ SSE4.2)
+- Intel\ |reg|\ Advanced Vector Extensions 2 (Intel\ |reg|\ AVX2)
+- Intel\ |reg|\ Advanced Vector Extensions 512 (Intel\ |reg|\ AVX-512)

The particular code path is chosen at runtime based on the underlying hardware characteristics.

@@ -36,9 +36,9 @@ This chapter describes how the code is organized to support this variety of inst
Algorithm Implementation Options
********************************

-In addition to the instruction set architectures, an algorithm in oneDAL may have various
+In addition to the instruction set architectures, an algorithm in |short_name| may have various
implementation options. Below is a description of these options to help you better understand
-the oneDAL code structure and conventions.
+the |short_name| code structure and conventions.

Computational Tasks
-------------------
@@ -66,14 +66,14 @@ methods for algorithm training and inference.
Computational Modes
-------------------

-oneDAL can provide several computational modes for an algorithm.
+|short_name| can provide several computational modes for an algorithm.
See `Computational Modes <https://oneapi-src.github.io/oneDAL/onedal/programming-model/computational-modes.html>`_
chapter for details.

Folders and Files
*****************

-Suppose that you are working on some algorithm ``Abc`` in oneDAL.
+Suppose that you are working on some algorithm ``Abc`` in |short_name|.

The part of the implementation of this algorithm that runs on CPU should be located in the
`cpp/daal/src/algorithms/abc` folder.
@@ -166,14 +166,14 @@ instruction set specific code. The implementation is located in the file `abc_cl
.. include:: ../includes/cpu_features/abc-classification-train-method1-impl.rst

Although the implementation of the ``method1`` does not contain any instruction set specific code, it is
-expected that the developers leverage SIMD related macros available in oneDAL.
+expected that the developers leverage SIMD related macros available in |short_name|.
For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and other pragmas defined in
`service_defines.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h>`_.
This will guide the compiler to generate more efficient code for the target architecture.
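As an illustration, a ``method1``-style kernel with these hints might look like the sketch below. The macro definitions here are simplified stand-ins added only to make the sketch self-contained; they are not the actual definitions from `service_defines.h`:

```cpp
#include <cstddef>

// Simplified stand-ins for the SIMD-related macros. The real ones in
// service_defines.h select compiler-specific pragmas; on non-Intel
// compilers this sketch simply expands them to nothing.
#if defined(__INTEL_LLVM_COMPILER) || defined(__INTEL_COMPILER)
    #define PRAGMA_IVDEP         _Pragma("ivdep")
    #define PRAGMA_VECTOR_ALWAYS _Pragma("vector always")
#else
    #define PRAGMA_IVDEP
    #define PRAGMA_VECTOR_ALWAYS
#endif

// A method1-style kernel: portable C++ with vectorization hints,
// no instruction-set-specific intrinsics.
double sumOfSquares(const double * x, std::size_t n)
{
    double s = 0.0;
    PRAGMA_IVDEP
    PRAGMA_VECTOR_ALWAYS
    for (std::size_t i = 0; i < n; ++i)
    {
        s += x[i] * x[i];
    }
    return s;
}
```

The hints do not change the result of the loop; they only let a supporting compiler vectorize it more aggressively.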

Consider that the implementation of the ``method2`` for the same algorithm will be different and will contain
AVX-512-specific code located in ``cpuSpecificCode`` function. Note that all the compiler-specific code should
-be placed under compiler-specific defines. For example, the Intel\ |reg| oneAPI DPC++/C++ Compiler specific code
+be placed under compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under
CPU-specific defines. For example, the AVX-512 specific code should be placed under
``__CPUID__(DAAL_CPU) == __avx512__``.
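The guarding pattern can be sketched like this. All the macro definitions below are hypothetical placeholders standing in for the real ``__CPUID__``/``DAAL_CPU`` machinery, added only so the sketch compiles on its own:

```cpp
#include <cstddef>

// Hypothetical placeholders; the real dispatching macros live in the
// oneDAL sources and may differ.
#define __sse2__   0
#define __avx512__ 5
#ifndef DAAL_CPU
    #define DAAL_CPU __sse2__ // the build system would define the target CPU
#endif
#define __CPUID__(cpu) (cpu)

float cpuSpecificCode(const float * x, std::size_t n)
{
    float s = 0.0f;
#if __CPUID__(DAAL_CPU) == __avx512__
    // The AVX-512-specific implementation would go here, additionally
    // guarded by a compiler check such as DAAL_INTEL_CPP_COMPILER.
    for (std::size_t i = 0; i < n; ++i) s += x[i];
#else
    // Generic path compiled for the other architectures.
    for (std::size_t i = 0; i < n; ++i) s += x[i];
#endif
    return s;
}
```

Only one of the two branches survives preprocessing, so each per-CPU object file contains exactly one code path.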
@@ -205,14 +205,14 @@ The values for ``DAAL_FPTYPE`` macro replacement are ``float`` and ``double``, r

The values for ``cpu`` file name part replacement are:

-- ``nrh`` for Intel\ |reg| SSE2 architecture, which stands for Northwood,
-- ``neh`` for Intel\ |reg| SSE4.2 architecture, which stands for Nehalem,
-- ``hsw`` for Intel\ |reg| AVX2 architecture, which stands for Haswell,
-- ``skx`` for Intel\ |reg| AVX-512 architecture, which stands for Skylake-X.
+- ``nrh`` for Intel\ |reg|\ SSE2 architecture, which stands for Northwood,
+- ``neh`` for Intel\ |reg|\ SSE4.2 architecture, which stands for Nehalem,
+- ``hsw`` for Intel\ |reg|\ AVX2 architecture, which stands for Haswell,
+- ``skx`` for Intel\ |reg|\ AVX-512 architecture, which stands for Skylake-X.

The values for ``DAAL_CPU`` macro replacement are:

-- ``__sse2__`` for Intel\ |reg| SSE2 architecture,
-- ``__sse42__`` for Intel\ |reg| SSE4.2 architecture,
-- ``__avx2__`` for Intel\ |reg| AVX2 architecture,
-- ``__avx512__`` for Intel\ |reg| AVX-512 architecture.
+- ``__sse2__`` for Intel\ |reg|\ SSE2 architecture,
+- ``__sse42__`` for Intel\ |reg|\ SSE4.2 architecture,
+- ``__avx2__`` for Intel\ |reg|\ AVX2 architecture,
+- ``__avx512__`` for Intel\ |reg|\ AVX-512 architecture.
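The instantiation pattern behind these file names can be sketched as follows. The ``mean`` kernel is hypothetical; only the ``DAAL_FPTYPE`` convention comes from the description above:

```cpp
#include <cstddef>

// A template kernel written once over the floating-point type, the way
// oneDAL kernels are parameterized by algorithmFPType.
template <typename algorithmFPType>
algorithmFPType mean(const algorithmFPType * x, std::size_t n)
{
    algorithmFPType s = algorithmFPType(0);
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s / algorithmFPType(n);
}

#ifndef DAAL_FPTYPE
    #define DAAL_FPTYPE float // normally defined by the build system
#endif

// Explicit instantiation, as a *_fpt_cpu.cpp file would contain: the same
// source is compiled once per DAAL_FPTYPE value and per target CPU.
template DAAL_FPTYPE mean<DAAL_FPTYPE>(const DAAL_FPTYPE *, std::size_t);
```

Compiling this translation unit once with ``-DDAAL_FPTYPE=float`` and once with ``-DDAAL_FPTYPE=double`` yields the two precision variants without duplicating the kernel source.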
26 changes: 13 additions & 13 deletions docs/source/contribution/threading.rst
@@ -19,20 +19,20 @@
Threading Layer
^^^^^^^^^^^^^^^

-oneDAL uses Intel\ |reg|\ oneAPI Threading Building Blocks (Intel\ |reg|\ oneTBB) to do parallel
+|short_name| uses Intel\ |reg|\ oneAPI Threading Building Blocks (Intel\ |reg|\ oneTBB) to do parallel
computations on CPU.

-But oneTBB is not used in the code of oneDAL algorithms directly. The algorithms rather
+But oneTBB is not used in the code of |short_name| algorithms directly. The algorithms rather
use custom primitives that either wrap oneTBB functionality or are in-house developed.
-Those primitives form oneDAL's threading layer.
+Those primitives form |short_name|'s threading layer.

This is done to avoid a dependency on possible oneTBB API changes, and even
on the particular threading technology, such as oneTBB, C++11 standard threads, etc.

The API of the layer is defined in
`threading.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/threading/threading.h>`_.
-Please be aware that the threading API is not a part of oneDAL product API.
-This is a product internal API that is aimed to be used only by oneDAL developers, and can be changed at any time
+Please be aware that the threading API is not a part of |short_name| product API.
+This is a product internal API that is aimed to be used only by |short_name| developers, and can be changed at any time
without any prior notification.
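To make the idea concrete, here is a minimal, hypothetical analogue of a threading-layer primitive built on C++11 ``std::thread``. It is not the real `threading.h` API; it only shows how algorithm code can stay agnostic of the backend:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Algorithm code calls threader_for() and never touches the backend
// directly. Here the backend is std::thread; in the real threading
// layer it is oneTBB, and it could be swapped without touching callers.
void threader_for(std::size_t n, const std::function<void(std::size_t)> & body)
{
    const std::size_t nThreads =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t)
    {
        // Each worker processes a strided subset of the iteration space.
        workers.emplace_back([=, &body]() {
            for (std::size_t i = t; i < n; i += nThreads) body(i);
        });
    }
    for (auto & w : workers) w.join();
}
```

A caller would write ``threader_for(n, [&](std::size_t i) { /* process item i */ });`` and remain unaware of which threading technology runs underneath.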

This chapter describes common parallel patterns and primitives of the threading layer.
@@ -46,7 +46,7 @@ Here is a variant of sequential implementation:

.. include:: ../includes/threading/sum-sequential.rst

-There are several options available in the threading layer of oneDAL to let the iterations of this code
+There are several options available in the threading layer of |short_name| to let the iterations of this code
run in parallel.
One of the options is to use ``daal::threader_for`` as shown here:

@@ -59,10 +59,10 @@
--------

To have more control over the parallel execution and to increase
-`cache locality <https://en.wikipedia.org/wiki/Locality_of_reference>`_ oneDAL usually splits
+`cache locality <https://en.wikipedia.org/wiki/Locality_of_reference>`_ |short_name| usually splits
the data into blocks and then processes those blocks in parallel.

-This code shows what a typical parallel loop in oneDAL looks like:
+This code shows what a typical parallel loop in |short_name| looks like:

.. include:: ../includes/threading/sum-parallel-by-blocks.rst

@@ -92,7 +92,7 @@ Checking the status right after the initialization code won't show the allocatio
because oneTBB uses lazy evaluation and the lambda function passed to the constructor of the TLS
is evaluated on first use of the thread-local storage (TLS).
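The lazy-evaluation behavior can be illustrated with this single-threaded sketch. The ``lazy_slot`` class is hypothetical, not the oneDAL TLS type; it only demonstrates that the initializer passed to the constructor runs on the first ``local()`` call rather than in the constructor:

```cpp
#include <functional>
#include <memory>

// Demonstrates the lazy-initialization contract described above.
template <typename T>
class lazy_slot
{
public:
    explicit lazy_slot(std::function<T()> init) : init_(std::move(init)) {}

    T & local()
    {
        if (!value_) value_.reset(new T(init_())); // evaluated on first use
        return *value_;
    }

    bool initialized() const { return static_cast<bool>(value_); }

private:
    std::function<T()> init_;
    std::unique_ptr<T> value_;
};
```

This is why a status check placed right after constructing the storage cannot observe an allocation failure inside the initializer: the initializer has not run yet.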

-There are several options available in the threading layer of oneDAL to compute the partial
+There are several options available in the threading layer of |short_name| to compute the partial
dot product results at each thread.
One of the options is to use the already mentioned ``daal::threader_for`` and blocking approach
as shown here:
@@ -126,7 +126,7 @@ is more performant to use predefined mapping of the loop's iterations to threads
This is what static work scheduling does.

``daal::static_threader_for`` and ``daal::static_tls`` allow implementation of static
work scheduling within oneDAL.
work scheduling within |short_name|.

Here is a variant of parallel dot product computation with static scheduling:

@@ -135,7 +135,7 @@ Here is a variant of parallel dot product computation with static scheduling:
Nested Parallelism
******************

-oneDAL supports nested parallel loops.
+|short_name| supports nested parallel loops.
It is important to know that:

"when a parallel construct calls another parallel construct, a thread can obtain a task
@@ -154,13 +154,13 @@ oneTBB provides ways to isolate execution of a parallel construct, for its tasks
to not interfere with other simultaneously running tasks.

Those options are preferred when the parallel loops are initially written as nested.
-But in oneDAL there are cases when one parallel algorithm, the outer one,
+But in |short_name| there are cases when one parallel algorithm, the outer one,
calls another parallel algorithm, the inner one, within a parallel region.

The inner algorithm in this case can also be called solely, without additional nesting.
And we do not always want to make it isolated.

-For the cases like that, oneDAL provides ``daal::ls``. Its ``local()`` method always
+For the cases like that, |short_name| provides ``daal::ls``. Its ``local()`` method always
returns the same value for the same thread, regardless of the nested execution:

.. include:: ../includes/threading/nested-parallel-ls.rst
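For intuition, the ``local()`` contract can be sketched like this hypothetical illustration keyed by OS thread id (the actual ``daal::ls`` implementation may differ):

```cpp
#include <map>
#include <mutex>
#include <thread>

// Sketch of the daal::ls idea: local() always returns the same slot for
// the same OS thread, regardless of nested parallel regions, unlike a
// TLS bound to the task scheduler's notion of a worker.
template <typename T>
class ls
{
public:
    T & local()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return storage_[std::this_thread::get_id()]; // one slot per OS thread
    }

private:
    std::mutex mutex_;
    std::map<std::thread::id, T> storage_;
};
```

Because the key is the thread's identity rather than a scheduler-assigned slot, a nested parallel region entered by the same thread still observes the same value.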
