Apply review comments to threading.rst
Vika-F committed Aug 13, 2024
1 parent 267e7b6 commit 74be85a
Showing 1 changed file with 15 additions and 15 deletions.
docs/source/contribution/threading.rst
@@ -27,12 +27,13 @@ use custom primitives that either wrap oneTBB functionality or are in-house deve
Those primitives form oneDAL's threading layer.

This is done in order not to depend on possible oneTBB API changes, or even
on a particular threading technology such as oneTBB or C++11 standard threads.

The API of the layer is defined in
`threading.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/threading/threading.h>`_.
Please be aware that the threading API is not part of the oneDAL product API.
It is an internal API intended only for oneDAL developers, and it can be changed at any time
without prior notification.

This chapter describes common parallel patterns and primitives of the threading layer.

@@ -52,8 +53,7 @@ One of the options is to use ``daal::threader_for`` as shown here:
.. include:: ../includes/threading/sum-parallel.rst

The iteration space here goes from ``0`` to ``n-1``.
The last argument is a function object that performs a single iteration of the loop, given the loop index ``i``.
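Below is a minimal sketch of the same loop shape built on C++11 standard threads. It leaves out the oneDAL internals. The helper name ``parallel_for_sketch`` is hypothetical and is not part of oneDAL. It only mirrors the idea that the last argument is a function object called once per loop index ``i``. The sketch also assumes that workers claim indices one at a time from a shared atomic counter, and ``daal::threader_for`` may distribute work differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (NOT the oneDAL API): calls func(i) once for every
// i in [0, n) using C++11 standard threads.
template <typename F>
void parallel_for_sketch(std::size_t n, const F & func, unsigned nThreads = std::thread::hardware_concurrency())
{
    if (nThreads == 0) nThreads = 1; // hardware_concurrency() may report 0

    std::atomic<std::size_t> next{0}; // next loop index that has not been claimed yet
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t)
    {
        workers.emplace_back([&]() {
            // fetch_add gives each index to exactly one worker
            for (std::size_t i = next.fetch_add(1); i < n; i = next.fetch_add(1))
            {
                func(i); // the function object performs iteration i
            }
        });
    }
    for (auto & w : workers) w.join(); // wait for all iterations to finish
}
```

For example, ``parallel_for_sketch(n, [&](std::size_t i) { c[i] = a[i] + b[i]; });`` fills ``c`` in parallel, because each ``i`` is written by exactly one thread. Claiming single indices keeps the sketch short. Practical implementations usually hand out larger chunks to reduce contention on the shared counter.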

Blocking
--------
@@ -76,37 +76,37 @@ Here is a variant of sequential implementation:

Parallel computations can be performed in two steps:

1. Compute a partial dot product in each thread.
2. Perform a reduction: Add the partial results from all threads to compute the final dot product.
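The two steps can be sketched with C++11 standard threads as follows. This is an illustration under stated assumptions and not the oneDAL implementation. The function name ``dot_parallel_sketch`` and the ``blockSize`` parameter are hypothetical. A plain ``std::vector`` with one slot per worker stands in for ``daal::tls``, and blocks are claimed from a shared atomic counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of the two-step scheme with C++11 threads (NOT the oneDAL API).
// Step 1: each worker accumulates a partial dot product over blocks of
//         blockSize elements claimed from a shared counter.
// Step 2: the per-worker partial results are reduced into the final value.
double dot_parallel_sketch(const std::vector<double> & x, const std::vector<double> & y,
                           unsigned nThreads, std::size_t blockSize = 256)
{
    assert(x.size() == y.size() && nThreads > 0 && blockSize > 0);
    const std::size_t n       = x.size();
    const std::size_t nBlocks = (n + blockSize - 1) / blockSize;

    std::vector<double> partial(nThreads, 0.0); // one slot per worker
    std::atomic<std::size_t> nextBlock{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t)
    {
        workers.emplace_back([&, t]() {
            double local = 0.0; // accumulate in a register, write the slot once
            for (std::size_t b = nextBlock.fetch_add(1); b < nBlocks; b = nextBlock.fetch_add(1))
            {
                const std::size_t begin = b * blockSize;
                const std::size_t end   = std::min(n, begin + blockSize);
                for (std::size_t i = begin; i < end; ++i) local += x[i] * y[i];
            }
            partial[t] = local; // step 1 result of worker t
        });
    }
    for (auto & w : workers) w.join();

    // Step 2: reduction over the partial results of all workers
    double result = 0.0;
    for (double p : partial) result += p;
    return result;
}
```

Each worker writes its slot only once, so neighbouring slots are not updated repeatedly from different threads (false sharing). The floating-point result may differ in the last bits from a sequential loop, because the summation order depends on scheduling.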

``daal::tls`` provides thread-local storage where each thread can accumulate its own results.
The following code allocates memory to store the partial dot product of each thread:

.. include:: ../includes/threading/dot-parallel-init-tls.rst

``SafeStatus`` in this code denotes a thread-safe counterpart of the ``Status`` class.
``SafeStatus`` collects errors from all threads and reports them to the user through the
``detach()`` method. An example is shown later in the documentation.
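The following is a minimal thread-safe error collector built on a ``std::mutex``, as an illustration of the idea. The class ``SafeStatusSketch``, its ``add()`` method and the helper ``collect_errors_example`` are all hypothetical. Only the name ``detach()`` is borrowed from the text above, and the real ``SafeStatus`` interface may differ.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread-safe status (NOT oneDAL's SafeStatus):
// any thread may record an error, and detach() hands all collected errors
// to the caller once the parallel region is over.
class SafeStatusSketch
{
public:
    void add(std::string error)
    {
        std::lock_guard<std::mutex> lock(mutex_); // serialize concurrent writers
        errors_.push_back(std::move(error));
    }

    // Returns the collected errors and leaves the object empty.
    std::vector<std::string> detach()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::string> result;
        result.swap(errors_);
        return result;
    }

private:
    std::mutex mutex_;
    std::vector<std::string> errors_;
};

// Example: workers with an odd index report a simulated failure.
std::vector<std::string> collect_errors_example(unsigned nThreads)
{
    SafeStatusSketch status;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t)
    {
        workers.emplace_back([&status, t]() {
            if (t % 2 == 1) status.add("worker " + std::to_string(t) + " failed");
        });
    }
    for (auto & w : workers) w.join();
    return status.detach(); // errors from all threads, reported once
}
```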

Checking the status right after the initialization code won't reveal allocation errors,
because oneTBB uses lazy evaluation: the lambda function passed to the constructor of the
thread-local storage (TLS) is evaluated only when the TLS is first used.
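The following sketch shows why lazy initialization delays allocation errors. ``LazyTlsSketch`` is a hypothetical class and is neither ``daal::tls`` nor the oneTBB implementation. It stores one value per ``std::thread::id`` in a mutex-protected map and calls the init function only when a thread first calls ``local()``. Any failure inside the init function, such as ``std::bad_alloc``, therefore shows up at that first call rather than in the constructor.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical lazily initialized thread-local storage (NOT daal::tls).
// The constructor only stores init; init runs when a thread first calls local().
template <typename T>
class LazyTlsSketch
{
public:
    explicit LazyTlsSketch(std::function<T()> init) : init_(std::move(init)) {}

    // Returns the calling thread's value, creating it on first use.
    // If init throws (e.g. std::bad_alloc), the error surfaces here, not in the constructor.
    T & local()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<T> & slot = values_[std::this_thread::get_id()];
        if (!slot) slot.reset(new T(init_())); // lazy initialization
        return *slot;
    }

    // Visits each thread's value, e.g. to reduce partial results.
    // Call it only after all worker threads have finished.
    template <typename F>
    void reduce(F && f)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto & kv : values_)
        {
            if (kv.second) f(*kv.second);
        }
    }

private:
    std::function<T()> init_;
    std::mutex mutex_;
    std::unordered_map<std::thread::id, std::unique_ptr<T>> values_;
};
```

Taking the lock on every ``local()`` call keeps the sketch simple but would be slow in a hot loop. A thread can instead fetch the reference once and reuse it.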

The threading layer of oneDAL offers several options to compute the partial
dot product in each thread.
One option is to combine the previously mentioned ``daal::threader_for`` with the blocking approach,
as shown here:

.. include:: ../includes/threading/dot-parallel-partial-compute.rst

To compute the final result, the partial results of all threads must be reduced,
as shown here:

.. include:: ../includes/threading/dot-parallel-reduction.rst

The threads' local memory should be released when it is no longer needed.

The complete parallel version of the dot product computation looks like this:

.. include:: ../includes/threading/dot-parallel.rst

@@ -122,7 +122,7 @@ This strategy is beneficial when it is difficult to estimate the amount of work
performed by each iteration.

When the iterations are known to perform equal amounts of work, it
is more efficient to use a predefined mapping of the loop's iterations to threads.
This is what static work scheduling does.
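Static scheduling can be sketched with C++11 standard threads by fixing the mapping of iterations to threads before the loop starts. The helper ``static_for_sketch`` is hypothetical and does not use the ``daal::static_threader_for`` signature. The sketch assumes that thread ``t`` of ``T`` receives the contiguous range ``[t*n/T, (t+1)*n/T)``. The body also receives the thread index, so per-thread data can live in a plain array indexed by ``t``. Whether ``daal::static_tls`` works exactly this way is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (NOT daal::static_threader_for): the mapping of iterations
// to threads is fixed before the loop starts. Thread t of nThreads processes the
// contiguous range [t*n/nThreads, (t+1)*n/nThreads), and the body receives the
// thread index so per-thread data can live in an array indexed by t.
template <typename F>
void static_for_sketch(std::size_t n, unsigned nThreads, const F & func)
{
    assert(nThreads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t)
    {
        const std::size_t begin = n * t / nThreads;
        const std::size_t end   = n * (t + 1) / nThreads;
        workers.emplace_back([&func, begin, end, t]() {
            for (std::size_t i = begin; i < end; ++i) func(i, t); // iteration i always runs on thread t
        });
    }
    for (auto & w : workers) w.join();
}
```

With ``n = 10`` and four threads, the threads process 2, 3, 2 and 3 iterations respectively, and every run assigns the same iterations to the same threads. No shared counter is involved, so the mapping is cheap when all iterations cost the same. It can leave threads idle when iteration costs vary.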

``daal::static_threader_for`` and ``daal::static_tls`` allow implementation of static