Apply review comments to threading.rst

oneapi-src · Aug 13, 2024 · 74be85a
1 parent 267e7b6
Showing 1 changed file with 15 additions and 15 deletions.
diff --git a/docs/source/contribution/threading.rst b/docs/source/contribution/threading.rst
@@ -27,12 +27,13 @@ use custom primitives that either wrap oneTBB functionality or are in-house deve
 Those primitives form oneDAL's threading layer.
 
 This is done in order not to be dependent on possible oneTBB API changes and even
-on the particular threading technology.
+on a particular threading technology, such as oneTBB or C++11 standard threads.
 
 The API of the layer is defined in
 `threading.h <https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/threading/threading.h>`_.
-Please be aware that those APIs are not publicly defined, so they can be changed at any time
-without any notification.
+Please be aware that the threading API is not part of the oneDAL product API.
+It is an internal API intended to be used only by oneDAL developers, and it can be changed at any time
+without prior notification.
 
 This chapter describes common parallel patterns and primitives of the threading layer.
 
@@ -52,8 +53,7 @@ One of the options is to use ``daal::threader_for`` as shown here:
 .. include:: ../includes/threading/sum-parallel.rst
 
 The iteration space here goes from ``0`` to ``n-1``.
-The last argument is the lambda function that defines a function object that proceeds ``i``-th
-iteration of the loop.
+The last argument is a function object that performs a single iteration of the loop for the given loop index ``i``.
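The shape of this pattern can be mimicked with standard C++ only. The sketch below is not oneDAL code: ``parallel_for`` is a hypothetical helper, not ``daal::threader_for``, and it assumes a simple split of ``[0, n)`` into contiguous chunks over ``std::thread`` workers, whereas the real threading layer delegates scheduling to oneTBB. As in the text, the last argument is a function object that performs iteration ``i``.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, NOT daal::threader_for: splits the iteration space
// [0, n) into contiguous chunks and calls body(i) for every i on a small
// set of std::thread workers.
template <typename Body>
void parallel_for(std::size_t n, const Body& body) {
    const std::size_t hw = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t nthreads = std::min(hw, std::max<std::size_t>(1, n));
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // no iterations left for this or later threads
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();  // wait for all iterations to finish
}

// Element-wise sum z = x + y; each call of the lambda performs one iteration.
std::vector<float> sum_parallel(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> z(x.size());
    parallel_for(x.size(), [&](std::size_t i) { z[i] = x[i] + y[i]; });
    return z;
}
```

Each index ``i`` is written by exactly one thread, so the writes to ``z`` need no synchronization.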
 
 Blocking
 --------
@@ -76,37 +76,37 @@ Here is a variant of sequential implementation:
 
 Parallel computations can be performed in two steps:
 
- 1. Compute partial dot product at each threaded.
+ 1. Compute a partial dot product in each thread.
  2. Perform a reduction: Add the partial results from all threads to compute the final dot product.
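The two steps can be sketched with standard C++ as follows. This is an illustration under stated assumptions, not oneDAL code: ``dot_parallel`` is hypothetical, the number of threads is passed explicitly, and a plain ``std::vector`` with one slot per thread stands in for ``daal::tls``.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical std::thread sketch, not the oneDAL threading layer.
// Step 1: every thread computes the partial dot product of its block.
// Step 2: the partial results are added together (reduction).
double dot_parallel(const std::vector<double>& x, const std::vector<double>& y,
                    std::size_t nthreads) {
    assert(x.size() == y.size() && nthreads > 0);
    const std::size_t n = x.size();
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<double> partial(nthreads, 0.0);  // one result slot per thread
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(n, t * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            double acc = 0.0;  // accumulate locally, write the slot once
            for (std::size_t i = begin; i < end; ++i) acc += x[i] * y[i];
            partial[t] = acc;
        });
    }
    for (auto& w : workers) w.join();

    // Step 2: reduce the partial results of all threads.
    double result = 0.0;
    for (double p : partial) result += p;
    return result;
}
```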
 
 ``daal::tls`` provides a local storage where each thread can accumulate its local results.
-Following code allocates memory that would store partial dot products for each thread:
+The following code allocates memory to store the partial dot product of each thread:
 
 .. include:: ../includes/threading/dot-parallel-init-tls.rst
 
 ``SafeStatus`` in this code denotes a thread-safe counterpart of the ``Status`` class.
-``SafeStatus`` allows to collect errors from all threads and report them to user using
-``detach()`` method as it will be shown later in the code.
+``SafeStatus`` allows collecting errors from all threads and reporting them to the user through the
+``detach()`` method. An example of its use is shown later.
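The idea behind ``SafeStatus`` can be illustrated with a minimal standard C++ analogue. ``SafeErrors`` and ``run_workers`` are hypothetical names invented for this sketch; they are not oneDAL's ``SafeStatus`` or its API. The sketch assumes that a mutex-protected list of messages is enough to show the principle: any thread may record an error concurrently, and the caller takes all of them at once through ``detach()`` after the workers have finished.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical analogue of the SafeStatus idea; NOT oneDAL's SafeStatus.
class SafeErrors {
public:
    // May be called concurrently from any worker thread.
    void add(std::string message) {
        std::lock_guard<std::mutex> lock(mutex_);
        errors_.push_back(std::move(message));
    }

    // Hands all collected errors to the caller and leaves this object empty.
    std::vector<std::string> detach() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(errors_, {});
    }

private:
    std::mutex mutex_;
    std::vector<std::string> errors_;
};

// Hypothetical usage: odd-numbered workers report a simulated failure.
std::vector<std::string> run_workers(int nthreads) {
    SafeErrors errors;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&errors, t] {
            if (t % 2 == 1) errors.add("worker " + std::to_string(t) + " failed");
        });
    }
    for (auto& w : workers) w.join();
    return errors.detach();  // report errors only after all threads are done
}
```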
 
 Checking the status right after the initialization code won't show the allocation errors,
 because oneTBB uses lazy evaluation and the lambda function passed to the constructor of the TLS
-is evaluated in the moment of the TLS's first use.
+is evaluated on first use of the thread-local storage (TLS).
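The lazy behavior can be illustrated with a minimal sketch. ``LazyTls`` is a hypothetical class, neither ``daal::tls`` nor a oneTBB type; for simplicity it assumes a mutex-protected map keyed by ``std::thread::id``, which a performance-oriented implementation would avoid. It shows that the factory passed to the constructor runs only when a thread first calls ``local()``, so an allocation failure can only be observed at that moment.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical LazyTls: shows lazy per-thread construction only.
// It is neither daal::tls nor a oneTBB type.
template <typename T>
class LazyTls {
public:
    using Factory = std::function<std::unique_ptr<T>()>;

    // Only stores the factory; no per-thread object is created here, so a
    // status check right after construction cannot see allocation errors.
    explicit LazyTls(Factory factory) : factory_(std::move(factory)) {}

    // Returns the calling thread's object, invoking the factory on that
    // thread's first call. Returns nullptr if the factory returned nullptr.
    T* local() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<T>& slot = storage_[std::this_thread::get_id()];
        if (!slot) slot = factory_();
        return slot.get();
    }

    // Visits every per-thread object; call it after the workers have joined.
    template <typename Fn>
    void reduce(Fn fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& entry : storage_) {
            if (entry.second) fn(*entry.second);
        }
    }

private:
    Factory factory_;
    std::mutex mutex_;
    std::map<std::thread::id, std::unique_ptr<T>> storage_;
};
```

In a blocked loop, ``local()`` would typically be called once per block rather than once per iteration, to amortize the lookup.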
 
-Again, there are several options available in the threading layer of oneDAL to compute the partial
+There are several options available in the threading layer of oneDAL to compute the partial
 dot product results at each thread.
 One of the options is to use the already mentioned ``daal::threader_for`` and blocking approach
 as shown here:
 
 .. include:: ../includes/threading/dot-parallel-partial-compute.rst
 
-To compute the final result it is requred to reduce TLS's partial results over all threads
+To compute the final result, the partial results of all threads must be reduced
 as shown here:
 
 .. include:: ../includes/threading/dot-parallel-reduction.rst
 
-Local memory of the threads should also be released when it is no longer needed.
+Local memory of the threads should be released when it is no longer needed.
 
-The complete parallel verision of dot product computations would look like:
+The complete parallel version of the dot product computation looks like this:
 
 .. include:: ../includes/threading/dot-parallel.rst
 
@@ -122,7 +122,7 @@ This strategy is beneficial when it is difficult to estimate the amount of work
 by each iteration.
 
 In the cases when it is known that the iterations perform an equal amount of work, it
-might be beneficial to use predefined mapping of the loop's iterations to threads.
+is usually more efficient to use a predefined mapping of the loop's iterations to threads.
 This is what static work scheduling does.
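Static scheduling can be sketched in standard C++ as follows. ``static_parallel_for`` and ``static_sum`` are hypothetical and only assume the behavior described above: iteration ``i`` is always assigned to the same thread, fixed before the loop starts, and the body receives that thread's index ``tid``. They do not reproduce the signatures of ``daal::static_threader_for`` or ``daal::static_tls``.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical static_parallel_for, NOT daal::static_threader_for:
// iteration i always runs on thread i / chunk, and the body receives
// the index tid of the thread that executes it.
template <typename Body>
void static_parallel_for(std::size_t n, std::size_t nthreads, const Body& body) {
    assert(nthreads > 0);
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (std::size_t tid = 0; tid < nthreads; ++tid) {
        workers.emplace_back([=, &body] {
            const std::size_t begin = std::min(n, tid * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            for (std::size_t i = begin; i < end; ++i) body(i, tid);
        });
    }
    for (auto& w : workers) w.join();
}

// Because tid is known, per-thread partial sums live in a plain array
// indexed by tid, with no per-access lookup of thread-local storage.
double static_sum(const std::vector<double>& x, std::size_t nthreads) {
    std::vector<double> partial(nthreads, 0.0);
    static_parallel_for(x.size(), nthreads,
                        [&](std::size_t i, std::size_t tid) { partial[tid] += x[i]; });
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Adjacent slots of ``partial`` may share a cache line, so a performance-oriented version would pad each slot to avoid false sharing.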
 
 ``daal::static_threader_for`` and ``daal::static_tls`` allow implementation of static