Add a chapter about threading layer into the docs #2848

Merged: 20 commits, Sep 20, 2024
Changes from 12 commits
8 changes: 7 additions & 1 deletion CONTRIBUTING.md

@@ -25,7 +25,7 @@ Refer to our guidelines on [pull requests](#pull-requests) and [issues](#issues)

## Contacting maintainers
You may reach out to Intel project maintainers privately at onedal.maintainers@intel.com.
[Codeowners](https://github.com/oneapi-src/oneDAL/blob/main/.github/CODEOWNERS) configuration defines specific maintainers for corresponding code sections, however it's currently limited to Intel members. With further migration to UXL we will be changing this, but here are non-Intel contacts:

For ARM specifics you may contact: [@rakshithgb-fujitsu](https://github.com/rakshithgb-fujitsu/)

@@ -69,6 +69,12 @@ Refer to [ClangFormat documentation](https://clang.llvm.org/docs/ClangFormat.htm

For your convenience we also added [coding guidelines](http://oneapi-src.github.io/oneDAL/contribution/coding_guide.html) with examples and detailed descriptions of the coding style oneDAL follows. We encourage you to consult them when writing your code.

## Custom Components

### Threading Layer

In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form are called the [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the layer to implement parallel algorithms.
Contributor:

Suggested change:
- In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form are called the [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the layer to implement parallel algorithms.
+ In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form the [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the threading layer to implement parallel algorithms.

Also, if you are defining the name, maybe Threading Layer should be a proper noun with capitalization? I think this might be a nit too far, but may as well leave a comment whilst I'm in the area


## Documentation Guidelines

oneDAL uses `Doxygen` for inline comments in public header files that are used to build the API reference and `reStructuredText` for the Developer Guide. See [oneDAL documentation](https://oneapi-src.github.io/oneDAL/) for reference.
6 changes: 3 additions & 3 deletions cpp/daal/src/threading/threading.cpp

@@ -103,7 +103,7 @@ DAAL_EXPORT size_t _setNumberOfThreads(const size_t numThreads, void ** globalCo
return 1;
}

- DAAL_EXPORT void _daal_threader_for(int n, int threads_request, const void * a, daal::functype func)
+ DAAL_EXPORT void _daal_threader_for(int n, int reserved, const void * a, daal::functype func)
{
if (daal::threader_env()->getNumberOfThreads() > 1)
{
@@ -160,7 +160,7 @@ DAAL_EXPORT void _daal_threader_for_blocked_size(size_t n, size_t block, const v
}
}

- DAAL_EXPORT void _daal_threader_for_simple(int n, int threads_request, const void * a, daal::functype func)
+ DAAL_EXPORT void _daal_threader_for_simple(int n, int reserved, const void * a, daal::functype func)
{
if (daal::threader_env()->getNumberOfThreads() > 1)
{
@@ -318,7 +318,7 @@ DAAL_PARALLEL_SORT_IMPL(daal::IdxValType<double>, pair_fp64_uint64)

#undef DAAL_PARALLEL_SORT_IMPL

- DAAL_EXPORT void _daal_threader_for_blocked(int n, int threads_request, const void * a, daal::functype2 func)
+ DAAL_EXPORT void _daal_threader_for_blocked(int n, int reserved, const void * a, daal::functype2 func)
{
if (daal::threader_env()->getNumberOfThreads() > 1)
{
180 changes: 173 additions & 7 deletions cpp/daal/src/threading/threading.h

@@ -223,14 +223,33 @@ inline void threader_func_break(int i, bool & needBreak, const void * a)
lambda(i, needBreak);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
Vika-F marked this conversation as resolved.
/// The work is scheduled dynamically across threads.
Contributor:

The comment suggests that the work will be executed in parallel, but it may not be the case. It depends on the implementation behind the abstraction. This function passes the parameters to the threading layer, which may execute the work in parallel. And must the work be scheduled dynamically? Can the threading layer decide to just split n equally into however many threads there are (this is the static schedule in OpenMP parlance). Rewording this to something like:

Pass a function to be executed in a for loop to the threading layer. The work is scheduled over threads by the threading layer.

Also, it would be good to mention loop carried dependencies. Is the assumption here that the work in each loop iteration is independent?

Contributor Author:

The idea of this API is that it uses the default scheduling provided by the underlying implementation.
Of course, the execution might not be parallel for various reasons. For example, if only one thread is available.

It is expected that the iterations of the loop are logically independent, i.e. there is no recurrence among the iterations, but they might access the same data for reading or writing.
In the latter case, additional synchronization such as atomics, mutual exclusion, or thread-local storage is required.

I will reword the description to make it more clear.
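The point about shared writes can be illustrated with a minimal sketch. This is not oneDAL code: a plain serial loop stands in for a parallel loop body, and `std::atomic` represents one of the synchronization options mentioned above (atomics, mutual exclusion, or thread-local partials).

```cpp
#include <atomic>

// Sketch: when logically independent iterations all write to the same
// accumulator, an atomic (or a mutex, or per-thread partial sums) avoids
// the data race. The serial loop here is a stand-in for a parallel
// threader_for body (assumption for illustration only).
inline long atomic_sum(int n)
{
    std::atomic<long> total(0);
    for (int i = 0; i < n; ++i)
    {
        // fetch_add is safe even if iterations ran concurrently.
        total.fetch_add(i, std::memory_order_relaxed);
    }
    return total.load();
}
```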

///
/// @tparam F Lambda function of type ``[/* captures */](int i) -> void``,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
Contributor:

Just double checking, but I've always used single backtick for monospaced in doxygen. Is the double backtick valid?

Contributor Author:

Thanks for noticing this. We are using not only doxygen, but also restructured text + some python pre-processing.
Double backtick is valid but it results in italic text.
I'm going to replace it with single backtick to have monospace.

///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that defines iteration's body.
Contributor:

Suggested change:
- /// @param[in] lambda Lambda function that defines iteration's body.
+ /// @param[in] lambda Lambda function that defines the loop body.

template <typename F>
- inline void threader_for(int n, int threads_request, const F & lambda)
+ inline void threader_for(int n, int reserved, const F & lambda)
{
const void * a = static_cast<const void *>(&lambda);

- _daal_threader_for(n, threads_request, a, threader_func<F>);
+ _daal_threader_for(n, reserved, a, threader_func<F>);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^63 - 1.
Vika-F marked this conversation as resolved.
/// The work is scheduled dynamically across threads.
///
/// @tparam F Lambda function of type [/* captures */](int64_t i) -> void,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] lambda Lambda function that defines iteration's body.
template <typename F>
inline void threader_for_int64(int64_t n, const F & lambda)
{
@@ -239,12 +258,25 @@ inline void threader_for_int64(int64_t n, const F & lambda)
_daal_threader_for_int64(n, a, threader_func<F>);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
/// The work is scheduled dynamically across threads.
/// The iteration space is chunked using oneTBB ``simple_partitioner``
Contributor:

Is TBB always going to be the threading layer? It is now, but can this be different in the future? If this function or interface is TBB specific, is this documented somewhere? Might also be worth considering changing the name of the functions, e.g. tbb_for_simple

Contributor Author:

I think it is impossible to say whether TBB will always be a threading layer or not. For now we do not have plans to migrate to other threading technologies like OpenMP, C++ STD threads and so on, but who knows what happens in the future?

I think I need to reword this and remove TBB mentioning from the description.

The idea of this API is that the iterations are not grouped together; each iteration is treated as a separate task.
Other APIs can group iterations so that a consecutive block of iterations is executed by one thread. This API tries to avoid that if possible.

/// (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Partitioner_Summary.html)
/// with chunk size 1.
///
/// @tparam F Lambda function of type [/* captures */](int i) -> void,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that defines iteration's body.
Contributor:

Suggested change:
- /// @param[in] lambda Lambda function that defines iteration's body.
+ /// @param[in] lambda Function that defines the loop body.

template <typename F>
- inline void threader_for_simple(int n, int threads_request, const F & lambda)
+ inline void threader_for_simple(int n, int reserved, const F & lambda)
{
const void * a = static_cast<const void *>(&lambda);

- _daal_threader_for_simple(n, threads_request, a, threader_func<F>);
+ _daal_threader_for_simple(n, reserved, a, threader_func<F>);
}

template <typename F>
@@ -255,6 +287,35 @@ inline void threader_for_int32ptr(const int * begin, const int * end, const F &
_daal_threader_for_int32ptr(begin, end, a, threader_func<F>);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is ``SIZE_MAX`` in C99 standard.
///
/// The work is scheduled statically across threads.
/// This means that the work is always scheduled in the same way across the threads:
/// each thread processes the same set of iterations on each invocation of this loop.
///
/// It is recommended to use this parallel loop if each iteration of the loop
/// performs equal amount of work.
///
/// Let ``t`` be the number of threads available to oneDAL.
///
/// Then the number of iterations processed by each threads (except maybe the last one)
/// is computed as:
/// ``nI = (n + t - 1) / t``
///
/// Here is how the work is split across the threads:
/// The 1st thread executes iterations ``0``, ..., ``nI - 1``;
/// the 2nd thread executes iterations ``nI``, ..., ``2 * nI - 1``;
/// ...
/// the t-th thread executes iterations ``(t - 1) * nI``, ..., ``n - 1``.
///
/// @tparam F Lambda function of type [/* captures */](size_t i, size_t tid) -> void,
/// where
/// ``i`` is the loop's iteration index, ``0 <= i < n``;
/// ``tid`` is the index of the thread, ``0 <= tid < t``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] lambda Lambda function that defines iteration's body.
Contributor:

Suggested change:
- /// @param[in] lambda Lambda function that defines iteration's body.
+ /// @param[in] lambda Function that defines the loop body.

template <typename F>
inline void static_threader_for(size_t n, const F & lambda)
{
@@ -263,12 +324,27 @@ inline void static_threader_for(size_t n, const F & lambda)
_daal_static_threader_for(n, a, static_threader_func<F>);
}
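The static schedule documented above fixes each thread's block of iterations by formula alone. The helpers below only compute those per-thread ranges from `n` and `t`; they are an illustration of the documented formula `nI = (n + t - 1) / t`, not part of the oneDAL API.

```cpp
#include <cstddef>

// Block size per thread under the static schedule: nI = (n + t - 1) / t.
inline std::size_t static_block_size(std::size_t n, std::size_t t)
{
    return (n + t - 1) / t;
}

// Half-open iteration range [begin, end) assigned to thread `tid`,
// 0 <= tid < t. The last thread's range is clipped to n.
inline void static_thread_range(std::size_t n, std::size_t t, std::size_t tid,
                                std::size_t & begin, std::size_t & end)
{
    const std::size_t nI = static_block_size(n, t);
    begin = tid * nI;
    end   = (begin + nI < n) ? (begin + nI) : n;
}
```

For example, with `n = 10` and `t = 4`, `nI = 3`, so the threads get `[0,3)`, `[3,6)`, `[6,9)`, and `[9,10)` on every invocation, which is what makes the schedule deterministic.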

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
Vika-F marked this conversation as resolved.
/// The work is scheduled dynamically across threads.
///
/// @tparam F Lambda function of type [/* captures */](int beginRange, int endRange) -> void
/// where
/// ``beginRange`` is the starting index of the loop's iterations block to be
/// processed by a thread, ``0 <= beginRange < n``;
/// ``endRange`` is the index after the end of the loop's iterations block to be
/// processed by a thread, ``beginRange < endRange <= n``;
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that processes the block of loop's iterations
/// ``[beginRange, endRange)``.
template <typename F>
- inline void threader_for_blocked(int n, int threads_request, const F & lambda)
+ inline void threader_for_blocked(int n, int reserved, const F & lambda)
{
const void * a = static_cast<const void *>(&lambda);

- _daal_threader_for_blocked(n, threads_request, a, threader_func_b<F>);
+ _daal_threader_for_blocked(n, reserved, a, threader_func_b<F>);
}
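The blocked-loop contract above can be sketched with a serial driver: the user lambda receives a half-open block `[beginRange, endRange)` rather than a single index, so any per-block setup cost is paid once per block instead of once per iteration. The driver and its fixed block size are assumptions for illustration; the real backend chooses blocks itself.

```cpp
// Serial stand-in for the blocked threading backend: splits [0, n) into
// chunks of `block` iterations and hands each chunk to the lambda.
template <typename F>
inline void for_blocked_sketch(int n, int block, const F & lambda)
{
    for (int begin = 0; begin < n; begin += block)
    {
        // Clip the final chunk so endRange never exceeds n.
        const int end = (begin + block < n) ? (begin + block) : n;
        lambda(begin, end);
    }
}
```

A lambda passed to it processes whole ranges, e.g. accumulating a sum over `[begin, end)` with a single local accumulator per block.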

template <typename F>
@@ -321,10 +397,20 @@ class tls_deleter_ : public tls_deleter
virtual void del(void * a) { delete static_cast<lambdaType *>(a); }
};

/// Thread-local storage (TLS).
/// Can change its local variable after a nested parallel constructs.
/// @note Use carefully in case of nested parallel regions.
Contributor:

How common is it to have nested parallelism, and how much do we want users of oneDAL to use it? If at all possible, I recommend not using nested parallelism, as this gets confusing quickly, and requires a lot of care. I would suggest either putting more of a health warning on here, e.g.

@note Thread local storage in nested parallel regions is, in general, not thread local. Nested parallelism should not be used in most cases, and if it is, extra care must be taken with thread local values.

But maybe just leaving out the note is better, as this starts to open a discussion point

Contributor Author:

I think the confusion here is because those APIs are not intended to be used by oneDAL's users.
The APIs are just not available as part of oneDAL's release header files.
Those are only available to the developers.

Inside oneDAL we use nested parallelism in many cases. So it is important to warn the developers that using certain constructs might lead to issues.

Will reformulate the note as well.

///
/// @tparam F Type of the data located in the storage
template <typename F>
class tls : public tlsBase
{
public:
/// Initialize thread-local storage
///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F
Contributor:

Needs monospace, either with the single or double backticks. I've assumed the style so far.

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */]() -> F
+ /// @tparam lambdaType Lambda function of type ``[/* captures */]() -> F``

///
/// @param lambda Lambda function that initializes a thread-local storage
template <typename lambdaType>
explicit tls(const lambdaType & lambda)
{
@@ -339,19 +425,35 @@ class tls : public tlsBase
tlsPtr = _daal_get_tls_ptr(a, tls_func<lambdaType>);
}

/// Destroys the memory associated with a thread-local storage
///
/// @note TLS does not release the memory allocated by a lambda-function
/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~tls()
{
d->del(voidLambda);
delete d;
_daal_del_tls_ptr(tlsPtr);
}

/// Access a local data of a thread by value
///
/// @return When first invoked by a thread, a lambda provided to the constructor is
/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.
Contributor:

If this is the case, should the declaration of the pointer pf be static?

Contributor Author:

I don't think so.
It is just an intermediate variable needed to cast from C-style API to C++.

F local()
{
void * pf = _daal_get_tls_local(tlsPtr);
return (static_cast<F>(pf));
}

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
+ /// @tparam lambdaType Lambda function of type ``[/* captures */](F) -> void``

///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{
@@ -360,6 +462,12 @@ class tls : public tlsBase
_daal_reduce_tls(tlsPtr, a, tls_reduce_func<F, lambdaType>);
}
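The storage-plus-reduction pattern documented above can be sketched without the oneDAL machinery. Here a plain vector indexed by thread id stands in for real thread-local storage, and `reduce` visits the partials sequentially, mirroring the documented contract; the class name and layout are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the tls/reduce usage pattern: each thread accumulates into its
// own slot via local(), and reduce() later combines the partial results
// sequentially. The real oneDAL tls hides the per-thread slot behind local().
class tls_sketch
{
public:
    explicit tls_sketch(std::size_t nThreads) : _slots(nThreads, 0) {}

    // Per-thread accumulator (the real API needs no explicit tid).
    long & local(std::size_t tid) { return _slots[tid]; }

    // Apply `lambda` to each thread's partial result, one after another.
    template <typename F>
    void reduce(const F & lambda)
    {
        for (std::size_t i = 0; i < _slots.size(); ++i) lambda(_slots[i]);
    }

private:
    std::vector<long> _slots;
};
```

A typical use is a parallel sum: threads add into `local()`, then a single `reduce` pass folds the partials into the final total with no synchronization needed during the loop.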

/// Parallel reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
+ /// @tparam lambdaType Lambda function of type ``[/* captures */](F) -> void``

///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage in parallel.
template <typename lambdaType>
void parallel_reduce(const lambdaType & lambda)
{
@@ -396,10 +504,18 @@ class static_tls_deleter_ : public static_tls_deleter
virtual void del(void * a) { delete static_cast<lambdaType *>(a); }
};

/// Thread-local storage (TLS) for the case of static parallel work scheduling.
///
/// @tparam F Type of the data located in the storage
template <typename F>
class static_tls
{
public:
/// Initialize thread-local storage.
///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */]() -> F
+ /// @tparam lambdaType Lambda function of type ``[/* captures */]() -> F``

///
/// @param lambda Lambda function that initializes a thread-local storage
template <typename lambdaType>
explicit static_tls(const lambdaType & lambda)
{
@@ -431,6 +547,11 @@ class static_tls
_creater_func = creater_func<F, lambdaType>;
}

/// Destroys the memory associated with a thread-local storage.
///
/// @note Static TLS does not release the memory allocated by a lambda-function
/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~static_tls()
{
if (_deleter)
@@ -441,9 +562,16 @@ class static_tls
delete[] _storage;
}

/// Access a local data of a specified thread by value.
///
/// @param tid Index of the thread.
///
/// @return When first invoked by a thread, a lambda provided to the constructor is
/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.
F local(size_t tid)
{
- if (_storage)
+ if (_storage && tid < _nThreads)
Contributor:

Is it possible for this function to be called from within a nested parallel region, where the thread local storage has not yet been instantiated? If so, it could be that nested threads race to create the storage, and this function needs to be made thread safe.

Also, can this be used to get the thread local storage of another thread? Seems as though it might be possible from a quick glance.

If either of the above situations occur, there is a data race in here. I think the documentation should state that it is not safe to call this function from within a parallel region

Contributor Author:

As I mentioned previously, this PR is focused on documenting the present state of the things.

Yes, it is not a good idea to call this local(tid) method from a nested parallel region.
I tried to describe this issue in threading.rst ("Nested Parallelism" chapter) to make it more clear.

The purpose of adding this comparison is not to change the logic of the code, but just to prevent out-of-range memory access. I can revert it to the mainline version, because the intention was not to touch the code at all - only to document it.

{
if (!_storage[tid])
{
@@ -458,6 +586,12 @@ class static_tls
}
}

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
+ /// @tparam lambdaType Lambda function of type ``[/* captures */](F) -> void``

///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{
@@ -470,6 +604,9 @@ class static_tls
}
}

/// Full number of threads.
///
/// @return Number of threads available.
Contributor:

You mention nested parallelism in another comment. What does this function return in the nested case? The number of threads in the current group, or the total number of threads (including nested threads) that are launched?

Contributor Author:

This function returns the total number of threads available to oneDAL regardless of nesting. Will try to clarify this.

size_t nthreads() const { return _nThreads; }

private:
Expand All @@ -480,10 +617,23 @@ class static_tls
static_tls_deleter * _deleter = nullptr;
};

/// Local storage (LS) for a data of a thread.
Contributor:

Suggested change:
- /// Local storage (LS) for a data of a thread.
+ /// Local storage (LS) for the data of a thread.

/// Does not change its local variable after nested parallel constructs,
/// but can have performance penalties comparing to daal::tls.
Contributor:

Suggested change:
- /// but can have performance penalties comparing to daal::tls.
+ /// but can have performance penalties compared to thread-local storage type ``daal::tls``.

/// Can be safely used in case of nested parallel regions.
Contributor:

What does it mean to be safely used? Are all values in here read only?

///
/// @tparam F Type of the data located in the storage
template <typename F>
class ls : public tlsBase
{
public:
/// Initialize a local storage.
Contributor:

Suggested change:
- /// Initialize a local storage.
+ /// Initialize local storage.

///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */]() -> F
+ /// @tparam lambdaType Lambda function of type ``[/* captures */]() -> F``

///
/// @param lambda Lambda function that initializes a local storage
Contributor:

For some of the comments, you specify that the @param is [in], [out], or [in,out]. I really like those annotations. Can we have them for all the comments?

Suggested change:
- /// @param lambda Lambda function that initializes a local storage
+ /// @param lambda Lambda function that initializes local storage

/// @param isTls if true, then local storage is a thread-local storage (daal::tls)
Contributor:

Suggested change:
- /// @param isTls if true, then local storage is a thread-local storage (daal::tls)
+ /// @param isTls If ``true``, then local storage is a thread-local storage (``daal::tls``)

/// and might have problems in case of nested parallel regions.
Contributor:

I think it's worth putting a note somewhere in the documentation about nested parallelism, and what problems that can create for threads. Then from the comments in this file, you can link to the note. Just saying that there might be problems opens up questions, rather than resolving questions.

template <typename lambdaType>
explicit ls(const lambdaType & lambda, const bool isTls = false)
{
@@ -499,13 +649,23 @@ class ls : public tlsBase
lsPtr = _isTls ? _daal_get_tls_ptr(a, tls_func<lambdaType>) : _daal_get_ls_ptr(a, tls_func<lambdaType>);
}

/// Destroys the memory associated with a local storage.
Contributor:

Suggested change:
- /// Destroys the memory associated with a local storage.
+ /// Destroys the memory associated with local storage.

///
/// @note LS does not release the memory allocated by a lambda-function
Contributor:

Suggested change:
- /// @note LS does not release the memory allocated by a lambda-function
+ /// @note ``ls`` does not release the memory allocated by a lambda-function

/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~ls()
{
d->del(voidLambda);
delete d;
_isTls ? _daal_del_tls_ptr(lsPtr) : _daal_del_ls_ptr(lsPtr);
}

/// Access a local data of a thread by value.
Contributor:

Suggested change:
- /// Access a local data of a thread by value.
+ /// Access the local data of a thread by value.

///
/// @return When first invoked by a thread, a lambda provided to the constructor is
/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.
F local()
{
void * pf = _isTls ? _daal_get_tls_local(lsPtr) : _daal_get_ls_local(lsPtr);
@@ -517,6 +677,12 @@ class ls : public tlsBase
if (!_isTls) _daal_release_ls_local(lsPtr, p);
}

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
+ /// @tparam lambdaType Lambda function of type ``[/* captures */](F) -> void``

///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{