Add a chapter about threading layer into the docs #2848

Merged
20 commits merged on Sep 20, 2024
Changes from 9 commits
8 changes: 7 additions & 1 deletion CONTRIBUTING.md
@@ -25,7 +25,7 @@ Refer to our guidelines on [pull requests](#pull-requests) and [issues](#issues)

## Contacting maintainers
You may reach out to Intel project maintainers privately at onedal.maintainers@intel.com.
[Codeowners](https://github.com/oneapi-src/oneDAL/blob/main/.github/CODEOWNERS) configuration defines specific maintainers for corresponding code sections, however it's currently limited to Intel members. With further migration to UXL we will be changing this, but here are non-Intel contacts:

For ARM specifics you may contact: [@rakshithgb-fujitsu](https://github.com/rakshithgb-fujitsu/)

@@ -69,6 +69,12 @@ Refer to [ClangFormat documentation](https://clang.llvm.org/docs/ClangFormat.htm

For your convenience we also added [coding guidelines](http://oneapi-src.github.io/oneDAL/contribution/coding_guide.html) with examples and detailed descriptions of the coding style oneDAL follows. We encourage you to consult them when writing your code.

## Custom Components

### Threading Layer

In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form so called [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the layer to implement parallel algorithms.
Contributor:

Suggested change:
- In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form so called [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the layer to implement parallel algorithms.
+ In the source code of the algorithms, oneDAL does not use threading primitives directly. All the threading primitives used within oneDAL form are called the [threading layer](http://oneapi-src.github.io/oneDAL/contribution/threading.html). Contributors should leverage the primitives from the layer to implement parallel algorithms.


## Documentation Guidelines

oneDAL uses `Doxygen` for inline comments in public header files that are used to build the API reference and `reStructuredText` for the Developer Guide. See [oneDAL documentation](https://oneapi-src.github.io/oneDAL/) for reference.
6 changes: 3 additions & 3 deletions cpp/daal/src/threading/threading.cpp
@@ -103,7 +103,7 @@ DAAL_EXPORT size_t _setNumberOfThreads(const size_t numThreads, void ** globalCo
return 1;
}

- DAAL_EXPORT void _daal_threader_for(int n, int threads_request, const void * a, daal::functype func)
+ DAAL_EXPORT void _daal_threader_for(int n, int reserved, const void * a, daal::functype func)
{
if (daal::threader_env()->getNumberOfThreads() > 1)
{
@@ -160,7 +160,7 @@ DAAL_EXPORT void _daal_threader_for_blocked_size(size_t n, size_t block, const v
}
}

- DAAL_EXPORT void _daal_threader_for_simple(int n, int threads_request, const void * a, daal::functype func)
+ DAAL_EXPORT void _daal_threader_for_simple(int n, int reserved, const void * a, daal::functype func)
{
if (daal::threader_env()->getNumberOfThreads() > 1)
{
@@ -318,7 +318,7 @@ DAAL_PARALLEL_SORT_IMPL(daal::IdxValType<double>, pair_fp64_uint64)

#undef DAAL_PARALLEL_SORT_IMPL

- DAAL_EXPORT void _daal_threader_for_blocked(int n, int threads_request, const void * a, daal::functype2 func)
+ DAAL_EXPORT void _daal_threader_for_blocked(int n, int reserved, const void * a, daal::functype2 func)
{
if (daal::threader_env()->getNumberOfThreads() > 1)
{
118 changes: 112 additions & 6 deletions cpp/daal/src/threading/threading.h
@@ -223,14 +223,33 @@ inline void threader_func_break(int i, bool & needBreak, const void * a)
lambda(i, needBreak);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
Vika-F marked this conversation as resolved.
/// The work is scheduled dynamically across threads.
Contributor:
The comment suggests that the work will be executed in parallel, but that may not be the case. It depends on the implementation behind the abstraction. This function passes the parameters to the threading layer, which may execute the work in parallel. And must the work be scheduled dynamically? Can the threading layer decide to just split n equally across however many threads there are (this is the static schedule in OpenMP parlance)? I would reword this to something like:

Pass a function to be executed in a for loop to the threading layer. The work is scheduled over threads by the threading layer.

Also, it would be good to mention loop-carried dependencies. Is the assumption here that the work in each loop iteration is independent?

Contributor (author):
The idea of this API is that it uses the default scheduling provided by the underlying implementation.
Of course, the execution might not be parallel for various reasons, for example, if only one thread is available.

It is expected that the iterations of the loop are logically independent, i.e. there is no recurrence among the iterations, but they might access the same data for reading or writing.
In the latter case, additional synchronization such as atomics, mutual exclusion, or thread-local storage is required.

I will reword the description to make it more clear.

///
/// @tparam F Lambda function of type ``[/* captures */](int i) -> void``,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
Contributor:
Just double checking, but I've always used single backtick for monospaced in doxygen. Is the double backtick valid?

Contributor (author):

Thanks for noticing this. We are using not only Doxygen, but also reStructuredText plus some Python pre-processing.
Double backtick is valid, but it results in italic text.
I'm going to replace it with a single backtick to get monospace.

///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that defines iteration's body.
Contributor:

Suggested change:
- /// @param[in] lambda Lambda function that defines iteration's body.
+ /// @param[in] lambda Lambda function that defines the loop body.

template <typename F>
- inline void threader_for(int n, int threads_request, const F & lambda)
+ inline void threader_for(int n, int reserved, const F & lambda)
{
const void * a = static_cast<const void *>(&lambda);

- _daal_threader_for(n, threads_request, a, threader_func<F>);
+ _daal_threader_for(n, reserved, a, threader_func<F>);
}
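As an illustration (not part of this diff), here is a minimal usage sketch assuming the `threader_for` signature declared above; the `vector_add` wrapper and array names are hypothetical, and the second argument is simply the reserved (currently unused) parameter:

    // Element-wise sum of two arrays. Each iteration writes only c[i],
    // so the iterations are independent and need no extra synchronization.
    void vector_add(const float * a, const float * b, float * c, int n)
    {
        daal::threader_for(n, n, [&](int i) { c[i] = a[i] + b[i]; });
    }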

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^63 - 1.
Vika-F marked this conversation as resolved.
/// The work is scheduled dynamically across threads.
///
/// @tparam F Lambda function of type [/* captures */](int64_t i) -> void,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] lambda Lambda function that defines iteration's body.
template <typename F>
inline void threader_for_int64(int64_t n, const F & lambda)
{
@@ -239,12 +258,25 @@ inline void threader_for_int64(int64_t n, const F & lambda)
_daal_threader_for_int64(n, a, threader_func<F>);
}
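A similarly hypothetical sketch for the 64-bit variant, useful when the loop may exceed 2^31 - 1 iterations (`buffer` and `count` are placeholders):

    daal::threader_for_int64(count, [&](int64_t i) { buffer[i] = 0.0; });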

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
/// The work is scheduled dynamically across threads.
/// The iteration space is chunked using oneTBB ``simple_partitioner``
Contributor:
Is TBB always going to be the threading layer? It is now, but could this be different in the future? If this function or interface is TBB-specific, is this documented somewhere? It might also be worth considering changing the name of the function, e.g. tbb_for_simple.

Contributor (author):
I think it is impossible to say whether TBB will always be the threading layer or not. For now we do not have plans to migrate to other threading technologies like OpenMP, C++ standard threads, and so on, but who knows what happens in the future?

I think I need to reword this and remove the TBB mention from the description.

The idea of this API is that the iterations are not grouped together; each iteration is considered a separate task.
Other APIs can group iterations so that a consecutive block of iterations is executed by one thread; this API tries to avoid that when possible.

/// (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Partitioner_Summary.html)
/// with chunk size 1.
///
/// @tparam F Lambda function of type [/* captures */](int i) -> void,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that defines iteration's body.
Contributor:

Suggested change:
- /// @param[in] lambda Lambda function that defines iteration's body.
+ /// @param[in] lambda Function that defines the loop body.

template <typename F>
- inline void threader_for_simple(int n, int threads_request, const F & lambda)
+ inline void threader_for_simple(int n, int reserved, const F & lambda)
{
const void * a = static_cast<const void *>(&lambda);

- _daal_threader_for_simple(n, threads_request, a, threader_func<F>);
+ _daal_threader_for_simple(n, reserved, a, threader_func<F>);
}
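A hedged usage sketch (not part of this diff): this variant helps when the cost of iterations varies widely, because each iteration is submitted as its own task instead of being grouped into contiguous blocks. `processRow`, `data`, and `nRows` are placeholders:

    daal::threader_for_simple(nRows, nRows, [&](int i) {
        processRow(data, i); // per-row work with strongly varying cost
    });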

template <typename F>
@@ -255,6 +287,35 @@ inline void threader_for_int32ptr(const int * begin, const int * end, const F &
_daal_threader_for_int32ptr(begin, end, a, threader_func<F>);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is ``SIZE_MAX`` in C99 standard.
///
/// The work is scheduled statically across threads.
/// This means that the work is always scheduled in the same way across the threads:
/// each thread processes the same set of iterations on each invocation of this loop.
///
/// It is recommended to use this parallel loop if each iteration of the loop
/// performs equal amount of work.
///
/// Let ``t`` be the number of threads available to oneDAL.
///
/// Then the number of iterations processed by each threads (except maybe the last one)
/// is copmputed as:
Contributor:

Suggested change:
- /// is copmputed as:
+ /// is computed as:

/// ``nI = (n + t - 1) / t``
///
/// Here is how the work in split across the threads:
Contributor:

Suggested change:
- /// Here is how the work in split across the threads:
+ /// Here is how the work is split across the threads:

/// The 1st thread executes iterations ``0``, ..., ``nI - 1``;
/// the 2nd thread executes iterations ``nI``, ..., ``2 * nI - 1``;
/// ...
/// the t-th thread executes iterations ``(t - 1) * nI``, ..., ``n - 1``.
///
/// @tparam F Lambda function of type [/* captures */](size_t i, size_t tid) -> void,
/// where
/// ``i`` is the loop's iteration index, ``0 <= i < n``;
/// ``tid`` is the index of the thread, ``0 <= tid < t``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] lambda Lambda function that defines iteration's body.
Contributor:

Suggested change:
- /// @param[in] lambda Lambda function that defines iteration's body.
+ /// @param[in] lambda Function that defines the loop body.

template <typename F>
inline void static_threader_for(size_t n, const F & lambda)
{
@@ -263,12 +324,27 @@ inline void static_threader_for(size_t n, const F & lambda)
_daal_static_threader_for(n, a, static_threader_func<F>);
}
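Worked example of the scheduling formula above: with n = 10 iterations and t = 4 threads, nI = (10 + 4 - 1) / 4 = 3, so thread 0 executes iterations 0..2, thread 1 executes 3..5, thread 2 executes 6..8, and thread 3 executes iteration 9. A hypothetical usage sketch follows; it assumes `daal::threader_env()->getNumberOfThreads()` (used in threading.cpp above) returns the thread count, and `x` and `n` are placeholders:

    #include <vector>

    const size_t nThreads = daal::threader_env()->getNumberOfThreads();
    std::vector<double> partialSums(nThreads, 0.0);
    // tid indexes a pre-allocated per-thread accumulator, so no locking is needed.
    daal::static_threader_for(n, [&](size_t i, size_t tid) { partialSums[tid] += x[i]; });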

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
Vika-F marked this conversation as resolved.
/// The work is scheduled dynamically across threads.
///
/// @tparam F Lambda function of type [/* captures */](int beginRange, int endRange) -> void
/// where
/// ``beginRange`` is the starting index of the loop's iterations block to be
/// processed by a thread, ``0 <= beginRange < n``;
/// ``endRange`` is the index after the end of the loop's iterations block to be
/// processed by a thread, ``beginRange < endRange <= n``;
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that processes the block of loop's iterations
/// ``[beginRange, endRange)``.
template <typename F>
- inline void threader_for_blocked(int n, int threads_request, const F & lambda)
+ inline void threader_for_blocked(int n, int reserved, const F & lambda)
{
const void * a = static_cast<const void *>(&lambda);

- _daal_threader_for_blocked(n, threads_request, a, threader_func_b<F>);
+ _daal_threader_for_blocked(n, reserved, a, threader_func_b<F>);
}
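An illustrative sketch (not part of this diff): the lambda receives a half-open block [beginRange, endRange) that one thread processes with a plain inner loop; `x`, `y`, `alpha`, and `n` are placeholders:

    daal::threader_for_blocked(n, n, [&](int beginRange, int endRange) {
        for (int i = beginRange; i < endRange; ++i)
        {
            y[i] = alpha * x[i] + y[i]; // axpy-style update over the assigned block
        }
    });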

template <typename F>
@@ -321,10 +397,18 @@ class tls_deleter_ : public tls_deleter
virtual void del(void * a) { delete static_cast<lambdaType *>(a); }
};

/// Thread-local storage (TLS)
///
/// @tparam F Type of the data located in the storage
template <typename F>
class tls : public tlsBase
{
public:
/// Initialize thread-local storage
///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F
Contributor:

Needs monospace, either with the single or double backticks. I've assumed the style so far.

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */]() -> F
+ /// @tparam lambdaType Lambda function of type ``[/* captures */]() -> F``

///
/// @param lambda Lambda function that initializes a thread-local storage
template <typename lambdaType>
explicit tls(const lambdaType & lambda)
{
@@ -339,19 +423,35 @@ class tls : public tlsBase
tlsPtr = _daal_get_tls_ptr(a, tls_func<lambdaType>);
}

/// Destroys the memory associated with a thread-local storage
///
/// @note TLS does not release the memory allocated by a lambda-function
/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~tls()
{
d->del(voidLambda);
delete d;
_daal_del_tls_ptr(tlsPtr);
}

/// Access a local data of a thread by value
///
/// @return When first ionvoced by a thread, a lambda provided to the constructor is
Contributor:

Suggested change:
- /// @return When first ionvoced by a thread, a lambda provided to the constructor is
+ /// @return When first invoked by a thread, a lambda provided to the constructor is

/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.
Contributor:

If this is the case, should the declaration of the pointer pf be static?

Contributor (author):

I don't think so.
It is just an intermediate variable needed to cast from the C-style API to C++.

F local()
{
void * pf = _daal_get_tls_local(tlsPtr);
return (static_cast<F>(pf));
}

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
+ /// @tparam lambdaType Lambda function of type ``[/* captures */](F) -> void``

///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{
@@ -360,6 +460,12 @@ class tls : public tlsBase
_daal_reduce_tls(tlsPtr, a, tls_reduce_func<F, lambdaType>);
}

/// Parallel reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
Contributor:

Suggested change:
- /// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
+ /// @tparam lambdaType Lambda function of type ``[/* captures */](F) -> void``

///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage in parallel.
template <typename lambdaType>
void parallel_reduce(const lambdaType & lambda)
{
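The `tls` class is shown only partially in this diff. For illustration, a hedged sketch of the TLS plus reduction pattern it supports, based on the constructor, `local()`, and `reduce()` shown above; `x`, `n`, and `total` are placeholders. As the destructor note explains, memory allocated by the initialization lambda must be released by the developer, which is done here inside `reduce()`:

    // Each thread lazily creates its own accumulator on the first call to local().
    daal::tls<double *> partialSums([]() { return new double(0.0); });

    daal::threader_for(n, n, [&](int i) {
        double * local = partialSums.local();
        *local += x[i];
    });

    double total = 0.0;
    partialSums.reduce([&](double * sum) {
        total += *sum;
        delete sum; // the TLS destructor does not free memory allocated by the init lambda
    });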