Add a chapter about threading layer into the docs #2848
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -223,14 +223,33 @@ inline void threader_func_break(int i, bool & needBreak, const void * a) | |||||
lambda(i, needBreak); | ||||||
} | ||||||
|
||||||
/// Execute the for loop defined by the input parameters in parallel. | ||||||
/// The maximal number of iterations in the loop is 2^31 - 1. | ||||||
Vika-F marked this conversation as resolved.
|
||||||
/// The work is scheduled dynamically across threads. | ||||||
Comment: The comment suggests that the work will be executed in parallel, but that may not be the case; it depends on the implementation behind the abstraction. This function passes the parameters to the threading layer, which may execute the work in parallel. Must the work be scheduled dynamically? Can the threading layer decide to just split n equally into however many threads there are (this is the
Also, it would be good to mention loop-carried dependencies. Is the assumption here that the work in each loop iteration is independent?
Reply: The idea of this API is that it uses the default scheduling provided by the underlying implementation. The iterations of the loop are expected to be logically independent, i.e. there is no recurrence among the iterations, but they might access the same data for reading or writing. I will reword the description to make it clearer.
||||||
/// | ||||||
/// @tparam F Lambda function of type ``[/* captures */](int i) -> void``, | ||||||
/// where ``i`` is the loop's iteration index, ``0 <= i < n``. | ||||||
Comment: Just double-checking, but I've always used a single backtick for monospace in Doxygen. Is the double backtick valid?
Reply: Thanks for noticing this. We use not only Doxygen but also reStructuredText plus some Python pre-processing.
||||||
/// | ||||||
/// @param[in] n Number of iterations in the for loop. | ||||||
/// @param[in] reserved Parameter reserved for the future. Currently unused. | ||||||
/// @param[in] lambda Lambda function that defines iteration's body. | ||||||
|
||||||
template <typename F> | ||||||
inline void threader_for(int n, int threads_request, const F & lambda) | ||||||
inline void threader_for(int n, int reserved, const F & lambda) | ||||||
{ | ||||||
const void * a = static_cast<const void *>(&lambda); | ||||||
|
||||||
_daal_threader_for(n, threads_request, a, threader_func<F>); | ||||||
_daal_threader_for(n, reserved, a, threader_func<F>); | ||||||
} | ||||||
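The wrapper above passes the lambda to the threading layer as an opaque `const void *` together with a function pointer that restores its type. The following is a minimal, self-contained sketch of that pattern, not the oneDAL implementation: `toy_backend_for`, `toy_threader_func`, `toy_threader_for`, and `squares` are hypothetical names, the `reserved` parameter is omitted, and the stand-in backend runs the iterations sequentially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the C-style entry point of a threading layer.
// It receives an opaque pointer and a plain function pointer, so it needs
// no knowledge of the lambda's type. This stub runs sequentially.
using loop_body_t = void (*)(int i, const void * a);

inline void toy_backend_for(int n, const void * a, loop_body_t body)
{
    for (int i = 0; i < n; ++i) body(i, a);
}

// Trampoline: restores the lambda's static type and invokes it.
template <typename F>
inline void toy_threader_func(int i, const void * a)
{
    const F & lambda = *static_cast<const F *>(a);
    lambda(i);
}

// Wrapper with the same shape as threader_for in the diff.
template <typename F>
inline void toy_threader_for(int n, const F & lambda)
{
    assert(n >= 0);
    const void * a = static_cast<const void *>(&lambda);
    toy_backend_for(n, a, toy_threader_func<F>);
}

// Usage: each iteration writes a distinct element, so iterations are independent.
inline std::vector<int> squares(int n)
{
    std::vector<int> out(n, 0);
    toy_threader_for(n, [&](int i) { out[i] = i * i; });
    return out;
}
```

Because every iteration touches a different element of `out`, the loop body stays valid whether the backend runs iterations sequentially or in parallel.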
|
||||||
/// Execute the for loop defined by the input parameters in parallel. | ||||||
/// The maximal number of iterations in the loop is 2^63 - 1. | ||||||
Vika-F marked this conversation as resolved.
|
||||||
/// The work is scheduled dynamically across threads. | ||||||
/// | ||||||
/// @tparam F Lambda function of type [/* captures */](int64_t i) -> void, | ||||||
/// where ``i`` is the loop's iteration index, ``0 <= i < n``. | ||||||
/// | ||||||
/// @param[in] n Number of iterations in the for loop. | ||||||
/// @param[in] lambda Lambda function that defines iteration's body. | ||||||
template <typename F> | ||||||
inline void threader_for_int64(int64_t n, const F & lambda) | ||||||
{ | ||||||
|
@@ -239,12 +258,25 @@ inline void threader_for_int64(int64_t n, const F & lambda) | |||||
_daal_threader_for_int64(n, a, threader_func<F>); | ||||||
} | ||||||
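The two loops documented so far differ in their iteration limits: 2^31 - 1 for the `int` variant and 2^63 - 1 for the `int64_t` variant. A caller holding a 64-bit count could check which variant applies with a helper like the one below. `fits_int_loop` is a hypothetical name introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true when n iterations fit the int-indexed loops
// (at most 2^31 - 1 iterations); otherwise the int64_t variant is needed.
inline bool fits_int_loop(std::int64_t n)
{
    assert(n >= 0);
    return n <= static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max());
}
```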
|
||||||
/// Execute the for loop defined by the input parameters in parallel. | ||||||
/// The maximal number of iterations in the loop is 2^31 - 1. | ||||||
/// The work is scheduled dynamically across threads. | ||||||
/// The iteration space is chunked using oneTBB ``simple_partitioner`` | ||||||
Comment: Is TBB always going to be the threading layer? It is now, but can this be different in the future? If this function or interface is TBB-specific, is this documented somewhere? It might also be worth considering changing the name of the functions, e.g.
Reply: I think it is impossible to say whether TBB will always be the threading layer. For now we have no plans to migrate to other threading technologies such as OpenMP or C++ standard threads, but who knows what happens in the future? I think I need to reword this and remove the mention of TBB from the description. The idea of this API is that the iterations are not grouped together; each iteration is considered a separate task.
||||||
/// (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Partitioner_Summary.html) | ||||||
/// with chunk size 1. | ||||||
/// | ||||||
/// @tparam F Lambda function of type [/* captures */](int i) -> void, | ||||||
/// where ``i`` is the loop's iteration index, ``0 <= i < n``. | ||||||
/// | ||||||
/// @param[in] n Number of iterations in the for loop. | ||||||
/// @param[in] reserved Parameter reserved for the future. Currently unused. | ||||||
/// @param[in] lambda Lambda function that defines iteration's body. | ||||||
|
||||||
template <typename F> | ||||||
inline void threader_for_simple(int n, int threads_request, const F & lambda) | ||||||
inline void threader_for_simple(int n, int reserved, const F & lambda) | ||||||
{ | ||||||
const void * a = static_cast<const void *>(&lambda); | ||||||
|
||||||
_daal_threader_for_simple(n, threads_request, a, threader_func<F>); | ||||||
_daal_threader_for_simple(n, reserved, a, threader_func<F>); | ||||||
} | ||||||
|
||||||
template <typename F> | ||||||
|
@@ -255,6 +287,35 @@ inline void threader_for_int32ptr(const int * begin, const int * end, const F & | |||||
_daal_threader_for_int32ptr(begin, end, a, threader_func<F>); | ||||||
} | ||||||
|
||||||
/// Execute the for loop defined by the input parameters in parallel. | ||||||
/// The maximal number of iterations in the loop is ``SIZE_MAX``, as defined in the C99 standard. | ||||||
/// | ||||||
/// The work is scheduled statically across threads. | ||||||
/// This means that the work is always scheduled in the same way across the threads: | ||||||
/// each thread processes the same set of iterations on each invocation of this loop. | ||||||
/// | ||||||
/// It is recommended to use this parallel loop if each iteration of the loop | ||||||
/// performs an equal amount of work. | ||||||
/// | ||||||
/// Let ``t`` be the number of threads available to oneDAL. | ||||||
/// | ||||||
/// Then the number of iterations processed by each thread (except maybe the last one) | ||||||
/// is computed as: | ||||||
|
||||||
/// ``nI = (n + t - 1) / t`` | ||||||
/// | ||||||
/// Here is how the work is split across the threads: | ||||||
|
||||||
/// The 1st thread executes iterations ``0``, ..., ``nI - 1``; | ||||||
/// the 2nd thread executes iterations ``nI``, ..., ``2 * nI - 1``; | ||||||
/// ... | ||||||
/// the t-th thread executes iterations ``(t - 1) * nI``, ..., ``n - 1``. | ||||||
/// | ||||||
/// @tparam F Lambda function of type [/* captures */](size_t i, size_t tid) -> void, | ||||||
/// where | ||||||
/// ``i`` is the loop's iteration index, ``0 <= i < n``; | ||||||
/// ``tid`` is the index of the thread, ``0 <= tid < t``. | ||||||
/// | ||||||
/// @param[in] n Number of iterations in the for loop. | ||||||
/// @param[in] lambda Lambda function that defines iteration's body. | ||||||
|
||||||
template <typename F> | ||||||
inline void static_threader_for(size_t n, const F & lambda) | ||||||
{ | ||||||
|
@@ -263,12 +324,27 @@ inline void static_threader_for(size_t n, const F & lambda) | |||||
_daal_static_threader_for(n, a, static_threader_func<F>); | ||||||
} | ||||||
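The static split described in the doc comment above, `nI = (n + t - 1) / t`, can be worked through with a small helper that lists the half-open iteration range of each thread. This is only a sketch of the arithmetic, not the oneDAL scheduler: `static_blocks` is a hypothetical name, and clamping the ranges to `n` (so that trailing threads may receive an empty range) is an assumption the source does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// For n iterations and t threads, returns one half-open range [begin, end)
// per thread index, using the block size nI = (n + t - 1) / t.
inline std::vector<std::pair<std::size_t, std::size_t>> static_blocks(std::size_t n, std::size_t t)
{
    assert(t > 0);
    const std::size_t nI = (n + t - 1) / t; // ceil(n / t)
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    blocks.reserve(t);
    for (std::size_t tid = 0; tid < t; ++tid)
    {
        // Assumption: ranges are clamped to n, so late threads may get no work.
        const std::size_t begin = std::min(tid * nI, n);
        const std::size_t end   = std::min(begin + nI, n);
        blocks.emplace_back(begin, end);
    }
    return blocks;
}
```

For `n = 10` and `t = 4` this gives `nI = 3` and the ranges `[0, 3)`, `[3, 6)`, `[6, 9)`, `[9, 10)`, which matches the per-thread listing in the comment: every thread except the last processes `nI` iterations.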
|
||||||
/// Execute the for loop defined by the input parameters in parallel. | ||||||
/// The maximal number of iterations in the loop is 2^31 - 1. | ||||||
Vika-F marked this conversation as resolved.
|
||||||
/// The work is scheduled dynamically across threads. | ||||||
/// | ||||||
/// @tparam F Lambda function of type [/* captures */](int beginRange, int endRange) -> void | ||||||
/// where | ||||||
/// ``beginRange`` is the starting index of the loop's iterations block to be | ||||||
/// processed by a thread, ``0 <= beginRange < n``; | ||||||
/// ``endRange`` is the index after the end of the loop's iterations block to be | ||||||
/// processed by a thread, ``beginRange < endRange <= n``; | ||||||
/// | ||||||
/// @param[in] n Number of iterations in the for loop. | ||||||
/// @param[in] reserved Parameter reserved for the future. Currently unused. | ||||||
/// @param[in] lambda Lambda function that processes the block of loop's iterations | ||||||
/// ``[beginRange, endRange)``. | ||||||
template <typename F> | ||||||
inline void threader_for_blocked(int n, int threads_request, const F & lambda) | ||||||
inline void threader_for_blocked(int n, int reserved, const F & lambda) | ||||||
{ | ||||||
const void * a = static_cast<const void *>(&lambda); | ||||||
|
||||||
_daal_threader_for_blocked(n, threads_request, a, threader_func_b<F>); | ||||||
_daal_threader_for_blocked(n, reserved, a, threader_func_b<F>); | ||||||
} | ||||||
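To illustrate the blocked lambda signature documented above, the sketch below calls a lambda once per block `[beginRange, endRange)` instead of once per iteration. It is an assumption-laden stand-in: `toy_for_blocked` and `scaled` are hypothetical names, the explicit `blockSize` parameter does not exist in `threader_for_blocked` (there the threading layer chooses the blocks), and the blocks are processed sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sequential stand-in for a blocked parallel loop.
// Calls lambda(beginRange, endRange) for consecutive blocks covering [0, n).
template <typename F>
inline void toy_for_blocked(int n, int blockSize, const F & lambda)
{
    assert(n >= 0 && blockSize > 0);
    for (int beginRange = 0; beginRange < n; beginRange += blockSize)
    {
        const int endRange = std::min(beginRange + blockSize, n);
        lambda(beginRange, endRange);
    }
}

// Usage: the lambda owns the inner loop over its block of iterations.
inline std::vector<int> scaled(const std::vector<int> & in, int factor, int blockSize)
{
    std::vector<int> out(in.size());
    toy_for_blocked(static_cast<int>(in.size()), blockSize, [&](int beginRange, int endRange) {
        for (int i = beginRange; i < endRange; ++i) out[i] = in[i] * factor;
    });
    return out;
}
```

Handing the lambda a whole block lets it hoist per-block setup out of the inner loop, which a per-iteration lambda cannot do.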
|
||||||
template <typename F> | ||||||
|
@@ -321,10 +397,18 @@ class tls_deleter_ : public tls_deleter | |||||
virtual void del(void * a) { delete static_cast<lambdaType *>(a); } | ||||||
}; | ||||||
|
||||||
/// Thread-local storage (TLS) | ||||||
/// | ||||||
/// @tparam F Type of the data located in the storage | ||||||
template <typename F> | ||||||
class tls : public tlsBase | ||||||
{ | ||||||
public: | ||||||
/// Initialize thread-local storage | ||||||
/// | ||||||
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F | ||||||
Comment: Needs monospace, with either single or double backticks. I've assumed the style so far
|
||||||
/// | ||||||
/// @param lambda Lambda function that initializes a thread-local storage | ||||||
template <typename lambdaType> | ||||||
explicit tls(const lambdaType & lambda) | ||||||
{ | ||||||
|
@@ -339,19 +423,35 @@ class tls : public tlsBase | |||||
tlsPtr = _daal_get_tls_ptr(a, tls_func<lambdaType>); | ||||||
} | ||||||
|
||||||
/// Destroys the memory associated with a thread-local storage | ||||||
/// | ||||||
/// @note TLS does not release the memory allocated by a lambda-function | ||||||
/// provided to the constructor. | ||||||
/// Developers are responsible for deletion of that memory. | ||||||
virtual ~tls() | ||||||
{ | ||||||
d->del(voidLambda); | ||||||
delete d; | ||||||
_daal_del_tls_ptr(tlsPtr); | ||||||
} | ||||||
|
||||||
/// Access the local data of a thread by value | ||||||
/// | ||||||
/// @return When first invoked by a thread, a lambda provided to the constructor is | ||||||
|
||||||
/// called to initialize the local data of the thread and return it. | ||||||
/// All the following invocations just return the same thread-local data. | ||||||
Comment: If this is the case, should the declaration of the pointer
Reply: I don't think so.
||||||
F local() | ||||||
{ | ||||||
void * pf = _daal_get_tls_local(tlsPtr); | ||||||
return (static_cast<F>(pf)); | ||||||
} | ||||||
|
||||||
/// Sequential reduction. | ||||||
/// | ||||||
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void | ||||||
|
||||||
/// | ||||||
/// @param lambda Lambda function that is applied to each element of thread-local | ||||||
/// storage sequentially. | ||||||
template <typename lambdaType> | ||||||
void reduce(const lambdaType & lambda) | ||||||
{ | ||||||
|
@@ -360,6 +460,12 @@ class tls : public tlsBase | |||||
_daal_reduce_tls(tlsPtr, a, tls_reduce_func<F, lambdaType>); | ||||||
} | ||||||
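The documented TLS semantics (lazy initialization on the first `local()` call of a thread, the same data on later calls, a sequential `reduce`, and the caller's responsibility for freeing what the initializer allocated) can be sketched with a toy class. This is not the oneDAL `tls` implementation: `toy_tls` and `sum_with_tls` are hypothetical names, the storage is a mutex-guarded map keyed by `std::thread::id`, and the demo loop runs on a single thread to stay deterministic.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <thread>

// Toy thread-local storage: one lazily created value of type F per thread.
template <typename F>
class toy_tls
{
public:
    template <typename Init>
    explicit toy_tls(const Init & init) : init_(init) {}

    // First call on a thread runs the initializer; later calls return the same value.
    F local()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        const std::thread::id id = std::this_thread::get_id();
        auto it = slots_.find(id);
        if (it == slots_.end()) it = slots_.emplace(id, init_()).first;
        return it->second;
    }

    // Applies op to every per-thread value, one after another.
    template <typename Op>
    void reduce(const Op & op)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        for (auto & kv : slots_) op(kv.second);
    }

private:
    std::function<F()> init_;
    std::map<std::thread::id, F> slots_;
    std::mutex mutex_;
};

// Usage: accumulate into per-thread partial sums, then merge and free them.
inline long sum_with_tls(int n)
{
    toy_tls<long *> partial([]() { return new long(0); });
    for (int i = 0; i < n; ++i) *partial.local() += i; // stands in for a parallel loop body
    long total = 0;
    partial.reduce([&](long * p) {
        total += *p;
        delete p; // the storage does not free what the initializer allocated
    });
    return total;
}
```

Deleting each partial result inside the reduction lambda mirrors the note on the destructor: memory allocated by the initializing lambda is the developer's to release.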
|
||||||
/// Parallel reduction. | ||||||
/// | ||||||
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void | ||||||
|
||||||
/// | ||||||
/// @param lambda Lambda function that is applied to each element of thread-local | ||||||
/// storage in parallel. | ||||||
template <typename lambdaType> | ||||||
void parallel_reduce(const lambdaType & lambda) | ||||||
{ | ||||||
|