Add a chapter about threading layer into the docs #2848

@@ -223,14 +223,33 @@ inline void threader_func_break(int i, bool & needBreak, const void * a)
    lambda(i, needBreak);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
/// The work is scheduled dynamically across threads.

Review comment: The comment suggests that the work will be executed in parallel, but it may not be the case. It depends on the implementation behind the abstraction. This function passes the parameters to the threading layer, which may execute the work in parallel. And must the work be scheduled dynamically? Can the threading layer decide to just split n equally into however many threads there are (this is the …). Also, it would be good to mention loop-carried dependencies. Is the assumption here that the work in each loop iteration is independent?

Reply: The idea of this API is that it uses the default scheduling provided by the underlying implementation. It is expected that the iterations of the loop are logically independent, i.e. there is no recurrence among the iterations, but they might access the same data on read or write. I will reword the description to make it more clear.

///
/// @tparam F Lambda function of type ``[/* captures */](int i) -> void``,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.

Review comment: Just double checking, but I've always used single backticks for monospace in doxygen. Is the double backtick valid?

Reply: Thanks for noticing this. We are using not only doxygen, but also reStructuredText plus some Python pre-processing.

///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that defines iteration's body.
template <typename F>
inline void threader_for(int n, int threads_request, const F & lambda)
inline void threader_for(int n, int reserved, const F & lambda)
{
    const void * a = static_cast<const void *>(&lambda);

    _daal_threader_for(n, threads_request, a, threader_func<F>);
    _daal_threader_for(n, reserved, a, threader_func<F>);
}
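
A minimal usage sketch of threader_for: it assumes the declarations above live in the daal namespace and are made available through the threading layer header (the exact include path is not shown here); the second argument is the reserved parameter described above and is currently ignored.

// Sketch: element-wise vector addition. Every iteration is independent,
// which matches the contract discussed in the review comments above.
void add_arrays(const float * a, const float * b, float * c, int n)
{
    daal::threader_for(n, n, [&](int i) {
        c[i] = a[i] + b[i];
    });
}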

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^63 - 1.
/// The work is scheduled dynamically across threads.
///
/// @tparam F Lambda function of type [/* captures */](int64_t i) -> void,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] lambda Lambda function that defines iteration's body.
template <typename F>
inline void threader_for_int64(int64_t n, const F & lambda)
{

@@ -239,12 +258,25 @@ inline void threader_for_int64(int64_t n, const F & lambda)
    _daal_threader_for_int64(n, a, threader_func<F>);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
/// The work is scheduled dynamically across threads.
/// The iteration space is chunked using oneTBB ``simple_partitioner``

Review comment: Is TBB always going to be the threading layer? It is now, but can this be different in the future? If this function or interface is TBB specific, is this documented somewhere? Might also be worth considering changing the name of the functions, e.g. …

Reply: I think it is impossible to say whether TBB will always be the threading layer or not. For now we do not have plans to migrate to other threading technologies like OpenMP, C++ std threads and so on, but who knows what happens in the future? I think I need to reword this and remove the TBB mention from the description. The idea of this API is that the iterations are not grouped together; each iteration is considered as a separate task.

/// (https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Partitioner_Summary.html)
/// with chunk size 1.
///
/// @tparam F Lambda function of type [/* captures */](int i) -> void,
/// where ``i`` is the loop's iteration index, ``0 <= i < n``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that defines iteration's body.
template <typename F>
inline void threader_for_simple(int n, int threads_request, const F & lambda)
inline void threader_for_simple(int n, int reserved, const F & lambda)
{
    const void * a = static_cast<const void *>(&lambda);

    _daal_threader_for_simple(n, threads_request, a, threader_func<F>);
    _daal_threader_for_simple(n, reserved, a, threader_func<F>);
}
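
A sketch of where threader_for_simple may pay off, under the same namespace assumption as above: loops whose iterations cost very different amounts of work, since each iteration becomes its own task (chunk size 1). Task and processTask are hypothetical names used only for illustration.

// Sketch: per-iteration cost varies strongly, so per-iteration tasks
// help balance the load across threads.
void run_tasks(Task * tasks, int nTasks)
{
    daal::threader_for_simple(nTasks, 0, [&](int i) {
        processTask(tasks[i]); // hypothetical routine with uneven cost
    });
}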

template <typename F>

@@ -255,6 +287,35 @@ inline void threader_for_int32ptr(const int * begin, const int * end, const F &
    _daal_threader_for_int32ptr(begin, end, a, threader_func<F>);
}

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is ``SIZE_MAX`` in the C99 standard.
///
/// The work is scheduled statically across threads.
/// This means that the work is always scheduled in the same way across the threads:
/// each thread processes the same set of iterations on each invocation of this loop.
///
/// It is recommended to use this parallel loop if each iteration of the loop
/// performs an equal amount of work.
///
/// Let ``t`` be the number of threads available to oneDAL.
///
/// Then the number of iterations processed by each thread (except maybe the last one)
/// is computed as:
/// ``nI = (n + t - 1) / t``
///
/// Here is how the work is split across the threads:
/// The 1st thread executes iterations ``0``, ..., ``nI - 1``;
/// the 2nd thread executes iterations ``nI``, ..., ``2 * nI - 1``;
/// ...
/// the t-th thread executes iterations ``(t - 1) * nI``, ..., ``n - 1``.
///
/// @tparam F Lambda function of type [/* captures */](size_t i, size_t tid) -> void,
/// where
/// ``i`` is the loop's iteration index, ``0 <= i < n``;
/// ``tid`` is the index of the thread, ``0 <= tid < t``.
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] lambda Lambda function that defines iteration's body.
template <typename F>
inline void static_threader_for(size_t n, const F & lambda)
{

@@ -263,12 +324,27 @@ inline void static_threader_for(size_t n, const F & lambda)
    _daal_static_threader_for(n, a, static_threader_func<F>);
}
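
A sketch of static_threader_for under the same namespace assumption; the comment works through the split formula above for a small concrete case.

// Worked split: n = 10 iterations on t = 4 threads gives
// nI = (10 + 4 - 1) / 4 = 3, so thread 0 runs iterations 0..2,
// thread 1 runs 3..5, thread 2 runs 6..8, and thread 3 runs iteration 9.
void scale(double * x, size_t n, double factor)
{
    daal::static_threader_for(n, [&](size_t i, size_t tid) {
        (void)tid; // tid would normally index per-thread data (see static_tls below)
        x[i] *= factor;
    });
}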

/// Execute the for loop defined by the input parameters in parallel.
/// The maximal number of iterations in the loop is 2^31 - 1.
/// The work is scheduled dynamically across threads.
///
/// @tparam F Lambda function of type [/* captures */](int beginRange, int endRange) -> void
/// where
/// ``beginRange`` is the starting index of the loop's iterations block to be
/// processed by a thread, ``0 <= beginRange < n``;
/// ``endRange`` is the index after the end of the loop's iterations block to be
/// processed by a thread, ``beginRange < endRange <= n``;
///
/// @param[in] n Number of iterations in the for loop.
/// @param[in] reserved Parameter reserved for the future. Currently unused.
/// @param[in] lambda Lambda function that processes the block of loop's iterations
/// ``[beginRange, endRange)``.
template <typename F>
inline void threader_for_blocked(int n, int threads_request, const F & lambda)
inline void threader_for_blocked(int n, int reserved, const F & lambda)
{
    const void * a = static_cast<const void *>(&lambda);

    _daal_threader_for_blocked(n, threads_request, a, threader_func_b<F>);
    _daal_threader_for_blocked(n, reserved, a, threader_func_b<F>);
}
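
A sketch of threader_for_blocked, same assumptions as above: each thread receives a contiguous block [beginRange, endRange) and can process it with a single tight loop or a vectorized kernel.

// Sketch: y += alpha * x computed block by block.
void axpy(int n, float alpha, const float * x, float * y)
{
    daal::threader_for_blocked(n, 0, [&](int beginRange, int endRange) {
        for (int i = beginRange; i < endRange; ++i)
        {
            y[i] += alpha * x[i];
        }
    });
}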

template <typename F>

@@ -321,10 +397,20 @@ class tls_deleter_ : public tls_deleter
virtual void del(void * a) { delete static_cast<lambdaType *>(a); }
};

/// Thread-local storage (TLS).
/// Can change its local variable after nested parallel constructs.
/// @note Use carefully in case of nested parallel regions.

Review comment: How common is it to have nested parallelism, and how much do we want users of oneDAL to use it? If at all possible, I recommend not using nested parallelism, as this gets confusing quickly and requires a lot of care. I would suggest either putting more of a health warning on here, e.g. … But maybe just leaving out the note is better, as this starts to open a discussion point.

Reply: I think the confusion here is because these APIs are not intended to be used by oneDAL's users. Inside oneDAL we use nested parallelism in many cases. So it is important to warn developers when using certain constructs might lead to issues. Will reformulate the note as well.

///
/// @tparam F Type of the data located in the storage
template <typename F>
class tls : public tlsBase
{
public:
/// Initialize thread-local storage
///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F

Review comment: Needs monospace, either with single or double backticks. I've assumed the style so far.

///
/// @param lambda Lambda function that initializes a thread-local storage
template <typename lambdaType>
explicit tls(const lambdaType & lambda)
{

@@ -339,19 +425,35 @@ class tls : public tlsBase
    tlsPtr = _daal_get_tls_ptr(a, tls_func<lambdaType>);
}

/// Destroys the memory associated with a thread-local storage
///
/// @note TLS does not release the memory allocated by a lambda-function
/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~tls()
{
    d->del(voidLambda);
    delete d;
    _daal_del_tls_ptr(tlsPtr);
}

/// Access a local data of a thread by value
///
/// @return When first invoked by a thread, a lambda provided to the constructor is
/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.

Review comment: If this is the case, should the declaration of the pointer …

Reply: I don't think so.

F local()
{
    void * pf = _daal_get_tls_local(tlsPtr);
    return (static_cast<F>(pf));
}

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{

@@ -360,6 +462,12 @@ class tls : public tlsBase
    _daal_reduce_tls(tlsPtr, a, tls_reduce_func<F, lambdaType>);
}

/// Parallel reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage in parallel.
template <typename lambdaType>
void parallel_reduce(const lambdaType & lambda)
{
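
Taken together, the tls interface above supports a common partial-results pattern. The sketch below assumes the daal namespace and stores a pointer type, so local() hands each thread its own heap-allocated accumulator, which must be freed in reduce() per the destructor note above.

// Sketch: per-thread partial sums combined after a dynamic parallel loop.
double sum(const double * x, int n)
{
    daal::tls<double *> partial([=]() { return new double(0.0); });

    daal::threader_for(n, n, [&](int i) {
        double * local = partial.local(); // created on first use by each thread
        *local += x[i];
    });

    double total = 0.0;
    partial.reduce([&](double * local) {
        total += *local;
        delete local; // ~tls() does not free this memory
    });
    return total;
}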

@@ -396,10 +504,18 @@ class static_tls_deleter_ : public static_tls_deleter
virtual void del(void * a) { delete static_cast<lambdaType *>(a); }
};

/// Thread-local storage (TLS) for the case of static parallel work scheduling.
///
/// @tparam F Type of the data located in the storage
template <typename F>
class static_tls
{
public:
/// Initialize thread-local storage.
///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F
///
/// @param lambda Lambda function that initializes a thread-local storage
template <typename lambdaType>
explicit static_tls(const lambdaType & lambda)
{

@@ -431,6 +547,11 @@ class static_tls
    _creater_func = creater_func<F, lambdaType>;
}

/// Destroys the memory associated with a thread-local storage.
///
/// @note Static TLS does not release the memory allocated by a lambda-function
/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~static_tls()
{
    if (_deleter)

@@ -441,9 +562,16 @@ class static_tls
    delete[] _storage;
}

/// Access a local data of a specified thread by value.
///
/// @param tid Index of the thread.
///
/// @return When first invoked by a thread, a lambda provided to the constructor is
/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.
F local(size_t tid)
{
    if (_storage)
    if (_storage && tid < _nThreads)

Review comment: Is it possible for this function to be called from within a nested parallel region, where the thread-local storage has not yet been instantiated? If so, it could be that nested threads race to create the storage, and this function needs to be made thread safe. Also, can this be used to get the thread-local storage of another thread? It seems as though it might be possible from a quick glance. If either of the above situations occurs, there is a data race in here. I think the documentation should state that it is not safe to call this function from within a parallel region.

Reply: As I mentioned previously, this PR is focused on documenting the present state of things. Yes, it is not a good idea to call this … The purpose of adding this comparison is not to change the logic of the code, but just to prevent out-of-range memory access. I can revert it to the mainline version, because the intention was not to touch the code at all, only to document it.

    {
        if (!_storage[tid])
        {

@@ -458,6 +586,12 @@ class static_tls
        }
    }

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{

@@ -470,6 +604,9 @@ class static_tls
    }
}

/// Full number of threads.
///
/// @return Number of threads available.

Review comment: You mention nested parallelism in another comment. What does this function return in the nested case? The number of threads in the current group, or the total number of threads (including nested threads) that are launched?

Reply: This function returns the total number of threads available to oneDAL regardless of nesting. Will try to clarify this.

size_t nthreads() const { return _nThreads; }

private:

@@ -480,10 +617,23 @@ class static_tls
static_tls_deleter * _deleter = nullptr;
};
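
A sketch combining static_threader_for with static_tls (same namespace assumption). Because the schedule is static, local(tid) can be indexed directly by the thread index passed to the loop body; the null check in reduce() is purely defensive.

// Sketch: dot product with one heap-allocated accumulator per thread.
double dot(const double * a, const double * b, size_t n)
{
    daal::static_tls<double *> partial([=]() { return new double(0.0); });

    daal::static_threader_for(n, [&](size_t i, size_t tid) {
        double * local = partial.local(tid); // tid < partial.nthreads()
        *local += a[i] * b[i];
    });

    double result = 0.0;
    partial.reduce([&](double * local) {
        if (local)
        {
            result += *local;
            delete local; // freeing is the developer's responsibility (see note above)
        }
    });
    return result;
}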

/// Local storage (LS) for the data of a thread.
/// Does not change its local variable after nested parallel constructs,
/// but can have performance penalties compared to daal::tls.
/// Can be safely used in case of nested parallel regions.

Review comment: What does it mean to be safely used? Are all the values in here read-only?

///
/// @tparam F Type of the data located in the storage
template <typename F>
class ls : public tlsBase
{
public:
/// Initialize a local storage.
///
/// @tparam lambdaType Lambda function of type [/* captures */]() -> F
///
/// @param lambda Lambda function that initializes a local storage
/// @param isTls if true, then local storage is a thread-local storage (daal::tls)
/// and might have problems in case of nested parallel regions.

Review comment: I think it's worth putting a note somewhere in the documentation about nested parallelism and what problems it can create for threads. Then, from the comments in this file, you can link to that note. Just saying that there might be problems opens up questions rather than resolving them.

template <typename lambdaType>
explicit ls(const lambdaType & lambda, const bool isTls = false)
{

@@ -499,13 +649,23 @@ class ls : public tlsBase
    lsPtr = _isTls ? _daal_get_tls_ptr(a, tls_func<lambdaType>) : _daal_get_ls_ptr(a, tls_func<lambdaType>);
}

/// Destroys the memory associated with a local storage.
///
/// @note LS does not release the memory allocated by a lambda-function
/// provided to the constructor.
/// Developers are responsible for deletion of that memory.
virtual ~ls()
{
    d->del(voidLambda);
    delete d;
    _isTls ? _daal_del_tls_ptr(lsPtr) : _daal_del_ls_ptr(lsPtr);
}

/// Access a local data of a thread by value.
///
/// @return When first invoked by a thread, a lambda provided to the constructor is
/// called to initialize the local data of the thread and return it.
/// All the following invocations just return the same thread-local data.
F local()
{
    void * pf = _isTls ? _daal_get_tls_local(lsPtr) : _daal_get_ls_local(lsPtr);

@@ -517,6 +677,12 @@ class ls : public tlsBase
    if (!_isTls) _daal_release_ls_local(lsPtr, p);
}

/// Sequential reduction.
///
/// @tparam lambdaType Lambda function of type [/* captures */](F) -> void
///
/// @param lambda Lambda function that is applied to each element of thread-local
/// storage sequentially.
template <typename lambdaType>
void reduce(const lambdaType & lambda)
{
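
Finally, a sketch of ls under the same assumptions. The accumulation pattern mirrors the daal::tls example above; the difference, per the comments, is that ls stays safe when the loop body itself spawns nested parallel work. computeBlock is a hypothetical routine that may internally call threader_for; the diff also hints at a release member of ls, not shown in full here, so the sketch sticks to local() and reduce().

// Sketch: per-task accumulators that remain valid across nested parallelism.
double accumulate(const double * x, int nBlocks, int blockSize)
{
    daal::ls<double *> partial([=]() { return new double(0.0); });

    daal::threader_for(nBlocks, nBlocks, [&](int b) {
        const double r = computeBlock(x + b * blockSize, blockSize); // may run nested parallel work
        *partial.local() += r;
    });

    double total = 0.0;
    partial.reduce([&](double * local) {
        total += *local;
        delete local; // ~ls() does not free this memory
    });
    return total;
}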

Review comment: Also, if you are defining the name, maybe "Threading Layer" should be a proper noun with capitalization? I think this might be a nit too far, but may as well leave a comment while I'm in the area.