Reworked wait_contex approach + scalability improvement in task_group #1345

Merged
merged 21 commits into from
Jul 9, 2024

Conversation

pavelkumbrasev
Contributor

Description

  1. Introduce a per-thread reference_vertex that should help with the scalability problem of a single wait_context.
  2. task_group has a flat reference counting scheme, i.e., there is a central reference counter that every created task must increment and decrement during execution.
    This approach works fine while tasks are large and submitted from a small number of threads (<8).
    When multiple threads start a tree-like algorithm with many tasks, overall application performance degrades drastically as the number of threads grows, due to the high synchronization cost.
    This patch uses a per-thread reference counter in task_group.
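
A minimal sketch of the idea described above, not the code from this patch: each per-thread vertex keeps its own counter and touches the shared root only when that counter moves between zero and non-zero. The names `root_counter`, `thread_vertex`, and `count_root_updates` are hypothetical, the `updates` field exists only to make the effect measurable, and the sketch drives the vertices sequentially to stay deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the central counter (the role of wait_context).
struct root_counter {
    std::atomic<std::uint64_t> refs{0};
    std::atomic<std::uint64_t> updates{0};  // demo instrumentation only
    void reserve() { refs.fetch_add(1); updates.fetch_add(1); }
    void release() { refs.fetch_sub(1); updates.fetch_add(1); }
};

// Hypothetical per-thread vertex: it holds exactly one reference on the
// root for as long as its own counter is non-zero.
class thread_vertex {
    root_counter& parent;
    std::atomic<std::uint64_t> local{0};
public:
    explicit thread_vertex(root_counter& p) : parent(p) {}
    void reserve() {
        // Only the 0 -> 1 transition is forwarded to the shared root.
        if (local.fetch_add(1) == 0) parent.reserve();
    }
    void release() {
        // Only the 1 -> 0 transition is forwarded to the shared root.
        if (local.fetch_sub(1) == 1) parent.release();
    }
};

// Simulates `vertices` per-thread vertices, each tracking
// `tasks_per_vertex` tasks, and returns how often the root was modified.
std::uint64_t count_root_updates(unsigned vertices, unsigned tasks_per_vertex) {
    root_counter root;
    for (unsigned v = 0; v < vertices; ++v) {
        thread_vertex vertex(root);
        for (unsigned i = 0; i < tasks_per_vertex; ++i) vertex.reserve();
        for (unsigned i = 0; i < tasks_per_vertex; ++i) vertex.release();
    }
    assert(root.refs.load() == 0);  // every reference was returned
    return root.updates.load();
}
```

With a flat scheme, the root would be modified twice per task. In this sketch it is modified twice per vertex regardless of the task count, which is where the reduced contention would come from.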

Fixes # - issue number(s) if exists

  • - git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

@vossmjp @kboyarinov

Other information

@pavelkumbrasev pavelkumbrasev changed the title New wait_vertex + scalability improvement in task_group Reworked wait_contex approach + scalability improvement in task_group Apr 15, 2024
@pavelkumbrasev pavelkumbrasev force-pushed the dev/pavelkumbrasev/task_group_refactoring branch from 4605379 to c3ea187 Compare April 15, 2024 14:17
@@ -221,6 +221,34 @@ void notify_waiters(std::uintptr_t wait_ctx_addr) {
governor::get_thread_data()->my_arena->get_waiting_threads_monitor().notify(is_related_wait_ctx);
}

d1::wait_tree_vertex_interface* get_thread_reference_vertex(d1::wait_tree_vertex_interface* wc) {
__TBB_ASSERT(wc, nullptr);
auto& dispatcher = *governor::get_thread_data()->my_task_dispatcher;
Contributor

Can thread data be uninitialized here? Would get_thread_data_if_initialized() with the assert __TBB_ASSERT(governor::get_thread_data() != nullptr, "") be enough here?

Contributor Author

It might not be initialized yet. Consider this example:

tbb::task_group tg;
tg.run([] {}); // get_thread_reference_vertex is called here, but TBB is not yet initialized

src/tbb/task.cpp Outdated
@@ -221,6 +221,34 @@ void notify_waiters(std::uintptr_t wait_ctx_addr) {
governor::get_thread_data()->my_arena->get_waiting_threads_monitor().notify(is_related_wait_ctx);
}

d1::wait_tree_vertex_interface* get_thread_reference_vertex(d1::wait_tree_vertex_interface* wc) {
Contributor

  • Perhaps, rename wc to top_wait_context.
  • Are there plans to pass something different than pointer to wait_context into this function?

Contributor Author

  1. Renamed.
  2. Not really.

src/tbb/task.cpp Outdated
auto& dispatcher = *governor::get_thread_data()->my_task_dispatcher;

d1::reference_vertex* ref_counter{nullptr};
auto pos = dispatcher.m_reference_vertex_map.find(wc);
Contributor

dispatcher.m_reference_vertex_map is used many times within this function. To improve readability, I suggest making an alias for it first and then using the alias in the rest of the function.

Suggested change
auto pos = dispatcher.m_reference_vertex_map.find(wc);
auto& reference_map = dispatcher.m_reference_vertex_map;
auto pos = reference_map.find(wc);

Contributor Author

Done

src/tbb/task.cpp Outdated
} else {
constexpr std::size_t max_reference_vertex_map_size = 1000;
if (dispatcher.m_reference_vertex_map.size() > max_reference_vertex_map_size) {
for (auto it = dispatcher.m_reference_vertex_map.begin(); it != dispatcher.m_reference_vertex_map.end();) {
Contributor

If I understand correctly, this loop might take a significant amount of time in some situations. Should we introduce a threshold that limits how much time a thread can spend cleaning the container?
Perhaps at least add a TODO note about this.

Contributor Author

Added TODO
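
For readers following this thread, here is a rough sketch of the lookup-with-cleanup pattern under discussion, using the alias suggested earlier. It is not the patch code: `vertex`, `find_or_create`, and the "`refs == 0` means unused" eviction rule are assumptions made for illustration, and the size limit is passed as a parameter rather than fixed at 1000.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical per-thread counter object stored in the map.
struct vertex {
    std::size_t refs = 0;
};

// Returns the vertex registered for `key`, creating it when missing.
// Before inserting into a map that has outgrown `max_size`, one full pass
// erases entries that are assumed to be unused (refs == 0).
vertex& find_or_create(std::unordered_map<const void*, vertex>& reference_map,
                       const void* key, std::size_t max_size) {
    assert(key != nullptr);
    auto pos = reference_map.find(key);
    if (pos != reference_map.end()) {
        return pos->second;
    }
    if (reference_map.size() > max_size) {
        // This pass is linear in the map size; that cost is what the
        // review comment asks to bound (or at least mark with a TODO).
        for (auto it = reference_map.begin(); it != reference_map.end();) {
            if (it->second.refs == 0) {
                it = reference_map.erase(it);  // erase returns the next iterator
            } else {
                ++it;
            }
        }
    }
    return reference_map[key];  // default-constructs the new vertex
}
```

The cleanup runs only on a miss and only above the limit, so the common hit path stays a single hash lookup.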

Comment on lines +209 to +210
std::uint64_t ref = m_ref_count.fetch_sub(static_cast<std::uint64_t>(delta)) - static_cast<std::uint64_t>(delta);
if (ref == 0) {
Contributor

Perhaps, this would make this part a bit more readable.

Suggested change
std::uint64_t ref = m_ref_count.fetch_sub(static_cast<std::uint64_t>(delta)) - static_cast<std::uint64_t>(delta);
if (ref == 0) {
std::uint64_t prev_ref_value = m_ref_count.fetch_sub(static_cast<std::uint64_t>(delta));
if (prev_ref_value == static_cast<std::uint64_t>(delta)) {

Contributor Author

I'm not sure. @kboyarinov what do you think?

Contributor

I agree with Aleksei, since the arithmetic operation after the long fetch_sub expression is not very noticeable.
If you are worried about the logic, I guess the check could also be if (prev_ref_value - delta == 0), but I am not sure about that.
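
To make the comparison in this thread concrete, the two forms can be put side by side. This is a standalone sketch: the free functions and their names are hypothetical, and only the `fetch_sub` expressions are taken from the snippets above. Because the arithmetic is unsigned, `prev - delta == 0` holds exactly when `prev == delta`, so both forms report the same result.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Original form: derive the new value from the previous one, compare to 0.
bool release_v1(std::atomic<std::uint64_t>& ref_count, std::uint32_t delta) {
    std::uint64_t ref = ref_count.fetch_sub(static_cast<std::uint64_t>(delta))
                        - static_cast<std::uint64_t>(delta);
    return ref == 0;
}

// Suggested form: compare the previous value with delta directly.
bool release_v2(std::atomic<std::uint64_t>& ref_count, std::uint32_t delta) {
    std::uint64_t prev_ref_value =
        ref_count.fetch_sub(static_cast<std::uint64_t>(delta));
    // Releasing more than was reserved would wrap the unsigned counter.
    assert(prev_ref_value >= delta);
    return prev_ref_value == static_cast<std::uint64_t>(delta);
}
```

Both functions return true only for the call that drops the counter to zero, which is the point at which the waiter would be notified.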

}
}

void release(std::uint32_t delta = 1) override {
Contributor

The name delta suggests a value that can be negative, but here the parameter can only take positive values. Consider naming the parameter differently here and in other similar places. Perhaps release_num or simply num would suit better.

Contributor Author

It is identical to wait_context.

Comment on lines 189 to 192
void release(std::uint32_t, const d1::execution_data&) override {
__TBB_ASSERT(false,
"This method is overloaded only to fulfill the base class interface requirements, and thus, it should not be called.");
}
Contributor

Usually this means that the inheritance is incorrect. Introducing an intermediate interface would not only avoid writing such implementations but would also improve code readability.

Contributor Author

@pavelkumbrasev pavelkumbrasev Jun 24, 2024

@kboyarinov what do you think?
One extra interface class, or one method with an assert in the private section?
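
The "extra interface class" option could look roughly like the sketch below. All names here (`counted_vertex`, `context_aware_vertex`, `release_with_context`, `execution_data_stub`) are hypothetical and do not come from the patch; the point is only that a class needing plain counting derives from the narrow interface and never has to stub out a method with an assert.

```cpp
#include <cassert>
#include <cstdint>

struct execution_data_stub {};  // hypothetical stand-in for d1::execution_data

// Narrow interface: plain reference counting only.
struct counted_vertex {
    virtual ~counted_vertex() = default;
    virtual void reserve(std::uint32_t delta) = 0;
    virtual void release(std::uint32_t delta) = 0;
};

// Wider interface for vertices that also need the execution data on release.
struct context_aware_vertex : counted_vertex {
    virtual void release_with_context(std::uint32_t delta,
                                      const execution_data_stub& ed) = 0;
};

// Implements only what it supports, so no "must not be called" override.
class simple_counter final : public counted_vertex {
    std::uint64_t refs = 0;
public:
    void reserve(std::uint32_t delta) override { refs += delta; }
    void release(std::uint32_t delta) override {
        assert(refs >= delta);  // releasing more than reserved is a bug
        refs -= delta;
    }
    std::uint64_t count() const { return refs; }
};
```

The trade-off raised in the thread still applies: this adds one more class to the hierarchy in exchange for removing one asserting override.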

Comment on lines -434 to -470
template<typename F>
class function_task : public task {
const F m_func;
wait_context& m_wait_ctx;
small_object_allocator m_allocator;

void finalize(const execution_data& ed) {
// Make a local reference not to access this after destruction.
wait_context& wo = m_wait_ctx;
// Copy allocator to the stack
auto allocator = m_allocator;
// Destroy user functor before release wait.
this->~function_task();
wo.release();

allocator.deallocate(this, ed);
}
task* execute(execution_data& ed) override {
task* res = d2::task_ptr_or_nullptr(m_func);
finalize(ed);
return res;
}
task* cancel(execution_data& ed) override {
finalize(ed);
return nullptr;
}
public:
function_task(const F& f, wait_context& wo, small_object_allocator& alloc)
: m_func(f)
, m_wait_ctx(wo)
, m_allocator(alloc) {}

function_task(F&& f, wait_context& wo, small_object_allocator& alloc)
: m_func(std::move(f))
, m_wait_ctx(wo)
, m_allocator(alloc) {}
};
Contributor

Was that removed because the function_task above, which inherits task_handle_task, can be reused instead?

Contributor Author

Yes

Contributor

Their layouts differ, however, and yet the PR is marked as backward compatible. Is that because we are not certain about header-only backward incompatibilities, or am I missing something?

Contributor Author

You will need to recompile the whole application.
It is explained here:
#1371

@pavelkumbrasev pavelkumbrasev force-pushed the dev/pavelkumbrasev/task_group_refactoring branch from af785df to bdd7ba6 Compare June 26, 2024 12:18
pavelkumbrasev and others added 17 commits July 8, 2024 15:36
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
Co-authored-by: Konstantin Boyarinov <konstantin.boyarinov@intel.com>
Co-authored-by: Aleksei Fedotov <aleksei.fedotov@intel.com>
@pavelkumbrasev pavelkumbrasev force-pushed the dev/pavelkumbrasev/task_group_refactoring branch from bdd7ba6 to bd9a34c Compare July 8, 2024 14:36
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
@kboyarinov
Contributor

Other than this - LGTM

pavelkumbrasev and others added 3 commits July 9, 2024 08:51
Co-authored-by: Konstantin Boyarinov <konstantin.boyarinov@intel.com>
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
@kboyarinov kboyarinov merged commit 1f52f50 into master Jul 9, 2024
22 checks passed
@kboyarinov kboyarinov deleted the dev/pavelkumbrasev/task_group_refactoring branch July 9, 2024 11:24