`PooledStackAllocator` helps improve performance of thread creating #324

Coldwings · 2024-01-08T10:05:15Z

In typical scenarios, we utilize photon::thread_pool primarily due to the efficiency gains from reusing existing photon threads rather than creating new ones.

From my observation, the most resource-intensive part of creating a photon thread is allocating its stack memory. Reusing pre-allocated stack memory directly is simpler and faster than reusing it through a thread pool.

The current IOAlloc toolset is not suitable for stack allocation, which cannot rely on photon thread utilities, so I have developed a thread-local stack pool that makes it practical to reuse already-allocated stacks.

In our tests, the photon::WorkPool utilizing a pooled stack allocator outperforms the version using a traditional ThreadPool, and both these approaches significantly surpass direct thread creation in terms of performance.

Signed-off-by: Coldwings <coldwings@me.com>

lihuiba · 2024-01-09T03:40:12Z

thread/thread-pool.cpp

+ }
+
+ void set_bypass_threadpool(bool flag) {
+ __bypass_threadpool = true;


Can we set capacity = 0 instead?

beef9999 · 2024-01-09T07:12:34Z

Why is the pooled stack allocator faster than the thread pool?

Coldwings · 2024-01-09T07:34:05Z

> Why is the pooled stack allocator faster than the thread pool?

The heaviest step in creating a new photon thread is stack allocation. Both the thread pool and the pooled stack allocator try to reuse allocated stacks, but in different ways.

In ThreadPool, photon threads are not put into the DONE state when a task finishes; they are kept in an identity-pool to wait for reuse. The thread state is always kept and has to be handled carefully. Using ThreadPool on a single vCPU is fine, but using it across multiple vCPUs is a poor fit: every step that involves locks and cross-vCPU notifications is slow and heavy.

The pooled allocator keeps only the stack memory and never keeps any thread state. It therefore needs no locks and no thread-state handling; it is just a simple thread-local allocator. That is much simpler, and it was more efficient in tests. Creating and finishing a thread involves no cross-vCPU work.
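To make the idea concrete, here is a minimal sketch of a thread-local stack cache. It is not the code from this PR: the class name `ThreadLocalStackPool`, the helper `local_stack_pool()`, the `max_cached` limit, and the use of `malloc`/`free` as a stand-in for the real stack allocation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch of a per-thread stack cache; not Photon's real class.
class ThreadLocalStackPool {
public:
    explicit ThreadLocalStackPool(std::size_t max_cached = 64)
        : max_cached_(max_cached) {}

    ThreadLocalStackPool(const ThreadLocalStackPool&) = delete;
    ThreadLocalStackPool& operator=(const ThreadLocalStackPool&) = delete;

    ~ThreadLocalStackPool() {
        for (const Entry& e : free_) std::free(e.ptr);
    }

    // Return a cached stack of exactly `size` bytes if one exists,
    // otherwise allocate a fresh one.
    void* alloc(std::size_t size) {
        assert(size > 0);
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i].size == size) {
                void* ptr = free_[i].ptr;
                free_[i] = free_.back();  // swap-remove, order is irrelevant
                free_.pop_back();
                ++hits_;
                return ptr;
            }
        }
        ++misses_;
        return std::malloc(size);  // stand-in for the real stack allocation
    }

    // Keep the stack for later reuse, or release it if the cache is full.
    void dealloc(void* ptr, std::size_t size) {
        if (ptr == nullptr) return;
        if (free_.size() < max_cached_) {
            free_.push_back(Entry{ptr, size});
        } else {
            std::free(ptr);
        }
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    struct Entry { void* ptr; std::size_t size; };
    std::vector<Entry> free_;
    std::size_t max_cached_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};

// One pool per OS thread: no locks and no cross-thread notifications.
inline ThreadLocalStackPool& local_stack_pool() {
    thread_local ThreadLocalStackPool pool;
    return pool;
}
```

Because each OS thread only ever touches its own pool, `alloc` and `dealloc` need no synchronization, which is the property the comparison with ThreadPool rests on.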

However, if a workload always creates threads on one vCPU, migrates them to another, and lets them finish on vCPUs that never create threads, the thread-local pool will not help. (For example: threads are always created on vCPU A and then migrated to B. When the work is done, the stack is returned to the PooledStackAllocator on B. If B never creates threads, those stacks are never reused.)
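The caveat can be illustrated with a small single-threaded simulation. This is only a sketch under assumptions: `LocalStackCache`, `MigrationResult`, and `simulate` are hypothetical names, the two cache objects stand in for the thread-local pools of vCPU A and vCPU B, and `malloc` stands in for the real stack allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical minimal per-vCPU stack cache (not Photon's real allocator).
struct LocalStackCache {
    std::vector<void*> free_list;
    std::size_t reused = 0;

    void* alloc(std::size_t size) {
        if (!free_list.empty()) {
            void* p = free_list.back();
            free_list.pop_back();
            ++reused;
            return p;
        }
        return std::malloc(size);  // stand-in for the real stack allocation
    }
    void dealloc(void* p) { free_list.push_back(p); }
    ~LocalStackCache() {
        for (void* p : free_list) std::free(p);
    }
};

struct MigrationResult {
    std::size_t reused_on_a;    // allocations on A served from A's cache
    std::size_t stranded_on_b;  // stacks left idle in B's cache
};

// Every stack is allocated on A. With migrate == true it is released on B,
// otherwise it is released on A again.
inline MigrationResult simulate(int rounds, std::size_t stack_size, bool migrate) {
    assert(rounds >= 0);
    LocalStackCache cache_a, cache_b;
    for (int i = 0; i < rounds; ++i) {
        void* stack = cache_a.alloc(stack_size);
        if (migrate) {
            cache_b.dealloc(stack);  // thread finished on B
        } else {
            cache_a.dealloc(stack);  // thread finished where it started
        }
    }
    return MigrationResult{cache_a.reused, cache_b.free_list.size()};
}
```

With `migrate` set, A never gets a stack back, so every allocation on A is a fresh one while B accumulates stacks it never hands out. Without migration, every allocation after the first is served from A's cache.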

PooledStackAllocator helps improve performance of thread creating

c087fcf

Signed-off-by: Coldwings <coldwings@me.com>

Coldwings requested review from lihuiba and beef9999 January 8, 2024 10:05

Coldwings added 2 commits January 9, 2024 15:16

Fix compile on macos

9274ed4

Signed-off-by: Coldwings <coldwings@me.com>

When threadpool size is 0, using photon::thread_create directly

2b6d511

Signed-off-by: Coldwings <coldwings@me.com>

Coldwings force-pushed the main branch from 00547b3 to 2b6d511 Compare January 9, 2024 07:17

lihuiba approved these changes Jan 9, 2024

lihuiba merged commit 5b92ecc into alibaba:main Jan 9, 2024
7 checks passed
