
PooledStackAllocator helps improve performance of thread creation #324

Merged: 3 commits into alibaba:main, Jan 9, 2024


Coldwings (Collaborator)

In typical scenarios, we use photon::thread_pool mainly because reusing existing photon threads is cheaper than creating new ones.

In my observation, the most expensive part of creating a photon thread is allocating its stack memory. Reusing pre-allocated stack memory directly is both simpler and faster than reusing it indirectly through a thread pool.

The current IOAlloc utilities are not suitable for stack allocation, because stack allocation must not rely on photon thread utilities. I have therefore implemented a thread-local stack pool, which makes it practical to reuse stacks that have already been allocated.
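To illustrate the idea of a thread-local stack pool, here is a minimal sketch. It is not Photon's PooledStackAllocator; the class ThreadLocalStackPool, the helper local_stack_pool(), and the 64 KiB stack size are hypothetical. It also uses plain malloc/free for simplicity, whereas real coroutine stacks are usually mapped with guard pages. Because each OS thread (a vcpu, in Photon terms) owns its own free list, allocation and deallocation need no locks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch: a per-thread free list of fixed-size stack buffers.
class ThreadLocalStackPool {
public:
    explicit ThreadLocalStackPool(std::size_t stack_size) : stack_size_(stack_size) {}
    ~ThreadLocalStackPool() {
        for (void* p : free_) std::free(p);  // release cached stacks at thread exit
    }
    ThreadLocalStackPool(const ThreadLocalStackPool&) = delete;
    ThreadLocalStackPool& operator=(const ThreadLocalStackPool&) = delete;

    // Reuse a cached stack if one is available; otherwise allocate a new one.
    void* alloc() {
        if (!free_.empty()) {
            void* p = free_.back();
            free_.pop_back();
            ++reused_;
            return p;
        }
        ++fresh_;
        return std::malloc(stack_size_);
    }

    // Keep the stack in this thread's cache instead of freeing it.
    void dealloc(void* p) { free_.push_back(p); }

    std::size_t cached() const { return free_.size(); }
    std::size_t fresh_allocations() const { return fresh_; }
    std::size_t reuses() const { return reused_; }

private:
    std::size_t stack_size_;
    std::vector<void*> free_;
    std::size_t fresh_ = 0;
    std::size_t reused_ = 0;
};

// One pool per OS thread: no locks and no cross-thread notifications.
ThreadLocalStackPool& local_stack_pool() {
    thread_local ThreadLocalStackPool pool(64 * 1024);
    return pool;
}
```

A thread that exits and a new thread created on the same OS thread will then share the same cached stack: the second alloc() returns the buffer the first dealloc() cached, with no new allocation.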

In our tests, photon::WorkPool using the pooled stack allocator outperforms the version using the traditional ThreadPool, and both significantly outperform creating threads directly.

Signed-off-by: Coldwings <coldwings@me.com>
```cpp
void set_bypass_threadpool(bool flag) {
    __bypass_threadpool = flag;  // honor the caller's flag
}
```

Can we set capacity = 0 instead?
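One way to read this suggestion: a pool whose capacity is 0 never caches anything, so it degenerates to plain allocate-and-free, which has the same effect as a separate bypass flag without an extra code path. The sketch below illustrates that reading with a simple bounded stack cache; BoundedStackPool and its parameters are hypothetical and are not Photon's ThreadPool API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch: a stack cache bounded by `capacity`.
// With capacity == 0 nothing is ever cached, so every dealloc() frees
// immediately: the "bypass" behavior falls out of the capacity setting.
class BoundedStackPool {
public:
    BoundedStackPool(std::size_t stack_size, std::size_t capacity)
        : stack_size_(stack_size), capacity_(capacity) {}
    ~BoundedStackPool() {
        for (void* p : free_) std::free(p);
    }
    BoundedStackPool(const BoundedStackPool&) = delete;
    BoundedStackPool& operator=(const BoundedStackPool&) = delete;

    void* alloc() {
        if (!free_.empty()) {  // reuse a cached stack when possible
            void* p = free_.back();
            free_.pop_back();
            return p;
        }
        return std::malloc(stack_size_);
    }

    void dealloc(void* p) {
        if (free_.size() < capacity_) {
            free_.push_back(p);  // keep for reuse
        } else {
            std::free(p);        // pool full, or capacity 0: release now
        }
    }

    std::size_t cached() const { return free_.size(); }

private:
    std::size_t stack_size_;
    std::size_t capacity_;
    std::vector<void*> free_;
};
```

The design choice is that a single numeric knob covers both "pooling on" (capacity > 0) and "pooling off" (capacity 0), so callers do not need to keep a flag and a capacity consistent.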

beef9999 (Collaborator) commented Jan 9, 2024

Why is the pooled stack allocator faster than the thread pool?

Coldwings (Collaborator, Author)

> Why is the pooled stack allocator faster than the thread pool?

The heaviest step in creating a new photon thread is stack allocation. Both the thread pool and the pooled stack allocator try to reuse already-allocated stacks, but in different ways.

In ThreadPool, a photon thread is not put into the DONE state when its task finishes; instead, it is kept in an identity pool, waiting to be reused. The thread's state is always retained and has to be handled carefully. That is fine when a ThreadPool is used on a single vcpu, but it works poorly across multiple vcpus: every step involves locks and cross-vcpu notifications, which are slow and heavy.
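To make the cost described above concrete, here is a minimal sketch of a worker pool shared by several OS threads. It is not Photon's ThreadPool; Worker, SharedWorkerPool, and exercise_from_threads are hypothetical, and standard C++ threads stand in for vcpus. Every acquire and release takes a mutex, and a release may have to wake a waiter on another thread, which is the kind of cross-vcpu synchronization a thread-local stack cache avoids.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Worker { int id; };  // stands in for a parked, reusable thread

// Hypothetical sketch: a pool of reusable workers shared across OS threads.
class SharedWorkerPool {
public:
    explicit SharedWorkerPool(int n) {
        for (int i = 0; i < n; ++i) idle_.push_back(Worker{i});
    }

    Worker acquire() {
        std::unique_lock<std::mutex> lk(mu_);  // lock on every acquire
        cv_.wait(lk, [this] { return !idle_.empty(); });  // may block until another thread releases
        Worker w = idle_.back();
        idle_.pop_back();
        return w;
    }

    void release(Worker w) {
        {
            std::lock_guard<std::mutex> lk(mu_);  // lock on every release
            idle_.push_back(w);
        }
        cv_.notify_one();  // cross-thread wakeup
    }

    std::size_t idle() {
        std::lock_guard<std::mutex> lk(mu_);
        return idle_.size();
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<Worker> idle_;
};

// Acquire and release workers from several threads; returns the idle count afterwards.
std::size_t exercise_from_threads(SharedWorkerPool& pool, int threads, int iters) {
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t) {
        ts.emplace_back([&pool, iters] {
            for (int i = 0; i < iters; ++i) pool.release(pool.acquire());
        });
    }
    for (auto& th : ts) th.join();
    return pool.idle();
}
```

Correctness is preserved (all workers end up idle again), but every reuse pays for a lock and possibly a wakeup, whereas the thread-local cache pays for neither.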

The pool-based allocator keeps only stack memory and never keeps any thread state. It therefore needs neither locks nor thread-state handling; it is just a simple thread-local allocator, which proved simpler and more efficient in our tests. Creating and finishing a thread involves no cross-vcpu work at all.

However, if a workload always creates new threads on one vcpu, migrates them to another, and lets them finish on vcpus that never create new threads, the thread-local pool never helps. For example, if threads are always created on vcpu A and then migrated to vcpu B, each finished thread returns its stack to the PooledStackAllocator on B; if B never creates new threads, those stacks are never reused.
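The migration pitfall can be reproduced with a small sketch. It assumes, as the comment describes, that a stack is returned to the cache of the thread that frees it; stack_alloc, stack_free, tls_cache, and simulate_migration are hypothetical names, and two OS threads stand in for vcpus A and B.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical sketch: each OS thread (standing in for a vcpu) owns a
// lock-free cache of stacks. Freed stacks go to the *freeing* thread's cache.
struct LocalCache {
    std::vector<void*> free_list;
    ~LocalCache() {
        for (void* p : free_list) std::free(p);  // released when the thread exits
    }
};

thread_local LocalCache tls_cache;

void* stack_alloc(std::size_t size) {
    if (!tls_cache.free_list.empty()) {  // reuse from this thread's cache
        void* p = tls_cache.free_list.back();
        tls_cache.free_list.pop_back();
        return p;
    }
    return std::malloc(size);
}

void stack_free(void* p) { tls_cache.free_list.push_back(p); }

struct MigrationResult {
    std::size_t cached_on_a;
    std::size_t cached_on_b;
};

// "vcpu A" (the calling thread) creates n stacks; "vcpu B" (a second thread)
// finishes the work and frees them. B never allocates, so its cache only grows,
// while A's cache stays empty and A must keep allocating fresh stacks.
MigrationResult simulate_migration(int n) {
    std::vector<void*> stacks;
    for (int i = 0; i < n; ++i) stacks.push_back(stack_alloc(4096));

    std::size_t cached_on_b = 0;
    std::thread b([&] {
        for (void* p : stacks) stack_free(p);
        cached_on_b = tls_cache.free_list.size();  // B's own cache
    });
    b.join();  // B's cache is freed here when the thread exits

    return {tls_cache.free_list.size(), cached_on_b};
}
```

Called on a thread with an empty cache, simulate_migration(8) leaves A's cache empty while B's cache ends up holding all 8 stacks. In this sketch B exits and its cache destructor frees them; on a long-lived vcpu they would simply sit unused.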

lihuiba merged commit 5b92ecc into alibaba:main on Jan 9, 2024 (7 checks passed)