[PAL/vm-common] Check the "break" variable less frequently in delay() #42

Open: dimakuv wants to merge 1 commit into dimakuv/add-candle-rust-example from dimakuv/fix-candle-rust-perf-32-threads


dimakuv commented on Jul 31, 2024

Description of the changes

Previously, the `delay()` function read the "break out of loop early" variable `continue_gate` on almost every CPU cycle while looping. This variable is typically global, so these constant reads caused high contention in multi-core workloads. The problem showed up, for example, in the Candle Quantized LLaMA app.

This PR fixes this by checking the variable less frequently. The current heuristic is to check it every 1 ms.
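The idea can be illustrated with a minimal, portable sketch. This is not the code from this PR: `delay_sketch()`, `now_ns()` and `GATE_CHECK_PERIOD_NS` are hypothetical names, `clock_gettime()` stands in for whatever time source the PAL actually uses, and the sketch assumes that a `true` value in `continue_gate` means "stop waiting". The returned read counter exists only to make the reduced access frequency visible.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical constant: re-read the gate at most once per 1 ms. */
#define GATE_CHECK_PERIOD_NS 1000000ULL

/* Monotonic time in nanoseconds (assumption: stand-in for the real time source). */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-wait for up to `delay_us` microseconds, or until `*continue_gate`
 * becomes true (assumed semantics). `continue_gate` may be NULL.
 * Returns how many times the shared gate variable was read.
 * Overflow of `delay_us * 1000` is not handled in this sketch.
 */
static uint64_t delay_sketch(uint64_t delay_us, atomic_bool* continue_gate) {
    uint64_t start      = now_ns();
    uint64_t deadline   = start + delay_us * 1000ULL;
    uint64_t next_check = start; /* first gate read happens right away */
    uint64_t gate_reads = 0;

    for (;;) {
        uint64_t now = now_ns(); /* local time source, no shared memory traffic */
        if (now >= deadline)
            break;

        /* Touch the shared (possibly contended) variable only once per period. */
        if (continue_gate && now >= next_check) {
            gate_reads++;
            if (atomic_load_explicit(continue_gate, memory_order_acquire))
                break;
            next_check = now + GATE_CHECK_PERIOD_NS;
        }
    }
    return gate_reads;
}
```

With this structure, a 5 ms wait on an unset gate reads the shared variable at most five times, instead of once per loop iteration. The trade-off is latency: a waiter may notice that the gate was set up to one check period late.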

How to test this PR?

Run the Candle example; it shows better scalability.

TODO: Run other benchmarks to confirm that they do not regress.


Previously, the `delay()` function read the "break out of loop early"
variable `continue_gate` on almost every CPU cycle while looping. This
variable is typically global, so these constant reads caused high
contention in multi-core workloads. The problem showed up, for example,
in the Candle Quantized LLaMA app.

This commit fixes this by checking the variable less frequently.
The current heuristic is to check it every 1 ms.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
dimakuv force-pushed the dimakuv/fix-candle-rust-perf-32-threads branch from 4a973e6 to 1d7aee2 on July 31, 2024 18:12