jit: Replace LFU with LRU cache replacement policy #518
Conversation
Benchmarks
| Benchmark suite | Current: 5ffb157 | Previous: 448f434 | Ratio |
|---|---|---|---|
| Dhrystone | 1512 Average DMIPS over 10 runs | 1556 Average DMIPS over 10 runs | 1.03 |
| Coremark | 1399.459 Average iterations/sec over 10 runs | 1400.104 Average iterations/sec over 10 runs | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
@jserv, can we suppress the warnings from scan-build? https://github.com/sysprog21/rv32emu/actions/runs/12116509397/job/33777032051?pr=518

Since both 'list' and 'hlist' are heavily tested, we can suppress the potential false alarm raised by LLVM static analysis.

The PR description and the first patch mention performance issues but provide no benchmark numbers. They also state that LRU fits our needs better than LFU without explaining the workload or why it is more suitable. This contradicts the conclusion of commit bdc5348 ("Implement LFU as default cache along with memory pool (#125)") without providing an explanation. Could you update the descriptions to address these issues?
Sure, I will provide some benchmarks to show the performance difference in the upcoming changes.
Clarify the variant of LRU.
src/cache.c (Outdated)

```c
if (!revived_entry) {
    new_entry->frequency = 1;
} else {
    new_entry->frequency = revived_entry->frequency + 1;
}
```
nit: use `revived_entry->frequency++` for brevity.
These two lines are not logically equivalent. I think this may be somewhat misleading.
> These two lines are not logically equivalent. I think this may be somewhat misleading.

My bad. It should be `++revived_entry->frequency`. Pre-increment has the same effect as `revived_entry->frequency + 1`. To clarify, the only change is the RHS.
The `revived_entry` would be read-only and freed after the stored information is inherited by the new entry.
- Reorder the commits so that `Update cache tests` can be squashed with `Replace LFU with LRU cache replacement policy`, or arrange them as consecutive commits, because these two commits are highly relevant. This ensures that the commit log remains clear and easy to understand.
- Commit message typo in the first commit: `Matric` -> `Metric`.
The LFU (least-frequently-used) cache replacement policy can be inefficient when the cache is not large enough to hold all translated blocks (IRs). For example, when two new entries are inserted into a full cache, they can repeatedly evict each other because each one has the lowest use count. This hurts cache efficiency and lowers the cache-hit ratio. The LRU (least-recently-used) cache replacement policy is more suitable for our needs. However, when the least recently used entry is evicted from the cache, the counter used to trigger JIT compilation is lost.

To address this issue, this patch introduces a degenerate adaptive replacement cache (ARC) that keeps only the LRU part and its ghost list. Evicted entries are temporarily preserved in the ghost list as history. If the key of an inserted entry matches one in the ghost list, the history is freed, and the frequency mentioned above is inherited by the new entry.

The performance difference between the original implementation and this patch is shown below:

* 1024 entries (10 bits)

| Metric    | Original     | Patched      |
|-----------|--------------|--------------|
| dhrystone | 17720 DMIPS  | 17660 DMIPS  |
| coremark  | 5540 iters/s | 5550 iters/s |
| aes       | 9.164 s      | 9.084 s      |
| nqueens   | 1.551 s      | 1.493 s      |
| hamilton  | 12.917 s     | 12.565 s     |

* 256 entries (8 bits)

| Metric    | Original    | Patched      |
|-----------|-------------|--------------|
| dhrystone | 17420 DMIPS | 18025 DMIPS  |
| coremark  | 48 iters/s  | 5575 iters/s |
| aes       | 8.904 s     | 8.834 s      |
| nqueens   | 6.416 s     | 1.348 s     |
| hamilton  | 3400 s      | 13.004 s     |

* 64 entries (6 bits)

| Metric    | Original    | Patched     |
|-----------|-------------|-------------|
| dhrystone | 17720 DMIPS | 17850 DMIPS |
| coremark  | (timeout)   | 215 iters/s |
| aes       | 342 s       | 8.882 s     |
| nqueens   | 680 s       | 1.506 s     |
| hamilton  | (timeout)   | 13.724 s    |

* Experimental Linux kernel booting: 126 s (original) -> 21 s (patched)
T2C uses the translated blocks (IRs) to build the LLVM IR. However, the cache might be modified by the main thread while the background thread is compiling.
Thanks @vacantron for contributing!