Apply branch prediction for indirect jump #269
Can you check the strategies and implementations at https://github.com/bucaps/marss-riscv/tree/master/src/riscvsim/bpu as well?
I'm not sure whether the history table has a significant effect on performance. It looks like the history table is a cache of the block map, isn't it? According to my understanding,
The original design is block_find().
Sure, I know it is the original design. No disrespect, I am just curious what "overhead" means for the original design. I assume that the original design has O(1) complexity on average, and that its extra cost should only occur on a "find miss" in the map. Given that, I can't see how the history table can improve on it. On the other hand, although the metrics show the proposed design is better, I suspect this improvement is still within the margin of error.
block_find() needs a hash function and a comparison, and it incurs function-call overhead. The branch history table costs extra memory, but it needs only a few comparisons, with no hash function and no function call. Actually, the branch history table is designed to further improve JIT indirect jump codegen, and its benefit in the interpreter is not substantial.
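To make the trade-off described above concrete, here is a minimal C sketch of the idea, not the code from this PR. All names (`block_t`, `HISTORY_SIZE`, `map_find`, `indirect_jump_target`, `slow_lookups`) and the table sizes are assumptions for illustration; `map_find` merely stands in for a hashed block lookup such as block_find(). Each block remembers a few recent indirect jump targets, and the hashed map is consulted only when none of them matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_SIZE 4 /* hypothetical number of history entries per block */
#define MAP_BITS 6     /* hypothetical block map size: 2^6 slots */
#define MAP_SIZE (1u << MAP_BITS)

typedef struct block {
    uint32_t pc_start;                       /* first PC of the block */
    uint32_t hist_pc[HISTORY_SIZE];          /* recent indirect jump target PCs */
    struct block *hist_target[HISTORY_SIZE]; /* blocks those PCs resolved to */
    uint32_t hist_next;                      /* round-robin replacement slot */
} block_t;

static block_t *block_map[MAP_SIZE]; /* open-addressing hash map keyed by PC */
static unsigned slow_lookups;        /* counts hashed lookups, for illustration */

static uint32_t hash_pc(uint32_t pc)
{
    /* Multiplicative hash; keep the top MAP_BITS bits. */
    return (pc * 2654435761u) >> (32 - MAP_BITS);
}

static void map_insert(block_t *blk)
{
    uint32_t idx = hash_pc(blk->pc_start);
    for (uint32_t probes = 0; block_map[idx]; probes++) {
        assert(probes < MAP_SIZE); /* the sketch does not handle a full map */
        idx = (idx + 1) & (MAP_SIZE - 1);
    }
    block_map[idx] = blk;
}

/* Slow path: hash, probe, and compare, behind a function call. */
static block_t *map_find(uint32_t pc)
{
    slow_lookups++;
    uint32_t idx = hash_pc(pc);
    for (uint32_t probes = 0; probes < MAP_SIZE && block_map[idx]; probes++) {
        if (block_map[idx]->pc_start == pc)
            return block_map[idx];
        idx = (idx + 1) & (MAP_SIZE - 1);
    }
    return NULL;
}

/* Fast path first: a few comparisons against the per-block history. */
static block_t *indirect_jump_target(block_t *from, uint32_t target_pc)
{
    for (int i = 0; i < HISTORY_SIZE; i++) {
        if (from->hist_target[i] && from->hist_pc[i] == target_pc)
            return from->hist_target[i];
    }
    block_t *found = map_find(target_pc);
    if (found) {
        /* Record the resolved target, overwriting the oldest slot. */
        from->hist_pc[from->hist_next] = target_pc;
        from->hist_target[from->hist_next] = found;
        from->hist_next = (from->hist_next + 1) % HISTORY_SIZE;
    }
    return found;
}
```

In this sketch a repeated jump from the same block to the same target touches the map only once; whether that saving is measurable in practice depends on how cheap the real hashed lookup already is, which is the open question in this thread.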
Got it. Possibly because the history table's mechanism is still simple in interpreter mode, I can't see its effect on the emulator. I also assumed that the cost of the hash function is low enough to be ignored, but this might not be true. Thanks for your kind reply!
The two-level adaptive predictor's strength lies in its ability to adapt quickly to repetitive patterns and predict them. This technique is used in many contemporary microprocessors, and rv32emu could potentially benefit from adopting it. Let's assess its performance using the reference code.
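As a rough illustration of what a two-level scheme could look like for indirect jump targets, here is a minimal C sketch. It is not taken from rv32emu or from the reference code mentioned above; the names (`predictor_t`, `tlp_predict`, `tlp_update`), the table size, and the way the history is folded are all assumptions. The first level is a history register built from recent targets, and the second level is a table of predicted targets indexed by that history XORed with the jump's PC.

```c
#include <stdint.h>

#define PHT_BITS 8 /* hypothetical pattern table size: 2^8 entries */
#define PHT_SIZE (1u << PHT_BITS)

typedef struct {
    uint32_t history;           /* level 1: folded recent jump targets */
    uint32_t targets[PHT_SIZE]; /* level 2: predicted target per pattern */
} predictor_t;

/* Combine the jump's PC with the target history to select a table entry. */
static uint32_t pht_index(const predictor_t *p, uint32_t pc)
{
    return ((pc >> 2) ^ p->history) & (PHT_SIZE - 1);
}

/* Predicted target for the indirect jump at pc (0 means no prediction yet). */
static uint32_t tlp_predict(const predictor_t *p, uint32_t pc)
{
    return p->targets[pht_index(p, pc)];
}

/* Train with the actual target, then shift it into the history register. */
static void tlp_update(predictor_t *p, uint32_t pc, uint32_t actual)
{
    p->targets[pht_index(p, pc)] = actual;
    p->history = ((p->history << 2) ^ (actual >> 2)) & (PHT_SIZE - 1);
}
```

With these sizes the history reflects roughly the last four targets, so a jump that alternates between two targets becomes predictable after a short warm-up, whereas a predictor that only remembers the last target would miss every time on that pattern.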
Fix typo ("proposed") in git commit message. Update the numbers if you have recent measurements.
Previously, it was necessary to perform a block cache lookup at the end of an indirect jump emulation; however, the associated overhead of this operation proved to be substantial. To mitigate this overhead, we have introduced a branch history table that captures the historical data of indirect jump targets. Given the limited number of entries in the branch history table, the lookup overhead is significantly reduced.

As shown in the performance analysis provided below, the branch history table has demonstrably enhanced the overall performance.

| Metric    | original     | proposed     |
|-----------+--------------+--------------|
| Dhrystone | 2932.3 DMIPS | 2985.2 DMIPS |
| CoreMark  | 2231 iter/s  | 2236 iter/s  |
| Stream    | 76.04 sec    | 75.299 sec   |
| Nqueens   | 4.069 sec    | 3.933 sec    |
See: #268