Apply branch prediction for indirect jump #269

Merged
merged 1 commit into from
Nov 21, 2023


qwe661234
Collaborator

@qwe661234 qwe661234 commented Nov 19, 2023

Previously, emulating an indirect jump ended with a block cache lookup, and the overhead of that lookup proved to be substantial. To reduce it, this change introduces a branch history table that records past indirect jump targets. Because the table holds only a limited number of entries, looking up a target in it costs much less than a block cache lookup.

The benchmark results below show that the branch history table improves overall performance.

|  Metric   |   original   |   proposed   |
|-----------+--------------+--------------|
| Dhrystone | 2932.3 DMIPS | 2985.2 DMIPS |
| CoreMark  | 2231 iter/s  | 2236 iter/s  |
| Stream    | 76.04 sec    | 75.299 sec   |
| Nqueens   | 4.069 sec    | 3.933 sec    |

See: #268
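
For readers unfamiliar with the idea, here is a minimal sketch of a small history table for indirect jump targets. It is not the code merged in this PR: the table size (`HISTORY_SIZE`), the round-robin replacement, and the names `block_t`, `branch_history_t`, `history_lookup`, and `history_record` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define HISTORY_SIZE 4 /* assumed size; not taken from the PR */

/* Hypothetical stand-in for a translated basic block. */
typedef struct block {
    uint32_t pc_start;
} block_t;

/* Assumed layout: one small table per indirect jump site. */
typedef struct {
    uint32_t pc[HISTORY_SIZE];     /* target addresses seen recently */
    block_t *target[HISTORY_SIZE]; /* blocks that start at those addresses */
    unsigned next;                 /* round-robin replacement index */
} branch_history_t;

/* Return the cached block for pc, or NULL if pc is not in the table. */
static block_t *history_lookup(const branch_history_t *bt, uint32_t pc)
{
    for (unsigned i = 0; i < HISTORY_SIZE; i++) {
        if (bt->target[i] && bt->pc[i] == pc)
            return bt->target[i];
    }
    return NULL;
}

/* Record a (pc, block) pair, overwriting slots in insertion order. */
static void history_record(branch_history_t *bt, uint32_t pc, block_t *blk)
{
    bt->pc[bt->next] = pc;
    bt->target[bt->next] = blk;
    bt->next = (bt->next + 1) % HISTORY_SIZE;
}
```

On a hit, `history_lookup` returns the next block after at most `HISTORY_SIZE` comparisons. On a miss, the caller would fall back to the regular block cache lookup and then call `history_record`.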

@jserv
Contributor

jserv commented Nov 19, 2023

Can you check the strategies and implementations at https://github.com/bucaps/marss-riscv/tree/master/src/riscvsim/bpu as well?

@jserv jserv requested a review from RinHizakura November 19, 2023 12:52
@RinHizakura
Collaborator

RinHizakura commented Nov 19, 2023

I'm not sure whether the history table has a significant effect on performance. It looks like the history table is a cache of the block map, isn't it?

As I understand it, block_find() should be O(1) in the average case, and entries in the block cache should be evicted only by block_map_clear(). Since block_map_clear() should run rarely, the probability of a miss in the block map should be low.
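
To make the average O(1) argument concrete, here is a rough sketch of a hash map keyed by the block's start address. It is an illustration only and does not reproduce rv32emu's `block_find()`: the capacity, the multiplicative hash, linear probing, and the names `block_map_t`, `hash_pc`, `map_find`, `map_insert`, and `map_clear` are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_CAPACITY 1024 /* assumed; must be a power of two */

/* Hypothetical stand-in for a translated basic block. */
typedef struct block {
    uint32_t pc_start;
} block_t;

typedef struct {
    block_t *slot[MAP_CAPACITY];
} block_map_t;

/* Assumed hash: multiplicative hashing, keeping the top 10 bits. */
static uint32_t hash_pc(uint32_t pc)
{
    return (pc * 2654435761u) >> 22; /* 32 - log2(1024) = 22 */
}

/* Linear probing: O(1) on average while the map stays sparsely filled. */
static block_t *map_find(const block_map_t *map, uint32_t pc)
{
    uint32_t i = hash_pc(pc);
    for (uint32_t n = 0; n < MAP_CAPACITY; n++) {
        block_t *b = map->slot[i];
        if (!b)
            return NULL; /* empty slot ends the probe sequence: a miss */
        if (b->pc_start == pc)
            return b;
        i = (i + 1) & (MAP_CAPACITY - 1);
    }
    return NULL;
}

/* Insert a block; returns 0 when the map is full. */
static int map_insert(block_map_t *map, block_t *blk)
{
    uint32_t i = hash_pc(blk->pc_start);
    for (uint32_t n = 0; n < MAP_CAPACITY; n++) {
        if (!map->slot[i]) {
            map->slot[i] = blk;
            return 1;
        }
        i = (i + 1) & (MAP_CAPACITY - 1);
    }
    return 0;
}

/* Drop every entry at once, in the spirit of block_map_clear(). */
static void map_clear(block_map_t *map)
{
    memset(map->slot, 0, sizeof(map->slot));
}
```

In this sketch a lookup misses only for a block that was never inserted or that was dropped by `map_clear`, which matches the reasoning above. Even a hit still pays for the hash computation and at least one comparison.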

@qwe661234
Collaborator Author

> I'm not sure whether the history table has a significant effect on performance. It looks like the history table is a cache of the block map, isn't it?
>
> As I understand it, block_find() should be O(1) in the average case, and entries in the block cache should be evicted only by block_map_clear(). Since block_map_clear() should run rarely, the probability of a miss in the block map should be low.

The original design is block_find().

@RinHizakura
Collaborator

RinHizakura commented Nov 19, 2023

> The original design is block_find().

Sure, I know it is the original design.

No disrespect intended; I am just curious what "overhead" means for the original design. I assume the original design has O(1) complexity on average, and that its extra cost arises only on a miss in the map. Given these properties, I can't see how the history table improves on it. Also, although the metrics show that the proposed design is better, I suspect the improvement is still within the margin of error.
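
For reference, the relative size of each reported change can be computed from the table in the PR description. The helper below is a sketch; `improvement_pct` and `print_improvements` are names made up for this example, and the PR gives no run-to-run variance, so the code cannot say whether the differences are statistically significant.

```c
#include <stdio.h>

/* Relative improvement in percent. For throughput metrics higher is
 * better; for elapsed time lower is better, so the difference is flipped. */
static double improvement_pct(double original, double proposed,
                              int higher_is_better)
{
    double delta = higher_is_better ? proposed - original
                                    : original - proposed;
    return 100.0 * delta / original;
}

/* Print the improvement for each benchmark reported in the PR. */
static void print_improvements(void)
{
    printf("Dhrystone: %.2f%%\n", improvement_pct(2932.3, 2985.2, 1));
    printf("CoreMark:  %.2f%%\n", improvement_pct(2231.0, 2236.0, 1));
    printf("Stream:    %.2f%%\n", improvement_pct(76.04, 75.299, 0));
    printf("Nqueens:   %.2f%%\n", improvement_pct(4.069, 3.933, 0));
}
```

The changes come to roughly 1.8% for Dhrystone, 0.2% for CoreMark, 1.0% for Stream, and 3.3% for Nqueens. Gains of this size are small enough that repeated runs would be needed to separate them from noise.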

@qwe661234
Collaborator Author

qwe661234 commented Nov 19, 2023

> The original design is block_find().
>
> Sure, I know it is the original design.
>
> No disrespect intended; I am just curious what "overhead" means for the original design. I assume the original design has O(1) complexity on average, and that its extra cost arises only on a miss in the map. Also, although the metrics show that the proposed design is better, I suspect the improvement is still within the margin of error.

block_find() requires a hash computation, a comparison, and the overhead of a function call. The branch history table costs extra memory, but a lookup in it needs only a few comparisons, with no hashing and no function call.

Actually, the branch history table is designed mainly to improve JIT code generation for indirect jumps later on; its benefit to the interpreter is not substantial.
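
The trade-off described here can be illustrated by counting how often the slow path runs. This is a sketch under assumptions, not the PR's code: `slow_lookup` is a hypothetical stand-in for a hash-based lookup such as block_find(), it returns the address itself instead of a block pointer, and the four-entry round-robin table is chosen arbitrarily.

```c
#include <stdint.h>

#define HISTORY_SIZE 4 /* assumed table size */

typedef struct {
    uint32_t pc[HISTORY_SIZE];
    int valid[HISTORY_SIZE];
    unsigned next; /* round-robin replacement index */
} history_t;

static unsigned slow_lookups; /* number of calls to the stand-in lookup */

/* Hypothetical stand-in for a hash-map lookup such as block_find(). */
static uint32_t slow_lookup(uint32_t pc)
{
    slow_lookups++;
    return pc; /* a real lookup would hash pc and return a block pointer */
}

/* Resolve an indirect jump target: compare against the few cached targets
 * first, and fall back to the slow lookup only on a miss. */
static uint32_t resolve_target(history_t *h, uint32_t pc)
{
    for (unsigned i = 0; i < HISTORY_SIZE; i++) {
        if (h->valid[i] && h->pc[i] == pc)
            return h->pc[i]; /* hit: no hashing, no extra call */
    }
    uint32_t found = slow_lookup(pc);
    h->pc[h->next] = found;
    h->valid[h->next] = 1;
    h->next = (h->next + 1) % HISTORY_SIZE;
    return found;
}
```

With a zero-initialized `history_t`, a jump that alternates between two targets reaches `slow_lookup` only twice, no matter how many times it executes. The weak spot of this particular sketch is a jump that cycles through more targets than the table holds: with five targets and round-robin replacement, every access misses.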

@RinHizakura
Collaborator

RinHizakura commented Nov 19, 2023

> block_find() requires a hash computation, a comparison, and the overhead of a function call. The branch history table costs extra memory, but a lookup in it needs only a few comparisons, with no hashing and no function call.
>
> Actually, the branch history table is designed mainly to improve JIT code generation for indirect jumps later on; its benefit to the interpreter is not substantial.

Got it. Possibly because the history table mechanism is still simple in interpreter mode, I can't yet see its effect on the emulator. I also assumed that the cost of the hash function was low enough to ignore, but that might not be true.

Thanks for your kind reply!

@qwe661234 qwe661234 force-pushed the branch_predictor branch 2 times, most recently from 3d2bcff to fa3514c Compare November 20, 2023 08:39
@jserv
Contributor

jserv commented Nov 20, 2023

> Can you check the strategies and implementations at https://github.com/bucaps/marss-riscv/tree/master/src/riscvsim/bpu as well?

The strength of a two-level adaptive predictor is that it adapts quickly to repetitive patterns and predicts them well. This technique is used in many contemporary microprocessors, and rv32emu could potentially benefit from a similar approach. Let's assess its performance using the reference code.
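
As a rough illustration of the two-level idea, the sketch below uses a global history register to index a table of 2-bit saturating counters. It is the textbook form for predicting branch direction, not the marss-riscv code and not something in this PR; the history length and all names are assumptions. Applying the same idea to indirect jumps would mean indexing a table of targets by history, which this sketch does not attempt.

```c
#include <stdint.h>

#define HISTORY_BITS 4 /* assumed history length */
#define PHT_SIZE (1u << HISTORY_BITS)

typedef struct {
    uint8_t history;           /* last HISTORY_BITS outcomes, newest in bit 0 */
    uint8_t counter[PHT_SIZE]; /* 2-bit saturating counters, values 0..3 */
} two_level_t;

/* Start with an empty history and every counter at weakly not-taken. */
static void predictor_init(two_level_t *p)
{
    p->history = 0;
    for (unsigned i = 0; i < PHT_SIZE; i++)
        p->counter[i] = 1;
}

/* First level: the history selects a counter.
 * Second level: the counter's upper half means "taken". */
static int predict_taken(const two_level_t *p)
{
    return p->counter[p->history] >= 2;
}

/* Train the selected counter, then shift the outcome into the history. */
static void predictor_update(two_level_t *p, int taken)
{
    uint8_t *c = &p->counter[p->history];
    if (taken) {
        if (*c < 3)
            (*c)++;
    } else {
        if (*c > 0)
            (*c)--;
    }
    p->history = (uint8_t)(((p->history << 1) | (taken ? 1u : 0u)) &
                           (PHT_SIZE - 1));
}
```

For a repeating taken, taken, not-taken pattern, each of the three recurring history values maps to its own counter, so after a short warm-up every prediction is correct. A single 2-bit counter without history would keep mispredicting the not-taken case.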

Contributor

@jserv jserv left a comment


Fix the typo ("proposed") in the git commit message. Update the numbers if you have more recent measurements.

@qwe661234 qwe661234 requested a review from jserv November 21, 2023 09:12
@jserv jserv merged commit fb2ece9 into sysprog21:master Nov 21, 2023
21 checks passed
vestata pushed a commit to vestata/rv32emu that referenced this pull request Jan 24, 2025
Apply branch prediction for indirect jump