Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TLB bug with zeroed upper bits #496

Merged
merged 1 commit into from
Jul 15, 2024

Conversation

Timmmm
Copy link
Collaborator

@Timmmm Timmmm commented Jun 18, 2024

These lines of code zeroed the top 25 bits of an Sv39 address before translating it. That meant an address like 0xFFFFFFFFFFFFFFFF would be stored in the TLB as 0x0000007fffffffff. This caused a bug with sfence.vma when the rs1 (virtual address) argument was not zero. If you did sfence.vma 0xFFFFFFFFFFFFFFFF, x0 it should clear the TLB entry but it doesn't because it naively checks 0xFFFFFFFFFFFFF000 == 0x0000007ffffff000.

Currently there is only one TLB entry so it would need to do an sfence.vma for the page that was currently executing to be visible, otherwise the next fetch would clear it anyway.

An alternative fix would be to clear the upper 25 bits of vMatchMask, but this is simpler.

@Timmmm Timmmm changed the title Fix TLB bug Fix TLB bug with zeroed upper bits Jun 18, 2024
Timmmm added a commit to Timmmm/sail-riscv that referenced this pull request Jun 18, 2024
This has a very significant effect on the emulator performance with virtual memory enabled (e.g. when booting Linux).

Having a more realistic TLB can also help expose bugs, like the recent bug in riscv#496. For example with a larger TLB you can add an assertion in `translate()`. It also allows you to detect missing `sfence.vma`s in software by asserting that the TLB translation is the same as the uncached translation. This is not implemented in this change but we can do it in future (or you can just hack it in when needed). The detection will never be 100% but the bigger the TLB the more missing fences you will detect.

64 was chosen based on benchmarks I did last year. Performance when booting Linux keeps going up until you get to 64.
@Timmmm Timmmm mentioned this pull request Jun 18, 2024
@Timmmm Timmmm requested a review from rsnikhil June 18, 2024 15:44
Copy link

github-actions bot commented Jun 18, 2024

Test Results

712 tests  ±0   712 ✅ ±0   0s ⏱️ ±0s
  6 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 76282a9. ± Comparison against base commit eeb9270.

♻️ This comment has been updated with latest results.

@Alasdair
Copy link
Collaborator

Alasdair commented Jul 1, 2024

We should get this fixed, but I think the question we need to answer is how the upper bits of the sfence.vma 0xFFFFFFFFFFFFFFFF instruction should be treated? If the virtual addresses are 39-bits does that mean all bits above 39 should be treated the same? i.e. should something like:

sfence.vma 0xABCDFFFFFFFFFFFF

also clear that same VA?

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 1, 2024

From the priv spec:

If the value held in rs1 is not a valid virtual address, then the SFENCE.VMA instruction has no effect. No exception is raised in this case.

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 1, 2024

Of course, one is free to mask off the high bits too, since it's indistinguishable from an implementation that decided to evict that entry from its TLB of its own accord at that exact moment.

@Timmmm
Copy link
Collaborator Author

Timmmm commented Jul 2, 2024

If the value held in rs1 is not a valid virtual address, then the SFENCE.VMA instruction has no effect. No exception is raised in this case.

That should be the case because only valid virtual addresses make it into the TLB. Unless... you switch virtual memory modes (e.g. Sv57->Sv39) then do sfence.vma on an Sv57 address and then switch back or something?

In any case I think it's ok because this note contradicts the quote above:

It is always legal to over-fence, e.g., by fencing only based on a subset of the bits in rs1 and/or rs2, and/or by simply treating all SFENCE.VMA instructions as having rs1=x0 and/or rs2=x0.

Not a very clearly written part of the spec, but I think as long as we invalidate at least all the matching entries then it is compliant.

@Timmmm
Copy link
Collaborator Author

Timmmm commented Jul 2, 2024

I think it should really say:

If the value held in rs1 is not a valid virtual address, then the SFENCE.VMA instruction is not required to have any effect. No exception is raised in this case.

These lines of code zeroed the top 25 bits of an Sv39 address before translating it. That meant an address like 0xFFFFFFFFFFFFFFFF would be stored in the TLB as 0x0000007fffffffff. This caused a bug with `sfence.vma` when the `rs1` (virtual address) argument was not zero. If you did `sfence.vma 0xFFFFFFFFFFFFFFFF, x0` it should clear the TLB entry but it doesn't because it naively checks `0xFFFFFFFFFFFFF000 == 0x0000007ffffff000`.

Currently there is only one TLB entry so it would need to do an `sfence.vma` for the page that was currently executing to be visible, otherwise the next fetch would clear it anyway.

An alternative fix would be to clear the upper 25 bits of `vMatchMask`, but this is simpler.
Timmmm added a commit to Timmmm/sail-riscv that referenced this pull request Jul 4, 2024
This has a very significant effect on the emulator performance with virtual memory enabled (e.g. when booting Linux).

Having a more realistic TLB can also help expose bugs, like the recent bug in riscv#496. For example with a larger TLB you can add an assertion in `translate()`. It also allows you to detect missing `sfence.vma`s in software by asserting that the TLB translation is the same as the uncached translation. This is not implemented in this change but we can do it in future (or you can just hack it in when needed). The detection will never be 100% but the bigger the TLB the more missing fences you will detect.

64 was chosen based on benchmarks I did last year. Performance when booting Linux keeps going up until you get to 64.
@Timmmm
Copy link
Collaborator Author

Timmmm commented Jul 8, 2024

@Alasdair translate() is called here and you can see a few lines above it returns TR_Failure if the address is not a valid virtual address, so it must already be correctly sign-extended.

@Timmmm
Copy link
Collaborator Author

Timmmm commented Jul 8, 2024

PR to update the spec wording: riscv/riscv-isa-manual#1510

@Alasdair Alasdair self-assigned this Jul 15, 2024
@billmcspadden-riscv billmcspadden-riscv merged commit d96f89b into riscv:master Jul 15, 2024
2 checks passed
@Timmmm Timmmm deleted the user/timh/tlb_fix branch August 30, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants