Mismatch on VM SV32 reserved tests #1198

Closed
Zain2050 opened this issue Dec 18, 2024 · 19 comments
Labels
bug Something isn't working

Comments

@Zain2050
Contributor

Executing "wsim rv32gc arch32vm_sv32" runs the virtual memory SV32 tests on Wally. All other SV32 tests succeed, but the reserved tests fail with a signature mismatch. A snippet of the output follows.

# rv32i_m/vm_sv32/src/vm_nleaf_pte_level0_S_mode.S succeeded.  Brilliant!!!
# rv32i_m/vm_sv32/src/vm_nleaf_pte_level0_U_mode.S succeeded.  Brilliant!!!
#   Error on test rv32i_m/vm_sv32/src/vm_reserved_pte_S_mode.S result         138: adr = 80c03338 sim (D$) 0000000f signature = 0000000c
# ** Note: $stop    : /home/mzain/cvw/testbench/testbench.sv(963)

Executing "wsim rv32gc /home/user/cvw/tests/riscof/work/riscv-arch-test/rv32i_m/vm_sv32/src/vm_reserved_pte_S_mode.S/dut/my.elf --lockstepverbose" shows that the mismatch occurs at a fetch exception.

# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)
# Info 783: 'refRoot/cpu', 0x00000000917ffffc: Supervisor *** FETCH EXCEPTION ***
# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)

This problem is occurring with the following reserved tests:

  • vm_reserved_pte_S_mode
  • vm_reserved_pte_U_mode
  • vm_reserved_rwx_pte_S_mode
  • vm_reserved_rwx_pte_U_mode

To reproduce this, pull the SV32 tests from riscv-non-isa/riscv-arch-test#516, run make in cvw to compile them, and then run the wsim commands above.

@jordancarlin
Member

Transferring this issue to the CVW repo

@jordancarlin jordancarlin transferred this issue from openhwgroup/cvw-arch-verif Dec 18, 2024
@davidharrishmc
Contributor

davidharrishmc commented Dec 18, 2024 via email

@Zain2050
Contributor Author

Sure. These are the instructions being executed before the fetch exception.

# Info   mstatus 00000800 -> 00000080 [SD:0 TSR:0 TW:0 TVM:0 MXR:0 SUM:0 MPRV:0 XS:0(Off) FS:0(Off) MPP:1->0 VS:0(Off) SPP:0 MPIE:0->1 UBE:0 SPIE:0 MIE:0 SIE:0]
# Info 772: 'refRoot/cpu', 0x000000009000091c(main+57c): Supervisor 00000013 addi    x0,x0,0
# Info   MEMX 0x9000091c 0x8000091c 2 0013
# Info   MEMX 0x9000091e 0x8000091e 2 0000
# Info 773: 'refRoot/cpu', 0x0000000090000920(main+580): Supervisor 004d8d93 addi    x27,x27,4
# Info   MEMX 0x90000920 0x80000920 2 8d93
# Info   MEMX 0x90000922 0x80000922 2 004d
# Info   x27 93014518 -> 9301451c
# Info 774: 'refRoot/cpu', 0x0000000090000924(main+584): Supervisor 01060613 addi    x12,x12,16
# Info   MEMX 0x90000924 0x80000924 2 0613
# Info   MEMX 0x90000926 0x80000926 2 0106
# Info   x12 00000010 -> 00000020
# Info 775: 'refRoot/cpu', 0x0000000090000928(main+588): Supervisor 00100313 addi    x6,x0,1
# Info   MEMX 0x90000928 0x80000928 2 0313
# Info   MEMX 0x9000092a 0x8000092a 2 0010
# Info   x6 10000010 -> 00000001
# Info 776: 'refRoot/cpu', 0x000000009000092c(main+58c): Supervisor 02030663 beq     x6,x0,90000958
# Info   MEMX 0x9000092c 0x8000092c 2 0663
# Info   MEMX 0x9000092e 0x8000092e 2 0203
# Info 777: 'refRoot/cpu', 0x0000000090000930(main+590): Supervisor 004002b7 lui     x5,0x400
# Info   MEMX 0x90000930 0x80000930 2 02b7
# Info   MEMX 0x90000932 0x80000932 2 0040
# Info   x5 00000918 -> 00400000
# Info 778: 'refRoot/cpu', 0x0000000090000934(main+594): Supervisor ffc28293 addi    x5,x5,-4
# Info   MEMX 0x90000934 0x80000934 2 8293
# Info   MEMX 0x90000936 0x80000936 2 ffc2
# Info   x5 00400000 -> 003ffffc
# Info 779: 'refRoot/cpu', 0x0000000090000938(main+598): Supervisor 0167d793 srli    x15,x15,0x16
# Info   MEMX 0x90000938 0x80000938 2 d793
# Info   MEMX 0x9000093a 0x8000093a 2 0167
# Info   x15 91400000 -> 00000245
# Info 780: 'refRoot/cpu', 0x000000009000093c(main+59c): Supervisor 01679793 slli    x15,x15,0x16
# Info   MEMX 0x9000093c 0x8000093c 2 9793
# Info   MEMX 0x9000093e 0x8000093e 2 0167
# Info   x15 00000245 -> 91400000
# Info 781: 'refRoot/cpu', 0x0000000090000940(main+5a0): Supervisor 005782b3 add     x5,x15,x5
# Info   MEMX 0x90000940 0x80000940 2 82b3
# Info   MEMX 0x90000942 0x80000942 2 0057
# Info   x5 003ffffc -> 917ffffc
# Info 782: 'refRoot/cpu', 0x0000000090000944(main+5a4): Supervisor 000280e7 jalr    x1,0(x5)
# Info   MEMX 0x90000944 0x80000944 2 80e7
# Info   MEMX 0x90000946 0x80000946 2 0002
# Info   x1 feedbead -> 90000948
# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)
# Info 783: 'refRoot/cpu', 0x00000000917ffffc: Supervisor *** FETCH EXCEPTION ***
# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)
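
The jump target in this trace can be reproduced by hand. The sketch below replays the arithmetic of instructions 777 to 781 and splits the resulting virtual address into its Sv32 fields. The helper names are hypothetical, and the root page table base `0x80c02000` is an inference from the PTE address in the MEMRM line (`0x80c02914` minus `VPN[1] * 4`), not something the log states directly.

```python
MASK32 = 0xFFFFFFFF
PTESIZE = 4                      # Sv32 PTEs are 4 bytes
ASSUMED_ROOT = 0x80C02000        # assumption: inferred from the MEMRM address


def jump_target(x15):
    """Replay instructions 777-781 of the trace for a given x15."""
    x5 = 0x400 << 12                       # lui  x5, 0x400
    x5 = (x5 - 4) & MASK32                 # addi x5, x5, -4
    x15 = ((x15 >> 22) << 22) & MASK32     # srli/slli by 0x16: keep top 10 bits
    return (x15 + x5) & MASK32             # add  x5, x15, x5


def split_sv32_va(va):
    """Return (VPN[1], VPN[0], page offset) of a 32-bit Sv32 virtual address."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


target = jump_target(0x91400000)           # x15 value shown in the trace
vpn1, vpn0, offset = split_sv32_va(target)
pte_addr = ASSUMED_ROOT + vpn1 * PTESIZE   # first-level PTE address

print(hex(target), hex(vpn1), hex(vpn0), hex(offset), hex(pte_addr))
```

Under these assumptions the first-level index lands on the same address the reference model reads (`0x80c02914`), which suggests the entry involved in the fetch exception is the level-1 PTE.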

@davidharrishmc
Contributor

davidharrishmc commented Dec 18, 2024 via email

@davidharrishmc
Contributor

@Zain2050: @rosethompson is seeing a similar issue in which Breker expects a page fault and Wally doesn't raise one. I don't know if the two issues are related, but it's definitely timely. If you can get to the root cause and fix it soon, that would confirm or refute that the two mismatches are related.

@jordancarlin jordancarlin added the bug Something isn't working label Dec 19, 2024
@rosethompson
Contributor

Did you find the root cause of the mismatch? The Breker page fault is related to a non-leaf PTE with nonzero dirty, accessed, and user bits. Breker expects a page fault and Wally does not generate one.

@davidharrishmc
Contributor

davidharrishmc commented Dec 20, 2024 via email

@rosethompson
Contributor

We actually have to modify the HPTW, because the PTE that would cause the fault is a non-leaf PTE. I've already made a patch and demonstrated that it matches Breker. However, I'm not convinced it's actually the correct thing to do. The spec just says the bits are reserved and should be zeroed by software for forward compatibility. There is no mention of faulting.
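
For readers following along, the condition under discussion can be written down as a small check. This is only a sketch: the function names are hypothetical, it is not the HPTW patch, and it deliberately stops at detecting the condition, since whether a page fault should follow is exactly the open question here.

```python
# PTE flag bit positions from the RISC-V privileged spec
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3
PTE_U, PTE_A, PTE_D = 1 << 4, 1 << 6, 1 << 7


def is_nonleaf(pte):
    """A valid PTE with R=W=X=0 points to the next page-table level."""
    return bool(pte & PTE_V) and not (pte & (PTE_R | PTE_W | PTE_X))


def nonleaf_reserved_bits(pte):
    """Return the D/A/U bits set in a non-leaf PTE (0 if none, or if not non-leaf)."""
    if not is_nonleaf(pte):
        return 0
    return pte & (PTE_D | PTE_A | PTE_U)


# Made-up example values, for illustration only
clean_pointer = 0x20300001
dirty_pointer = clean_pointer | PTE_A | PTE_D
print(hex(nonleaf_reserved_bits(clean_pointer)), hex(nonleaf_reserved_bits(dirty_pointer)))
```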

@davidharrishmc
Contributor

davidharrishmc commented Dec 20, 2024 via email

@rosethompson
Contributor

I haven't been able to get ImperasDV to run with Breker.

@jordancarlin
Member

jordancarlin commented Dec 20, 2024 via email

@rosethompson
Contributor

@jordancarlin Were you able to get the two running together on chips? When I try to run them together, they just hang while loading uvm_dpi.so:
# Loading /opt/mentor/questasim/questasim/questasim/uvm-1.1d/linux_x86_64/uvm_dpi.so

@jordancarlin
Member

jordancarlin commented Dec 20, 2024 via email

@rosethompson
Contributor

That probably explains it. I figured that with the wrong imperas.ic it would error out immediately, but instead it just hangs indefinitely.

@Zain2050
Contributor Author

My apologies for the late response. I read the test file. It first sets the W and X bits and then performs sw (W), lw (R), and jalr (X), from which we expect three faults. From the logs, sw and lw correctly trigger faults, and the mismatch occurs on jalr. The DUT tries to execute the instruction instead of raising an exception, which in turn leads to mismatches in the CSR values.

# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)
# Info 783: 'refRoot/cpu', 0x00000000917ffffc: Supervisor *** FETCH EXCEPTION ***
# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)
# Info   MEMRM 0x80c02914 0x80c02914 4 202000cd L1 (refRoot/cpu/64-bit Supervisor Physical unified)
# Info (OP_NCH) GlobalTime:0.000000 LocalTime:0.000008 Net:refRoot/coverpoint 1 => 0
# Info (OP_NCH) GlobalTime:0.000000 LocalTime:0.000008 Net:refRoot/coverpoint 0 => 4
# Info (OP_NCH) GlobalTime:0.000000 LocalTime:0.000008 Net:refRoot/coverpoint 4 => 3
# Info (OP_NCH) GlobalTime:0.000000 LocalTime:0.000008 Net:refRoot/coverpoint 3 => 0x1B
# Info (OP_NCH) GlobalTime:0.000000 LocalTime:0.000008 Net:refRoot/coverpoint 0x1B => 0xD
# Warning (RISCV_PTWE) CPU 'refRoot/cpu': Page table entry R=0 and W=1 [address=0x917ffffc PTEAddress=0x80c02914 access=X]
# Info (OP_NCH) GlobalTime:0.000000 LocalTime:0.000008 Net:refRoot/coverpoint 0xD => 1
# Warning (RISCV_IMA) CPU 'refRoot/cpu' 0x917ffffc 0000     c.illegal: Page fault at fetch address (0x917ffffc)
# Info   mstatus 00000080 -> 00000800 [SD:0 TSR:0 TW:0 TVM:0 MXR:0 SUM:0 MPRV:0 XS:0(Off) FS:0(Off) MPP:0->1 VS:0(Off) SPP:0 MPIE:1->0 UBE:0 SPIE:0 MIE:0 SIE:0]
# Info   mepc 9000091c -> 917ffffc
# Info   mcause 0000000d -> 0000000c [Interrupt:0 Code:13(Load page fault)->12(Instruction page fault)]
# Info   mtval 91400014 -> 917ffffc
# Info 784: 'refRoot/cpu', 0x0000000080001280(Mtrampoline): Machine 0800006f jal     x0,80001300
# Info   MEMX 0x80001280 0x80001280 2 006f
# Info   MEMX 0x80001282 0x80001282 2 0800
# Info (IDV) Instruction executed prior to mismatch '0x90000944(): 000280e7 jalr    x1,0(x5)'
# Error (IDV) PC mismatch (HartId:0, PC:0x80001280 Mtrampoline+0):
# Error (IDV) Mismatch 0>
# Error (IDV)   . dut:0x917ffffc 
# Error (IDV)   . ref:0x80001280 Mtrampoline+0
# Error (IDV) Insn. bit pattern mismatch (HartId:0, PC:0x80001280 Mtrampoline+0):
# Error (IDV) Mismatch 1>
# Error (IDV)   . dut:00008067 jalr    x0,0(x1)
# Error (IDV)   . ref:0800006f jal     x0,80001300
# Error (IDV) CSR register value mismatch (HartId:0, PC:0x80001280 Mtrampoline+0):
# Error (IDV) Mismatch 2> CSR 300 (mstatus)
# Error (IDV)   . dut:0x00000080 SD:0 TSR:0 TW:0 TVM:0 MXR:0 SUM:0 MPRV:0 XS:0(Off) FS:0(Off) MPP:0 VS:0(Off) SPP:0 MPIE:1 UBE:0 SPIE:0 MIE:0 SIE:0 (not updated)
# Error (IDV)   . ref:0x00000800 SD:0 TSR:0 TW:0 TVM:0 MXR:0 SUM:0 MPRV:0 XS:0(Off) FS:0(Off) MPP:1 VS:0(Off) SPP:0 MPIE:0 UBE:0 SPIE:0 MIE:0 SIE:0
# Error (IDV) Mismatch 3> CSR 341 (mepc)
# Error (IDV)   . dut:0x9000091c (not updated)
# Error (IDV)   . ref:0x917ffffc
# Error (IDV) Mismatch 4> CSR 342 (mcause)
# Error (IDV)   . dut:0x0000000d Interrupt:0 Code:13(Load page fault) (not updated)
# Error (IDV)   . ref:0x0000000c Interrupt:0 Code:12(Instruction page fault)
# Error (IDV) Mismatch 5> CSR 343 (mtval)
# Error (IDV)   . dut:0x91400014 (not updated)
# Error (IDV)   . ref:0x917ffffc
# Error (IDV) testbench.idv_trace2api.state_compare @ 21160: MISMATCH
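
The PTE value in the MEMRM lines can be decoded to see why the reference model faults on all three accesses. The sketch below is a minimal decoder with hypothetical helper names; the field layout and the page-fault cause codes (12, 13, 15) are the standard Sv32 and mcause encodings.

```python
PTE_VALUE = 0x202000CD  # value read from 0x80c02914 in the log

FLAG_NAMES = ["V", "R", "W", "X", "U", "G", "A", "D"]
PAGE_FAULT_CAUSE = {"fetch": 12, "load": 13, "store": 15}


def decode_sv32_pte(pte):
    """Split a 32-bit Sv32 PTE into its flag bits and PPN."""
    fields = {name: (pte >> bit) & 1 for bit, name in enumerate(FLAG_NAMES)}
    fields["PPN"] = pte >> 10
    return fields


def expected_page_faults(fields):
    """Cause codes expected per access type when the R/W encoding is reserved.

    Only the reserved-encoding rule is modeled: a PTE with W=1 and R=0
    faults for every access type, regardless of the X bit.
    """
    if fields["W"] == 1 and fields["R"] == 0:
        return dict(PAGE_FAULT_CAUSE)
    return {}


fields = decode_sv32_pte(PTE_VALUE)
print(fields)
print(expected_page_faults(fields))
```

With R=0, W=1, X=1 the fetch is expected to trap with cause 12, which matches the reference mcause in the report above.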

@Zain2050
Contributor Author

In tlbcontrol.sv, the reserved encoding logic is correctly implemented:
assign ReservedEncoding = PTE_W & ~PTE_R;

However, it is only checked for the DTLB (data TLB) and not for the ITLB (instruction TLB). I might be wrong, but I think this is the cause of the problem.
Adding fault checking for the reserved encoding in the ITLB fixes it. Execution completes successfully and gives the following report.

# Info (IDV) ---------------------------------------------------
# Info (IDV) ImperasDV VERIFICATION REPORT
# Info (IDV)   Instruction retires   : 2,767
# Info (IDV)   Traps                 : 17
# Info (IDV)   Interrupt events      : 0
# Info (IDV)   Ending cycle count    : 5,643
# Info (IDV)                               Sets / Compares
# Info (IDV)     PC                  :    2,784 / 2,767
# Info (IDV)     Instruction         :    2,784 / 2,767
# Info (IDV)     GPR                 :    2,233 / 2,233
# Info (IDV)     CSR                 :    5,676 / 154
# Info (IDV)     FPR                 :        0 / 0
# Info (IDV)     VR                  :        0 / 0 (disabled)
# Info (IDV)  
# Info (IDV)   Total compares        : 7,921
# Info (IDV)   Mismatches            : 0
# Info (IDV) ---------------------------------------------------
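
The difference between the two behaviors can be illustrated with a small behavioral model. This is a sketch in Python, not the RTL in tlbcontrol.sv: apart from `ReservedEncoding`, `PTE_W`, and `PTE_R`, every name is hypothetical, and the fetch check is simplified to ignore privilege mode, the U bit, A/D handling, and alignment.

```python
def reserved_encoding(pte_r, pte_w):
    """Mirrors: assign ReservedEncoding = PTE_W & ~PTE_R;"""
    return pte_w and not pte_r


def itlb_page_fault(pte_r, pte_w, pte_x, check_reserved):
    """Simplified instruction-fetch fault check.

    check_reserved=False models the behavior described as buggy above;
    check_reserved=True models the behavior after adding the check.
    """
    fault = not pte_x
    if check_reserved:
        fault = fault or reserved_encoding(pte_r, pte_w)
    return fault


# PTE from the failing test: R=0, W=1, X=1
before = itlb_page_fault(False, True, True, check_reserved=False)
after = itlb_page_fault(False, True, True, check_reserved=True)
print(before, after)
```

In this model the fetch through the R=0, W=1, X=1 entry is allowed without the check and faults with it, while an ordinary R=1, W=0, X=1 page is unaffected either way.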

Can someone verify that what I did was right?

@davidharrishmc
Contributor

@Zain2050 that looks like you've got it! Excellent work! Finding and fixing real RTL bugs is one of the most important outcomes of all this DV work.

If you are up for it, the next step would be to make a PR to cvw for your RTL fix. Think carefully about implementing RTL changes in the best possible way. This often involves reading the rest of the file and understanding the context where you are making the change. Where there's one rat, there are many rats, so think about whether the bug you are fixing is an instance of a larger class of possible bugs, and try to fix all of them at once if you can find any others. Run "regression-wally --fcov" and regular "regression-wally" to test the fix. Then make a PR to cvw. When the PR is accepted, close this issue with a message cross-referencing the PR that fixed it.

@Zain2050
Contributor Author

Thanks a lot! I'll review it thoroughly, check for similar issues, and make sure I haven't broken anything. Once everything looks good, I'll make a PR.

@Zain2050
Contributor Author

This issue has been fixed by PR #1206, so I'm closing it.


4 participants