arch: riscv: handle interrupt level for CLIC #75581

jimmyzhe · 2024-07-08T10:20:32Z

RISC-V CLIC supports mintstatus.MIL (RO) and mcause.MPIL (RW) for interrupt levels.
When an interrupt happens, mintstatus.MIL (current interrupt level) is set to mcause.MPIL (previous interrupt level). Each ISR must execute MRET to restore mcause.MPIL back to mintstatus.MIL.

To handle nested interrupts with CLIC interrupt levels, save and restore mcause.MPIL in ISRs. Use RISCV_ALWAYS_SWITCH_THROUGH_ECALL to ensure ISRs do not exit without executing MRET.

tovine

When an interrupt happens, mintstatus.MIL (current interrupt level) is set to mcause.MPIL (previous interrupt level). Each ISR must execute MRET to restore mcause.MPIL back to mintstatus.MIL.

Let me see if my understanding is correct; this is the sequence you mean?

When an interrupt happens:
mcause.mpil = mintstatus.mil
And on mret:
mintstatus.mil = mcause.mpil
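
The sequence above can be sketched as a small software model. This is only an illustration of the level bookkeeping, not real CSR access: the struct and function names (`clic_levels`, `model_trap_entry`, `model_mret`) are hypothetical and do not come from the PR.

```c
#include <stdint.h>

/* Simplified model of the two CLIC interrupt-level fields.
 * Hypothetical names; real hardware keeps these in CSRs. */
struct clic_levels {
	uint8_t mil;  /* mintstatus.MIL: current interrupt level (RO in hardware) */
	uint8_t mpil; /* mcause.MPIL: previous interrupt level (RW) */
};

/* Trap entry: the current level is saved into MPIL, then MIL is raised
 * to the level of the interrupt being taken. */
static void model_trap_entry(struct clic_levels *s, uint8_t new_level)
{
	s->mpil = s->mil;
	s->mil = new_level;
}

/* MRET: the current level is restored from MPIL. */
static void model_mret(struct clic_levels *s)
{
	s->mil = s->mpil;
}
```

In this model a nested interrupt overwrites `mpil`, which is why the outer ISR has to save it before nesting can occur and write it back before its own MRET.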

tovine · 2024-07-09T15:19:54Z

arch/riscv/core/isr.S

+	or t0, t0, t1
+	csrw mcause, t0
+#endif /* CONFIG_RISCV_HAS_CLIC */
+
 	/* Restore MEPC and MSTATUS registers */


What's the reason for all this masking instead of just restoring the saved mcause value?

Yes, that's what I mean.

According to CLIC doc, when CLIC mode is enabled, mcause has new fields mcause.mpp and mcause.mpie that mirror mstatus.mpp and mstatus.mpie.
Restoring the entire mcause here pollutes the previous mstatus settings.
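
As a rough C illustration of the masking approach described here (restore only MPIL, leave the mirrored mpp/mpie bits alone), assuming an RV32 core and the mcause field positions given in the CLIC spec draft (mpp at bits 29:28, mpie at bit 27, mpil at bits 23:16). The macro and function names are hypothetical, and the actual patch does this in assembly in isr.S:

```c
#include <stdint.h>

/* Assumed CLIC-mode mcause layout on RV32 (hypothetical macro names). */
#define MCAUSE_MPP_MASK   (UINT32_C(0x3) << 28)
#define MCAUSE_MPIE_MASK  (UINT32_C(0x1) << 27)
#define MCAUSE_MPIL_SHIFT 16
#define MCAUSE_MPIL_MASK  (UINT32_C(0xFF) << MCAUSE_MPIL_SHIFT)

/* Build the value to write back to mcause: keep every live bit except
 * MPIL, and take MPIL from the value saved in the exception stack frame. */
static uint32_t mcause_restore_mpil(uint32_t live_mcause, uint32_t saved_mcause)
{
	uint32_t kept = live_mcause & ~MCAUSE_MPIL_MASK;
	uint32_t mpil = saved_mcause & MCAUSE_MPIL_MASK;

	return kept | mpil;
}
```

With this shape, the mpp and mpie bits in the result always come from the live register, never from the saved frame.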

But mstatus is also restored later (line -689/+707), so it doesn't really matter if you "pollute" it?

Besides, if you end up in a situation where the thread context of mcause and mstatus are inconsistent then I'd say you have bigger problems, as the thread state would essentially be corrupt

Yes, mcause.mpp and mcause.mpie fields are always overwritten by the later storing to mstatus. The mirror fields in mstatus and mcause always remain the same in the stack frame because they are restored together.

However, the RISC-V PMP stack guard and userspace mechanism are based on mstatus.mpp. I am still figuring out if it is safe when mcause.mpp is restored before access to the stack (line -704/+705).

	lr t0, __struct_arch_esf_mepc_OFFSET(sp)
	lr t2, __struct_arch_esf_mstatus_OFFSET(sp)

However, the RISC-V PMP stack guard and userspace mechanism are based on mstatus.mpp

Are you sure this is correct? Sounds a bit strange that they would depend on the previous privilege mode (mpp) instead of the current one.
I don't think mstatus.mpp would take any effect until after executing mret to get back to whatever mode you were in...

Sorry, I misunderstood the userspace. Only the PMP stack guard is based on mstatus.mpp.
According to The RISC-V Instruction Set Manual: Volume II, section 3.1.6.3 ("Memory Privilege in mstatus Register"), we can use mstatus.mprv and mstatus.mpp to perform load/store operations with MMU translation or PMP protection in M-mode.
The PMP stack guard has used this to detect thread and interrupt stack overflow since PR #44651.

I updated the restoring for mcause. It now saves and restores the entire mcause and avoids storing to memory between restoring mcause and mstatus.
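
To make the MPRV point concrete, here is a minimal C model of which privilege mode governs data accesses. It assumes the standard mstatus bit positions (MPP at bits 12:11, MPRV at bit 17); the enum and function names are hypothetical and this is not code from the PR:

```c
#include <stdint.h>

/* Standard RISC-V privilege mode encodings. */
enum priv_mode { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (UINT32_C(0x3) << MSTATUS_MPP_SHIFT)
#define MSTATUS_MPRV      (UINT32_C(1) << 17)

/* Privilege mode used for translating and protecting loads and stores.
 * With MPRV=1 the checks are applied as though the current mode were MPP;
 * instruction fetches are not affected and are not modelled here. */
static enum priv_mode effective_data_priv(enum priv_mode current, uint32_t mstatus)
{
	if (mstatus & MSTATUS_MPRV) {
		return (enum priv_mode)((mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT);
	}
	return current;
}
```

This is why a write that changes the mirrored mpp bits while MPRV is set can change how a following stack access is checked, even though the core is still executing in M-mode.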

Aah, I hadn't seen this one before - sorry about that, you were right about MPP being used in some cases. 🙂

When MPRV=1, load and store memory addresses are translated and protected, and endianness is applied, as though the current privilege mode were set to MPP.

I think your updated code looks much better though 👍

masz-nordic

This change breaks our internal CI with VPR cores.
Blocking until we have time to investigate.

jimmyzhe · 2024-07-12T09:13:59Z

I found that the earlier Andes CLIC (N22 core) also fails because MRET only updates mintstatus when mcause.interrupt=1.
However, for the recent Andes CLIC (N225, D23 core), MRET always updates mintstatus regardless of mcause.interrupt, which matches the latest CLIC spec (5.7.6 and 5.9.7).

When an interrupt causes a reschedule, an ISR may exit (MRET) with mcause.interrupt=0, and the earlier Andes CLIC doesn't update mintstatus, leading to masked interrupts after MRET.

I am not sure whether other CLIC implementations have a similar condition.
@soburi fyi.

github-actions · 2024-09-11T00:30:42Z

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

jimmyzhe · 2024-09-11T09:22:28Z

Rebased and updated for gd32vf103.

After I tested on longan_nano board, I found that the Nuclei ECLIC has the same behavior as the earlier Andes CLIC. Both require mcause.interrupt=1 to restore mintstatus when executing MRET.

In my opinion, this seems to be a SoC-specific behavior because this is not defined in the RISC-V CLIC spec, and the latest Andes CLIC doesn't have this limitation.

This limitation can be handled by SOC-specific context (CONFIG_RISCV_SOC_CONTEXT_SAVE) to set the mcause.interrupt to ensure the interrupt level is restored correctly after MRET.
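
The limitation and the workaround can be modelled in a few lines of C. This is a sketch under assumptions: an RV32 mcause with the interrupt flag at bit 31 and MPIL at bits 23:16, and hypothetical function names. The real fix lives in the SoC context save/restore hooks, not in a function like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define MCAUSE_INTERRUPT  (UINT32_C(1) << 31)
#define MCAUSE_MPIL_SHIFT 16
#define MCAUSE_MPIL_MASK  (UINT32_C(0xFF) << MCAUSE_MPIL_SHIFT)

/* Model of the interrupt level after MRET. When requires_interrupt_bit is
 * true, the model mimics cores that only restore MIL from MPIL if
 * mcause.interrupt is set; otherwise MIL is always restored. */
static uint8_t model_mret_mil(uint32_t mcause, uint8_t current_mil,
			      bool requires_interrupt_bit)
{
	if (requires_interrupt_bit && !(mcause & MCAUSE_INTERRUPT)) {
		return current_mil; /* level is left unchanged */
	}
	return (uint8_t)((mcause & MCAUSE_MPIL_MASK) >> MCAUSE_MPIL_SHIFT);
}

/* Workaround idea: force mcause.interrupt before MRET so the level is
 * restored on the affected cores as well. */
static uint32_t soc_fixup_mcause(uint32_t mcause)
{
	return mcause | MCAUSE_INTERRUPT;
}
```

For example, a context that was switched out through ecall has mcause.interrupt=0. On an affected core the level stays at the ISR's level after MRET unless the fixup is applied first.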

I tested these changes on longan_nano with the following test cases:

tests/kernel/common/kernel.common
tests/kernel/interrupt/arch.interrupt
tests/kernel/gen_isr_table/arch.interrupt.gen_isr_table.riscv_direct
tests/kernel/gen_isr_table/arch.interrupt.gen_isr_table.riscv_no_direct

jimmyzhe · 2024-09-18T02:07:24Z

@masz-nordic, is there any updated information about the VPR cores?

masz-nordic · 2024-09-19T08:53:00Z

Unfortunately no. Perhaps a way to unblock you would be to make those changes depend on !defined(CONFIG_RISCV_CORE_NORDIC_VPR).

ycsin · 2024-09-19T09:13:16Z

Perhaps a way to unblock you would be to make those changes depend on !defined(CONFIG_RISCV_CORE_NORDIC_VPR).

Or maybe have something like CLIC_SUPPORT_NESTED_INTERRUPT, and select that accordingly? If it is in the specs, then maybe select that in RISCV_HAS_CLIC if !NRFX_CLIC

marnold-b · 2024-09-19T10:51:31Z

Or maybe have something like CLIC_SUPPORT_NESTED_INTERRUPT, and select that accordingly?

+1 for this solution, that makes it possible to disable it in a downstream soc as well.

CLIC supports mintstatus.MIL (RO) and mcause.MPIL (RW) for the current interrupt level and the previous interrupt level before a trap. Each ISR must execute MRET to set mcause.MPIL back to mintstatus.MIL.

This commit introduces CONFIG_CLIC_SUPPORT_INTERRUPT_PREEMPTION to handle mcause.MPIL for interrupt preemption in nested ISRs, and uses CONFIG_RISCV_ALWAYS_SWITCH_THROUGH_ECALL to ensure an ISR always switches out with MRET.

e.g. With CONFIG_RISCV_ALWAYS_SWITCH_THROUGH_ECALL=n, a context switch in an ISR may skip MRET in this flow:
IRQ -> _isr_wrapper -> z_riscv_switch() -> return to arch_switch()

Signed-off-by: Jimmy Zheng <jimmyzhe@andestech.com>

For Nuclei ECLIC, the interrupt level (mintstatus.MIL) is restored from the previous interrupt level (mcause.MPIL) only if mcause.interrupt is set. This behavior is not defined in the RISC-V CLIC spec.

If an ISR causes a context switch and mcause.interrupt is not set in the next context (e.g. the next context is yielded from ecall), interrupts will be masked after MRET because the interrupt level is not restored. Use SOC-specific context to set mcause.interrupt to ensure the interrupt level is restored correctly.

Signed-off-by: Jimmy Zheng <jimmyzhe@andestech.com>

jimmyzhe · 2024-09-23T09:26:17Z

@ycsin, thank you for your suggestion, the updated commit introduces CLIC_SUPPORT_INTERRUPT_PREEMPTION.

I used "interrupt preemption" because the VPR core may support nested ISRs with IRQ offload (e.g. test_nested_irq_offload in tests/kernel/common), and I think "interrupt preemption" is more appropriate for interrupt level handling, as described in the CLIC spec.

@masz-nordic, the VPR core is filtered by CLIC_SUPPORT_INTERRUPT_PREEMPTION. If this causes confusion about VPR core's hardware capability, the Kconfig name can be changed.

zephyrbot added area: Architectures area: RISCV RISCV Architecture (32-bit & 64-bit) labels Jul 8, 2024

zephyrbot requested review from carlocaione, dcpleung, edersondisouza, fkokosinski, katsuster, kgugala, mgielda, nashif, npitre, tgorochowik and ycsin July 8, 2024 10:21

zephyrbot assigned fkokosinski, kgugala and tgorochowik Jul 8, 2024

carlescufi requested a review from masz-nordic July 8, 2024 10:24

jimmyzhe force-pushed the fixed_riscv_clic branch from a39bcbd to a1c7d8e Compare July 8, 2024 15:22

zephyrbot added the platform: nRF Nordic nRFx label Jul 8, 2024

zephyrbot requested review from anangl, jaz1-nordic and kl-cruz July 8, 2024 15:23

tovine reviewed Jul 9, 2024

jimmyzhe force-pushed the fixed_riscv_clic branch from a1c7d8e to e23074a Compare July 11, 2024 02:44

npitre previously approved these changes Jul 11, 2024

tovine approved these changes Jul 11, 2024

masz-nordic requested changes Jul 11, 2024

github-actions bot added the Stale label Sep 11, 2024

jimmyzhe dismissed npitre’s stale review via db1652e September 11, 2024 08:56

jimmyzhe force-pushed the fixed_riscv_clic branch from e23074a to db1652e Compare September 11, 2024 08:56

zephyrbot added the platform: GD32 GigaDevice label Sep 11, 2024

github-actions bot removed the Stale label Sep 12, 2024

ycsin requested a review from masz-nordic September 18, 2024 03:45

jimmyzhe added 2 commits September 23, 2024 13:23

jimmyzhe force-pushed the fixed_riscv_clic branch from db1652e to 8221aac Compare September 23, 2024 07:06

jimmyzhe commented Jul 8, 2024

tovine left a comment

tovine Jul 9, 2024

jimmyzhe Jul 10, 2024

tovine Jul 10, 2024 (edited)

jimmyzhe Jul 10, 2024

tovine Jul 10, 2024

jimmyzhe Jul 11, 2024

jimmyzhe Jul 11, 2024

tovine Jul 11, 2024

masz-nordic left a comment

jimmyzhe commented Jul 12, 2024

github-actions bot commented Sep 11, 2024

jimmyzhe commented Sep 11, 2024

jimmyzhe commented Sep 18, 2024

masz-nordic commented Sep 19, 2024

ycsin commented Sep 19, 2024 (edited)

marnold-b commented Sep 19, 2024

jimmyzhe commented Sep 23, 2024
