Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch: riscv: handle interrupt level for CLIC #75581

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jimmyzhe
Copy link
Collaborator

@jimmyzhe jimmyzhe commented Jul 8, 2024

RISC-V CLIC supports mintstatus.MIL (RO) and mcause.MPIL (RW) for interrupt levels.
When an interrupt happends, mintstatus.MIL (current interrupt level) is set to mcause.MPIL (previous interrupt level). Each ISR must execute MRET to restore mcause.MPIL back to mintstatus.MIL.

To handle nested interrupts with CLIC interrupt levels, save and restore mcause.MPIL in ISRs. Use RISCV_ALWAYS_SWITCH_THROUGH_ECALL to ensure ISRs do not exit without executing MRET.

Copy link

@tovine tovine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When an interrupt happends, mintstatus.MIL (current interrupt level) is set to mcause.MPIL (previous interrupt level). Each ISR must execute MRET to restore mcause.MPIL back to mintstatus.MIL.

Let me see if my understanding is correct; this is the sequence you mean?

When an interrupt happens:
mcause.mpil = mintstatus.mil
And on mret:
mintstatus.mil = mcause.mpil

or t0, t0, t1
csrw mcause, t0
#endif /* CONFIG_RISCV_HAS_CLIC */

/* Restore MEPC and MSTATUS registers */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for all this masking instead of just restoring the saved mcause value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I mean.

According to CLIC doc, when CLIC mode is enabled, mcause has new fields mcause.mpp and mcause.mpie that mirror mstatus.mpp and mstatus.mpie.
Restoring the entire mcause here pollutes the previous mstatus settings.

Copy link

@tovine tovine Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But mstatus is also restored later (line -689/+707), so it doesn't really matter if you "pollute" it?

Besides, if you end up in a situation where the thread context of mcause and mstatus are inconsistent then I'd say you have bigger problems, as the thread state would essentially be corrupt

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mcause.mpp and mcause.mpie fields are always overwritten by the later storing to mstatus. The mirror fields in mstatus and mcause always remain the same in the stack frame because they are restored together.

However, the RISC-V PMP stack guard and userspace mechanism are based on mstatus.mpp. I am still figuring out if it is safe when mcause.mpp is restored before access to the stack (line -704/+705).

	lr t0, __struct_arch_esf_mepc_OFFSET(sp)
	lr t2, __struct_arch_esf_mstatus_OFFSET(sp)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the RISC-V PMP stack guard and userspace mechanism are based on mstatus.mpp

Are you sure this is correct? Sounds a bit strange that they would depends on the previous privilege mode (mpp) instead of the current one.
I don't think mstatus.mpp would take any effect until after executing mret to get back to whatever mode you were in...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood the userspace. Only the PMP stack guard is based on mstatus.mpp.
According to The RISC-V Instruction Set Manual: Volume II 3.1.6.3. Memory Privilege in mstatus Register, we can use mstatus.mprv and mstatus.mpp to perform load/store operations with MMU translation or PMP protection in M-mode.
The PMP stack guard has used this to detect thread and interrupt stack overflow since PR #44651.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the restoring for mcause. It now saves and restores the entire mcause and avoids storing to memory between restoring mcause and mstatus.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, I hadn't seen this one before - sorry about that, you were right about MPP being used in some cases. 🙂

When MPRV=1, load and store memory
addresses are translated and protected, and endianness is applied, as though the current privilege
mode were set to MPP.

I think your updated code looks much better though 👍

npitre
npitre previously approved these changes Jul 11, 2024
Copy link
Collaborator

@masz-nordic masz-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks our internal CI with VPR cores.
Blocking until we have time to investigate.

@jimmyzhe
Copy link
Collaborator Author

I found that earlier Andes CLIC (N22 core) also fail because MRET only updates mintstatus when mcause.interrupt=1.
However, for the recent Andes CLIC (N225, D23 core), MRET always update mintstatus regardless of mcause.interrupt, which matches the latest CLIC spec (5.7.6 and 5.9.7).

When an interrupt causes a reschedule, an ISR may exit (MRET) with mcause.interrupt=0, and the earlier Andes CLIC doesn't update mintstatus, leading to masked interrupts after MRET.

I am not sure whether other CLIC implementations have similar condition.
@soburi fyi.

Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Sep 11, 2024
@jimmyzhe
Copy link
Collaborator Author

Rebased and updated for gd32vf103.

After I tested on longan_nano board, I found that the Nuclei ECLIC has the same behavior as the earlier Andes CLIC. Both require mcause.interrupt=1 to restore mintstatus when executing MRET.

In my opinion, this seems to be a SoC-specific behavior because this is not defined in the RISC-V CLIC spec, and the latest Andes CLIC doesn't have this limitation.

This limitation can be handled by SOC-specific context (CONFIG_RISCV_SOC_CONTEXT_SAVE) to set the mcause.interrupt to ensure the interrupt level is restored correctly after MRET.

I tested these changes on longan_nano with the following test cases:

  • tests/kernel/common/kernel.common
  • tests/kernel/interrupt/arch.interrupt
  • tests/kernel/gen_isr_table/arch.interrupt.gen_isr_table.riscv_direct
  • tests/kernel/gen_isr_table/arch.interrupt.gen_isr_table.riscv_no_direct

@github-actions github-actions bot removed the Stale label Sep 12, 2024
@jimmyzhe
Copy link
Collaborator Author

@masz-nordic, is there any updated information about the VPR cores?

@masz-nordic
Copy link
Collaborator

Unfortunately no. Perhaps a way to unblock you would be to make those changes depend on !defined(CONFIG_RISCV_CORE_NORDIC_VPR).

@ycsin
Copy link
Member

ycsin commented Sep 19, 2024

Perhaps a way to unblock you would be to make those changes depend on !defined(CONFIG_RISCV_CORE_NORDIC_VPR).

Or maybe have something like CLIC_SUPPORT_NESTED_INTERRUPT, and select that accordingly? If it is in the specs, then maybe select that in RISCV_HAS_CLIC if !NRFX_CLIC

@marnold-b
Copy link
Contributor

Or maybe have something like CLIC_SUPPORT_NESTED_INTERRUPT, and select that accordingly?

+1 for this solution, that makes it possible to disable it in a downstream soc as well.

CLIC supports mintstatus.MIL (RO) and mcause.MPIL (RW) for the current
interrupt level and the previous interrut level before a trap. Each ISR
must execute MRET to set mcause.MPIL back to mintstatus.MIL.

This commit introduces CONFIG_CLIC_SUPPORT_INTERRUPT_PREEMPTION to handle
mcause.MPIL for interrupt preemption in nested ISR, and uses
CONFIG_RISCV_ALWAYS_SWITCH_THROUGH_ECALL to ensure ISR always switch out
with MRET.

e.g.
  With CONFIG_RISCV_ALWAYS_SWITCH_THROUGH_ECALL=n, a context-switch in
  ISR may skip MRET in this flow:
  IRQ -> _isr_wrapper -> z_riscv_switch() -> retrun to arch_switch()

Signed-off-by: Jimmy Zheng <jimmyzhe@andestech.com>
For Nuclei ECLIC, the interrupt level (mintstatus.MIL) is restored from
the previous interrupt level (mcause.MPIL) only if mcause.interrupt is set.
This behavior is not defined in the RISC-V CLIC spec.
If an ISR causes a context switch and mcause.interrupt is not set in the
next context (e.g. the next context is yielded from ecall), interrupts will
be masked after MRET because the interrupt level is not restored.

Use SOC-specific context to set mcause.interrupt to ensure the interrupt
level is restored correctly.

Signed-off-by: Jimmy Zheng <jimmyzhe@andestech.com>
@jimmyzhe
Copy link
Collaborator Author

@ycsin, thank you for your suggestion, the updated commit introduces CLIC_SUPPORT_INTERRUPT_PREEMPTION.

I used "interrupt preemption" because the VPR core may support nested ISR with IRQ offload (e.g. test_nested_irq_offload in tests/kernel/common), and I think 'interrupt preemption' is more appropriate for interrupt level handling , as described in the CLIC spec.

@masz-nordic, the VPR core is filtered by CLIC_SUPPORT_INTERRUPT_PREEMPTION. If this causes confusion about VPR core's hardware capability, the Kconfig name can be changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Architectures area: RISCV RISCV Architecture (32-bit & 64-bit) platform: GD32 GigaDevice platform: nRF Nordic nRFx
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants