[top_darjeeling] Memory scrambling and ECC | undetected faults #17661

Open
johannheyszl opened this issue Mar 23, 2023 · 5 comments
Labels
Component:Darjeeling Component:RTL Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones Hotlist:Security Security Opinion Needed IP:otbn IP:rv_core_ibex Priority:P1 Priority: high triaged-security

Comments

@johannheyszl
Contributor

desc

Initially discussed in issue #10976 and in the Security WG on 2022-12-08 and 2022-03-03. Please see the bottom of that issue for a summary.
Conclusion:

  • A change for integrated OT is highly recommended: single-bit flips in memory unscramble to multi-bit flips, some of which ECC does not detect. The resulting silent data corruption is not acceptable in the integrated setting (high-availability / server systems where single-bit faults are possible).
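
The single-to-multi-bit effect can be illustrated with a toy model. The sketch below is not the OpenTitan scrambling primitive: the 16-bit word width, the function names, and the use of the PRESENT cipher S-box as a stand-in diffusion layer are all assumptions made for illustration. It only shows that one flipped bit in the stored (scrambled) word can change several bits of the descrambled word.

```python
# Toy model only: a 4-bit S-box per nibble stands in for the diffusion layer.
# SBOX is the PRESENT cipher S-box, chosen for illustration; it is not the
# layer used in the OpenTitan scrambling primitive.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]

WIDTH = 16  # hypothetical word width in bits


def sub_nibbles(word, box):
    """Apply a 4-bit substitution box to every nibble of the word."""
    out = 0
    for shift in range(0, WIDTH, 4):
        out |= box[(word >> shift) & 0xF] << shift
    return out


def scramble(word, keystream):
    """XOR the keystream, then diffuse through the S-box layer."""
    return sub_nibbles(word ^ keystream, SBOX)


def descramble(stored, keystream):
    """Invert the S-box layer, then remove the keystream."""
    return sub_nibbles(stored, INV_SBOX) ^ keystream


def flips_after_descramble(word, keystream, bit):
    """Count plaintext bits that change when one stored bit flips."""
    faulty = scramble(word, keystream) ^ (1 << bit)
    return bin(descramble(faulty, keystream) ^ word).count("1")


if __name__ == "__main__":
    # Print, for each stored bit position, how many plaintext bits change.
    for bit in range(WIDTH):
        print(bit, flips_after_descramble(0x1234, 0xBEEF, bit))
```

Whether a given multi-bit pattern then escapes the ECC depends on the actual code and diffusion layer, which this toy does not model.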

Current proposals on how to change for integrated:

  • Switch the order of ECC and scrambling so that ECC is closer to storage. Similarly: have memory ECC as the last function before storage, applied after scrambling and any bus integrity ECC.
  • Remove permutation layer from scrambling to avoid single-to-multi-bit passing ECC issues.
  • Add a single parity bit to the memory that is checked before the de-scrambling to cover single-bit memory corruption.
  • Don’t feed the ECC bits through the scrambling logic; instead, separately compute how the ECC bits need to change for the scrambled/de-scrambled data. This allows checking the ECC right after reading out from memory AND again close to the CPU. The challenge is the non-linear S-Box layer in the scrambling (this is similar to masking countermeasures against SCA). Not feeding the ECC bits through the scrambler can save some area, as the scrambling becomes simpler (it is no longer applied to an odd number of bits).
  • Note: Reliability concerns can partly be addressed by regularly flushing out caches using a fence.i instruction, reprogramming OTBN, and potentially hashing segments of RAM.
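
As a minimal sketch of the parity-bit proposal above: the parity is computed over the scrambled word as it is written and checked on read, before any de-scrambling. The function names and the even-parity choice are assumptions for illustration, not taken from the RTL.

```python
def parity(word):
    """Even parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def write_word(scrambled_word):
    """Return what would be stored: the scrambled word plus its parity bit."""
    return scrambled_word, parity(scrambled_word)


def read_check(stored_word, stored_parity):
    """True if the stored word still matches its parity bit."""
    return parity(stored_word) == stored_parity


if __name__ == "__main__":
    word, p = write_word(0xF478)
    print(read_check(word, p))              # no fault
    print(read_check(word ^ (1 << 3), p))   # single-bit fault in storage
    print(read_check(word ^ 0b11, p))       # double-bit fault in storage
```

A single parity bit catches any odd number of flipped storage bits, including every single-bit fault, but an even number of flips passes the check. That is why the detection rate in combination with ECC and scrambling still needs to be assessed.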

cc @neeraj-rv @vogelpi @GregAC @moidx @bilgiday @cdgori @msfschaffner @tjaychen

@msfschaffner
Contributor

A few points to consider when selecting one of the options above:

  • The decision to pass ECC through scrambling (instead of scrambling first, then applying ECC) was taken to facilitate the end-to-end transmission integrity scheme. If we translate / recompute ECC, we introduce a potential weak point. There are ways to protect against that (e.g. make sure there is temporal overlap between bus ECC and recomputed ECC for the SRAM).
  • The diffusion layer was added because the counter mode scrambling is vulnerable to single-bit tampering attacks on the memory side (it just XORs a keystream on top of the plaintext).
  • Adding a single parity bit on the SRAM side would probably be the most straightforward change in terms of implementation effort. The question is whether this, together with ECC + scrambling, has a high enough detection rate. Some simulations as proposed in [OTBN/SRAM/Icache] Investigate interaction of scrambling and bus integrity #10976 would probably be helpful to assess this.
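
The tampering concern about plain counter mode scrambling can be sketched in a few lines. This is an illustrative model with hypothetical function names: without a diffusion layer, flipping chosen bits of the stored word flips exactly the same bits of the plaintext, regardless of the keystream.

```python
def ctr_xor(word, keystream):
    """Counter-mode style scrambling: the same XOR scrambles and descrambles."""
    return word ^ keystream


def plaintext_change(word, keystream, tamper_mask):
    """Return which plaintext bits change when stored bits in tamper_mask flip."""
    stored = ctr_xor(word, keystream)
    tampered = stored ^ tamper_mask
    return ctr_xor(tampered, keystream) ^ word


if __name__ == "__main__":
    # The change seen in the plaintext equals the mask applied in memory.
    print(bin(plaintext_change(0x1234, 0xBEEF, 1 << 5)))
```

An attacker who can flip bits in memory therefore controls precisely which plaintext bits change, without knowing the key. The diffusion layer removes that control, at the cost of turning single-bit faults into multi-bit ones.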

@msfschaffner added the Earlgrey-PROD Candidate label on Oct 7, 2023
@msfschaffner added the Hotlist:Security and Earlgrey-PROD Triaged labels and removed the Earlgrey-PROD Candidate label on Nov 3, 2023
@johannheyszl added the triaged-security and Earlgrey-PROD Candidate labels and removed the Earlgrey-PROD Triaged label on Dec 12, 2023

@johannheyszl
Contributor Author

An RFC under consideration for top Earl Grey is also relevant for Darjeeling: #20788

@GregAC
Contributor

GregAC commented Mar 12, 2024

@johannheyszl can this be closed now that #20890 has been merged?

@msfschaffner
Contributor

This is an issue for Darjeeling, and we should probably leave it open until it is fully resolved for integrated top-levels. #20890 is only part of the solution; we still have to discuss how to change the SRAM memory controllers so that ECC errors can be tolerated and reported.

@johannheyszl
Contributor Author

@msfschaffner is correct. The Earl Grey solution does not fully cover the needs of Darjeeling. Notes can be found in the Earl Grey RFC.


3 participants