CSetBoundsRoundDown #74

nwf · 2024-10-03T19:49:42Z

See #72.

This copies "the TLS stack buffer trick" into the base RTOS for broader use. The implementation can be replaced with [CSetBoundsRoundDown](CHERIoT-Platform/cheriot-sail#74) if and when that lands in the ISA.

kliuMsft · 2024-11-11T02:25:37Z

@rmn30 can this be accomplished by making smaller changes to setCapBounds, for example by removing the T=T+1 part of the following code?

https://github.com/CHERIoT-Platform/cheriot-sail/blob/64b2563e2ffc19d6bfb5a9e97c47a2b7a9207cf8/src/cheri_cap_common.sail#L456C2-L463C1

If we implement this instruction the same way it is defined in the PR, the new length has to be computed before the setCapBounds logic. This would impact both critical path timing and area.

rmn30 · 2024-11-11T09:58:19Z

I think it might take a little more than just eliminating the T increment (because we also want to avoid rounding base down) but I think we should be able to come up with an implementation. It may even be a bit simpler than existing CSetBounds.

nwf · 2024-11-15T02:52:09Z

So, musing aloud... given CSetBoundsRoundDown cd, cs, rs, the resulting (decoded) cd.base (and cd.address) is cs.address and cd.top is min(cs.address + rs, [some expression involving the mantissa width and cs.address]). Does it follow that we can set the (encoded) cd.E to be min(ctz(cs.address), ctz(cs.address + rs)), to ensure that the least significant 1 in either cd.base or cd.top is the minimum bit of the (shifted, encoded) cd.T and cd.B fields? Is that then enough to give us fast computation of cd.B (extract mantissa width from cs.address at cd.E shift) and cd.T (cd.B + rs >> cd.E, if that doesn't overflow the mantissa width, or cd.B + (1 << mantissa_width) - 1 if it does)? We'd still need to check that cd.T is within bounds of cs, I think?

kliuMsft · 2024-11-15T04:44:05Z

Not quite sure I am following - wouldn't we want to compute the exponent of length (rs2) first and round the length down?

rmn30 · 2024-11-15T13:10:27Z

I think we could use 23 - clz(len) to calculate the preferred exponent, e_l, as per the existing CSetBounds. If this allows us to represent base exactly then I think we can use it as is (although we shouldn't increment T by one in case of inexact top). We can compare e_l to e_b = ctz(base) to work this out: if e_l <= e_b we are good. Otherwise we should use e_b and ~~check whether we can represent the requested range (modulo possibly inexact top) or whether we should~~ return a maxlen cap for e_b.

I've not thought this through entirely and would need to do formal checks once we have it in Sail.

Edit: definitely needs more thought. base can be made more than e_b aligned by using a value of B with trailing zeros, so this may be suboptimal. I'm also worried we could end up generating non-canonical encodings (that wouldn't be generated by existing CSetBounds) which could confuse matters.

After more thought: since e_l is the smallest exponent that can represent the requested length, if we have to use the smaller e_b to align the base we know we have to return a max length cap. The only other thing we have to deal with is if e_l is 24 (max e) and 14 < e_b < 24: in this case we return a maxlen cap with e=14. If we adopted the optimised bounds encoding in #45 we wouldn't need that special case.
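For readers following along, here is a minimal Python sketch of the exponent selection rule as stated in this comment. It is not the Sail from the PR. The 9-bit mantissa, the 32-bit address width, and the names `round_down_bounds`, `preferred_exponent`, `clz32` and `ctz32` are assumptions made for illustration, and the check against the bounds of cs is left out.

```python
MANTISSA_WIDTH = 9   # assumption: width of the B and T fields
MAX_SMALL_EXP = 14   # largest exponent below the saturated one (from the comment)
MAX_EXP = 24         # "max e" from the comment
MAX_MANTISSA = (1 << MANTISSA_WIDTH) - 1


def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit value (32 for x == 0)."""
    return 32 - x.bit_length()


def ctz32(x: int) -> int:
    """Count trailing zeros of a 32-bit value (32 for x == 0)."""
    return 32 if x == 0 else (x & -x).bit_length() - 1


def preferred_exponent(length: int) -> int:
    """e_l = 23 - clz(len), clamped at 0 and saturated to MAX_EXP above 14."""
    e = max(0, 23 - clz32(length))
    return e if e <= MAX_SMALL_EXP else MAX_EXP


def round_down_bounds(base: int, length: int) -> tuple[int, int]:
    """Return (exponent, granted_length) for a hypothetical round-down request."""
    e_l = preferred_exponent(length)
    e_b = ctz32(base)
    if e_l <= e_b:
        # base is exactly representable at e_l: keep it, round the length down
        return e_l, (length >> e_l) << e_l
    if e_l == MAX_EXP and e_b > MAX_SMALL_EXP:
        # special case 14 < e_b < 24: maxlen cap with e = 14
        return MAX_SMALL_EXP, MAX_MANTISSA << MAX_SMALL_EXP
    # base forces the smaller exponent e_b: return a maxlen cap for e_b
    return e_b, MAX_MANTISSA << e_b
```

For example, `round_down_bounds(0x1001, 1000)` gives `(0, 511)`: the odd base forces e = 0, so the result is a maxlen cap rather than the requested 1000 bytes. With an aligned base, `round_down_bounds(0x1000, 1001)` gives `(1, 1000)`, which only rounds the top down.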

rmn30 · 2024-11-16T17:10:02Z

Attempt at Sail for above: 823e75b

nwf · 2024-11-17T00:04:41Z

That (comment and 823e75b) looks sensible to me and corrects a bug in my original attempt (I'd missed the 14 < e_b < 24 case).

kliuMsft · 2024-11-18T02:44:33Z

Ok this looks good. I am still mulling ways to merge this with the existing setCapBounds logic to save some area (a little tricky there since setCapBounds is also the critical timing path). But even if that doesn't work out it may not be too bad (maybe adding additional 2% of area or so?).

kliuMsft · 2024-11-18T05:05:55Z

@rmn30 also want to confirm - looks like the inCapBounds check is still the same, i.e., compare the "requested" top vs the cs1.top and the new base (cs1.address) vs cs1.base?

rmn30 · 2024-11-18T10:25:57Z

> @rmn30 also want to confirm - looks like the inCapBounds check is still the same, i.e., compare the "requested" top vs the cs1.top and the new base (cs1.address) vs cs1.base?

Yes, this should never return a length that is greater than the requested length so the existing check works fine. Would like to have a proof of this, though.

rmn30 · 2024-11-18T10:40:43Z

> I am still mulling ways to merge this with the existing setCapBounds logic to save some area (a little tricky there since setCapBounds is also the critical timing path)

Could you look at the last commit in this branch that combines this with some encoding changes? I think this simplifies setCapBounds (both versions) as well as making the encoding more efficient.

kliuMsft · 2024-11-18T21:15:55Z

Hmm, I can see the good things but am also a bit nervous about changing the encoding at this stage. The previous commit is incremental, so it is less risky (worst case we just don't use the new instruction) and relatively easy to verify by adding new tests/properties. On the other hand, an encoding change impacts the behavior of existing instructions and requires more extensive changes in verification test cases and formal checks. The area/timing impacts (even though they might be positive) need to be evaluated as well. So I would say the better way is to decouple the two and just get CSetBoundsRoundDown to work here. After this round of changes we can spend some time on evaluating the encoding change and updating the DV infrastructure.

kliuMsft · 2024-11-18T23:43:34Z

Also, one potential issue with the proposed encoding is that it may take a little longer to calculate the top correction. Currently we only do the T < B and A < B comparisons. With this change we need to figure out T[8] based on E first. It may not be a huge deal, but it is on the memory critical path, since we calculate the corrections right after the cap bits are loaded from memory, before they are written into the register file (corrections are part of the register file).

rmn30 · 2024-11-19T12:12:35Z

Yeah, the encoding change is potentially risky and not necessary for this change, but I wanted to see how they would work together. It would be good to chat with you about it, as there are a few options that could potentially help with timing.

> worst case we just don't use the new instruction

Actually, the worst case is we accidentally introduce non-monotonic capability manipulation... I think it's unlikely but would like to do some proof to check. We could have a chicken bit just in case but hopefully unnecessary.

rmn30

LGTM if @kliuMsft is happy.

kliuMsft · 2024-11-19T18:48:39Z

Looks good to me. We can experiment more with the encoding after this. A chicken bit sounds like a good idea, but we need to figure out a way to control it - perhaps just use a memory-mapped register input (the same way we control the revoker)?

nwf · 2024-11-20T00:27:12Z

I've written some Sail $properties (that turned out to be wrong; nevertheless...) for CSetBoundsRoundDown and now I believe that CSetBoundsRoundDown needs to be "clever" around (or at least aware of) the gap in representable lengths between 8M (the maximum with e = 14) and 16M (the minimum with e = 24).
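The gap can be seen with a little arithmetic. The following is an illustrative Python sketch, assuming a 9-bit mantissa whose largest value is 511 (so the e = 14 maximum is just under 8M); `max_length`, `round_down` and `in_gap` are hypothetical helper names, not functions from the Sail model.

```python
MANTISSA_WIDTH = 9                        # assumption: 9-bit length mantissa
MAX_MANTISSA = (1 << MANTISSA_WIDTH) - 1  # 511


def max_length(e: int) -> int:
    """Largest length representable at exponent e."""
    return MAX_MANTISSA << e


def round_down(length: int, e: int) -> int:
    """Round length down to a multiple of 2**e."""
    return (length >> e) << e


def in_gap(length: int) -> bool:
    """True if length is too big for e = 14 and too small for e = 24."""
    return max_length(14) < length < (1 << 24)


request = 12 * 1024 * 1024        # 12M, inside the gap
print(max_length(14))             # 8372224, the largest length at e = 14
print(1 << 24)                    # 16777216, the smallest nonzero length at e = 24
print(round_down(request, 24))    # 0: naive rounding at e = 24 loses everything
```

A request inside the gap therefore cannot simply be rounded down at the saturated exponent. The largest length at e = 14 is the nearest representable value below it, provided the base is sufficiently aligned.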

nwf · 2024-11-20T04:39:00Z

OK, here (222b1fb, and in particular 7805dcb, to be squashed if correct) is an attempt at fixing the "rounding down saturated exponents might round to zero" issue.

The three SMT properties seem like they might be useful ones. prop_csbrd_nonzero and prop_csbrd_exact check quickly, and prop_csbrd_brief is still running, but it's getting late here.

FIXES #72 Co-authored-by: Nathaniel Wesley Filardo <wes.filardo@scisemi.com>

Co-authored-by: Robert Norton <robert.norton@microsoft.com>

nwf force-pushed the 202410-nwf-csetboundsrounddown branch 2 times, most recently from 0d2c6b6 to 8ad5dfc · October 4, 2024 16:33

This was referenced Oct 6, 2024

tls.cc precisely_bound_buffer representation fix CHERIoT-Platform/network-stack#37

Closed

Add a Capability::bounds().set_inexact_at_most() method CHERIoT-Platform/cheriot-rtos#310

Merged

This was referenced Nov 6, 2024

Future small tweaks to switcher? CHERIoT-Platform/cheriot-rtos#334

Open

WIP: various tweaks to, and a pile of documentation for, the switcher and exception handler CHERIoT-Platform/cheriot-rtos#320

Merged

nwf force-pushed the 202410-nwf-csetboundsrounddown branch from 8ad5dfc to 716ad2c · November 19, 2024 15:23

rmn30 approved these changes Nov 19, 2024

src/cheri_cap_common.sail (resolved)

nwf force-pushed the 202410-nwf-csetboundsrounddown branch from 716ad2c to db662a3 · November 19, 2024 18:23

kliuMsft approved these changes Nov 19, 2024

nwf force-pushed the 202410-nwf-csetboundsrounddown branch from db662a3 to 3990ca5 · November 20, 2024 04:35

nwf requested review from kliuMsft and rmn30 November 20, 2024 04:36

nwf force-pushed the 202410-nwf-csetboundsrounddown branch from 3990ca5 to 222b1fb · November 20, 2024 04:38

nwf force-pushed the 202410-nwf-csetboundsrounddown branch 2 times, most recently from 1962218 to e9ff9ff · November 20, 2024 16:10

rmn30 reviewed Nov 20, 2024

src/cheri_insts.sail (outdated, resolved)

nwf force-pushed the 202410-nwf-csetboundsrounddown branch 4 times, most recently from 8b1e714 to f0f8758 · November 21, 2024 03:29