\chapter{Detailed Design Rationale}
\label{chap:rationale}
During the design of CHERI, which began in 2010, we considered many
different capability architectures and design approaches. This chapter
describes various design choices; it briefly outlines some
possible alternatives, and provides rationales for the selected
choices.
\section{High-Level Design Approach: Capabilities as Pointers}
Our goals of providing fine-grained memory protection and compartmentalization
led to an early design choice to allow capabilities to be used as C- and
C++-language pointers.
This rapidly led to various conclusions:
\begin{itemize}
\item Capabilities exist within virtual address spaces, imposing an ordering in
which capability protections are evaluated before virtual-memory
protections; this in turn had implications for the hardware composition of
the capability coprocessor and conventional interactions with the MMU.
\item Capability pointers can be treated by the compiler in much the same way
as integer pointers, meaning that they will be loaded, manipulated,
dereferenced, and stored via registers and to/from general-purpose memory
only by explicit instructions.
These instructions were modeled on similar conventional RISC instructions.
\item Incremental deployment within programs meant that not all pointers would
immediately be converted from integers to capabilities, implying that both
forms might coexist in the same virtual memory;
also, there was a strong desire to embed capabilities
within data structures, rather than store them in separate segments,
which in turn required fine-granularity tagging.
\item Incremental deployment and compatibility with the UNIX model implied
the need to retain
the general-purpose memory management unit (MMU) more or less as
it then existed, including support for variable page sizes, page table layout,
and so on.
\end{itemize}
\section{Tagged Memory for Non-Probabilistic Protection}
\label{sec:probablistic_capability_protection}
Introducing tagged memory has the potential to impose a substantial adoption
cost for CHERI, due to greater microarchitectural disruption.
We have demonstrated that there are efficient implementations of memory
tagging, even without integrated tag support within
DRAM~\cite{joannou2017:tagged-memory, UCAM-CL-TR-936}, but even so there is a significant
concern as to whether potential adopters will perceive the hurdle of adopting
tagged memory as outweighing the benefits that tagged memory brings.
In this section, we consider the benefits of tagging, as well as how
cryptographic non-tagged approaches might be used.
Tagging offers a number of significant potential benefits:
\begin{itemize}
\item Tags are a deterministic (non-probabilistic) means of protecting the
integrity and provenance validity of pointers in memory.
Probabilistic schemes, such as cryptographic hashes, are exposed both to
direct brute forcing (especially due to limited bit investment within
pointers) and also reinjection if leaked to attackers.
\item Tags offer strong atomicity properties that are also well-aligned with
current microarchitecture (e.g., in caches), avoiding the need for
substantial disruption close to the processor.
\item Tags have highly efficient microarchitectural implementations, including
being directly embedded in tagged DRAM (an option likely to become increasingly
available due to the widespread adoption of
error-correcting codes), and also via tag
controllers and tag caches that are affine to the DRAM controller.
These may be substantially more performance- and energy-efficient than
cryptographic techniques that would require hashes to be calculated or checked.
\item Tags offer strong C-language compatibility, which has been demonstrated
with significant software corpuses -- including operating-system kernels
(FreeBSD), the complete UNIX userspace (FreeBSD), and significant C and
C++-language applications (the Postgres database, OpenSSH client and server,
and WebKit web-rendering framework).
Key areas of incompatibility include the need to explicitly preserve tags
during memory copies via capability-sized, capability-aligned loads and
stores, and stronger alignment requirements for pointers.
The operating system must also support maintaining tags in virtual memory,
including across operations such as swapping, memory compression, and
virtual-machine migration.
In general, we have found that the modifications are modestly sized,
although some impacts (such as the cost of tag preservation and
restoration) are not yet fully quantified -- e.g., for memory compression.
\item Tags allow pointers to be deterministically identified in memory, a
foundation for strong temporal memory-safety techniques such as revocation
and garbage collection.
\item The choice between tag-preserving and tag-stripping memory copying
allows software to impose policies on when it is appropriate and safe for
pointers to move between protection domains.
For example, a kernel can selectively preserve tags in system-call
arguments,
preventing data copied into the kernel from an untrustworthy process from
being interpreted as a pointer within the kernel, or when received by
another process.
\end{itemize}
As an alternative to tagging, one could imagine making use of probabilistic
cryptographic hashing techniques that protect capabilities from corruption,
not unlike Cryptographic Control-Flow Integrity
(CCFI)~\cite{Mashtizadeh_CCFICryptographicallyEnforced_2015} or Arm's ARMv8.3 Pointer
Authentication Codes (PAC).
Some number of bits would be co-opted from either the virtual address (as is
the case in CCFI or PAC) or from the metadata portion of a CHERI capability to
hold a keyed hash rather than a tag, protecting the contents from corruption in
memory or from mis-manipulation in a register.
With additional capability metadata bits available, consumption of
virtual-address bits could be reduced.
Wherever the CHERI architecture requires a tag check, a cryptographic hash
check could instead be required architecturally.
Wherever the CHERI architecture maintains a tag during pointer manipulation,
the cryptographic hash could be updated.
While architectural behavior might appear to require frequent checks of, and
updates to, the hash (e.g., during loop iteration as a register is
successively incremented and then used for loads or stores), it is conceivable
that microarchitectural techniques (such as speculation) might both reduce the
delay associated with those updates, and perhaps also elide them entirely,
updating the hash only during write back.
Tags appear to offer the following essential advantages over cryptographic
approaches:
\begin{itemize}
\item Tags offer deterministic rather than probabilistic protection,
and require neither secrecy of a cryptographic key, nor brute-forcing resistance given a
bounded number of hash bits.
Depending on the OS model, cryptographic keys might also be shared by more
than one address space -- e.g., if \ccode{fork()} is frequently used to
generate multiple processes, or if there is a shared memory segment that
includes linked pointers.
\item Tags do not rely on cryptographic hash generation during capability
updates, nor checking during dereference.
These could otherwise lead to a performance overhead (e.g., as a result of
load-to-use or check-to-use delays), or energy-use overheads (due to
frequent cryptographic hash operations).
\item Tags prevent reinjection of leaked pointer values, even though the
bitwise pattern of the addressable memory contents remain identical.
Potential vulnerabilities with hash-based protection include leaking a valid
pointer value to a local or remote attacker via socket communications.
The attacker could later reinject that value -- potentially into a different
process if they share keying material (e.g., if they are forked from the
same parent).
\item Tags ensure provenance validity of capabilities, such that the TCB can
deterministically ensure that a pointer value is no longer in memory.
As with the previous item, this protects against reinjection, but has the
stronger inductive property that the TCB can reliably perform revocation or
garbage collection.
This is also essential to compartmentalization strength.
\end{itemize}
However, a hash-based approach also has several appealing properties when
compared to tags:
\begin{itemize}
\item Cryptographic hashes do not require the implementation of tagged memory,
which could reduce memory-subsystem complexity and DRAM-traffic impact.
\item Cryptographic hashes do not impose alignment requirements on
capabilities, which may improve compatibility.
\item Cryptographically protected capabilities can be copied in memory,
swapped to disk, or migrated in virtual-machine images, without special
support for tags.
This could entirely avoid the need for special capability load and store
instructions, although retaining them might assist with microarchitectural
optimization of hash use.
\end{itemize}
If hash-based protection were viewed as a stepping stone to a full CHERI
implementation, substituting hashing for tags in an initial implementation,
there are several steps that could be taken to reduce the further disruption
associated with later tag adoption:
\begin{itemize}
\item Explicit capability load and store instructions would be maintained and
used in future capability-aware memory copying, etc.
\item Capability load and store instructions would require strong alignment
for values that would later be used for load and store, even though this is
not required with hashing.
\item Other non-tag-related capability properties, such as monotonicity, would
continue to be enforced via guarded manipulation.
\end{itemize}
However, substantially smaller benefit would arise prior to the introduction
of tags: capabilities would be able to provide capability-like spatial memory
protection, and probabilistic pointer integrity protection, but not the
non-probabilistic protection or enforcement of provenance validity required
for stronger policies such as preventing pointer reinjection, supporting
temporal memory safety through deterministic pointer identification in memory,
or enabling in-address-space compartmentalization that depends on those
properties.
\section{Capability Register File}
CHERI extends existing general-purpose integer registers to hold
capabilities. This design is similar to the manner in which the
32-bit x86 ISA was extended to support 64-bit registers. However,
this is not the only way to add CHERI capability registers to an architecture.
We initially used a separate register file for capability registers on CHERI-MIPS for a
few pragmatic reasons:
\begin{itemize}
\item Coprocessor interfaces on MIPS assume additional
register files (a la floating-point registers).
\item The initial 256-bit capability registers were quite large, and by giving the capability
coprocessor its own pipeline for manipulations, we could avoid enforcing a
256-bit-wide path through the main pipeline.
\item It is more obvious, given a coprocessor-based interface, how to provide
compatibility support in which the capability coprocessor is ``disabled,''
the default configuration in order to support unmodified MIPS compilers and
operating systems.
\end{itemize}
Early in our design cycle, capability registers were able to hold only true
capabilities (i.e., with tags); later, we weakened this requirement by adding
an explicit tag bit to each register, in order to improve support for
capability-oblivious code such as memory-copy routines able to copy data
structures consisting of both capabilities and ordinary data.
With the separate register file on CHERI-MIPS, we also added
instructions for copying non-capability data from a capability register
into a general-purpose integer register. A use case for this was when a function was called
with a parameter whose type is the union of a pointer and a non-pointer type,
such as an int. This parameter had to be passed in a capability register, because
the tag needed to be preserved when it held a capability. If the body of
the function accessed the non-capability branch of the union, it needed to
get the non-capability bits out of the capability register and into a
general-purpose register. This was originally done by spilling the capability
register to the stack and then reading it back into a general-purpose integer
register, but the register-to-register copy of \insnnoref{CGetAddr} proved
faster.
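As a concrete (and hypothetical) C sketch of the case described above -- the
union type and function are invented for illustration -- the integer branch of
such a union must be extracted from the capability register in which the
argument arrives:
\begin{verbatim}
union arg {
    char *ptr;  /* pointer branch: must keep the capability tag  */
    long  i;    /* integer branch: only the address bits matter  */
};

long get_int(union arg a)
{
    /* With a separate capability register file, reading a.i required
     * either spilling the register to the stack or an explicit
     * register-to-register copy such as CGetAddr; with merged
     * registers, the integer portion of the same register is read
     * directly. */
    return a.i;
}
\end{verbatim}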
Another design variation might have specific capability registers
more tightly coupled with general-purpose integer registers -- an approach we discussed
extensively, especially when comparing with the bounds-checking literature,
which has explored techniques based on {\em sidecar registers} or associative
look-aside buffers.
Many of these approaches did not adopt tags as a means of strong integrity
protection (which we require for the compartmentalization model), which
would make associative techniques less suitable.
Further, we felt that the working-set properties of the two register files
might be quite different; effectively pinning the two to one another would
reduce the efficiency of both.
With register tags and 128-bit compressed capabilities, extending
existing general-purpose registers to support capabilities became a
feasible approach, as register size doubled rather than quadrupled.
This approach resulted in improved efficiency in implementations as
well as greater software compatibility. For example, in the case
described above for a function parameter with a union, the integer
branch of the union can be accessed by using the integer portion of
the relevant general-purpose register without requiring a separate
instruction. As a result, all of the current CHERI architectures
extend existing general-purpose registers to hold capabilities.
\section{The Compiler is Not Part of the TCB for Isolated Code}
CHERI is designed to support the isolation of arbitrary untrustworthy code,
including code compiled with an incorrect or compromised compiler.
The security argument outlined in
Chapter~\ref{chap:assurance} starts with the premise that the attacker is able to
run arbitrary machine code. This approach has advantages for high-assurance systems:
compilers are often large and complex programs, and proving correctness of their
security mechanisms is easier if it does not depend on also proving the correctness
of the compiler. This approach also has the advantage that users are not restricted
by the security design to programming in just one programming language, and can use
any language for which a compiler has been written. In particular, it is a design
goal of CHERI that it be able to run legacy code written in C.
Some earlier capability machines, such as the Burroughs B5000, made the compiler
a privileged program. We have followed the alternative approach taken in capability machines
such as CAP, in which the compiler was not privileged.
\mrnote{We could expand on this, perhaps in the high-assurance section. We do depend
on the compiler being correct, in the sense that if the attacker has complete
control of the compiler, he can make the programs you've compiled with it do
whatever you want. The property we're looking for is more like: assuming the TCB
has been compiled with a correct compiler, we can allow untrusted users to compile
their code using whatever compiler they want, without fear that this will let them
break out of the sandbox. We probably do depend on the \emph{dynamic linker} being
correct -- this depends on how we load code into a sandbox.}
\section{Base and Length Versus Lower and Upper Bounds}
The CHERI architecture permits two different interpretations of capabilities:
as a virtual address paired with lower and upper bounds, and as a base,
length, and current offset.
These different interpretations support differing C-language models for
pointers.
The former, in which pointer-casts to integers return their virtual addresses, is more compatible with current software, but risks leaking those virtual
addresses (or their implications) out of tagged values where they cannot be
found for the purposes of pointer-transformation techniques such as copying
garbage collection.
The latter, in which pointer-casts to integers return their offsets, is less
compatible (as comparisons between pointers into different buffers may give
surprising equality results), but avoids leakage of virtual address out of
tagged values, enabling techniques such as copying garbage collection.
Over time, our thinking on these two approaches has shifted from aiming to
support copying garbage collection in C to one focused on revocation and
greater compatibility.
While some C source code is naturally extremely careful to avoid integer
interpretations of pointers, significant amounts of historic code, especially
systems code, cannot avoid this idiomatic use.
For example, run-time linkers and memory allocators both naturally consider
integer virtual addresses as part of their operation.
More subtly, techniques such as ordering locks for objects based on object
address, or sorting trees based on object address, make copying garbage
collection a difficult prospect.
\pgnnote{copying???}
Compressed capabilities further complicate this story, as a precise lower
bound may not be possible without padding; this is easy to arrange within
memory allocators for new allocations, but when subsetting an existing
allocation (e.g., to describe the bounds of an array embedded within another
structure), the 0 offset from the bottom of the embedded structure may not
carry over to being a 0 offset relative to the base address of a capability.
In recent versions of the CHERI C compiler (with the CHERI-LLVM
back-end), we have shifted to preferring a virtual-address
interpretation of pointers in all cases except those where specific
built-in functions are used to query the offset. We retain an
optional compiler mode utilizing an offset interpretation, which will
be suitable for future experimentation with copying garbage
collection.
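To make the distinction concrete, consider the following sketch (the function
and names are invented for illustration; the integer values shown follow the
interpretations described above, and the exact behavior depends on the
compiler mode selected):
\begin{verbatim}
#include <stdint.h>

char buf[16];

void interpretations(void)
{
    char *p = &buf[4];
    uintptr_t v = (uintptr_t)p;

    /* Virtual-address interpretation (the current default): v is the
     * virtual address of buf[4], so idioms such as ordering locks by
     * address, or hashing on pointer values, continue to work. */

    /* Offset interpretation (the optional mode): v would instead be 4,
     * the offset from the capability's base, so no virtual address
     * escapes the tagged value, at the cost of surprising integer
     * comparisons between pointers into different objects. */
    (void)v;
}
\end{verbatim}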
\section{Signed and Unsigned Offsets}
In the CHERI instructions that take both a register offset and an immediate
offset, the register offset is treated as unsigned integer, whereas the
immediate offset is treated as a signed integer.
Register offsets are treated as unsigned, so that given a capability to
the entire address space (except for the very last byte, as
explained above), a register offset can be used to access any byte within it.
Signed register offsets would have the disadvantage that negative offsets
would fail the capability bounds check, and memory at offsets within the
capability greater than $2^{63}$ would not be accessible.
Immediate offsets, on the other hand, are signed, because the C compiler
often refers to items on the stack using the stack pointer as register
offset plus a negative immediate offset.
We have already encountered observable difficulty due to a reduced number of
bits available for immediate offsets in capability-relative memory operations
when dealing with larger stack-frame sizes; it is unclear what real
performance cost this might have (if any), but it does reemphasize the
importance of careful investment of how instruction bits are encoded.
\section{Address Computation Can Wrap Around}
If the target address of a load or store (base $+$ offset $+$ register offset
$+$ scaled immediate offset) is greater than \emph{max\_addr} or less than
zero, it wraps around modulo $2^{64}$. The load or store succeeds if this
modulo arithmetic address is within the bounds of the capability (and other
checks, such as for permissions, also succeed).
An alternative choice would have been for an overflow in the address computation
to cause the load or store to fail with a length-violation exception.
The approach of allowing the address to wrap around does not allow malicious
code to break out of a sandbox, because a bounds check is still performed on
the wrapped-around address.
However, there is a potential problem if a program uses an array offset that
comes from a potentially malicious source. For example, suppose that code for
parsing packet headers uses an offset within the packet to determine the
position of the next header. The threat is that an attacker can put in a
very large value for the offset, which will cause wrap-around, and result
in the program accessing memory that it is permitted to access, but was not
intended to be accessed at this point in the packet processing. This attack
is similar to the confused deputy attack. It can be defended against by
appropriate use of \insnref{CSetBounds}, or by using some explicit
range checks in application code in addition to the bounds checks that are
performed by the capability hardware.
\nwfnote{Maybe "Using \insnref{CSetBounds} to derive a capability
to just the array, and using this capability for offsetting, supplants any
explicit range checks in application code." This might also be a good place
to say something about Meltdown and Spectre (variant 1)? "By informing the
architecture of the intended bounds of access, even speculative use of a
capability can be precisely confined."}
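As a hedged sketch of the first defense (the function and its parameter names
are invented for illustration, and the bounds-setting builtin is assumed to be
the one provided by CHERI Clang, compiling to a \insnref{CSetBounds} on the
packet capability):
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

uint8_t header_type_at(uint8_t *pkt, size_t pkt_len, size_t next_off)
{
    /* Narrow the packet capability to exactly the packet's bounds
     * (assumes the CHERI Clang builtin, which maps to CSetBounds). */
    uint8_t *bounded = __builtin_cheri_bounds_set(pkt, pkt_len);

    /* The load below is checked against the narrowed bounds: a very
     * large or wrapped-around next_off faults, rather than reaching
     * other memory the parser is otherwise permitted to access. */
    return bounded[next_off];
}
\end{verbatim}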
The advantage of the approach that we have taken is that it fits more naturally
with C language semantics, and with optimizations that can occur inside compilers.
The following are equivalent in C:
\begin{itemize}
\item
\ccode{a[x + y]}
\item
\ccode{*(a + x + y)}
\item
\ccode{(a + x)[y]}
\item
\ccode{(a + y)[x]}
\end{itemize}
They would not be equivalent if they had different behavior on overflow, and
the C compiler would not be able to perform optimizations that relied on
this kind of reordering.
\section{Overwriting Capabilities in Memory}
In CHERI, if a valid in-memory capability is partly overwritten via an
untagged data store, then the tag associated with the in-memory capability
is cleared, making it an invalid capability that cannot be dereferenced.
Alternative designs would have been for the capability to be zeroed first
before being overwritten; or for the write to raise an exception (with
an explicit ``clear tag in memory'' operation for the case when a
program really intends to overwrite a capability with non-capability data).
The chosen approach is simpler to
implement in hardware. If store instructions needed to check the tag bit
of the memory location that was being written, then they would need a
read-modify-write cycle to memory, rather than just a write.
(However, once the memory system needs
to deal with cache coherence, a write is not that much simpler than a
read-modify-write.)
The CHERI behavior also has the advantage that programs can write to a
memory location (e.g., when spilling a register onto the stack) without
needing to worry about whether that location previously contained a
capability or non-capability data.
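A short sketch of this behavior, assuming the tag-query builtin provided by
CHERI Clang (the function is invented for illustration):
\begin{verbatim}
#include <string.h>

int overwrite_clears_tag(void)
{
    int x = 42;
    int *p = &x;                    /* a valid, tagged capability      */

    memset((char *)&p + 1, 0, 1);   /* overwrite one byte of it with
                                       ordinary (untagged) data        */

    /* The in-memory capability has had its tag cleared: p can still be
     * read and copied as data, but any dereference of it would trap.
     * (Assumes the CHERI Clang tag-query builtin.) */
    return __builtin_cheri_tag_get(p);   /* now returns 0 */
}
\end{verbatim}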
A potential disadvantage is that the contents of capabilities cannot be
kept secret from a program that uses them. A program can always discover
the contents of a capability by overwriting part of it, then reading the
result as non-capability data. In CHERI, there are
intentionally
other, more direct, ways
for a program to discover the contents of a capability it owns, and this
does not present a security vulnerability.
However, there are ABI concerns: we have tried to design the ISA in such a
way that software does not need to be aware of the in-memory layout of
capabilities. As it is necessarily exposed, there is a risk that software
might become dependent on a specific layout.
One noteworthy case is in the operating-system paging code, which must
save and restore capabilities and their tags separately.
This can be
accomplished by using instructions such as \insnref{CGetBase} on untagged
values loaded from disk and then refining an in-hand capability using
\insnref{CSetBounds}; however,
this requires a complex series of instructions.
\insnref{CBuildCap} can add a
tag to an untagged value in a capability-register operand authorized by a
second operand holding a suitably authorized capability.
This avoids software
awareness of the in-memory layout and accelerates tag restoration
when implementing system services such as swap.
This instruction in effect implements rederivation, which is also possible
using a sequence of individual instructions refining the authorizing
capability's bounds, permissions, object type, and so on.
\insnref{CBuildCap} is not intended to change the set of reachable
capabilities.
\section{Reading Capabilities as Bytes}
In CHERI, if a non-capability data load instruction such as \insnnoref{LD} is used
on a memory location containing a capability, the internal representation
of the capability is read. An alternative architecture would have
such loads return zero, or raise an exception.
As noted above,
because the contents of capabilities are not secret, allowing them to be
read as raw data is not a security vulnerability.
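For example (a minimal sketch, assuming 128-bit capabilities so that a
pointer's in-memory representation occupies two 64-bit words):
\begin{verbatim}
#include <stdint.h>

void inspect(int *p, uint64_t out[2])
{
    /* View the in-memory representation of p as ordinary data. */
    const uint64_t *raw = (const uint64_t *)&p;

    out[0] = raw[0];   /* non-capability loads return the raw 128-bit  */
    out[1] = raw[1];   /* representation; the tag itself is not copied */
}
\end{verbatim}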
\section{OTypes Are Not Secret}
Another consequence of the decision not to make the contents of capabilities secret
is that the \cotype{} field is not secret. It is possible to determine the
\cotype{} of a capability by reading it with \insnref{CGetType}, or by
reading the capability as bytes. If a program has two pairs of code and data
capabilities, ($c_1$, $d_1$) and ($c_2$, $d_2$) it can check if $c_1$ and $c_2$
have the same \cotype{} by invoking \insnref{CInvoke} on ($c_1$, $d_2$).
\jrtcnote{This is a weird thing to say; yes you implicitly check by not
trapping, but, uh, don't use it that way?}
As a result, a program can tell whether it has been passed an object of
\cotype{} O or an interposing object of \cotype{} I that forwards the
\insnref{CInvoke} on to an object of \cotype{} O (e.g. after having performed
some additional access control checks or auditing first).
\section{Capability Registers are Dynamically Tagged}
In CHERI, capability registers and memory locations have a tag bit
that indicates whether they hold a capability or non-capability data.
(An alternative architecture would give memory locations a tag bit,
where capability registers could contain only capabilities -- with
an exception raised if an attempt were made to load non-capability data into a
capability register with \insnref{CLC}.)
Giving capability registers and memory locations a tag bit
simplifies the implementation of \ccode{memcpy()}.
In CHERI, \ccode{memcpy()} copies
the tag bit as well as the data so that it can be used to copy structures
containing capabilities. As capability registers are dynamically tagged,
\ccode{memcpy()} can copy a structure by loading
its constituent words into capability
registers and storing them to memory, without needing to know at compile time
whether it is copying a capability or non-capability data.
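A minimal sketch of such a capability-oblivious copy (not the C library's
actual implementation; it assumes capability-aligned arguments whose length is
a multiple of the capability size):
\begin{verbatim}
#include <stddef.h>

void *cap_oblivious_copy(void *dst, const void *src, size_t len)
{
    void **d = dst;
    void *const *s = src;

    /* Each element is loaded and stored at capability width via a
     * dynamically tagged register, so the tag travels with the data
     * whether the element is a capability or ordinary data. */
    for (size_t i = 0; i < len / sizeof(void *); i++)
        d[i] = s[i];
    return dst;
}
\end{verbatim}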
Tag bits on capability registers may also be useful for dynamically typed
languages in which a parameter to a function can be (at run time) either a
capability or an integer. \ccode{memcpy()} can be regarded as
a function whose parameter (technically a \ccode{void *}) is
dynamically typed.
\section{Separate Permissions for Storing Capabilities and Data}
CHERI has separate permission bits for storing a capability versus storing
non-capability data (and similarly, for loading a capability versus loading
non-capability data).
(An alternative design would be just one \cappermL{} and just one
\cappermS{} permission that were used for both capabilities and non-capability data.)
The advantage of separate permission bits for capabilities is that
there can be two protected subsystems that communicate via a memory
buffer to which they have \cappermL{} and \cappermS{} permissions, but
do not have \cappermLC{} or \cappermSC{}. Such
communicating subsystems cannot pass capabilities via the shared buffer, even
if they collude. (We realized that this was potentially a requirement when
trying to formally model the security guarantees provided by CHERI.)
\section{Capabilities Contain a Cursor}
In the C language, pointers can be both incremented and decremented.
C pointers are sometimes used as a cursor that points to the current working
element of an array, and is moved up and down as the computation progresses.
CHERI capabilities include an offset field, which gives the difference between
the base of the capability and the memory address that is currently of
interest. The offset can be both incremented and decremented without changing
\cbase{}, so that it can be used to implement C pointers.
In the ANSI C standard, the behavior is undefined if a pointer is incremented
more than {\it one} beyond the end of the object to which it points. However, we have found
that many existing C programs rely on being able to increment a pointer beyond
the end of an array, decrement it back within range, and then dereference it.
In particular, network packet processing software often does this.
In order to support programs that do this, CHERI offsets are allowed to take
on any value.%
%
\footnote{CHERI Concentrate (\cref{subsec:cheri-concentrate}) exploits the
observation that, in practice, pointers do not wander ``far'' from their base
to reduce the number of bits used to store the base, cursor, and limit
addresses. Attempts to move the cursor far out of bounds will, instead, yield
an un-tagged result.}
%
A range check is performed when the capability is
dereferenced, so buffer overflows are prevented; thus, the offset can take
on intermediate out-of-range values as long as it is not dereferenced.
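A hedged sketch of this idiom (the function and names are invented for
illustration):
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

const uint8_t *skip_option(const uint8_t *cursor, const uint8_t *end,
                           size_t opt_len)
{
    /* The cursor may move (representably) out of bounds here... */
    const uint8_t *next = cursor + opt_len;

    if (next >= end)    /* ...but is only compared while out of range, */
        return NULL;    /* never dereferenced...                       */

    return next;        /* ...and is back within bounds before any
                           later dereference by the caller.            */
}
\end{verbatim}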
An alternative architecture would have not included an offset within the
capability. This could have been supported by two different capability types
in C, one that could not be decremented (but was represented by just a
capability) and one that supported decrementing (but was represented by a pair of
a capability and a separate integer for the offset). Programming languages
that did not have pointer arithmetic could have their pointers compiled as
just a capability.
The disadvantage of including offsets within capabilities is that it wastes
64 bits in each capability in cases where offsets are not needed (e.g.,
when compiling languages that don't have pointer arithmetic, or when
compiling C pointers that are statically known to never be decremented).
The alternative (no offset) architecture could have used those 64 bits
of the capability for other purposes, and stored an extra offset outside
the capability when it was known to be needed. The disadvantage of the
no-offset architecture is that C pointers must either give up support for
decrementing or grow in size: because capabilities need to be aligned, a pair of a
capability and an integer will usually end up
being padded to the size of two capabilities, doubling the size of a C pointer,
which is a serious performance consideration.
Another disadvantage of the no-offset alternative is that it makes the
seal/unseal mechanism considerably more complicated and hard to explain.
A program that has a capability for a range of types has to somehow select
which type within its permitted range of types it wishes to use when sealing a
particular data capability. The CHERI architecture uses the offset for this
purpose; not having an offset field leads to more complex encodings when
creating sealed capabilities.
By comparison, the CCured language includes both \ccode{FSEQ} and
\ccode{SEQ} pointers. CHERI capabilities are analogous to CCured's
\ccode{SEQ} pointers. The alternative (no offset) architecture
would have capabilities that acted like CCured's \ccode{FSEQ}, and used an extra
offset when implementing \ccode{SEQ} semantics.
\jhbnote{This section seems relevant to the initial 256-bit
capabilities and no-longer relevant for compressed capabilities.
Perhaps it just needs to be explained as such rather than outright removed.}
\section{NULL Does Not Have the Tag Bit Set}
In some programming languages, pointer variables must always point to
a valid object. In C, pointers can either point to an object or be NULL;
by convention, NULL is the integer value zero cast to a pointer type.
If hardware capabilities are used to implement a language that has NULL
pointers, how is the NULL pointer represented? CHERI capabilities have
a \ctag{} bit; if the \ctag{} bit is set, a valid capability follows, otherwise
the remaining data can be interpreted as (for example) bytes or integers.
The representation we have chosen for NULL is that the \ctag{} bit is not set
and the \cbase{} and \clength{} fields are zero; effectively, NULL is the
integer zero stored as a non-capability value in a capability register.
An alternative representation we could have chosen for NULL would
have been one with the \ctag{} bit set, and zero in the \cbase{} and
\clength{} fields. Effectively, NULL would have been a capability for
an array of length zero.
The advantage of NULL's \ctag{} bit being unset is:
\begin{itemize}
\item
Initializing a region of memory by writing zero bytes to it will initialize
all capability variables within the region to the NULL capability. Initializing
memory by writing zeros is, for example, done by the C \ccode{calloc()}
function, and by some operating systems.
\end{itemize}
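For example (a minimal sketch):
\begin{verbatim}
#include <stdlib.h>

struct node {
    struct node *next;
    int          value;
};

struct node *make_node(void)
{
    /* calloc() zero-fills the allocation; because NULL is all-zero
     * bytes with the tag unset, n->next is already a valid NULL
     * pointer with no capability-aware initialization needed. */
    struct node *n = calloc(1, sizeof(*n));
    return n;
}
\end{verbatim}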
\section{The length of NULL is MAXINT}
Given that we have chosen NULL to have its tag bit unset, it isn't semantically
meaningful to talk about its length, as NULL is not a reference to a region
of memory. But programs can still attempt to query the length of NULL, and
the question arises as to which value is returned.
We have chosen the length of NULL to be $2^{64}-1$, as this simplifies the
implementation of compressed capabilities. To support the semantics of the
C language, the capability compression scheme must be able to represent
all $2^{64}$ possible values of \coffset{} when \ctag{} is set and \clength{}
is MAXINT. If we make the length of NULL be MAXINT, the compressed capability
format can use the same encoding regardless of whether \ctag{} is set or
not: NULL becomes a value whose \coffset{} is currently zero, but that can
be changed (with \insnref{CIncOffset}) to any integer value without
becoming unrepresentable.
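As an illustration (assuming the length- and offset-query builtins provided by
CHERI Clang; the exact spelling of these accessors is toolchain-specific):
\begin{verbatim}
#include <stdint.h>

void null_queries(void)
{
    void *p = (void *)0;                           /* NULL */

    /* Assumes the CHERI Clang query builtins. */
    uint64_t len = __builtin_cheri_length_get(p);  /* per the definition
                                                      above: MAXINT     */
    uint64_t off = __builtin_cheri_offset_get(p);  /* 0 */

    /* The offset can be moved to any value -- e.g., by casting an
     * integer to a pointer -- without the value becoming
     * unrepresentable, because the bounds cover the whole space. */
    (void)len; (void)off;
}
\end{verbatim}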
Alternative design choices included:
\begin{itemize}
\item
Use a capability compression algorithm that also has the property that all
values of \coffset{} are representable when \clength{} is zero, and make
the length of NULL be zero. Versions of the CHERI ISA prior to V7 allowed the
length of NULL to be implementation-defined, and used a compression algorithm
that had this property, so the length of NULL could be zero. To enable the
use of compression algorithms that don't have this property, the V7 ISA
defines the length of NULL to be MAXINT.
\item
Use a different compression algorithm depending on whether \ctag{} is set
or not. This might make the hardware more complex, but there is no reason in
principle why valid capabilities (\ctag{} set) and integers packed into
capability registers (\ctag{} unset) should have to use the same compression
algorithm.
\end{itemize}
\section{Permission Bits Determine the Type of a Capability}
In CHERI, a capability's permission bits together with the \cotype{} field
determine what kind of capability it is. A capability for a region of memory
is unsealed (a \cotype{} of $2^{64}-1$) and has \emph{\cappermL{}} and/or \emph{\cappermS{}} set;
a capability for an object is sealed and has \emph{\cappermX{}}
unset; a capability to call a protected subsystem (a ``call gate'') is
sealed and has \emph{\cappermX{}} set; a capability that allows
the owner to create objects whose type identifier (\cotype{}) falls within
a range is unsealed and has \emph{\cappermSeal{}} set.
An alternative architecture would have included a separate
\emph{capability type} field, as well as the \cperms{} field, within each
capability; the meaning of the rest of the bits in the capability would have
been dependent on the value of the \emph{capability type} field.
A potential disadvantage of not having a \emph{capability type} field is that
different kinds of capability cannot use the remaining bits of the capability
in different ways.
A consequence of the architecture we have chosen is that it is possible for
software receiving the primordial, omnipotent capability to create capabilities
with arbitrary permissions. Some of these sets of permissions do not have a
clear use case; they just exist as a consequence of the representation chosen
for capabilities' permissions. (Other choices are possible; see
\cref{app:exp:compressperm} for a less-orthogonal representation.)
\mrnote{TO DO: Explain that capabilities with the Permit\_Seal capability
are really a different type of capability from memory capabilities, and
could in principle have used a different encoding to save bits. We don't
have a use case for a capability with both Permit\_Seal and read/write
permissions. If they were different types, you would need some mechanism to
obtain the initial sealing capability.}
\section{Object Types Are Not Addresses}
In CHERI, we make a distinction between the unique identifier for an
object type (the \cotype{} field) and the address of the executable code
that implements a method on the type (the \cbase{} $+$ \coffset{} fields
in a sealed executable capability).
An alternative architecture would have been to use the same fields for
both, and take the entry address of an object's methods as a convenient
unique identifier for the type itself.
The architecture we have chosen is conceptually simpler and easier to
explain. It has the disadvantage that the type field is constrained to
a limited number of bits, as there is insufficient space inside the
capability for more.
The alternative of treating the set of object type identifiers as being the
same as the set of memory addresses enables the saving of some bits within
a capability by using the same field for both.
It also simplifies
assigning type identifiers to protected subsystems: each subsystem can
use its start address as the unique identifier for the type it implements.
Subsystems that need to implement multiple types, or create new types
dynamically, can be given a capability with the permission
\emph{Permit\_Set\_Type} set for a
range of memory addresses, and they are then able to use types within that
range. (The current CHERI ISA does not include the
\emph{Permit\_Set\_Type} permission;
it would be needed only for this alternative approach). This avoids the need
for some sort of privileged type manager that
creates new type identifiers; such a type manager is potentially a source
of covert channels. (Suppose that the type manager allocated
type identifiers in numerically ascending order. A subsystem that asks the
type manager twice for a new type id and gets back $n$ and $n+1$ knows that no
other subsystem has asked for a new type id in between the two calls; this
could in principle be used for covert communication between two subsystems
that were supposed to be kept isolated by the capability mechanism.)
\section{Unseal is an Explicit Operation}
In CHERI, it requires an explicit operation to
convert an undereferenceable pointer to an object into a pointer that
allows the object's contents to be inspected or modified directly.
This can be done directly with the \insnref{CUnseal} operation,
or by using \insnref{CInvoke} to run the result of unsealing the first
argument on the result of unsealing the second argument.
An alternative architecture would have been one with ``implicit'' unsealing,
where a sealed capability could be dereferenced without
explicitly unsealing it first, provided that the subsystem attempting the
dereference had some kind of ambient authority that permitted it to dereference
sealed capabilities of that type. This ambient authority could have taken
the form of a protection ring or the \cotype{} field of \PCC{}.
A disadvantage of an implicit unseal approach such as the one outlined above
is that it is potentially vulnerable to the ``confused deputy''
problem~\cite{Hardy1988}: the attacker calls a protected subsystem, passing
a sealed capability in a parameter that the called subsystem expects to be
unsealed. If unsealing is implicit, the protected subsystem can be tricked
by the attacker into using its privileges to read or write to memory to
which the attacker does not have access.
The disadvantage of the architecture we have chosen is that protected subsystems
need to be careful not to leak capabilities that they have unsealed, for example
by leaving them on the stack when they return to their caller. In an
architecture with ``implicit unseal'', protected subsystems would just need
to delete their ambient authority for the type before returning, and would
not need to explicitly clean up all the unsealed capabilities that they
had created.
\section{CMove is not Implemented as CIncOffset}
\insnref{CMove} is an independent instruction to move a capability value
from one register to another.
In conventional instruction-set design, integer \insnnoref{Move} is
frequently an assembler pseudo-operation that expands to an arithmetic
operation that does not modify the value (e.g., an add instruction with the
zero register as one operand).
In an earlier CHERI design, we similarly implemented \insnref{CMove} as an
assembler pseudo-operation that expanded to \insnref{CIncOffset} with an
offset of zero.
This required that the \insnref{CIncOffset} instruction treat a zero
offset as a special case, allowing it to be used to move sealed capabilities
and values with the tag bit unset.
Using a separate opcode for \insnref{CMove} has the disadvantage of
consuming another opcode, but avoids this special case in the definition of
\insnref{CIncOffset} in which an exception will not be thrown if a zero
operand is used.
We have therefore changed to specifying an explicit \insnref{CMove}
instruction, and removed special casing in \insnref{CIncOffset}.
\section{Instruction-Set Randomization}
CHERI does not include features for instruction set
randomization~\cite{Keromytis2003};
the unforgeability of capabilities in CHERI can be used as an alternative
method of providing control flow integrity.
However, instruction set randomization would be easy to add, as long as
there are enough spare bits available inside a capability (the 128-bit
representation of capabilities does not have many spare bits). Code
capabilities could contain a key to be used for instruction set
randomization, and capability branches such as \insnref{CJR} could
change the current ISR key to the value given in the capability that is
branched to.
\section{System Privilege Permission}
In the current version of the CHERI ISA, one of the capability permission bits
authorizes access to privileged processor features that would allow
bypass of the capability model, if present on \PCC{}.
This is intended to be used by hybrid operating-system kernels to manage
virtual address spaces, exception handling, interrupts, and other necessary
architectural features that do not map cleanly into memory-oriented
capabilities.
It can also be used by stand-alone CHERI-based microkernels to control use
of the exception-handling and cache-management mechanisms, and of the MMU on
MMU-enabled hardware.
Although the permission limits use of features to control the virtual address
space (e.g., MMU special register manipulation), it does not prevent access to kernel-only
portions of the virtual address space.
This allows kernel code to operate without privileged permission using the
capability mechanism to limit which portions of kernel address space are
available for use in constrained compartments.
We employ a single permission bit to conserve space,
but also because it offers a coherent view on architectural
privilege: many of the privileged architectural instructions allow bypass of
in-address-space memory protection in different ways, and using subsets of
those operations safely would be quite difficult.
In earlier versions of the CHERI ISA, we employed multiple privileged bits,
but did not find the differentiation useful in practical software design.
In more feature-rich privileged instruction sets (e.g., those with
virtualization features), a more fine-grained decomposition might be of
greater utility, and could motivate a new capability format intended to
authorize use of privilege.
In earlier versions, the privileged permission(s) controlled use of only CHERI-specific
privileges (i.e., exception-handling capabilities); in the current version, the
bit controls all privileges available only in kernel mode, including
MMU registers and exception return instructions.
This allows compartmentalization within the kernel address space (e.g., to
sandbox untrustworthy components), as well as more general mitigation by
limiting use of privileged features to only selected code components, jumped
to via code pointers carrying the privileged permission.
If virtual-memory and exception-handling features were not controlled by this
permission bit, use of those ISA features would allow bypass of in-kernel
compartmentalization.
Regardless of this bit, extreme care is required to safely compartmentalize
within an operating-system kernel.
In our design, absence of the privileged permission denies use of privileged
ISA features, but presence does not grant that right unless it is also
authorized by kernel mode.
Other compositions of the capability permission bit and existing
ring-based authorization are imaginable.
For example, the permission bit could grant privileged ISA use in userspace
regardless of ring.
While this composition might allow potentially interesting delegation of
privilege to user components, the lack of granularity of control appears to
offer little benefit when a similar effective delegation can be implemented
via the exception model and implied ring transition.
In a ring-free design (e.g., one without an MMU or kernel/supervisor/user
modes), however, the privileged permission would be the sole means of
authorizing privilege.
Another design choice is that we have not added new
capability-based privilege instructions; instead, we chose to limit use of
existing instructions (such as those used in MMU management).
This fails to extend the principle of intentional use to these privileged
features; in return we achieve reduced disruption to current software
stacks, and avoid introducing new instructions in the opcode space.
Despite that slight apparent shortcoming, we observe that fine-grained
privilege can still be accomplished through the use of a permission bit on
\PCC{}: even within a highly privileged kernel, most functions might operate
without the ability to employ privileged instructions, with an explicit use of
\insnref{CJALR} to jump to a code pointer with the \cappermASR{} permission
enabled -- which executes only the necessary instructions and reduces the
window of opportunity for privilege misuse.
An alternative design would extend the privileged instruction set to
include versions that accept explicit capability operands authorizing use of
those instructions, in a manner similar to our extensions to our
capability-extended load and store instructions.
Another variation on this scheme would authorize setting of a privilege status
register, enabling specific instructions (or classes of instructions) based on
an offered capability, combining these two approaches to authorize selected
(but unmodified) privileged instructions.
Finally, it is conceivable that capabilities could be used to authorize
delegation of the right to use privileged instructions to userspace code,
rather than simply restricting the right to use privileged instructions in
kernel code.
We have opted to limit our approach to using capabilities to restrict features,
with a simple and deterministic composition of features.
\section{CInvoke: Jump-Based Domain Transition}
\label{sec:jump-based-domain-transition}
Earlier versions of the CHERI-MIPS ISA included an exception-based
mechanism for domain transition via a pair of \insnnoref{CCall}
and \insnnoref{CReturn} instructions. The use of exceptions
introduced both runtime overhead and implementation complexity in the
kernel. We replaced this mechanism with \insnref{CInvoke},
which provides jump-like semantics.
Non-monotonicity is accomplished by virtue of unsealing the sealed
operand capabilities to \insnref{CInvoke}.
%In both cases, destination code is controlled by the trusted computing base.
% What does this mean? CCalls1 jumps into an arbitrary sealed domain.
% Will it always be controlled by the trusted computing base?
% Maybe we're saying that it often is? Anyway, commented out for now.
It is possible to imagine more comprehensive jump-based instructions
including:
\begin{itemize}
\item A variation that has link-register semantics, saving the caller \PCC{}
in a manner similar to \insnref{CJALR}.
We choose not to implement this to avoid writing two general-purpose registers
in one instruction, and because the
caller can itself perform a move to a link destination based on
\insnref{AUIPCC}.
\item A variation that seals caller \PCC{} and \IDC{} to construct a
return-capability pair.
We choose not to implement this to avoid multiple register writes in one instruction,
and because the
caller can itself perform any necessary sealing of its own return state, if
required.
Further, to provide strict call-return semantics, additional more complex
behavior is required, which is not well captured by a single RISC
instruction.
\end{itemize}
In general, we anticipate that \insnref{CInvoke} will be used
to invoke trusted software routines. For situations involving
mutual distrust, \insnref{CInvoke} can be used to invoke
a trusted supervisor responsible for mediating messages and
requests between distrusting parties. The supervisor would be
responsible for clearing non-argument capability and general-purpose
integer registers and performing any additional checks.
The \insnref{CInvoke} trusted
routine can jump out of trusted code without
any special handling in the ISA, as it will conform to monotonic
semantics -- i.e., the clearing of registers that should not be passed to the
callee, followed by a \insnref{CJR} to transfer control to the callee.
\section{Compressed Capabilities}
\label{sec:rational:comressed}
In prior CHERI ISA versions, we specified a 256-bit capability
representation able to fully represent byte-granularity protection.
This allowed arbitrary subsets of the address space to be described, as well as
providing substantial
space for object types, software-defined permissions, and so on.
However, such large capabilities come at a significant performance overhead:
the size of 64-bit pointers is quadrupled, increasing cache footprint and
utilization of memory bandwidth.
Fat-pointer compression techniques exploit information redundancy between the
base, pointer, and bounds to reduce the in-memory footprint of fat pointers,
trading reduced bounds precision for substantial space savings.
We now specify only compressed capabilities, whether 64-bit capabilities for
32-bit architectural addresses, or 128-bit capabilities for 64-bit
architectural addresses.
Prior versions of our compression approaches, the CHERI-128 candidates, are
described in \cref{app:cheri-128}.
\subsection{Semantic Goals for Compressed Capabilities}
Our target for compressed capabilities was 128 bits: the next natural
power-of-two pointer size above 64-bit pointers, with an expected one-third of
the overhead of the full 256-bit scheme.
A key design goal was to allow both 128-bit and 256-bit capabilities to be
used with the same instruction set, permitting us to maintain and evaluate
both approaches side-by-side.
To this end, and in keeping with previously published schemes, the CHERI ISA
continues to access fields such as permissions, pointer, base, and bounds via
64-bit general-purpose integer registers.
The only visible semantic changes between 256-bit and 128-bit operation should
be these:
the in-memory footprint when a capability register is loaded or stored,
the density of tags (doubled when the size of a capability is halved),
potential imprecision effects when adjusting bounds, potential loss of tag if
a pointer goes (substantially) out of bounds, a reduced number of permission
bits, a reduced object type space, and (should software inspect it) a change
in the in-memory format.
The scheme described in our specification is the result of substantial
iteration through designs attempting to find a set of semantics that support
both off-the-shelf C-language use, as well as providing strong protection.
Existing pointer-compression schemes generally provided suitable monotonicity
(pointer manipulation cannot lead to an expansion of bounds) and a completely
accurate underlying pointer, allowing base and bounds to experience