-
Notifications
You must be signed in to change notification settings - Fork 234
Abstractions for Reasoning about Memory in F* Low*
Low* is a DSL in F* that allows to build verified C-like programs. It provides a low-level memory model with support for pointers, arrays, stack and heap allocation, deallocation, and arbitrary aliasing patterns. It has a straightforward metatheory built by relating its memory model to that of CompCert Clight.
Low* has been used successfully to verify a variety of non-trivial programs. The main verification task in Low* involves reasoning about memory. For this, Low* provides libraries that structure specifications in the style of Kassios' dynamic frames (aka modifies clauses). This style of specification has a number of benefits: it is general, can handle arbitrary pattern of aliasing, and it is embedded within first-order logic so encodable to an SMT solver.
Nevertheless, reasoning about memory in Low* remains challenging, incurring a significant specification and proof overhead in stating and maintaining anti-aliasing invariants. We find that it scales poorly in two respects:
Scaling problem 1: reasoning about the shape and contents of memory is intermingled with reasoning about other concerns (e.g., arithmetic), and the interactions can cause proofs to slow down significantly.
Scaling problem 2: when programming with a large number of mutable locations, specifications, and proofs, it often becomes quadratic to reason about locations pairwise.
A widespread approach to reasoning about memory (and an alternative to dynamic frames) is separation logic (SL). SL simplifies reasoning about memory, particularly in the common case where accessing mutually disjoint memory locations. In such a setting, SL specifications and proofs can closely resemble reasoning about functional programs that manipulate an n-tuple of values, largely ignoring the fact that these values are actually maintained within mutable memory cells.
Structuring proofs using separation logic stands to significantly simplify the verification of many (most?) Low* programs, since most programs use relatively simple aliasing patterns. However, building verification conditions in separation logic in a manner suitable for encoding to an SMT solver remains an active area of research.
Prior attempts at encoding separation logic in F*/Low* involved the following elements:
-
A library that defined the separating conjunction ( * ) of SL as an assertion on top of F*'s existing memory models.
-
Specifying programs using * instead of the underlying, lower-level memory model concepts
-
Building verification conditions as usual, but pre-processing the VC with tactics to solve for quantifiers using AC-rewriting of the *-connective, before feeding this VC to the SMT solver.
While this approach is general and may still be promising, we have yet to make it work at scale. Part of the difficulty is perhaps due to the need to run tactics efficiently over monolithic large VCs that mix a variety of concerns, memory being only one part. In other words, our prior attempts have tackled "Scaling problem 2", but "Scaling problem 1" remains a significant hurdle. But even then, our prior attempts have in most part relied solely on tactics to write and verify such programs, not providing the programmer a convenient manual fallback option, e.g., to make an explicit call to the framing combinator.
Meanwhile, we have been exploring another approach that aims to provide similar SL-like capabilities to reasoning about memory, but through a very different (actually, increasingly more related) mechanism, that aims to also address "Scaling problem 1", i.e., we are aiming to factor reasoning about memory footprints out of the VC completely.
The main idea: RST a res pre post
We provide an indexed effect as a structured way to specify the footprint and an invariant of a computation, together with a high-level view of the relevant portion of the memory.
This contrasts with Low*'s ST a pre post
effect, where read footprints
are specified using liveness assertions in pre, and write footprints
are specified using modifies
assertions in post
...
The index res : resource
on the RST effect describes:
-
the set
fp
of memory locations that a computation may read or write -
an invariant
inv
on memory that depends on at mostfp
locations -
a type
t
of high-level views of memories satisfyinginv
-
a function
sel
computing a view for a given memory satisfyinginv
, which (similarly toinv
) depends on at mostfp
locations
We can then straightforwardly define (i) an empty resource (with fp
empty, inv
taken to be True
, and view type unit
), and (ii) a
separating conjunction of resources res1 <*> res2
(with fp
the
union of footprints, inv
taken to be the pointwise conjunction of
the individual invariants together with an additional footprint
disjointness condition, and sel
computing the pair of the two views).
For example, consider two styles of specification for a computation that swaps the contents of two pointers.
- Classic Low*
val swap (x y: pointer a)
: Stack unit
(requires fun h ->
live h x /\
live h y /\
disjoint x y)
(ensures fun h0 _ h1 ->
live h1 x /\
live h1 y /\
h1.[x] == h0.[y] /\
h1.[y] == h0.[x])
- RST
val swap (x y: pointer a)
: RST unit (ptr_resource x <*> ptr_resource y)
(requires fun _ -> True)
(ensures fun h0 _ h1 ->
sel h0 (ptr_view x) == sel h1 (ptr_view y) /\
sel h0 (ptr_view y) == sel h1 (ptr_view x))
Some main benefits of this style:
-
It factors out specification and reasoning about memory footprints from the rest of the spec. (addressing problem 1, i.e., memory reasoning is no longer intermingled with the rest)
-
It internalises reasoning about disjointness/liveness, and maybe even some footprint-local invariants into the resource, freeing the rest of the program from confronting them. (addressing problem 2, providing a form of separation logic)
-
It provides a user-defined (more) high-level view of the memory on top of the underlying monolithic memory model to write specifications.
The RST a res pre post
effect differs from the previous iteration of
these ideas (the LensST a l pre post
effect) as follows:
-
The resource
res
does not contain an abstract set of handles (to name the memory locations infp
) any more. The motivation for this change is two-fold: on the one hand, this information is already naturally encapsulated in the definition ofsel
. But more importantly, asRST
is just a shallow embedding, it does not seem feasible to automatically generate and manage such handles, e.g., compared to what one can do about them in a deeply embedded setting, such as Vale. -
The resource contains only a (high-level) view of the memory instead of a full-blown lens (with both view and update components). This change was motivated by the style of specs we are already writing, i.e., the
sel-modifies
style, which naturally does not include uses of memory updates. In fact, with this newRST
effect, we will no more make use ofmodifies
clauses in user-written specifications, instead making use of equality of views of frames for that purpose. -
The pre- and postconditions in
RST a res pre post
are on entire memories (restricted to satisfy theinv
ofres
), instead of being on the view typet
. This change is related to not associating any abstract set of handles with a resource. While the pre- and postconditions are defined on the entire memory, we expect that the style of specification one writes will be given in terms of the views of (abstract) resources (see the type of theswap
function above). Also, having access to the entire memory allows us to easily define the framing combinator in terms of the view of the frame (below).
Compositionality and framing
(This was previously stated as an outstanding challenge in the LensST
iteration.)
In order to (i) define the framing combinator, and (ii) relate
abstract resources to resources describing their internal
components (e.g., relating a 2D point with two disjoint pointers),
we define a general notion of (constructive) resource inclusion
as a type let r_includes (outer inner:resource) = delta:resource{...}
of resource-deltas between outer
and inner
, where the refinement
requires delta
to truly be a delta (in terms of fp
and inv
).
We can then straightforwardly give instances of
r_includes (res1 <*> res2) res1
and
r_includes (res2 <*> res1) res1
, both given by res2
.
But this notion of resource inclusion can also be used to pack
and unpack abstract resources, e.g.,
r_includes (ptr_resource x <*> ptr_resource y) (point_resource (mk_point x y))
,
that packs a pair of pointers into one abstract 2D point, and vice versa.
The plan is to infer these deltas (at least of the first kind) using
tactics and meta-programming, and then use them in the applications of the
framing combinator. Note that the generality of this notion of
resource inclusion means that we ought to be instantiate our existing
commutative monoid tactics to split a term built out of resources
and <*>
s into two resources, the footprint and the frame, and
them simply treat the frame part as the needed resource-delta.
The framing combinator itself can then be defined as follows:
let frame #a #outer #inner (delta:r_includes outer inner) #pre #post
(f:unit -> RST a inner pre post)
: RST a outer pre (fun h0 x h1 -> post h0 x h1 /\
sel (view_of delta) h0 == sel (view_of delta) h1)
= f ()
where we use the view of the frame to state that the frame
remains unchanged during the evaluation of f
, as opposed
to using the modifies clause. For this to
be possible, it is crucial that we store an entire resource
as the delta, rather than, say, just a delta-footprint.
As a first stepping stone, we would like to infer the delta
argument using tactics, and then at a later date perhaps add
custom support to F* to insert frame
s automatically.
As noted above, compared to our prior separation logic attempts,
where framing was built into the specifications of actions and we
solely relied on tactics for discharging the generated VCs,
this new RST
style allows programmers to call the frame
combinator explicitly in case it or the delta
cannot be inferred,
or if that inference has not yet been implemented (like now).
Abstraction
(This was also stated as an outstanding challenge in the LensST
iteration.)
As the initial experiments show, working with abstract resources
fits neatly into the RST a res pre post
setting. But more needs
to be done, e.g., moving from the toy example of 2D points to
linked lists and other more complex datastructures.
Allocation
The RST a res pre post
setting also naturally accommodates
allocation (of, say, pointers). However, as a result of there
being only one resource index on the effect, it only supports
scoped allocation. For more general allocation, see below.
The type of scoped allocation of pointers looks as follows:
val with_new_ptr (#res:resource)
(#a:Type)
(init:a)
(#b:Type)
(#pre:with_new_ptr_pre res)
(#post:with_new_ptr_post res b)
(f:(ptr:B.pointer a -> RST b (res <*> (ptr_resource ptr))
(fun h -> pre h /\ sel (ptr_view ptr) h == init)
post))
: RST b res pre post
where <with_new_ptr_pre res
and with_new_ptr_post res b
are types of refined pre- and postconditions that are required to be
suitably stable under pointer allocation (pushing and popping frames).
Code snippets
The current state of the RST a res pre post
can be found in the danel_lowstar_lens
branch in:
-
views and resources (LowStar.Resource)
-
RST
effect and framing (LowStar.RST) -
pointer resource (reading, writing, scoped allocation) (LowStar.RST.Pointer)
-
swapping two pointers (Swap)
-
2D points (Point, Point.Test)
-
2D points with equal coordinates (EqPoint,EqPoint.Test)
While our initial experiences with RST a res pre post
are
positive, there are still many places which we can improve.
For instance, coming back to the 2D point example, let us
restrict ourselves to points that also carry an additional
invariant relating the values of the two coordinates,
e.g., saying that they have to be equal, keeping the point
on the diagonal. In this case, we cannot give an instance
of the resource inclusion
r_includes (eq_point_resource p) (ptr_resource p.x <*> ptr_resource p.y)
because we are forced to select the empty resource as the only
delta candidate (delta
can only have the empty footprint),
and as a result we no longer can prove that the outer
resource's invariant is equivalent to the conjunction of the
inner resources' invariants.
We can work around this problem by (i) defining a weaker form of
resource inclusion (invariants do not have to be equivalent
but instead the outer invariant just has to imply the
conjunction of the inner invariants), and (ii) by defining
a variant of frame
that is restricted to postconditions
that are sufficient to re-establish the outer invariant
at the end of frame
. In that sense, this variant of
frame
is a little bit similar to the use of snapshots
in our monotonic state work to temporarily escape the preorder.
However, while we can work around the above problem, we
are still left facing other problems, e.g., how to give an instance
of r_includes (ptr_resource p.x <*> ptr_resource p.y) (eq_point_resource p)
,
and others alike, to use framing for packing-unpacking resources.
We believe running into such problems is a direct consequence
of RST
being indexed by a single resource, leading to various
combinators having to be defined in a scoped manner.
Instead, we see as a possible next step on from
RST a res pre post
an effect indexed by three resources
RST a (consumes res1) (returns res2) (provides res3) pre post
where res1
are the resources the caller has to hold,
res2
are the resources that are returned back to
the caller, and res3
are the fresh resources generated
by the given computation. Underneath we would also have
invariants such as that res2
is included in res1
,
and that res3
is disjoint from res1
.
In particular, we expect that this 3-index approach will allow us to implement both non-scoped allocation, and packing-unpacking invariants not possible with a single resource index. In particular, we would turn packing and unpacking abstract resources from being expressed as resource inclusion (as in the above 1-index style) to effectful computations consuming-returning-providing resources. Of course we would also keep the notion of resource inclusion, so that we can still define framing.