Add additional comments, windows compatibility, to crypto DSL (#275)

* fix: Enable frg -> rkt -> frg in crypto, stop asking for Glucose solver * minor: cleanup * minor: improve docs * add: README * minor: add comments, more reflection * minor: edit to readme
tnelson · Oct 1, 2024 · 43c3696 · 43c3696
1 parent 8987190
commit 43c3696
Show file tree

Hide file tree

Showing 6 changed files with 294 additions and 172 deletions.
diff --git a/forge/domains/crypto/README.md b/forge/domains/crypto/README.md
@@ -0,0 +1,115 @@
+# Crypto Domain Language
+
+This Forge language echoes the description in [Prototyping Formal Methods Tools: A Protocol Analysis Case Study](https://cs.brown.edu/~tbn/publications/ssdnk-fest21-forge.pdf) by Siegel, et al. We suggest using `#lang forge` to write your queries, importing the requisite [CPSA](https://github.com/mitre/cpsa) definitions from a similarly-named `#lang forge/domains/crypto` file. You can see an example of how this works in [`examples/reflect.rkt`](examples/reflect.rkt) (the CSPA definitions) and [`examples/reflect.frg`](examples/reflect.frg) (the queries to be run). 
+
+A reader interested in only the _domain model_ should skip to the final section of this page, which touches on lessons learned.
+
+## CPSA 
+
+This system accepts protocol definitions in CPSA's input language. As this effort was meant for teaching, we only support the "simple" term language at the moment, and some more advanced features may be elided. Concretely, there are two CPSA languages:
+* `defprotocol` expressions, which define the flow of a protocol execution from the perspective of its various participants; and 
+* `defskeleton` expressions, which define a specific point-of-view for the solver to focus on. 
+
+We briefly sketch the languages here, but the full scope of CPSA is beyond what we can explain in this document. For more information on CPSA, see the [user guide](https://hackage.haskell.org/package/cpsa-4.4.3/src/doc/cpsauser.html). 
+
+### Protocols
+
+A protocol definition consists of a name, an algebra to use, and a collection of roles. Each role comprises a set of variables and a _trace_, which describes the message sequence that a participant in this role expects. For example, the protocol below (taken from CPSA's [test suite](https://github.com/mitre/cpsa/blob/master/tst/reflect.scm)) describes a reflection of messages:
+
+```
+(defprotocol reflect basic
+  (defrole init
+    (vars (a b akey))
+    (trace
+     (send (enc b (invk a)))
+     (recv (enc a (invk b)))))
+  (defrole resp
+    (vars (a b akey))
+    (trace
+     (recv (enc b (invk a)))
+     (send (enc a (invk b))))))
+```
+
+From the initiator's perspective, they know two asymmetric keys, `a` and `b`. Then:
+  * They send key `b`, encrypted with `a`'s inverse key; 
+  * They receive `a`, encrypted with `b`'s inverse key.
+The responder also knows two asymmetric keys, which have the same names `a` and `b` but which may differ in value from the initiator's. Then:
+  * They receive `b` encrypted with `a`'s inverse key; 
+  * They send `a` encrypted with `b`'s inverse key. 
+
+There is, however, no guarantee that the messages these participants send and receive are not tampered with. 
+
+### Skeletons 
+
+A skeleton definition defines a point of view for a query. For example, this skeleton describes a situation where a complete execution of a responder is witnessed, and where the inverses of keys `a` and `b` are uniquely originating (i.e., are not guessed by some other principal). Crucially, the variables of a skeleton are not necessarily bound to the same values as the variables of any specific role in the protocol.
+
+```
+(defskeleton reflect
+  (vars (a b akey))
+  (defstrand resp 1 (a a) (b b))
+  (non-orig (invk a) (invk b)))
+```
+
+A query should involve only one skeleton. 
+
+## Crypto in Forge 
+
+Queries may be written in either the crypto language or in ordinary Relational Forge. We suggest the latter, as it is generally more concise to write specific queries in Forge rather than in the Racket library that the crypto language provides. Put another way, the domain-specificity of `#lang forge/domains/crypto` is particular to CPSA protocol and skeleton definitions; any additional constraints, bounds, etc. must be expressed separately. 
+
+### Model Predicates
+
+The following predicates are exposed to any `forge/domains/crypto` module, and to any `forge` module that imports a `forge/domains/crypto` module:
+  * `wellformed`: defines a set of domain constraints for crypto (should always be included);
+  * for each role type `r` in protocol `p`, a predicate `exec_p_r`, which asserts that there is some principle executing protocol `p` with role `r`; and
+  * for each skeleton for protocol `p`, a predicate `constrain_skeleton_p_i`, where `i` is a natural number (in the order that the skeleton definitions appear&mdash;CPSA skeletons don't have names in the same way protocols do, so the language just uses indexing).
+
+There are also a number of helper predicates made available:
+  * `getInv[k: akey]` accepts an asymmetric key and produces that key's inverse, if it exists; 
+  * `attacker_learns[s: strand, d: mesg]` accepts a strand and field name, and evaluates to true if and only if the attacker eventually learns the value of field `d` in strand `s`; ...
+
+Finally, there will be a `sig` assigned for each skeleton and role strand. These will have 
+appropriate fields for each variable declared. E.g., the initiator strand would be represented
+as `reflect_resp` (since the protocol name is `reflect` and the role name is `resp`). It would have fields `reflect_resp_a` and `reflect_resp_b` for the variables `a` and `b` declared for that role strand.
+
+After asserting `wellformed` and the pertinent role and skeleton predicates, the query can also contain additional constraints. For example, the reflection example we provide gives:
+
+```
+  // Enforce different principals (avoid scenario where one agent talks to itself) 
+  reflect_resp.agent != reflect_init.agent
+```
+
+### Bounds 
+
+Forge does _bounded_ model finding; that is, it renders the search problem decidable by limiting the scope of each type in the given search problem. In this case, that means Forge must bound the number of time slots, terms, keys, etc. 
+
+E.g., we might say that there are no more than `3` names in an execution of the reflection protocol: the initiator's name, the responder's name, and the attacker's name. (The attacker must be present, as the model follows Dolev-Yao is making them the medium by which principles communicate.)
+
+### Visualizing Executions 
+
+When Sterling opens, the solver will (perhaps after some delay) produce an instance. The default visualization will appear as a dense cluster of boxes and lines. To switch to the sequence diagram version, click on the 'Script' tab and paste in the contents of the [visualization script](./vis/crypto_viz.js). Then click 'Run'. 
+
+## Notes on the Model (Lessons Learned)
+
+The domain model is contained in `base.frg`. The biggest challenge we faced in the modeling was the propagation of knowledge. This is complicated by the existence of nested ciphertexts. For example, one may receive a series of terms:
+* `enc(k2, enc(k3, invk(k2)), invk(k1))`
+* `enc(x, invk(k3))`
+* `k1`
+at which point, `k1` can be used to unlock the outer layer of encryption, yielding:
+* `k2`
+* `enc(k3, invk(k2))`
+but then `k2` can be used to unlock the newly-learned term to yield:
+* `k3`
+which can finally be used to unlock the second term received to learn:
+* `x`. 
+
+All of this learning may be triggered by receiving a single term (in this case, `k1`), resulting in multiple decryption steps. When ciphertext terms can have unbounded nesting, this is challenging to model. 
+
+The problem is even more complex because cyclic justifications must be excluded. E.g., if an agent receives:
+* enc(x, inv(x))
+then knowing `x` would allow the term to be decrypted, yet decrypting the term allows the agent to learn `x`; a flawed model might allow the agent to learn `x` because, after all, it "knows the key `x`" in that time step. 
+
+Nevertheless, we wanted to explore this idea; it was, after all, originally an independent-study project! Thus, the crypto domain model in `base.frg` uses a micro-tick system to potentially unlock multiple nested ciphertext terms in one transition. _This has a negative impact on both performance and understandability._
+
+Yet, none of the examples from the paper were so complex! Our examples didn't target terms with more than 2 layers of nesting. It would have been far simpler to elide the micro-tick approach and just encode a single "chained" decryption as part of the constraint describing how knowledge is gained. In hindsight, that would have been a better choice for the _tool demo_; we did not need to reach for this level of generality. We preserve the original model both as a research artifact and, perhaps, an interesting essay in the art of modeling.
+
+
diff --git a/forge/domains/crypto/base.frg b/forge/domains/crypto/base.frg
@@ -1,48 +1,64 @@
 #lang forge
 
 /*
-  Base model of strand space (style) crypto
-  Tim Mia Abby
-  Opting to build in normal Forge, not Electrum
+  Base domain model of strand space style crypto (2021)
+    Abby Siegel
+    Mia Santomauro 
+    Tim Nelson 
+
+  We say "strand space style" above because this model approximates the strand-space 
+  formalism. See the "Prototyping Formal Methods Tools" paper for more information.
+
+  Design notes: 
+    - We opted to build this in Relational Forge, not Temporal Forge; at the time, 
+      Temporal Forge was very new and still being tested. 
+    - Forge has a somewhat more restricted syntax than Alloy. E.g., Forge doesn't 
+      have `facts` (which are always true); instead, predicates must be asserted. 
+    - CPSA has some idiosyncratic terminology, which we echo here somewhat. For 
+      example, the "strand" is not the same as the "agent" for that strand; it 
+      may be best to think of the agent as a knowledge database and the strand 
+      as the protocol role execution.
+    - This model embraces Dolev-Yao in a very concrete way: there is an explicit 
+      attacker, who is also the medium of communication between participants.
 */
 
-abstract sig mesg {} -- CPSA name for any term
+-- NOTE WELL: `mesg` is what CPSA calls terms; we echo that here, do not confuse 
+-- `mesg` with just messages being sent or received.
+abstract sig mesg {} 
 
 abstract sig Key extends mesg {}
-abstract sig akey extends Key {}
-sig skey extends Key {}
+abstract sig akey extends Key {} -- asymmetric key
+sig skey extends Key {}          -- symmetric key
 sig PrivateKey extends akey {}
 sig PublicKey extends akey {}
 
--- Cannot just write fun akey: set Key { PublicKey + PrivateKey }
---   since the macro expands variable declarations to field definitions,
---   and those need to be in terms of sigs
-
--- relation to match key pairs -- 
-
+-- Helper to hold relations that match key pairs
 one sig KeyPairs {
-  pairs: set PrivateKey -> PublicKey,
-  owners: set PrivateKey -> name,
-  ltks: set name -> name -> skey
+  pairs: set PrivateKey -> PublicKey, -- asymmetric key pairing
+  owners: set PrivateKey -> name,     -- who owns a key
+  ltks: set name -> name -> skey      -- symmetric long-term keys
 }
 
+/** Get a long-term key associated with a pair of agents */
 fun getLTK[name_a: name, name_b: name]: lone skey {
   (KeyPairs.ltks)[name_a][name_b]
 }
 
+/** Get the inverse key for a given key (if any) */
 fun getInv[k: Key]: one Key {
   (k in PublicKey => ((KeyPairs.pairs).k) else (k.(KeyPairs.pairs)))
   +
   (k in skey => k else none)
 }
 
 
--- t=0, t=1, ...
+-- Time indexes (t=0, t=1, ...). These are also used as micro-tick indexes, so the 
+-- bound on `Timeslot` will also affect how many microticks are available between ticks.
 sig Timeslot {
-  -- structure of time
+  -- structure of time (must be rendered linear in every run via `next is linear`)
   next: lone Timeslot,
 
-  -- <=1 actual "message" per timeslot
+  -- <=1 actual "message tuple" sent/received per timeslot
   sender: one strand,
   receiver: one strand,  
   data: set mesg,
@@ -52,7 +68,8 @@ sig Timeslot {
   workspace: set Timeslot -> mesg
 }
 
--- As names are sent messagest, they learn pieces of data --
+-- As names process received messages, they learn pieces of data
+-- (they may also generate new values on their own)
 sig name extends mesg {
   learned_times: set mesg -> Timeslot,
   generated_times: set mesg -> Timeslot
@@ -63,11 +80,12 @@ abstract sig strand {
   -- the name associated with this strand
   agent: one name
 }
-one sig AttackerStrand extends strand {}
 
+one sig AttackerStrand extends strand {}
 one sig Attacker extends name {}
 
 sig Ciphertext extends mesg {
+   -- encrypted with this key
    encryptionKey: one Key,
    -- result in concating plaintexts
    plaintext: set mesg
@@ -76,6 +94,7 @@ sig Ciphertext extends mesg {
 -- Non-name base value (e.g., nonces)
 sig text extends mesg {}
 
+/** The starting knowledge base for all agents */
 fun baseKnown[a: name]: set mesg {
     -- name knows all public keys
     PublicKey
@@ -90,6 +109,7 @@ fun baseKnown[a: name]: set mesg {
     a
 }
 
+/** This (large) predicate contains the vast majority of domain axioms */
 pred wellformed {
   -- Design choice: only one message event per timeslot;
   --   assume we have a shared notion of time
@@ -101,18 +121,20 @@ pred wellformed {
   all m: Timeslot | m.sender not in m.receiver
 
   -- workspace: workaround to avoid cyclic justification within just deconstructions
-  --  e.g., knowing or receiving enc(x, x)
   -- AGENT -> TICK -> MICRO-TICK LEARNED_SUBTERM
-  all d: mesg | all t: Timeslot | all microt: Timeslot | let a = t.receiver.agent | d in (workspace[t])[microt] iff {
-    -- received in the clear just now (base case)
+  all d: mesg | all t, microt: Timeslot | let a = t.receiver.agent | d in (workspace[t])[microt] iff {
+    -- Base case:
+    -- received the data in the clear just now 
     {d in t.data and no microt.~next}
     or
-    -- breaking down a ciphertext we learned *previously*, or that we've produced from something larger this timeslot
-    --     via a key we learned *previously*, or that we've produced from something larger in this timeslot
-    --  Note use of "previously" by subtracting the *R*TC is crucial in preventing cyclic justification.
-    -- the baseKnown function includes e.g. an agent's private key, otherwise "prior knowledge" is empty (even of their private key)
+    -- Inductive case:
+    -- breaking down a ciphertext we learned *previously*, or that we've produced from 
+    -- something larger this timeslot via a key we learned *previously*, or that we've 
+    -- produced from something larger in this timeslot Note use of "previously" by 
+    -- subtracting the *reflexive* transitive closure is crucial in preventing cyclic justification.
+    --   Note: the baseKnown function includes an agent's private key, otherwise "prior
+    --   knowledge" is empty (even of their private key!)
     { 
-
       --d not in ((a.workspace)[t])[Timeslot - microt.^next] and -- first time appearing
       {some superterm : Ciphertext | {      
       d in superterm.plaintext and     
@@ -122,20 +144,21 @@ pred wellformed {
   }
 
   -- names only learn information that associated strands are explicitly sent 
+  -- (start big disjunction for learned_times)
   all d: mesg | all t: Timeslot | all a: name | d->t in a.learned_times iff {
     -- they have not already learned this value
     {d not in (a.learned_times).(Timeslot - t.*next)} and 
 
-    -- This base-case is handled in the workspace now
-    -- They received a message directly containing d (may be a ciphertext)
+    -- This base-case is handled in the workspace now, hence commented out:
+    --   They received a message directly containing d (may be a ciphertext)
     { --{some m: Message | {d in m.data and t = m.sendTime and m.receiver.agent = a}}
     --or
 
     -- deconstruct encrypted term 
     -- constrain time to reception to avoid cyclic justification of knowledge. e.g.,
     --    "I know enc(other-agent's-private-key, pubk(me)) [from below via construct]"
     --    "I know other-agent's-private-key [from above via deconstruct]""
-    -- instead: separate the two temporally: deconstruct on recv, construct on non-reception?
+    -- instead: separate the two temporally: deconstruct on recv, construct on non-reception
     -- in that case, the cycle can't exist in the same timeslot
     -- might think to write an accessibleSubterms function as below, except:
     -- consider: (k1, enc(k2, enc(n1, invk(k2)), invk(k1)))
@@ -158,7 +181,7 @@ pred wellformed {
     or
     -- This was a value generated by the name in this timeslot
     {d in (a.generated_times).t}    
-    }} -- end big disjunction for learned_times 
+    }} -- (end big disjunction for learned_times)
 
   -- If you generate something, you do it once only
   all a: name | all d: text | lone t: Timeslot | d in (a.generated_times).t
@@ -169,9 +192,10 @@ pred wellformed {
   all m: Timeslot | m.sender = AttackerStrand or m.receiver = AttackerStrand 
 
   -- plaintext relation is acyclic  
-  --  NOTE WELL: if ever add another type of mesg that contains data, + inside ^.
+  --  NOTE WELL: if ever add another type of mesg that contains data, add with + inside ^.
   all d: mesg | d not in d.^(plaintext)
-
+
+  -- Disallow empty ciphertexts
   all c: Ciphertext | some c.plaintext
 
   (KeyPairs.pairs).PublicKey = PrivateKey -- total
@@ -185,9 +209,18 @@ pred wellformed {
   -- assume long-term keys are used for only one agent pair (or unused)
   all k: skey | lone (KeyPairs.ltks).k
 
-  -- Attacker's strand
+  -- The Attacker agent is represented by the attacker strand
   AttackerStrand.agent = Attacker
 
+  -- If one agent has a key, it is different from any other agent's key
+  all a1, a2: name | { 
+    (some KeyPairs.owners.a1 and a1 != a2) implies 
+      (KeyPairs.owners.a1 != KeyPairs.owners.a2)
+  }
+
+  -- private key ownership is unique 
+  all p: PrivateKey | one p.(KeyPairs.owners) 
+
   -- generation only of text and keys, not complex terms
   --  furthermore, only generate if unknown
   all n: name | {
@@ -201,13 +234,16 @@ pred wellformed {
   }
 }
 
+/** Definition of subterms for some set of terms */
 fun subterm[supers: set mesg]: set mesg {
   -- VITAL: if you add a new subterm relation, needs to be added here, too!
   supers +
   supers.^(plaintext) -- union on new subterm relations inside parens
 }
 
--- Note it's vital this definition is about strands, not names
+/** When does a strand 'originate' some term? 
+(Note: it's vital this definition is about strands, not names.)
+*/
 pred originates[s: strand, d: mesg] {
 
   -- unsigned term t originates on n in N iff
@@ -240,31 +276,7 @@ pred strand_agent_learns[learner: strand, s: strand, d: mesg] {
   s.d in (learner.agent).learned_times.Timeslot
 }
 
-pred temporary {
-  -- upper bounds for one sig have size > 1 at the moment; fix
-  one Attacker
-
-  -- for checking, debugging:
-  all a1, a2: name | { 
-    -- If one has a key, keys are different
-    (some KeyPairs.owners.a1 and a1 != a2) implies 
-      (KeyPairs.owners.a1 != KeyPairs.owners.a2)
-  }
-
-  all p: PrivateKey | one p.(KeyPairs.owners) 
-
-  -- The Attacker agent is represented by the attacker strand
-  agent.Attacker = AttackerStrand
-
-  -- there are some keys in the world 
-  #Key > 0
-
-  -- abstractness performance experiment
-  --all m : mesg | m in Key + name + Ciphertext + text
-  --all k : Key | k in skey + akey
-  --all ak : akey | ak in PrivateKey + PublicKey
-
-}
+/***************************************************************/
 
 ------------------------------------------------------
 -- Keeping notes on what didn't work in modeling;
@@ -276,9 +288,6 @@ pred temporary {
 --    needs knowledge to grow on the way through the tree, possibly sideways
 -- so this approach won't work
 /*fun accessibleSubterms[supers: set mesg, known: set mesg]: set mesg {
-  -- VITAL: ditto above 
-  -- plaintext: Ciphertext -> set mesg
-  -- set of Ciphertexts this agent can now open: 
   let openable = {c: Ciphertext | getInv[c.encryptionKey] in known} |
     supers + 
     supers.^(plaintext & (openable -> mesg))