title | filename | chapternum |
---|---|---|
Computational security | lec_02_computational-security | 2 |
Additional reading: Sections 2.2 and 2.3 in Boneh-Shoup book. Chapter 3 up to and including Section 3.3 in Katz-Lindell book.
Recall our cast of characters: Alice and Bob want to communicate securely over a
channel that is monitored by the nosy Eve. In the last lecture, we saw the
definition of perfect secrecy that guarantees that Eve cannot learn anything
about their communication beyond what she already knew. However, this security
came at a price. For every bit of communication, Alice and Bob have to exchange
in advance a bit of a secret key. In fact, the proof of this result gives rise
to the following simple Python program that can break every encryption
scheme that uses, say, a

    from itertools import product  # Import an iterator for cartesian products
    from random import choice  # choose random element of list

    # Gets ciphertext as input and two potential plaintexts
    # Returns most likely plaintext
    # We assume we have access to the function Encrypt(key,plaintext)
    def Distinguish(ciphertext, plaintext1, plaintext2):
        for key in product([0,1], repeat = 128):  # Iterate over all possible keys of length 128
            if Encrypt(key, plaintext1) == ciphertext:
                return plaintext1
            if Encrypt(key, plaintext2) == ciphertext:
                return plaintext2
        return choice([plaintext1, plaintext2])

The program Distinguish
will break any Encrypt
, in the sense that there exist a pair of messages
Distinguish
$(
Now, generating, distributing, and protecting huge keys causes immense
logistical problems, which is why almost all encryption schemes used in practice
do in fact utilize short keys (e.g.,
So, why can't we use the above Python program to break all encryptions in the
Internet and win infamy and fortune? We can in fact, but we'll have to wait a
really long time, since the loop in Distinguish
will run
However, the fact that this particular program is not a feasible attack does
not mean that no other attack exists. But this still suggests a
tantalizing possibility: if we consider a relaxed version of perfect secrecy that
restricts Eve to performing computations that can be done in this universe
(e.g., less than
This in fact does seem to be the case, but as we've seen, defining security is a subtle task, and will take some care. As before, the way we avoid (at least some of) the pitfalls of so many cryptosystems in history is that we insist on very precisely defining what it means for a scheme to be secure.
Let us defer the discussion of how one defines a function being computable in "less
than
An encryption scheme
Note: It is important to keep track of what is known and unknown to the adversary Eve. The adversary knows the set
firstcompdef{.ref} seems very natural, but is in fact impossible to achieve if the key is shorter than the message.
Before reading further, you might want to stop and think if you can prove that there is no, say, encryption scheme with
The reason firstcompdef{.ref} can't be achieved is that if the message is even one bit
longer than the key, we can always have a very efficient procedure
that achieves success probability of about Distinguish
by choosing the key at random. Since we have some small chance of guessing correctly, we will get a small advantage over half.
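To make the guessing attack concrete, here is a minimal sketch that computes the exact success probability of a single random key guess against a toy scheme with an $n$ bit key and an $n+1$ bit message. Everything in it is an assumption made for illustration and does not come from the lecture: the toy cipher `toy_encrypt` (XOR with the key extended by a parity bit, which is certainly not secure), the function name `guess_success_probability`, and the particular pair of messages.

```python
from fractions import Fraction
from itertools import product

def toy_encrypt(key, msg):
    """Hypothetical toy cipher: XOR an (n+1)-bit message with key + parity bit."""
    pad = key + (sum(key) % 2,)
    return tuple(m ^ p for m, p in zip(msg, pad))

def guess_success_probability(n):
    """Exact probability that one random key guess identifies which of two
    fixed messages was encrypted, averaged over the key, the guess, and b."""
    m0 = (0,) * (n + 1)
    m1 = (0,) * n + (1,)  # chosen so that m0 XOR m1 is not a valid pad
    keys = list(product((0, 1), repeat=n))
    total = Fraction(0)
    for b, msg in enumerate((m0, m1)):
        for key in keys:
            c = toy_encrypt(key, msg)
            for guess in keys:
                if toy_encrypt(guess, m0) == c:
                    total += Fraction(int(b == 0))   # distinguisher answers m0
                elif toy_encrypt(guess, m1) == c:
                    total += Fraction(int(b == 1))   # distinguisher answers m1
                else:
                    total += Fraction(1, 2)          # falls back to a coin toss
    return total / (2 * len(keys) ** 2)

print(guess_success_probability(4))
```

For this toy scheme the guess is right exactly when it hits the true key, so the result for $n=4$ is $\tfrac{1}{2} + 2^{-5} = \tfrac{17}{32}$: a tiny but nonzero advantage over one half, obtained with almost no computation.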
Of course an advantage of
An encryption scheme
Having learned our lesson, let's try to see that this strategy does give us the kind of conditions we desired. In particular, let's verify that this definition implies the analogous condition to perfect secrecy.
If
Before proving this theorem note that it gives us a pretty strong guarantee. In
the exercises we will strengthen it even further showing that no matter what
prior information Eve had on the message before, she will never get any
non-negligible new information on it.^[The latter property is known as "semantic security", see also section 3.2.2 of Katz Lindell on "semantic security" and Section 2 of Boneh-Shoup "computational ciphers and semantic security".]
One way to phrase it is that if the sender used a
Before reading the proof, try again to review the proof of twotomanythm{.ref}, and see if you can generalize it yourself to the computational setting.
::: {.proof data-ref="twotomanycomp"} The proof is rather similar to the equivalence of guessing one of two messages vs. one of many messages for perfect secrecy (i.e., twotomanythm{.ref}). However, in the computational context we need to be careful in keeping track of Eve's running time. In the proof of twotomanythm{.ref} we showed that if there exists:
- A subset $M\subseteq \{0,1\}^\ell$ of messages

and

- An adversary $Eve:\{0,1\}^o\rightarrow\{0,1\}^\ell$ such that
$$ \Pr_{m{\leftarrow_{\tiny R}}M, k{\leftarrow_{\tiny R}}\{0,1\}^n}[ Eve(E_k(m))=m ] > 1/|M| $$
Then there exist two messages
To adapt this proof to the computational setting and complete the proof of the current theorem it suffices to show that:
- If the probability of $Eve$ succeeding was $\tfrac{1}{|M|} + \epsilon$ then the probability of $Eve'$ succeeding is at least $\tfrac{1}{2} + \epsilon/2$.
- If $Eve$ can be computed in $T$ operations, then $Eve'$ can be computed in $T + 100\ell + 100$ operations.
This will imply that if
The first item can be shown by simply doing the same proof more carefully,
keeping track how the advantage over
The second item is obtained by looking at the definition of
The proof of twotomanycomp{.ref} is a model for how a great many of the results in this course will look. Generally we will have many theorems of the form:
"If there is a scheme $S'$ satisfying security definition $X'$ then there is a scheme $S$ satisfying security definition $X$."
In the context of twotomanycomp{.ref},
The way you show that if $S'$ is secure then $S$ is secure is by giving a transformation from an adversary that breaks $S$ into an adversary that breaks $S'$.
For computational secrecy, we will always want that
- Coming up with the strategy $Eve'$.
- Analyzing the probability of success and in particular showing that if $Eve$ had non-negligible advantage then so will $Eve'$.
Note that, just like in the context of NP completeness or uncomputability reductions, security reductions work backwards.
That is, we construct the scheme
For practical security, often every bit of security matters.
We want our keys to be as short as possible and our schemes to be as fast as possible while satisfying a particular level of security.
In practice we would usually like to ensure that when we use a smallish security parameter such as
- The honest parties (the parties running the encryption and decryption algorithms) are extremely efficient, something like 100-1000 cycles per byte of data processed. In theory terms we would want them to use $O(n)$ or at worst $O(n^2)$ time algorithms with not-too-big hidden constants.
- We want to protect against adversaries (the parties trying to break the encryption) that have much vaster computational capabilities. A typical modern encryption scheme is built so that using standard key sizes it can withstand the combined computational powers of all computers on earth for several decades. In theory terms we would want the time to break the scheme to be $2^{\Omega(n)}$ (or if not, at least $2^{\Omega(\sqrt{n})}$ or $2^{\Omega(n^{1/3})}$) with not too small hidden constants.
For implementing cryptography in practice, the tradeoff between security and efficiency can be crucial. However, for understanding the principles behind cryptography, keeping track of concrete security can be a distraction, and so just like we do in algorithms courses, we will use asymptotic analysis (also known as big Oh notation) to sweep many of those details under the carpet.
To a first approximation, there will be only two types of running times we will encounter in this course:
- Polynomial running time of the form $d\cdot n^c$ for some constants $d,c>0$ (or $poly(n)=n^{O(1)}$ for short), which we will consider as efficient.
- Exponential running time of the form $2^{d\cdot n^{\epsilon}}$ for some constants $d,\epsilon >0$ (or $2^{n^{\Omega(1)}}$ for short), which we will consider as infeasible.^2^
Another way to say it is that in this course, if a scheme has any security at all, it will have at least
These are not all the theoretically possible running times.
One can have intermediate functions such as
Negligible probabilities. In cryptography, we care not just about the running time of the adversary but also about their probability of success (which should be as small as possible).
If
"Scheme $S$ is secure if for every polynomial $p(\cdot)$ and $p(n)$ time adversary $Eve$, there is some negligible function $\mu$ such that the probability that $Eve$ succeeds in the security game for $S$ is at most $trivial + \mu(n)$"
We now make these notions more formal.
::: {.definition title="Negligible function" #negligibledef}
A function
The following exercise provides a good way to get some comfort with this definition:
::: {.exercise title="Negligible functions properties" #negligible}
- Let $\mu:\N \rightarrow [0,\infty)$ be a negligible function. Prove that for all polynomials $p,q:\R \rightarrow \R$ with non-negative coefficients such that $p(0) = 0$, the function $\mu':\N \rightarrow [0,\infty)$ defined as $\mu'(n) = p(\mu(q(n)))$ is negligible.
- Let $\mu:\N \rightarrow [0,\infty)$. Prove that $\mu$ is negligible if and only if for every constant $c$, $\lim_{n \rightarrow \infty} n^c \mu(n) = 0$.
:::
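As a small numerical companion to the limit characterization in the exercise above, the sketch below finds, for a given exponent $c$, the point after which $2^{-n}$ stays below $n^{-c}$. The function name `crossover` and the search bound `limit` are hypothetical choices made for this illustration, and a finite search only illustrates the behavior; it proves nothing.

```python
def crossover(c, limit=1000):
    """Return the smallest N such that 2**(-n) < n**(-c) for every n with
    N <= n <= limit. Exact integer arithmetic avoids floating-point issues."""
    last_failure = 0
    for n in range(1, limit + 1):
        # 2**(-n) < n**(-c) is equivalent to 2**n > n**c
        if 2**n <= n**c:
            last_failure = n
    return last_failure + 1

for c in (1, 2, 5, 10):
    print(c, crossover(c))
```

The threshold grows with $c$ (for $c=5$ it is $n=23$), but for every fixed $c$ it is finite, which is the sense in which $2^{-n}$ is eventually smaller than every inverse polynomial.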
::: {.remark title="Asymptotic analysis" #asymptotic}
The above definitions could be confusing if you haven't encountered asymptotic analysis before. Reading the beginning of Chapter 3 (pages 43-51) in the KL book, as well as the mathematical background lecture in my intro to TCS notes can be extremely useful. As a rule of thumb, if every time you see the word "polynomial" you imagine the function
What you need to remember is that negligible is much smaller than any inverse polynomial, while polynomials are closed under multiplication, and so we have the "equations"
and
As mentioned, in practice people really want to get as close as possible to
From now on, we will require all of our encryption schemes to be efficient
which means that the encryption and decryption algorithms should run in
polynomial time. Security will mean that any efficient adversary can make at
most a negligible gain in the probability of guessing the message over its a
priori probability.^[Note that there is a subtle issue here with the order of quantifiers. For a scheme to be efficient, the algorithms such as encryption and decryption need to run in some fixed polynomial time such as
We can now formally define computational secrecy in asymptotic terms:
An encryption scheme
One more detail that we've so far ignored is what does it mean exactly for a
function to be computable using at most
Uniform vs non-uniform models. While many computational texts focus on models such as Turing machines, in cryptography it is more convenient to use Boolean circuits which are a non uniform model of computation in the sense that we allow a different circuit for every given input length. The reasons are the following:
- Circuits can express finite computation, while Turing machines only make sense for computing on arbitrarily large input lengths, and so with circuits we can make sense of notions such as "$t$ bits of computational security".
- Circuits allow the notion of "hardwiring" whereby if we can compute a certain function $F:\{0,1\}^{n+s} \rightarrow \{0,1\}^m$ using a circuit of $T$ gates and have a string $w \in \{0,1\}^s$ then we can compute the function $x \mapsto F(xw)$ using $T$ gates as well. This is useful in many cryptographic proofs.
One can build the theory of cryptography using Turing machines as well, but it is more cumbersome.
::: {.remark title="Computing beyond functions" #computebeyondfunctions} Later on in the course, both our cryptographic schemes and the adversaries will extend beyond simple functions that map an input to an output, and we will consider interactive algorithms that exchange messages with one another. Such an algorithm can be implemented using circuits or Turing machines that take as input the prior state and the history of messages up to a certain point in the interaction, and output the next message in the interaction. The number of operations used in such a strategy is the total number of gates used in computing all the messages. :::
We are now ready to make our first conjecture:
The Cipher Conjecture:^4^ There exists a computationally secret encryption scheme $(E,D)$ (where $E,D$ are efficient) with length function $\ell(n)=n+1$.
A conjecture is a well defined mathematical statement which (1) we believe is true but (2) don't know yet how to prove. Proving the cipher conjecture would be a great achievement and would in particular settle the P vs NP question, which is arguably the fundamental question of computer science. That is, the following theorem is known:
If
We just sketch the proof, as this is not the focus of this course. If Distinguish
subroutine above that searches over all keys) then
this loop can be sped up exponentially.
While it is very widely believed that
There are several reasons to believe the cipher conjecture. We now briefly mention some of them:
- Intuition: If the cipher conjecture is false then it means that for every possible cipher we can make the exponential time attack described above become efficient. It seems "too good to be true" in a similar way that the assumption that P=NP seems too good to be true.
- Concrete candidates: As we will see in the next lecture, there are several concrete candidate ciphers using keys shorter than messages for which, despite tons of effort, no one knows how to break them. Some of them are widely used and hence governments and other benign or not so benign organizations have every reason to invest huge resources in trying to break them. Despite that, as far as we know (and we know a little more after Edward Snowden's revelations) there is no significant break known for the most popular ciphers. Moreover, there are other ciphers that can be based on canonical mathematical problems such as factoring large integers or decoding random linear codes that are immensely interesting in their own right, independently of their cryptographic applications.
- Minimalism: Clearly if the cipher conjecture is false then we also don't have a secure encryption with a message, say, twice as long as the key. But it turns out the cipher conjecture is in fact necessary for essentially every cryptographic primitive, including not just private key and public key encryptions but also digital signatures, hash functions, pseudorandom generators, and more. That is, if the cipher conjecture is false then to a large extent cryptography does not exist, and so we essentially have to assume this conjecture if we want to do any kind of cryptography.
"Give me a place to stand, and I shall move the world" Archimedes, circa 250 BC
Every perfectly secure encryption scheme is clearly also
computationally secret, and so if we required a message of size
Moreover, this is just the beginning. There is a huge range of other useful cryptographic tools that we can obtain from this seemingly innocent conjecture: (We will see what all these names and some of these reductions mean later in the course.)
We will soon see the first of the many reductions we'll learn in this course. Together this "web of reductions" forms the scientific core of cryptography, connecting many of the core concepts and enabling us to construct increasingly sophisticated tools based on relatively simple "axioms" such as the cipher conjecture.
The task of Eve in breaking an encryption scheme is to distinguish between an
encryption of
::: {.definition title="Computational Indistinguishability (concrete definition)" #compindef}
Let
$$ | \Pr[ D(X) = 1 ] - \Pr[ D(Y) = 1 ] | \leq \epsilon \;. $$
:::
::: {.solvedexercise title="Computational Indistinguishability game" #compindex}
Prove that for every
- We pick $b \leftarrow_R \{0,1\}$.
- If $b=0$, we let $w \leftarrow_R X$. If $b=1$, we let $w \leftarrow_R Y$.
- We give $Eve$ the input $w$, and $Eve$ outputs $b' \in \{0,1\}$.
- $Eve$ wins if $b=b'$.
:::
::: { .pause } Working out this exercise on your own is a great way to get comfortable with computational indistinguishability, which is a fundamental notion. :::
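For readers who like to check such statements numerically, here is a minimal sketch of the game for finite distributions, using exact arithmetic. The representation of a distribution as a dictionary from outcomes to probabilities, the names `prob_outputs_one` and `win_probability`, and the example distributions and adversary are all assumptions made for this illustration.

```python
from fractions import Fraction

def prob_outputs_one(dist, eve):
    """Probability that eve outputs 1 on a sample from dist."""
    return sum(p for w, p in dist.items() if eve(w) == 1)

def win_probability(X, Y, eve):
    """Play the game exactly: b is uniform, w is drawn from X if b = 0 and
    from Y if b = 1, and eve wins when her output equals b."""
    total = Fraction(0)
    for b, dist in enumerate((X, Y)):
        for w, p in dist.items():
            if eve(w) == b:
                total += Fraction(1, 2) * p
    return total

# Hypothetical example: X is uniform on {0,1,2,3}, Y is uniform on {0,1}.
X = {w: Fraction(1, 4) for w in range(4)}
Y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
eve = lambda w: 1 if w < 2 else 0

p_x, p_y = prob_outputs_one(X, eve), prob_outputs_one(Y, eve)
print(win_probability(X, Y, eve), Fraction(1, 2) + (p_y - p_x) / 2)
```

In this example both printed values are $3/4$, so the winning probability exceeds one half by exactly half of the gap between the two acceptance probabilities.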
::: {.solution data-ref="compindex"}
For every function
Then the probability that
$$\Pr[b=0](1-p_X) + \Pr[b=1] p_Y$$
and since
We see that
For the other direction, assume that
Then by definition of absolute value, there are two options. Either
Note that above we assume that the class of "functions computable in at most
As we did with computational secrecy, we can also define an asymptotic definition of computational indistinguishability.
::: {.definition title="Computational indistinguishability" #compindefasymp}
Let
We say that $\{ X_n \}_{n\in \N}$ and $\{ Y_n \}_{n\in\N}$ are computationally indistinguishable, denoted by $\{ X_n \}_{n\in\N} \approx \{ Y_n \}_{n\in\N}$, if for every polynomial
Solving the following asymptotic analog of compindex{.ref} is a good way to get comfortable with the asymptotic definition of computational indistinguishability:
::: {.exercise title="Computational Indistinguishability game (asymptotic)" #asymgame}
Let $\{ X_n \}_{n\in \N},\{Y_n\}_{n\in \N}$ and
- We pick $b \leftarrow_R \{0,1\}$.
- If $b=0$, we let $w \leftarrow_R X_n$. If $b=1$, we let $w \leftarrow_R Y_n$.
- We give $Eve$ the input $w$, and $Eve$ outputs $b' \in \{0,1\}$.
- $Eve$ wins if $b=b'$.
:::
Dropping the index
We can use computational indistinguishability to phrase the definition of computational secrecy more succinctly:
Let
Working out the proof is an excellent way to make sure you understand both the definition of computational secrecy and computational indistinguishability, and hence we leave it as an exercise.
One intuition for computational indistinguishability is that it is related to some
notion of distance.
If two distributions are computationally
indistinguishable, then we can think of them as "very close" to one another, at
least as far as efficient observers are concerned. Intuitively, if
Suppose
Suppose that there exists a
Write $$ \Pr[ Eve(X_1)=1] - \Pr[ Eve(X_m)=1] = \sum_{i=1}^{m-1} \left( \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right) \;. $$
Thus,
$$
\sum_{i=1}^{m-1} \left| \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right| > (m-1)\epsilon
$$
and hence in particular there must exist some
Suppose that
::: {.proof data-ref="compindrepthm"}
For every
In other words $$ \left| \mathbb{E}_{X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell}[ Eve'(X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell) ] - \mathbb{E}_{X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell}[ Eve'(X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell) ] \right| > \epsilon\;. $$
By linearity of expectation we can write the difference of these two expectations as $$ \mathbb{E}_{X_1,\ldots,X_{i-1},X_i,Y_i,Y_{i+1},\ldots,Y_\ell}\left[ Eve'(X_1,\ldots,X_{i-1},Y_i,Y_{i+1},\ldots,Y_\ell) - Eve'(X_1,\ldots,X_{i-1},X_i,Y_{i+1},\ldots,Y_\ell) \right] $$
By the averaging principle6 this means that there exist some values
The above proof illustrates a powerful technique known as the hybrid
argument whereby we show that two distributions
We now turn to show the length extension theorem, stating that if we have an encryption for
Suppose that
This might seem "obvious" but in cryptography, even obvious facts are
sometimes wrong, so it's important to prove this formally. Luckily, this is a
fairly straightforward implication of the fact that computational
indistinguishability is preserved under many samples. That is, by the security of
$$ (E'_{k_1}(m_1),\ldots,E'_{k_t}(m_t)) \approx (E'_{k_1}(m'_1),\ldots,E'_{k_t}(m'_t)) $$
for random
Randomized encryption scheme. We can now prove the full length extension theorem. Before doing so, we will need to generalize the notion of an encryption scheme to allow a randomized encryption scheme.
That is, we will consider encryption schemes where the encryption algorithm can "toss coins" in its computation.
There is a crucial difference between key material and such "ad hoc" (sometimes also known as "ephemeral") randomness.
Keys need to be not only chosen at random, but also shared in advance between the sender and receiver, and stored securely throughout their lifetime.
The "coin tosses" used by a randomized encryption scheme are generated "on the fly" and are not known to the receiver, nor do they need to be stored long term by the sender.
So, allowing such randomized encryption does not make a difference for most applications of encryption schemes.
In fact, as we will see later in this course, randomized encryption is necessary for security against more sophisticated attacks such as chosen plaintext and chosen ciphertext attacks, as well as for obtaining secure public key encryptions.
We will use the notation
We can now show that given an encryption scheme with messages one bit longer than the key, we can obtain a (randomized) encryption scheme with arbitrarily long messages:
Suppose that there exists a
computationally secret encryption scheme
::: { .pause } This is perhaps our first example of a non trivial cryptographic theorem, and the blueprint for this proof will be one that we will follow time and again during this course. Please make sure you read this proof carefully and follow the argument. :::
::: {.proof data-ref="lengthextendthm"}
The construction, depicted in cipherlengthextensionfig{.ref}, is actually quite natural and variants of it are used in practice for stream ciphers, which are ways to encrypt arbitrarily long messages using a fixed size key.
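A minimal sketch of the chaining idea may help when reading the details that follow: each ciphertext block encrypts a freshly chosen key together with one message bit under the previous key, so that decryption recovers the next key along with the bit. The base scheme `toy_encrypt` used here (XOR with the key extended by a parity bit) is a hypothetical stand-in with an $n$ bit key and $n+1$ bit message and is certainly not secure; the function names and the bit-list representation are also assumptions made for this illustration.

```python
import secrets

def toy_encrypt(key, msg):
    """Insecure placeholder for the base scheme: n-bit key, (n+1)-bit message."""
    pad = key + [sum(key) % 2]
    return [m ^ p for m, p in zip(msg, pad)]

toy_decrypt = toy_encrypt  # XOR with the same pad undoes the encryption

def extend_encrypt(key, message_bits, base_encrypt=toy_encrypt):
    """Encrypt bit i, together with a fresh random key, under the previous key."""
    n = len(key)
    prev_key, blocks = key, []
    for bit in message_bits:
        next_key = [secrets.randbits(1) for _ in range(n)]  # ephemeral randomness
        blocks.append(base_encrypt(prev_key, next_key + [bit]))
        prev_key = next_key
    return blocks

def extend_decrypt(key, blocks, base_decrypt=toy_decrypt):
    """Peel off the blocks in order, learning the next key from each one."""
    prev_key, bits = key, []
    for block in blocks:
        plain = base_decrypt(prev_key, block)
        prev_key, bit = plain[:-1], plain[-1]
        bits.append(bit)
    return bits

key = [1, 0, 1, 1]
message = [1, 0, 0, 1, 1, 0, 1]
assert extend_decrypt(key, extend_encrypt(key, message)) == message
```

Only the initial key has to be shared in advance; the intermediate keys are generated on the fly by the sender and recovered by the receiver during decryption.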
The idea is that we use a key
Let
To decrypt
The above are clearly valid encryption and decryption algorithms, and hence the
real question is whether the scheme is secure. The intuition is that
Our discussion above looks like a reasonable intuitive argument, but to make sure it's true
we need to give an actual proof. Let
Claim: Let
Note that
Once we prove the claim then we are done since we know that for
every pair of messages
Proof of claim: We prove the claim by the hybrid method. For
where
$$ \left| \mathbb{E}_{k_{j-1}}[ Eve'(\alpha,E'_{k_{j-1}}(k_j,m_j),\beta) - Eve'(\alpha,E'_{k_{j-1}}(k'_j,m_j),\beta) ] \right| \geq \epsilon \;\;(**) $$
But now consider the adversary
For concreteness sake let us give a precise definition of what it means for a function or probabilistic process
- If you have taken any course on computational complexity (such as Harvard CS 121), then this is the model of Boolean circuits, except that we also allow randomization.
- If you have not taken such a course, you might simply take it on faith that it is possible to model what it means for an algorithm to be able to map an input $x$ into an output $f(x)$ using $T$ "elementary operations".
In both cases you might want to skip this appendix and only return to it if you find something confusing.
The model we use is a Boolean circuit that also has a
::: {.definition title="Probabilistic straightline program" #randprogdef} A probabilistic straightline program consists of a sequence of lines, each one of them one of the following forms:
- `foo = NAND(bar, baz)` where `foo`, `bar`, `baz` are variable identifiers.
- `foo = RAND()` where `foo` is a variable identifier.
:::
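To see the two line types in action, here is a minimal interpreter sketch. The textual line format follows the definition above; the function name `run_straightline`, the dictionary used to store variables, and the `X[i]` / `Y[j]` naming of inputs and outputs in the example are assumptions made for this illustration.

```python
import random
import re

# One line of the form "foo = NAND(bar, baz)" or "foo = RAND()"
LINE = re.compile(r"^\s*([\w\[\]]+)\s*=\s*(NAND|RAND)\(([^)]*)\)\s*$")

def run_straightline(program, inputs, rng=random):
    """Execute the lines in order and return the final value of every variable."""
    env = dict(inputs)
    for line in program:
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"not a valid line: {line!r}")
        target, op, args = match.groups()
        if op == "RAND":
            env[target] = rng.randrange(2)           # a fresh uniform bit
        else:
            a, b = (name.strip() for name in args.split(","))
            env[target] = 1 - (env[a] & env[b])      # NAND of two bits
    return env

# XOR of two input bits, built from four NAND lines
XOR = [
    "u = NAND(X[0], X[1])",
    "v = NAND(X[0], u)",
    "w = NAND(X[1], u)",
    "Y[0] = NAND(v, w)",
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, run_straightline(XOR, {"X[0]": a, "X[1]": b})["Y[0]"])
```

In this sketch the number of lines in the program plays the role of the number of elementary operations, so the XOR example costs four operations.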
Given a program X[
$i]
are considered input and output variables respectively.
If the input variables range from
If you haven't taken a class such as CS121 before, you might wonder how such a simple model captures complicated programs that use loops, conditionals, and more complex data types than simply a bit in
Advanced note: non uniformity. The computational model we use in this class is non uniform (corresponding to Boolean circuits) as opposed to uniform (corresponding to Turing machines). If this distinction doesn't mean anything to you, you can ignore it as it won't play a significant role in what we do next. It basically means that we do allow our programs to have hardwired constants of
Quantum computing. An interesting potential exception to this principle that every natural process should be simulatable by a straightline program of comparable complexity are processes where the quantum mechanical notions of interference and entanglement play a significant role. We will talk about this notion of quantum computing towards the end of the course, though note that much of what we say does not really change when we add quantum into the picture. As discussed in the CS 121 text, we can still capture these processes by straightline programs (that now have somewhat more complex form), and so most of what we'll do just carries over in the same way to the quantum realm as long as we are fine with conjecturing the strong form of the cipher conjecture, namely that the cipher is infeasible to break even for quantum computers. All current evidence points toward this strong form being true as well. The field of constructing encryption schemes that are potentially secure against quantum computers is known as post quantum cryptography and we will return to this later in the course.
Footnotes
1. Another version of "$t$ bits of security" is that a scheme has $t$ bits of security if for every $t_1+t_2 \leq t$, an attacker running in $2^{t_1}$ time can't get success probability advantage more than $2^{-t_2}$. However these two definitions only differ from one another by at most a factor of two. This may be important for practical applications (where the difference between $64$ and $32$ bits of security could be crucial) but won't matter for our concerns.
2. Some texts reserve the term exponential to functions of the form $2^{\epsilon n}$ for some $\epsilon > 0$ and call a function such as, say, $2^{\sqrt{n}}$ subexponential. However, we will generally not make this distinction in this course.
3. Negligible functions are sometimes defined with image equalling $[0,1]$ as opposed to the set $[0,\infty)$ of non-negative real numbers, since they are typically used to bound probabilities. However, this does not make much difference since if $\mu$ is negligible then for large enough $n$, $\mu(n)$ will be smaller than one.
4. As will be the case for other conjectures we talk about, the name "The Cipher Conjecture" is not a standard name, but rather one we'll use in this course. In the literature this conjecture is mostly referred to as the conjecture of existence of one way functions, a notion we will learn about later. These two conjectures a priori seem quite different but have been shown to be equivalent.
5. Results of this form are known as "triangle inequalities" since they can be viewed as generalizations of the statement that for every three points on the plane $x,y,z$, the distance from $x$ to $z$ is not larger than the distance from $x$ to $y$ plus the distance from $y$ to $z$. In other words, the edge $\overline{x,z}$ of the triangle $(x,y,z)$ is not longer than the sum of the lengths of the other two edges $\overline{x,y}$ and $\overline{y,z}$.
6. This is the principle that if the average grade in an exam was at least $\alpha$ then someone must have gotten at least $\alpha$, or in other words that if a real-valued random variable $Z$ satisfies ${\mathbb{E}}[Z] \geq \alpha$ then $\Pr[Z\geq \alpha]>0$.
7. The cost $10 \ell n$ is for the operations of feeding the "hardwired" strings $x_1,\ldots,x_{i-1}$, $y_{i+1},\ldots,y_\ell$ into $Eve'$. These take up at most $\ell n$ bits, and depending on the computational model, storing and feeding them into $Eve'$ may take $c\ell n$ steps for some small constant $c<10$. In the future, we will usually ignore such minor details and simply say that if $Eve'$ runs in polynomial time then so will $Eve$.
8. The astute reader might note that the key $k_t$ is actually not used anywhere in the encryption nor decryption and hence we could encrypt $n$ more bits of the message instead in this final round. We used the current description for the sake of symmetry and simplicity of exposition.