Additional reading: Sections 2.3 and 2.4 in Boneh-Shoup book. Chapter 3 up to and including Section 3.3 in Katz-Lindell book.
Recall our cast of characters: Alice and Bob want to communicate securely over a channel that is monitored by the nosy Eve. In the last lecture, we saw the definition of perfect secrecy, which guarantees that Eve cannot learn anything about their communication beyond what she already knew. However, this security came at a price. For every bit of communication, Alice and Bob have to exchange in advance a bit of a secret key. In fact, the proof of this result gives rise to the following simple Python program that can break every encryption scheme that uses, say, a $128$-bit key with messages even just one bit longer:
```python
# Gets ciphertext as input and two potential plaintexts.
# Positive return value means first is more likely,
# negative means second is more likely,
# 0 means both have same likelihood.
#
# We assume we have access to the function Decrypt(key, ciphertext).
def Distinguish(ciphertext, plaintext1, plaintext2):
    bias = 0
    key = [0] * 128  # 128 zeros
    while sum(key) < 128:
        p = Decrypt(key, ciphertext)
        if p == plaintext1: bias += 1
        if p == plaintext2: bias -= 1
        increment(key)
    return bias

# Increment key when thought of as a number written from least
# significant to most significant bit. Assumes not all bits are 1.
def increment(key):
    i = key.index(0)               # position of the lowest 0 bit
    for j in range(i): key[j] = 0  # clear all the 1 bits below it
    key[i] = 1
```
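To get a feel for how this attack behaves, here is a hypothetical toy-scale version; the 8-bit key, the XOR-based `Decrypt`, and all names below are illustrative assumptions, not part of the attack itself. Note that for the one-time pad, where the key is as long as the message, the bias comes out to exactly zero, matching perfect secrecy; the attack only gains traction when messages are longer than keys.

```python
# A toy sanity check of the attack. The 8-bit key size and the XOR-based
# Decrypt below are illustrative assumptions.

def Decrypt(key, ciphertext):
    # Toy cipher: one-time pad, i.e., bitwise XOR of key and ciphertext.
    return [c ^ k for c, k in zip(ciphertext, key)]

def Distinguish8(ciphertext, plaintext1, plaintext2):
    # The same exhaustive search as above, specialized to 8-bit keys.
    bias = 0
    for key_num in range(2**8):
        key = [(key_num >> j) & 1 for j in range(8)]
        p = Decrypt(key, ciphertext)
        if p == plaintext1: bias += 1
        if p == plaintext2: bias -= 1
    return bias

c = [1, 0, 1, 1, 0, 0, 1, 0]
print(Distinguish8(c, [0]*8, [1]*8))  # 0: the one-time pad shows no bias
```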
Now, generating, distributing, and protecting huge keys causes immense logistical problems, which is why almost all encryption schemes used in practice do in fact utilize short keys (e.g., $128$ bits long) that are much shorter than the messages they encrypt.
So, why can't we use the above Python program to break all the encryption on the Internet and win infamy and fortune? We can in fact, but we'll have to wait a really long time, since the loop in `Distinguish` will run $2^{128}$ times, which would take far more than the lifetime of the universe even on the fastest existing computers.
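To get a sense of the numbers involved, here is a quick back-of-the-envelope calculation (the rate of $10^9$ iterations per second is an arbitrary illustrative assumption):

```python
# Back-of-the-envelope: how long would a 2**128-iteration loop take?
# The 10**9 iterations/second rate is an arbitrary illustrative assumption.
iterations = 2**128
per_second = 10**9
seconds_per_year = 60 * 60 * 24 * 365
print(f"{iterations / (per_second * seconds_per_year):.2e} years")
# => about 1.08e+22 years, roughly a trillion times the age of the universe
```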
However, the fact that this particular program is not a feasible attack does not mean there does not exist a different attack. But this still suggests a tantalizing possibility: if we consider a relaxed version of perfect secrecy that restricts Eve to performing computations that can be done in this universe (e.g., fewer than $2^{256}$ basic operations, which should be a safe bound on any physically realizable computation), then perhaps we can bypass the impossibility result and make do with keys much shorter than the message.
This in fact does seem to be the case, but as we've seen, defining security is a subtle task that will take some care. As before, the way we avoid (at least some of) the pitfalls of so many cryptosystems in history is by insisting on a very precise definition of what it means for a scheme to be secure.
Let us defer the discussion of how one formally defines a function being computable in "less than $T$ operations" (we will come back to this point later in this lecture) and attempt a first definition:

An encryption scheme $(E,D)$ has $t$ bits of computational secrecy if for every two distinct plaintexts $\{m_0,m_1\} \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t$ computational steps, if we choose at random $b\in\{0,1\}$ and a random key $k\in\{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2$.
firstcompdef{.ref} seems very natural, but is in fact impossible to achieve if the key is shorter than the message.
Before reading further, you might want to stop and think if you can prove that there is no, say, encryption scheme satisfying firstcompdef{.ref} with a key even one bit shorter than the message.
The reason firstcompdef{.ref} can't be achieved is that if the message is even one bit longer than the key, we can always have a very efficient procedure that achieves success probability of about $1/2 + 2^{-n-1}$: instead of running the full loop of `Distinguish`, simply choose a single key at random and decrypt with it. Since we have some small chance of guessing correctly, we will get a small advantage over half.
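For concreteness, here is a sketch of that trivial procedure (as before, `Decrypt` is an assumed interface to the scheme, and all names are illustrative); the point is that with probability $2^{-n}$ the guessed key is the right one, which is where the extra advantage of roughly $2^{-n-1}$ comes from:

```python
import random

# Sketch of the trivial adversary: decrypt with a single random key.
# Decrypt is the same assumed interface as above; n is the key length.
def GuessingEve(ciphertext, plaintext1, plaintext2, n=128):
    key = [random.randint(0, 1) for _ in range(n)]
    p = Decrypt(key, ciphertext)
    if p == plaintext1: return plaintext1  # lucky guess: key matched
    if p == plaintext2: return plaintext2
    return random.choice([plaintext1, plaintext2])  # otherwise flip a coin
```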
To fix this definition, we do not consider guessing with such a tiny advantage as a "true break" of the scheme, and hence this will be the actual definition we use.
An encryption scheme $(E,D)$ has $t$ bits of computational secrecy[^1] if for every two distinct plaintexts $\{m_0,m_1\} \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t$ computational steps, if we choose at random $b\in\{0,1\}$ and a random key $k\in\{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2 + 2^{-t}$.
Having learned our lesson, let's try to see that this new definition does give us the kind of conditions we desired. In particular, let's verify that it implies the analogous condition to perfect secrecy.
If $(E,D)$ has $t$ bits of computational secrecy, then for every subset $M \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t - (100\ell + 100)$ computational steps, if we choose at random $m\in M$ and a random key $k\in\{0,1\}^n$, then the probability that Eve guesses $m$ after seeing $E_k(m)$ is at most $1/|M| + 2^{-t+1}$.

Before proving this theorem, note that it gives us a pretty strong guarantee. In the exercises we will strengthen it even further, showing that no matter what prior information Eve had on the message before, she will never get any non-negligible new information on it. One way to phrase it is that if the sender used a scheme with $256$ bits of computational secrecy, then Eve's chances of learning anything new about the message before the universe collapses are essentially nil.
Before reading the proof, you may want to review the proof of twotomanythm{.ref} again, and see if you can generalize it yourself to the computational setting.
The proof is rather similar to the equivalence of guessing one of two messages vs. one of many messages for perfect secrecy (i.e., twotomanythm{.ref}). However, in the computational context we need to be careful in keeping track of Eve's running time. In the proof of twotomanythm{.ref} we showed that if there exists:
- A subset $M\subseteq \{0,1\}^\ell$ of messages

and

- An adversary $Eve:\{0,1\}^o\rightarrow\{0,1\}^\ell$ such that

$$
\Pr_{m \leftarrow_R M,\, k \leftarrow_R \{0,1\}^n}[ Eve(E_k(m))=m ] > 1/|M|
$$
Then there exist two messages $m_0,m_1 \in M$ and an adversary $Eve':\{0,1\}^o\rightarrow\{0,1\}^\ell$ such that if we choose $b \in \{0,1\}$ and $k \in \{0,1\}^n$ at random, then $\Pr[ Eve'(E_k(m_b))=m_b ] > 1/2$.
To adapt this proof to the computational setting and complete the proof of the current theorem it suffices to show that:
- If the probability of $Eve$ succeeding was $\tfrac{1}{|M|} + \epsilon$, then the probability of $Eve'$ succeeding is at least $\tfrac{1}{2} + \epsilon/2$.

- If $Eve$ can be computed in $T$ operations, then $Eve'$ can be computed in $T + 100\ell + 100$ operations.
This will imply that if $Eve$ runs in at most $2^t - (100\ell+100)$ steps and succeeds with probability greater than $1/|M| + 2^{-t+1}$, then $Eve'$ runs in at most $2^t$ steps and succeeds with probability greater than $1/2 + 2^{-t}$, contradicting the assumption that $(E,D)$ has $t$ bits of computational secrecy.
The first item can be shown by simply doing the same proof more carefully, keeping track of how the advantage of $Eve$ over $\tfrac{1}{|M|}$ translates into an advantage of $Eve'$ over $\tfrac{1}{2}$. The second item follows by inspecting the construction of $Eve'$ in that proof: it invokes $Eve$ once, and otherwise performs only a small constant number of operations per bit of the message.
The proof of twotomanycomp{.ref} is a model for how a great many of the results in this course will look. Generally we will have many theorems of the form:
"If there is a scheme
$S'$ satisfying security definition$X'$ then there is a scheme$S$ satisfying security definition$X$ "
In the context of twotomanycomp{.ref}, $S'$ and $S$ were the same scheme, $X'$ was the two-message security notion of computational secrecy, and $X$ was security for a message chosen from an arbitrary large set $M$. The way you show that if $S'$ is secure then $S$ is secure is by giving a transformation from an adversary that breaks $S$ into an adversary that breaks $S'$.
For computational secrecy, we will always want $Eve'$ to be efficient if $Eve$ is, and this will usually hold because $Eve'$ uses $Eve$ as a black box, invoking it a small number of times with appropriately modified inputs. The key steps in such reductions are:

- Coming up with the strategy $Eve'$.

- Analyzing the probability of success and in particular showing that if $Eve$ had non-negligible advantage then so will $Eve'$.
Note that, just like in the context of NP-completeness or uncomputability reductions, security reductions work backwards. That is, we construct the scheme $S$ based on the scheme $S'$, but then prove the contrapositive: we show how to transform an adversary breaking $S$ into an adversary breaking $S'$.
For practical security, often every bit of security matters. We want our keys to be as short as possible and our schemes to be as fast as possible while satisfying a particular level of security. However, for understanding the principles behind cryptography, keeping track of those bits can be a distraction, and so just like we do for algorithms, we will use asymptotic analysis (also known as big Oh notation) to sweep many of those details under the carpet.
To a first approximation, there will be only two types of running times we will encounter in this course:
- Polynomial running time of the form $d\cdot n^c$ for some constants $d,c>0$ (or $poly(n)=n^{O(1)}$ for short), which we will consider as efficient.

- Exponential running time of the form $2^{d\cdot n^{\epsilon}}$ for some constants $d,\epsilon >0$ (or $2^{n^{\Omega(1)}}$ for short), which we will consider as infeasible.[^2]
Another way to say it is that in this course, if a scheme has any security at all, then breaking it will require time $2^{n^{\Omega(1)}}$. These are not all the theoretically possible running times. One can have intermediate functions such as $n^{\log n}$, but such running times will rarely, if ever, come up in this course.
The above definitions could be confusing if you haven't encountered asymptotic analysis before. Reading the beginning of Chapter 3 (pages 43-51) in the KL book, as well as the mathematical background lecture in my intro to TCS notes, can be extremely useful. As a rule of thumb, if every time you see the word "polynomial" you imagine the function $n^{10}$ and every time you see the word "exponential" you imagine the function $2^{\sqrt{n}}$, then you will get the right intuition.
What you need to remember is that negligible is much smaller than any inverse polynomial, while polynomials are closed under multiplication, and so we have the "equations"

$$
\text{negligible} \times \text{polynomial} = \text{negligible}
$$

$$
\text{polynomial} \times \text{polynomial} = \text{polynomial} \;.
$$
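For reference, one standard way to formalize "negligible" is the following: a function $\mu:\mathbb{N}\rightarrow[0,1]$ is negligible if for every constant $c>0$ there exists $N$ such that $\mu(n) < n^{-c}$ for all $n > N$. The first "equation" above then follows directly: if $\mu(n) < n^{-c-c'-1}$ for all large enough $n$, then $d\cdot n^{c'}\cdot\mu(n) < d\cdot n^{-c-1} < n^{-c}$ for all large enough $n$.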
From now on, we will require all of our encryption schemes to be efficient
which means that the encryption and decryption algorithms should run in
polynomial time. Security will mean that any efficient adversary can make at
most a negligible gain in the probability of guessing the message over its a
priori probability.^[Note that there is a subtle issue here with the order of quantifiers. For a scheme to be efficient, the algorithms such as encryption and decryption need to run in some fixed polynomial time such as $n^2$ or $n^3$. In contrast, security holds against adversaries running in any polynomial time: for every polynomial-time adversary, the advantage it gains is bounded by some negligible function.]
An encryption scheme $(E,D)$ is computationally secret if for every two distinct plaintexts $\{m_0,m_1\} \subseteq \{0,1\}^\ell$ and every efficient (i.e., polynomial-time) strategy of Eve, if we choose at random $b\in\{0,1\}$ and a random key $k\in\{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2 + \mu(n)$ for some negligible function $\mu(\cdot)$.
One more detail that we've so far ignored is what it means exactly for a function to be computable using at most $T$ operations. We will come back to this point at the end of this lecture, where we make it precise via probabilistic straightline programs; until then, any reasonable notion of "basic computational step" will do.
We are now ready to make our first conjecture:
The Cipher Conjecture:[^3] There exists a computationally secret encryption scheme $(E,D)$ (where $E,D$ are efficient) with a key of size $n$ for messages of size $n+1$.
A conjecture is a well-defined mathematical statement that (1) we believe is true but (2) don't yet know how to prove. Proving the cipher conjecture would be a great achievement and would in particular settle the P vs NP question, which is arguably the fundamental question of computer science. That is, the following theorem is known:
If $P=NP$, then there is no computationally secret encryption scheme with a key shorter than the message.

We just sketch the proof, as this is not the focus of this course. If $P=NP$, then whenever we have a loop that searches through an exponentially large space for a string satisfying some polynomial-time-checkable condition (like the loop in the `Distinguish` subroutine above that searches over all keys), this loop can be sped up exponentially.
While it is very widely believed that $P \neq NP$, at the moment we do not know how to prove it, and so we have to settle for accepting the cipher conjecture as essentially an axiom of cryptography.
There are several reasons to believe the cipher conjecture. We now briefly mention some of them:
- Intuition: If the cipher conjecture is false then it means that for every possible cipher we can make the exponential time attack described above become efficient. It seems "too good to be true" in a similar way that the assumption that P=NP seems too good to be true.

- Concrete candidates: As we will see in the next lecture, there are several concrete candidate ciphers using keys shorter than messages for which, despite tons of effort, no one knows how to break them. Some of them are widely used, and hence governments and other benign or not so benign organizations have every reason to invest huge resources in trying to break them. Despite that, as far as we know (and we know a little more after Edward Snowden's revelations), there is no significant break known for the most popular ciphers. Moreover, there are other ciphers that can be based on canonical mathematical problems such as factoring large integers or decoding random linear codes that are immensely interesting in their own right, independently of their cryptographic applications.

- Minimalism: Clearly if the cipher conjecture is false then we also don't have a secure encryption with a key, say, twice as long as the message. But it turns out the cipher conjecture is in fact necessary for essentially every cryptographic primitive, including not just private key and public key encryptions but also digital signatures, hash functions, pseudorandom generators, and more. That is, if the cipher conjecture is false then to a large extent cryptography does not exist, and so we essentially have to assume this conjecture if we want to do any kind of cryptography.
"Give me a place to stand, and I shall move the world" Archimedes, circa 250 BC
Every perfectly secure encryption scheme is clearly also computationally secret, and so if we required a message of size $n$ instead of $n+1$, then the conjecture would have been trivially satisfied by the one-time pad. Having a message just one bit longer than the key might not seem like much of a foothold, but (as the length extension theorem below shows) it is enough to obtain encryption for messages of any polynomial length.
Moreover, this is just the beginning. There is a huge range of other useful cryptographic tools that we can obtain from this seemingly innocent conjecture: (We will see what all these names and some of these reductions mean later in the course.)
We will soon see the first of the many reductions we'll learn in this course. Together this "web of reductions" forms the scientific core of cryptography, connecting many of the core concepts and enabling us to construct increasingly sophisticated tools based on relatively simple "axioms" such as the cipher conjecture.
The task of Eve in breaking an encryption scheme is to distinguish between an encryption of $m_0$ and an encryption of $m_1$. It turns out to be useful to study more generally when two distributions can be distinguished by computationally bounded observers:
Let $X$ and $Y$ be two distributions over $\{0,1\}^o$. We say that $X$ and $Y$ are $(T,\epsilon)$-computationally indistinguishable, denoted by $X \approx_{T,\epsilon} Y$, if for every function $Eve$ computable with at most $T$ operations, $\left| \Pr[ Eve(X)=1 ] - \Pr[ Eve(Y)=1 ] \right| \leq \epsilon$. We say that $X$ and $Y$ are simply computationally indistinguishable, denoted by $X \approx Y$, if they are $(T,\epsilon)$-indistinguishable for every polynomial $T(o)$ and inverse polynomial $\epsilon(o)$.[^4]
Note: The expression $\Pr[ Eve(X)=1 ]$ is shorthand for $\Pr_{x \leftarrow_R X}[ Eve(x)=1 ]$, i.e., the probability that $Eve$ outputs $1$ on an input sampled from the distribution $X$.
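To make the definition concrete, here is a hypothetical way one might estimate the advantage $\left|\Pr[Eve(X)=1]-\Pr[Eve(Y)=1]\right|$ of a candidate distinguisher by sampling. All names and the trial count are illustrative assumptions, and sampling of course yields only a statistical estimate of the advantage, not its exact value:

```python
import random

def estimate_advantage(sample_X, sample_Y, eve, trials=100_000):
    """Estimate |Pr[eve(X)=1] - Pr[eve(Y)=1]| by Monte Carlo sampling.

    sample_X, sample_Y: zero-argument functions returning one sample each.
    eve: the candidate distinguisher, returning 0 or 1.
    """
    hits_x = sum(eve(sample_X()) for _ in range(trials))
    hits_y = sum(eve(sample_Y()) for _ in range(trials))
    return abs(hits_x - hits_y) / trials

# Example: a biased coin vs. a fair coin, with eve outputting its input bit.
adv = estimate_advantage(
    lambda: int(random.random() < 0.6),  # X: coin with Pr[1] = 0.6
    lambda: random.randint(0, 1),        # Y: fair coin
    lambda z: z,
)
print(adv)  # close to the true advantage of 0.1
```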
We can use computational indistinguishability to phrase the definition of computational secrecy more succinctly:

Let $(E,D)$ be a valid encryption scheme. Then $(E,D)$ is computationally secret if and only if for every two messages $m_0,m_1 \in \{0,1\}^\ell$, $\{ E_k(m_0) \} \approx \{ E_k(m_1) \}$, where each of these two distributions is obtained by sampling a random key $k \leftarrow_R \{0,1\}^n$.
Working out the proof is an excellent way to make sure you understand both the definition of computational secrecy and that of computational indistinguishability, and hence we leave it as an exercise.
One intuition for computational indistinguishability is that it is related to some
notion of distance.
If two distributions are computationally
indistinguishable, then we can think of them as "very close" to one another, at
least as far as efficient observers are concerned. Intuitively, if $X$ is close to $Y$ and $Y$ is close to $Z$, then $X$ should be close to $Z$, and indeed computational indistinguishability satisfies such a triangle-inequality-like property:[^5]

Suppose $X_1,\ldots,X_m$ are distributions over $\{0,1\}^o$ such that $X_i \approx_{T,\epsilon} X_{i+1}$ for every $1 \leq i \leq m-1$. Then $X_1 \approx_{T,(m-1)\epsilon} X_m$.

Suppose that there exists a $T$-time computable $Eve$ such that $\left| \Pr[ Eve(X_1)=1 ] - \Pr[ Eve(X_m)=1 ] \right| > (m-1)\epsilon$.
Write
$$
\Pr[ Eve(X_1)=1] - \Pr[ Eve(X_m)=1] = \sum_{i=1}^{m-1} \left( \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right) \;.
$$
Thus,
$$
\sum_{i=1}^{m-1} \left| \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right| > (m-1)\epsilon
$$
and hence in particular there must exist some $i$ for which $\left| \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right| > \epsilon$, contradicting the assumption that $X_i \approx_{T,\epsilon} X_{i+1}$.
Suppose that $X_1,\ldots,X_\ell$ and $Y_1,\ldots,Y_\ell$ are mutually independent distributions over $\{0,1\}^o$ such that $X_i \approx_{T,\epsilon} Y_i$ for every $i$. Then $(X_1,\ldots,X_\ell) \approx_{T',\ell\epsilon} (Y_1,\ldots,Y_\ell)$, where $T'$ is $T$ minus a modest overhead (enough to hardwire $\ell$ samples of length $o$).

For every $i \in \{0,1,\ldots,\ell\}$, define the hybrid distribution $H_i = (X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell)$, so that $H_0 = (Y_1,\ldots,Y_\ell)$ and $H_\ell = (X_1,\ldots,X_\ell)$. Suppose toward a contradiction that some $T'$-time $Eve'$ distinguishes $H_0$ from $H_\ell$ with advantage greater than $\ell\epsilon$. By the triangle inequality there must then be some $i$ such that $Eve'$ distinguishes $H_{i-1}$ from $H_i$ with advantage greater than $\epsilon$.
In other words
$$
\left| \mathbb{E}_{X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell}\left[ Eve'(X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell) \right] - \mathbb{E}_{X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell}\left[ Eve'(X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell) \right] \right| > \epsilon \;.
$$
By linearity of expectation we can write the difference of these two expectations as
$$
\mathbb{E}_{X_1,\ldots,X_{i-1},X_i,Y_i,Y_{i+1},\ldots,Y_\ell}\left[ Eve'(X_1,\ldots,X_{i-1},Y_i,Y_{i+1},\ldots,Y_\ell) - Eve'(X_1,\ldots,X_{i-1},X_i,Y_{i+1},\ldots,Y_\ell) \right]
$$
By the averaging principle[^6] this means that there exist some values $x_1,\ldots,x_{i-1}$ and $y_{i+1},\ldots,y_\ell$ such that, fixing them,
$$
\left| \mathbb{E}_{X_i,Y_i}\left[ Eve'(x_1,\ldots,x_{i-1},Y_i,y_{i+1},\ldots,y_\ell) - Eve'(x_1,\ldots,x_{i-1},X_i,y_{i+1},\ldots,y_\ell) \right] \right| > \epsilon \;.
$$
Now define $Eve(z) = Eve'(x_1,\ldots,x_{i-1},z,y_{i+1},\ldots,y_\ell)$. Then $Eve$ can be computed using at most $T$ operations (the overhead over $Eve'$ is just the hardwired values) and it distinguishes $X_i$ from $Y_i$ with advantage greater than $\epsilon$, contradicting the assumption that $X_i \approx_{T,\epsilon} Y_i$.
The above proof illustrates a powerful technique known as the hybrid argument, whereby we show that two distributions $C$ and $D$ are close to one another by coming up with a sequence of "hybrid" distributions $H_0,\ldots,H_t$ such that $H_0 = C$ and $H_t = D$, and arguing that every two neighboring hybrids $H_{i-1}$ and $H_i$ are close.
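In symbols, the hybrid argument is just the triangle inequality applied to the chain of hybrids:
$$
\left| \Pr[ Eve(H_0)=1 ] - \Pr[ Eve(H_t)=1 ] \right| \leq \sum_{i=1}^{t} \left| \Pr[ Eve(H_{i-1})=1 ] - \Pr[ Eve(H_i)=1 ] \right| \;,
$$
so if $Eve$ has advantage $\delta$ in distinguishing $C$ from $D$, it must have advantage at least $\delta/t$ in distinguishing some pair of neighboring hybrids.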
We now turn to the length extension theorem, stating that if we have an encryption scheme for messages one bit longer than the key, then we can obtain an encryption scheme for messages polynomially longer than the key. As a first step, note that computational indistinguishability being preserved under many samples immediately gives security when we encrypt several blocks under independent keys:

Suppose that $(E',D')$ is a computationally secret encryption scheme with $n$-bit keys and $(n+1)$-bit messages. Then the scheme that encrypts a message $(m_1,\ldots,m_t)$ under the key $(k_1,\ldots,k_t)$, consisting of $t$ independent $n$-bit keys, as $(E'_{k_1}(m_1),\ldots,E'_{k_t}(m_t))$ is also computationally secret.
This might seem "obvious" but in cryptography, even obvious facts are
sometimes wrong, so it's important to prove this formally. Luckily, this is a
fairly straightforward implication of the fact that computational indistinguishability is preserved under many samples. That is, by the security of $(E',D')$ we know that
$$
(E'_{k_1}(m_1),\ldots,E'_{k_t}(m_t)) \approx (E'_{k_1}(m'_1),\ldots,E'_{k_t}(m'_t))
$$
for every two tuples of messages $(m_1,\ldots,m_t)$ and $(m'_1,\ldots,m'_t)$, where the keys $k_1,\ldots,k_t$ are chosen independently and uniformly at random.
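In code, the block-by-block scheme looks as follows. This is a minimal sketch: `Eprime` and `Dprime` are assumed (hypothetical) implementations of the underlying scheme $(E',D')$, and all names are illustrative:

```python
import random

def keygen(n, t):
    # The key of the combined scheme: t independent n-bit keys.
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(t)]

def encrypt_blocks(keys, blocks):
    # Encrypt block i under its own independent key.
    return [Eprime(k, m) for k, m in zip(keys, blocks)]

def decrypt_blocks(keys, ciphertexts):
    return [Dprime(k, c) for k, c in zip(keys, ciphertexts)]
```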
We can now prove the full length extension theorem. Before doing so, we will need to generalize the notion of an encryption scheme to allow a randomized encryption scheme.
That is, we will consider encryption schemes where the encryption algorithm can "toss coins" in its computation.
There is a crucial difference between key material and such "ad hoc" randomness.
Keys need to be not only chosen at random, but also shared in advance between the sender and receiver, and stored securely throughout their lifetime.
The "coin tosses" used by a randomized encryption scheme are generated "on the fly" and are not known to the receiver, nor do they need to be stored long term by the sender.
So, allowing such randomized encryption does not make a difference for most applications of encryption schemes.
In fact, as we will see later in this course, randomized encryption is necessary for security against more sophisticated attacks such as chosen plaintext and chosen ciphertext attacks, as well as for obtaining secure public key encryptions.
We will use the notation $E_k(m)$ to denote the distribution over ciphertexts obtained by running the (randomized) encryption algorithm on message $m$ and key $k$ with fresh random coins; validity now means that decryption recovers $m$ with probability one over these coins.
We can now show that given an encryption scheme with messages one bit longer than the key, we can obtain a (randomized) encryption scheme with arbitrarily long messages:
Suppose that there exists a computationally secret encryption scheme $(E',D')$ with key length $n$ and message length $n+1$. Then for every polynomial $t(n)$ there exists a (randomized) computationally secret encryption scheme $(E,D)$ with key length $n$ and message length $t(n)$.

Let $t = t(n)$. To encrypt a message $m = (m_1,\ldots,m_t) \in \{0,1\}^t$ with key $k_0 \in \{0,1\}^n$, we choose fresh random keys $k_1,\ldots,k_t \leftarrow_R \{0,1\}^n$ and output the ciphertext
$$
\left( E'_{k_0}(k_1 \| m_1),\; E'_{k_1}(k_2 \| m_2),\; \ldots,\; E'_{k_{t-1}}(k_t \| m_t) \right)
$$
where $x \| y$ denotes concatenation; note that each plaintext block $k_i \| m_i$ has length $n+1$, as required.[^7]

To decrypt, we use $k_0$ to decrypt the first block and recover $k_1$ and $m_1$, then use $k_1$ to decrypt the second block and recover $k_2$ and $m_2$, and so on.
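Concretely, the construction can be sketched as follows, again with `Eprime`/`Dprime` as assumed implementations of $(E',D')$ operating on bit lists, and all names illustrative:

```python
import random

def E(k0, m):
    # Encrypt the t-bit message m under the n-bit key k0 by chaining.
    n, t = len(k0), len(m)
    keys = [k0] + [[random.randint(0, 1) for _ in range(n)] for _ in range(t)]
    # Block i is E'_{k_{i-1}}(k_i || m_i): an (n+1)-bit plaintext block.
    return [Eprime(keys[i], keys[i + 1] + [m[i]]) for i in range(t)]

def D(k0, ciphertext):
    # Peel off one chained key and one message bit per block.
    m, key = [], k0
    for c in ciphertext:
        p = Dprime(key, c)          # p = k_next || m_i
        key, m = p[:-1], m + [p[-1]]
    return m
```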
The above are clearly valid encryption and decryption algorithms, and hence the real question is whether the scheme is secure. The intuition is that since an efficient adversary learns essentially nothing about the plaintext $k_1 \| m_1$ from the first block, the key $k_1$ looks to it like a fresh random key, hence the second block is effectively an encryption under an unknown random key, and so on down the chain.
Our discussion above looks like a reasonable intuitive argument, but to make sure it's true we need to give an actual proof. Let $m, m' \in \{0,1\}^t$ be two messages, and let $C_m$ denote the distribution of ciphertexts $E_{k_0}(m)$ over a random key $k_0$ and the coins of $E$. We need to show that $C_m \approx C_{m'}$.
Claim: For every $m \in \{0,1\}^t$, $C_m \approx D_m$, where $D_m$ is the distribution in which each block encrypts an independent "junk" key in place of the chaining key.

Note that
$$
D_m = \left( E'_{k_0}(k'_1 \| m_1),\; E'_{k_1}(k'_2 \| m_2),\; \ldots,\; E'_{k_{t-1}}(k'_t \| m_t) \right)
$$
where we use $k_0,\ldots,k_{t-1}$ to denote the encryption keys (which in $D_m$ are never revealed inside any other block) and $k'_1,\ldots,k'_t$ to denote fresh independent keys that are encrypted but never used.
Once we prove the claim we are done, since for every pair of messages $m, m'$ the claim gives $C_m \approx D_m$ and $C_{m'} \approx D_{m'}$, while $D_m \approx D_{m'}$ follows from the security of $(E',D')$ and the preservation of computational indistinguishability under many independent samples (in $D_m$, each block is an encryption under a key used nowhere else). By the triangle inequality we conclude that $C_m \approx C_{m'}$, completing the proof.
Proof of claim: We prove the claim by the hybrid method. For $j \in \{0,1,\ldots,t\}$, let $H_j$ be the distribution in which the first $j$ blocks encrypt independent junk keys (as in $D_m$) while the remaining blocks encrypt the chaining keys (as in $C_m$), so that $H_0 = C_m$ and $H_t = D_m$. If some adversary $Eve'$ distinguishes $C_m$ from $D_m$ with advantage greater than $t\epsilon$, then by the triangle inequality there is some $j$ for which $Eve'$ distinguishes $H_{j-1}$ from $H_j$ with advantage greater than $\epsilon$, and by the averaging principle we can fix the first $j-1$ blocks to a value $\alpha$, the last $t-j$ blocks to a value $\beta$, and the keys $k_j, k'_j$ to fixed values, so that
$$
\left| \mathbb{E}_{k_{j-1}}\left[ Eve'(\alpha, E'_{k_{j-1}}(k_j \| m_j), \beta) - Eve'(\alpha, E'_{k_{j-1}}(k'_j \| m_j), \beta) \right] \right| \geq \epsilon \;\;(**)
$$
But now consider the adversary $Eve''$ that, given a ciphertext $c$ of the scheme $(E',D')$, outputs $Eve'(\alpha, c, \beta)$. By $(**)$, $Eve''$ distinguishes between an encryption of the plaintext $k_j \| m_j$ and an encryption of the plaintext $k'_j \| m_j$ with advantage at least $\epsilon$, while being only slightly less efficient than $Eve'$. If $\epsilon$ is non-negligible this contradicts the computational secrecy of $(E',D')$, proving the claim.
For concreteness's sake, let us give a precise definition of what it means for a function or probabilistic process $F$ mapping $\{0,1\}^n$ to $\{0,1\}^m$ to be computable using $T$ operations.

A probabilistic straightline program consists of a sequence of lines, each of them having one of the following forms:
- `foo = bar NAND baz` where `foo`, `bar`, `baz` are variable identifiers.

- `foo = RAND` where `foo` is a variable identifier.
Given a program, variables of the form `x_`$i$ and `y_`$j$ are considered input and output variables respectively. We require the input variables to be `x_0`, ..., `x_`$(n-1)$ and the output variables to be `y_0`, ..., `y_`$(m-1)$ for some $n,m$. To execute the program on input $x \in \{0,1\}^n$, we initialize the input variables to the bits of $x$, execute the lines one by one (with each `RAND` line assigning a fresh uniformly random bit), and read the output from the output variables. We say that $F$ is computable using $T$ operations if there is such a program with at most $T$ lines that, on every input $x$, produces output distributed as $F(x)$.
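A minimal interpreter for this model might look as follows; the textual program format and the function names are my own illustrative choices:

```python
import random

def evaluate(program, x):
    """Run a probabilistic straightline program on the input bits x.

    program: list of lines, each "foo = bar NAND baz" or "foo = RAND".
    Input variables are x_0, x_1, ...; outputs are y_0, y_1, ...
    """
    env = {f"x_{i}": b for i, b in enumerate(x)}
    for line in program:
        target, expr = [s.strip() for s in line.split("=")]
        if expr == "RAND":
            env[target] = random.randint(0, 1)
        else:
            a, _, b = expr.split()
            env[target] = 1 - (env[a] & env[b])  # NAND of the two bits
    i, y = 0, []
    while f"y_{i}" in env:
        y.append(env[f"y_{i}"])
        i += 1
    return y

# Example: XOR of two input bits using four NAND lines.
xor = [
    "u = x_0 NAND x_1",
    "v = x_0 NAND u",
    "w = x_1 NAND u",
    "y_0 = v NAND w",
]
print(evaluate(xor, [0, 1]))  # [1]
```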
If you haven't taken a class such as CS121 before, you might wonder how such a simple model captures complicated programs that use loops, conditionals, and more complex data types than simply a bit in $\{0,1\}$. It turns out that it does: any program in a standard programming language that runs in $T$ computational steps can be compiled into an equivalent straightline program with a number of lines that is polynomial (in fact, nearly linear) in $T$, and so this simple model suffices for measuring the resources of the adversaries we care about.[^8]
Advanced note: The computational model we use in this class is non-uniform (corresponding to Boolean circuits) as opposed to uniform (corresponding to Turing machines). If this distinction doesn't mean anything to you, you can ignore it, as it won't play a significant role in what we do next. It basically means that we do allow our programs to have hardwired constants of $poly(n)$ bits where $n$ is the input/key length. In fact, to be precise, we will hold ourselves to a higher standard than our adversary, in the sense that we require our algorithms to be efficient in the stronger sense of being computable in uniform probabilistic polynomial time (for some fixed polynomial, often $O(n)$ or $O(n^2)$), while the adversary is allowed to use non-uniformity.
Footnotes

[^1]: This is a slight simplification of the typical notion of "$t$ bits of security". In the more standard definition we'd say that a scheme has $t$ bits of security if for every $t_1+t_2 \leq t$, an attacker running in $2^{t_1}$ time can't get success probability advantage more than $2^{-t_2}$. However these two definitions only differ from one another by at most a factor of two. This may be important for practical applications (where the difference between $64$ and $32$ bits of security could be crucial) but won't matter for our concerns.

[^2]: Some texts reserve the term exponential to functions of the form $2^{\epsilon n}$ for some $\epsilon > 0$ and call a function such as, say, $2^{\sqrt{n}}$ subexponential. However, we will generally not make this distinction in this course.

[^3]: As will be the case for other conjectures we talk about, the name "The Cipher Conjecture" is not a standard name, but rather one we'll use in this course. In the literature this conjecture is mostly referred to as the conjecture of existence of one way functions, a notion we will learn about later. These two conjectures a priori seem quite different but have been shown to be equivalent.

[^4]: This definition implicitly assumes that $X$ and $Y$ are actually parameterized by some number $n$ (that is polynomially related to $o$), so for every polynomial $T(o)$ and inverse polynomial $\epsilon(o)$ we can take $n$ to be large enough so that $X$ and $Y$ will be $(T,\epsilon)$ indistinguishable. In all the cases we will consider, the choice of the parameter $n$ (which is usually the length of the key) will be clear from the context.

[^5]: Results of this form are known as "triangle inequalities" since they can be viewed as generalizations of the statement that for every three points on the plane $x,y,z$, the distance from $x$ to $z$ is not larger than the distance from $x$ to $y$ plus the distance from $y$ to $z$. In other words, the edge $\overline{x,z}$ of the triangle $(x,y,z)$ is not longer than the sum of the lengths of the other two edges $\overline{x,y}$ and $\overline{y,z}$.

[^6]: This is the principle that if the average grade in an exam was at least $\alpha$ then someone must have gotten at least $\alpha$, or in other words that if a real-valued random variable $Z$ satisfies $\mathbb{E}[Z] \geq \alpha$ then $\Pr[Z\geq \alpha]>0$.

[^7]: The astute reader might note that the key $k_t$ is actually not used anywhere in the encryption nor decryption and hence we could encrypt $n$ more bits of the message instead in this final round. We used the current description for the sake of symmetry and simplicity of exposition.

[^8]: An interesting potential exception to this principle that every natural process should be simulatable by a straightline program of comparable complexity are processes where the quantum mechanical notions of interference and entanglement play a significant role. We will talk about this notion of quantum computing towards the end of the course, though note that much of what we say does not really change when we add quantum into the picture. As discussed in my lecture notes, we can still capture these processes by straightline programs (that now have somewhat more complex form), and so most of what we'll do just carries over in the same way to the quantum realm as long as we are fine with conjecturing the strong form of the cipher conjecture, namely that the cipher is infeasible to break even for quantum computers. (All current evidence points toward this strong form being true as well.)