-
Notifications
You must be signed in to change notification settings - Fork 1
/
main.tex
281 lines (211 loc) · 15.6 KB
/
main.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\usepackage{amsmath,amssymb,amsthm}
\theoremstyle{plain}
\newtheorem{result}{Result}[section]
\theoremstyle{definition}
\newtheorem{remark}[result]{Remark}
\newtheorem{definition}[result]{Definition}
\newcommand{\GF}{\operatorname{GF}}
\title{DammSum: efficient mnemonic seeds from quasigroup checksums}
\author{Cypher Stack \\ for Slaz Labs}
\date{\today}
\begin{document}
\maketitle
\begin{abstract}
This technical note describes DammSum, a method for producing digital asset mnemonic seed checksums that are robust against all single substitutions and transpositions.
The method uses an optimization of the Damm algorithm to produce and validate checksums efficiently, in particular without requiring elementwise construction of a required quasigroup and its associated Cayley table.
\end{abstract}
This document describes technical information relevant to mnemonic seed generation in digital assets.
It has not been independently reviewed, and may contain errors.
The author asserts no warranty and disclaims liability for its use.
The author further expresses no endorsement of Slaz Labs or its associated entities.
\section{Introduction}
Many digital asset projects and protocols support the use of mnemonic seeds to generate keys in common client applications.
At its core, a mnemonic seed is simply a mapping between a given list of words and a corresponding bit string; depending on the method used, this bit string is used to produce keys, either directly or using a pseudorandom function like a hash function or key derivation function.
Mnemonic seeds are useful precisely because they are designed for use by humans, as they can be easily written down, spoken by voice, typed into wallet software, or otherwise securely stored using physical media.
However, this poses challenges:
\begin{itemize}
\item What happens if the user communicates the seed over a noisy channel, and one or more words are substituted with others?
\item What happens if the user reads the seed incorrectly, and one or more words are transposed?
\item What happens if the user stores the seed incorrectly, and it is truncated when later read?
\end{itemize}
One mitigation to these and related challenges is to introduce a checksum to the seed phrase.
When generating a seed, client software first uses suitable randomness to produce the bitstring intended for use in key generation.
It then maps the bitstring to the mnemonic seed using a given word list.
Finally, it produces the checksum, which (depending on the method used) is typically prepended or appended to the seed.
When later communicating or using the seed, client software ensures that the checksum is valid for the given seed.
When considering designs for checksums, we consider two important concepts from coding theory relating to errors: detection and correction.
A design that detects errors is intended to alert the user that some particular type of transmission error occurred, and that the seed is invalid.
A design that corrects errors can do more, identifying the (ideally unique) valid seed to which the errors were applied.
Different constructions provide varying degrees of error detection and correction.
In this technical note, we describe a checksum design called DammSum that has the following properties:
\begin{itemize}
\item The checksum design is compatible with the common Electrum word set used in many digital asset projects.
\item The checksum is a single word, regardless of the length of the seed.
\item All erasure errors are detected.
\item All single substitution errors are detected.
\item All single adjacent transpositions, including transposition of the checksum itself, are detected.
\item Computation and verification of a checksum is efficient, with no lookup tables or complex computations required.
\end{itemize}
While error detection is useful, this design does not permit unique correction of substitution, transposition, or erasure errors.
\section{Mathematical background}
In a dissertation \cite{damm_thesis} and related papers \cite{damm2000,damm2007}, Damm notes the applicability of group structures to check digits, in particular observing that while the use of a group operation on antisymmetric permutation compositions can always detect single substitutions, it is limited in detection of adjacent transpositions based on the order of the group.
It is shown that replacement of the general group structure with at least a weakly totally antisymmetric quasigroup removes this limitation, such that a checksum construction based on this structure reliably and provably detects single substitutions and adjacent transpositions.
This observation then reduces the problem of checksum design to construction of weakly totally antisymmetric quasigroups of the required order, as well as optimizations for evaluation of the quasigroup operation on each digit of the seed.
For completeness, we define quasigroups, the weakly totally antisymmetric and totally antisymmetric properties, and the Damm algorithm, but refer the reader to \cite{damm_thesis,damm2000,damm2007} for a more complete treatment.
\subsection{Quasigroups and asymmetry}
\begin{definition}
A \textit{quasigroup} $(G,\star)$ is a set $G \neq \emptyset$ closed under a binary operation $\star$ such that for all $a,b \in G$ there exist unique $x,y \in G$ such that $a \star x = b$ and $y \star a = b$.
\end{definition}
\begin{remark}
Throughout this note, we sometimes drop explicit parentheses for clarity of notation.
In such cases, assume that quasigroup operations are read from left to right where unambiguous; that is, the notation $x \star y \star z$ should be interpreted as $(x \star y) \star z$.
\end{remark}
For the Damm algorithm, we will consider finite quasigroups, where the definition implies the usual cancellation laws.
\begin{definition}
A quasigroup $(G,\star)$ is \textit{weakly totally antisymmetric} if for all $c,x,y \in G$, if $(c \star x) \star y = (c \star y) \star x$, then $x = y$.
\end{definition}
\begin{definition}
A quasigroup $(G,\star)$ is \textit{totally antisymmetric} if it is weakly totally antisymmetric and $x \star y = y \star x$ implies $x = y$ for all $x,y \in G$.
\end{definition}
The paper \cite{damm2007} proves that there exist totally antisymmetric quasigroups of order $n$ for all $n \not\in \{2,6\}$, and that there exists a weakly totally antisymmetric quasigroup of order $n$ if and only if there exists a totally antisymmetric quasigroup of order $n$.
For the specific DammSum construction we describe later, the following additional results from \cite{damm2007} will be useful.
\begin{result}
\label{result:gf_is_ta}
Let $k > 1$ be an integer, and let $G \equiv \GF(2^k)$ be the Galois field with $2^k$ elements.
Let $a \in G$ such that $a \not\in \{0,1\}$, and define the binary operation $\star$ such that $x \star y \equiv ax + y$ for all $x,y \in G$.
Then $(G,\star)$ is a totally antisymmetric quasigroup.
\end{result}
\begin{result}
\label{result:permute_wta}
Let $(G,\star)$ be a totally antisymmetric quasigroup, and let $\beta: G \to G$ be a permutation of the elements of $G$.
Define a binary operation $\star'$ on $G$ such that for all $x,y \in G$, we have $x \star' y \equiv x \star \beta(y)$.
Then $(G,\star')$ is a weakly totally antisymmetric quasigroup.
\end{result}
We now prove that a particular construction over a Galois field is a weakly totally antisymmetric quasigroup.
\begin{result}
\label{result:wta}
Let $k > 1$ be an integer, and let $G \equiv \GF(2^k)$ be the Galois field with $2^k$ elements.
Define a binary operation $\star'$ on $G$ such that for $x,y \in G$ we have $x \star' y \equiv 2 \cdot (x + y)$.
Then $(G,\star')$ is a weakly totally antisymmetric quasigroup.
\end{result}
\begin{proof}
Define a binary operation $\star$ on $G = \GF(2^k)$ such that $x \star y \equiv 2 \cdot x + y$ for all $x,y \in G$; then by Result \ref{result:gf_is_ta}, $(G,\star)$ is a totally antisymmetric quasigroup.
Let $\beta: G \to G$ be a permutation on $G$ defined such that $\beta(x) = 2 \cdot x$ for all $x \in G$.
Then for all $x,y \in G$ we have
\begin{alignat*}{1}
x \star' y &= 2 \cdot (x + y) \\
&= 2 \cdot x + 2 \cdot y \\
&= 2 \cdot x + \beta(y) \\
&= x \star \beta(y)
\end{alignat*}
Result \ref{result:permute_wta} implies that $(G,\star')$ is weakly totally antisymmetric.
\end{proof}
Note that by construction, $x \star' x = 0$ for all $x \in G$.
\subsection{Damm algorithm}
We now define the Damm algorithm, which produces a checksum that provably detects single substitutions and adjacent transpositions.
The algorithm requires a finite weakly totally antisymmetric quasigroup $(G,\star)$ of order $n$, where for notational convenience we denote $G = \{0,1,\ldots,n-1\}$.
For optimization purposes, it further requires that $x \star x = 0$ for all $x \in G$ (the element $0$ is arbitrary and denoted as such for convenience); this is equivalent to asserting that the diagonal of the Cayley table for $G$ contain only this element.
Let $w = d_m | d_{m-1} | \cdots | d_1$ be an $m$-digit word formed by concatenating the digits $\{d_i\}_{i=1}^m$, where $d_i \in G$ for all $i \in [1,m]$ and $m > 0$.
Define the checksum of $w$ to be the digit $d_0$ such that the equation
\[ 0 \star d_m \star d_{m-1} \star \cdots \star d_0 = 0 \]
holds.
Observe that because we require $x \star x = 0$ for all $x \in G$, we may simplify the above equation by defining
\[ d_0 \equiv 0 \star d_m \star d_{m-1} \star \cdots \star d_1 \]
and using the former equation as verification of the checksum $d_0$.
We now show that the Damm algorithm detects any single nontrivial substitution.
\begin{result}[Substitution]
\label{result:substitution}
Fix $m \geq 0$ and let $(d_i)_{i=0}^m$ be a vector with elements in a finite weakly totally antisymmetric quasigroup $(G, \star)$.
Let $j \in [0, m]$ be an arbitrary index, and let $d_j' \in (G, \star)$.
If
\[ 0 \star d_m \star \cdots d_j' \star \cdots \star d_0 = 0 \star d_m \star \cdots \star d_j \star \cdots \star d_0 \]
then $d_j' = d_j$.
\end{result}
\begin{proof}
We proceed by induction on $m \geq 0$.
For the case where $m = 0$, suppose that $0 \star d_0 = 0 \star d_0'$.
Because $G$ is a quasigroup, cancellation immediately gives that $d_0 = d_0'$.
Now suppose the result holds for some $m > 0$, and suppose that
\[ 0 \star d_{m+1} \star \cdots d_j' \star \cdots \star d_0 = 0 \star d_{m+1} \star \cdots \star d_j \star \cdots \star d_0 \]
holds.
If $j = 0$, then define
\[ d \equiv 0 \star d_{m+1} \star \cdots \star d_1 \]
and note that we have $d \star d_0 = d \star d_0'$.
Cancellation again gives that $d_0 = d_0'$.
If instead $j > 0$, define
\[ d \equiv 0 \star d_{m+1} \star \cdots d_j \star \cdots \star d_1 \]
and
\[ d' \equiv 0 \star d_{m+1} \star \cdots d_j' \star \cdots \star d_1 \]
and note that we have $d \star d_0 = d' \star d_0$.
Cancellation here gives that $d = d'$, so (after reindexing) by induction we must have $d_j = d_j'$.
Therefore the result holds for all $m$.
\end{proof}
Observe that this result holds in the case $m = 0$, which in our application would correspond to an invalid case where no digits are provided.
This is only to simplify the base case of the induction argument, but holds for the allowed range $m > 0$.
We now show that the Damm algorithm detects any single nontrivial transposition.
\begin{result}[Transposition]
\label{result:transposition}
Fix $m > 0$ and let $(d_i)_{i=0}^m$ be a vector with elements in a finite weakly totally antisymmetric quasigroup $(G, \star)$.
Let $j \in [0, m)$ be an arbitrary index.
If
\[ 0 \star d_m \star \cdots \star d_{j+1} \star d_j \star \cdots \star d_0 = 0 \star d_m \star \cdots \star d_j \star d_{j+1} \star \cdots \star d_0 \]
then $d_{j+1} = d_j$.
\end{result}
\begin{proof}
We proceed by induction on $m \geq 1$.
For the case where $m = 1$, suppose that $(0 \star d_1) \star d_0 = (0 \star d_0) \star d_1$.
Because $G$ is weakly totally asymmetric, it immediately follows that $d_1 = d_0$.
Now suppose the result holds for some $m > 1$, and suppose that
\[ 0 \star d_{m+1} \star \cdots \star d_{j+1} \star d_j \star \cdots \star d_0 = 0 \star d_{m+1} \star \cdots \star d_j \star d_{j+1} \star \cdots \star d_0 \]
holds.
If $j = 0$, then define
\[ c \equiv 0 \star d_{m+1} \star \cdots \star d_2 \]
and note that we have $(c \star d_1) \star d_0 = (c \star d_0) \star d_1$.
It again follows that $d_1 = d_0$.
If instead $j > 0$, define
\[ d \equiv 0 \star d_{m+1} \star \cdots \star d_{j+1} \star d_j \star \cdots d_1 \]
and
\[ d' \equiv 0 \star d_{m+1} \star \cdots \star d_j \star d_{j+1} \star \cdots d_1 \]
and note that we have $d \star d_0 = d' \star d_0$.
Because $G$ is a quasigroup, cancellation here gives that $d = d'$, so (after reindexing) by induction we must have that $d_{j+1} = d_j$.
Therefore the result holds for all $m$.
\end{proof}
\section{DammSum}
We now describe the construction of DammSum, a method for efficiently producing Damm-based checksums for digital asset mnemonic seed phrases.
Let $k \equiv 11$, so $2^k = 2^{11} = 2048$.
Let $m \equiv 12$.
Let $G \equiv \GF(2^k)$, and define the binary operation $\star'$ on $G$ such that $x \star' y \equiv 2 \cdot (x + y)$ for all $x,y \in G$.
Generation of a DammSum seed proceeds as follows:
\begin{enumerate}
\item For $i \in [1,m]$, sample a digit $d_i \in G$ uniformly at random, and let $w = d_m | d_{m-1} | \cdots | d_1$.
\item Compute $d_0 \equiv 0 \star' d_m \star' d_{m-1} \star' \cdots \star' d_1$.
\item For each $i \in [0,m]$, let $D_i$ be the English word from the Electrum word list corresponding to $d_i$.
\item Output the seed $D_m | D_{m-1} | \cdots | D_1 | D_0$.
\end{enumerate}
To verify a DammSum seed has the correct checksum:
\begin{enumerate}
\item For each $i \in [0,m]$, let $d_i$ be the element of $\GF(2^k)$ corresponding to $D_i$.
\item If the equation
\[ 0 \star' d_m \star' d_{m-1} \star' \cdots \star' d_0 = 0 \]
holds, then verification succeeds; otherwise, it fails.
\end{enumerate}
The above construction meets the requirements of the Damm algorithm by Result \ref{result:wta}.
Further, a seed generated in this manner has $\log_2\left((2^{11})^{12}\right) = 132$ bits of entropy.
\section{Implementation}
We can implement DammSum efficiently, and note here how to do so.
Consider a representation of $\GF(2^k)$ as the set of binary-valued polynomials of degree at most $k-1$, reduced by an arbitrary monic irreducible binary-valued polynomial $f$ of degree exactly $k$.
While all choices of $f$ are valid, for efficiency we seek a polynomial of low weight.
For our purpose, we use the table in \cite{hp}, which lists $f(x) = x^{11} + x^2 + 1$.
In order to compute the checksum for a seed, we must compute quantities of the form $2 \cdot (x + y)$ in $\GF(2^k)$.
The sum $x + y$ is simply the bitwise \texttt{XOR} of the binary representations of $x$ and $y$, which is trivial to compute.
In order to compute the multiplication of the sum $x + y$ by $2$, it suffices to perform a bitwise left shift of $x + y$ and, if the result has the $k$ bit set, \texttt{XOR} this result with the binary representation of $f(2) = 2053$.
We then iterate this process over each digit in the seed to produce the checksum.
\section{Observations}
We note that while computation and verification of DammSum checksums is extremely efficient and can detect single substitution and transposition errors, it cannot uniquely correct them in all cases.
In order to provide robust and flexible error correction, a design based on constructions like Bose-Chaudhuri-Hocquenghem codes \cite{hocquenghem,bose} is recommended; however, such constructions are generally more complex and marginally less efficient.
\bibliographystyle{plain}
\bibliography{main}
\end{document}