Abstract
We prove that Tandem-DM, one of the two “classical” schemes for turning an n-bit blockcipher of 2n-bit key into a double-block-length hash function, has birthday-type collision resistance in the ideal cipher model. For \(n=128\), an adversary must make at least \(2^{120.87}\) blockcipher queries to achieve chance 0.5 of finding a collision. A collision resistance analysis for Tandem-DM achieving a similar birthday-type bound was already proposed by Fleischmann, Gorski and Lucks at FSE 2009. As we detail, however, the latter analysis is wrong, thus leaving the collision resistance of Tandem-DM as an open problem until now. Our analysis exhibits a novel feature in that we introduce a trick never used before in ideal cipher proofs. We also give an improved bound on the preimage security of Tandem-DM. For \(n=128\), we show that an adversary must make at least \(2^{245.99}\) blockcipher queries to achieve chance 0.5 of inverting a randomly chosen point in the range. Asymptotically, Tandem-DM is proved to be preimage resistant up to \(2^{2n}/n\) blockcipher queries. This bound improves upon the previous best bound of \({{\varOmega }}(2^n)\) queries and is optimal (ignoring log factors) since Tandem-DM has range of size \(2^{2n}\).
1 Introduction
The Tandem-DM compression function is a 3n-bit to 2n-bit compression function based on two applications of a blockcipher of 2n-bit key and n-bit word length (Fig. 1). While Tandem-DM was proposed by Lai and Massey in 1992 [8], the first proof of collision security for Tandem-DM (in the ideal cipher model, as is usual for all such proofs) was only proposed in 2009 by Fleischmann, Gorski and Lucks [4]. Unfortunately, as we detail in “Appendix,” the “FGL proof” (as we shall refer to it) has a number of serious flaws which invalidate it and are non-obvious to repair. The purpose of this paper is to offer a comprehensive security analysis of Tandem-DM, including both a correct collision resistance analysis and a proof of (close to) optimal preimage resistance.
In Sect. 3 we show that, as previously claimed [4], Tandem-DM does indeed have birthday-type collision security (necessitating at least \(2^{120.8}\) queries to break when the output length is \(2n = 256\) bits). A nice feature of our work is that the analysis is relatively simple compared to typical results in this area. This simplicity is afforded by a new trick we introduce, apparently not used before in ideal cipher analyses.
A similar technique is also used for a new preimage resistance analysis in Sect. 4. Our new bound for Tandem-DM is nearly optimal (up to a log factor), significantly improving upon the previous best bound of \({\varOmega }(2^n)\) queries.
1.1 Related Work on 2-Call Constructions
Another classical scheme for turning a 2n-bit key blockcipher into a 3n-bit to 2n-bit compression function is Abreast-DM, pictured in Fig. 2a, which was proposed by Lai and Massey in the same paper as Tandem-DM [8]. The collision resistance of Abreast-DM was independently resolved by Fleischmann, Gorski and Lucks [5] and Lee and Kwon [9], who both showed birthday-type collision resistance for Abreast-DM. Before that, Hirose [6] had given a collision resistance analysis for a general class of compression functions that included Abreast-DM as a special case, but under the assumption that the top and bottom blockciphers of the diagram be distinct (this considerably simplifies the analysis). The work by Hirose was further generalized by Özen and Stam [13], who additionally discuss schemes that are only secure in the iteration.
Another 3n-bit to 2n-bit compression function making two calls to a blockcipher of 2n-bit key was proposed by Hirose [7], who proved birthday-type collision resistance for his construction in the ideal cipher model. Hirose’s construction (Fig. 2b) is simpler than either Abreast-DM or Tandem-DM and in particular uses a single keying schedule for the top and bottom blockciphers. It is noteworthy that while Hirose introduced his construction over 10 years after Abreast-DM and Tandem-DM, his collision resistance analysis predates similar collision resistance analyses for Abreast-DM and Tandem-DM.
1.2 Related Work on 1-Call Constructions
Stam [19] proposed a class of “polynomial-based” 3n-bit to 2n-bit compression functions making a single call to a 2n-bit key blockcipher and subsequently proved [20] birthday-type collision resistance for this construction. Lee and Steinberger [11] proved collision resistance for the same compression function in the weaker “unpredictable cipher” model. Lucks [12] proposed a double-block-length hash function using a 3n-bit to 2n-bit compression function making a single call to a blockcipher of 2n-bit key and proved this hash function collision resistant in the ideal cipher model (see [13] for a generalization). However, Lucks’ construction is only secure in the iteration, as the compression function itself is collision insecure.
Earlier, Yi and Lam [22] had proposed a 3n-bit to 2n-bit compression function making a single call to a 2n-bit key blockcipher whose design was somewhat similar to Stam’s polynomial-based construction but which used a single integer addition operation instead of several field multiplication operations. However, this construction was broken by Satoh et al. and Wagner [16, 21].
1.3 Comparison
Of the three well-known 3n-bit to 2n-bit compression functions making two calls to a 2n-bit key blockcipher—those being Tandem-DM, Abreast-DM and Hirose’s construction—the two constructions whose collision resistance has been successfully resolved (Hirose and Abreast-DM) share the feature that the inputs to the top and bottom blockcipher are bijectively related. For example, for Abreast-DM, if the top blockcipher call is \(E_{B||L}(A)\), then the bottom blockcipher call (for the same input \(A||B||L\)) is \(E_{L||A}(\overline{B})\), where \(\overline{B}\) denotes bit complementation of B; thus, the inputs to the top and bottom blockciphers are related by the permutation \(\pi : \{0,1\}^{3n}\rightarrow \{0,1\}^{3n}\), \(\pi (X||Y||Z) = \overline{Y}||Z ||X\). (Here the last 2n bits are the key.) In Hirose’s construction, the inputs to the top and bottom blockciphers are related by the permutation \(\pi ^\prime : \{0,1\}^{3n} \rightarrow \{0,1\}^{3n}\), \(\pi ^\prime (X||Y||Z) = (X\oplus c)||Y ||Z\).
By contrast, Tandem-DM exhibits a more subtle relationship between the inputs of the top and bottom blockciphers, as an output of the top blockcipher is used to key the bottom blockcipher. It is the presence of this “feedback” within the construction, it seems, that has complicated efforts to prove a collision resistance bound. On the other hand, Tandem-DM still has the agreeable feature that the top and bottom blockcipher calls uniquely determine each other in the following sense: Given the key \(B||L\) and output R of the top cipher, one can determine the key \(L ||R\) and the input B of the bottom cipher and vice versa. This contrasts with constructions such as MDC-2 which use two calls to a blockcipher of n-bit key and in which the top and bottom blockcipher calls do not uniquely determine each other. Typically, collision resistance analyses are much harder for the latter kind of compression functions. (MDC-2 can only be proved non-trivially collision resistant in the iteration, and the current best bound of \({\varOmega }(2^{\frac{3}{5}n})\) queries due to Steinberger [18] is likely to be suboptimal.)
We note that the permutations \(\pi \) and \(\pi ^\prime \) discussed above share the common feature of having small cycle lengths—all cycles of \(\pi \) have length (dividing) 6 and all cycles of \(\pi ^\prime \) have length 2—which constitutes another strong similarity between Abreast-DM and Hirose’s scheme. In fact, due to this reason, Hirose’s collision resistance proof and the Abreast-DM collision resistance proof can be seen as special cases of the same framework [5, 9]. Building on this observation, Fleischmann et al. [5] defined a general class of compression functions called “Cyclic-DM” that are amenable to collision resistance analyses and that include Hirose’s scheme and Abreast-DM as special cases. Afterward, Fleischmann et al. [3] also provided a comprehensive generalization of their earlier works [4, 5]. In particular, a new and tighter collision resistance claim for Tandem-DM is made. This second analysis shares many of the flaws of the work [4] it is building upon. These flaws are fatal to the integrity of the argument and to the final bound, and we refer to the ePrint version of [10] for further details.
1.4 Preimage Resistance
For the particular point \(0^{2n}\), one can make queries \(E_{U||U}(U)\) for all \(U\in \{0,1\}^n\) to find a U such that \(E_{U||U}(U) = U\), and hence \(\mathsf{TDM}^E(U||U||U) = 0^{2n}\), with high probability. Apart from this peculiarity, it has been an open problem to prove preimage resistance for values of q higher than \(2^n\). Tandem-DM inherits an obvious preimage resistance bound from ordinary Davies–Meyer, but once \(2^n\) queries are reached a “natural barrier” occurs: Namely, a blockcipher “loses randomness” after being queried \({\varOmega }(2^n)\) times on the same key (for example, when \(2^n-1\) queries have been made to a blockcipher under a given key, the answer to the last query under that key is deterministic). Going beyond the \(2^n\) barrier seemed to require either a very technical probabilistic analysis or some brand new idea. In this paper, we show a new idea which delivers tight bounds in a quite pain-free and untechnical fashion.
Table 1 numerically compares the collision and the preimage security of Abreast-DM, Hirose’s construction and Tandem-DM for \(n=128\). Note that the preimage security of Abreast-DM and Hirose’s construction is also obtained using our new proof technique (introduced in [1]). For the preimage security of Tandem-DM, we only consider a nonzero target image. (See Sect. 4 for the reason.)
2 Definitions
A blockcipher is a function \(E : \{0,1\}^m \times \{0,1\}^n \rightarrow \{0,1\}^n\) such that \(E(K,\cdot )\) is a permutation of \(\{0,1\}^n\) for each \(K \in \{0,1\}^m\). We call m the key size and n the word size of the blockcipher. It is customary to write \(E_K(X)\) instead of E(K, X) for \(K \in \{0,1\}^m\), \(X \in \{0,1\}^n\). The function \(E^{-1}_K(\cdot )\) denotes the inverse of \(E_K(\cdot )\) (as \(E_K(\cdot )\) is a permutation).
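Given a blockcipher \(E : \{0,1\}^{2n} \times \{0,1\}^n \rightarrow \{0,1\}^n\), we define the Tandem-DM compression function \(\mathsf{TDM}^E : \{0,1\}^{3n} \rightarrow \{0,1\}^{2n}\) by
\[\mathsf{TDM}^E(A||B||L) = (A \oplus R)\,||\,(B \oplus S),\]
where
\[R = E_{B||L}(A), \qquad S = E_{L||R}(B).\]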
2.1 Collision Resistance
In the collision resistance experiment, a computationally unbounded adversary \({\mathcal {A}}\) is given oracle access to a blockcipher E uniformly sampled among all blockciphers of key length 2n and word length n. We allow \({\mathcal {A}}\) to query both E and \(E^{-1}\). After q queries to E, the query history of \({\mathcal {A}}\) is the (ordered) set of triples \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) such that \(E_{K_i}(X_i) = Y_i\) and \({\mathcal {A}}\)’s i-th query is either \(E_{K_i}(X_i)\) or \(E_{K_i}^{-1}(Y_i)\) for \(1 \le i \le q\). We let \(\mathcal {Q}_i = \{(X_j, K_j, Y_j)\}_{j=1}^i\) be the first i elements of the query history; thus \(\mathcal {Q}= \mathcal {Q}_q\). We say \({\mathcal {A}}\) succeeds or finds a collision after its first i queries if there exist distinct 3n-bit values, \(A||B||L\), \(A'||B'||L'\) such that \(\mathsf{TDM}^E(A||B||L) = \mathsf{TDM}^E(A'||B'||L')\) and such that \(\mathcal {Q}_i\) contains both the queries necessary to compute \(\mathsf{TDM}^E(A||B||L)\) and \(\mathsf{TDM}^E(A'||B'||L')\). More formally—and see Fig. 3—we define this event by a predicate \(\mathsf{Coll}(\mathcal {Q}_i)\), which is true if and only if there exist n-bit values A, B, L, R, S, \(A'\), \(B'\), \(L'\), \(R'\), \(S'\) such that
\[A||B||L \ne A'||B'||L', \tag{1}\]
\[A \oplus R = A' \oplus R', \tag{2}\]
\[B \oplus S = B' \oplus S', \tag{3}\]
and such that
\[(A, B||L, R),\ (B, L||R, S),\ (A', B'||L', R'),\ (B', L'||R', S') \in \mathcal {Q}_i. \tag{4}\]
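We denote by
\[\mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{coll}}(q) = \max _{{\mathcal {A}}} \Pr [\mathsf{Coll}(\mathcal {Q})]\]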
the maximum chance of an adversary making q queries causing \(\mathsf{Coll}(\mathcal {Q})\) to become true. The probability occurs over the uniform choice of E and over \({\mathcal {A}}\)’s coin tosses, if any. Also note that n is a hidden parameter.
The “XOR output” of a query \((X_i, K_i, Y_i)\) is the quantity \(X_i \oplus Y_i\). Another predicate which plays an important part in both our proof and the FGL proof is the “many queries with the same XOR output” predicate \(\mathsf{Xor}(\mathcal {Q})\), defined on a query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) by
\[\mathsf{Xor}(\mathcal {Q}) \iff \max _{Z \in \{0,1\}^n} \left| \{i : X_i \oplus Y_i = Z\}\right| > \alpha .\]
Here \(\alpha \) is a free parameter of the analysis which appears in the final collision resistance bound. (In [4], this predicate is named \(\textsc {Lucky}(\mathcal {Q})\); in [18] a similar predicate is named \(\textsf {Win0}(\mathcal {Q})\).) Without going into details at this point, we mention that the FGL collision resistance proof (and ours, essentially, as well) upper bounds \(\Pr [\mathsf{Coll}(\mathcal {Q})]\) by \(\Pr [\mathsf{Xor}(\mathcal {Q})] + \Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q})]\). A larger \(\alpha \) implies a lower value for \(\Pr [\mathsf{Xor}(\mathcal {Q})]\) and a higher value for \(\Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q})]\). The best value of \(\alpha \) can be found numerically for given values of n and q. Generally, readers may think of \(\alpha \) as some small constant value (e.g., for \(n = 128\) and \(q = 2^{120.87}\), \(\alpha = 16\)).
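To make the XOR-output predicate concrete, the following minimal Python sketch evaluates it on a toy query history. It assumes the reading suggested by the case analysis in Sect. 3, namely that \(\mathsf{Xor}(\mathcal {Q})\) holds when more than \(\alpha \) queries share one XOR output. The function names and the 4-bit example history are hypothetical and serve only as illustration.

```python
from collections import Counter


def max_xor_multiplicity(history):
    """Largest number of queries (X, K, Y) in `history` sharing one value X xor Y."""
    counts = Counter(x ^ y for (x, _key, y) in history)
    return max(counts.values(), default=0)


def xor_predicate(history, alpha):
    """Assumed reading of Xor(Q): some XOR output occurs more than alpha times."""
    return max_xor_multiplicity(history) > alpha


# Hypothetical 4-bit history; keys are opaque labels because Xor(Q) ignores them.
history = [
    (0b0001, "k1", 0b0100),  # XOR output 0101
    (0b0010, "k2", 0b0111),  # XOR output 0101
    (0b1000, "k3", 0b1101),  # XOR output 0101
    (0b0011, "k4", 0b0000),  # XOR output 0011
]

print(max_xor_multiplicity(history))
print(xor_predicate(history, alpha=2), xor_predicate(history, alpha=3))
```

Raising \(\alpha \) makes the predicate harder to satisfy, which mirrors the trade-off between the two probabilities described above.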
So far, we have described “infrastructure” that is common to both proofs. We shall now introduce some material proper to our proof. Note a query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) does not record whether each triple \((X_i, K_i, Y_i)\) was obtained by the adversary through a forward query \(E_{K_i}(X_i)\) or a backward query \(E_{K_i}^{-1}(Y_i)\). For this, we maintain two arrays \(\textsf {Fwd}[\cdot ]\) and \(\textsf {Bwd}[\cdot ]\) where \(\textsf {Fwd}[i] = 1\) if and only if the adversary’s i-th query is a forward query and \(\textsf {Bwd}[i] = 1\) if and only if the adversary’s i-th query is a backward query. We then define an additional predicate
\[\mathsf{FB}(\mathcal {Q}) \iff \max _{Z \in \{0,1\}^n} \left| \{i : \textsf {Fwd}[i] = 1,\ Y_i = Z\}\right| > \alpha \ \vee \ \max _{Z \in \{0,1\}^n} \left| \{i : \textsf {Bwd}[i] = 1,\ X_i = Z\}\right| > \alpha . \tag{5}\]
(“FB” stands for “Forward Backward.”) Here \(\alpha \) is the same free parameter as above. Note that \(\lnot \mathsf{FB}(\mathcal {Q})\) implies that
\[\max _{L, R \in \{0,1\}^n} \left| \{i : \textsf {Fwd}[i] = 1,\ K_i = *||L,\ Y_i = R\}\right| \le \alpha , \tag{6}\]
\[\max _{B, L \in \{0,1\}^n} \left| \{i : \textsf {Bwd}[i] = 1,\ X_i = B,\ K_i = L||*\}\right| \le \alpha , \tag{7}\]
where \(*\) stands for an arbitrary n-bit string.
It is really consequences (6) and (7) of \(\lnot \mathsf{FB}(\mathcal {Q})\) that interest us, though we define \(\mathsf{FB}(\mathcal {Q})\) via (5) because this makes it slightly easier to bound \(\Pr [\mathsf{FB}(\mathcal {Q})]\). We will use the bound
\[\Pr [\mathsf{Coll}(\mathcal {Q})] \le \Pr [\mathsf{Xor}(\mathcal {Q})] + \Pr [\mathsf{FB}(\mathcal {Q})] + \Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}) \wedge \lnot \mathsf{FB}(\mathcal {Q})]. \tag{8}\]
One should thus think of \(\mathsf{FB}(\mathcal {Q})\) and \(\mathsf{Xor}(\mathcal {Q})\) as bad events whose non-occurrence helps bound the probability of \(\mathsf{Coll}(\mathcal {Q})\) occurring. We warn that (8) constitutes a slightly oversimplified encapsulation of our proof’s high-level structure. We refer to Sect. 3 for more details.
2.2 Preimage Resistance
In the preimage resistance experiment, a computationally unbounded adversary \({\mathcal {A}}\) with oracle access to a uniformly sampled blockcipher \(E : \{0,1\}^{2n} \times \{0,1\}^n \rightarrow \{0,1\}^n\) selects and announces a point \(C \in \{0,1\}^{2n}\), before making queries to E. \({\mathcal {A}}\) makes queries to both E and \(E^{-1}\) and records its query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\). We say \({\mathcal {A}}\) succeeds or finds a preimage if its query history \(\mathcal {Q}\) contains the means of computing a preimage of C, in the sense that there exist values A, B, L, R, \(S \in \{0,1\}^n\) such that \(A \oplus R ||B \oplus S = C\) and such that the queries \((A, B||L, R)\), \((B, L||R, S)\) are in \(\mathcal {Q}\). (In this case, we say \(\mathcal {Q}\) contains a preimage of C.)
Unfortunately, Tandem-DM has the particularity that the point \(0^{2n}\) is weaker than other range points with respect to preimage resistance. Indeed, to find a preimage of \(0^{2n}\) (given a random blockcipher), an adversary can make queries of the form \(E_{U||U}(U)\) for different values of U until it finds a U such that \(E_{U||U}(U) = U\); then it is easy to see that \(\mathsf{TDM}^E(U||U||U) = 0^{2n}\). The probability (over the choice of E) of this attack succeeding in q queries is \(1-(1-1/2^n)^q\approx q/2^n\), since a different key is used for each query. On the other hand, we shall see that all nonzero points in \(\{0,1\}^{2n}\) have much better preimage resistance than \(q/2^n\), at least for q in the range of interest (i.e., \(q = o(2^n), \omega (1)\)). We also note this preimage attack on \(0^{2n}\) is nearly matched by an easily proved preimage resistance bound of \(q/(2^n - q)\) for \(0^{2n}\) (or any other point in \(\{0,1\}^{2n}\)); the bound follows from the fact that a necessary condition for inverting \(0^{2n}\) is to find a query with XOR output \(0^n\).
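The attack on \(0^{2n}\) is easy to try out at toy scale. The sketch below is only an illustration under stated assumptions: `ToyIdealCipher` is a hypothetical lazily sampled random permutation per key standing in for the ideal cipher, a key \(K_1||K_2\) is represented as the tuple `(K1, K2)`, and `tdm` follows the description in this section (queries \((A, B||L, R)\) and \((B, L||R, S)\), output \(A \oplus R ||B \oplus S\)).

```python
import random


class ToyIdealCipher:
    """Hypothetical toy stand-in for an ideal cipher: one lazily sampled permutation per key."""

    def __init__(self, n, seed):
        self.n = n
        self.rng = random.Random(seed)
        self.tables = {}  # key -> dict mapping input X to output Y

    def enc(self, key, x):
        table = self.tables.setdefault(key, {})
        if x not in table:
            used = set(table.values())
            free = [y for y in range(2 ** self.n) if y not in used]
            table[x] = self.rng.choice(free)  # uniform among unused outputs
        return table[x]


def tdm(cipher, a, b, l):
    """Tandem-DM on A||B||L, returned as the pair (A xor R, B xor S)."""
    r = cipher.enc((b, l), a)  # top call, key B||L
    s = cipher.enc((l, r), b)  # bottom call, key L||R
    return (a ^ r, b ^ s)


def find_zero_preimage(cipher):
    """Query E_{U||U}(U) for each U; return the first U with E_{U||U}(U) = U, else None."""
    for u in range(2 ** cipher.n):
        if cipher.enc((u, u), u) == u:
            return u
    return None


cipher = ToyIdealCipher(n=8, seed=1)
u = find_zero_preimage(cipher)
if u is not None:
    # A fixed point under key U||U gives R = S = U, hence the all-zero output.
    print(u, tdm(cipher, u, u, u))
```

Since every query uses a different key, each attempt succeeds independently with probability \(1/2^n\), which is where the \(1-(1-1/2^n)^q\) estimate comes from.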
One solution for avoiding issues associated with \(0^{2n}\) is to have the point-to-invert be chosen at random from \(\{0,1\}^{2n}\); in this case there is chance at most \(1/2^{2n}\) anyway that \(0^{2n}\) is the image to invert. However, we find it slightly more interesting to emphasize that \(0^{2n}\) is the only “bad” point in the range by letting the adversary choose which point to invert, under the stipulation that the adversary is not allowed to choose \(0^{2n}\) (for which we anyway have the above \(q/(2^n-q)\) preimage resistance bound which, though worse than the preimage resistance bound we shall prove for nonzero points, is acceptable from a practical standpoint).
Thus our preimage resistance experiment is modified as follows: An adversary \({\mathcal {A}}\) announces a point \(C \in \{0,1\}^{2n}\), \(C \ne 0^{2n}\), before making queries to E. The adversary wins after q queries if its query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) contains the means of computing a preimage of C, in the sense described above. We denote by
\[\mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(q)\]
the maximum advantage of any (probabilistic, computationally unbounded) adversary at this game. We note that here, too, n is a hidden parameter of the advantage. Moreover, we let
\[\mathsf{Preim}(\mathcal {Q})\]
be the predicate that is true if and only if \(\mathcal {Q}\) contains a preimage of C, where C is an elided-but-understood parameter of the predicate. Thus, \(\mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(q)\) is the maximum of \(\Pr [\mathsf{Preim}(\mathcal {Q})]\) taken over all q-query adversaries \({\mathcal {A}}\), the probability being taken over E and the coins of \({\mathcal {A}}\). We always assume that \({\mathcal {A}}\) is honest in the sense of choosing a nonzero value C.
3 Collision Resistance of Tandem-DM
It will be easier to explain the form of the probability bound in our main theorem if we explain a few high-level ideas from the proof beforehand. The proof starts by considering an arbitrary q-query collision-finding adversary \({\mathcal {A}}\) for Tandem-DM. We then construct an adversary \({\mathcal {A}}'\) as follows: \({\mathcal {A}}'\) simulates \({\mathcal {A}}\), but after each forward query \(E_{V||W}(U)\) made by \({\mathcal {A}}\), \({\mathcal {A}}'\) makes the backward query \(E_{U||V}^{-1}(W)\) if it does not already know the answer to this query, and after each backward query \(E_{U||V}^{-1}(W)\) made by \({\mathcal {A}}\), \({\mathcal {A}}'\) makes the forward query \(E_{V||W}(U)\) if it does not already know the answer to this query. (To better understand the relation of these instructions to Tandem-DM, view U, V, W as B, L, R.) Moreover, if \({\mathcal {A}}\) ever makes a query to which \({\mathcal {A}}'\) already knows the answer from its query history, \({\mathcal {A}}'\) ignores this query. Thus \({\mathcal {A}}'\) never makes a query to which it knows the answer.
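The construction of \({\mathcal {A}}'\) from \({\mathcal {A}}\) can be sketched as a wrapper around the cipher oracle. This is a minimal illustration rather than part of the proof: `ToyIdealCipher` and `AugmentedAdversary` are hypothetical names, a key \(K_1||K_2\) is modeled as the tuple `(K1, K2)`, and the toy cipher samples each permutation lazily.

```python
import random


class ToyIdealCipher:
    """Hypothetical toy ideal cipher: one lazily sampled permutation per key."""

    def __init__(self, n, seed):
        self.n = n
        self.rng = random.Random(seed)
        self.fwd = {}  # key -> {x: y}
        self.bwd = {}  # key -> {y: x}

    def _sample(self, taken):
        return self.rng.choice([v for v in range(2 ** self.n) if v not in taken])

    def enc(self, key, x):
        f, b = self.fwd.setdefault(key, {}), self.bwd.setdefault(key, {})
        if x not in f:
            y = self._sample(b)  # outputs already assigned are the keys of b
            f[x], b[y] = y, x
        return f[x]

    def dec(self, key, y):
        f, b = self.fwd.setdefault(key, {}), self.bwd.setdefault(key, {})
        if y not in b:
            x = self._sample(f)  # inputs already assigned are the keys of f
            f[x], b[y] = y, x
        return b[y]


class AugmentedAdversary:
    """Plays the role of A': relays A's queries and adds the companion query."""

    def __init__(self, cipher):
        self.cipher = cipher
        self.history = []    # entries (X, K, Y, direction)
        self.by_input = {}   # (K, X) -> Y
        self.by_output = {}  # (K, Y) -> X

    def _record(self, x, key, y, direction):
        self.history.append((x, key, y, direction))
        self.by_input[(key, x)] = y
        self.by_output[(key, y)] = x

    def forward(self, key, x):
        """A's forward query E_{V||W}(U), with key = (V, W) and x = U."""
        if (key, x) in self.by_input:  # answer already known: query is ignored
            return self.by_input[(key, x)]
        y = self.cipher.enc(key, x)
        self._record(x, key, y, "fwd")
        v, w = key
        companion = (x, v)  # companion backward query E^{-1}_{U||V}(W)
        if (companion, w) not in self.by_output:
            self._record(self.cipher.dec(companion, w), companion, w, "bwd")
        return y

    def backward(self, key, y):
        """A's backward query E^{-1}_{U||V}(W), with key = (U, V) and y = W."""
        if (key, y) in self.by_output:  # answer already known: query is ignored
            return self.by_output[(key, y)]
        x = self.cipher.dec(key, y)
        self._record(x, key, y, "bwd")
        u, v = key
        companion = (v, y)  # companion forward query E_{V||W}(U)
        if (companion, u) not in self.by_input:
            self._record(u, companion, self.cipher.enc(companion, u), "fwd")
        return x
```

Each query of \({\mathcal {A}}\) adds at most two entries to the history, so a q-query adversary yields a history of at most 2q entries, and no entry is ever recorded twice.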
Let \(\mathcal {Q}'\) be the query history of \({\mathcal {A}}'\) and \(\mathcal {Q}\) be the query history of \({\mathcal {A}}\). Then \(\mathcal {Q}\subseteq \mathcal {Q}'\) and \(|\mathcal {Q}'| \le 2q\). Since \(\mathcal {Q}\subseteq \mathcal {Q}'\) we have
\[\Pr [\mathsf{Coll}(\mathcal {Q})] \le \Pr [\mathsf{Coll}(\mathcal {Q}')] \le \Pr [\mathsf{Xor}(\mathcal {Q}')] + \Pr [\mathsf{FB}(\mathcal {Q}')] + \Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]. \tag{9}\]
Our proof uses the inequality above to bound \(\Pr [\mathsf{Coll}(\mathcal {Q})]\). The use of the augmented adversary \({\mathcal {A}}'\) may seem superficially similar to Fleischmann et al.’s idea of “giving away a query for free.” However, it will become clear from our case analysis that we exploit the added structure of \(\mathcal {Q}'\) entirely differently from the way Fleischmann et al. exploit their free queries. We also point out that the added structure of \(\mathcal {Q}'\) enables the main interesting trick of our analysis, to be found in case “TL Forward” of Proposition 3 below.
We can now more easily discuss our main result:
Theorem 1
Let \(N = 2^n\), \(q < N/2\), \(N' = N - 2q\) and let \(\alpha \) be an integer, \(1\le \alpha \le 2q\). Then
\[\mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{coll}}(q) \le 2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha } + \frac{4q\alpha }{N'} + \frac{4q}{N'}.\]
The term \(2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\) in Theorem 1 is an upper bound for \(\Pr [\mathsf{Xor}(\mathcal {Q}')] + \Pr [\mathsf{FB}(\mathcal {Q}')]\). In fact \(\Pr [\mathsf{Xor}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\) and \(\Pr [\mathsf{FB}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\). The two remaining terms \(4q\alpha /N' + 4q/N'\) are an upper bound for \(\Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\). To bound \(\mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(q)\) for a given value of n and q, one should optimize \(\alpha \) numerically. For example, for \(n = 128\), Theorem 1 yields that \(\mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(2^{120.87}) < \frac{1}{2}\) using \(\alpha = 16\).
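The numerical optimization over \(\alpha \) can be reproduced with a few lines of Python. The sketch below evaluates the three terms named above in floating point; the function names are hypothetical, and the search range for \(\alpha \) (1 to 64) is an arbitrary choice made for illustration.

```python
import math


def tdm_collision_bound(n, log2_q, alpha):
    """Evaluate 2N(2eq/(alpha N'))^alpha + 4q*alpha/N' + 4q/N' with N = 2^n, N' = N - 2q."""
    N = 2.0 ** n
    q = 2.0 ** log2_q
    n_prime = N - 2.0 * q
    ratio = 2.0 * math.e * q / (alpha * n_prime)
    # 2N * ratio^alpha, computed through base-2 logarithms for numerical stability.
    xor_fb_term = 2.0 ** (1 + n + alpha * math.log2(ratio))
    coll_term = 4.0 * q * alpha / n_prime + 4.0 * q / n_prime
    return xor_fb_term + coll_term


def best_alpha(n, log2_q, max_alpha=64):
    """Return the integer alpha in [1, max_alpha] minimizing the bound (simple scan)."""
    return min(range(1, max_alpha + 1), key=lambda a: tdm_collision_bound(n, log2_q, a))


alpha = best_alpha(128, 120.87)
print(alpha, tdm_collision_bound(128, 120.87, alpha))
```

For \(n = 128\) and \(q = 2^{120.87}\) the scan selects \(\alpha = 16\), and the resulting value lies just below 1/2, in line with the example above.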
Asymptotically, Theorem 1 yields the following result:
Corollary 1
\(\lim _{n\rightarrow \infty }{} \mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}\left( N/n\right) =0\).
Proof
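Let \(q=N/n\) and \(\alpha =n/\log n\), where the logarithm takes base 2. Since \(N'>N/2\) for \(n>4\), we have
\[\mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{coll}}\left( N/n\right) \le 2N\left( \frac{4e\log n}{n^2}\right) ^{n/\log n} + \frac{8}{\log n} + \frac{8}{n} = \frac{2\left( 4e\log n\right) ^{n/\log n}}{2^n} + \frac{8}{\log n} + \frac{8}{n}.\]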
The last expression obviously goes to zero as \(n \rightarrow \infty \). \(\square \)
In particular, \(\lim _{n\rightarrow \infty }\mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}\left( 2^{(1-\varepsilon )n}\right) =0\) for any fixed \(\varepsilon > 0\).
The proof of Theorem 1 uses refinements \(\mathsf{Coll}_{1}(\mathcal {Q})\), \(\mathsf{Coll}_{2}(\mathcal {Q})\), \(\mathsf{Coll}_{3}(\mathcal {Q})\) of the collision predicate \(\mathsf{Coll}(\mathcal {Q})\), defined as follows:
- \(\mathsf{Coll}_{1}(\mathcal {Q})\) occurs if \(\mathcal {Q}\) contains a collision with TL, BL, TR, BR distinct.
- \(\mathsf{Coll}_{2}(\mathcal {Q})\) occurs if \(\mathcal {Q}\) contains a collision with either TL = BL or TR = BR.
- \(\mathsf{Coll}_{3}(\mathcal {Q})\) occurs if \(\mathcal {Q}\) contains a collision with either TL = BR or BL = TR.
For example, \(\mathsf{Coll}_{2}(\mathcal {Q})\) occurs if there exist values \(A, B, L, R, S, A', B', L', R', S'\) such that (1)–(4) hold and such that \((A, B||L, R) = (B, L||R, S)\). Since BL \(\ne \) BR and TL \(\ne \) TR in any collision, we have the following proposition.
Proposition 1
\(\mathsf{Coll}(\mathcal {Q}) \implies \mathsf{Coll}_{1}(\mathcal {Q}) \vee \mathsf{Coll}_{2}(\mathcal {Q}) \vee \mathsf{Coll}_{3}(\mathcal {Q})\) for any query history \(\mathcal {Q}\).
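In view of proving Theorem 1, let \({\mathcal {A}}\) be an arbitrary q-query adversary for Tandem-DM, and let \({\mathcal {A}}'\) be obtained from \({\mathcal {A}}\) as outlined above; let \(\mathcal {Q}\) be the query history of \({\mathcal {A}}\) and \(\mathcal {Q}'\) be the query history of \({\mathcal {A}}'\). Then by (9), it suffices to show that
\[\Pr [\mathsf{Xor}(\mathcal {Q}')] + \Pr [\mathsf{FB}(\mathcal {Q}')] \le 2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }, \qquad \Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \frac{4q\alpha }{N'} + \frac{4q}{N'},\]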
since the sum of the above probabilities is an upper bound for \(\Pr [\mathsf{Coll}(\mathcal {Q})]\). Moreover, by Proposition 1, \(\Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\) can be upper bounded by finding upper bounds for \(\Pr [\mathsf{Coll}_i(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\) for \(i = 1, 2, 3\) and taking the sum of these. We now upper bound these various probabilities in a series of propositions. For these propositions, q, N and \(\alpha \) are as in Theorem 1, and \(\mathcal {Q}'\) is the query history of any adversary \({\mathcal {A}}'\) as just specified. We emphasize that \(|\mathcal {Q}'| \le 2q\) and that probabilities are taken over the random cipher E and over the coins of \({\mathcal {A}}'\), if any (it inherits these from \({\mathcal {A}}\)).
Proposition 2
\(\Pr [\mathsf{Xor}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\) and \(\Pr [\mathsf{FB}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\).
Proof
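Without loss of generality, we may assume that \({\mathcal {A}}'\) always makes exactly 2q queries. Let \(\mathcal {Q}' = \{(X'_i, K'_i, Y'_i)\}_{i=1}^{2q}\) denote the query history of \({\mathcal {A}}'\). Since
\[\Pr \left[ \left| \{i : X'_i \oplus Y'_i = Z\}\right| > \alpha \right] \le \binom{2q}{\alpha }\left( \frac{1}{N'}\right) ^{\alpha } \le \left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\]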
for each \(Z\in \{0,1\}^n\), we have
\[\Pr [\mathsf{Xor}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }.\]
\(\Pr [\mathsf{FB}(\mathcal {Q}')]\) can be bounded similarly.\(\square \)
Proposition 3
\(\Pr [\mathsf{Coll}_{1}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 4q\alpha /N'\).
Proof
Let
\[\mathsf{Success}_1(\mathcal {Q}'_i) = \mathsf{Coll}_{1}(\mathcal {Q}'_i) \wedge \lnot \mathsf{Coll}_{1}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{FB}(\mathcal {Q}'_{i-1})\]
for \(i = 1\ldots 2q\). Then \(\Pr [\mathsf{Coll}_{1}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \sum _{i=1}^{2q}\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)]\) and \(\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)] \le \Pr [\mathsf{Coll}_{1}(\mathcal {Q}'_i) | \lnot \mathsf{Coll}_{1}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{FB}(\mathcal {Q}'_{i-1})]\).
Fix a value of i, \(1 \le i \le 2q\). We call the ith query made by \({\mathcal {A}}'\) the last query. If \(\mathsf{Success}_1(\mathcal {Q}'_i)\) occurs, then, by symmetry, the adversary (i.e., \({\mathcal {A}}'\)) can use its last query either as query TL or as query BL of a collision in which TL, BL, TR and BR are distinct. Moreover, the last query could be either a forward query or a backward query. This gives rise to four possible cases, and we bound \(\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)]\) for each separately. (We note the very first case, “TL forward,” is the case we discussed in “Appendix.”) For each case, we call the last query successful if this query completes a collision with TL, BL, TR, BR distinct and where the last query is used in the position stipulated by that case (e.g., for the case “TL forward,” the last query must be used in position TL).
TL forward: Let the last query be \(E_{B||L}(A)\). Call a value R good if there exists a query of the form \((B, L||R, \cdot )\) in \(\mathcal {Q}'\) that was obtained by \({\mathcal {A}}'\) as a backward query. We note that because of (7), \(\lnot \mathsf{FB}(\mathcal {Q}'_{i-1})\) implies there are at most \(\alpha \) good R’s.
We claim that for the last query to be successful the value R returned as an answer to the query must be good. Indeed, let R be the value returned; then a prerequisite for the query to be successful is that there be a query of the form \((B, L||R, \cdot )\) in \(\mathcal {Q}'_{i-1}\). We claim that this query must have been obtained as a backward query. Indeed, assume that the query \((B, L||R, \cdot )\) was obtained as a forward query \(E_{L||R}(B)\) by \({\mathcal {A}}'\). Then, by construction, \({\mathcal {A}}'\) would have immediately followed this query by the query \(E_{B||L}^{-1}(R)\) unless \({\mathcal {A}}'\) already knew the answer to \(E_{B||L}^{-1}(R)\). Either way, \({\mathcal {A}}'\) would have the query \((A, B||L, R)\) in its query history prior to the ith (forward) query \(E_{B||L}(A)\), a contradiction since \({\mathcal {A}}'\) never makes a query to which it knows the answer. Thus the value R returned as an answer to the query \(E_{B||L}(A)\) must be good for the query to be successful.
Since there are at most \(\alpha \) good values of R and since \({\mathcal {A}}'\) makes at most 2q queries, the probability that the last query is successful is therefore at most \(\alpha /(2^n - 2q) = \alpha /N'\).
TL backward: Let the last query be \(E_{B||L}^{-1}(R)\). For the last query to be successful, there must be a (necessarily unique) query BL \(= (B, L||R, S) \in \mathcal {Q}'_{i-1}\), for some value \(S \in \{0,1\}^n\). From the condition \(B \oplus S = B' \oplus S'\) and from \(\lnot \mathsf{Xor}(\mathcal {Q}'_{i-1})\), there are at most \(\alpha \) possibilities for the query BR. As each query BR uniquely determines the query TR, there are at most \(\alpha \) possibilities for the query TR as well and thus at most \(\alpha \) possibilities for the value \(A' \oplus R'\). Thus the value A returned by the last query has chance at most \(\alpha /N'\) that \(A \oplus R\) will be equal to \(A' \oplus R'\) for one of these values \(A' \oplus R'\), and so the last query has chance at most \(\alpha /N'\) of being successful.
BL forward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL backward. The chance of success in this case is therefore at most \(\alpha /N'\).
BL backward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL forward. The chance of success in this case is therefore at most \(\alpha /N'\).
The chance a forward last query is successful is therefore at most \(2\alpha /N'\) (adding the TL and BL forward cases), and likewise, the chance that a backward last query is successful is at most \(2\alpha /N'\). Thus \(\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)] \le 2\alpha /N'\) for all i and \(\sum _{i=1}^{2q}\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)] \le 4q\alpha /N'\). \(\square \)
Proposition 4
\(\Pr [\mathsf{Coll}_{2}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q/N'\).
Proof
Note that when TL = BL, \(B||L = L||R\), so \(B = L = R\); moreover \(R = S\) and \(A = B\), so \(A = B = L = R = S\). For the adversary to obtain a collision with TL = BL, therefore, it must obtain a query of the form \((U, U||U, U)\). The same argument applies to the case TR = BR. The chance of a query \(E_{U||U}(U)\) or of a query \(E_{U||U}^{-1}(U)\) being answered by U is at most \(1/N'\). Thus, since 2q queries are made total, \(\Pr [\mathsf{Coll}_{2}(\mathcal {Q}')] \le 2q/N'\). \(\square \)
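The claim that TL = BL forces \(A = B = L = R = S\) can be checked exhaustively for a small word size. The sketch below is an illustration only: it models a query \((X, K_1||K_2, Y)\) as the tuple `(X, (K1, K2), Y)`, and the function name is hypothetical.

```python
from itertools import product


def tl_equals_bl_solutions(n):
    """All (A, B, L, R, S) of n-bit values with (A, B||L, R) == (B, L||R, S) as queries."""
    values = range(2 ** n)
    return [
        (a, b, l, r, s)
        for a, b, l, r, s in product(values, repeat=5)
        if (a, (b, l), r) == (b, (l, r), s)  # TL query equals BL query
    ]


solutions = tl_equals_bl_solutions(3)
# Every solution should consist of five copies of a single value U.
print(len(solutions), all(len(set(sol)) == 1 for sol in solutions))
```

For \(n = 3\) the enumeration runs over \(8^5\) tuples and finds exactly one solution per value U, corresponding to the query \((U, U||U, U)\).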
Proposition 5
\(\Pr [\mathsf{Coll}_{3}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q\alpha /N' + 2q/N'\).
Proof
Note that in a collision with TL = BR, we must have TL \(\ne \) BL and \(A \oplus R = B \oplus S\) (since \(B \oplus S = B' \oplus S' = A \oplus R\), using TL = BR). Say the event \(\mathsf{Coll}_{3}'(\mathcal {Q}'_i)\) occurs if there exist distinct queries \((A, B||L, R)\), \((B, L||R, S)\) in \(\mathcal {Q}_i'\) such that \(A\oplus R = B \oplus S\). With the same argument applied to the case BL = TR, we have \(\mathsf{Coll}_{3}(\mathcal {Q}_i') \implies \mathsf{Coll}_{3}'(\mathcal {Q}_i')\). Therefore it suffices to show \(\Pr [\mathsf{Coll}_{3}'(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q\alpha /N' + 2q/N'\).
The analysis now proceeds rather similarly to Proposition 3. Let
\[\mathsf{Success}_3'(\mathcal {Q}'_i) = \mathsf{Coll}_{3}'(\mathcal {Q}'_i) \wedge \lnot \mathsf{Coll}_{3}'(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{FB}(\mathcal {Q}'_{i-1}).\]
We have \(\Pr [\mathsf{Coll}_{3}'(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \sum _{i=1}^{2q}\Pr [\mathsf{Success}_3'(\mathcal {Q}'_i)]\).
Fix a value of i, \(1 \le i \le 2q\), and call the ith query made by \({\mathcal {A}}'\) the last query. If \(\mathsf{Success}_3'(\mathcal {Q}'_i)\) occurs, then the adversary (i.e., \({\mathcal {A}}'\)) can use its last query either as query TL or as query BL of its \(\mathsf{Coll}_{3}'\)-solution. This gives rise to four possible cases given that the last query could be either forward or backward. In each case, we call the last query successful if \(\mathsf{Success}_3'(\mathcal {Q}'_i)\) occurs and if the last query can be used in the position prescribed by that case (either TL or BL) in the \(\mathsf{Coll}_{3}'\)-solution.
TL forward: We can use exactly the same analysis as in the case “TL forward” of Proposition 3. The probability that the last query is successful is therefore at most \(\alpha /N'\).
TL backward: Let \(E_{B||L}^{-1}(R)\) be the last query. For the last query to be successful, there must be a (necessarily unique) query of the form \((B, L||R, S) \in \mathcal {Q}_{i-1}'\), for some \(S \in \{0,1\}^n\). Since the answer A to the last query must be such that \(A \oplus R = B \oplus S\) (as per the definition of \(\mathsf{Coll}_{3}'\)) and \(B \oplus S\) is uniquely determined, the last query has chance at most \(1/N'\) of success.
BL forward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL backward. The chance of success in this case is therefore at most \(1/N'\).
BL backward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL forward. The chance of success in this case is therefore at most \(\alpha /N'\).
The chance a forward last query is successful is therefore at most \((\alpha +1)/N'\) (adding the TL and BL forward cases), and likewise, the chance that a backward last query is successful is at most \((\alpha +1)/N'\). Thus \(\Pr [\mathsf{Success}_3'(\mathcal {Q}'_i)] \le (\alpha +1)/N'\) for all i and \(\sum _{i=1}^{2q}\Pr [\mathsf{Success}_3'(\mathcal {Q}'_i)] \le 2q\alpha /N' + 2q/N'\). (In fact, we even have \(\Pr [\mathsf{Coll}_{3}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q\alpha /N' + 2q/N'\) since \(\lnot \mathsf{Xor}(\mathcal {Q}')\) was never used in the above.) \(\square \)
Taking the sum of the bounds of Propositions 3, 4 and 5, one obtains that
However, the cases TL forward and BL backward of Proposition 3 and the cases TL forward and BL backward of Proposition 5 reference the same events (the adversary is successful in case TL forward of Proposition 3 if and only if it is successful in case TL forward of Proposition 5, and likewise for the BL backward cases), which results in an “overcounting” of the adversary’s probability of success by \(2q\alpha /N'\). A more careful accounting of the adversary’s probability of success thus shows
Here we have not established (10) entirely formally, though this is the bound we use for \(\Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\) in Theorem 1. Establishing (10) formally would require dividing the event \(\mathsf{Coll}(\mathcal {Q})\) into a different, less intuitive set of events than \(\mathsf{Coll}_{1}(\mathcal {Q})\), \(\mathsf{Coll}_{2}(\mathcal {Q})\), \(\mathsf{Coll}_{3}(\mathcal {Q})\), events that are directly based on those that occur in the case analyses of Propositions 3–5. (For example, one of these events would be the event that the adversary ever obtains a “good R” through a forward or backward query, as defined for forward queries in case TL forward of Proposition 3 and implicitly defined (by symmetry) for backward queries in case BL backward of Proposition 3; another event would cover the cases TL backward and BL forward of Proposition 5, and so on.) The current form of the proof is our best compromise between readability and formality. In any case, the difference between \(4q\alpha /N'\) and \(6q\alpha /N'\) is relatively minor.
Summing (10) with the bounds of Proposition 2 and using (9), we obtain
Since (11) holds for an arbitrary q-query adversary \({\mathcal {A}}\), this establishes Theorem 1.
4 Preimage Resistance of Tandem-DM
To build some intuition for our preimage resistance analysis, let us start by considering the much easier problem of constructing a 3n-bit to 2n-bit compression function H based on two smaller 3n-bit to n-bit underlying primitives f and \(f^{\prime }\). An obvious approach is simply to concatenate the outputs of f and \(f^{\prime }\), that is, let \(H(B)=f(B) ||f^{\prime }(B)\) for \(B\in \{0,1\}^{3n}\). If f and \(f^{\prime }\) are modeled as independently sampled, ideally random functions, then it is not hard to see that H behaves ideally as well. In particular, it is preimage resistant up to \(2^{2n}\) queries (to f and \(f^{\prime }\)).
When switching to a blockcipher-based scenario, it is natural to replace f and \(f^{\prime }\) in the construction above by E, resp. \(E^{\prime }\), both run in Davies–Meyer mode. In other words, for blockciphers E and \(E^\prime \) both with 2n-bit keys and operating on n-bit blocks, define \(H(A||B) = (E_B(A)\oplus A) ||(E^{\prime }_B(A)\oplus A)\) where \(A\in \{0,1\}^n\) and \(B\in \{0,1\}^{2n}\). While there is every reason to believe this construction maintains preimage resistance up to \(2^{2n}\) queries, the standard proof technique against adaptive adversaries falls short significantly. Indeed, the usual argument goes that the ith query an adversary makes to E using key K will return an answer uniform from a set of size at least \(2^n-(i-1)\), and thus, the probability of hitting a prespecified value is at most \(1/(2^n-(i-1)) < 1/(2^n-q)\). Unfortunately, once q approaches \(2^n\), the denominator tends to zero (rendering the bound useless). As a result, one cannot hope to prove anything beyond \(2^n\) queries using this method. This restriction holds even for a “typical” bound of type \(q/(2^n-q)^2\).
When considering non-adaptive adversaries only, the situation is far less grim. Such adversaries need to commit to all queries in advance, which allows bounding the probability of each individual query hitting a prespecified value by \(2^{-n}\). While obviously there are dependencies (in the answers), these can safely be ignored when a union bound is later used to combine the various individual queries. Since the q offset has disappeared from the denominator, the typical bound \(q/(2^n)^2\) would give the desired security.
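To make the contrast between the two regimes concrete, the following sketch compares the per-query bound \(1/(2^n-q)\) available against adaptive adversaries with the bound \(2^{-n}\) available against non-adaptive ones. The function names and the toy block length \(n=8\) are our own choices for illustration; the ratio of the two bounds is \(2^n/(2^n-q)\), which blows up as q approaches \(2^n\).

```python
from fractions import Fraction


def adaptive_per_query_bound(n: int, q: int) -> Fraction:
    """Bound 1/(2^n - q) on one adaptive query hitting a prespecified value."""
    N = 2 ** n
    if q >= N:
        # The denominator is no longer positive, so the bound says nothing.
        raise ValueError("bound is vacuous once q reaches 2^n")
    return Fraction(1, N - q)


def nonadaptive_per_query_bound(n: int) -> Fraction:
    """Bound 2^{-n} on one non-adaptive query hitting a prespecified value."""
    return Fraction(1, 2 ** n)


if __name__ == "__main__":
    n = 8  # toy block length, chosen only to keep the numbers small
    for q in (1, 128, 192, 255):
        ratio = adaptive_per_query_bound(n, q) / nonadaptive_per_query_bound(n)
        print(f"q = {q:3d}: adaptive bound is {float(ratio):.3f} times the non-adaptive one")
```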
Our solution, then, is to force an adaptive adversary to behave non-adaptively. As this might sound a bit cryptic, let us be more precise. Consider an adversary adaptively making queries to the blockcipher, using the same key throughout. As soon as the number of queries to this key passes a certain threshold, we give the remaining queries to the blockcipher using this very key for free. We will refer to this event as a super query. Since these free queries are all asked in one go, they can be dealt with non-adaptively, preempting the problems that occur (in standard proofs) due to adaptive queries. Nonetheless, for every super query we need to hand out a very large number of free queries, which can aid the adversary. Thus we need to limit the number of super queries an adversary can make by setting the threshold that triggers a super query sufficiently high. In fact, we set the threshold at exactly half the total number of queries that can be made under a given key (i.e., it is set at \(2^n/2\) queries). This effectively doubles the adversary’s query budget, since for every query the adversary makes it can get another one later “for free” (if it keeps on making queries under the same key), but such a doubling of the number of queries does not lead to an unacceptable deterioration of the security bound.
Now our preimage resistance bound for Tandem-DM, parameterized by a certain parameter \(\alpha \), is given as follows.
Theorem 2
Let \(N=2^n, q<N^2\) and let \(\alpha > 0\) be an integer. Then
Proof
Let \(U||V \ne 0^n||0^n\) be the point to invert, chosen by the adversary before making any queries to E. We upper bound the probability that, in q queries, the adversary finds a point \(A||B||M \in \{0,1\}^{3n}\) such that \(\mathsf{TDM}^{E}(A||B||M) = U||V\).
In this analysis, we give free queries to the adversary as follows: Whenever the adversary has made \(N/2\) queries under a given key \(K||L\), and after the \((N/2)\)-th such query has been answered and placed in the query history, we give the remaining \(N/2\) queries under the key \(K||L\) for free to the adversary, in any order. In this case, we say that a super query occurs; every query in the query history is either part of a super query, or not; in the latter case we call the query a “normal query.” (Thus, in this theorem, normal queries are exactly the non-free queries.) We alert the reader to the fact that a “super query” consists of a set of \(N/2\) queries, whereas a “normal query” is a single query.
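The free-query mechanism just described can be sketched as a lazily sampled ideal cipher that triggers a super query once \(N/2\) normal queries have been made under a key. This is a toy simulation under our own assumptions: the class name `SuperQueryCipher`, its methods, and the representation of keys as arbitrary hashable values are hypothetical, and repeated queries are simply answered from the table without being counted.

```python
import random


class SuperQueryCipher:
    """Toy lazily sampled ideal cipher on n-bit blocks with super queries."""

    def __init__(self, n, seed=None):
        self.N = 2 ** n
        self.rng = random.Random(seed)
        self.fwd = {}      # key -> {x: y}
        self.bwd = {}      # key -> {y: x}
        self.normal = {}   # key -> number of normal queries made so far
        self.history = []  # entries (x, key, y, kind), kind is "normal" or "super"

    def _record(self, key, x, y, kind):
        self.fwd.setdefault(key, {})[x] = y
        self.bwd.setdefault(key, {})[y] = x
        self.history.append((x, key, y, kind))

    def _after_normal(self, key):
        self.normal[key] = self.normal.get(key, 0) + 1
        if self.normal[key] == self.N // 2:
            self._super_query(key)

    def _super_query(self, key):
        # Hand out the remaining N/2 queries under this key in one go.
        free_x = [x for x in range(self.N) if x not in self.fwd[key]]
        free_y = [y for y in range(self.N) if y not in self.bwd[key]]
        self.rng.shuffle(free_y)
        for x, y in zip(free_x, free_y):
            self._record(key, x, y, "super")

    def enc(self, key, x):
        table = self.fwd.setdefault(key, {})
        if x in table:
            return table[x]  # answer already known, not a new query
        used = self.bwd.setdefault(key, {})
        y = self.rng.choice([v for v in range(self.N) if v not in used])
        self._record(key, x, y, "normal")
        self._after_normal(key)
        return y

    def dec(self, key, y):
        table = self.bwd.setdefault(key, {})
        if y in table:
            return table[y]  # answer already known, not a new query
        used = self.fwd.setdefault(key, {})
        x = self.rng.choice([v for v in range(self.N) if v not in used])
        self._record(key, x, y, "normal")
        self._after_normal(key)
        return x
```

In this sketch every normal answer is drawn from at least \(N/2\) unused values, and the answers inside a super query are fixed by a single random matching of the unused inputs to the unused outputs.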
We define an event \(\mathsf{Lucky}(\mathcal {Q})\) on the query history; \(\mathsf{Lucky}(\mathcal {Q})\) occurs if
or if
The adversary obtains a preimage of \(U||V\) precisely if it obtains queries of the form \((A, B||M, R)\), \((B, M||R, S)\) such that \(A \oplus R = U\), \(B \oplus S = V\). It is easy to see these two queries must be distinct; otherwise, we would have \(A = B = M = R = S\) and therefore \(U||V = 0^n||0^n\). We call two queries as above a “winning pair” of queries, where the two elements of a winning pair need not be adjacent in the query history (and could be in any order). We speak of the “first” and “second” query in a winning pair referring to the order in which they appear in the query history.
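As an illustration of the winning-pair condition, the following sketch evaluates Tandem-DM on small integers and checks whether two query triples form a winning pair for a target \(U||V\). The toy ideal cipher (one shuffled table per key), the function names, and the encoding of a key \(K||L\) as a Python tuple `(K, L)` are assumptions made only for this example.

```python
import random


def make_ideal_cipher(n, seed=0):
    """Return E(key, x): an independent random permutation of n-bit values per key."""
    rng = random.Random(seed)
    tables = {}

    def E(key, x):
        if key not in tables:
            perm = list(range(2 ** n))
            rng.shuffle(perm)
            tables[key] = perm
        return tables[key][x]

    return E


def tandem_dm(E, A, B, M):
    """Evaluate Tandem-DM on A||B||M; return the output and the two queries used."""
    R = E((B, M), A)  # top query (A, B||M, R)
    S = E((M, R), B)  # bottom query (B, M||R, S)
    return (A ^ R, B ^ S), [(A, (B, M), R), (B, (M, R), S)]


def is_winning_pair(top, bottom, U, V):
    """Check that top = (A, B||M, R) and bottom = (B, M||R, S) invert U||V."""
    A, (B, M), R = top
    B2, (M2, R2), S = bottom
    chained = (B2, M2, R2) == (B, M, R)
    return chained and (A ^ R) == U and (B ^ S) == V


if __name__ == "__main__":
    E = make_ideal_cipher(4, seed=7)
    (U, V), (top, bottom) = tandem_dm(E, 1, 2, 3)
    print(is_winning_pair(top, bottom, U, V))
```

Note that if the two triples coincide, the chaining condition forces \(A = B = M = R = S\), so the only target such a pair can invert is \(0^n||0^n\), in line with the remark above.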
Let \(\mathsf{WinNormal}(\mathcal {Q})\) be the event that the adversary obtains a winning pair in which the second query is a normal query. Let \(\mathsf{WinSuper}_1(\mathcal {Q})\) be the event that the adversary obtains a winning pair in which the second query is part of a super query and the first is either normal or part of a super query, but is not part of the same super query as the second. Finally let \(\mathsf{WinSuper}_2(\mathcal {Q})\) be the event that the adversary obtains a winning pair in which both queries of the pair are part of the same super query. It is then clear that if the adversary wins, one of the events
\[\mathsf{WinNormal}(\mathcal {Q}),\quad \mathsf{WinSuper}_1(\mathcal {Q}),\quad \mathsf{WinSuper}_2(\mathcal {Q})\]
occurs. In particular, thus, one of the four events
\[\mathsf{Lucky}(\mathcal {Q}),\quad \mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}),\quad \mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}),\quad \mathsf{WinSuper}_2(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})\]
must occur if the adversary wins. We upper bound the probability of each of these four events and sum the upper bounds in order to obtain an upper bound on the adversary’s advantage.
We start by upper bounding \(\Pr [\mathsf{Lucky}(\mathcal {Q})]\). For this we introduce two new events. Let \(\mathcal {Q}_\mathrm{n}\) be the restriction of \(\mathcal {Q}\) to normal queries, and let \(\mathcal {Q}_\mathrm{s}\) be the restriction of \(\mathcal {Q}\) to queries that are part of super queries. Let \(\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})\) be the event that either
or
The event \(\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})\) is likewise defined with respect to \(\mathcal {Q}_\mathrm{s}\). Obviously, \(\mathsf{Lucky}(\mathcal {Q}) \implies \mathsf{Lucky}_\mathrm{n}(\mathcal {Q}) \vee \mathsf{Lucky}_\mathrm{s}(\mathcal {Q})\), so it suffices to upper bound \(\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})\) and \(\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})\) and to sum these upper bounds.
Since every answer to a normal query, forward or backward, comes at random from a set of size at least \(N/2\), and since at most q normal queries are made, we have that
To upper bound \(\Pr [\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})]\), note that there occur at most \(q/(N/2) = 2q/N\) super queries, since it costs \(N/2\) queries to set up a super query for a given key. Each super query contains \(N/2\) queries, so we can define random variables \(Z_{i,j}\) for \(1\le i\le 2q/N\) and \(1\le j\le N/2\), where \(Z_{i,j}=1\) if and only if \(X \oplus Y = U\) for the j-th query \((X, K||L, Y)\) within the ith super query. Then we have
Since \(\mathrm {E}(Z_{i,j})\le 2/N\) for each i and j, we have \(\mathrm {E}(Z)\le (2q/N)(N/2)(2/N)=2q/N\). Therefore, by Markov’s inequality, the probability that
\[Z = \sum _{i,j} Z_{i,j} > \alpha \]
is at most \(2q/(\alpha N)\). Now by a union bound and a symmetric argument (for \(X \oplus Y = V\)), we obtain that \(\Pr [\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})] \le 4q/(\alpha N)\). Summing the upper bounds for \(\Pr [\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})]\) and \(\Pr [\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})]\), we thus obtain that
To upper bound \(\Pr [\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\), we use a “wish list” argument. As the adversary makes queries, we maintain two sequences \(\mathcal {W}_\mathrm{T}\) and \(\mathcal {W}_\mathrm{B}\) called wish lists. These are initially empty. For each query \((X, K||L, Y)\) added to the query history (whether normal or part of a super query), we update the wish lists as follows:
-
1.
If \(X \oplus Y = U\), then \((K, L||Y, K \oplus V)\) is added to \(\mathcal {W}_\mathrm{B}\).
-
2.
If \(X \oplus Y = V\), then \((L \oplus U, X||K, L)\) is added to \(\mathcal {W}_\mathrm{T}\).
The following properties are easy to check: (1) A query never “adds itself” to a wish list (this uses \(U||V \ne 0^n||0^n\)); (2) the elements within each wish list are all distinct from one another; (3) the adversary obtains a winning pair precisely if it obtains a query that is already in one of its wish lists (at the moment of insertion of that query into the query history). And by definition of \(\mathsf{Lucky}(\mathcal {Q})\), the wish lists never exceed length \(2\alpha \) as long as \(\lnot \mathsf{Lucky}(\mathcal {Q})\) holds.
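The two update rules and property (3) can be sketched as a single bookkeeping step. The function name `update_wish_lists`, the use of Python lists for \(\mathcal {W}_\mathrm{T}\) and \(\mathcal {W}_\mathrm{B}\), and the tuple encoding `(X, (K, L), Y)` of a query are our own illustrative choices; membership is tested before the lists are updated, matching "at the moment of insertion."

```python
def update_wish_lists(query, U, V, wish_top, wish_bottom):
    """Process one query (X, K||L, Y): report a win, then extend the wish lists.

    Returns True if the query was already wished for, i.e. it completes a
    winning pair for the target U||V with an earlier query.
    """
    X, (K, L), Y = query
    won = query in wish_top or query in wish_bottom
    if X ^ Y == U:
        # The query can serve as a top query; wish for the matching bottom query.
        wish_bottom.append((K, (L, Y), K ^ V))
    if X ^ Y == V:
        # The query can serve as a bottom query; wish for the matching top query.
        wish_top.append((L ^ U, (X, K), L))
    return won


if __name__ == "__main__":
    U, V = 3, 5
    wish_top, wish_bottom = [], []
    print(update_wish_lists((1, (2, 7), 2), U, V, wish_top, wish_bottom))
    print(update_wish_lists((2, (7, 2), 7), U, V, wish_top, wish_bottom))
```

In the usage example, the first query satisfies \(X \oplus Y = U\) and places the triple \((2, 7||2, 7)\) on \(\mathcal {W}_\mathrm{B}\); the second query is exactly that triple, so it is reported as a win.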
Let \(E_{K||L}(X)\) be a query made to E during the adversary’s attack (either a normal query, or as part of a super query). If, at the moment when the query is being made, there is an element of the form \((X, K||L, Y)\) in (at least) one of the wish lists for some \(Y\in \{0,1\}^n\), then we say this wish list element is being “wished for” when the query \(E_{K||L}(X)\) is made. We similarly say the wish list element \((X, K||L, Y)\) is being “wished for” if the query \(E_{K||L}^{-1}(Y)\) is made (note that in this case, the query \(E_{K||L}^{-1}(Y)\) is necessarily normal, since a super query is, by default, implemented by forward queries). We note, importantly, that any wish list element can only be wished for once, since \(E_{K||L}(\cdot )\) is a permutation.
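Let \(\mathsf{NormalWishGranted}_{\mathrm{T}, i}\) be the event that a normal query \((X, K||L, Y)\), when added to the query list, is equal to the ith element of \(\mathcal {W}_\mathrm{T}\) (presuming \(\mathcal {W}_\mathrm{T}\) has length at least i when the query is added). Likewise define \(\mathsf{NormalWishGranted}_{\mathrm{B}, i}\) with respect to the list \(\mathcal {W}_\mathrm{B}\). Then by the above remarks
\[\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}) \implies \bigvee _{i=1}^{2\alpha } \left( \mathsf{NormalWishGranted}_{\mathrm{T}, i} \vee \mathsf{NormalWishGranted}_{\mathrm{B}, i}\right) ,\]
so by a union bound
\[\Pr [\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le \sum _{i=1}^{2\alpha } \left( \Pr [\mathsf{NormalWishGranted}_{\mathrm{T}, i}] + \Pr [\mathsf{NormalWishGranted}_{\mathrm{B}, i}]\right) .\]
Because each wish list element can only be wished for once and because a normal query is answered at random uniformly from a set of size at least \(N/2\), we have
\[\Pr [\mathsf{NormalWishGranted}_{\mathrm{T}, i}] \le \frac{2}{N}, \qquad \Pr [\mathsf{NormalWishGranted}_{\mathrm{B}, i}] \le \frac{2}{N},\]
and therefore
\[\Pr [\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le \frac{8\alpha }{N}. \tag{13}\]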
We now upper bound \(\Pr [\mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\). We keep the same definition of the wish lists \(\mathcal {W}_\mathrm{T}\), \(\mathcal {W}_\mathrm{B}\) as above. We let \(\mathsf{SuperWishGranted}^1_{\mathrm{T}, i}\) be the event that a query \((X, K||L, Y)\) that is part of a super query is equal to the ith element of \(\mathcal {W}_\mathrm{T}\), where \(\mathcal {W}_\mathrm{T}\) has length \(\ge i\) before any of the super queries under key \(K||L\) have been made. The event \(\mathsf{SuperWishGranted}^1_{\mathrm{B}, i}\) is similarly defined. By the definition of \(\mathsf{WinSuper}_1(\mathcal {Q})\), we have that
\[\mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}) \implies \bigvee _{i=1}^{2\alpha } \left( \mathsf{SuperWishGranted}^1_{\mathrm{T}, i} \vee \mathsf{SuperWishGranted}^1_{\mathrm{B}, i}\right) .\]
Assume, for a given i, that the i-th element of \(\mathcal {W}_\mathrm{T}\) (say) is \((X, K||L, Y)\), and that a super query is about to be made for the key \(K||L\), and that X is in the domain of the super query. Then the probability that \(E_{K||L}(X) = Y\) is at most \(2/N\) (more precisely, it is exactly \(2/N\) unless Y is not in the super query’s range, in which case it is 0). Thus, arguing similarly for the list \(\mathcal {W}_\mathrm{B}\), we obtain that
\[\Pr [\mathsf{SuperWishGranted}^1_{\mathrm{T}, i}] \le \frac{2}{N}, \qquad \Pr [\mathsf{SuperWishGranted}^1_{\mathrm{B}, i}] \le \frac{2}{N}.\]
Therefore
\[\Pr [\mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le \frac{8\alpha }{N}. \tag{14}\]
We finally bound \(\Pr [\mathsf{WinSuper}_2(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\). Note the event \(\mathsf{WinSuper}_2(\mathcal {Q})\) can only occur when a super query occurs for a key of the form \(L||L\) and when that super query results in the triples \((U \oplus L, L||L, L)\), \((L, L||L, L \oplus V)\) being added to the query history. The probability that \(E_{L||L}(U\oplus L) = L\) is at most \(2/N\), and, conditioned on the event that \(E_{L||L}(U\oplus L) = L\), the probability that \(E_{L||L}(L) = L \oplus V\) is at most \(1/(N/2 - 1)\). Since at most \(2q/N\) super queries occur, we thus find that
\[\Pr [\mathsf{WinSuper}_2(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le \frac{2q}{N}\cdot \frac{2}{N}\cdot \frac{1}{N/2-1} = \frac{8q}{N^2(N-2)}. \tag{15}\]
The theorem follows by summing (12), (13), (14) and (15). \(\square \)
As a numerical example, for \(n=128\) and \(q=2^{245.99}\), let \(\alpha = q^{1/2}/2\). Then by Theorem 2, we have
Corollary 2
\(\lim _{n\rightarrow \infty } \mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(N^2/n)=0\).
Proof
By setting \(\alpha = q^{1/2}/2\) (note that \(\alpha \) is allowed to depend on q), the bound from Theorem 2 simplifies to
If \(q =N^2/n\), then this bound can be rewritten as
This converges to zero as \(n\rightarrow \infty \). \(\square \)
5 Conclusion
In this work, we have shown that an earlier work concerning the security of Tandem-DM is incorrect. However, with a new proof (exploiting new ideas), we have shown that, in the ideal cipher model, Tandem-DM is collision resistant almost up to the birthday bound. We note that our collision resistance bound has the form \(O(q/(2^n-q))\) rather than \(O(q^2/(2^n-q)^2)\), ignoring log factors. Both bounds reach constant values when \(q = {\varOmega }(2^n)\); however, for smaller q, \(q^2/(2^n - q)^2\) grows more slowly than \(q/(2^n-q)\), so our bound is (only) “linear birthday” rather than true “quadratic birthday.” We leave it as an open problem to prove “quadratic birthday”-type collision resistance for Tandem-DM (as exists for Abreast-DM and Hirose’s scheme).
On a high level, our proof of collision resistance adheres to a (by now) standard framework. We first modify the collision-finding adversary by giving it several “free” queries, and subsequently, we bound the modified adversary’s chance of success using a case analysis. This approach allows us to easily bound both the number of free queries and the probability of a query (free or not) causing a collision.
In contrast, the FGL proof directly uses a case analysis and subsequently uses free queries within the case analysis. This ad hoc addition of free queries (and its binding to a particular case) is problematic, as it does not allow proper accounting of the free queries. In particular, if a free query is fresh, it might cause a collision (or other bad event) elsewhere, whereas if the free query has actually been asked before, no new randomness can be extracted from it. Thus, apart from having established the security of Tandem-DM, we hope that our work also serves as a useful reminder of some of the subtleties involved in ICM proofs and as a guideline on how to avoid certain pitfalls.
Using a new technique based on super queries, we provided an improved bound on the preimage security of Tandem-DM. Specifically, we showed that asymptotically an adversary must make at least \(2^{2n-10}\) blockcipher queries to achieve chance 0.5 of inverting a randomly chosen point in the range. This bound improves upon the previous best bound of \({\varOmega }(2^n)\) queries and is optimal up to a constant factor. We note that the super query technique applies to many classical double-block-length compression functions such as Abreast-DM and Hirose’s scheme, as detailed in [1].
Notes
More formally, if its query history does not contain any triple of the form \((\cdot , U||V, W)\).
More formally, if its query history does not contain any triple of the form \((U, V||W, \cdot )\).
Neither do we, in fact. Using a careful trick, we manage to upper bound the number of good R’s by only considering the possibilities for the query BL rather than by considering the possible triples (BL, TR, BR). In the ePrint version of [10], however, we give for comparison the “brute-force” proof which uses the method of upper bounding the number of triples (BL, TR, BR).
References
F. Armknecht, E. Fleischmann, M. Krause, J. Lee, M. Stam and J. Steinberger, The preimage security of double-block-length compression functions. Asiacrypt 2011, LNCS 7073, pp. 233–251. Springer, Heidelberg (2011)
Y. Dodis and J. Steinberger, Message Authentication Codes from Unpredictable Block Ciphers. Crypto 2009, LNCS 5677, pp. 267–285. Springer, Heidelberg (2010). Full version available at http://people.csail.mit.edu/dodis/ps/tight-mac.ps
E. Fleischmann, C. Forler, M. Gorski and S. Lucks, Collision Resistant Double-Length Hashing. ProvSec 2010, LNCS 6401, pp. 102–118. Springer, Heidelberg (2010)
E. Fleischmann, M. Gorski and S. Lucks, On the security of Tandem-DM. FSE 2009, LNCS 5665, pp. 84–103. Springer, Heidelberg (2009)
E. Fleischmann, M. Gorski and S. Lucks, Security of cyclic double block length hash functions, Cryptography and Coding, 12th IMA International Conference, Cirencester, UK, LNCS 5921 pp. 153–175. Springer, Heidelberg (2009)
S. Hirose, Provably secure double-block-length hash functions in a black-box model. ICISC 2004, LNCS 3506, pp. 330–342. Springer, Heidelberg (2005)
S. Hirose, Some plausible constructions of double-block-length hash functions. FSE 2006, LNCS 4047, pp. 210–225. Springer, Heidelberg (2006)
X. Lai and J. Massey, Hash function based on block ciphers. Eurocrypt 1992, LNCS 658, pp. 55–70. Springer, Heidelberg (1993)
J. Lee and D. Kwon, The security of Abreast-DM in the ideal cipher model. IEICE Transactions 94-A(1), pp. 104–109. (2011) Also available at http://eprint.iacr.org/2009/225
J. Lee, M. Stam and J. Steinberger, The collision security of Tandem-DM in the ideal cipher model, Crypto 2011, LNCS 6841, pp. 561–577. Springer, Heidelberg (2011) ePrint version available at http://eprint.iacr.org/2010/409
J. Lee and J. Steinberger, Multi-property preservation using polynomial-based modes of operation, Eurocrypt 2010, LNCS 6110, pp. 573–596. Springer, Heidelberg (2010)
S. Lucks, A collision-resistant rate-1 double-block-length hash function. Symmetric Cryptography, Dagstuhl Seminar Proceedings 07021 (2007)
O. Özen and M. Stam, Another Glance at Double-Length Hashing, Cryptography and Coding, 12th IMA International Conference, Cirencester, UK, LNCS 5921, pp. 94–115. Springer, Heidelberg (2009)
P. Rogaway and T. Shrimpton, Cryptographic hash-function basics: definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision-resistance. FSE 2004, LNCS 3017, pp. 371–388. Springer, Heidelberg (2004)
P. Rogaway and J. Steinberger, Constructing cryptographic hash functions from fixed-key blockciphers. Crypto 2008, LNCS 5157, pp. 433–450. Springer, Heidelberg (2008)
T. Satoh, M. Haga and K. Kurosawa, Towards secure and fast hash functions. IEICE Transactions 82-A(1), pp. 55–62. (1999)
T. Shrimpton and M. Stam, Building a collision-resistant compression function from non-compressing primitives. ICALP 2008, Part II. LNCS 5126, pp. 643–654, Springer, 2008.
J. Steinberger, The collision intractability of MDC-2 in the ideal-cipher model, Eurocrypt 2007, LNCS 4515, pp. 34–51. Springer, Heidelberg (2007)
M. Stam, Beyond uniformity: better security/efficiency tradeoffs for compression functions, Crypto 2008, LNCS 5157, pp. 397–412. Springer, Heidelberg (2008)
M. Stam, Blockcipher-based hashing revisited, FSE 2009, LNCS 5665, pp. 67–83. Springer, Heidelberg (2009)
D. Wagner, Cryptanalysis of the Yi-Lam hash, Asiacrypt 2000, LNCS 1976, pp. 483–488. Springer, Heidelberg (2000)
X. Yi and K.-Y. Lam, A new hash function based on block cipher. ACISP 1997, Second Australasian Conference on Information Security and Privacy, LNCS 1270, pp. 139–146. Springer, Heidelberg (1997)
Acknowledgments
We thank Frederik Armknecht, Ewan Fleischmann and Matthias Krause for their kind permission to incorporate here the Tandem-DM preimage results from our joint paper [1]. Jooyoung Lee was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2007488).
Appendix: The FGL Collision Resistance Proof
Since the interest in our collision result would be substantially diminished (though not nullified, since our bound is tighter and our proof much shorter) if the FGL collision resistance proof were correct, we detail here some of our objections to [4]. This material also serves as a good introduction to our own proof and will give the reader more intuition about Tandem-DM. We will concentrate on a critique of the FSE’09 paper [4], as the derivation of the main result in the follow-up [3, Theorem 3] builds on the earlier mistakes by Fleischmann et al. [4] (the errors in the original [4] are never mentioned). The ePrint version of [10] contains a detailed list of problems with the later paper.
Starting with a q-query collision-finding adversary \({\mathcal {A}}\), FGL first make the standard assumption that \({\mathcal {A}}\) never makes a query to which it already knows the answer (this could occur in two ways: \({\mathcal {A}}\) could make the exact same query twice, or \({\mathcal {A}}\) could query (say) \(E^{-1}_K(Y)\) after having received Y as an answer beforehand to a query \(E_K(X)\)). This ensures each answer \({\mathcal {A}}\) receives comes uniformly at random from a set of size at least \(2^n - q\) (since \(E_K(\cdot )\) is a random permutation for each K). Moreover, after \({\mathcal {A}}\) makes i queries, its query history will contain exactly i distinct elements.
Say \({\mathcal {A}}\) succeeds at the ith query if \(\mathsf{Coll}(\mathcal {Q}_i)\) holds but \(\mathsf{Coll}(\mathcal {Q}_{i-1})\) and \(\mathsf{Xor}(\mathcal {Q}_{i-1})\) do not hold. By upper bounding the probability that \({\mathcal {A}}\) ever succeeds, we upper bound \(\Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q})]\). (Upper bounding \(\Pr [\mathsf{Xor}(\mathcal {Q})]\) is an easy probability exercise that we overlook for the purposes of this proof sketch.) A good analogy is to view \({\mathcal {A}}\) as trying to complete a puzzle where each element of its query history is a puzzle piece it can use to complete the collision diagram of Fig. 3. We use the expressions “\({\mathcal {A}}\) succeeds,” “\({\mathcal {A}}\) finds a [puzzle] solution” or “\({\mathcal {A}}\) completes a collision” interchangeably (and we will rarely remind that the condition \(\lnot \mathsf{Xor}(\mathcal {Q}_{i-1})\) must hold for \({\mathcal {A}}\) to succeed). We refer to the four queries (in any hypothetical puzzle solution (a.k.a. collision)) as “TL,” “BL,” “TR” and “BR”; see Fig. 3.
Note the constraint \(A||B||L \ne A'||B'||L'\) does not imply that the queries TL, BL, TR, BR are all distinct. For example, one could have TL = BR (in which case \((A, B||L, R) = (B', L'||R', S')\), so \(A = B'\), \(B = L'\), \(L = R'\) and \(R = S'\)) or TL = BL (in which case we have the dramatic consequence that \(A = B = L = R = S\), as is easy to check). This gives rise to several combinatorially distinct cases to consider; \({\mathcal {A}}\)’s chance of obtaining a solution of each kind is upper bounded separately, and these probabilities are added together to form a final upper bound on \({\mathcal {A}}\)’s chance of success.
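The “dramatic consequence” of TL = BL is easy to confirm exhaustively for a toy block length. In the sketch below, the function names and the tuple encoding of a query \((X, K||L, Y)\) as `(X, (K, L), Y)` are our own conventions; the code enumerates all values \((A, B, L, R, S)\) and keeps those for which the triples TL \(= (A, B||L, R)\) and BL \(= (B, L||R, S)\) coincide.

```python
from itertools import product


def tl_bl(A, B, L, R, S):
    """Return the query triples TL = (A, B||L, R) and BL = (B, L||R, S)."""
    return (A, (B, L), R), (B, (L, R), S)


def degenerate_solutions(n):
    """All (A, B, L, R, S) over n-bit values for which TL and BL coincide."""
    values = range(2 ** n)
    found = []
    for t in product(values, repeat=5):
        tl, bl = tl_bl(*t)
        if tl == bl:
            found.append(t)
    return found


if __name__ == "__main__":
    sols = degenerate_solutions(2)
    # Every surviving tuple has all five coordinates equal.
    print(all(len(set(t)) == 1 for t in sols), len(sols))
```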
We shall restrict our critique to FGL’s analysis of the “generic” case when the queries TL, BL, TR, BR are all distinct. We note that, in these types of analyses, the generic case is usually the hardest to handle as \({\mathcal {A}}\)’s job typically grows harder when additional constraints are placed on its solution. (The possibility of reusing the same query in two different positions of the collision diagram does, however, occasionally prove useful to \({\mathcal {A}}\), depending on the construction, so all cases must always be considered.) We call a puzzle solution in which TL, BL, TR, BR are distinct a “generic solution.”
If \({\mathcal {A}}\) succeeds in finding a generic solution, there is a smallest i such that a generic solution can be assembled from the queries in \(\mathcal {Q}_i\). The ith query is then called the “last query” of \({\mathcal {A}}\)’s solution. To upper bound \({\mathcal {A}}\)’s chance of obtaining a generic solution, FGL consider two cases. The first case is the event that \({\mathcal {A}}\)’s last query can be used in position TL of the puzzle solution and the second case is the event that \({\mathcal {A}}\)’s last query can be used in position BL (one of these two cases must occur). We shall focus on the first of these two cases, which is also the first analyzed in the order of the FGL proof. We call it the “TL-generic” case.
One would usually consider two subcases for the TL-generic case (or any other) depending on whether \({\mathcal {A}}\)’s last query is a forward query to E or an inverse query to \(E^{-1}\), but FGL lump their analysis into a single argument claiming that the two types of queries can be handled the same (in fact, they make this claim for every case in their proof and never distinguish between forward and backward queries to E). For clarity, however, we shall restrict ourselves to consider the case of a forward query to E, and discuss how their argument specializes to that case. We also choose to specifically consider the forward query case because this is where FGL’s analysis seems to be the most problematic.
The task at hand is thus to upper bound \({\mathcal {A}}\)’s chance of completing a generic solution by making a forward query to E that can be used as query TL of such a solution. The usual approach for this, and the one used by FGL, is to consider any given forward query \(E_{K_i}(X_i)\) made by \({\mathcal {A}}\) and to upper bound the probability that the answer \(Y_i\) to this query is such that the query history element \((X_i, K_i, Y_i)\) can be used in the desired manner; one then multiplies this probability by q since \({\mathcal {A}}\) can make q queries total. With foresight on how we wish to use the query \(E_{K_i}(X_i)\), it is convenient to rename \(K_i\) as \(B||L\) and \(X_i\) as A; thus the query is \(E_{B||L}(A)\). To proceed, one would typically upper bound the number of values \(R \in \{0,1\}^n\) such that, if we had \(E_{B||L}(A) = R\), the query \((A, B||L, R)\) could be used in position TL of a generic solution together with previous elements of the query history, and divide this number by \(2^n - q\), since the answer to the query \(E_{B||L}(A)\) will come uniformly at random from a set of size at least \(2^n - q\). In turn, the standard, formal way of bounding the number of such R’s would be to upper bound the possible number of query triples (BL, BR, TR) already in the query history that could potentially be used with the query \(E_{B||L}(A)\) to form a generic solution, as the number of such triples is an upper bound for the number of R’s. Note such a triple must have the form \(\mathrm{BL} = (B, L||R, S)\), \(\mathrm{BR} = (B', L'||R', S')\), \(\mathrm{TR} = (A',B'||L',R')\) where \(B\oplus S = B'\oplus S'\) (and note that A, B and L are fixed here by the last query).
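The “standard” counting approach described above can be sketched as a brute-force search over the query history. The function name `good_R_values` and the tuple encoding of queries are hypothetical, and the cubic-time enumeration is meant only to make the definition of a good R explicit, not to be efficient.

```python
def good_R_values(history, A, B, L):
    """Values R such that (A, B||L, R) would complete a generic solution as TL.

    history is an iterable of triples (X, (K1, K2), Y) meaning E_{K1||K2}(X) = Y.
    A value R is good if the history contains BL = (B, L||R, S),
    TR = (A', B'||L', R') and BR = (B', L'||R', S') with
    A xor R = A' xor R' and B xor S = B' xor S', all four queries distinct.
    """
    hist = set(history)
    good = set()
    for bl in hist:
        X, (K1, R), S = bl
        if X != B or K1 != L:
            continue  # not of the form (B, L||R, S)
        tl = (A, (B, L), R)
        for tr in hist:
            A2, (B2, L2), R2 = tr
            if A ^ R != A2 ^ R2:
                continue
            for br in hist:
                X3, (K3, K4), S2 = br
                if (X3, K3, K4) != (B2, L2, R2):
                    continue  # not of the form (B', L'||R', S')
                if B ^ S == B2 ^ S2 and len({tl, bl, tr, br}) == 4:
                    good.add(R)
    return good


if __name__ == "__main__":
    TR = (1, (2, 3), 4)
    BR = (2, (3, 4), 6)
    BL = (7, (8, 5), 3)
    print(good_R_values([TR, BR, BL], A=0, B=7, L=8))
```

The number of triples (BL, BR, TR) visited by the inner loops upper bounds the size of the returned set, which is the quantity one would divide by \(2^n - q\).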
FGL, however, do not adopt this approach (see Note 3) for bounding the number of good R’s. Rather, they make the following argument: Take the value of R, whatever it is, that is returned by the query \(E_{B||L}(A)\); because \(\lnot \mathsf{Xor}(\mathcal {Q}_{i-1})\) there will be at most \(\alpha \) queries TR \(= (A', B'||L', R')\) in the query history such that \(A \oplus R = A' \oplus R'\); as the TR query uniquely determines the BR query, there are at most \(\alpha \) possibilities for the BR query; now “give the query BL \(= (B, L||R, S)\) for free to the adversary”; then since there are at most \(\alpha \) possibilities for the query BR \(= (B', L'||R', S')\) there is chance at most \(\alpha /(2^n - q)\) that \(B \oplus S = B' \oplus S'\) for one of the queries BR, so total chance at most \(q\alpha /(2^n-q)\) that the adversary ever obtains a TL-generic solution with a forward query, there being at most q queries total.
The fallacy in the above argument can be succinctly summarized by pointing out that the query BL \(= (B, L||R, S)\) may already be in the query history, in which case there is no randomness left in the value \(B \oplus S\). However, let us review in detail the argument in two different cases: when the query BL = \((B, L||R, S)\) is already in the query history prior to the last query, and when it is not. (Note that query BL only depends on R (besides B and L which are fixed by the last query), and not on which queries are “chosen” for TR and BR.) In the latter case, when BL = \((B, L||R, S)\) is not yet in the query history at the i-th query, then \({\mathcal {A}}\)’s last query cannot succeed in any case in completing a generic TL collision since the query BL is missing; thus there is no need to bound anything (and no need even to “give the query BL for free”). In the case when query BL is already in the query history, on the other hand, all randomness is lost once R is revealed. FGL successfully argue that, for a given value of R, there will be at most \(\alpha \) possibilities for the pair (TR, BR), but this does not in any way imply the non-existence of such queries TR and BR.
Note also that nothing in the FGL argument precludes the possibility that, when the adversary makes its i-th query \(E_{B||L}(A)\), there is some very large number of distinct values of R (say \(2^{0.5n}\)) for which there exists a triplet of queries (BL, TR, BR) of the form \(\mathrm{BL} = (B, L||R, S)\), \(\mathrm{BR} = (B', L'||R', S')\), \(\mathrm{TR} = (A',B'||L',R')\) where \(B\oplus S = B'\oplus S'\), and such that R does not yet appear as the third coordinate of any query in the query history with key \(B||L\). Certainly, there being such a large number of values of R does not contradict \(\lnot \mathsf{Xor}(\mathcal {Q}_{i-1})\). Also certainly, the i-th query would have chance \(2^{0.5n}/(2^n-q)\) of making the adversary succeed if such a large number of values of R existed, and not chance \(\alpha /(2^n-q)\). In other words, one can infer something is wrong with the FGL argument because it simply does not address the main difficulty of the case at hand, namely the potential existence of a large number of triples (BL, BR, TR) that may fit with the query \(E_{B||L}(A)\).
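To get a feel for the size of this discrepancy, the following minimal sketch compares the two per-query chances contrasted above on a logarithmic scale. The concrete values of q and \(\alpha \) are hypothetical and chosen only for illustration, and the helper name `log2_success_chance` is ours.

```python
import math


def log2_success_chance(n: int, q: int, good: int) -> float:
    """Return log2 of good / (2^n - q).

    This is the chance that an answer R, drawn uniformly from a set of
    size 2^n - q, lands in a fixed set of `good` values.
    """
    return math.log2(good) - math.log2(2**n - q)


n = 128
q = 2**100   # hypothetical number of queries (assumption, for illustration)
alpha = 16   # hypothetical value of the threshold alpha (assumption)

# Per-query chance claimed by the FGL argument: alpha good values of R.
claimed = log2_success_chance(n, q, alpha)

# Per-query chance if 2^(0.5n) good values of R exist instead.
possible = log2_success_chance(n, q, 2 ** (n // 2))

# Difference in bits between the two chances.
gap = possible - claimed
print(f"claimed: 2^{claimed:.2f}, possible: 2^{possible:.2f}, gap: {gap:.2f} bits")
```

With these illustrative parameters the two chances differ by a factor of \(2^{0.5n}/\alpha = 2^{60}\), independently of q, since the denominators cancel.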
Other issues are raised by FGL’s casual comment that the query BL \(= (B, L||R, S)\) is simply “given for free” to the adversary. Indeed, if this query is not yet present, is it added to the query history before or after the i-th query itself? Is this query only made after the value of R is revealed, or is it somehow inserted into the query history before the value of R is revealed? The former might be all right; the latter not, since it would (drastically) alter R’s distribution conditioned on the query history, i.e., R would no longer come uniformly at random from a set of size at least \(2^n - q\). Most importantly, since this free query becomes part of the query history, one should account for the possibility that this query (not the i-th query) causes the adversary to succeed (and not necessarily by being used in position BL of a generic solution). Indeed, we are forced to give such credit to the adversary, since we have required the adversary never to make a query to which it already knows the answer, and since the adversary may have wished to subsequently make this query itself; this means the case analysis should be applied recursively to the free query, but if the case analysis requires other queries to be “given for free,” then we bite our tail and end up giving an astronomical number of free queries to the adversary (e.g., nearly all possible queries).
While we singled out the TL generic case for examination, the same kinds of problems recur throughout the FGL case analysis, essentially invalidating the entire proof. Moreover, since the FGL proof sidesteps the most crucial challenges posed by an analysis of Tandem-DM (see the paragraph before last), it leaves little for any subsequent analysis to build on.
Lee, J., Stam, M. & Steinberger, J. The Security of Tandem-DM in the Ideal Cipher Model. J Cryptol 30, 495–518 (2017). https://doi.org/10.1007/s00145-016-9230-z
DOI: https://doi.org/10.1007/s00145-016-9230-z