1 Introduction

The Tandem-DM compression function is a 3n-bit to 2n-bit compression function based on two applications of a blockcipher of 2n-bit key and n-bit word length (Fig. 1). While Tandem-DM was proposed by Lai and Massey in 1992 [8], the first proof of collision security for Tandem-DM (in the ideal cipher model, as is usual for all such proofs) was only proposed in 2009 by Fleischmann, Gorski and Lucks [4]. Unfortunately, as we detail in “Appendix,” the “FGL proof” (as we shall refer to it) has a number of serious flaws which invalidate it and are non-obvious to repair. The purpose of this paper is to offer a comprehensive security analysis of Tandem-DM, including both a correct collision resistance analysis and a proof of (close to) optimal preimage resistance.

In Sect. 3 we show that, as previously claimed [4], Tandem-DM does indeed have birthday-type collision security (necessitating at least \(2^{120.8}\) queries to break when the output length is \(2n = 256\) bits). A nice feature of our work is that the analysis is relatively simple compared to typical results in this area. This simplicity is afforded by a new trick we introduce, apparently not used before in ideal cipher analyses.

A similar technique is also used for a new preimage resistance analysis in Sect. 4. Our new upper bound for Tandem-DM is nearly optimal (up to a log factor), significantly improving upon the previous best bound of \({\varOmega }(2^n)\) queries.

Fig. 1 The Tandem-DM compression function. All wires carry n-bit values. The top and bottom blockciphers are the same. Each has a 2n-bit key and n-bit input/output. The wire marked L is an input to the compression function (along with A and B).

1.1 Related Work on 2-Call Constructions

Another classical scheme for turning a 2n-bit key blockcipher into a 3n-bit to 2n-bit compression function is Abreast-DM, pictured in Fig. 2a, which was proposed by Lai and Massey in the same paper as Tandem-DM [8]. The collision resistance of Abreast-DM was independently resolved by Fleischmann, Gorski and Lucks [5] and Lee and Kwon [9], who both showed birthday-type collision resistance for Abreast-DM. Before that, Hirose [6] had given a collision resistance analysis for a general class of compression functions that included Abreast-DM as a special case, but under the assumption that the top and bottom blockciphers of the diagram be distinct (this considerably simplifies the analysis). The work by Hirose was further generalized by Özen and Stam [13], who additionally discuss schemes that are only secure in the iteration.

Another 3n-bit to 2n-bit compression function making two calls to a blockcipher of 2n-bit key was proposed by Hirose [7], who proved birthday-type collision resistance for his construction in the ideal cipher model. Hirose’s construction (Fig. 2b) is simpler than either Abreast-DM or Tandem-DM and in particular uses a single keying schedule for the top and bottom blockciphers. It is noteworthy that while Hirose introduced his construction over 10 years after Abreast-DM and Tandem-DM, his collision resistance analysis predates similar collision resistance analyses for Abreast-DM and Tandem-DM.

Fig. 2 Two related double-block-length constructions. (a) The Abreast-DM compression function. The empty circle at bottom left denotes bit complementation. (b) Hirose’s compression function. The bottom left-hand wire carries an arbitrary nonzero constant c.

1.2 Related Work on 1-Call Constructions

Stam [19] proposed a class of “polynomial-based” 3n-bit to 2n-bit compression functions making a single call to a 2n-bit key blockcipher and subsequently proved [20] birthday-type collision resistance for this construction. Lee and Steinberger [11] proved collision resistance for the same compression function in the weaker “unpredictable cipher” model. Lucks [12] proposed a double-block-length hash function using a 3n-bit to 2n-bit compression function making a single call to a blockcipher of 2n-bit key and proved this hash function collision resistant in the ideal cipher model (see [13] for a generalization). However, Lucks’ construction is only secure in the iteration, as the compression function itself is collision insecure.

Earlier, Yi and Lam [22] had proposed a 3n-bit to 2n-bit compression function making a single call to a 2n-bit key blockcipher whose design was somewhat similar to Stam’s polynomial-based construction but which used a single integer addition operation instead of several field multiplication operations. However, this construction was broken by Satoh et al. and Wagner [16, 21].

1.3 Comparison

Of the three well-known 3n-bit to 2n-bit compression functions making two calls to a 2n-bit key blockcipher—those being Tandem-DM, Abreast-DM and Hirose’s construction—the two constructions whose collision resistance has been successfully resolved (Hirose and Abreast-DM) share the feature that the inputs to the top and bottom blockcipher are bijectively related. For example, for Abreast-DM, if the top blockcipher call is \(E_{B||L}(A)\), then the bottom blockcipher call (for the same input \(A||B||L\)) is \(E_{L||A}(\overline{B})\), where \(\overline{B}\) denotes bit complementation of B; thus, the inputs to the top and bottom blockciphers are related by the permutation \(\pi : \{0,1\}^{3n}\rightarrow \{0,1\}^{3n}\), \(\pi (X||Y||Z) = \overline{Y}||Z ||X\). (Here the last 2n bits are the key.) In Hirose’s construction, the inputs to the top and bottom blockciphers are related by the permutation \(\pi ^\prime : \{0,1\}^{3n} \rightarrow \{0,1\}^{3n}\), \(\pi ^\prime (X||Y||Z) = (X\oplus c)||Y ||Z\).

By contrast, Tandem-DM exhibits a more subtle relationship between the inputs of the top and bottom blockciphers, as an output of the top blockcipher is used to key the bottom blockcipher. It is the presence of this “feedback” within the construction, it seems, that has complicated efforts to prove a collision resistance bound. On the other hand, Tandem-DM still has the agreeable feature that the top and bottom blockcipher calls uniquely determine each other in the following sense: Given the key \(B||L\) and output R of the top cipher, one can determine the key \(L ||R\) and the input B of the bottom cipher and vice versa. This contrasts with constructions such as MDC-2 which use two calls to a blockcipher of n-bit key and in which the top and bottom blockcipher calls do not uniquely determine each other. Typically, collision resistance analyses are much harder for the latter kind of compression functions. (MDC-2 can only be proved non-trivially collision resistant in the iteration, and the current best bound of \({\varOmega }(2^{\frac{3}{5}n})\) queries due to Steinberger [18] is likely to be suboptimal.)

We note that the permutations \(\pi \) and \(\pi ^\prime \) discussed above share the common feature of having small cycle lengths (all cycles of \(\pi \) have length dividing 6 and all cycles of \(\pi ^\prime \) have length 2), which constitutes another strong similarity between Abreast-DM and Hirose’s scheme. In fact, for this reason, Hirose’s collision resistance proof and the Abreast-DM collision resistance proof can be seen as special cases of the same framework [5, 9]. Building on this observation, Fleischmann et al. [5] defined a general class of compression functions called “Cyclic-DM” that are amenable to collision resistance analyses and that include Hirose’s scheme and Abreast-DM as special cases. Afterward, Fleischmann et al. [3] also provided a comprehensive generalization of their earlier works [4, 5]. In particular, a new and tighter collision resistance claim for Tandem-DM is made. This second analysis shares many of the flaws of the work [4] on which it builds. These flaws are fatal to the integrity of the argument and to the final bound, and we refer to the ePrint version of [10] for further details.

1.4 Preimage Resistance

For the particular point \(0^{2n}\), one can make queries \(E_{U||U}(U)\) for all \(U\in \{0,1\}^n\) to find, with high probability, a U such that \(E_{U||U}(U) = U\), and hence \(\mathsf{TDM}^E(U||U||U) = 0^{2n}\). Apart from this peculiarity, it has been an open problem to prove preimage resistance for values of q higher than \(2^n\). Tandem-DM inherits an obvious preimage resistance bound from ordinary Davies–Meyer, but once \(2^n\) queries are reached a “natural barrier” occurs: namely, a blockcipher “loses randomness” after being queried \({\varOmega }(2^n)\) times on the same key (for example, when \(2^n-1\) queries have been made to a blockcipher under a given key, the answer to the last query under that key is deterministic). Going beyond the \(2^n\) barrier seemed to require either a very technical probabilistic analysis or some brand new idea. In this paper, we present a new idea which delivers tight bounds in a fairly painless and non-technical fashion.

Table 1 numerically compares the collision and the preimage security of Abreast-DM, Hirose’s construction and Tandem-DM for \(n=128\). Note that the preimage security of Abreast-DM and Hirose’s construction is also obtained using our new proof technique (introduced in [1]). For the preimage security of Tandem-DM, we only consider a nonzero target image. (See Sect. 4 for the reason.)

Table 1 Threshold numbers of queries such that the adversarial advantage is not greater than 1/2

2 Definitions

A blockcipher is a function \(E : \{0,1\}^m \times \{0,1\}^n \rightarrow \{0,1\}^n\) such that \(E(K,\cdot )\) is a permutation of \(\{0,1\}^n\) for each \(K \in \{0,1\}^m\). We call m the key size and n the word size of the blockcipher. It is customary to write \(E_K(X)\) instead of \(E(K, X)\) for \(K \in \{0,1\}^m\), \(X \in \{0,1\}^n\). The function \(E^{-1}_K(\cdot )\) denotes the inverse of \(E_K(\cdot )\) (as \(E_K(\cdot )\) is a permutation).

Given a blockcipher \(E : \{0,1\}^{2n} \times \{0,1\}^n \rightarrow \{0,1\}^n\), we define the Tandem-DM compression function \(\mathsf{TDM}^E : \{0,1\}^{3n} \rightarrow \{0,1\}^{2n}\) by

$$\begin{aligned} \mathsf{TDM}^E(A||B||L) = (A\oplus R)||(B\oplus S) \end{aligned}$$

where

$$\begin{aligned} R= & {} E_{B||L}(A),\\ S= & {} E_{L||R}(B). \end{aligned}$$
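As an illustration, the definition above can be exercised on a toy instance. The following Python sketch is not part of the specification of Tandem-DM: the class `ToyCipher`, its seeded per-key shuffle and the word size n = 8 are assumptions made only so that the code runs, and a seeded shuffle is merely a stand-in for an ideal cipher.

```python
import random


class ToyCipher:
    """Toy stand-in for a blockcipher with 2n-bit keys and n-bit words.

    Assumption: one pseudorandom permutation per key, built lazily
    from a seeded shuffle. This is for experimentation only.
    """

    def __init__(self, n=8, seed=0):
        self.n = n
        self.seed = seed
        self._perms = {}  # key -> (forward table, inverse table)

    def _tables(self, key):
        if key not in self._perms:
            rng = random.Random((self.seed << (2 * self.n)) | key)
            fwd = list(range(1 << self.n))
            rng.shuffle(fwd)
            inv = [0] * len(fwd)
            for x, y in enumerate(fwd):
                inv[y] = x
            self._perms[key] = (fwd, inv)
        return self._perms[key]

    def enc(self, key, x):
        """E_key(x)."""
        return self._tables(key)[0][x]

    def dec(self, key, y):
        """E_key^{-1}(y)."""
        return self._tables(key)[1][y]


def tdm(cipher, a, b, l):
    """Tandem-DM on n-bit integers A, B, L; returns (A xor R, B xor S)."""
    n = cipher.n
    r = cipher.enc((b << n) | l, a)  # R = E_{B||L}(A)
    s = cipher.enc((l << n) | r, b)  # S = E_{L||R}(B)
    return a ^ r, b ^ s
```

Keys are represented as 2n-bit integers whose high half is the first component of the concatenation, so `(b << n) | l` plays the role of \(B||L\).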

2.1 Collision Resistance

In the collision resistance experiment, a computationally unbounded adversary \({\mathcal {A}}\) is given oracle access to a blockcipher E uniformly sampled among all blockciphers of key length 2n and word length n. We allow \({\mathcal {A}}\) to query both E and \(E^{-1}\). After q queries to E, the query history of \({\mathcal {A}}\) is the (ordered) set of triples \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) such that \(E_{K_i}(X_i) = Y_i\) and \({\mathcal {A}}\)’s i-th query is either \(E_{K_i}(X_i)\) or \(E_{K_i}^{-1}(Y_i)\) for \(1 \le i \le q\). We let \(\mathcal {Q}_i = \{(X_j, K_j, Y_j)\}_{j=1}^i\) be the first i elements of the query history; thus \(\mathcal {Q}= \mathcal {Q}_q\). We say \({\mathcal {A}}\) succeeds or finds a collision after its first i queries if there exist distinct 3n-bit values, \(A||B||L\), \(A'||B'||L'\) such that \(\mathsf{TDM}^E(A||B||L) = \mathsf{TDM}^E(A'||B'||L')\) and such that \(\mathcal {Q}_i\) contains both the queries necessary to compute \(\mathsf{TDM}^E(A||B||L)\) and \(\mathsf{TDM}^E(A'||B'||L')\). More formally—and see Fig. 3—we define this event by a predicate \(\mathsf{Coll}(\mathcal {Q}_i)\), which is true if and only if there exist n-bit values A, B, L, R, S, \(A'\), \(B'\), \(L'\), \(R'\), \(S'\) such that

$$\begin{aligned}&A||B||L \ne A'||B'||L' \end{aligned}$$
(1)
$$\begin{aligned}&\quad A \oplus R = A' \oplus R' \end{aligned}$$
(2)
$$\begin{aligned}&\quad B \oplus S = B' \oplus S' \end{aligned}$$
(3)

and such that

$$\begin{aligned} (A, B||L, R), (B, L||R, S), (A', B'||L', R'), (B', L'||R', S') \in \mathcal {Q}_i. \end{aligned}$$
(4)

We denote by

$$\begin{aligned} \mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(q) \end{aligned}$$

the maximum probability that an adversary making q queries causes \(\mathsf{Coll}(\mathcal {Q})\) to become true. The probability is taken over the uniform choice of E and over \({\mathcal {A}}\)’s coin tosses, if any. Also note that n is a hidden parameter.
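To make conditions (1)–(4) concrete, the following Python sketch checks whether a query history contains a Tandem-DM collision. The function name `find_collision` and the encoding of keys as 2n-bit integers (high half first) are illustrative assumptions; the sketch simply enumerates every recorded pair of queries of the form \((A, B||L, R)\), \((B, L||R, S)\) and looks for two inputs with the same output.

```python
def find_collision(history, n):
    """Return two distinct inputs (A, B, L) with equal Tandem-DM output, or None.

    history: iterable of triples (X, K, Y) meaning E_K(X) = Y,
             where K is a 2n-bit integer and X, Y are n-bit integers.
    """
    mask = (1 << n) - 1
    table = {(x, k): y for (x, k, y) in history}
    seen = {}  # output (A xor R, B xor S) -> input (A, B, L)
    for (a, k), r in table.items():
        b, l = k >> n, k & mask           # read the query as TL = (A, B||L, R)
        s = table.get((b, (l << n) | r))  # matching BL = (B, L||R, S), if present
        if s is None:
            continue
        out = (a ^ r, b ^ s)
        if out in seen:
            # Inputs differ because the dictionary keys (a, k) are distinct.
            return seen[out], (a, b, l)
        seen[out] = (a, b, l)
    return None
```

For instance, with n = 4 the history `[(1, 35, 5), (2, 53, 7), (0, 16, 4), (1, 4, 4)]` encodes the four queries TL, BL, TR, BR of a collision between the inputs (1, 2, 3) and (0, 1, 0).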

Fig. 3 The collision diagram for Tandem-DM. The adversary must find blockcipher queries to fit both sides of the diagram such that \(A\oplus R = A'\oplus R'\), \(B\oplus S = B'\oplus S'\) and \(A||B||L \ne A'||B'||L'\). More precisely, the adversary must find four queries of the form \(E_{B||L}(A) = R\), \(E_{L||R}(B) = S\), \(E_{B'||L'}(A') = R'\), \(E_{L'||R'}(B') = S'\) such that the above conditions hold. Each query could either be learned through a forward query (to E) or through a backward query (to \(E^{-1}\)). The four queries in the diagram are labeled “TL,” “BL,” “TR,” “BR” for “Top Left,” “Bottom Left,” etc.

The “XOR output” of a query \((X_i, K_i, Y_i)\) is the quantity \(X_i \oplus Y_i\). Another predicate which plays an important part in both our proof and the FGL proof is the “many queries with the same XOR output” predicate \(\mathsf{Xor}(\mathcal {Q})\), defined on a query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) by

$$\begin{aligned} \mathsf{Xor}(\mathcal {Q}) \iff \max _{Z\in \{0,1\}^n} |\{i : X_i \oplus Y_i = Z\}| > \alpha . \end{aligned}$$

Here \(\alpha \) is a free parameter of the analysis which appears in the final collision resistance bound. (In [4], this predicate is named \(\textsc {Lucky}(\mathcal {Q})\); in [18] a similar predicate is named \(\textsf {Win0}(\mathcal {Q})\).) Without going into details at this point, we mention that the FGL collision resistance proof (and ours, essentially, as well) upper bounds \(\Pr [\mathsf{Coll}(\mathcal {Q})]\) by \(\Pr [\mathsf{Xor}(\mathcal {Q})] + \Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q})]\). A larger \(\alpha \) implies a lower value for \(\Pr [\mathsf{Xor}(\mathcal {Q})]\) and a higher value for \(\Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q})]\). The best value of \(\alpha \) can be found numerically for given values of n and q. Generally, readers may think of \(\alpha \) as some small constant value (e.g., for \(n = 128\) and \(q = 2^{120.87}\), \(\alpha = 16\)).

So far, we have described “infrastructure” that is common to both proofs. We shall now introduce some material proper to our proof. Note a query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) does not record whether each triple \((X_i, K_i, Y_i)\) was obtained by the adversary through a forward query \(E_{K_i}(X_i)\) or a backward query \(E_{K_i}^{-1}(Y_i)\). For this, we maintain two arrays \(\textsf {Fwd}[\cdot ]\) and \(\textsf {Bwd}[\cdot ]\) where \(\textsf {Fwd}[i] = 1\) if and only if the adversary’s i-th query is a forward query and \(\textsf {Bwd}[i] = 1\) if and only if the adversary’s i-th query is a backward query. We then define an additional predicate

$$\begin{aligned} \mathsf{FB}(\mathcal {Q}) \iff \max _{Z\in \{0,1\}^n} |\{i : (Y_i = Z \wedge \textsf {Fwd}[i] = 1) \vee (X_i = Z \wedge \textsf {Bwd}[i] = 1)\}| > \alpha .\nonumber \\ \end{aligned}$$
(5)

(“FB” stands for “Forward Backward.”) Here \(\alpha \) is the same free parameter as above. Note that \(\lnot \mathsf{FB}(\mathcal {Q})\) implies that

$$\begin{aligned}&\max \nolimits _{Z\in \{0,1\}^n} |\{i : Y_i = Z \wedge \,\textsf {Fwd}[i] = 1\}| \le \alpha , \end{aligned}$$
(6)
$$\begin{aligned}&\quad \max \nolimits _{Z\in \{0,1\}^n} |\{i : X_i = Z \wedge \textsf {Bwd}[i] = 1\}| \le \alpha . \end{aligned}$$
(7)

It is really consequences (6) and (7) of \(\lnot \mathsf{FB}(\mathcal {Q})\) that interest us, though we define \(\mathsf{FB}(\mathcal {Q})\) via (5) because this makes it slightly easier to bound \(\Pr [\mathsf{FB}(\mathcal {Q})]\). We will use the bound

$$\begin{aligned} \Pr [\mathsf{Coll}(\mathcal {Q})]\le & {} \Pr [\mathsf{Xor}(\mathcal {Q})] + \Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q})] \nonumber \\\le & {} \Pr [\mathsf{Xor}(\mathcal {Q})] + \Pr [\mathsf{FB}(\mathcal {Q})] + \Pr [\mathsf{Coll}(\mathcal {Q}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}) \wedge \lnot \mathsf{FB}(\mathcal {Q})].\nonumber \\ \end{aligned}$$
(8)

One should thus think of \(\mathsf{FB}(\mathcal {Q})\) and \(\mathsf{Xor}(\mathcal {Q})\) as bad events whose non-occurrence helps bound the probability of \(\mathsf{Coll}(\mathcal {Q})\) occurring. We warn that (8) constitutes a slightly oversimplified encapsulation of our proof’s high-level structure. We refer to Sect. 3 for more details.
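The two bad events are simple counting conditions on the query history, and the following Python sketch spells them out. The function names `xor_pred` and `fb_pred` are illustrative, and the list `is_forward` is an assumed encoding of the array \(\textsf {Fwd}[\cdot ]\) (with \(\textsf {Bwd}[i] = 1 - \textsf {Fwd}[i]\), since every query is either forward or backward).

```python
from collections import Counter


def xor_pred(history, alpha):
    """Xor(Q): some value Z equals X xor Y for more than alpha queries (X, K, Y)."""
    counts = Counter(x ^ y for (x, _k, y) in history)
    return max(counts.values(), default=0) > alpha


def fb_pred(history, is_forward, alpha):
    """FB(Q), following (5): some value Z is the answer of more than alpha queries.

    The answer of a forward query is its output Y; the answer of a
    backward query is its input X. is_forward[i] is True iff Fwd[i] = 1.
    """
    counts = Counter(
        y if fwd else x
        for (x, _k, y), fwd in zip(history, is_forward)
    )
    return max(counts.values(), default=0) > alpha
```

Both functions use a strict comparison with `alpha`, matching the strict inequalities in the definitions of \(\mathsf{Xor}(\mathcal {Q})\) and \(\mathsf{FB}(\mathcal {Q})\).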

2.2 Preimage Resistance

In the preimage resistance experiment, a computationally unbounded adversary \({\mathcal {A}}\) with oracle access to a uniformly sampled blockcipher \(E : \{0,1\}^{2n} \times \{0,1\}^n \rightarrow \{0,1\}^n\) selects and announces a point \(C \in \{0,1\}^{2n}\), before making queries to E. \({\mathcal {A}}\) makes queries to both E and \(E^{-1}\) and records its query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\). We say \({\mathcal {A}}\) succeeds or finds a preimage if its query history \(\mathcal {Q}\) contains the means of computing a preimage of C, in the sense that there exist values A, B, L, R, \(S \in \{0,1\}^n\) such that \(A \oplus R ||B \oplus S = C\) and such that the queries \((A, B||L, R)\), \((B, L||R, S)\) are in \(\mathcal {Q}\). (In this case, we say \(\mathcal {Q}\) contains a preimage of C.)

Unfortunately, Tandem-DM has the particularity that the point \(0^{2n}\) is weaker than other range points with respect to preimage resistance. Indeed, to find a preimage of \(0^{2n}\) (given a random blockcipher), an adversary can make queries of the form \(E_{U||U}(U)\) for different values of U until it finds a U such that \(E_{U||U}(U) = U\); then it is easy to see that \(\mathsf{TDM}^E(U||U||U) = 0^{2n}\). The probability (over the choice of E) of this attack succeeding in q queries is \(1-(1-1/2^n)^q\approx q/2^n\), since a different key is used for each query. On the other hand, we shall see that all nonzero points in \(\{0,1\}^{2n}\) have much better preimage resistance than \(q/2^n\), at least for q in the range of interest (i.e., \(q = o(2^n), \omega (1)\)). We also note this preimage attack on \(0^{2n}\) is nearly matched by an easily proved preimage resistance bound of \(q/(2^n - q)\) for \(0^{2n}\) (or any other point in \(\{0,1\}^{2n}\)); the bound follows from the fact that a necessary condition for inverting \(0^{2n}\) is to find a query with XOR output \(0^n\).
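The attack on \(0^{2n}\) is easy to reproduce on a toy instance. In the Python sketch below, the function `toy_enc` (one seeded shuffle per key) and the word size n = 8 are assumptions standing in for a random blockcipher; `zero_preimages` performs the search over U described above.

```python
import random


def toy_enc(key, x, n=8, seed=0):
    """E_key(x) for a toy cipher: one seeded random permutation per key (assumption)."""
    perm = list(range(1 << n))
    random.Random((seed << (2 * n)) | key).shuffle(perm)
    return perm[x]


def tdm(a, b, l, n=8, seed=0):
    """Tandem-DM over the toy cipher; returns (A xor R, B xor S)."""
    r = toy_enc((b << n) | l, a, n, seed)  # R = E_{B||L}(A)
    s = toy_enc((l << n) | r, b, n, seed)  # S = E_{L||R}(B)
    return a ^ r, b ^ s


def zero_preimages(n=8, seed=0):
    """All U with E_{U||U}(U) = U; each such U gives TDM(U||U||U) = 0^{2n}."""
    return [u for u in range(1 << n) if toy_enc((u << n) | u, u, n, seed) == u]
```

Whether the search succeeds depends on the sampled cipher; for each U returned, both calls inside `tdm(u, u, u)` evaluate the same query \(E_{U||U}(U) = U\), so both output halves vanish.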

One solution for avoiding issues associated with \(0^{2n}\) is to have the point-to-invert be chosen at random from \(\{0,1\}^{2n}\); in this case there is chance at most \(1/2^{2n}\) anyway that \(0^{2n}\) is the image to invert. However, we find it slightly more interesting to emphasize that \(0^{2n}\) is the only “bad” point in the range by letting the adversary choose which point to invert, under the stipulation that the adversary is not allowed to choose \(0^{2n}\) (for which we anyway have the above \(q/(2^n-q)\) preimage resistance bound which, though worse than the preimage resistance bound we shall prove for nonzero points, is acceptable from a practical standpoint).

Thus our preimage resistance experiment is modified as follows: An adversary \({\mathcal {A}}\) announces a point \(C \in \{0,1\}^{2n}\), \(C \ne 0^{2n}\), before making queries to E. The adversary wins after q queries if its query history \(\mathcal {Q}= \{(X_i, K_i, Y_i)\}_{i=1}^q\) contains the means of computing a preimage of C, in the sense described above. We denote by

$$\begin{aligned} \mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(q) \end{aligned}$$

the maximum advantage of any (probabilistic, computationally unbounded) adversary at this game. We note that here, too, n is a hidden parameter of the advantage. Moreover, we let

$$\begin{aligned} \mathsf{Preim}(\mathcal {Q}) \end{aligned}$$

be the predicate that is true if and only if \(\mathcal {Q}\) contains a preimage of C, where C is an elided-but-understood parameter of the predicate. Thus, \(\mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(q)\) is the maximum of \(\Pr [\mathsf{Preim}(\mathcal {Q})]\) taken over all q-query adversaries \({\mathcal {A}}\), the probability being taken over E and the coins of \({\mathcal {A}}\). We always assume that \({\mathcal {A}}\) is honest in the sense of choosing a nonzero value C.

3 Collision Resistance of Tandem-DM

It will be easier to explain the form of the probability bound in our main theorem if we explain a few high-level ideas from the proof beforehand. The proof starts by considering an arbitrary q-query collision-finding adversary \({\mathcal {A}}\) for Tandem-DM. We then construct an adversary \({\mathcal {A}}'\) as follows: \({\mathcal {A}}'\) simulates \({\mathcal {A}}\), but after each forward query \(E_{V||W}(U)\) made by \({\mathcal {A}}\), \({\mathcal {A}}'\) makes the backward query \(E_{U||V}^{-1}(W)\) if it does not already know the answer to this query, and after each backward query \(E_{U||V}^{-1}(W)\) made by \({\mathcal {A}}\), \({\mathcal {A}}'\) makes the forward query \(E_{V||W}(U)\) if it does not already know the answer to this query. (To better understand the relation of these instructions to Tandem-DM, view U, V, W as B, L, R.) Moreover, if \({\mathcal {A}}\) ever makes a query to which \({\mathcal {A}}'\) already knows the answer from its query history, \({\mathcal {A}}'\) ignores this query. Thus \({\mathcal {A}}'\) never makes a query to which it knows the answer.

Let \(\mathcal {Q}'\) be the query history of \({\mathcal {A}}'\) and \(\mathcal {Q}\) be the query history of \({\mathcal {A}}\). Then \(\mathcal {Q}\subseteq \mathcal {Q}'\) and \(|\mathcal {Q}'| \le 2q\). Since \(\mathcal {Q}\subseteq \mathcal {Q}'\) we have

$$\begin{aligned}&\Pr [\mathsf{Coll}(\mathcal {Q})] \le \Pr [\mathsf{Coll}(\mathcal {Q}')] \le \Pr [\mathsf{Xor}(\mathcal {Q}')] + \Pr [\mathsf{FB}(\mathcal {Q}')] \nonumber \\&\quad + \Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]. \end{aligned}$$
(9)

Our proof uses the inequality above to bound \(\Pr [\mathsf{Coll}(\mathcal {Q})]\). The use of the augmented adversary \({\mathcal {A}}'\) may seem superficially similar to Fleischmann et al.’s idea of “giving away a query for free.” However, it will become clear from our case analysis that we exploit the added structure of \(\mathcal {Q}'\) entirely differently from the way Fleischmann et al. exploit their free queries. We also point out that the added structure of \(\mathcal {Q}'\) enables the main interesting trick of our analysis, to be found in case “TL Forward” of Proposition 3 below.
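The construction of \({\mathcal {A}}'\) from \({\mathcal {A}}\) can be sketched as a wrapper around a lazily sampled cipher. The class `AugmentedOracle` below is a hypothetical illustration, not the formalism of the proof: it answers the queries of \({\mathcal {A}}\), issues the paired query after each fresh one, and records the history \(\mathcal {Q}'\) together with the direction of each query. Keys are assumed to be 2n-bit integers with the first component in the high half.

```python
import random


class AugmentedOracle:
    """Sketch of A': forwards A's queries and adds the paired Tandem-DM query."""

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = random.Random(seed)
        self.fwd = {}         # (key, x) -> y for every known triple
        self.bwd = {}         # (key, y) -> x for every known triple
        self.history = []     # Q': triples (X, K, Y) in query order
        self.is_forward = []  # direction in which each triple was obtained

    def _fresh(self, key, index):
        # Lazy sampling: uniform among the values not yet used under this key.
        used = {v for (k, v) in index if k == key}
        return self.rng.choice([v for v in range(1 << self.n) if v not in used])

    def _record(self, key, x, y, forward):
        self.fwd[(key, x)] = y
        self.bwd[(key, y)] = x
        self.history.append((x, key, y))
        self.is_forward.append(forward)

    def _enc(self, key, x):
        if (key, x) not in self.fwd:  # never repeat a query with a known answer
            self._record(key, x, self._fresh(key, self.bwd), True)
        return self.fwd[(key, x)]

    def _dec(self, key, y):
        if (key, y) not in self.bwd:
            self._record(key, self._fresh(key, self.fwd), y, False)
        return self.bwd[(key, y)]

    def forward(self, key, x):
        """A asks E_{V||W}(U); A' then also asks E^{-1}_{U||V}(W)."""
        if (key, x) in self.fwd:      # known answer: A' ignores the query
            return self.fwd[(key, x)]
        n, mask = self.n, (1 << self.n) - 1
        y = self._enc(key, x)
        v, w = key >> n, key & mask
        self._dec((x << n) | v, w)
        return y

    def backward(self, key, y):
        """A asks E^{-1}_{U||V}(W); A' then also asks E_{V||W}(U)."""
        if (key, y) in self.bwd:
            return self.bwd[(key, y)]
        n, mask = self.n, (1 << self.n) - 1
        x = self._dec(key, y)
        u, v = key >> n, key & mask
        self._enc((v << n) | y, u)
        return x
```

Each call by \({\mathcal {A}}\) adds at most two triples to `history`, which reflects \(|\mathcal {Q}'| \le 2q\), and every recorded triple has its paired triple recorded as well.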

We can now more easily discuss our main result:

Theorem 1

Let \(N = 2^n\), \(q < N/2\), \(N' = N - 2q\) and let \(\alpha \) be an integer, \(1\le \alpha \le 2q\). Then

$$\begin{aligned} \mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(q) \le 2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha } + \frac{4q\alpha }{N'} + \frac{4q}{N'}. \end{aligned}$$

The term \(2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\) in Theorem 1 is an upper bound for \(\Pr [\mathsf{Xor}(\mathcal {Q}')] + \Pr [\mathsf{FB}(\mathcal {Q}')]\). In fact \(\Pr [\mathsf{Xor}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\) and \(\Pr [\mathsf{FB}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\). The two remaining terms \(4q\alpha /N' + 4q/N'\) are an upper bound for \(\Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\). To bound \(\mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(q)\) for a given value of n and q, one should optimize \(\alpha \) numerically. For example, for \(n = 128\), Theorem 1 yields that \(\mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(2^{120.87}) < \frac{1}{2}\) using \(\alpha = 16\).
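The numerical optimization of \(\alpha \) mentioned above is straightforward to carry out. The Python sketch below evaluates the right-hand side of Theorem 1 in floating point; the function names `tdm_collision_bound` and `best_alpha` and the search range for \(\alpha \) are illustrative choices, and the first term is computed through logarithms to avoid numerical overflow.

```python
import math


def tdm_collision_bound(n, q, alpha):
    """Right-hand side of Theorem 1 with N = 2^n and N' = N - 2q, capped at 1."""
    N = 2.0 ** n
    if not (0 < q < N / 2 and 1 <= alpha <= 2 * q):
        raise ValueError("Theorem 1 requires q < N/2 and 1 <= alpha <= 2q")
    Np = N - 2.0 * q
    # log of 2N * (2eq / (alpha N'))^alpha
    log_first = math.log(2.0 * N) + alpha * math.log(2.0 * math.e * q / (alpha * Np))
    if log_first >= 0.0:
        return 1.0  # the bound is vacuous for this alpha
    total = math.exp(log_first) + 4.0 * q * alpha / Np + 4.0 * q / Np
    return min(1.0, total)


def best_alpha(n, q, alphas=range(1, 65)):
    """Integer alpha in the given (assumed) range minimizing the bound."""
    return min(alphas, key=lambda a: tdm_collision_bound(n, q, a))
```

Calling `tdm_collision_bound(128, 2 ** 120.87, 16)` evaluates the example stated above, and `best_alpha(128, 2 ** 120.87)` searches the integers 1 to 64 for the minimizing \(\alpha \).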

Asymptotically, Theorem 1 yields the following result:

Corollary 1

\(\lim _{n\rightarrow \infty }{} \mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}\left( N/n\right) =0\).

Proof

Let \(q=N/n\) and \(\alpha =n/\log n\), where the logarithm takes base 2. Since \(N'>N/2\) for \(n>4\), we have

$$\begin{aligned} \mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}(q)&\le 2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha } +\frac{4q\alpha }{N'}+\frac{4q}{N'}\le 2N\left( \frac{4eq}{\alpha N}\right) ^{\alpha } +\frac{8q\alpha }{N}+\frac{8q}{N} \\&\le 2N\left( \frac{4e\log n}{n^2}\right) ^{\frac{n}{\log n}} +\frac{8}{\log n}+\frac{8}{n} =2\left( \frac{4e\log n}{n}\right) ^{\frac{n}{\log n}}+\frac{8}{\log n}+\frac{8}{n}. \end{aligned}$$

The last expression obviously goes to zero as \(n \rightarrow \infty \). \(\square \)

In particular, \(\lim _{n\rightarrow \infty }\mathbf{Adv}_{\mathsf{TDM}}^\mathrm{coll}\left( 2^{(1-\varepsilon )n}\right) =0\) for any fixed \(\varepsilon > 0\).

The proof of Theorem 1 uses refinements \(\mathsf{Coll}_{1}(\mathcal {Q})\), \(\mathsf{Coll}_{2}(\mathcal {Q})\), \(\mathsf{Coll}_{3}(\mathcal {Q})\) of the collision predicate \(\mathsf{Coll}(\mathcal {Q})\), defined as follows:

  • \(\mathsf{Coll}_{1}(\mathcal {Q})\) occurs if \(\mathcal {Q}\) contains a collision with TL, BL, TR, BR distinct.

  • \(\mathsf{Coll}_{2}(\mathcal {Q})\) occurs if \(\mathcal {Q}\) contains a collision with either TL = BL or TR = BR.

  • \(\mathsf{Coll}_{3}(\mathcal {Q})\) occurs if \(\mathcal {Q}\) contains a collision with either TL = BR or BL = TR.

For example, \(\mathsf{Coll}_{2}(\mathcal {Q})\) occurs if there exist values \(A, B, L, R, S, A', B', L', R', S'\) such that (1)–(4) hold and such that \((A, B||L, R) = (B, L||R, S)\). Since BL \(\ne \) BR and TL \(\ne \) TR in any collision, we have the following proposition.

Proposition 1

\(\mathsf{Coll}(\mathcal {Q}) \implies \mathsf{Coll}_{1}(\mathcal {Q}) \vee \mathsf{Coll}_{2}(\mathcal {Q}) \vee \mathsf{Coll}_{3}(\mathcal {Q})\) for any query history \(\mathcal {Q}\).

In view of proving Theorem 1, let \({\mathcal {A}}\) be an arbitrary q-query adversary for Tandem-DM, and let \({\mathcal {A}}'\) be obtained from \({\mathcal {A}}\) as outlined above; let \(\mathcal {Q}\) be the query history of \({\mathcal {A}}\) and \(\mathcal {Q}'\) be the query history of \({\mathcal {A}}'\). Then by (9), it suffices to show that

$$\begin{aligned} \Pr [\mathsf{Xor}(\mathcal {Q}')]\le & {} N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\\ \Pr [\mathsf{FB}(\mathcal {Q}')]\le & {} N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\\ \Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\le & {} \frac{4q\alpha }{N'} + \frac{4q}{N'} \end{aligned}$$

since the sum of the above probabilities is an upper bound for \(\Pr [\mathsf{Coll}(\mathcal {Q})]\). Moreover, by Proposition 1, \(\Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\) can be upper bounded by finding upper bounds for \(\Pr [\mathsf{Coll}_i(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\) for \(i = 1, 2, 3\) and taking the sum of these. We now upper bound these various probabilities in a series of propositions. For these propositions, q, N and \(\alpha \) are as in Theorem 1, and \(\mathcal {Q}'\) is the query history of any adversary \({\mathcal {A}}'\) as just specified. We emphasize that \(|\mathcal {Q}'| \le 2q\) and that probabilities are taken over the random cipher E and over the coins of \({\mathcal {A}}'\), if any (it inherits these from \({\mathcal {A}}\)).

Proposition 2

\(\Pr [\mathsf{Xor}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\) and \(\Pr [\mathsf{FB}(\mathcal {Q}')] \le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }\).

Proof

Without loss of generality, we may assume that \({\mathcal {A}}'\) always makes exactly 2q queries. Let \(\mathcal {Q}' = \{(X'_i, K'_i, Y'_i)\}_{i=1}^{2q}\) denote the query history of \({\mathcal {A}}'\). Since

$$\begin{aligned} \Pr [|\{i : X'_i \oplus Y'_i = Z\}| > \alpha ]\le \left( {\begin{array}{c}2q\\ \alpha \end{array}}\right) \left( \frac{1}{N'}\right) ^{\alpha }, \end{aligned}$$

for each \(Z\in \{0,1\}^n\), we have

$$\begin{aligned} \Pr [\mathsf{Xor}(\mathcal {Q}')]\le N\left( {\begin{array}{c}2q\\ \alpha \end{array}}\right) \left( \frac{1}{N'}\right) ^{\alpha } \le N\left( \frac{2q\cdot e}{\alpha }\right) ^{\alpha }\left( \frac{1}{N'}\right) ^{\alpha }\le N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha }. \end{aligned}$$

\(\Pr [\mathsf{FB}(\mathcal {Q}')]\) can be bounded similarly.\(\square \)
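For completeness, the middle inequality in the chain above is the standard estimate of a binomial coefficient, which can be spelled out as follows (using only the series expansion of \(e^{\alpha }\)):

$$\begin{aligned} \left( {\begin{array}{c}2q\\ \alpha \end{array}}\right) \le \frac{(2q)^{\alpha }}{\alpha !} \le \left( \frac{2q\cdot e}{\alpha }\right) ^{\alpha }, \qquad \text{since}\quad e^{\alpha } = \sum _{k\ge 0}\frac{\alpha ^k}{k!} \ge \frac{\alpha ^{\alpha }}{\alpha !}. \end{aligned}$$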

Proposition 3

\(\Pr [\mathsf{Coll}_{1}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 4q\alpha /N'\).

Proof

Let

$$\begin{aligned} \mathsf{Success}_1(\mathcal {Q}'_i) = \mathsf{Coll}_{1}(\mathcal {Q}'_i) \wedge \lnot \mathsf{Coll}_{1}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{FB}(\mathcal {Q}'_{i-1}) \end{aligned}$$

for \(i = 1\ldots 2q\). Then \(\Pr [\mathsf{Coll}_{1}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \sum _{i=1}^{2q}\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)]\) and \(\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)] \le \Pr [\mathsf{Coll}_{1}(\mathcal {Q}'_i) | \lnot \mathsf{Coll}_{1}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{FB}(\mathcal {Q}'_{i-1})]\).

Fix a value of i, \(1 \le i \le 2q\). We call the ith query made by \({\mathcal {A}}'\) the last query. If \(\mathsf{Success}_1(\mathcal {Q}'_i)\) occurs, then by symmetry the adversary (i.e., \({\mathcal {A}}'\)) can use its last query either as query TL or as query BL of a collision in which TL, BL, TR and BR are distinct. Moreover, the last query could be either a forward query or a backward query. This gives rise to four possible cases, and we bound \(\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)]\) for each separately. (We note the very first case, “TL forward,” is the case we discussed in “Appendix.”) For each case, we call the last query successful if this query completes a collision with TL, BL, TR, BR distinct and where the last query is used in the position stipulated by that case (e.g., for the case “TL forward,” the last query must be used in position TL).

TL forward: Let the last query be \(E_{B||L}(A)\). Call a value R good if there exists a query of the form \((B, L||R, \cdot )\) in \(\mathcal {Q}'\) that was obtained by \({\mathcal {A}}'\) as a backward query. We note that because of (7), \(\lnot \mathsf{FB}(\mathcal {Q}'_{i-1})\) implies there are at most \(\alpha \) good R’s.

We claim that for the last query to be successful the value R returned as an answer to the query must be good. Indeed, let R be the value returned; then a prerequisite for the query to be successful is that there be a query of the form \((B, L||R, \cdot )\) in \(\mathcal {Q}'_{i-1}\). We claim that this query must have been obtained as a backward query. Indeed, assume that the query \((B, L||R, \cdot )\) was obtained as a forward query \(E_{L||R}(B)\) by \({\mathcal {A}}'\). Then, by construction, \({\mathcal {A}}'\) would have immediately followed this query by the query \(E_{B||L}^{-1}(R)\) unless \({\mathcal {A}}'\) already knew the answer to \(E_{B||L}^{-1}(R)\). Either way, \({\mathcal {A}}'\) would have the query \((A, B||L, R)\) in its query history prior to the ith (forward) query \(E_{B||L}(A)\), a contradiction since \({\mathcal {A}}'\) never makes a query to which it knows the answer. Thus the value R returned as an answer to the query \(E_{B||L}(A)\) must be good for the query to be successful.

Since there are at most \(\alpha \) good values of R and since \({\mathcal {A}}'\) makes at most 2q queries, the probability that the last query is successful is therefore at most \(\alpha /(2^n - 2q) = \alpha /N'\).

TL backward: Let the last query be \(E_{B||L}^{-1}(R)\). For the last query to be successful, there must be a (necessarily unique) query BL \(= (B, L||R, S) \in \mathcal {Q}'_{i-1}\), for some value \(S \in \{0,1\}^n\). From the condition \(B \oplus S = B' \oplus S'\) and from \(\lnot \mathsf{Xor}(\mathcal {Q}'_{i-1})\), there are at most \(\alpha \) possibilities for the query BR. As each query BR uniquely determines the query TR, there are at most \(\alpha \) possibilities for the query TR as well and thus at most \(\alpha \) possibilities for the value \(A' \oplus R'\). Thus the value A returned by the last query has chance at most \(\alpha /N'\) that \(A \oplus R\) will be equal to \(A' \oplus R'\) for one of these values \(A' \oplus R'\), and so the last query has chance at most \(\alpha /N'\) of being successful.

BL forward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL backward. The chance of success in this case is therefore at most \(\alpha /N'\).

BL backward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL forward. The chance of success in this case is therefore at most \(\alpha /N'\).

The chance a forward last query is successful is therefore at most \(2\alpha /N'\) (adding the TL and BL forward cases), and likewise, the chance that a backward last query is successful is at most \(2\alpha /N'\). Thus \(\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)] \le 2\alpha /N'\) for all i and \(\sum _{i=1}^{2q}\Pr [\mathsf{Success}_1(\mathcal {Q}'_i)] \le 4q\alpha /N'\). \(\square \)

Proposition 4

\(\Pr [\mathsf{Coll}_{2}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q/N'\).

Proof

Note that when TL = BL, \(B||L = L||R\), so \(B = L = R\); moreover \(R = S\) and \(A = B\), so \(A = B = L = R = S\). For the adversary to obtain a collision with TL = BL, therefore, it must obtain a query of the form \((U, U||U, U)\). The same argument applies to the case TR = BR. The chance of a query \(E_{U||U}(U)\) or of a query \(E_{U||U}^{-1}(U)\) being answered by U is at most \(1/N'\). Thus, since 2q queries are made total, \(\Pr [\mathsf{Coll}_{2}(\mathcal {Q}')] \le 2q/N'\). \(\square \)

Proposition 5

\(\Pr [\mathsf{Coll}_{3}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q\alpha /N' + 2q/N'\).

Proof

Note that in a collision with TL = BR, we must have TL \(\ne \) BL and \(A \oplus R = B \oplus S\) (since \(B \oplus S = B' \oplus S' = A \oplus R\), using TL = BR). Say the event \(\mathsf{Coll}_{3}'(\mathcal {Q}'_i)\) occurs if there exist distinct queries \((A, B||L, R)\), \((B, L||R, S)\) in \(\mathcal {Q}_i'\) such that \(A\oplus R = B \oplus S\). With the same argument applied to the case BL = TR, we have \(\mathsf{Coll}_{3}(\mathcal {Q}_i') \implies \mathsf{Coll}_{3}'(\mathcal {Q}_i')\). Therefore it suffices to show \(\Pr [\mathsf{Coll}_{3}'(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q\alpha /N' + 2q/N'\).

The analysis now proceeds rather similarly to Proposition 3. Let

$$\begin{aligned} \mathsf{Success}_3'(\mathcal {Q}'_i) = \mathsf{Coll}_{3}'(\mathcal {Q}'_i) \wedge \lnot \mathsf{Coll}_{3}'(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{Xor}(\mathcal {Q}'_{i-1}) \wedge \lnot \mathsf{FB}(\mathcal {Q}'_{i-1}). \end{aligned}$$

We have \(\Pr [\mathsf{Coll}_{3}'(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \sum _{i=1}^{2q}\Pr [\mathsf{Success}_3'(\mathcal {Q}'_i)]\).

Fix a value of i, \(1 \le i \le 2q\), and call the ith query made by \({\mathcal {A}}'\) the last query. If \(\mathsf{Success}_3'(\mathcal {Q}'_i)\) occurs, then the adversary (i.e., \({\mathcal {A}}'\)) can use its last query either as query TL or as query BL of its \(\mathsf{Coll}_{3}'\)-solution. This gives rise to four possible cases given that the last query could be either forward or backward. In each case, we call the last query successful if \(\mathsf{Success}_3'(\mathcal {Q}'_i)\) occurs and if the last query can be used in the position prescribed by that case (either TL or BL) in the \(\mathsf{Coll}_{3}'\)-solution.

TL forward: We can use exactly the same analysis as in the case TL forward of Proposition 3. The probability that the last query is successful is therefore at most \(\alpha /N'\).

TL backward: Let \(E_{B||L}^{-1}(R)\) be the last query. For the last query to be successful, there must be a (necessarily unique) query of the form \((B, L||R, S) \in \mathcal {Q}_{i-1}'\), for some \(S \in \{0,1\}^n\). Since the answer A to the last query must be such that \(A \oplus R = B \oplus S\) (as per the definition of \(\mathsf{Coll}_{3}'\)) and \(B \oplus S\) is uniquely determined, the last query has chance at most \(1/N'\) of success.

BL forward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL backward. The chance of success in this case is therefore at most \(1/N'\).

BL backward: A \(180^{\circ }\) rotation of the collision diagram shows this case is symmetric to the case TL forward. The chance of success in this case is therefore at most \(\alpha /N'\).

The chance a forward last query is successful is therefore at most \((\alpha +1)/N'\) (adding the TL and BL forward cases), and likewise, the chance that a backward last query is successful is at most \((\alpha +1)/N'\). Thus \(\Pr [\mathsf{Success}_3'(\mathcal {Q}'_i)] \le (\alpha +1)/N'\) for all i and \(\sum _{i=1}^{2q}\Pr [\mathsf{Success}_3'(\mathcal {Q}'_i)] \le 2q\alpha /N' + 2q/N'\). (In fact, we even have \(\Pr [\mathsf{Coll}_{3}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le 2q\alpha /N' + 2q/N'\) since \(\lnot \mathsf{Xor}(\mathcal {Q}')\) was never used in the above.) \(\square \)

Taking the sum of the bounds of Propositions 3, 4 and 5, one obtains that

$$\begin{aligned} \Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \frac{6q\alpha }{N'} + \frac{4q}{N'}. \end{aligned}$$

However, the cases TL forward and BL backward of Proposition 3 and the cases TL forward and BL backward of Proposition 5 reference the same events (the adversary is successful in case TL forward of Proposition 3 if and only if it is successful in case TL forward of Proposition 5, and likewise for the BL backward cases), which results in an “overcounting” of the adversary’s probability of success by \(2q\alpha /N'\). A more careful accounting of the adversary’s probability of success thus shows

$$\begin{aligned} \Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')] \le \frac{4q\alpha }{N'} + \frac{4q}{N'}. \end{aligned}$$
(10)

Here we have not established (10) entirely formally, though this is the bound we use for \(\Pr [\mathsf{Coll}(\mathcal {Q}') \wedge \lnot \mathsf{Xor}(\mathcal {Q}') \wedge \lnot \mathsf{FB}(\mathcal {Q}')]\) in Theorem 1. Establishing (10) formally would require dividing the event \(\mathsf{Coll}(\mathcal {Q})\) into a different, less intuitive set of events than \(\mathsf{Coll}_{1}(\mathcal {Q})\), \(\mathsf{Coll}_{2}(\mathcal {Q})\), \(\mathsf{Coll}_{3}(\mathcal {Q})\), events that are directly based on those that occur in the case analyses of Propositions 3–5. (For example, one of these events would be the event that the adversary ever obtains a “good R” through a forward or backward query, as defined for forward queries in case TL forward of Proposition 3 and implicitly defined (by symmetry) for backward queries in case BL backward of Proposition 3; another event would cover the cases TL backward and BL forward of Proposition 5, and so on.) The current form of the proof is our best compromise between readability and formality. In any case, the difference between \(4q\alpha /N'\) and \(6q\alpha /N'\) is relatively minor.

Summing (10) with the bounds of Proposition 2 and using (9), we obtain

$$\begin{aligned} \Pr [\mathsf{Coll}(\mathcal {Q})] \le 2N\left( \frac{2eq}{\alpha N'}\right) ^{\alpha } + \frac{4q\alpha }{N'} + \frac{4q}{N'}. \end{aligned}$$
(11)

Since (11) holds for an arbitrary q-query adversary \({\mathcal {A}}\), this establishes Theorem 1.
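To get a feel for the numbers behind (11), the following minimal Python sketch evaluates its right-hand side. It assumes \(N = 2^n\) and \(N' = 2^n - 2q\) (the latter as in the expression \(\alpha /(2^n - 2q) = \alpha /N'\) used above). The function name `collision_bound` and the value \(\alpha = 16\) in the usage note are our own choices, made only for illustration.

```python
import math

def collision_bound(n: int, log2_q: float, alpha: int) -> float:
    """Right-hand side of (11), assuming N = 2^n and N' = 2^n - 2q.

    Intended for n small enough that 2^n fits in a float (n < 1024).
    """
    N = 2.0 ** n
    q = 2.0 ** log2_q
    N_prime = N - 2 * q
    if N_prime <= 0:
        return math.inf  # the bound says nothing once 2q >= 2^n

    # First term 2N * (2eq / (alpha N'))^alpha, handled via base-2 logarithms
    # so that very small or very large values do not raise an error.
    log2_first = 1 + n + alpha * math.log2(2 * math.e * q / (alpha * N_prime))
    if log2_first >= 1024:
        first = math.inf
    elif log2_first < -1074:
        first = 0.0
    else:
        first = 2.0 ** log2_first

    return first + 4 * q * alpha / N_prime + 4 * q / N_prime
```

For instance, `collision_bound(128, 120.8, 16)` stays below 1/2, which is in line with the figure of \(2^{120.8}\) queries quoted in the introduction for \(2n = 256\).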

4 Preimage Resistance of Tandem-DM

To build some intuition for our preimage resistance analysis, let us start with considering the much easier problem of constructing a 3n-bit to 2n-bit compression function H based on two 3n-bit to n-bit smaller underlying primitives f and \(f^{\prime }\). An obvious approach is simply to concatenate the outputs of f and \(f^{\prime }\), that is let \(H(B)=f(B) ||f^{\prime }(B)\) for \(B\in \{0,1\}^{3n}\). If f and \(f^{\prime }\) are modeled as independently sampled, ideally random functions, then it is not hard to see that H behaves ideally as well. In particular, it is preimage resistant up to \(2^{2n}\) queries (to f and \(f^{\prime }\)).

When switching to a blockcipher-based scenario, it is natural to replace f and \(f^{\prime }\) in the construction above by E, resp. \(E^{\prime }\), both run in Davies–Meyer mode. In other words, for blockciphers E and \(E^\prime \) both with 2n-bit keys and operating on n-bit blocks, define \(H(A||B) = (E_B(A)\oplus A) ||(E^{\prime }_B(A)\oplus A)\) where \(A\in \{0,1\}^n\) and \(B\in \{0,1\}^{2n}\). While there is every reason to believe this construction maintains preimage resistance up to \(2^{2n}\) queries, the standard proof technique against adaptive adversaries falls short significantly. Indeed, the usual argument goes that the ith query an adversary makes to E using key K will return an answer uniform from a set of size at least \(2^n-(i-1)\), and thus, the probability of hitting a prespecified value is at most \(1/(2^n-(i-1)) < 1/(2^n-q)\). Unfortunately, once q approaches \(2^n\), the denominator tends to zero (rendering the bound useless). As a result, one cannot hope to prove anything beyond \(2^n\) queries using this method. This restriction holds even for a “typical” bound of type \(q/(2^n-q)^2\).

When considering non-adaptive adversaries only, the situation is far less grim. Such adversaries need to commit to all queries in advance, which allows bounding the probability of each individual query hitting a prespecified value by \(2^{-n}\). While obviously there are dependencies (in the answers), these can safely be ignored when a union bound is later used to combine the various individual queries. Since the q offset has disappeared from the denominator, the typical bound \(q/(2^n)^2\) would give the desired security.

Our solution, then, is to force an adaptive adversary to behave non-adaptively. As this might sound a bit cryptic, let us be more precise. Consider an adversary adaptively making queries to the blockcipher, using the same key throughout. As soon as the number of queries to this key passes a certain threshold, we give the remaining queries to the blockcipher using this very key for free. We will refer to this event as a super query. Since these free queries are all asked in one go, they can be dealt with non-adaptively, preempting the problems that occur (in standard proofs) due to adaptive queries. Nonetheless, for every super query we need to hand out a very large number of free queries, which can aid the adversary. Thus we need to limit the number of super queries an adversary can make by setting the threshold that triggers a super query sufficiently high. In fact, we set the threshold at exactly half the total number of queries that can be made under a given key (i.e., it is set at \(2^n/2\) queries). This effectively doubles the adversary’s query budget, since for every query the adversary makes it can get another one later “for free” (if it keeps on making queries under the same key), but such a doubling of the number of queries does not lead to an unacceptable deterioration of the security bound.

Now our preimage resistance bound for Tandem-DM, parameterized by a certain parameter \(\alpha \), is given as follows.

Theorem 2

Let \(N=2^n, q<N^2\) and let \(\alpha > 0\) be an integer. Then

$$\begin{aligned} \mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(q)\le & {} \frac{16\alpha }{N} + \frac{8q}{N^2(N-2)} + 2\cdot \left( \frac{2eq}{\alpha N}\right) ^\alpha + \frac{4q}{\alpha N}. \end{aligned}$$

Proof

Let \(U||V \ne 0^n||0^n\) be the point to invert, chosen by the adversary before making any queries to E. We upper bound the probability that, in q queries, the adversary finds a point \(A||B||M \in \{0,1\}^{3n}\) such that \(\mathsf{TDM}^{E}(A||B||M) = U||V\).

In this analysis, we give free queries to the adversary as follows: Whenever the adversary has made \(N/2\) queries under a given key \(K||L\), and after the \((N/2)\)-th such query has been answered and placed in the query history, we give the remaining \(N/2\) queries under the key \(K||L\) for free to the adversary, in any order. In this case, we say that a super query occurs; every query in the query history is either part of a super query, or not; in the latter case we call the query a “normal query.” (Thus, in this theorem, normal queries are exactly the non-free queries.) We alert the reader to the fact that a “super query” consists of a set of \(N/2\) queries, whereas a “normal query” is a single query.
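The bookkeeping just described can be sketched as a lazily sampled toy cipher. This is only an illustration for very small n: the class name `SuperQueryOracle`, its method names, and the use of Python's `random` module as a stand-in for an ideal cipher are all assumptions of the sketch, not part of the proof.

```python
import random

class SuperQueryOracle:
    """Toy ideal cipher on n-bit blocks that triggers super queries."""

    def __init__(self, n: int, seed: int = 0):
        self.N = 1 << n
        self.rng = random.Random(seed)
        self.fwd = {}           # key -> {plaintext: ciphertext}
        self.bwd = {}           # key -> {ciphertext: plaintext}
        self.normal_count = {}  # key -> number of normal (non-free) queries
        self.super_keys = set() # keys for which a super query has occurred

    def _tables(self, key):
        return self.fwd.setdefault(key, {}), self.bwd.setdefault(key, {})

    def _add(self, key, x, y):
        f, b = self._tables(key)
        f[x] = y
        b[y] = x

    def _count_normal(self, key):
        self.normal_count[key] = self.normal_count.get(key, 0) + 1
        if self.normal_count[key] == self.N // 2:
            self._super_query(key)

    def _super_query(self, key):
        # Hand out the remaining N/2 input/output pairs in one go,
        # implemented as forward queries on all unused plaintexts.
        f, b = self._tables(key)
        xs = [x for x in range(self.N) if x not in f]
        ys = [y for y in range(self.N) if y not in b]
        self.rng.shuffle(ys)
        for x, y in zip(xs, ys):
            self._add(key, x, y)
        self.super_keys.add(key)

    def enc(self, key, x):
        f, b = self._tables(key)
        if x not in f:  # a fresh normal forward query
            y = self.rng.choice([v for v in range(self.N) if v not in b])
            self._add(key, x, y)
            self._count_normal(key)
        return f[x]

    def dec(self, key, y):
        f, b = self._tables(key)
        if y not in b:  # a fresh normal backward query
            x = self.rng.choice([v for v in range(self.N) if v not in f])
            self._add(key, x, y)
            self._count_normal(key)
        return b[y]
```

In this sketch every normal answer is drawn from at least \(N/2 + 1\) unused values, and once a key has received its \((N/2)\)-th normal query the whole permutation under that key is known.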

We define an event \(\mathsf{Lucky}(\mathcal {Q})\) on the query history; \(\mathsf{Lucky}(\mathcal {Q})\) occurs if

$$\begin{aligned} |\{(X, K||L, Y) \in \mathcal {Q}: X \oplus Y = U\}| > 2\alpha , \end{aligned}$$

or if

$$\begin{aligned} |\{(X, K||L, Y) \in \mathcal {Q}: X \oplus Y = V\}| > 2\alpha . \end{aligned}$$

The adversary obtains a preimage of \(U||V\) precisely if it obtains queries of the form \((A, B||M, R)\), \((B, M||R, S)\) such that \(A \oplus R = U\), \(B \oplus S = V\). It is easy to see these two queries must be distinct; otherwise, we would have \(A = B = M = R = S\) and therefore \(U||V = 0^n||0^n\). We call two queries as above a “winning pair” of queries, where the two elements of a winning pair need not be adjacent in the query history (and could be in any order). We speak of the “first” and “second” query in a winning pair referring to the order in which they appear in the query history.

Let \(\mathsf{WinNormal}(\mathcal {Q})\) be the event that the adversary obtains a winning pair in which the second query is a normal query. Let \(\mathsf{WinSuper}_1(\mathcal {Q})\) be the event that the adversary obtains a winning pair in which the second query is part of a super query and the first is either normal or part of a super query, but is not part of the same super query as the second. Finally let \(\mathsf{WinSuper}_2(\mathcal {Q})\) be the event that the adversary obtains a winning pair in which both queries of the pair are part of the same super query. It is then clear that if the adversary wins, one of the events

$$\begin{aligned} \mathsf{WinNormal}(\mathcal {Q}), \mathsf{WinSuper}_1(\mathcal {Q})\, \text { or }\, \mathsf{WinSuper}_2(\mathcal {Q}) \end{aligned}$$

occurs. In particular, thus, one of the four events

$$\begin{aligned}&\mathsf{Lucky}(\mathcal {Q}),\, \mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}),\, \mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}), \\&\quad \mathsf{WinSuper}_2(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q}) \end{aligned}$$

must occur if the adversary wins. We upper bound the probability of each of these four events and sum the upper bounds in order to obtain an upper bound on the adversary’s advantage.

We start by upper bounding \(\Pr [\mathsf{Lucky}(\mathcal {Q})]\). For this we introduce two new events. Let \(\mathcal {Q}_\mathrm{n}\) be the restriction of \(\mathcal {Q}\) to normal queries, and let \(\mathcal {Q}_\mathrm{s}\) be the restriction of \(\mathcal {Q}\) to queries that are part of super queries. Let \(\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})\) be the event that either

$$\begin{aligned} |\{(X, K||L, Y) \in \mathcal {Q}_\mathrm{n} : X \oplus Y = U\}| > \alpha , \end{aligned}$$

or

$$\begin{aligned} |\{(X, K||L, Y) \in \mathcal {Q}_\mathrm{n} : X \oplus Y = V\}| > \alpha . \end{aligned}$$

The event \(\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})\) is likewise defined with respect to \(\mathcal {Q}_\mathrm{s}\). Obviously, \(\mathsf{Lucky}(\mathcal {Q}) \implies \mathsf{Lucky}_\mathrm{n}(\mathcal {Q}) \vee \mathsf{Lucky}_\mathrm{s}(\mathcal {Q})\), so it suffices to upper bound \(\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})\) and \(\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})\) and to sum these upper bounds.

Since every answer to a normal query, forward or backward, comes at random from a set of size at least \(N/2\), and since at most q normal queries are made, we have that

$$\begin{aligned} \Pr [\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})] \le 2\cdot {q \atopwithdelims ()\alpha }\left( \frac{2}{N}\right) ^{\alpha } \le 2\cdot \left( \frac{2eq}{\alpha N}\right) ^\alpha . \end{aligned}$$

To upper bound \(\Pr [\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})]\), note that there occur at most \(q/(N/2) = 2q/N\) super queries, since it costs \(N/2\) queries to set up a super query for a given key. Each super query contains \(N/2\) queries, so we can define random variables \(Z_{i,j}\) for \(1\le i\le 2q/N\) and \(1\le j\le N/2\), where \(Z_{i,j}=1\) if and only if \(X \oplus Y = U\) for the j-th query \((X, K||L, Y)\) within the ith super query. Then we have

$$\begin{aligned} Z=\sum _{i,j}Z_{i,j}=|\{(X, K||L, Y) \in \mathcal {Q}_\mathrm{s} : X \oplus Y = U\}|. \end{aligned}$$

Since \(\mathrm {E}(Z_{i,j})\le 2/N\) for each i and j, we have \(\mathrm {E}(Z)\le (2q/N)(N/2)(2/N)=2q/N\). Therefore, by Markov’s inequality, the probability that

$$\begin{aligned} |\{(X, K||L, Y) \in \mathcal {Q}_\mathrm{s} : X \oplus Y = U\}| > \alpha \end{aligned}$$

is at most \(2q/(\alpha N)\). Now by a union bound and a symmetric argument (for \(X \oplus Y = V\)), we obtain that \(\Pr [\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})] \le 4q/(\alpha N)\). Summing the upper bounds for \(\Pr [\mathsf{Lucky}_\mathrm{n}(\mathcal {Q})]\) and \(\Pr [\mathsf{Lucky}_\mathrm{s}(\mathcal {Q})]\), we thus obtain that

$$\begin{aligned} \Pr [\mathsf{Lucky}(\mathcal {Q})] \le 2\cdot \left( \frac{2eq}{\alpha N}\right) ^\alpha + \frac{4q}{\alpha N}. \end{aligned}$$
(12)

To upper bound \(\Pr [\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\), we use a “wish list” argument. As the adversary makes queries, we maintain two sequences \(\mathcal {W}_\mathrm{T}\) and \(\mathcal {W}_\mathrm{B}\) called wish lists. These are initially empty. For each query \((X, K||L, Y)\) added to the query history (whether normal or part of a super query), we update the wish lists as follows:

  1. If \(X \oplus Y = U\), then \((K, L||Y, K \oplus V)\) is added to \(\mathcal {W}_\mathrm{B}\).

  2. If \(X \oplus Y = V\), then \((L \oplus U, X||K, L)\) is added to \(\mathcal {W}_\mathrm{T}\).

The following properties are easy to check: (1) A query never “adds itself” to a wish list (this uses \(U||V \ne 0^n||0^n\)); (2) the elements within each wish list are all distinct from one another; (3) the adversary obtains a winning pair precisely if it obtains a query that is already in one of its wish lists (at the moment of insertion of that query into the query history). And by definition of \(\mathsf{Lucky}(\mathcal {Q})\), the wish lists never exceed length \(2\alpha \) as long as \(\lnot \mathsf{Lucky}(\mathcal {Q})\) holds.
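Property (3) can be checked exhaustively on a toy cipher. The sketch below is a static check of the two update rules (it ignores the order in which queries arrive) and reads the Tandem-DM definition off the winning-pair conditions \((A, B||M, R)\), \((B, M||R, S)\). All function names are hypothetical, and `random` merely stands in for an ideal cipher.

```python
import itertools
import random

def random_cipher(n: int, seed: int = 0):
    """One random permutation of n-bit blocks per 2n-bit key (K, L)."""
    rng = random.Random(seed)
    size = 1 << n
    cipher = {}
    for key in itertools.product(range(size), repeat=2):
        perm = list(range(size))
        rng.shuffle(perm)
        cipher[key] = perm  # perm[x] plays the role of E_{K||L}(x)
    return cipher

def tdm(cipher, a, b, m):
    """Tandem-DM output (A xor R, B xor S) for input A||B||M."""
    r = cipher[(b, m)][a]
    s = cipher[(m, r)][b]
    return (a ^ r, b ^ s)

def wish_updates(x, key, y, u, v):
    """Elements added to (W_T, W_B) by the query (x, key, y)."""
    k, l = key
    bottom = [(k, (l, y), k ^ v)] if x ^ y == u else []
    top = [(l ^ u, (x, k), l)] if x ^ y == v else []
    return top, bottom

def preimages_from_wishes(cipher, u, v):
    """Inputs A||B||M found by matching queries against wish-list elements."""
    found = set()
    for (k, l), perm in cipher.items():
        for x, y in enumerate(perm):
            top, bottom = wish_updates(x, (k, l), y, u, v)
            for wx, wkey, wy in bottom:
                if cipher[wkey][wx] == wy:
                    found.add((x, k, l))  # this query is the top query
            for wx, wkey, wy in top:
                if cipher[wkey][wx] == wy:
                    found.add((wx, wkey[0], wkey[1]))  # the wish is the top query
    return found
```

On any such toy cipher and any target \(U||V\), the set returned by `preimages_from_wishes` should coincide with the set of preimages found by brute force over all \(A||B||M\).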

Let \(E_{K||L}(X)\) be a query made to E during the adversary’s attack (either a normal query, or as part of a super query). If, at the moment when the query is being made, there is an element of the form \((X, K||L, Y)\) in (at least) one of the wish lists for some \(Y\in \{0,1\}^n\), then we say this wish list element is being “wished for” when the query \(E_{K||L}(X)\) is made. We similarly say the wish list element \((X, K||L, Y)\) is being “wished for” if the query \(E_{K||L}^{-1}(Y)\) is made (note that in this case, the query \(E_{K||L}^{-1}(Y)\) is necessarily normal, since a super query is, by default, implemented by forward queries). We note, importantly, that any wish list element can only be wished for once, since \(E_{K||L}(\cdot )\) is a permutation.

Let \(\mathsf{NormalWishGranted}_{\mathrm{T}, i}\) be the event that a normal query \((X, K||L, Y)\), when added to the query list, is equal to the ith element of \(\mathcal {W}_\mathrm{T}\) (presuming \(\mathcal {W}_\mathrm{T}\) has length at least i when the query is added). Likewise define \(\mathsf{NormalWishGranted}_{\mathrm{B}, i}\) with respect to the list \(\mathcal {W}_\mathrm{B}\). Then by the above remarks

$$\begin{aligned} \mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})&\implies&\bigvee _{i=1}^{2\alpha } \mathsf{NormalWishGranted}_{\mathrm{T}, i}\\&\vee \bigvee _{i=1}^{2\alpha } \mathsf{NormalWishGranted}_{\mathrm{B}, i}\end{aligned}$$

so by a union bound

$$\begin{aligned} \Pr [\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\le & {} \sum _{i=1}^{2\alpha } \Pr [\mathsf{NormalWishGranted}_{\mathrm{T}, i}] \\&+ \sum _{i=1}^{2\alpha } \Pr [\mathsf{NormalWishGranted}_{\mathrm{B}, i}]. \end{aligned}$$

Because each wish list element can only be wished for once and because a normal query is answered at random uniformly from a set of size at least \(N/2\), we have

$$\begin{aligned} \Pr [\mathsf{NormalWishGranted}_{\mathrm{T}, i}] \le 2/N, \qquad \Pr [\mathsf{NormalWishGranted}_{\mathrm{B}, i}] \le 2/N \end{aligned}$$

and therefore

$$\begin{aligned} \Pr [\mathsf{WinNormal}(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le 2\cdot (4\alpha /N) = 8\alpha /N. \end{aligned}$$
(13)

We now upper bound \(\Pr [\mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\). We keep the same definition of the wish lists \(\mathcal {W}_\mathrm{T}\), \(\mathcal {W}_\mathrm{B}\) as above. We let \(\mathsf{SuperWishGranted}^1_{\mathrm{T}, i}\) be the event that a query \((X, K||L, Y)\) that is part of a super query is equal to the ith element of \(\mathcal {W}_\mathrm{T}\), where \(\mathcal {W}_\mathrm{T}\) has length \(\ge i\) before any of the super queries under key \(K||L\) have been made. The event \(\mathsf{SuperWishGranted}^1_{\mathrm{B}, i}\) is similarly defined. By the definition of \(\mathsf{WinSuper}_1(\mathcal {Q})\), we have that

$$\begin{aligned} \Pr [\mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\le & {} \sum _{i=1}^{2\alpha } \Pr [\mathsf{SuperWishGranted}^1_{\mathrm{T}, i}] \\&+ \sum _{i=1}^{2\alpha } \Pr [\mathsf{SuperWishGranted}^1_{\mathrm{B}, i}]. \end{aligned}$$

Assume, for a given i, that the i-th element of \(\mathcal {W}_\mathrm{T}\) (say) is \((X, K||L, Y)\), and that a super query is about to be made for the key \(K||L\), and that X is in the domain of the super query. Then the probability that \(E_{K||L}(X) = Y\) is at most \(2/N\) (more precisely, it is exactly \(2/N\) unless Y is not in the super query’s range, in which case it is 0). Thus, arguing similarly for the list \(\mathcal {W}_\mathrm{B}\), we obtain that

$$\begin{aligned} \Pr [\mathsf{SuperWishGranted}^1_{\mathrm{T}, i}] \le 2/N, \qquad \Pr [\mathsf{SuperWishGranted}^1_{\mathrm{B}, i}] \le 2/N. \end{aligned}$$

Therefore

$$\begin{aligned} \Pr [\mathsf{WinSuper}_1(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le 8\alpha /N. \end{aligned}$$
(14)

We finally bound \(\Pr [\mathsf{WinSuper}_2(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})]\). Note the event \(\mathsf{WinSuper}_2(\mathcal {Q})\) can only occur when a super query occurs for a key of the form \(L||L\) and when that super query results in the triples \((U \oplus L, L||L, L)\), \((L, L||L, L \oplus V)\) being added to the query history. The probability that \(E_{L||L}(U\oplus L) = L\) is at most \(2/N\), and, conditioned on the event that \(E_{L||L}(U\oplus L) = L\), the probability that \(E_{L||L}(L) = L \oplus V\) is at most \(1/(N/2 - 1)\). Since at most \(2q/N\) super queries occur, we thus find that

$$\begin{aligned} \Pr [\mathsf{WinSuper}_2(\mathcal {Q}) \wedge \lnot \mathsf{Lucky}(\mathcal {Q})] \le \Pr [\mathsf{WinSuper}_2(\mathcal {Q})] \le \frac{4q}{N^2(N/2-1)}. \end{aligned}$$
(15)

The theorem follows by summing (12), (13), (14) and (15). \(\square \)

As a numerical example, for \(n=128\) and \(q=2^{245.99}\), let \(\alpha = q^{1/2}/2\). Then by Theorem 2, we have

$$\begin{aligned} \mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(2^{245.99})\le 0.498. \end{aligned}$$
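This figure can be reproduced with a short Python sketch that evaluates the four terms of Theorem 2 through base-2 logarithms, since the quantities involved are far outside floating-point range. The helper names `_pow2` and `preimage_bound` are ours, and \(\alpha \) is treated as a real number here rather than rounded to an integer.

```python
import math

def _pow2(e: float) -> float:
    """2**e, returning 0.0 or inf when the result is outside float range."""
    if e < -1074:
        return 0.0
    if e >= 1024:
        return math.inf
    return 2.0 ** e

def preimage_bound(n: int, log2_q: float, log2_alpha: float) -> float:
    """Theorem 2 bound for N = 2^n, q = 2^log2_q, alpha = 2^log2_alpha.

    Assumes 2 <= n < 1024 so that 2^n - 2 is positive and fits in a float.
    """
    term_alpha = _pow2(4 + log2_alpha - n)                      # 16*alpha / N
    term_super2 = _pow2(3 + log2_q - 2 * n) / (2.0 ** n - 2)    # 8q / (N^2 (N-2))
    # 2 * (2eq / (alpha N))^alpha
    log2_base = 1 + math.log2(math.e) + log2_q - log2_alpha - n
    term_lucky_n = _pow2(1 + _pow2(log2_alpha) * log2_base)
    term_lucky_s = _pow2(2 + log2_q - log2_alpha - n)           # 4q / (alpha N)
    return term_alpha + term_super2 + term_lucky_n + term_lucky_s

# alpha = sqrt(q) / 2, i.e. log2(alpha) = log2(q) / 2 - 1
example = preimage_bound(128, 245.99, 245.99 / 2 - 1)
```

With these inputs the first and last terms each contribute \(2^{-2.005}\), the other two are negligible, and `example` evaluates to roughly 0.498.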

Corollary 2

\(\lim _{n\rightarrow \infty } \mathbf{Adv}_{\mathsf{TDM}}^{\mathrm{epre}\ne 0}(N^2/n)=0\).

Proof

By setting \(\alpha = q^{1/2}/2\) (note that \(\alpha \) is allowed to depend on q), the bound from Theorem 2 simplifies to

$$\begin{aligned} \frac{16 q^{\frac{1}{2}}}{N} + \frac{8q}{N^2(N-2)} + 2\cdot \left( \frac{4eq^{\frac{1}{2}}}{N}\right) ^{\frac{q^{\frac{1}{2}}}{2}}. \end{aligned}$$

If \(q =N^2/n\), then this bound can be rewritten as

$$\begin{aligned} \frac{16}{ \sqrt{n}} + \frac{8}{n(N-2)} + 2\cdot \left( \frac{4e}{\sqrt{n}}\right) ^{\frac{N}{2\sqrt{n}}}. \end{aligned}$$

This converges to zero as \(n\rightarrow \infty \). \(\square \)
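To illustrate the rate of convergence, the last displayed expression can be evaluated for a few values of n. The sketch below (with the hypothetical name `corollary_bound`) keeps the exponent \(N/(2\sqrt{n})\) in the log domain because N itself is far too large for floating-point arithmetic.

```python
import math

def corollary_bound(n: int) -> float:
    """16/sqrt(n) + 8/(n(N-2)) + 2*(4e/sqrt(n))^(N/(2 sqrt(n))), N = 2^n."""
    if n < 2:
        raise ValueError("n must be at least 2 so that N - 2 > 0")
    root = math.sqrt(n)
    first = 16 / root
    # Exact integer arithmetic; the quotient underflows to 0.0 for large n.
    second = 8 / (n * (2 ** n - 2))

    log2_base = math.log2(4 * math.e / root)
    log2_exponent = n - 1 - math.log2(root)  # log2 of N / (2 sqrt(n))
    exponent = math.inf if log2_exponent >= 1024 else 2.0 ** log2_exponent
    log2_third = 1 + exponent * log2_base
    if log2_third < -1074:
        third = 0.0
    elif log2_third >= 1024:
        third = math.inf
    else:
        third = 2.0 ** log2_third
    return first + second + third
```

The first term dominates once \(\sqrt{n} > 4e\): the sketch gives about 0.5 at \(n = 1024\) and 0.25 at \(n = 4096\), so the convergence to zero is governed by \(16/\sqrt{n}\).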

5 Conclusion

In this work, we have shown that an earlier work concerning the security of Tandem-DM is incorrect. However, with a new proof (exploiting new ideas), we have shown that, in the ideal cipher model, Tandem-DM is collision resistant almost up to the birthday bound. We note that our collision resistance bound has the form \(O(q/(2^n-q))\) rather than \(O(q^2/(2^n-q)^2)\), ignoring log factors. Both bounds reach constant values when \(q = {\varOmega }(2^n)\); however, \(q^2/(2^n - q)^2\) grows more slowly than \(q/(2^n-q)\), so our bound is (only) “linear birthday” rather than true “quadratic birthday.” We leave it as an open problem to prove “quadratic birthday”-type collision resistance for Tandem-DM (as exists for Abreast-DM and Hirose’s scheme).

On a high level, our proof of collision resistance adheres to a (by now) standard framework. We first modify the collision-finding adversary by giving it several “free” queries, and subsequently, we bound the modified adversary’s chance of success using a case analysis. This approach allows us to easily bound both the number of free queries and the probability of a query (free or not) causing a collision.

In contrast, the FGL proof directly uses a case analysis and subsequently uses free queries within the case analysis. This ad hoc addition of free queries (and its binding to a particular case) is problematic, as it does not allow proper accounting of the free queries. In particular, if a free query is fresh, it might cause a collision (or other bad event) elsewhere, whereas if the free query has actually been asked before, no new randomness can be extracted from it. Thus, apart from having established the security of Tandem-DM, we hope that our work also serves as a useful reminder of some of the subtleties involved in ICM proofs and as a guideline on how to avoid certain pitfalls.

Using a new technique based on super queries, we provided an improved bound on the preimage security of Tandem-DM. Specifically, we showed that asymptotically an adversary must make at least \(2^{2n-10}\) blockcipher queries to achieve chance 0.5 of inverting a randomly chosen point in the range. This bound improves upon the previous best bound of \({\varOmega }(2^n)\) queries and is optimal up to a constant factor. We note that the super query technique applies to many classical double-block-length compression functions such as Abreast-DM and Hirose’s scheme, as detailed in [1].