1 Introduction

This paper revisits a classical question, namely how we can turn a hash function into a PRF. The canonical answer is \(\mathsf {HMAC}\) [4], which (1) first applies the keyed hash function to the message and then (2) re-applies, to the result, the hash function keyed with another key. We consider another, even simpler, candidate way, namely to change step (2) to apply a simple un-keyed output transform such as truncation. We call this \(\mathsf {AMAC}\), for augmented MAC. This paper investigates and establishes the provable security of \(\mathsf {AMAC}\), with good bounds, when the hash function is a classical MD-style one like SHA-512.

Why? We were motivated to determine the security of \(\mathsf {AMAC}\) by the following.

Usage. \(\mathsf {AMAC}\) with SHA-512 is used as a PRF in the Ed25519 signature scheme [8]. (\(\mathsf {AMAC}\) under a key that is part of the signing key is applied to the hashed message to get coins for a Schnorr-like signature.) Ed25519 is widely deployed, including in SSH, Tor, OpenBSD and dozens of other places [10]. The security of \(\mathsf {AMAC}\) for this usage was questioned in cfrg forum debates on Ed25519 as a proposed standard. Analysis of \(\mathsf {AMAC}\) is important to assess the security of this usage and allow informed choices.

Speed. \(\mathsf {AMAC}\) is faster than \(\mathsf {HMAC}\), particularly on short messages. See [3].

Context. Sponge-based PRFs, where truncation is the final step because it is already so for the hash function, have been proven secure [1, 9, 11, 17, 20]. Our work can be seen as stepping back to ask whether truncation works in a similar way for classical MD-style hash functions.

Findings in a nutshell. Briefly, the message of this paper is the following: (1) First, we are able to prove PRF security of \(\mathsf {AMAC}\). (2) Second, \(\mathsf {AMAC}\) has a unique and attractive feature, namely that its multi-user security is essentially as good as its single-user security and in particular superior in some settings to that of competitors. (3) Third, it is technically interesting: its security and analysis are intrinsically linked to the security of the compression function in the presence of leakage, so that leakage becomes of interest for reasons entirely divorced from side-channel attacks. We now step back to provide some background and discuss our approach and results.

The basic cascade. Let \({\mathsf {h}}{:\;\,}\{0,1\}^c\times \{0,1\}^b\rightarrow \{0,1\}^c\) represent a compression function taking a c-bit chaining variable and b-bit message block to return a c-bit output. The basic cascade of \({\mathsf {h}}\) is the function \({\mathsf {h}}^*{:\;\,}\{0,1\}^c\times (\{0,1\}^b)^{+}\rightarrow \{0,1\}^c\) defined by

Algorithm \({\mathsf {h}}^*(K,{\mathbf {X}})\):
   \(Y \leftarrow K\) ; \(n \leftarrow |{\mathbf {X}}|\)
   For \(i=1,\ldots ,n\) do \(Y \leftarrow {\mathsf {h}}(Y,{\mathbf {X}}[i])\)
   Return \(Y\)

where \({\mathbf {X}}\) is a vector over \(\{0,1\}^b\) whose length is denoted n and whose i-th component is denoted \({\mathbf {X}}[i]\). This construct is the heart of MD-style hash functions [13, 21] like MD5, SHA-1, SHA-256 and SHA-512, which are obtained by setting K to a fixed, public value and then applying \({\mathsf {h}}^*\) to the padded message.

Now we want to key \({\mathsf {h}}^*\) to get PRFs. We regard \({\mathsf {h}}\) itself as a PRF on domain \(\{0,1\}^b\), keyed by its c-bit chaining variable. Then \({\mathsf {h}}^*\) is the natural candidate for a PRF on the larger domain \((\{0,1\}^b)^{+}\). Problem is, \({\mathsf {h}}^*\) isn’t secure as a PRF. This is due to the well-known extension attack. If I obtain \(Y_1 = {\mathsf {h}}^*(K,X_1)\) for some \(X_1\in \{0,1\}^b\) of my choice, I can compute \(Y_2={\mathsf {h}}^*(K,X_1X_2)\) for any \(X_2\in \{0,1\}^b\) of my choice without knowing K, via \(Y_2 \leftarrow {\mathsf {h}}(Y_1,X_2)\). This clearly violates PRF security of \({\mathsf {h}}^*\).
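To make the attack concrete, here is a minimal Python sketch; the compression function h below is a toy SHA-256-based stand-in of our choosing (not the real SHA-512 compression function), and all names are ours, not part of any library API.

import hashlib

def h(chain: bytes, block: bytes) -> bytes:
    # Toy stand-in for a compression function h: {0,1}^c x {0,1}^b -> {0,1}^c,
    # here with c = 256.
    return hashlib.sha256(chain + block).digest()

def cascade(key: bytes, blocks: list) -> bytes:
    # Basic cascade h*: fold h over the message blocks, starting from the key.
    y = key
    for x in blocks:
        y = h(y, x)
    return y

# Extension attack: given only Y1 = h*(K, X1), compute h*(K, X1 X2).
K = bytes(32)                # the secret key, unknown to the attacker
X1, X2 = b"A" * 64, b"B" * 64
Y1 = cascade(K, [X1])        # learned from one legitimate query
Y2 = h(Y1, X2)               # forged with no knowledge of K
assert Y2 == cascade(K, [X1, X2])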

Although \({\mathsf {h}}^*\) is not a PRF, BCK2 [5] show that it is a prefix-free PRF, meaning it is a PRF as long as no input on which it is evaluated is a prefix of another. (The two inputs \(X_1, X_1X_2\) of the above attack violate this property.) When \(b=1\) and all inputs on which \({\mathsf {h}}^*\) is evaluated are of the same fixed length, the cascade \({\mathsf {h}}^*\) is the GGM construction of a PRF from a PRG [18].

To get a full-fledged PRF, \(\mathsf {NMAC}\) applies \({\mathsf {h}}\), under another key, to \({\mathsf {h}}^*\). The augmented cascade \({\mathsf {ACSC}}= {\mathsf {Out}}\circ {\mathsf {h}}^*\) that we discuss next replaces \(\mathsf {NMAC}\)’s outer application of a keyed function with a simple un-keyed one.

Augmented cascade. The augmented cascade is parameterized by some (keyless) function \({\mathsf {Out}}{:\;\,}\{0,1\}^c\rightarrow {\mathsf {Out}}.\mathsf {R}\) that we call the output transform, and is obtained by simply applying this function to the output of the basic cascade:

Algorithm \({\mathsf {ACSC}}(K,{\mathbf {X}})\):
   \(Y \leftarrow {\mathsf {h}}^*(K,{\mathbf {X}})\) ; \(Z \leftarrow {\mathsf {Out}}(Y)\) ; Return \(Z\)

\(\mathsf {AMAC}\) is obtained from \({\mathsf {ACSC}}\) just as \(\mathsf {HMAC}\) is obtained from \(\mathsf {NMAC}\), namely by putting the key in the input to the hash function rather than directly keying the cascade: \(\mathsf {AMAC}(K,M) = {\mathsf {Out}}(H(K\Vert M))\). Just as \(\mathsf {NMAC}\) is the technical core of \(\mathsf {HMAC}\), the augmented cascade is the technical core of \(\mathsf {AMAC}\), and our analysis will focus on it. We will be able to bridge to \(\mathsf {AMAC}\) quite simply with the tools we develop.
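For concreteness, here is a minimal Python sketch of \(\mathsf {AMAC}\) over SHA-512 with truncation as the output transform. The function name and the 32-byte (r = 256) output length are our illustrative choices; for the analysis in this paper the key would typically be a full b-bit block (b = 1024 for SHA-512).

import hashlib

def amac_sha512(key: bytes, msg: bytes, r_bytes: int = 32) -> bytes:
    # AMAC(K, M) = Out(H(K || M)) with H = SHA-512 and Out = truncation
    # to the first r bits (r = 256 here).
    return hashlib.sha512(key + msg).digest()[:r_bytes]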

The \({\mathsf {ACSC}}\) construction was suggested by cryptanalysts with the intuition that “good” choices of \({\mathsf {Out}}\) appear to allow \({\mathsf {Out}}\circ {\mathsf {h}}^*\) to evade the extension attack and thus possibly be a PRF. To understand this, first note that not all choices of \({\mathsf {Out}}\) are good. For example if \({\mathsf {Out}}\) is the identity function then the augmented cascade is the same as the basic one and the attack applies, or if \({\mathsf {Out}}\) is a constant function returning \(0^r\) then \({\mathsf {Out}}\circ {\mathsf {h}}^*\) is obviously not a PRF over range \(\{0,1\}^r\). Cryptanalysts have suggested some specific choices of \({\mathsf {Out}}\), the most important being (1) truncation, where \({\mathsf {Out}}{:\;\,}\{0,1\}^c\rightarrow \{0,1\}^r\) returns, say, the first \(r < c\) bits of its input, or (2) the mod function, as in Ed25519, where \({\mathsf {Out}}\) treats its input as an integer and returns the result modulo, say, a public r-bit prime number. Suppose r is sufficiently smaller than c (think \(c=512\) and \(r=256\)). An adversary querying \(X_1\) in the PRF game no longer gets back \(Y_1 = {\mathsf {h}}^*(K,X_1)\) but rather \(Z_1 = {\mathsf {Out}}(Y_1)\), and this does not allow the extension attack to proceed. On this basis, and for the choices of \({\mathsf {Out}}\) just named, the augmented cascade is already seeing extensive usage and is suggested for further usage and standardization.
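As a sketch of the two named transforms, the following Python stand-ins (names ours) implement truncation and the Ed25519-style modular reduction; the constant L_ED is the publicly known prime order of the Ed25519 group, and the little-endian convention follows that scheme.

L_ED = 2**252 + 27742317777372353535851937790883648493  # Ed25519 group order

def out_trunc(y: bytes, r_bytes: int = 32) -> bytes:
    # (1) Truncation: keep the first r bits of the c-bit input.
    return y[:r_bytes]

def out_mod(y: bytes) -> int:
    # (2) Mod function: treat the digest as a little-endian integer and
    # reduce it modulo a public r-bit prime, as in Ed25519.
    return int.from_bytes(y, "little") % L_ED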

This raises several questions. First, that \({\mathsf {Out}}\circ {\mathsf {h}}^*\) seems to evade the extension attack does not mean it is a PRF. There may be other attacks. The goal is to get a PRF, not to evade some specific attacks. Moreover we would like a proof that this goal is reached. Second, for which choices of \({\mathsf {Out}}\) does the construction work? We could try to analyze the PRF security of \({\mathsf {Out}}\circ {\mathsf {h}}^*\) in an ad hoc way for the specific choices of \({\mathsf {Out}}\) named above, but it would be more illuminating and useful to be able to establish security in a broad way, for all \({\mathsf {Out}}\) satisfying some conditions. These are the questions our work considers and resolves.

Connection to leakage. If we want to prove PRF security of \({\mathsf {Out}}\circ {\mathsf {h}}^*\), a basic question to ask is, under what assumption on the compression function \({\mathsf {h}}\)? The natural one is that \({\mathsf {h}}\) is itself a PRF, the same assumption as for the proof of \(\mathsf {NMAC}\) [2, 16]. We observe that this is not enough. Consider an adversary who queries the one-block message \(X_1\) to get back \(Z_1={\mathsf {Out}}(Y_1)\) and then queries the two-block message \(X_1X_2\) to get back \(Z_2={\mathsf {Out}}(Y_2)\) where by definition \(Y_1={\mathsf {h}}^*(K,X_1)={\mathsf {h}}(K,X_1)\) and \(Y_2={\mathsf {h}}^*(K,X_1X_2) = {\mathsf {h}}(Y_1,X_2)\). Note that \(Y_1\) is being used as a key in applying \({\mathsf {h}}\) to \(X_2\). But this key is not entirely unknown to the adversary because the latter knows \(Z_1={\mathsf {Out}}(Y_1)\). If the application of \({\mathsf {h}}\) with key \(Y_1\) is to provide security, it must be in the face of the fact that some information about this key, namely \({\mathsf {Out}}(Y_1)\), has been “leaked” to the adversary. As a PRF, \({\mathsf {h}}\) must thus be resilient to some leakage on its key, namely that represented by \({\mathsf {Out}}\) viewed as a leakage function.

Approach and qualitative results. We first discuss our results at the qualitative level and then later at the (in our view, even more interesting) quantitative level. Theorems 3 and 4 show that if \({\mathsf {h}}\) is a PRF under \({\mathsf {Out}}\)-leakage then \({\mathsf {Out}}\circ {\mathsf {h}}^*\) is indistinguishable from the result of applying \({\mathsf {Out}}\) to a random function. (The compression function \({\mathsf {h}}\) being a PRF under \({\mathsf {Out}}\)-leakage means it retains PRF security under key K even if the adversary is given \({\mathsf {Out}}(K)\). The formal definition is in Sect. 4.) This result makes no assumptions about \({\mathsf {Out}}\) beyond that implicit in the assumption on \({\mathsf {h}}\), meaning the result is true for all \({\mathsf {Out}}\), and is in the standard model. As a corollary we establish PRF security of \({\mathsf {Out}}\circ {\mathsf {h}}^*\) for a large class of output functions \({\mathsf {Out}}\), namely those that are close to regular. (This means that the distribution of \({\mathsf {Out}}(Y)\) for random Y is close to the uniform distribution on the range of \({\mathsf {Out}}\).) In summary we have succeeded in providing conditions on \({\mathsf {Out}},{\mathsf {h}}\) under which \({\mathsf {Out}}\circ {\mathsf {h}}^*\) is proven to be a PRF. Our conditions are effectively both necessary and sufficient and cover cases proposed for usage and standardization.

The above is a security proof for the augmented cascade \({\mathsf {Out}}\circ {\mathsf {h}}^*\) under the assumption that the compression function \({\mathsf {h}}\) is resistant to \({\mathsf {Out}}\) leakage. To assess the validity of this assumption, we analyze the security under leakage of an ideal compression function. Theorem 6 shows that an ideal compression function is resistant to \({\mathsf {Out}}\)-leakage as long as no range point of \({\mathsf {Out}}\) has too few pre-images. This property is in particular true if \({\mathsf {Out}}\) is close to regular. As a result, in the ideal model, we have a validation of our \({\mathsf {Out}}\)-leakage resilience assumption. Putting this together with the above we have a proof-based validation of the augmented cascade.

Multi-user security. The standard definition of PRF security of a function family \({\mathsf {F}}\) [18] is single user (su), represented by there being a single key K such that the adversary has access to an oracle \(\textsc {Fn}\) that given x returns either \({\mathsf {F}}(K,x)\) or the result of a random function F on x. But in “real life” there are many users, each with their own key. If we look across the different entities and Internet connections active at any time, the number of users/keys is very large. The more appropriate model is thus a multi-user (mu) one, where, for a parameter \(u\) representing the number of users, there are \(u\) keys \(K_1,\ldots ,K_{u}\). Oracle \(\textsc {Fn}\) now takes \((i,x)\) with \(1\le i \le u\) and returns either \({\mathsf {F}}(K_i,x)\) or the result of a random function \(F_i\) on x. It is in this setting that we should address security.

Multi-user security is typically neglected because it makes no qualitative difference: BCK2 [5], who first formalized the notion, also showed by a hybrid argument that the advantage of an adversary relative to \(u\) users is not more than \(u\) times the advantage of an adversary of comparable resources relative to a single user. Our Lemma 1 is a generalization of this result. But this degradation in advantage is quite significant in practice, since \(u\) is large, and raises the important question of whether one can do quantitatively better. Clearly one cannot in general, but perhaps one can for specific, special function families \({\mathsf {F}}\). If so, these function families are preferable in practice. This perspective is reflected in recent work like [22, 25].

These special function families seem quite rare. But we show that the augmented cascade is one of them. In fact we show that mu security gives us a double benefit in this setting, one part coming from the cascade itself and the other from the security of the compression function under leakage, the end result being very good bounds for the mu security of the augmented cascade.

Theorem 3 establishes su security of the augmented cascade based not on the su, but on the mu security of the compression function under \({\mathsf {Out}}\)-leakage. The bound is very good, the advantage dropping only by a factor equal to the maximum length of a query. The interesting result is Theorem 4, establishing mu security of the augmented cascade under the same assumptions and with essentially the same bounds as Theorem 3 establishing its su security. In particular we do not lose a factor of the number of users \(u\) in the advantage. This is the first advance.

Now note that the assumption in both of the above-mentioned results is the mu (not su) security of the compression function under \({\mathsf {Out}}\)-leakage. Our final bound will thus depend on this. The second advance is that Theorem 6 shows mu security of the compression function under \({\mathsf {Out}}\)-leakage with bounds almost as good as for su security. This is a result of independent interest, namely that, under leakage, the mu security of an ideal compression function is almost as good as its su security. This is not true in the absence of leakage. The results are summarized in Fig. 4.

Quantitative results. We obtain good quantitative bounds on the mu prf security of the augmented cascade in the ideal compression function model by combining our results on the mu prf security under leakage of an ideal compression function with our reduction of the security of the cascade to the security of the compression function under leakage. We illustrate these results for the case where the compression function is of form \({\mathsf {h}}{:\;\,}\{0,1\}^c \times \{0,1\}^b \rightarrow \{0,1\}^c\) and the output transform \({\mathsf {Out}}\) simply outputs the first r bits of its c-bit input, for \(r \le c\). We consider an attacker making at most q queries to a challenge oracle (that is either the augmented cascade or a random function), each query consisting of at most \(\ell \) b-bit blocks, and \(q_{\textsc {f}}\) queries to the ideal compression function oracle. We show that such an attacker achieves distinguishing advantage at most

$$\begin{aligned} \frac{\ell ^2 q^2 + \ell q q_{\textsc {f}}}{2^c} + \frac{cr \cdot (\ell ^2 q + \ell q_{\textsc {f}})}{2^{c-r}} , \end{aligned}$$
(1)

where we have intentionally omitted constant factors and lower order terms. Note that this bound holds regardless of the number of users \(u\). Here c is large, like \(c=512\), so the first term is small. But \(c-r\) is smaller, for example \(c-r=256\) with \(r=256\). The crucial merit of the bound of Eq. (1) is that the numerator in the second term does not contain quadratic terms like \(q^2\) or \(q \cdot q_{\textsc {f}}\). In practice, \(q_{\textsc {f}}\) and q are the terms we should allow to be large, so this is significant. To illustrate, say for example \(\ell = 2^{10}\) (meaning messages are about 128 KBytes if \(b=1024\)) and \(q_{\textsc {f}}= 2^{100}\) and \(q=2^{90}\). The bound from Eq. (1) is about \(2^{-128}\), which is very good. But, had the second term been of the form \(\ell ^2(q_{\textsc {f}}^2+q^2) / 2^{c-r}\) then the bound would be only \(2^{-36}\). See Sect. 8 for more information.
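To reproduce the arithmetic of this example, the following Python sketch evaluates Eq. (1) (with constants and lower-order terms omitted, as in the text) and the hypothetical quadratic variant; the function name is ours.

import math

def eq1_bound(c, r, ell, q, qf):
    # Eq. (1): (l^2 q^2 + l q qf)/2^c + c*r*(l^2 q + l qf)/2^(c-r)
    t1 = (ell**2 * q**2 + ell * q * qf) / 2**c
    t2 = c * r * (ell**2 * q + ell * qf) / 2**(c - r)
    return t1 + t2

c, r, ell, q, qf = 512, 256, 2**10, 2**90, 2**100
print(math.log2(eq1_bound(c, r, ell, q, qf)))            # about -128
print(math.log2(ell**2 * (qf**2 + q**2) / 2**(c - r)))   # about -36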

2-tier cascade. We introduce and use an extension of the basic cascade \({\mathsf {h}}^*\). Our 2-tier cascade is associated to two function families \({\mathsf {g}},{\mathsf {h}}\). Under key K, it applies \({\mathsf {g}}(K,\cdot )\) to the first message block to get a sub-key \(K^*\) and then applies \({\mathsf {h}}^*(K^*,\cdot )\) to the rest of the message. The corresponding augmented cascade applies \({\mathsf {Out}}\) to the result. Our results about the augmented cascade above are in fact shown for the augmented 2-tier cascade. This generalization has both conceptual and analytical value. We briefly mention two instances. (1) First, we can visualize mu security of \({\mathsf {Out}}\circ {\mathsf {h}}^*\) as pre-pending the user identity to the message and then applying the 2-tier cascade with first tier a random function. This effectively reduces mu security to su security. With this strategy we prove Theorem 4 as a corollary of Theorem 3 and avoid a direct analysis of mu security. Beyond providing a modular proof this gives some insight into why the mu security is almost as good as the su security. (2) Second, just as \(\mathsf {NMAC}\) is the technical core and \(\mathsf {HMAC}\) the function used (because the latter makes blackbox use of the hash function), in our case the augmented cascade is the technical core but what will be used is \(\mathsf {AMAC}\), defined by \(\mathsf {AMAC}(K,M) = {\mathsf {Out}}(H(K\Vert M))\) where H is the hash function derived from compression function \({\mathsf {h}}{:\;\,}\{0,1\}^c\times \{0,1\}^b\rightarrow \{0,1\}^c\) and K is a k-bit key. For the analysis we note (assuming \(k=b\)) that this is simply an augmented 2-tier cascade with the first tier being the dual of \({\mathsf {h}}\), meaning the key and input roles are swapped. We thus directly get an analysis and proof for this case from our above-mentioned results. Obtaining \(\mathsf {HMAC}\) from \(\mathsf {NMAC}\) was more work [2, 4] and required assumptions about PRF security of the dual function under related keys.

Davies-Meyer. Above we have assessed the PRF security under \({\mathsf {Out}}\)-leakage of the compression function by modeling the latter as ideal (random). But, following CDMP [12], one might say that the compression functions underlying MD-style hash functions are not un-structured enough to be treated as random because they are built from blockciphers via the Davies-Meyer (DM) construction. To address this we analyze the mu PRF security under \({\mathsf {Out}}\)-leakage of the DM construction in the ideal-cipher model. One’s first thought may be that such an analysis would follow from our analysis for a random compression function and the indifferentiability [12, 19] of DM from a random oracle, but the catch is that DM is not indifferentiable from a RO so a direct analysis is needed. The one we give in [3] shows mu security with good bounds. Similar analyses can be given for other PGV [24] compression functions.

2 Related Work

Sponges. SHA-3 already internally incorporates a truncation output transform. The construction itself is a sponge. The suggested way to obtain a PRF is to simply key the hash function via its IV, so that the PRF is a keyed, truncated sponge. The security of this construct has been intensively analyzed [1, 9, 11, 17, 20] with Gaži, Pietrzak and Tessaro (GPT) [17] establishing PRF security with tight bounds. Our work can be seen as stepping back to ask whether the same truncation method would work for MD-style hash functions like SHA-512. Right now these older hash functions are much more widely deployed than SHA-3, and current standardization and deployment efforts continue to use them, making the analysis of constructions based on them important with regard to security in practice. The underlying construction in this case is the cascade, which is quite different from the sponge. The results and techniques of GPT [17] do not directly apply but were an important inspiration for our work.

We note that keyed sponges with truncation to an r-bit output from a c-bit state can easily be distinguished from a random function with advantage roughly \(q^2 / 2^{c-r}\) or \(q q_{\textsc {f}}/ 2^{c-r}\), as shown for example in [17]. The bound of Eq. (1) is better, meaning the augmented cascade offers greater security. See [3] for more information.

Cascade. BCK2 [5] show su security of the basic cascade (for prefix-free queries) in two steps. First, they show su security of the basic cascade (for prefix-free queries) assuming not su, but mu security of the compression function. Second, they apply the trivial bound mentioned above to conclude su security of the basic cascade for prefix-free queries assuming su security of the compression function. We follow their approach to establish su security of the augmented cascade, but there are differences as well: They have no output transform while we do, they assume prefix-free queries and we do not, we have leakage and they do not. They neither target nor show mu security of the basic cascade in any form, mu security arising in their work only as an intermediate technical step and only for the compression function, not for the cascade.

Chop-MD. The chop-MD construction of CDMP [12] is the case of the augmented cascade in which the output transform is truncation. They claim this is indifferentiable from a RO when the compression function is ideal. This implies PRF security but their bound is \(O(\ell ^2(q + q_{\textsc {f}})^2/2^{c-r})\) which as we have seen is significantly weaker than our bound of Eq. (1). Also, they have no standard-model proofs or analysis for this construction. In contrast our results in Sect. 5 establish standard-model security.

NMAC and HMAC. \(\mathsf {NMAC}\) takes keys \(K_{\mathrm {in}},K_{\mathrm {out}}\) and input \({\mathbf {X}}\) to return \({\mathsf {h}}(K_{\mathrm {out}},{\mathsf {h}}^*(K_{\mathrm {in}},{\mathbf {X}})\Vert {\mathrm {pad}})\) where \({\mathrm {pad}}\) is some \((b-c)\)-bit constant and \(b\ge c\). Through a series of intensive analyses, the PRF security of \(\mathsf {NMAC}\) has been established based only on the assumed PRF security of the compression function \({\mathsf {h}}\), and with tight bounds [2, 4, 16]. Note that \(\mathsf {NMAC}\) is not a special case of the augmented cascade because \({\mathsf {Out}}\) is not keyed but the outer application of \({\mathsf {h}}\) in \(\mathsf {NMAC}\) is keyed. In the model where the compression function is ideal, one can show bounds for \(\mathsf {NMAC}\) that are somewhat better than for the augmented cascade. This is not surprising. Indeed, when attacking the augmented cascade, the adversary can learn far more information about the internal states of the hash computation. What is surprising (at least to us) is that the gap is actually quite small. See [3] for more information. We stress also that this is in the ideal model. In the standard model, there is no proof that \(\mathsf {NMAC}\) has the type of good mu prf security we establish for the augmented cascade in Sect. 5.
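To make the structural contrast concrete, here is a minimal Python sketch (ours) of NMAC next to the truncation-based augmented cascade; the compression function h is a toy SHA-256-based stand-in and PAD is an arbitrary placeholder for the constant pad.

import hashlib

def h(chain: bytes, block: bytes) -> bytes:
    # Toy compression-function stand-in with c = 256 bits.
    return hashlib.sha256(chain + block).digest()

def cascade(key: bytes, blocks: list) -> bytes:
    y = key
    for x in blocks:
        y = h(y, x)
    return y

PAD = bytes(32)  # arbitrary stand-in for the (b-c)-bit constant pad

def nmac(k_in: bytes, k_out: bytes, blocks: list) -> bytes:
    # NMAC: a second, keyed application of h to the cascade output.
    return h(k_out, cascade(k_in, blocks) + PAD)

def acsc_trunc(key: bytes, blocks: list, r_bytes: int = 16) -> bytes:
    # Augmented cascade: un-keyed truncation replaces the keyed outer h.
    return cascade(key, blocks)[:r_bytes]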

AES and other MACs. Why consider new MACs? Why not just use an AES-based MAC like CMAC? The 128-bit key and block size limit security compared to \(c=512\) for SHA-512. A Schnorr signature takes the result of the PRF modulo a prime; the PRF output must have at least as many bits as the prime, and even more bits for most primes, to avoid the Bleichenbacher attack discussed in [23]. Also, in that context a hash function is already being used to hash the message before signing, so it is convenient to implement the PRF with the same hash function. HMAC-SHA-512 will provide the desired security, but \(\mathsf {AMAC}\) has speed advantages, particularly on short messages, as discussed in [3], and is simpler. Finally, the question is in some sense moot, since \(\mathsf {AMAC}\) is already deployed and in widespread use via Ed25519 and we need to understand its security.

Leakage. Leakage-resilience of a PRF studies the PRF security of a function \({\mathsf {h}}\) when the attacker can obtain the result of an arbitrary function, called the leakage function, applied to the key [14, 15]. This is motivated by side-channel attacks. We are considering a much more restricted form of leakage where there is just one, very specific leakage function, namely \({\mathsf {Out}}\). This arises naturally, as we have seen, in the PRF security of the augmented cascade. We are not considering side-channel attacks.

3 Notation

If \({\mathbf {x}}\) is a vector then \(|{\mathbf {x}}|\) denotes its length and \({\mathbf {x}}[i]\) denotes its i-th coordinate. (For example if \({\mathbf {x}}= (10,00,1)\) then \(|{\mathbf {x}}|=3\) and \({\mathbf {x}}[2]=00\).) We let \(\varepsilon \) denote the empty vector, which has length 0. If \(0 \le i \le |{\mathbf {x}}|\) then we let \({\mathbf {x}}[1\ldots i] = ({\mathbf {x}}[1],\ldots ,{\mathbf {x}}[i])\), this being \(\varepsilon \) when \(i=0\). We let \(S^n\) denote the set of all length n vectors over the set S. We let \(S^{+}\) denote the set of all vectors of positive length over the set S and \(S^* = S^{+}\cup \{\varepsilon \}\) the set of all finite-length vectors over the set S. As special cases, \(\{0,1\}^n\) and \(\{0,1\}^*\) denote vectors whose entries are bits, so that we are identifying strings with binary vectors and the empty string with the empty vector.

For sets \(A_1,A_2\) we let \([\![A_1,A_2]\!]\) denote the set of all vectors \({\mathbf {X}}\) of length \(|{\mathbf {X}}|\ge 1\) such that \({\mathbf {X}}[1]\in A_1\) and \({\mathbf {X}}[i]\in A_2\) for \(2\le i \le |{\mathbf {X}}|\).

We let \(x {\xleftarrow {\$}}X\) denote picking an element uniformly at random from a set X and assigning it to x. For infinite sets, it is assumed that a proper measure can be defined on X to make this meaningful. Algorithms may be randomized unless otherwise indicated. Running time is worst case. If A is an algorithm, we let \(y \leftarrow A(x_1,\ldots ;r)\) denote running A with random coins r on inputs \(x_1,\ldots \) and assigning the output to y. We let \(y {\xleftarrow {\$}}A(x_1,\ldots )\) be the result of picking r at random and letting \(y \leftarrow A(x_1,\ldots ;r)\). We let \([A(x_1,\ldots )]\) denote the set of all possible outputs of A when invoked with inputs \(x_1,\ldots \).

We use the code based game playing framework of [6]. (See Fig. 1 for an example.) By \(\Pr [\mathrm {G}]\) we denote the probability that game \(\mathrm {G}\) returns \({\mathsf {true}}\).

For an integer n we let \([1\ldots n] = \{1,\ldots ,n\}\).

4 Function-Family Distance Framework

We will be considering various generalizations and extensions of standard prf security. This includes measuring proximity not just to random functions but to some other family, multi-user security and leakage on the key. We also want to allow an easy later extension to a setting with ideal primitives. To enable all this in a unified way we introduce a general distance metric on function families and then derive notions of interest as special cases.

Function families. A function family is a two-argument function \({\mathsf {F}}{:\;\,}{\mathsf {F}}.\mathsf {K}\times {\mathsf {F}}.\mathsf {D}\rightarrow {\mathsf {F}}.\mathsf {R}\) that takes a key K in the key space \({\mathsf {F}}.\mathsf {K}\) and an input x in the domain \({\mathsf {F}}.\mathsf {D}\) to return an output \(y \leftarrow {\mathsf {F}}(K,x)\) in the range \({\mathsf {F}}.\mathsf {R}\). We let \(f {\xleftarrow {\$}}{\mathsf {F}}\) be shorthand for \(K {\xleftarrow {\$}}{\mathsf {F}}.\mathsf {K}\,;\,f \leftarrow {\mathsf {F}}(K,\cdot )\), the operation of picking a function at random from family \({\mathsf {F}}\).

An example of a function family that is important for us is the compression function underlying a hash function, in which case \({\mathsf {F}}.\mathsf {K}={\mathsf {F}}.\mathsf {R}=\{0,1\}^c\) and \({\mathsf {F}}.\mathsf {D}=\{0,1\}^b\) for integers c, b called the length of the chaining variable and the block length, respectively. Another example is a block cipher. However, families of functions do not have to be efficiently computable or have short keys. For sets D, R the family \({\mathsf {A}}{:\;\,}{\mathsf {A}}.\mathsf {K}\times D\rightarrow R\) of all functions from D to R is defined simply as follows: let \({\mathsf {A}}.\mathsf {K}\) be the set of all functions mapping D to R and let \({\mathsf {A}}(f,x)=f(x)\). (We can fix some representation of f as a key, for example the vector whose i-th component is the value f takes on the i-th input under some ordering of D. But this is not really necessary.) In this case \(f {\xleftarrow {\$}}{\mathsf {A}}\) denotes picking at random a function mapping D to R.

Let \({\mathsf {F}}{:\;\,}{\mathsf {F}}.\mathsf {K}\times {\mathsf {F}}.\mathsf {D}\rightarrow {\mathsf {F}}.\mathsf {R}\) be a function family and let \({\mathsf {Out}}{:\;\,}{\mathsf {F}}.\mathsf {R} \rightarrow {\mathsf {Out}}.\mathsf {R}\) be a function with domain the range of \({\mathsf {F}}\) and range \({\mathsf {Out}}.\mathsf {R}\). Then the composition \({\mathsf {Out}}\circ {\mathsf {F}}{:\;\,}{\mathsf {F}}.\mathsf {K}\times {\mathsf {F}}.\mathsf {D}\rightarrow {\mathsf {Out}}.\mathsf {R}\) is the function family defined by \(({\mathsf {Out}}\circ {\mathsf {F}})(K,x) = {\mathsf {Out}}({\mathsf {F}}(K,x))\). We will use composition in some of our constructions.

Fig. 1. Games defining the distance metric between function families \({\mathsf {F}}_0,{\mathsf {F}}_1\). In the basic (left) case there is no leakage, while in the extended (right) case there is leakage represented by \({\mathsf {Out}}\).

Basic distance metric. We define a general metric of distance between function families that will allow us to obtain other metrics of interest as special cases. Let \({\mathsf {F}}_0,{\mathsf {F}}_1\) be families of functions such that \({\mathsf {F}}_0.\mathsf {D}={\mathsf {F}}_1.\mathsf {D}\). Consider game \(\mathrm {DIST}\) on the left of Fig. 1 associated to \({\mathsf {F}}_0,{\mathsf {F}}_1\) and an adversary \(\mathcal{A}\). Via oracle \(\textsc {New}\), the adversary can create a new instance \(F_{v}\) drawn from \({\mathsf {F}}_{c}\) where \(c\) is the challenge bit. It can call this oracle multiple times, reflecting a multi-user setting. It can obtain \(F_i(x)\) for any \((i,x)\) of its choice with the restriction that \(1\le i \le v\) (instance i has been initialized) and \(x\in {\mathsf {F}}_1.\mathsf {D}\). It wins if it guesses the challenge bit \(c\). The advantage of adversary \(\mathcal{A}\) is

$$\begin{aligned}&{\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {F}}_0,{\mathsf {F}}_1}(\mathcal{A}) = 2\Pr [\mathrm {DIST}_{{\mathsf {F}}_0,{\mathsf {F}}_1}(\mathcal{A})]-1 \end{aligned}$$
(2)
$$\begin{aligned}&= {\Pr }[\mathrm {DIST}_{{\mathsf {F}}_0,{\mathsf {F}}_1}(\mathcal{A})\left| \right. c=1] - \left( 1-{\Pr }[\mathrm {DIST}_{{\mathsf {F}}_0,{\mathsf {F}}_1}(\mathcal{A})\left| \right. c=0] \right) . \end{aligned}$$
(3)

Equation (2) is the definition, while Eq. (3) is a standard alternative formulation that can be shown equal via a conditioning argument. We often use the second in proofs.

Let \({\mathsf {F}}\) be a function family and let \({\mathsf {A}}\) be the family of all functions from \({\mathsf {F}}.\mathsf {D}\) to \({\mathsf {F}}.\mathsf {R}\). Let \({\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {F}}}(\mathcal{A}) = {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {A}},{\mathsf {F}}}(\mathcal{A})\). This gives a metric of multi-user prf security. The standard (single user) prf metric is obtained by restricting attention to adversaries that make exactly one \(\textsc {New}\) query.

Distance under leakage. We extend the framework to allow leakage on the key. Let \({\mathsf {Out}}{:\;\,}{\mathsf {F}}_1.\mathsf {K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be a function with domain \({\mathsf {F}}_1.\mathsf {K}\) and range a set we denote \({\mathsf {Out}}.\mathsf {R}\). Consider game \(\mathrm {DIST}\) on the right of Fig. 1, now associated not only to \({\mathsf {F}}_0,{\mathsf {F}}_1\) and an adversary \(\mathcal{A}\) but also to \({\mathsf {Out}}\). Oracle \(\textsc {New}\) picks a key \(K_{v}\) for \({\mathsf {F}}_1\) and will return as leakage the result of \({\mathsf {Out}}\) on this key. The instance \(F_{v}\) is either \({\mathsf {F}}_1(K_{v},\cdot )\) or a random function from \({\mathsf {F}}_0\). Note that the leakage is on a key for a function from \({\mathsf {F}}_1\) regardless of the challenge bit, meaning even if \(c=0\), we leak on the key \(K_{v}\) drawn from \({\mathsf {F}}_1.\mathsf {K}\). The second oracle is as before. The advantage of adversary \(\mathcal{A}\) is

$$\begin{aligned}&{\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A}) = 2\Pr [\mathrm {DIST}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A})]-1 \end{aligned}$$
(4)
$$\begin{aligned}&= {\Pr }[\,\mathrm {DIST}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A}){\!\!}\,\left| \right. \,{\!\!}c=1\,] - \left( 1-{\Pr }[\,\mathrm {DIST}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A}){\!\!}\,\left| \right. \,{\!\!}c=0\,] \right) . \end{aligned}$$
(5)

This generalizes the basic metric because \({\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {F}}_0,{\mathsf {F}}_1}(\mathcal{A}) = {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A})\) where \({\mathsf {Out}}\) is the function that returns \(\varepsilon \) on all inputs.

As a special case we get a metric of multi-user prf security under leakage. Let \({\mathsf {F}}\) be a function family and let \({\mathsf {A}}\) be the family of all functions from \({\mathsf {F}}.\mathsf {D}\) to \({\mathsf {F}}.\mathsf {R}\). Let \({\mathsf {Out}}{:\;\,}{\mathsf {F}}.\mathsf {K}\rightarrow {\mathsf {Out}}.\mathsf {R}\). Let \({\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {F}},{\mathsf {Out}}}(\mathcal{A}) = {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {A}},{\mathsf {F}},{\mathsf {Out}}}(\mathcal{A})\).
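As an illustration, here is a minimal Python sketch (ours) of the extended game of Fig. 1, specialized to \({\mathsf {F}}_0\) being the family of all functions; the class name, key generation and the 32-byte range are placeholders we chose, not part of the formal definition.

import os, secrets

class DistGame:
    # Instances are F(K_v, .) when the challenge bit c = 1, and lazily
    # sampled random functions when c = 0; oracle New returns the
    # leakage Out(K_v) in both cases.
    def __init__(self, F, keylen, out, rangelen=32):
        self.F, self.out = F, out
        self.keylen, self.rangelen = keylen, rangelen
        self.c = secrets.randbits(1)          # challenge bit
        self.keys, self.tables = [], []

    def new(self):
        K = os.urandom(self.keylen)           # key K_v for F1 = F
        self.keys.append(K)
        self.tables.append({})                # lazy random function
        return self.out(K)                    # leakage, regardless of c

    def fn(self, i, x):                       # i is 1-based, as in the game
        if self.c == 1:
            return self.F(self.keys[i - 1], x)
        T = self.tables[i - 1]
        if x not in T:
            T[x] = os.urandom(self.rangelen)  # stand-in range {0,1}^256
        return T[x]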

Naive mu to su reduction. Multi-user security for PRFs was first explicitly considered in [5]. They used a hybrid argument to show that the prf advantage of an adversary \(\mathcal{A}\) against \(u\) users is at most \(u\) times the prf advantage of an adversary of comparable resources against a single user. The argument extends to the case where instead of prf advantage we consider distance and where leakage is present. This is summarized in Lemma 1 below.

We state this lemma to emphasize that mu security is not qualitatively different from su security, at least in this setting. The question is what is the quantitative difference. The lemma represents the naive bound, which always holds. The interesting element is that for the 2-tier augmented cascade, Theorem 4 shows that one can do better: the mu advantage is not a factor of \(u\) larger than the single-user advantage, but about the same. In the proof of the lemma in [3] we specify the adversary for the sake of making the reduction concrete but we omit the standard hybrid argument that establishes that this works.

Lemma 1

Let \({\mathsf {F}}_0,{\mathsf {F}}_1\) be function families with \({\mathsf {F}}_0.\mathsf {D}={\mathsf {F}}_1.\mathsf {D}\) and let \({\mathsf {Out}}{:\;\,}{\mathsf {F}}_1.\mathsf {K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be an output transform. Let \(\mathcal{A}\) be an adversary making at most \(u\) queries to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle. The proof specifies an adversary \(\mathcal{A}_1\) making one query to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle such that

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A})&\le u\cdot {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {F}}_0,{\mathsf {F}}_1,{\mathsf {Out}}}(\mathcal{A}_1). \end{aligned}$$
(6)

The running time of \(\mathcal{A}_1\) is that of \(\mathcal{A}\) plus the time for \(u\) computations of \({\mathsf {F}}_0\) or \({\mathsf {F}}_1\).    \(\blacksquare \)
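The omitted hybrid argument is standard; the following Python sketch of the single-user adversary \(\mathcal{A}_1\) is ours and is only meant to make the reduction concrete, assuming stand-in callables for sampling from \({\mathsf {F}}_0\), keying \({\mathsf {F}}_1\) and computing the leakage.

import random

def make_A1(A, u, sample_F0, keygen_F1, F1, out):
    # Sketch of the hybrid reduction: A_1 guesses a hybrid position j,
    # embeds its single challenge instance as user j, and simulates
    # users below j as "real" F1 instances and users above j as "ideal"
    # F0 instances. In game DIST the leakage is always on a key drawn
    # from F1.K, so the simulation leaks accordingly.
    def A1(new, fn):                   # new/fn: A_1's own oracles
        j = random.randrange(1, u + 1)
        sims = []
        def new_star():
            v = len(sims) + 1
            if v < j:                  # "real" user: F1 under a known key
                K = keygen_F1()
                sims.append(lambda x, K=K: F1(K, x))
                return out(K)
            if v == j:                 # challenge user: forward to own oracles
                sims.append(lambda x: fn(1, x))
                return new()           # New already returns the leakage
            f = sample_F0()            # "ideal" user: a random function
            sims.append(f)
            return out(keygen_F1())    # leakage on an independent F1 key
        def fn_star(i, x):
            return sims[i - 1](x)
        return A(new_star, fn_star)    # A_1 outputs A's guess
    return A1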

5 The Augmented Cascade and Its Analysis

We first present a generalization of the basic cascade construction that we call the 2-tier cascade. We then present the augmented (2-tier) cascade construction and analyze its security.

2-tier cascade construction. Let \(\mathcal{K}\) be a set. Let \({\mathsf {g}},{\mathsf {h}}\) be function families such that \({\mathsf {g}}{:\;\,}{\mathsf {g}}.\mathsf {K}\times {\mathsf {g}}.\mathsf {D}\rightarrow \mathcal{K}\) and \({\mathsf {h}}{:\;\,}\mathcal{K}\times {\mathsf {h}}.\mathsf {D}\rightarrow \mathcal{K}\). Thus, outputs of both \({\mathsf {g}}\) and \({\mathsf {h}}\) can be used as keys for \({\mathsf {h}}\). This is the basis of our 2-tier version of the cascade. This is a function family \({\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}] {:\;\,}{\mathsf {g}}.\mathsf {K} \times [\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!] \rightarrow \mathcal{K}\). That is, a key is one for \({\mathsf {g}}\). An input —as per the notation \([\![\cdot ,\cdot ]\!]\) defined in Sect. 3— is a vector \({\mathbf {X}}\) of length at least one whose first component is in \({\mathsf {g}}.\mathsf {D}\) and the rest of whose components are in \({\mathsf {h}}.\mathsf {D}\). Outputs are in \(\mathcal{K}\). The function itself is defined as follows:

Algorithm \({\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}](K,{\mathbf {X}})\):
   \(n \leftarrow |{\mathbf {X}}|\) ; \(Y \leftarrow {\mathsf {g}}(K,{\mathbf {X}}[1])\)
   For \(i=2,\ldots ,n\) do \(Y \leftarrow {\mathsf {h}}(Y,{\mathbf {X}}[i])\)
   Return \(Y\)

We say that a function family \({\mathsf {G}}\) is a 2-tier cascade if \({\mathsf {G}}= {\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\) for some \({\mathsf {g}},{\mathsf {h}}\). If \({\mathsf {f}}{:\;\,}\mathcal{K}\times {\mathsf {f}}.\mathsf {D} \rightarrow \mathcal{K}\) then its basic cascade is recovered as \({\mathbf {CSC}}[{\mathsf {f}},{\mathsf {f}}] {:\;\,}\mathcal{K}\times {\mathsf {f}}.\mathsf {D}^{+} \rightarrow \mathcal{K}\). We will also denote this function family by \({\mathsf {f}}^*\).

Recall that even if \({\mathsf {f}}{:\;\,}\{0,1\}^c\times \{0,1\}^b\rightarrow \{0,1\}^c\) is a PRF, \({\mathsf {f}}^*\) is not a PRF due to the extension attack. It is shown by BCK2 [5] to be a PRF when the adversary is restricted to prefix-free queries. When \(b=1\) and the adversary is restricted to queries of some fixed length \(\ell \), the cascade \({\mathsf {f}}^*\) is the GGM construction of a PRF from a PRG [18]. Bernstein [7] considers a generalization of the basic cascade in which the function applied depends on the block index and proves PRF security for any fixed number \(\ell \) of blocks.

Our generalization to the 2-tier cascade has two motivations and corresponding payoffs. First, it will allow us to reduce mu security to su security in a simple, modular and tight way, the idea being that mu security of the basic cascade is su security of the 2-tier one for a certain choice of the 1st tier family. Second, it will allow us to analyze the blackbox \(\mathsf {AMAC}\) construction in which the cascade is not keyed directly but rather the key is put in the input to the hash function.

The augmented cascade. With \(\mathcal{K},{\mathsf {g}},{\mathsf {h}}\) as above let \({\mathsf {Out}}{:\;\,}\mathcal{K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be a function we call the output transform. The augmented (2-tier) cascade \({\mathbf {ACSC}}[{\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}]{:\;\,}{\mathsf {g}}.\mathsf {K} \times [\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!] \rightarrow {\mathsf {Out}}.\mathsf {R}\) is the composition of \({\mathsf {Out}}\) with \({\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\), namely \({\mathbf {ACSC}}[{\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}] = {\mathsf {Out}}\circ {\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\), where composition was defined above. In code:

Algorithm \({\mathbf {ACSC}}[{\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}](K,{\mathbf {X}})\):
   \(Y \leftarrow {\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}](K,{\mathbf {X}})\) ; Return \({\mathsf {Out}}(Y)\)

We say that a function family \({\mathsf {G}^{+}}\) is an augmented (2-tier) cascade if \({\mathsf {G}^{+}}= {\mathbf {ACSC}}[{\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}]\) for some \({\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}\).
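A minimal Python sketch of the augmented 2-tier cascade may help fix ideas; the toy stand-ins for g, h and the truncation output length are our choices. The mu-to-su trick described above amounts to prepending a user identity to the first-tier input.

import hashlib

def g(key: bytes, x: bytes) -> bytes:
    # First tier: maps a g-key and the first block into the key space K of h.
    return hashlib.sha256(b"g" + key + x).digest()

def h(key: bytes, x: bytes) -> bytes:
    # Second tier: compression-function stand-in with keys in K = {0,1}^256.
    return hashlib.sha256(b"h" + key + x).digest()

def csc(key: bytes, X: list) -> bytes:
    # 2-tier cascade CSC[g,h]: g processes the first block, h the rest.
    y = g(key, X[0])
    for x in X[1:]:
        y = h(y, x)
    return y

def acsc(key: bytes, X: list, r_bytes: int = 16) -> bytes:
    # Augmented 2-tier cascade ACSC[g,h,Out] with Out = truncation.
    return csc(key, X)[:r_bytes]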

The natural goal is that an augmented cascade \({\mathsf {G}^{+}}\) be a PRF. This however is clearly not true for all \({\mathsf {Out}}\). For example \({\mathsf {Out}}\) may be a constant function, or a highly irregular one. Rather than restrict \({\mathsf {Out}}\) at this point we target a general result that would hold for any \({\mathsf {Out}}\). Namely we aim to show that \({\mathbf {ACSC}}[{\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}]\) is close under our distance metric to the result of applying \({\mathsf {Out}}\) to a random function. Next we formalize and prove this.

Single-user security of 2-tier augmented cascade. Given \({\mathsf {g}},{\mathsf {h}},{\mathsf {Out}}\) defining the 2-tier augmented cascade \({\mathsf {Out}}\circ {\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\), we want to upper bound \({\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]}(\mathcal{A})\) for an adversary \(\mathcal{A}\) making one \(\textsc {New}\) query, where \({\mathsf {A}}\) is the family of all functions with the same domain as \({\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\). We will do this in two steps. First, in Lemma 2, we will consider the case that the first tier is a random function, meaning \({\mathsf {g}}= {\mathsf {r}}\) is the family of all functions with the same domain and range as \({\mathsf {g}}\). Then, in Theorem 3, we will use Lemma 2 to analyze the general case where \({\mathsf {g}}\) is a PRF. Most interestingly we will later use these single-user results to easily obtain, in Theorem 4, bounds for multi-user security that are essentially as good as for single-user security. This showcases a feature of the 2-tier cascade that is rare amongst PRFs. We now proceed to the above-mentioned lemma.

Lemma 2

Let \(\mathcal{K},\mathcal{D}\) be non-empty sets. Let \({\mathsf {h}}{:\;\,}\mathcal{K}\times {\mathsf {h}}.\mathsf {D}\rightarrow \mathcal{K}\) be a function family. Let \({\mathsf {r}}\) be the family of all functions with domain \(\mathcal{D}\) and range \(\mathcal{K}\). Let \({\mathsf {Out}}{:\;\,}\mathcal{K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be an output transform. Let \({\mathsf {A}}\) be the family of all functions with domain \([\![\mathcal{D},{\mathsf {h}}.\mathsf {D}]\!]\) and range \(\mathcal{K}\). Let \(\mathcal{A}\) be an adversary making exactly one query to its \(\textsc {New}\) oracle followed by at most q queries to its \(\textsc {Fn}\) oracle, the second argument of each of the queries in the latter case being a vector \({\mathbf {X}}\in [\![\mathcal{D},{\mathsf {h}}.\mathsf {D}]\!]\) with \(2 \le |{\mathbf {X}}| \le \ell +1\). Let adversary \(\mathcal{A}_{{\mathsf {h}}}\) be as in Fig. 2. Then

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A})&\le \ell \cdot {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_{{\mathsf {h}}}) . \end{aligned}$$
(7)

Adversary \(\mathcal{A}_{{\mathsf {h}}}\) makes at most q queries to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle. Its running time is that of \(\mathcal{A}\) plus the time for \(q\ell \) computations of \({\mathsf {h}}\).    \(\blacksquare \)

With the first tier being a random function, Lemma 2 is bounding the single-user (\(\mathcal{A}\) makes one \(\textsc {New}\) query) distance of the augmented 2-tier cascade to the result of applying \({\mathsf {Out}}\) to a random function under our distance metric. The bound of Eq. (7) is in terms of the multi-user security of \({\mathsf {h}}\) as a PRF and grows linearly with one less than the maximum number of blocks in a query.

We note that we could apply Lemma 1 to obtain a bound in terms of the single-user PRF security of \({\mathsf {h}}\), but this is not productive. Instead we will go the other way, later bounding the multi-user security of the 2-tier augmented cascade in terms of the multi-user PRF security of its component functions.

The proof below follows the basic paradigm of the proof of BCK2 [5], which is itself an extension of the classic proof of GGM [18]. However there are several differences: (1) the cascade in BCK2 is single-tier and non-augmented, meaning both the \({\mathsf {r}}\) component and \({\mathsf {Out}}\) are missing; (2) BCK2 assume the adversary's queries are prefix-free, meaning no query is a prefix of another, an assumption we do not make; (3) BCK2 bound prf security, while we bound the distance.

Fig. 2. Games and adversaries for the proof of Lemma 2.

Proof

(Lemma 2). Consider the hybrid games and adversaries in Fig. 2. The following chain of equalities establishes Eq. (7) and will be justified below:

$$\begin{aligned} \ell \cdot {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_{{\mathsf {h}}})&= {\textstyle \sum _{g=1}^{\ell }} {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g) \end{aligned}$$
(8)
$$\begin{aligned}&= {\textstyle \sum _{g=1}^{\ell } } \left( \Pr [\mathrm {H}_{g-1}] - \Pr [\mathrm {H}_g] \right) \end{aligned}$$
(9)
$$\begin{aligned}&= \Pr [\mathrm {H}_0] - \Pr [\mathrm {H}_{\ell }] \end{aligned}$$
(10)
$$\begin{aligned}&= {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A}) \end{aligned}$$
(11)

Adversary \(\mathcal{A}_{{\mathsf {h}}}\) (bottom left of Fig. 2) picks g at random in the range \(1,\ldots ,\ell \) and runs adversary \(\mathcal{A}_g\) (right of Fig. 2) so \({\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_{{\mathsf {h}}}) = (1/\ell ) \cdot \sum _{g=1}^{\ell } {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g)\), which explains Eq. (8). For the rest we begin by trying to picture what is going on.

We imagine a tree of depth \(\ell +1\), meaning it has \(\ell +2\) levels. The levels are numbered \(0,1,\ldots ,\ell +1\), with 0 being the root. The root has \(|\mathcal{D}|\) children while nodes at levels \(1,\ldots ,\ell \) have \(|{\mathsf {h}}.\mathsf {D}|\) children each. A query \({\mathbf {X}}\) of \(\mathcal{A}\) in game \(\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A})\) specifies a path in this tree starting at the root and terminating at a node at level \(n=|{\mathbf {X}}|\). Both the path and the final node are viewed as named by \({\mathbf {X}}\). To a queried node \({\mathbf {X}}\) we associate two labels, an internal label \(T_1[{\mathbf {X}}] \in \mathcal{K}\) and an external label \(T_2[{\mathbf {X}}]={\mathsf {Out}}(T_1[{\mathbf {X}}]) \in {\mathsf {Out}}.\mathsf {R}\). The external label is the response to query \({\mathbf {X}}\). Since the first component of our 2-tier cascade is the family \({\mathsf {r}}\) of all functions from \(\mathcal{D}\) to \(\mathcal{K}\), we can view \(\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A})\) as picking \(T_1[{\mathbf {X}}[1]]\) at random from \(\mathcal{K}\) and then setting \(T_1[{\mathbf {X}}] = {\mathsf {h}}^*(T_1[{\mathbf {X}}[1]],{\mathbf {X}}[2\ldots n])\) for all queries \({\mathbf {X}}\) of \(\mathcal{A}\).

Now we consider the hybrid games \(\mathrm {H}_0,\ldots ,\mathrm {H}_{\ell }\) of Fig. 2. They simulate \(\mathcal{A}\)’s \(\textsc {New},\textsc {Fn}\) oracles via procedures \(\textsc {New}^*,\textsc {Fn}^*\), respectively. By assumption \(\mathcal{A}\) makes exactly one \(\textsc {New}^*\) query, and this will have to be its first. In response \(\mathrm {H}_s\) picks at random a function \(f {:\;\,}[\![\mathcal{D},\mathcal{K}]\!] \rightarrow \mathcal{K}\). A query \(\textsc {Fn}^*\) has the form \((i,{\mathbf {X}})\) but here i can only equal 1 and is ignored in responding. By assumption \(2\le |{\mathbf {X}}|\le \ell +1\). The game populates nodes at levels \(2,\ldots ,s\) of the tree with \(T_1[\cdot ]\) values that are obtained via f and thus are random elements of \(\mathcal{K}\). For a node \({\mathbf {X}}\) at level \(n \ge s+1\), the \(T_1[{\mathbf {X}}[1\ldots s+1]]\) value is obtained at random and then further values (if needed, meaning if \(n\ge s+2\)) are computed by applying the cascade \({\mathsf {h}}^*\) with key \(T_1[{\mathbf {X}}[1\ldots s+1]]\) to input \({\mathbf {X}}[s+2\ldots n]\).

Consider game \(\mathrm {H}_0\), where \(s=0\). By assumption \(n\ge 2\) so we will always be in the case \(n\ge s+1\). In the Else statement, \(Y \leftarrow f({\mathbf {X}}[1])\) is initialized as a random element of \(\mathcal{K}\). With this Y as the key, \({\mathsf {h}}^*\) is then applied to \({\mathbf {X}}[2\ldots n]\) to get \(T_1[{\mathbf {X}}]\). This means \(\mathrm {H}_0\) exactly mimics the \(c=1\) case of game \(\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A})\), so that

$$\begin{aligned} \Pr [\mathrm {H}_0]&= {\Pr }[\,\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A}){\!\!}\,\left| \right. \,{\!\!}c=1\,] . \end{aligned}$$
(12)

At the other extreme, consider game \(\mathrm {H}_{\ell }\), where \(s=\ell \). By assumption \(n \le \ell +1\), yielding two cases. If \(n\le \ell \) we are in the \(n\le s\) case and the game, via f, assigns \(T_1[{\mathbf {X}}]\) a random value. If \(n=\ell +1\) we are in the \(n\ge s+1\) case, but the For loop does nothing so \(T_1[{\mathbf {X}}]\) is again random. This means \(\mathrm {H}_{\ell }\) mimics the \(c=0\) case of game \(\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A})\), except returning \({\mathsf {true}}\) exactly when the latter returns \({\mathsf {false}}\). Thus

$$\begin{aligned} \Pr [\mathrm {H}_{\ell }]&= 1-{\Pr }[\,\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {r}},{\mathsf {h}}]}(\mathcal{A}){\!\!}\,\left| \right. \,{\!\!}c=0\,] . \end{aligned}$$
(13)

We will justify Eq. (9) in a bit but we can now dispense with the rest of the chain. Equation (10) is obvious because the sum “telescopes”. Equation (11) follows from Eqs. (12) and (13) and the formulation of dist advantage of Eq. (5).

It remains to justify Eq. (9), for which we consider the adversaries \(\mathcal{A}_1,\ldots ,\mathcal{A}_{\ell }\) on the right side of Fig. 2. Adversary \(\mathcal{A}_g\) is playing the prf game under \({\mathsf {Out}}\)-leakage, formally game \(\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}\) on the right of Fig. 1 in our notation, with \({\mathsf {B}}\) the family of all functions from \({\mathsf {h}}.\mathsf {D}\) to \(\mathcal{K}\). It thus has oracles \(\textsc {New},\textsc {Fn}\). It will make crucial use of the assumed multi-user security of \({\mathsf {h}}\), meaning its ability to query \(\textsc {New}\) many times, keeping track in variable u of the number of instances it creates. It simulates the oracles of \(\mathcal{A}\) of the same names via procedures \(\textsc {New}^*,\textsc {Fn}^*\), sampling functions lazily rather than directly as in the games. Arrays \(T_1,T_2,U\) are assumed initially to be everywhere \(\bot \) and get populated as the adversary assigns values to entries. A test of the form “If (not \(T_1[{\mathbf {X}}]\)) ...” returns \({\mathsf {true}}\) if \(T_1[{\mathbf {X}}]=\bot \), meaning it has not yet been initialized. In response to the (single) \(\textsc {New}^*\) query of \(\mathcal{A}\), adversary \(\mathcal{A}_g\) does nothing. Following that, its strategy is to have the \(T_1[\cdot ]\) values of level g nodes populated, not explicitly, but implicitly by the keys in game \(\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}\) created by the adversary’s own \(\textsc {New}\) queries, using array U to keep track of the user index associated to a node. \(T_1[\cdot ]\) values for nodes at levels \(1,\ldots ,g-1\) are random. At level \(g+1\), the \(T_1[\cdot ]\) values are obtained via the adversary’s \(\textsc {Fn}\) oracle, and from then on via direct application of the cascade \({\mathsf {h}}^*\). One crucial point is that, if \(\mathcal{A}_g\) does not know the \(T_1[\cdot ]\) values at level g, how does it respond to a length g query \({\mathbf {X}}\) with the right \(T_2[\cdot ]\) value? This is where the leakage enters, the response being the leakage provided by the \(\textsc {New}\) oracle. The result is that for every \(g\in \{1,\ldots ,\ell \}\) we have

$$\begin{aligned} {\Pr }[\,\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g)\,\left| \right. \,c=1\,]&= \Pr [\mathrm {H}_{g-1}] \end{aligned}$$
(14)
$$\begin{aligned} 1-{\Pr }[\,\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g)\,\left| \right. \,c=0\,]&= \Pr [\mathrm {H}_g] , \end{aligned}$$
(15)

where \(c\) is the challenge bit in game \(\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}\). Thus

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g)&= {\Pr }[\,\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g)\,\left| \right. \,c=1\,] - \left( 1-{\Pr }[\,\mathrm {DIST}_{{\mathsf {B}},{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_g)\,\left| \right. \,c=0\,] \right) \nonumber \\&= \Pr [\mathrm {H}_{g-1}] - \Pr [\mathrm {H}_g]. \end{aligned}$$
(16)

This justifies Eq. (9).    \(\blacksquare \)

We now extend the above to the case where the first tier \({\mathsf {g}}\) of the 2-tier cascade is a PRF rather than a random function. We will exploit PRF security of \({\mathsf {g}}\) to reduce this to the prior case. Since the proof uses standard methods, it is relegated to [3].

Theorem 3

Let \(\mathcal{K}\) be a non-empty set. Let \({\mathsf {g}}{:\;\,}{\mathsf {g}}.\mathsf {K}\times {\mathsf {g}}.\mathsf {D}\rightarrow \mathcal{K}\) and \({\mathsf {h}}{:\;\,}\mathcal{K}\times {\mathsf {h}}.\mathsf {D}\rightarrow \mathcal{K}\) be function families. Let \({\mathsf {Out}}{:\;\,}\mathcal{K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be an output transform. Let \({\mathsf {A}}\) be the family of all functions with domain \([\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\) and range \(\mathcal{K}\). Let \(\mathcal{A}\) be an adversary making exactly one query to its \(\textsc {New}\) oracle followed by at most q queries to its \(\textsc {Fn}\) oracle, the second argument of each of the queries in the latter case being a vector \({\mathbf {X}}\in [\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\) with \(2 \le |{\mathbf {X}}| \le \ell +1\). The proof shows how to construct adversaries \(\mathcal{A}_{{\mathsf {h}}},\mathcal{A}_{{\mathsf {g}}}\) such that

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]}(\mathcal{A})&\le \ell \cdot {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_{{\mathsf {h}}}) + 2\,{\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {g}}}(\mathcal{A}_{{\mathsf {g}}}). \end{aligned}$$
(17)

Adversary \(\mathcal{A}_{{\mathsf {h}}}\) makes at most q queries to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle. Adversary \(\mathcal{A}_{{\mathsf {g}}}\) makes one query to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle. The running time of both constructed adversaries is about that of \(\mathcal{A}\) plus the time for \(q\ell \) computations of \({\mathsf {h}}\).    \(\blacksquare \)

Multi-user security of 2-tier augmented cascade. We now want to assess the multi-user security of a 2-tier augmented cascade. This means we want to bound \({\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]}(\mathcal{A})\) with everything as in Theorem 3 above except that \(\mathcal{A}\) can now make any number \(u\) of \(\textsc {New}\) queries rather than just one. We could do this easily by applying Lemma 1 to Theorem 3, resulting in a bound that is \(u\) times the bound of Eq. (17). We consider Theorem 4 below the most interesting result of this section. It says one can do much better, and in fact the bound for the multi-user case is not much different from that for the single-user case.

Theorem 4

Let \(\mathcal{K}\) be a non-empty set. Let \({\mathsf {g}}{:\;\,}{\mathsf {g}}.\mathsf {K}\times {\mathsf {g}}.\mathsf {D}\rightarrow \mathcal{K}\) and \({\mathsf {h}}{:\;\,}\mathcal{K}\times {\mathsf {h}}.\mathsf {D}\rightarrow \mathcal{K}\) be function families. Let \({\mathsf {Out}}{:\;\,}\mathcal{K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be an output transform. Let \({\mathsf {A}}\) be the family of all functions with domain \([\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\) and range \(\mathcal{K}\). Let \(\mathcal{A}\) be an adversary making at most \(u\) queries to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle, the second argument of each of the queries in the latter case being a vector \({\mathbf {X}}\in [\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\) with \(2 \le |{\mathbf {X}}| \le \ell +1\). The proof shows how to construct adversaries \(\mathcal{A}_{{\mathsf {h}}},\mathcal{A}_{{\mathsf {g}}}\) such that

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]}(\mathcal{A})&\le \ell \cdot {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_{{\mathsf {h}}}) + 2\,{\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {g}}}(\mathcal{A}_{{\mathsf {g}}}). \end{aligned}$$
(18)

Adversary \(\mathcal{A}_{{\mathsf {h}}}\) makes at most q queries to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle. Adversary \(\mathcal{A}_{{\mathsf {g}}}\) makes \(u\) queries to its \(\textsc {New}\) oracle and at most q queries to its \(\textsc {Fn}\) oracle. The running time of both constructed adversaries is about that of \(\mathcal{A}\) plus the time for \(q\ell \) computations of \({\mathsf {h}}\).    \(\blacksquare \)

A comparison of Theorems 3 and 4 shows that the bound of Eq. (18) is the same as that of Eq. (17). So where do we pay for \(u\) no longer being one? Only in the resources of adversary \(\mathcal{A}_{{\mathsf {g}}}\): in Theorem 4 it makes \(u\) queries to its \(\textsc {New}\) oracle, rather than just one as in Theorem 3.

The proof below showcases one of the advantages of the 2-tier cascade over the basic single-tier one. Namely, by appropriate choice of instantiation of the first tier, we can reduce multi-user security to single-user security in a modular way. In this way we avoid re-entering the proofs above. Indeed, the ability to do this is one of the main reasons we introduced the 2-tier cascade.

Proof

(Theorem 4 ). Let \(\mathcal{D}= [1\ldots u]\). Let \(\overline{{\mathsf {r}}}\) be the family of all functions with domain \(\mathcal{D}\) and range \({\mathsf {g}}.\mathsf {K}\). Let function family \(\overline{{\mathsf {g}}}{:\;\,}\overline{{\mathsf {r}}}.\mathsf {K} \times (\mathcal{D}\times {\mathsf {g}}.\mathsf {D}) \rightarrow \mathcal{K}\) be defined by \(\overline{{\mathsf {g}}}(f,(i,x))={\mathsf {g}}(f(i),x)\). Let \({\mathsf {B}}\) be the family of all functions with domain \([\![\mathcal{D}\times {\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\) and range \(\mathcal{K}\). The main observation is as follows. Suppose \(i\in \mathcal{D}\) and \({\mathbf {X}}\in [\![{\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\). Let \({\mathbf {Y}}\in [\![\mathcal{D}\times {\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\) be defined by \({\mathbf {Y}}[1]=(i,{\mathbf {X}}[1])\) and \({\mathbf {Y}}[j]={\mathbf {X}}[j]\) for \(2\le j \le |{\mathbf {X}}|\). Let \(f {:\;\,}\mathcal{D}\rightarrow {\mathsf {g}}.\mathsf {K}\) be a key for \(\overline{{\mathsf {g}}}\). Then \(f(i)\in {\mathsf {g}}.\mathsf {K}\) is a key for \({\mathsf {g}}\), and

$$\begin{aligned} {\mathbf {CSC}}[\overline{{\mathsf {g}}},{\mathsf {h}}](f,{\mathbf {Y}}) = {\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}](f(i),{\mathbf {X}}). \end{aligned}$$
(19)

Think of f(i) as the key for instance i. Then Eq. (19) allows us to obtain values of \({\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\) for different instances \(i\in \mathcal{D}\) via values of \({\mathbf {CSC}}[\overline{{\mathsf {g}}},{\mathsf {h}}]\) on a single instance with key f. This will allow us to reduce the multi-user security of \({\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\) to the single-user security of \({\mathbf {CSC}}[\overline{{\mathsf {g}}},{\mathsf {h}}]\). Theorem 3 will allow us to measure the latter in terms of the prf security of \({\mathsf {h}}\) under leakage and the (plain) prf security of \(\overline{{\mathsf {g}}}\). The final step will be to measure the prf security of \(\overline{{\mathsf {g}}}\) in terms of that of \({\mathsf {g}}\).
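To make this concrete, here is a minimal Python sketch (the toy function families and all names are ours, purely for illustration, not the paper's) that implements the 2-tier cascade and checks the identity of Eq. (19):

```python
import hashlib, random

# Toy function families: keys, chaining values and blocks are 8-byte strings.
def g(key, x):   # first tier (stand-in for a PRF)
    return hashlib.sha256(b"g" + key + x).digest()[:8]

def h(key, x):   # second tier (stand-in for a compression function)
    return hashlib.sha256(b"h" + key + x).digest()[:8]

def csc(first, second, key, X):
    """2-tier cascade CSC[first, second]: the first tier consumes the
    first block, the second tier iterates over the remaining blocks."""
    chain = first(key, X[0])
    for block in X[1:]:
        chain = second(chain, block)
    return chain

def gbar(f, ix):
    """gbar keyed by f : D -> g.K, evaluated on a point (i, x)."""
    i, x = ix
    return g(f[i], x)

u = 4
f = {i: random.randbytes(8) for i in range(1, u + 1)}  # key f(i) of instance i
X = [random.randbytes(8) for _ in range(5)]
for i in range(1, u + 1):
    Y = [(i, X[0])] + X[1:]          # prepend i to the first block only
    assert csc(gbar, h, f, Y) == csc(g, h, f[i], X)  # Eq. (19)
print("Eq. (19) verified on a toy instantiation")
```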

Proceeding to the details, let adversary \(\mathcal{B}\) be as follows:

[Figure: pseudocode for adversary \(\mathcal{B}\); its behavior is described below.]

Then we have

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]}(\mathcal{A})&= {\mathsf {Adv}}^{\mathsf {dist}}_{{\mathsf {Out}}\,\circ \,{\mathsf {B}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[\overline{{\mathsf {g}}},{\mathsf {h}}]}(\mathcal{B}) \end{aligned}$$
(20)
$$\begin{aligned}&\le \ell \cdot {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {h}},{\mathsf {Out}}}(\mathcal{A}_{{\mathsf {h}}}) + 2\,{\mathsf {Adv}}^{\mathsf {prf}}_{\overline{{\mathsf {g}}}}(\mathcal{A}_{\overline{{\mathsf {g}}}}) \end{aligned}$$
(21)

Adversary \(\mathcal{B}\) is allowed only one \(\textsc {New}\) query, and begins by making it so as to initialize instance 1 in its game. It answers queries of \(\mathcal{A}\) to its \(\textsc {New}\) oracle via procedure \(\textsc {New}^*\). Adversary \(\mathcal{A}\) can make up to \(u\) queries to \(\textsc {New}^*\), but, as the absence of code for \(\textsc {New}^*\) indicates, this procedure does nothing, meaning no action is taken when \(\mathcal{A}\) makes a \(\textsc {New}^*\) query. When \(\mathcal{A}\) queries its \(\textsc {Fn}\) oracle, \(\mathcal{B}\) answers via procedure \(\textsc {Fn}^*\). The query consists of an instance index i with \(1\le i \le u\) and a vector \({\mathbf {X}}\). Adversary \(\mathcal{B}\) creates \({\mathbf {Y}}\) from \({\mathbf {X}}\) as described above. Namely it modifies the first component of \({\mathbf {X}}\) to prepend i, so that \({\mathbf {Y}}[1] \in \mathcal{D}\times {\mathsf {g}}.\mathsf {D}\) is in the domain of \(\overline{{\mathsf {g}}}\). It leaves the rest of the components unchanged, and then calls its own \(\textsc {Fn}\) oracle on vector \({\mathbf {Y}}\in [\![\mathcal{D}\times {\mathsf {g}}.\mathsf {D},{\mathsf {h}}.\mathsf {D}]\!]\). The instance used is 1, regardless of i, since \(\mathcal{B}\) has only one instance active. The result Z of \(\textsc {Fn}\) is returned to \(\mathcal{A}\) as the answer to its query. Eq. (20) is now justified by Eq. (19), thinking of f(i) as the key \(K_i\) chosen in game \(\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {A}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]}(\mathcal{A})\), where f is the (single) key chosen in game \(\mathrm {DIST}_{{\mathsf {Out}}\,\circ \,{\mathsf {B}},{\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[\overline{{\mathsf {g}}},{\mathsf {h}}]}(\mathcal{B})\). Theorem 3 applied to \(\overline{{\mathsf {g}}},{\mathsf {h}}\) and adversary \(\mathcal{B}\) provides the adversaries \(\mathcal{A}_{{\mathsf {h}}},\mathcal{A}_{\overline{{\mathsf {g}}}}\) of Eq. (21).
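The omitted pseudocode is simple plumbing, and can be sketched in Python as follows (the oracle interfaces `new` and `fn` are hypothetical, chosen by us for illustration):

```python
class AdversaryB:
    """Single-user adversary against Out . CSC[gbar, h], simulating the
    multi-user oracles New*, Fn* that A expects."""

    def __init__(self, new, fn):
        self.fn = fn
        new()            # B's single New query: picks the key f of instance 1

    def new_star(self):
        pass             # no action: instance i is implicitly keyed by f(i)

    def fn_star(self, i, X):
        Y = [(i, X[0])] + list(X[1:])  # prepend the instance index i
        return self.fn(1, Y)           # always query B's instance 1

    def run(self, A):
        return A(self.new_star, self.fn_star)  # B outputs whatever A outputs
```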

Now consider adversary \(\mathcal{A}_{{\mathsf {g}}}\) defined as follows:

[Figure: pseudocode for adversary \(\mathcal{A}_{{\mathsf {g}}}\); its behavior is described below.]

Adversary \(\mathcal{A}_{{\mathsf {g}}}\) begins by calling its \(\textsc {New}\) oracle \(u\) times to initialize \(u\) instances. It then runs \(\mathcal{A}_{\overline{{\mathsf {g}}}}\), answering the latter’s oracle queries via procedures \(\textsc {New}^*,\textsc {Fn}^*\). By Theorem 3 we know that \(\mathcal{A}_{\overline{{\mathsf {g}}}}\) makes only one \(\textsc {New}^*\) query. In response the procedure \(\textsc {New}^*\) above does nothing. When \(\mathcal{A}_{\overline{{\mathsf {g}}}}\) makes query \((j, X)\) to \(\textsc {Fn}^*\) we know that \(j=1\) and \(X\in \mathcal{D}\times {\mathsf {g}}.\mathsf {D}\). Procedure \(\textsc {Fn}^*\) parses X as \((i, x)\). It then invokes its own \(\textsc {Fn}\) oracle with instance i and input x and returns the result Y to \(\mathcal{A}_{\overline{{\mathsf {g}}}}\). We have

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {g}}}(\mathcal{A}_{{\mathsf {g}}})&= {\mathsf {Adv}}^{\mathsf {prf}}_{\overline{{\mathsf {g}}}}(\mathcal{A}_{\overline{{\mathsf {g}}}}) . \end{aligned}$$
(22)

Equations (21) and (22) imply Eq. (18).    \(\blacksquare \)
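For completeness, the wrapper that \(\mathcal{A}_{{\mathsf {g}}}\) places around \(\mathcal{A}_{\overline{{\mathsf {g}}}}\) can be sketched in the same hypothetical interface as above; it simply undoes the encoding used by \(\mathcal{B}\), and since the simulation is perfect, Eq. (22) holds with equality:

```python
class AdversaryAg:
    """Multi-user adversary against g, simulating the single-instance
    oracles that A_gbar expects."""

    def __init__(self, new, fn, u):
        self.fn = fn
        for _ in range(u):   # initialize instances 1..u up front
            new()

    def new_star(self):
        pass                 # A_gbar's single New query needs no action

    def fn_star(self, j, X):
        assert j == 1        # A_gbar runs a single instance
        i, x = X             # parse the gbar input as (i, x)
        return self.fn(i, x) # route to instance i of g
```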

One might ask why we prove Theorem 4 for a 2-tier augmented cascade \({\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {g}},{\mathsf {h}}]\) instead of a single-tier one \({\mathsf {Out}}\,\circ \,{\mathbf {CSC}}[{\mathsf {h}},{\mathsf {h}}]\). Isn’t the latter the one of ultimate interest in usage? We establish the more general result in Theorem 4 because it allows us to analyze \(\mathsf {AMAC}\) itself by setting \({\mathsf {g}}\) to the dual of \({\mathsf {h}}\) [2], and also for consistency with Theorem 3.

6 Framework for Ideal-Model Cryptography

In Sect. 5 we reduced the (mu) security of the augmented cascade tightly to the assumed mu prf security of the compression function under leakage. To complete the story, we will, in Sect. 7, bound the mu prf security of an ideal compression function under leakage and thence obtain concrete bounds for the mu security of the augmented cascade in the same model. Additionally, we will consider the same questions when the compression function is not directly ideal but obtained via the Davies-Meyer transform on an ideal blockcipher, reflecting the design in popular hash functions. If we gave separate, ad hoc definitions for all these different constructions in different ideal models for different goals, it would be a lot of definitions. Accordingly we introduce a general definition of an ideal primitive (that may be of independent interest) and give a general definition of PRF security of a function family with access to an instance of an ideal primitive, both for the basic setting and the setting with leakage. A reader interested in our results on the mu prf security of ideal primitives can jump ahead to Sect. 7 and refer back here as necessary.

Idealized cryptography. We define an ideal primitive to simply be a function family \({\mathbf {P}}{:\;\,}{\mathbf {P}}.\mathsf {K}\times {\mathbf {P}}.\mathsf {D}\rightarrow {\mathbf {P}}.\mathsf {R}\). Below we will provide some examples but first let us show how to lift security notions to idealized models using this definition by considering the cases of interest to us, namely PRFs and PRFs under leakage.

Fig. 3.

Games defining prf security of function family \({\mathsf {F}}\) in the presence of an ideal primitive \({\mathbf {P}}\). In the basic (left) case there is no leakage, while in the extended (right) case there is leakage represented by \({\mathsf {Out}}\).

An oracle function family \({\mathsf {F}}\) specifies for each function \({\mathsf {P}}\) in its oracle space \({\mathsf {F}}.\mathsf {O}\) a function family \({\mathsf {F}}^{{\mathsf {P}}} {:\;\,}{\mathsf {F}}.\mathsf {K}\times {\mathsf {F}}.\mathsf {D}\rightarrow {\mathsf {F}}.\mathsf {R}\). We say \({\mathsf {F}}\) and ideal primitive \({\mathbf {P}}\) are compatible if \(\{\,{\mathbf {P}}({\mathsf {KK}},\cdot ) {\,} :\,{\mathsf {KK}}\in {\mathbf {P}}.\mathsf {K}\,\}\subseteq {\mathsf {F}}.\mathsf {O}\), meaning instances of \({\mathbf {P}}\) are legitimate oracles for \({\mathsf {F}}\). These represent constructs whose security we want to measure in an idealized model represented by \({\mathbf {P}}\).

We associate to \({\mathsf {F}},{\mathbf {P}}\) and adversary \(\mathcal{A}\) the game \(\mathrm {PRF}\) on the left of Fig. 3. In this game, \({\mathsf {A}}\) is the family of all functions with domain \({\mathsf {F}}.\mathsf {D}\) and range \({\mathsf {F}}.\mathsf {R}\). The game begins by picking an instance \({\mathsf {P}}{:\;\,}{\mathbf {P}}.\mathsf {D}\rightarrow {\mathbf {P}}.\mathsf {R}\) of \({\mathbf {P}}\) at random. The function \({\mathsf {P}}\) is provided as oracle to \({\mathsf {F}}\) and to \(\mathcal{A}\) via procedure \(\textsc {Prim}\). The game is in the multi-user setting: when \(c=1\) it selects a new instance \(F_v\) at random from the function family \({\mathsf {F}}^{{\mathsf {P}}}\), and otherwise it selects \(F_v\) to be a random function from \({\mathsf {F}}.\mathsf {D}\) to \({\mathsf {F}}.\mathsf {R}\). As usual a query \((i, x)\) to \(\textsc {Fn}\) must satisfy \(1\le i\le v\) and \(x\in {\mathsf {F}}.\mathsf {D}\). A query to \(\textsc {Prim}\) must be in the set \({\mathbf {P}}.\mathsf {D}\). We let \({\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {F}},{\mathbf {P}}}(\mathcal{A}) = 2\Pr [\mathrm {PRF}_{{\mathsf {F}},{\mathbf {P}}}(\mathcal{A})]-1\) be the advantage of \(\mathcal{A}\).
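As an illustration, the following Python sketch (ours; a lazy-sampling rendition, specialized to the case where \({\mathbf {P}}\) is an ideal compression function and \({\mathsf {F}}^{{\mathsf {P}}} = {\mathsf {P}}\), the setting of Sect. 7) implements the three oracles of the basic game:

```python
import os, random

C = 16  # bytes; toy length of keys and chaining values

class PRFGame:
    """Lazy-sampling sketch of the basic (left) game of Fig. 3 for an
    ideal compression function P and the construction F^P = P."""

    def __init__(self):
        self.c = random.randrange(2)  # challenge bit
        self.T_prim = {}              # lazily sampled instance P of the primitive
        self.keys = []                # keys K_1, K_2, ... of initialized users
        self.T_fn = {}                # lazily sampled random functions (c = 0)

    def prim(self, k, x):             # oracle Prim
        return self.T_prim.setdefault((k, x), os.urandom(C))

    def new(self):                    # oracle New
        self.keys.append(os.urandom(C))

    def fn(self, i, x):               # oracle Fn; requires 1 <= i <= v
        if self.c == 1:               # real world: F^P under key K_i
            return self.prim(self.keys[i - 1], x)
        return self.T_fn.setdefault((i, x), os.urandom(C))
```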

We now extend this to allow leakage on the key. Let \({\mathsf {Out}}{:\;\,}{\mathsf {F}}.\mathsf {K}\rightarrow {\mathsf {Out}}.\mathsf {R}\) be a function with domain \({\mathsf {F}}.\mathsf {K}\) and range \({\mathsf {Out}}.\mathsf {R}\). Game \(\mathrm {PRF}\) on the right of Fig. 3 is now associated not only to \({\mathsf {F}},{\mathbf {P}}\) and an adversary \(\mathcal{A}\) but also to \({\mathsf {Out}}\). The advantage of \(\mathcal{A}\) is \({\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {F}},{\mathsf {Out}},{\mathbf {P}}}(\mathcal{A}) = 2\Pr [\mathrm {PRF}_{{\mathsf {F}},{\mathsf {Out}},{\mathbf {P}}}(\mathcal{A})]-1\).

Capturing particular ideal models. The above framework allows us to capture the random oracle model, ideal cipher model and many others as different choices of the ideal primitive \({\mathbf {P}}\). Not all of these are relevant to our paper but we discuss them to illustrate how the framework captures known settings.

Let \(\mathcal{Y}\) be a non-empty set. Let \({\mathbf {P}}.\mathsf {K}\) be the set of all functions \({\mathsf {P}}{:\;\,}\{0,1\}^*\rightarrow \mathcal{Y}\). (Each function is represented in some canonical way, in this case for example as a vector over \(\mathcal{Y}\) of infinite length.) Let \({\mathbf {P}}{:\;\,}{\mathbf {P}}.\mathsf {K}\times \{0,1\}^*\rightarrow \mathcal{Y}\) be defined by \({\mathbf {P}}({\mathsf {P}},x) = {\mathsf {P}}(x)\). Then \({\mathbf {P}}\) is a random oracle with domain \(\{0,1\}^*\) and range \(\mathcal{Y}\). In this case, an oracle function family compatible with \({\mathbf {P}}\) is simply a function family in the random oracle model, and its prf security in the random oracle model is measured by \({\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {F}},{\mathbf {P}}}(\mathcal{A})\).

Similarly let \({\mathbf {P}}.\mathsf {K}\) be the set of all functions \({\mathsf {P}}{:\;\,}\{0,1\}^*\times {{\mathbb N}}\rightarrow \{0,1\}^*\) with the property that \(|{\mathsf {P}}(x,l)|=l\) for all \((x,l) \in \{0,1\}^*\times {{\mathbb N}}\). Let \({\mathbf {P}}{:\;\,}{\mathbf {P}}.\mathsf {K}\times (\{0,1\}^*\times {{\mathbb N}}) \rightarrow \{0,1\}^*\) be defined by \({\mathbf {P}}({\mathsf {P}},(x,l)) = {\mathsf {P}}(x,l)\). Then \({\mathbf {P}}\) is a variable-output-length random oracle with domain \(\{0,1\}^*\times {{\mathbb N}}\) and range \(\{0,1\}^*\).

Let \(\mathcal{D}\) be a non-empty set. To capture the single random permutation model, let \({\mathbf {P}}.\mathsf {K}\) be the set of all permutations \(\pi {:\;\,}\mathcal{D}\rightarrow \mathcal{D}\). Let \({\mathbf {P}}.\mathsf {D}=\mathcal{D}\times \{+,-\}\). Let \({\mathbf {P}}.\mathsf {R}=\mathcal{D}\). Define \({\mathbf {P}}(\pi ,(x,+))=\pi (x)\) and \({\mathbf {P}}(\pi ,(y,-))=\pi ^{-1}(y)\) for all \(\pi \in {\mathbf {P}}.\mathsf {K}\) and all \(x,y\in \mathcal{D}\). An oracle for an instance \({\mathsf {P}}= {\mathbf {P}}(\pi ,\cdot )\) of \({\mathbf {P}}\) thus allows evaluation of both \(\pi \) and \(\pi ^{-1}\) on inputs of the caller’s choice.

Finally we show how to capture the ideal cipher model. If \(\mathcal{K},\mathcal{D}\) are non-empty sets, a function family \(E{:\;\,}\mathcal{K}\times \mathcal{D}\rightarrow \mathcal{D}\) is a blockcipher if \(E(K,\cdot )\) is a permutation on \(\mathcal{D}\) for every \(K\in \mathcal{K}\), in which case \(E^{-1}{:\;\,}\mathcal{K}\times \mathcal{D}\rightarrow \mathcal{D}\) denotes the blockcipher in which \(E^{-1}(K,\cdot )\) is the inverse of the permutation \(E(K,\cdot )\) for all \(K\in \mathcal{K}\). Let \({\mathbf {P}}.\mathsf {K}\) be the set of all blockciphers \(E{:\;\,}\mathcal{K}\times \mathcal{D}\rightarrow \mathcal{D}\). Let \({\mathbf {P}}.\mathsf {D} = \mathcal{K}\times \mathcal{D}\times \{+,-\}\). Let \({\mathbf {P}}.\mathsf {R}=\mathcal{D}\). Define \({\mathbf {P}}(E,(K,X,+))=E(K,X)\) and \({\mathbf {P}}(E,(K,Y,-))=E^{-1}(K,Y)\) for all \(E\in {\mathbf {P}}.\mathsf {K}\) and all \(X,Y\in \mathcal{D}\). An oracle for an instance \({\mathsf {P}}= {\mathbf {P}}(E,\cdot )\) of \({\mathbf {P}}\) thus allows evaluation of both E and \(E^{-1}\) on inputs of the caller’s choice.
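Lazy sampling also gives a convenient executable rendition of an ideal-cipher instance; in the sketch below (our own helper, not from the paper) each permutation and its inverse are maintained together so that \(+\) and \(-\) queries stay consistent:

```python
import os

class IdealCipher:
    """Lazily sampled ideal-cipher instance E over D = n-byte strings:
    query(K, X, '+') returns E(K, X); query(K, Y, '-') returns E^{-1}(K, Y)."""

    def __init__(self, n=16):
        self.n = n
        self.fwd = {}  # fwd[K][X] = E(K, X)
        self.bwd = {}  # bwd[K][Y] = E^{-1}(K, Y)

    def query(self, K, Z, d):
        fwd = self.fwd.setdefault(K, {})
        bwd = self.bwd.setdefault(K, {})
        table, inverse = (fwd, bwd) if d == '+' else (bwd, fwd)
        if Z not in table:
            W = os.urandom(self.n)
            while W in inverse:       # resample so E(K, .) stays a permutation
                W = os.urandom(self.n)
            table[Z], inverse[W] = W, Z
        return table[Z]
```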

7 Security of the Compression Function Under Leakage

In Sect. 5 we reduced the (multi-user) security of the augmented cascade tightly to the assumed multi-user prf security of the compression function under leakage. To complete the story, we now study (bound) the multi-user prf security of the compression function under leakage. This will be done assuming the compression function is ideal. Combining these results with those of Sect. 5 we will get concrete bounds for the security of the augmented cascade for use in applications, discussed in [3].

In the (leak-free) multi-user setting, it is well known that prf security of a compression function decreases linearly in the number of users. We will show that this is an extreme case, and as the amount of leakage increases, the multi-user prf security degrades far more gracefully in the number of users (Theorem 6). This (perhaps counterintuitive) phenomenon will turn out to be essential to obtain good bounds on augmented cascades. We begin below with an informal overview of the bounds and why this phenomenon occurs.

Overview of bounds. The setting of an ideal compression function mapping \(\mathcal{K}\times \mathcal{X}\rightarrow \mathcal{K}\) is formally captured, in the framework of Sect. 6, by the ideal primitive \(\mathbf {F}{:\;\,}\mathbf {F}.\mathsf {K} \times (\mathcal{K}\times \mathcal{X})\rightarrow \mathcal{K}\) defined as follows. Let \(\mathbf {F}.\mathsf {K}\) be the set of all functions mapping \(\mathcal{K}\times \mathcal{X}\rightarrow \mathcal{K}\) and let \(\mathbf {F}(\mathsf {f},(K,X))=\mathsf {f}(K,X)\). Now, the construction we are interested in is the simplest possible, namely the compression function itself. Formally, again as per Sect. 6, this means we consider the oracle function family \(\mathsf {CF}\) whose oracle space \(\mathsf {CF}.\mathsf {O}\) consists of all functions \(\mathsf {f}{:\;\,}\mathcal{K}\times \mathcal{X}\rightarrow \mathcal{K}\), and with \(\mathsf {CF}^\mathsf {f}= \mathsf {f}\).

Fig. 4.

Upper bounds on the prf advantage of an adversary \(\mathcal{B}\) attacking an ideal compression function mapping \(\{0,1\}^c\times \mathcal{X}\) to \(\{0,1\}^c\). Left: Basic case, without leakage. Right: With leakage \({\mathsf {Out}}\) being the truncation function that returns the first \(r \le c\) bits of its input. First row: Single-user security; \(q_{\textsc {f}}\) is the number of queries to the ideal compression function. Second row: Multi-user security as obtained trivially by applying Lemma 1 to the su bound; \(u\) is the number of users. Third row: Multi-user security as obtained by a dedicated analysis, with the bound in the leakage case being from Theorem 6.

For this overview we let \(\mathcal{K}= \{0,1\}^c\). We contrast the prf security of an ideal compression function along two dimensions: (1) number of users, meaning su or mu, and (2) basic (no leakage) or with leakage. The bounds are summarized in Fig. 4 and discussed below. When we say the (i, j) table entry we mean the row i, column j entry of the table of Fig. 4.

First consider the basic (no leakage) case. We want to upper bound \({\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},\mathbf {F}}(\mathcal{B})\) for an adversary \(\mathcal{B}\) making \(q_{\textsc {f}}\) queries to the ideal compression function (oracle \(\textsc {Prim}\)) and q queries to oracle \(\textsc {Fn}\). In the su setting (one \(\textsc {New}\) query) it is easy to see that the bound is the (1, 1) table entry. This is because a fairly standard argument bounds the advantage by the probability that \(\mathcal{B}\) makes a \(\textsc {Prim}\) query containing the actual secret key K used to answer \(\textsc {Fn}\) queries. We refer to issuing such a query as guessing the secret key K. Note that this probability is actually independent of the number q of \(\textsc {Fn}\) queries, and q does not figure in the bound. Now move to the mu setting, and let \(\mathcal{B}\) make \(u\) queries to its \(\textsc {New}\) oracle. Entry (2, 1) of the table is the trivial bound obtained via Lemma 1, applied with \({\mathsf {F}}_1\) being our ideal compression function and \({\mathsf {F}}_0\) the family of all functions, but one has to be careful in applying the lemma. The subtle point is that adversary \(\mathcal{A}_1\) built in Lemma 1 runs \(\mathcal{B}\) but makes an additional q queries to \(\textsc {Prim}\) to compute the function \({\mathsf {F}}_1\), so its advantage is the (1, 1) table entry with \(q_{\textsc {f}}\) replaced by \(q_{\textsc {f}}+q\). This term gets multiplied by \(u\) according to Eq. (6), resulting in the (2, 1) table entry. A closer look shows one can do a tad better: the bound of the (1, 1) table entry extends with the caveat that a collision between two different keys also allows the adversary to distinguish. In other words, the advantage is now bounded by the probability that \(\mathcal{B}\) guesses any of the \(u\) keys \(K_1, \ldots , K_{u}\), or that any two of these keys collide. This yields the (3, 1) entry of the table. Either way, the (well known) salient point here is that the advantage in the mu case is effectively \(u\) times the one in the su case.

We show that the growth of the advantage as a function of the number of users becomes far more favorable when the adversary obtains some leakage about the secret key under some function \({\mathsf {Out}}\). For concreteness we take the leakage function to be truncation to r bits, meaning \({\mathsf {Out}}= \mathsf {TRUNC}_r\) is the function that returns the first \(r \le c\) bits of its input. (Theorem 6 will consider a general \({\mathsf {Out}}\).) Now we seek to bound \({\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}}(\mathcal{B})\). Given only \(\mathsf {TRUNC}_r(K)\) for a secret key K, there are only \(2^{c - r}\) candidate secret keys consistent with this leakage, which increases the probability that the adversary guesses the secret key. Consequently, the leakage-free bound of the (1, 1) entry generalizes to the bound of the (1, 2) entry. Moving to multiple users, the (2, 2) entry represents the naive bound obtained by applying Lemma 1. It is perhaps natural to expect that this is best possible, as in the no-leakage case. We observe, however, that this is overly pessimistic. To this end, we exploit the following simple fact: every \(\textsc {Prim}\) query \((K, X)\) made by \(\mathcal{B}\) to the ideal compression function can only help in guessing a key \(K_i\) such that \({\mathsf {Out}}(K) = {\mathsf {Out}}(K_i)\). In particular, every \(\textsc {Prim}\) query \((K, X)\) has only roughly \(m\cdot 2^{-(c - r)}\) chance of guessing one of the \(u\) keys, where m is the number of generated keys \(K_i\) such that \({\mathsf {Out}}(K_i) = {\mathsf {Out}}(K)\). A standard balls-into-bins argument (Lemma 5) can be used to infer that except with small probability (e.g., \(2^{-c}\)), we always have \(m \le 2 u/2^r + 3cr\) for any K. Combining these two facts yields our bound, which is the (3, 2) entry of the table. Theorem 6 gives a more general result and the full proof. Note that if \(r = 0\), i.e., nothing is leaked, this is close to the bound of the (3, 1) entry and the bound does grow linearly with the number of users; but as r grows, the \(3crq_{\textsc {f}}\cdot 2^{-(c - r)}\) term becomes the leading one, and it does not grow with \(u\).
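To get a quantitative feel for the gap, the following snippet plugs illustrative parameters of our own choosing (\(c = 512\), \(r = 256\), as for SHA-512 truncated to 256 bits) into the naive (2, 2) bound and the dedicated (3, 2) bound:

```python
from math import log2

c, r = 512, 256           # chaining length and leaked bits (illustrative)
u, qf = 2.0**40, 2.0**80  # users and Prim queries (our own choices)

naive = u * qf / 2.0**(c - r)                           # ~ (2, 2) entry
dedicated = 2*u*qf / 2.0**c + 3*c*r*qf / 2.0**(c - r)   # ~ (3, 2) entry

print(f"naive     ~ 2^{log2(naive):.1f}")      # ~ 2^-136
print(f"dedicated ~ 2^{log2(dedicated):.1f}")  # ~ 2^-157: u has dropped out
```

We now proceed to the detailed proof of the (3, 2) entry.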

Combinatorial preliminaries. Our statements below will depend on an appropriate multi-collision probability of the output function \({\mathsf {Out}}{:\;\,}{\mathsf {Out}}.\mathsf {D} \rightarrow {\mathsf {Out}}.\mathsf {R}\). In particular, for any \(X_1, \ldots , X_u\in {\mathsf {Out}}.\mathsf {R}\), we first define

$$\begin{aligned} \mu (X_1, \ldots , X_u) = \max _{Y \in {\mathsf {Out}}.\mathsf {R}} \left| {\{\,i {\,} :\,X_i =Y\,\}}\right| , \end{aligned}$$

i.e., the number of occurrences of the most frequent value amongst \(X_1, \ldots , X_u\). In particular, this is an integer between 1 and \(u\), and \(\mu (X_1, \ldots , X_u) = 1\) if all elements are distinct, whereas \(\mu (X_1, \ldots , X_u) = u\) if they are all equal. (Note when \(u=1\) the function has value 1.) Then, the m-collision probability of \({\mathsf {Out}}\) for \(u\) users is defined as

$$\begin{aligned} {\mathsf {P}}^{\mathsf {coll}}_{{\mathsf {Out}}}(u, m) = {\Pr \left[ \,{\mu ({\mathsf {Out}}(\mathrm {K}_1), \ldots , {\mathsf {Out}}(\mathrm {K}_{u})) \ge m}\,\right] }, \end{aligned}$$
(23)

where \(\mathrm {K}_1, \ldots , \mathrm {K}_{u}\) are sampled independently and uniformly from \({\mathsf {Out}}.\mathsf {D}\).

We provide a bound on \({\mathsf {P}}^{\mathsf {coll}}_{{\mathsf {Out}}}(u, m)\) for the case where \({\mathsf {Out}}(K)\), for a random K, is close enough to uniform. (We stress that some combinatorial restriction on \({\mathsf {Out}}\) is necessary for this probability to be small – it would equal one if \({\mathsf {Out}}\) were the constant function, for example.) To this end, denote

$$\begin{aligned} \delta ({\mathsf {Out}}) = {\mathbf {SD}}({\mathsf {Out}}(\mathrm {K}), \mathrm {R}) = \frac{1}{2} \sum _{y \in {\mathsf {Out}}.\mathsf {R}} \left| {\Pr \left[ \,{{\mathsf {Out}}(\mathrm {K}) = y}\,\right] } - \frac{1}{\left| {{\mathsf {Out}}.\mathsf {R}}\right| }\right| , \end{aligned}$$
(24)

i.e., the statistical distance between \({\mathsf {Out}}(\mathrm {K})\), where \(\mathrm {K}\) is uniform on \({\mathsf {Out}}.\mathsf {D}\), and a random variable \(\mathrm {R}\) uniform on \({\mathsf {Out}}.\mathsf {R}\).

We will use the following lemma, which we prove using standard balls-into-bins techniques. The proof is deferred to [3].

Lemma 5

(Multi-collision probability). Let \({\mathsf {Out}}: {\mathsf {Out}}.\mathsf {D} \rightarrow {\mathsf {Out}}.\mathsf {R}\), \(u\ge 1\), and \(\lambda \ge 0\). Then, for any \(m\le u\) such that

$$\begin{aligned} m\ge \frac{2u}{\left| {{\mathsf {Out}}.\mathsf {R}}\right| } + \lambda \ln \left| {{\mathsf {Out}}.\mathsf {R}}\right| , \end{aligned}$$
(25)

we have

$$\begin{aligned} {} {\mathsf {P}}^{\mathsf {coll}}_{{\mathsf {Out}}}(u,m) \le u\cdot \delta ({\mathsf {Out}}) + \exp (-\lambda /3).\quad \blacksquare \end{aligned}$$

We stress that the factor 2 in Eq. (25) can be omitted (one can use an additive Chernoff bound, rather than a multiplicative one, when \(u\) is sufficiently large) at the cost of a less compact statement. As this factor will not be crucial in the following, we keep this simpler variant.

For the analysis below, we also need a lower bound on the number of potential preimages of a given output. To this end, given \({\mathsf {Out}}{:\;\,}{\mathsf {Out}}.\mathsf {D} \rightarrow {\mathsf {Out}}.\mathsf {R}\), we define

$$\begin{aligned} \rho ({\mathsf {Out}}) = \min _{y \in {\mathsf {Out}}.\mathsf {R}} \left| {{\mathsf {Out}}^{-1}(y)}\right| . \end{aligned}$$
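All three quantities we use about \({\mathsf {Out}}\), namely \(\mu \), \(\delta ({\mathsf {Out}})\) and \(\rho ({\mathsf {Out}})\), are exactly computable for small parameters. The following sketch (helper functions are ours) evaluates them for truncation of 8-bit strings to 3 bits:

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

def mu(xs):
    """Number of occurrences of the most frequent value among xs."""
    return max(Counter(xs).values())

def delta(out, domain, rng):
    """Statistical distance of Eq. (24): Out(K), K uniform, vs uniform."""
    counts = Counter(out(k) for k in domain)
    return Fraction(1, 2) * sum(
        abs(Fraction(counts.get(y, 0), len(domain)) - Fraction(1, len(rng)))
        for y in rng)

def rho(out, domain, rng):
    """Minimum number of preimages over the range."""
    counts = Counter(out(k) for k in domain)
    return min(counts.get(y, 0) for y in rng)

c, r = 8, 3
dom = list(product((0, 1), repeat=c))   # {0,1}^c as bit tuples
rng = list(product((0, 1), repeat=r))   # {0,1}^r
trunc = lambda k: k[:r]                 # TRUNC_r

keys = [random.choice(dom) for _ in range(20)]   # 20 "users"
print(mu([trunc(k) for k in keys]))   # largest leakage multi-collision
print(delta(trunc, dom, rng))         # 0: truncation preserves uniformity
print(rho(trunc, dom, rng))           # 2^(c-r) = 32
```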

Security of ideal compression functions. The following theorem establishes the multi-user security under key-leakage of a random compression function. We stress that the bound here does not depend on the number of queries the adversary \(\mathcal{B}\) makes to oracle \(\textsc {Fn}\). Also, the parameter \(m\) can be set arbitrarily in the theorem statement for better flexibility, even though our applications below will mostly use the parameters from Lemma 5.

Theorem 6

Let \({\mathsf {Out}}{:\;\,}\mathcal{K}\rightarrow {\mathsf {Out}}.\mathsf {R}\). Then, for all \(m\ge 1\), and all adversaries \(\mathcal{B}\) making \(u\) queries to \(\textsc {New}\), and \(q_{\textsc {f}}\) queries to \(\textsc {Prim}\),

$$\begin{aligned} {} {\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}}(\mathcal{B}) \le \frac{u^2}{2 \left| {\mathcal{K}}\right| } + {\mathsf {P}}^{\mathsf {coll}}_{{\mathsf {Out}}}(u, m) + \frac{(m- 1) \cdot q_{\textsc {f}}}{\rho ({\mathsf {Out}})} . \quad \blacksquare \end{aligned}$$

The statement could be rendered useless whenever \(\rho ({\mathsf {Out}}) = 1\), i.e., when some point has only a single preimage. We note here that Theorem 6 can easily be generalized to use a “soft” version of \(\rho ({\mathsf {Out}})\), guaranteeing only that the number of preimages of a point is bounded from below by \(\rho ({\mathsf {Out}})\) except with some small probability \(\epsilon \), at the cost of an extra additive term \(u\cdot \epsilon \). This more general version will not be necessary for our applications. We also note that it is unclear how to use the average number of preimages of \({\mathsf {Out}}(\mathrm {K})\) in our proof.

Fig. 5.

Games \(\mathrm {G}_0\) and \(\mathrm {G}_1\) in the proof of Theorem 6. The boxed assignment statements are only executed in Game \(\mathrm {G}_1\), but not in Game \(\mathrm {G}_0\).

Proof

(Theorem 6 ). The first step of the proof involves two games, \(\mathrm {G}_0\) and \(\mathrm {G}_1\), given in Fig. 5. Game \(\mathrm {G}_1\) is semantically equivalent to \(\mathrm {PRF}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}}\) with challenge bit \(c = 1\), except that we have modified the concrete syntax of the oracles. In particular, the randomly sampled function is now implemented via lazy sampling, and the table entry \(T_{\textsc {f}}[k, x]\) contains the value of \(\mathsf {f}(k, x)\) if it has been queried; entries which have not been set equal \(\bot \). Also, the game keeps another table \(T_{\textsc {Fn}}\) such that \(T_{\textsc {Fn}}[i, x]\) contains the value returned upon a query \(\textsc {Fn}(i, x)\). Note that the game enforces that, at any point in time, if \(T_{\textsc {Fn}}[i, x]\) and \(T_{\textsc {f}}[K_i, x]\) are both set (i.e., they are not equal to \(\bot \)), then we also have \(T_{\textsc {Fn}}[i, x] = T_{\textsc {f}}[K_i, x]\); moreover, if \(K_i = K_j\), then \(T_{\textsc {Fn}}[i, x] = T_{\textsc {Fn}}[j, x]\) whenever both are not \(\bot \). Finally, whenever any of these entries is set for the first time, it is set to a fresh random value from \(\mathcal{K}\). This guarantees that the combined behavior of the \(\textsc {Fn}\) and \(\textsc {Prim}\) oracles is the same as in \(\mathrm {PRF}_{\mathsf {CF}, {\mathsf {Out}}, \mathbf {F}}\) for the case \(c = 1\). Thus,

$$\begin{aligned} {\Pr \left[ \,{\mathrm {G}_1}\,\right] } = {\Pr }[\,\mathrm {PRF}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}} \mid c = 1\,]. \end{aligned}$$

It is even easier to see that in game \(\mathrm {G}_0\), in contrast, the \(\textsc {Prim}\) and \(\textsc {Fn}\) oracles always return random values; thus, since we are checking whether \(c'\) equals 1 rather than whether \(c'\) equals c, we get \({\Pr \left[ \,{\mathrm {G}_0}\,\right] } = 1 - {\Pr }[\,\mathrm {PRF}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}} \mid c = 0\,]\), and consequently,

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}}(\mathcal{B}) = {\Pr \left[ \,{\mathrm {G}_1}\,\right] } - {\Pr \left[ \,{\mathrm {G}_0}\,\right] } . \end{aligned}$$

Both games \(\mathrm {G}_0\) and \(\mathrm {G}_1\) also include two flags \({\mathsf {bad}}_1\) and \({\mathsf {bad}}_2\), initially false, which can be set to \({\mathsf {true}}\) when specific events occur. In particular, \({\mathsf {bad}}_1\) is set whenever one of the following two events happens: either \(\mathcal{B}\) queries \(\textsc {Fn}(i, x)\) after querying \(\textsc {Prim}(K_i, x)\), or \(\mathcal{B}\) queries \(\textsc {Prim}(K_i, x)\) after querying \(\textsc {Fn}(i, x)\). Moreover, \({\mathsf {bad}}_2\) is set whenever \(\mathcal{B}\) queries \(\textsc {Fn}(i, x)\) after \(\textsc {Fn}(j, x)\) with \(K_i = K_j\), and \(\textsc {Prim}(K_i, x)\) (equivalently \(\textsc {Prim}(K_j, x)\)) was not queried earlier. (Note that if the latter condition is not true, then \({\mathsf {bad}}_1\) has been set already.) It is immediate to see that \(\mathrm {G}_0\) and \(\mathrm {G}_1\) are identical until \({\mathsf {bad}}_1 \vee {\mathsf {bad}}_2\) is set. Therefore, by the fundamental lemma of game playing [6],

Fig. 6.

Games \(\mathrm {H}_0\) and \(\mathrm {H}_1\) in the proof of Theorem 6. Both games share the same \(\textsc {New}\), \(\textsc {Prim}\), and \(\textsc {Fn}\) oracles, the only difference being the additional re-sampling of the secret keys \(K_i'\) in the main procedure of \(\mathrm {H}_1\).

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}}(\mathcal{B}) = {\Pr \left[ \,{\mathrm {G}_1}\,\right] } - {\Pr \left[ \,{\mathrm {G}_0}\,\right] } \le {\Pr \left[ \,{\mathrm {G}_0 {\mathrm {\; sets \;}}{\mathsf {bad}}_1}\,\right] } + {\Pr \left[ \,{\mathrm {G}_0 {\mathrm {\; sets \;}}{\mathsf {bad}}_2}\,\right] }. \end{aligned}$$
(26)

We immediately note that in order for \({\mathsf {bad}}_2\) to be set in \(\mathrm {G}_0\), we must have \(K_i = K_j\) for some \(i \ne j\), i.e., two keys must collide. Since at most \(u\) calls are made to \(\textsc {New}\), a simple birthday bound yields

$$\begin{aligned} {\Pr \left[ \,{\mathrm {G}_0 {\mathrm {\; sets \;}}{\mathsf {bad}}_2}\,\right] } \le \frac{u^2}{2 \cdot \left| {\mathcal{K}}\right| }. \end{aligned}$$
(27)

The rest of the proof thus deals with the more difficult problem of bounding \({\Pr \left[ \,{\mathrm {G}_0 {\mathrm {\; sets \;}}{\mathsf {bad}}_1}\,\right] }\). To simplify this task, we first introduce a new game, called \(\mathrm {H}_0\) (cf. Fig. 6), which behaves as \(\mathrm {G}_0\), except that it only checks at the end of the game whether the event triggering \({\mathsf {bad}}_1\) has occurred during the interaction, in which case the game outputs \({\mathsf {true}}\). Note that we relax this check a bit further compared with \(\mathrm {G}_0\), allowing it to succeed as long as a query to \(\textsc {Prim}\) of the form \((K_j, x)\) for some j and some x was made, even if \(\textsc {Fn}(j, x)\) was never queried before. Therefore,

$$\begin{aligned} {\Pr \left[ \,{\mathrm {G}_0 {\mathrm {\; sets \;}}{\mathsf {bad}}_1}\,\right] } \le {\Pr \left[ \,{\mathrm {H}_0}\,\right] }. \end{aligned}$$
(28)

Note that in \(\mathrm {H}_0\), the replies to all oracle calls made by \(\mathcal{B}\) no longer depend on the keys \(K_1, K_2, \ldots \), except for the leaked values \({\mathsf {Out}}(K_1), {\mathsf {Out}}(K_2), \ldots \) returned by calls to \(\textsc {New}\). We introduce a new and final game \(\mathrm {H}_1\) which modifies \(\mathrm {H}_0\) by pushing the sampling of the actual key values as late as possible in the game: that is, we first give \(\mathcal{B}\) only values with the correct leakage distribution, and in the final phase of \(\mathrm {H}_1\), when computing the game output, we sample keys consistent with this leakage. In other words, in the final check we replace the keys \(K_1, K_2, \ldots \) with freshly sampled keys \(K'_1, K'_2, \ldots \), which are uniform subject to the condition that \({\mathsf {Out}}(K'_i) = {\mathsf {Out}}(K_i) = Y_i\).

It is not hard to see that \({\Pr \left[ \,{\mathrm {H}_0}\,\right] } = {\Pr \left[ \,{\mathrm {H}_1}\,\right] }\). This follows from two observations. First, for every i, the joint distribution of \((K_i, Y_i = {\mathsf {Out}}(K_i))\) is identical to that of \((K_i', Y_i)\), since given \(Y_i\), both \(K_i\) and \(K_i'\) are uniformly distributed over the set of preimages of \(Y_i\). Second, the behavior of both \(\mathrm {H}_0\) and \(\mathrm {H}_1\), before the final check deciding their outputs, depends only on the values \(Y_i = {\mathsf {Out}}(K_i)\), and not on the \(K_i\)’s. The actual keys \(K_i\) are used only for the final check, and since the distributions of \(K_i\) and \(K_i'\) conditioned on \(Y_i\) are identical, so are the probabilities of outputting \({\mathsf {true}}\) in games \(\mathrm {H}_0\) and \(\mathrm {H}_1\).

Thus, combining Eqs. (26), (27), and (28), we have

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},{\mathsf {Out}},\mathbf {F}}(\mathcal{B}) \le \frac{u^2}{2 \cdot \left| {\mathcal{K}}\right| } + {\Pr \left[ \,{\mathrm {H}_1}\,\right] }. \end{aligned}$$
(29)

We are left with computing an upper bound on \({\Pr \left[ \,{\mathrm {H}_1}\,\right] }\). For this purpose, denote by \(\mathcal{S}\) the set of pairs \((k, x)\) on which \(T_{\textsc {f}}[k, x] \ne \bot \) after \(\mathcal{B}\) outputs its bit \(c'\) in \(\mathrm {H}_1\). Also, let \(\mathcal{Y}\) be the multi-set \(\{Y_1, \ldots , Y_{u}\}\) of values output by \(\textsc {New}\) to \(\mathcal{B}\), and denote by \(\overline{\mathcal{Y}}\) the set obtained by removing repetitions. Note that \(\left| {\mathcal{S}}\right| \le q_{\textsc {f}}\) and \(\left| {\overline{\mathcal{Y}}}\right| \le \left| {\mathcal{Y}}\right| \le u\), where the first of the latter two inequalities may be strict, since elements can be repeated due to collisions \({\mathsf {Out}}(K_i) = {\mathsf {Out}}(K_j)\).

Assume now that \(\mathcal{S}\) and \(\mathcal{Y}\) are given and fixed. We proceed to compute the probability that \(\mathrm {H}_1\) outputs \({\mathsf {true}}\) conditioned on the event that these \(\mathcal{S}\) and \(\mathcal{Y}\) have been generated. For notational convenience, for every \(y \in \overline{\mathcal{Y}}\), also denote

$$\begin{aligned} \mathcal{S}_y = \{\,(k, x) \in \mathcal{S} {\,} :\,{\mathsf {Out}}(k) = y\,\}, \end{aligned}$$

and let \(q_y = \left| {\mathcal{S}_y}\right| \). Also, let \(n_y\) be the number of occurrences of \(y \in \overline{\mathcal{Y}}\) in \(\mathcal{Y}\). Note that except with probability \({\mathsf {P}}^{\mathsf {coll}}_{{\mathsf {Out}}}(u, m)\), we have \(n_{y} \le m- 1\) for all \(y \in \overline{\mathcal{Y}}\), and thus

$$\begin{aligned} \begin{aligned} {\Pr \left[ \,{\mathrm {H}_1}\,\right] }&\le {\Pr \left[ \,{\exists y \in \overline{\mathcal{Y}} \;:\; n_y \ge m}\,\right] } + {\Pr }[\,\mathrm {H}_1 \mid \forall y \in \overline{\mathcal{Y}}\;:\, n_y< m\,] \\&= {\mathsf {P}}^{\mathsf {coll}}_{{\mathsf {Out}}}(u, m) + {\Pr }[\,\mathrm {H}_1 \mid \forall y \in \overline{\mathcal{Y}}\;:\, n_y < m\,]. \end{aligned} \end{aligned}$$
(30)

Therefore, let us assume we are given \(\mathcal{S}\) and \(\mathcal{Y}\) such that \(n_{y} \le m- 1\) for all \(y \in \overline{\mathcal{Y}}\). Denote by \({\Pr }[\,\mathrm {H}_1 \mid \mathcal{S}, \mathcal{Y}\,]\) the probability that \(\mathrm {H}_1\) outputs \({\mathsf {true}}\) conditioned on the event that this \(\mathcal{S}\) and this \(\mathcal{Y}\) have been generated. Using the fact that the keys \(K'_1, \ldots , K'_{u}\) are sampled independently of \(\mathcal{S}\), we compute

$$\begin{aligned} \mathrm{Pr}[{\mathrm {H}_1} \mid {\mathcal{S}, \mathcal{Y}}] =&\;{\Pr \left[ \,{\exists j, x: (K_j', x) \in \mathcal{S}}\,\right] } \le \sum _{y \in \overline{\mathcal{Y}}} \frac{q_y \cdot n_y}{\left| {{\mathsf {Out}}^{-1}(y)}\right| } \\&\le (m- 1) \cdot \sum _{y \in \overline{\mathcal{Y}}} \frac{q_y}{\left| {{\mathsf {Out}}^{-1}(y)}\right| } \le \frac{m- 1}{\rho ({\mathsf {Out}})} \sum _{y \in \overline{\mathcal{Y}}} q_{y} \le \frac{(m- 1) q_{\textsc {f}}}{\rho ({\mathsf {Out}})}. \end{aligned}$$

Since the bound holds for all such \(\mathcal{S}\) and \(\mathcal{Y}\), we also have

$$\begin{aligned} {\Pr }[\,\mathrm {H}_1 \mid \forall y \in \overline{\mathcal{Y}}\;:\, n_y < m\,] \le \frac{(m - 1) q_{\textsc {f}}}{\rho ({\mathsf {Out}})}. \end{aligned}$$
(31)

The final bound follows by combining Eqs. (29), (30), and (31).    \(\blacksquare \)

Security of the Davies-Meyer construction. One might object that practical compression functions are not unstructured enough to be treated as random, because they are built from blockciphers via the Davies-Meyer construction. Accordingly, in [3], we study the mu PRF security under leakage of the Davies-Meyer construction with an ideal blockcipher and show that bounds of the quality we have seen for a random compression function continue to hold.

8 Quantitative Bounds for Augmented Cascades and AMAC

We consider two instantiations of augmented cascades, one using bit truncation, the other using modular reduction. We give concrete bounds on the mu prf security of these constructions in the ideal compression function model, combining results from above. This will give us good guidelines for a comparison with existing constructions – such as \(\mathsf {NMAC}\) and sponges – in [3].

Bit truncation. Let \(\mathcal{K}= \{0,1\}^c\), and let \({\mathsf {Out}}= \mathsf {TRUNC}_r{:\;\,}\{0,1\}^c \rightarrow \{0,1\}^{r}\), for \(r \le c\), be the function that outputs the first r bits of its input, i.e., \(\mathsf {TRUNC}_r(X) = X[1\ldots r]\). Note that \(\delta (\mathsf {TRUNC}_r) = 0\), since dropping \(c - r\) bits does not affect uniformity, and \(\rho (\mathsf {TRUNC}_r) = 2^{c - r}\), since every r-bit string has exactly \(2^{c - r}\) preimages. Then, combining Lemma 5 with Theorem 6, using \(m= 2u/2^r + 3c r\), we obtain the following corollary, denoting by \(\mathbf {F}_{c}\) the ideal compression function for \(\mathcal{K}= \{0,1\}^c\). (We do not specify \(\mathcal{X}\) further, as it does not influence the statement.)

Corollary 7

For any \(r \le c\), and all adversaries \(\mathcal{B}\) making \(u\) queries to \(\textsc {New}\) and \(q_{\textsc {f}}\) queries to \(\textsc {Prim}\),

$$\begin{aligned} {}{\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},\mathsf {TRUNC}_r,\mathbf {F}_c}(\mathcal{B}) \le \frac{u^2}{2^{c+1}} + \frac{2u \cdot q_{\textsc {f}}}{2^{c}} + \frac{3cr \cdot q_{\textsc {f}}}{2^{c-r}} + \exp (-c).\quad \blacksquare \end{aligned}$$
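In a bit more detail (a routine verification we include for convenience): Theorem 6 with \(\delta (\mathsf {TRUNC}_r) = 0\) and \(\rho (\mathsf {TRUNC}_r) = 2^{c-r}\), together with Lemma 5 with \(\lambda = 3c\) (so that \(m = 2u/2^r + 3cr \ge 2u/2^r + 3c \ln 2^r\) satisfies Eq. (25)), give

$$\begin{aligned} \frac{u^2}{2^{c+1}} + \big (u\cdot 0 + e^{-c}\big ) + \frac{(m-1)\, q_{\textsc {f}}}{2^{c-r}} \le \frac{u^2}{2^{c+1}} + \frac{2u \cdot q_{\textsc {f}}}{2^{c}} + \frac{3cr \cdot q_{\textsc {f}}}{2^{c-r}} + \exp (-c), \end{aligned}$$

which is the claimed bound.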

We can then use this result to obtain our bounds for the augmented cascade \({\mathbf {ACSC}}[\mathsf {CF},\mathsf {CF},\mathsf {TRUNC}_r]\) when using an ideal compression function \(\{0,1\}^c \times \mathcal{X}\rightarrow \{0,1\}^c\). The proof is in [3].

Theorem 8

(mu prf security for r-bit truncation). For any \(r \le c\), and all adversaries \(\mathcal{A}\) making q queries to \(\textsc {Fn}\) consisting of vectors from \(\mathcal{X}^*\) of length at most \(\ell \), \(q_{\textsc {f}}\) queries to \(\textsc {Prim}\), and \(u\le q\) queries to \(\textsc {New}\),

$$\begin{aligned}{}{\mathsf {Adv}}^{\mathsf {prf}}_{{\mathbf {ACSC}}[\mathsf {CF},\mathsf {CF},\mathsf {TRUNC}_r],\mathbf {F}_c}(\mathcal{A}) \le \frac{5\ell ^2 q^2 + 3 \ell q q_{\textsc {f}}}{2^c} + \frac{3cr (\ell ^2 q + \ell q_{\textsc {f}})}{2^{c-r}} + \ell \exp ({-c}).\quad \blacksquare \end{aligned}$$

Modular reduction. Our second example is particularly important for the application to the Ed25519 signature scheme.

Here, we let \(\mathcal{K}= {{\mathbb Z}}_{N}\), and consider the output function \({\mathsf {Out}}= \mathsf {MOD}_M{:\;\,}{{\mathbb Z}}_N \rightarrow {{\mathbb Z}}_M\), for \(M \le N\), defined by \(\mathsf {MOD}_M(X) = X \bmod M\). (Note that as a special case, we can think of \(\mathcal{K}= \{0,1\}^c\) here as \({{\mathbb Z}}_{2^c}\).) We need the following two properties of \(\mathsf {MOD}_M\), proved in [3].

Lemma 9

For all \(M \le N\): (1) \(\rho (\mathsf {MOD}_M) \ge \frac{N}{M} - 1\), (2) \(\delta (\mathsf {MOD}_M) \le M/N\).
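Both claims are easy to sanity-check exhaustively for small moduli; a self-contained sketch (helper names ours):

```python
from collections import Counter
from fractions import Fraction

def check_lemma9(N, M):
    counts = Counter(x % M for x in range(N))  # preimage counts of MOD_M
    rho = min(counts[y] for y in range(M))
    delta = Fraction(1, 2) * sum(
        abs(Fraction(counts[y], N) - Fraction(1, M)) for y in range(M))
    assert rho >= N / M - 1            # claim (1)
    assert delta <= Fraction(M, N)     # claim (2)
    return rho, delta

for N, M in [(1000, 7), (2**16, 251), (97, 10)]:
    print(N, M, check_lemma9(N, M))
```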

Then, combining Lemmas 5 and 9 with Theorem 6, using \(m= 2u/M + 3\ln N \ln M\), we obtain the following corollary, denoting by \(\mathbf {F}_{N}\) the ideal compression function with \(\mathcal{K}= {{\mathbb Z}}_N\). (As above, we do not specify \(\mathcal{X}\) further, as it does not influence the statement.)

Corollary 10

For any \(M \le N/2\), and all adversaries \(\mathcal{B}\) making \(u\) queries to \(\textsc {New}\) and \(q_{\textsc {f}}\) queries to \(\textsc {Prim}\),

$$\begin{aligned} {}{\mathsf {Adv}}^{\mathsf {prf}}_{\mathsf {CF},\mathsf {MOD}_M,\mathbf {F}_{N}}(\mathcal{B}) \le \frac{u^2}{2 N} + \frac{uM}{N} + \frac{4u \cdot q_{\textsc {f}}}{N} + \frac{6M \ln N \ln M \cdot q_{\textsc {f}}}{N} + \frac{1}{N}.\quad \blacksquare \end{aligned}$$
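To connect this with the Ed25519 use case (the numbers below are illustrative choices of ours: SHA-512 gives \(N = 2^{512}\), and the reduction is modulo a group order close to \(2^{252}\)):

```python
from math import log, log2

N = 2.0**512              # SHA-512 output space, viewed as Z_N
M = 2.0**252              # roughly the Ed25519 group order (illustrative)
u, qf = 2.0**40, 2.0**80  # adversary resources (our own choices)

bound = (u*u/(2*N) + u*M/N + 4*u*qf/N
         + 6*M*log(N)*log(M)*qf/N + 1/N)
print(f"Corollary 10 bound ~ 2^{log2(bound):.1f}")  # ~ 2^-161; dominated
# by the 6 M ln N ln M q_f / N term, which does not grow with u
```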

This can once again be used to obtain the final analysis of the augmented cascade using modular reduction. The proof is similar to that of Theorem 8 and is deferred to [3].

Theorem 11

(mu prf security for modular reduction). For any \(M \le N/2\), and all adversaries \(\mathcal{A}\) making q queries to \(\textsc {Fn}\) consisting of vectors from \(\mathcal{X}^*\) of length at most \(\ell \), \(q_{\textsc {f}}\) queries to \(\textsc {Prim}\), and \(u\le q\) queries to \(\textsc {New}\),

$$\begin{aligned} {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathbf {ACSC}}[\mathsf {CF},\mathsf {CF},\mathsf {MOD}_M],\mathbf {F}_N}(\mathcal{A})&\le \frac{5\ell ^2 q^2 + 3 \ell q q_{\textsc {f}}}{N} \\&\quad + \frac{7 M \ln N \ln M (\ell ^2 q + \ell q_{\textsc {f}}) }{N} + \frac{\ell }{N}.\quad \blacksquare \end{aligned}$$

Bounds for AMAC. The above bounds are for augmented cascades, but they can easily be adapted to \(\mathsf {AMAC}\), at the cost of an extra additive term, which we now discuss. Recall that \(\mathsf {AMAC}(K,M) = {\mathsf {Out}}(H(K\Vert M))\), where the iterated hash function H is built from a compression function \({\mathsf {h}}\). We only consider here the special case where the key K is handled entirely by the first compression function call of H (and is exactly a random element of \(\mathcal{X}\)), and the message is processed from the second call onwards. In other words, \(\mathsf {AMAC}\) is the 2-tier cascade with the first tier being the dual of \({\mathsf {h}}\), meaning the key and input roles are swapped. In particular, we can use Theorem 4, which gives a modified version of the above bounds with an additional additive term \(2\,{\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {g}}}(\mathcal{A}_{{\mathsf {g}}})\), for \(\mathcal{A}_{{\mathsf {g}}}\) as given in the reduction. This can easily be upper bounded (using the dedicated mu bound from Fig. 4) as

$$\begin{aligned} 2 \cdot {\mathsf {Adv}}^{\mathsf {prf}}_{{\mathsf {g}}}(\mathcal{A}_{{\mathsf {g}}}) \le \frac{u^2 + u(q_{\textsc {f}}+ q \ell )}{\left| {\mathcal{X}}\right| } \le \frac{q^2 + q (q_{\textsc {f}}+ q \ell )}{\left| {\mathcal{X}}\right| }. \end{aligned}$$