1 Introduction

Block Cipher-Based MACs. A Message Authentication Code (MAC) is a symmetric-key cryptographic function that ensures the authenticity of messages. A large family of MACs (such as CBC-MAC [BKR00] or OMAC [IK03]) are constructed as modes of operation of some underlying block cipher. They are often provably secure and reasonably efficient; however, they also have inherent limitations with respect to speed and security. First, such modes cannot process more than n bits of input per block cipher call, where n is the block-length (in bits) of the underlying block cipher. Second, most block cipher-based modes are secure only up to the so-called birthday bound (i.e., up to \(2^{n/2}\) message blocks), and very few proposals, such as PMAC_Plus [Yas11], achieve security beyond the birthday bound (BBB), often at the cost of efficiency. For block ciphers with block-length 128, birthday-bound security can be deemed too low in many situations.

For these reasons, a recent popular trend has been to design modes of operation for a stronger primitive, namely tweakable block ciphers (TBCs). In comparison to traditional block ciphers, TBCs take an extra t-bit input called the tweak, and should behave as a family of \(2^t\) independent block ciphers indexed by the tweak. This primitive was formalized by Liskov et al. [LRW02] (even though the informal idea surfaced in several papers before), and turns out to be surprisingly flexible for building various cryptographic functionalities. A TBC can be constructed either in a generic way from a block cipher through a mode of operation such as XEX [Rog04], or as a dedicated design such as Threefish [FLS+10], SCREAM [GLS+14], Deoxys-BC [JNP14a], Joltik-BC [JNP14b], KIASU-BC [JNP14c], and SKINNY [BJK+16]; the last four examples follow the so-called TWEAKEY framework [JNP14d].

The first construction of a parallelizable MAC from a TBC is PMAC1 [Rog04], derived from the block cipher-based construction PMAC [BR02] by abstracting the block cipher-based TBC implicitly used in PMAC. Assuming that the underlying TBC has n-bit blocks and t-bit tweaks, PMAC1 processes n bits of input per TBC call, handles messages of length up to (roughly) \(2^t\) n-bit blocks, and is secure up to the birthday bound (i.e., up to roughly \(2^{n/2}\) message blocks). This scheme is simple, efficient and fully parallelizable (all calls to the TBC except the final one can be made in parallel). For these reasons, it has been adopted, for example, by multiple TBC-based submissions to the CAESAR competition for Authenticated Encryption (AE), e.g. SCREAM [GLS+14], Deoxys [JNP14a], Joltik [JNP14b], or KIASU [JNP14c].

Several authors have proposed schemes that push security beyond the birthday bound. Naito [Nai15] proposed two constructions called PMAC_TBC1k and PMAC_TBC3k which are reminiscent of PMAC_Plus [Yas11]. Like PMAC1, they process only n bits of input per TBC call, but their security is significantly higher than that of PMAC1: they are secure up to roughly \(2^n\) message blocks. Recently, List and Nandi [LN17] proposed PMAC2x, which extends the output size of Naito’s PMAC_TBC1k scheme from n to 2n bits without harming efficiency or security. (They also proposed a minor modification of PMAC_TBC1k with n-bit outputs called PMACx.) We remark that Minematsu and Iwata [MI17] recently reported severe flaws in [LN17] (the ePrint version of [LN17] was subsequently updated in order to fix these flaws).

Our Contribution. We propose a new TBC-based MAC called \(\mathsf {ZMAC}\). Like PMAC_TBC1k [Nai15] or PMAC2x/PMACx [LN17], it achieves BBB-security (as a variable-input-length PRF) and it is fully parallelizable. However, our proposal is more efficient than any of the previous schemes. Specifically, \(\mathsf {ZMAC}\) processes \(n+t\) bits of input per TBC call when using an n-bit block and t-bit tweak TBC, whereas previous schemes are limited to n bits of input per TBC call, independently of the tweak size (see Table 1 for a comparison with existing schemes). To the best of our knowledge, this is the first TBC-based MAC that exploits the full power of the tweak input of the underlying TBC. Note that an n-bit block, t-bit tweak TBC cannot handle more than \(n+t\) bits of public input per call, hence the efficiency of our construction is essentially optimal (a few tweak bits are reserved for domain separation but the impact is very limited). The tweak-length t of the TBC used in \(\mathsf {ZMAC}\) can be arbitrary, which is important since existing dedicated TBCs have various tweak-lengths, smaller (e.g. Threefish or KIASU-BC) or larger (e.g. Deoxys-BC or SKINNY) than the block-length n.

Main Ideas of Our Design. Our construction follows the traditional “UHF-then-PRF” paradigm: first, the message is hashed with a universal hash function (UHF), and the resulting output is given to a fixed-input-length PRF. Building a BBB-secure fixed-input-length PRF from a TBC is more or less straightforward (one can simply use the “XOR of permutations” construction, which has been extensively analyzed [Luc00, Pat08, Pat13, CLP14]). The most innovative part of our work lies in the design of our TBC-based UHF, which we call \(\mathsf {ZHASH}\). The structure of our proposal is reminiscent of Naito’s PMAC_TBC1k (and thus of PMAC_Plus) combined with the XTX tweak extension construction by Minematsu and Iwata [MI15]. We note that a TBC is often used to abstract a block cipher-based construction to simplify the security proof, for example in the case of PMAC and OCB  [Rog04], where one can prove the security of TBC-based abstraction and the construction of TBC itself separately. The TBC-based abstraction eliminates the handling of masks, which simplifies the security proof. That is, it is often the case that TBC-based constructions do not have masks, where the masks are treated as tweaks. With \(\mathsf {ZMAC}\), we take the opposite direction to the common approach. We restore the masks in the construction, and our scheme explicitly relies on the use of masks together with a TBC.

Application to Deterministic Authenticated Encryption. Following List and Nandi [LN17], we use \(\mathsf {ZMAC}\) to construct a (stateless) Deterministic Authenticated Encryption (DAE) scheme (i.e., a scheme whose security does not rely on the use of random IVs or nonces [RS06]). The resulting scheme, called \(\mathsf {ZAE}\), is BBB-secure and very efficient: it processes on average \(n(n+t)/(2n+t)\) input bits per TBC call (this complex form comes from the fact that the MAC, resp. encryption part processes \(n+t\), resp. n input bits per TBC call). Note that when \(t=0\), this is (unsurprisingly) similar to standard double-pass block cipher-based DAE schemes (n/2 bits per block cipher call), but as t grows, efficiency approaches n bits per TBC call, i.e., the efficiency of an online block cipher-based scheme (which cannot be secure in the DAE sense). We provide a comparison with other DAE schemes in Table 1. We emphasize that \(\mathsf {ZAE}\) is a mere combination of \(\mathsf {ZMAC}\) with a TBC-based encryption mode called \(\mathsf {IVCTRT} \) previously proposed in [PS16] through the SIV composition method [RS06]. Nevertheless, we think the proposal of a concrete DAE scheme based on \(\mathsf {ZMAC}\) is quite relevant here, and helps further illustrate the performance gains allowed by \(\mathsf {ZMAC}\) (see Table 3 in Sect. 6).
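The rate quoted for \(\mathsf {ZAE}\) can be checked with a short count. The following is only a sketch: it assumes an m-bit input with m a multiple of \(n(n+t)\), and it ignores padding and the constant number of finalization calls. The MAC pass costs about \(m/(n+t)\) TBC calls and the encryption pass about \(m/n\) calls, so

$$\begin{aligned} \#\text {calls}&\approx \frac{m}{n+t}+\frac{m}{n} = \frac{m(2n+t)}{n(n+t)}, \\ \frac{m}{\#\text {calls}}&\approx \frac{n(n+t)}{2n+t}. \end{aligned}$$

Setting \(t=0\) gives n/2 bits per call, \(t=n\) gives 2n/3, and the ratio tends to n as t grows, which matches the behaviour described above.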

Table 1. Comparison of our designs \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\) with other MAC and DAE (a.k.a. MRAE) schemes. Column “# bits per call” refers to the number of bits of input processed per primitive call. Notation: n is the block-length of the underlying BC/TBC, t is the tweak-length of the underlying TBC. NR denotes the nonce-respecting scenario.

Future Works. \(\mathsf {ZMAC}\) achieves optimal efficiency while providing full n-bit security (assuming \(t\ge n\)). For this reason, it seems that this mode cannot be substantially improved. However, it would be very interesting to study how \(\mathsf {ZMAC}\) ’s design can influence ad-hoc TBC constructions: if one could construct an efficient, BBB-secure n-bit block TBC with a very large tweak (something which has not been studied much yet), this would lead to extremely efficient MAC algorithms.

Organization. We give useful definitions in Sect. 2. Our new mode \(\mathsf {ZMAC}\) is defined in Sect. 3, and its security is analyzed in Sect. 4. Applications to Authenticated Encryption are presented in Sect. 5. Finally, a performance estimation for \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\) when Deoxys-BC or SKINNY are used to instantiate the TBC is provided in Sect. 6.

2 Preliminaries

Basic Notation. Let \(\{0,1\}^*\) be the set of all finite bit strings. For an integer \(n\ge 0\), let \(\{0,1\}^n\) be the set of all bit strings of length n, and \((\{0,1\}^n)^+\) be the set of all bit strings of length a (non-zero) positive multiple of n. For \(X\in \{0,1\}^*\), |X| is its length in bits, and for \(n\ge 1\), \(|X|_n=\lceil |X|/n\rceil \) is its length in n-bit blocks. The string of n zeros is denoted \(0^n\). The concatenation of two bit strings X and Y is written \(X\,\Vert \,Y\), or XY when no confusion is possible. For any \(X\in \{0,1\}^n\) and \(i\le n\), let \(\texttt {msb}_{i}(X)\), resp. \(\texttt {lsb}_{i}(X)\) be the first, resp. last i bits of X. For non-negative integers a and d with \(a\le 2^d-1\), let \(\texttt {str}_{d}(a)\) be the d-bit binary representation of a.

Given a bit string \(X\in \{0,1\}^{i+j}\), we write

$$ (X[1],X[2])\xleftarrow {\scriptscriptstyle i,j} X $$

where \(X[1]=\texttt {msb}_i(X)\) and \(X[2]=\texttt {lsb}_j(X)\). For \(X\in \{0,1\}^*\), we also define the parsing into fixed-length subsequences of length n, denoted

$$ (X[1],X[2],\dots ,X[m])\xleftarrow {\scriptscriptstyle n} X, $$

where \(m = |X|_n\), \(X[1]\,\Vert \,X[2]\,\Vert \,\dots \,\Vert \,X[m]=X\), \(|X[i]|=n\) for \(1\le i<m\) and \(0\le |X[m]|\le n\) when \(|X|>0\). When \(|X|=0\), we let \(X[1]\xleftarrow {\scriptscriptstyle n} X\), where X[1] is the empty string.

Let n and t be positive integers. For any \(X\in \{0,1\}^*\), we define the “one-zero padding” \(\texttt {ozp}({X})\) to be X if |X| is a positive multiple of \((n+t)\) and \(X\,\Vert \,10^c\) for \(c=(n+t)-(|X| \bmod (n+t))-1\) otherwise. We stress that \(\texttt {ozp}({\cdot })\) is defined with respect to \((n+t)\)-bit blocks rather than n-bit blocks, and that the empty string is padded to \(10^{n+t-1}\).
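As an illustration, one-zero padding can be sketched in a few lines of Python. The sketch represents bit strings as Python strings of '0' and '1' characters (a choice made here for readability, not part of the specification) and assumes that c is the number of zeros needed to reach the next multiple of \(n+t\), which is the reading consistent with the empty string being padded to \(10^{n+t-1}\).

```python
def ozp(x: str, n: int, t: int) -> str:
    """One-zero padding of the bit string x to a positive multiple of n + t bits.

    Bit strings are modelled as str of '0'/'1' characters (illustrative choice).
    """
    block = n + t
    if len(x) > 0 and len(x) % block == 0:
        # Already a positive multiple of (n + t) bits: returned unchanged.
        return x
    # Number of zeros after the single '1' so the result fills the last block.
    c = block - (len(x) % block) - 1
    return x + "1" + "0" * c
```

With \(n=2\) and \(t=1\), for instance, the empty string becomes a single 3-bit block, and a 3-bit input is left untouched.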

For any \(X\in \{0,1\}^n\) and \(Y\in \{0,1\}^t\), we define

$$\begin{aligned} X \oplus _t Y \mathop {=}\limits ^{\tiny {\text {def}}}{\left\{ \begin{array}{ll} \texttt {msb}_t(X) \oplus Y &{} \text { if}~t \le ~n,\\ (X\,\Vert \,0^{t-n}) \oplus Y &{} \text { if}~t>~n. \end{array}\right. } \end{aligned}$$

Hence, \(|X \oplus _t Y|=t\) in both cases and if \(t=n\) then \(X \oplus _t Y = X\oplus Y\).
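The two cases of \(\oplus _t\) can be sketched as follows, with bit strings modelled as non-negative Python integers whose most significant bit is the first bit of the string. The function names and the explicit width parameters are illustrative assumptions.

```python
def msb(x: int, width: int, i: int) -> int:
    """First (most significant) i bits of the width-bit value x."""
    return x >> (width - i)


def xor_t(x: int, y: int, n: int, t: int) -> int:
    """X xor_t Y for an n-bit X and a t-bit Y; the result is always t bits."""
    if t <= n:
        # Truncate X to its first t bits before XORing.
        return msb(x, n, t) ^ y
    # t > n: append t - n zero bits to X before XORing.
    return (x << (t - n)) ^ y
```

When \(t=n\) both branches reduce to a plain XOR, as noted above.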

Given a non-empty set \(\mathcal{X}\), we let \(X\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{X}\) denote the draw of an element X uniformly at random in \(\mathcal{X}\).

Galois Field. An element a in the Galois field GF\((2^n)\) will be interchangeably represented as an n-bit string \(a_{n-1}\ldots a_1 a_0\), a formal polynomial \(a_{n-1}\mathtt {x}^{n-1}+\cdots + a_1\mathtt {x}+a_0\), or an integer \(\sum _{i=0}^{n-1}a_i 2^i\). Hence, by writing \(2\cdot a\) or 2a when no confusion is possible, we mean the multiplication of a by \(2=\mathtt {x}\). This operation is called doubling. For \(n=128\), we define the field \( GF (2^n)\) (as is standard) by the primitive polynomial \(\mathtt {x}^{128} + \mathtt {x}^7 + \mathtt {x}^2 + \mathtt {x}+ 1\). The doubling 2a over this field is \((a\ll 1)\) if \(\texttt {msb}_1(a)=0\) and \((a\ll 1)\oplus (0^{120}10000111)\) if \(\texttt {msb}_1(a)=1\), where \((a\ll 1)\) denotes the left-shift of a by one bit.
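A minimal Python sketch of the doubling operation for \(n=128\) is given below, with field elements modelled as integers below \(2^{128}\). The constant 0x87 is the 8-bit pattern 10000111 from the text; the function name is an illustrative choice.

```python
N = 128
MASK = (1 << N) - 1


def double(a: int) -> int:
    """Multiply a by x in GF(2^128) defined by x^128 + x^7 + x^2 + x + 1."""
    shifted = (a << 1) & MASK          # left shift, dropping the top bit
    if a >> (N - 1):                   # msb_1(a) = 1: reduce by the polynomial
        shifted ^= 0x87                # 0^120 || 10000111
    return shifted
```

Because both the shift and the conditional XOR are linear over GF(2), doubling distributes over XOR, which is what allows the masks and checksums to be updated incrementally.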

Keyed Functions and Modes. A keyed function with key space \(\mathcal{K}\), domain \(\mathcal{X}\), and range \(\mathcal{Y}\) is a function \(F:\mathcal{K}\times \mathcal{X}\rightarrow \mathcal{Y}\). We write \(F_K(X)\) for \(F(K,X)\). If \(\textsf {Mode}\) is a mode of operation for F using a single key \(K\in \mathcal{K}\) for F, we write \(\textsf {Mode}[F_K]\) instead of \(\textsf {Mode}[F]_K\).

For any keyed function \(F:\mathcal{K}\times (\{0,1\}^n)^+\rightarrow \{0,1\}^a\) for some a, we define the collision probability of F as

$$\begin{aligned} \textsf {Coll} _{F}(n,m,m') \mathop {=}\limits ^{\tiny {\text {def}}}\max _{\begin{array}{c} M\in (\{0,1\}^n)^m\\ M'\in (\{0,1\}^n)^{m'}\\ M \ne M' \end{array}}\Pr [K\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}:F_K(M)=F_K(M')]. \end{aligned}$$

Tweakable Blockciphers. A tweakable blockcipher (TBC) is a keyed function \({\widetilde{E}}:\mathcal{K}\times \mathcal{T}\times \mathcal{M}\rightarrow \mathcal{M}\) such that for each \((K,T)\in \mathcal{K}\times \mathcal{T}\), \({\widetilde{E}}(K,T,\cdot )\) is a permutation over \(\mathcal{M}\). Here, K is the key and T is a public value called tweak. Note that a conventional block cipher is a TBC such that the tweak space \(\mathcal{T}\) is a singleton. The output \({\widetilde{E}}(K,T,X)\) of the encryption of \(X\in \mathcal{M}\) under key \(K\in \mathcal{K}\) and tweak \(T\in \mathcal{T}\) may also be written \({\widetilde{E}}_K(T,X)\) or \({\widetilde{E}}_K^T(X)\). Following [PS16], when the tweak space of \({\widetilde{E}}\) is \(\mathcal{T}_I=\mathcal{T}\times \mathcal{I}\) for some \(\mathcal{I}\subset \mathbb {N}\) and for some set \(\mathcal{T}\), we call \(\mathcal{T}\) the effective tweak space of \({\widetilde{E}}\), and we write \({\widetilde{E}}^i(K,T,X)\) to mean \({\widetilde{E}}(K,(T,i),X)\). By convention we also write \({\widetilde{E}}^i_K(T,X)\) or \({\widetilde{E}}^{i,T}_K(X)\). The set \(\mathcal{I}\) is typically a small set used to generate a small number of distinct TBC instances in the scheme, something we call domain separation. For \(T'=(T,i)\in \mathcal{T}_I\), we call \(i\in \mathcal{I}\) the domain separation integer of tweak \(T'\).

Random Primitives. Let \(\mathcal{X}\), \(\mathcal{Y}\) and \(\mathcal{T}\) be non-empty finite sets. Let \( Func (\mathcal{X},\mathcal{Y})\) be the set of all functions from \(\mathcal{X}\) to \(\mathcal{Y}\), and let \( Perm (\mathcal{X})\) be the set of all permutations over \(\mathcal{X}\). Moreover, let \( Perm ^\mathcal{T}(\mathcal{X})\) be the set of all functions \(f:\mathcal{T}\times \mathcal{X}\rightarrow \mathcal{X}\) such that for any \(T\in \mathcal{T}\), \(f(T,\cdot )\) is a permutation over \(\mathcal{X}\).

A uniform random function (URF) with domain \(\mathcal{X}\) and range \(\mathcal{Y}\), denoted \( \textsf {R} :\mathcal{X}\rightarrow \mathcal{Y}\), is a random function with uniform distribution over \( Func (\mathcal{X},\mathcal{Y})\). Similarly, a uniform random permutation (URP) over \(\mathcal{X}\), denoted \( \textsf {P} :\mathcal{X}\rightarrow \mathcal{X}\), is a random permutation with uniform distribution over \( Perm (\mathcal{X})\). An n-bit URP is a URP over \(\{0,1\}^n\). Finally, a tweakable URP (TURP) with tweak space \(\mathcal{T}\) and message space \(\mathcal{X}\), denoted \(\widetilde{ \textsf {P} }:\mathcal{T}\times \mathcal{X}\rightarrow \mathcal{X}\), is a random tweakable permutation with uniform distribution over \( Perm ^\mathcal{T}(\mathcal{X})\).
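In security experiments, a TURP is commonly simulated by lazy sampling: an independent random permutation is built on demand for each tweak. The class below is a sketch of that idea for encryption queries only; the class name, the seeded `random.Random` generator, and the integer encoding of n-bit strings are assumptions made for illustration.

```python
import random


class LazyTURP:
    """Lazily sampled tweakable random permutation over n-bit integers."""

    def __init__(self, n: int, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.table = {}  # tweak -> {input: output}
        self.used = {}   # tweak -> set of outputs already assigned

    def query(self, tweak, x: int) -> int:
        table = self.table.setdefault(tweak, {})
        used = self.used.setdefault(tweak, set())
        if x not in table:
            # Rejection sampling: uniform among outputs unused under this tweak.
            y = self.rng.getrandbits(self.n)
            while y in used:
                y = self.rng.getrandbits(self.n)
            table[x] = y
            used.add(y)
        return table[x]
```

Repeated queries return consistent answers, and for each fixed tweak distinct inputs never collide, while different tweaks are sampled independently.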

Security Notions. We recall standard security notions for (tweakable) block ciphers and keyed functions.

Definition 1

Let \({\widetilde{E}}:\mathcal{K}\times \mathcal{T}\times \mathcal{X}\rightarrow \mathcal{X}\) be a TBC, and let \(\mathcal{A}\) be an adversary whose goal is to distinguish \({\widetilde{E}}_K\) from a TURP \(\widetilde{ \textsf {P} }:\mathcal{T}\times \mathcal{X}\rightarrow \mathcal{X}\) by oracle access. The advantage of \(\mathcal{A}\) against the Tweakable Pseudorandom Permutation-security (or TPRP-security) of \({\widetilde{E}}\) is defined as

$$\begin{aligned} \texttt {Adv}^{ \texttt {tprp}}_{{\widetilde{E}}}(\mathcal{A}) \mathop {=}\limits ^{\tiny {\text {def}}}\left| \Pr [K\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}: \mathcal{A}^{{\widetilde{E}}_K} \Rightarrow 1] - \Pr [\widetilde{ \textsf {P} }\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}} Perm ^\mathcal{T}(\mathcal{X}): \mathcal{A}^{\widetilde{ \textsf {P} }} \Rightarrow 1]\right| , \end{aligned}$$

where \(\mathcal{A}^{{\widetilde{E}}_K} \Rightarrow 1\) denotes the event that the final binary decision by \(\mathcal{A}\) is 1.

We remark that the above definition only allows \(\mathcal{A}\) to make encryption queries. If decryption queries are allowed, the corresponding notion is called Strong TPRP (or STPRP) security. In this paper, we only use TPRP-security for the TBC underlying our constructions. The standard PRP-security notion for conventional block ciphers is recovered by letting the tweak space \(\mathcal{T}\) be a singleton.

Definition 2

For \(F:\mathcal{K}\times \mathcal{X}\rightarrow \mathcal{Y}\), let \(\mathcal{A}\) be an adversary whose goal is to distinguish \(F_K\) and a URF \( \textsf {R} :\mathcal{X}\rightarrow \mathcal{Y}\) by oracle access. The advantage of \(\mathcal{A}\) against the PRF-security of F is defined as

$$\begin{aligned} \texttt {Adv}^{ \texttt {prf}}_{F}(\mathcal{A}) \mathop {=}\limits ^{\tiny {\text {def}}}\left| \Pr [K\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}: \mathcal{A}^{F_K} \Rightarrow 1] - \Pr [ \textsf {R} \mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}} Func (\mathcal{X},\mathcal{Y}): \mathcal{A}^{ \textsf {R} } \Rightarrow 1]\right| . \end{aligned}$$

Moreover, for any \(F:\mathcal{K}\times \mathcal{X}\rightarrow \mathcal{Y}\) and \(G:\mathcal{K}'\times \mathcal{X}\rightarrow \mathcal{Y}\), the advantage of \(\mathcal{A}\) in distinguishing F and G is defined as

$$\begin{aligned} \texttt {Adv}^{ \texttt {dist}}_{F,G}(\mathcal{A}) \mathop {=}\limits ^{\tiny {\text {def}}}\left| \Pr [K\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}: \mathcal{A}^{F_K} \Rightarrow 1] - \Pr [K'\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}': \mathcal{A}^{G_{K'}} \Rightarrow 1]\right| . \end{aligned}$$

When a cryptographic scheme (or a mode of operation) \(\textsf {Mode}\) uses a (T)BC of block-length n bits, the security bound (i.e., the best advantage for any adversary with fixed resources) is typically a function of the query complexity of the adversary (in terms of number q of queries or total number \(\sigma \) of queried blocks) and n. When this function reaches 1 for query complexity \(2^{n/2}\), we say that \(\textsf {Mode}\) is secure up to the birthday bound, since this typically arises from the birthday paradox on the block input of the (T)BC. Conversely, if the advantage is negligibly small for any adversary of query complexity \(2^{n/2}\), we say that \(\textsf {Mode}\) is secure beyond the birthday bound (BBB-secure).

3 Specification of \(\mathsf {ZMAC}\)

3.1 Overview

Let \({\widetilde{E}}:\mathcal{K}\times \mathcal{T}_I\times \{0,1\}^n\rightarrow \{0,1\}^n\) be a TBC with tweak space \(\mathcal{T}_I=\mathcal{T}\times \mathcal{I}\), where \(\mathcal{T}=\{0,1\}^t\) for some \(t>0\) and \(\mathcal{I}\supseteq \{0,1,\ldots ,9\}\). We present a construction of a PRF \(\mathsf {ZMAC} [\widetilde{E}]:\mathcal{K}\times \{0,1\}^*\rightarrow \{0,1\}^{2n}\) with variable-input-length and 2n-bit outputs based on \({\widetilde{E}}\).

The \(\mathsf {ZMAC}\) mode has the following properties, holding for any effective tweak size \(t>0\):

  1. it uses a single key for calls to \({\widetilde{E}}\);

  2. the calls to \({\widetilde{E}}\) are parallelizable;

  3. it processes on average \(n+t\) input bits per TBC call;

  4. it is provably secure as long as the total length \(\sigma \) of queries in \((n+t)\)-bit blocks is small compared with \(2^{\min \{n,(n+t)/2\}}\).

\(\mathsf {ZMAC}\) is more efficient than any previous TBC-based MAC, all of which process at most n bits per TBC call (e.g., when \(t=n\), \(\mathsf {ZMAC}\) is twice as fast as PMAC1). We emphasize that any mode based on an n-bit block, t-bit tweak TBC can process at most \(n+t\) input bits per TBC call, thus \(\mathsf {ZMAC}\)’s efficiency is essentially optimal if one wants to achieve any meaningful provable security, since otherwise there must be some part of the input which is not processed by the TBC.

Property 4 shows that the security of \(\mathsf {ZMAC}\) is beyond the birthday bound with respect to n. In particular, it is n-bit secure when \(t\ge n\). These properties demonstrate that \(\mathsf {ZMAC}\) is the first TBC-based MAC to fully use the power of the underlying TBC.
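To make the exponent in Property 4 concrete, note that \((n+t)/2\ge n\) exactly when \(t\ge n\). For \(n=128\) and a few illustrative tweak sizes (chosen here only as examples, not tied to a particular cipher), this gives

$$\begin{aligned} t=64:&\quad \min \{128,(128+64)/2\}=96, \\ t=128:&\quad \min \{128,(128+128)/2\}=128, \\ t=256:&\quad \min \{128,(128+256)/2\}=128. \end{aligned}$$

In each case the exponent exceeds the birthday exponent \(n/2=64\), and it saturates at n once \(t\ge n\).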

We specify \(\mathsf {ZMAC}\) with 2n-bit outputs, which will be useful for defining our BBB-secure DAE scheme in Sect. 5. However, if one simply wants an n-bit-secure MAC, one can truncate the output of \(\mathsf {ZMAC}\) to n bits (which saves two TBC calls in the finalization).

Design Rationale. The structure of \(\mathsf {ZMAC}\) has some similarities with previous BBB-secure TBC-based PRF constructions [Nai15, LN17]. However, there are several innovative features that make \(\mathsf {ZMAC}\) faster and n-bit secure.

The core idea of [Nai15, LN17] is to start from a TBC-based instantiation of PHASH, the UHF underlying PMAC  [Rog04]. PHASH is quite simple: it simply XORs together the encryptions \({\widetilde{E}}_K(i,M_i)\) of message blocks with the index i of the block as tweak. In order to obtain a 2n-bit output, some linear layer is applied to all encrypted blocks, as originally introduced by Yasuda [Yas11] in his PMAC_Plus block cipher-based PRF. This yields a 2n-bit message hash, to which some finalization function (a fixed-input-length PRF) is applied to obtain the final output.

Whereas the t-bit tweak in the previous schemes takes as input the index of each message block, we crucially use both the message space and the tweak space of the TBC to process \(n+t\) input bits in order to improve efficiency. The block index is incorporated via (a variant of) a tweak extension scheme called XTX [MI15], which allows the block index to be updated efficiently with only two field doublings, somewhat similarly to XEX [Rog04].

The above trick, however, is not enough to achieve BBB-security. Since we process each \((n+t)\)-bit input block by one call to an n-bit output TBC, the input block and the output block are no longer in one-to-one correspondence. Yet the BBB-security of previous schemes (where each input block is n-bit) crucially relies on this fact (otherwise, one can find a collision with complexity \(2^{n/2}\), resulting in n/2-bit security). Fortunately, this problem can be solved by processing each \((n+t)\)-bit input block with a Feistel-like permutation involving one TBC call, and applying the linear layer to the output of this \((n+t)\)-bit permutation.
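The following sketch illustrates why such a Feistel-like map is a permutation of the \((n+t)\)-bit input, for the case \(t\le n\) and with the masks omitted. The left half is encrypted under a tweak given by the right half, and the right half is then XORed with the first t bits of the ciphertext. The toy TBC used here (addition of a SHA-256-derived pad modulo \(2^n\)) is a hypothetical stand-in chosen only because it is invertible for every tweak; it offers no security.

```python
import hashlib

N, T = 128, 64  # illustrative sizes with t <= n


def _pad(key: bytes, tweak: int) -> int:
    """Key- and tweak-dependent n-bit value (toy stand-in, not secure)."""
    digest = hashlib.sha256(key + tweak.to_bytes(T // 8, "big")).digest()
    return int.from_bytes(digest[: N // 8], "big")


def toy_enc(key: bytes, tweak: int, x: int) -> int:
    return (x + _pad(key, tweak)) % (1 << N)


def toy_dec(key: bytes, tweak: int, y: int) -> int:
    return (y - _pad(key, tweak)) % (1 << N)


def feistel_like(key: bytes, x_l: int, x_r: int):
    """Map (X_l, X_r) to (C_l, C_r) with a single TBC call."""
    c_l = toy_enc(key, x_r, x_l)          # right half acts as the tweak
    c_r = (c_l >> (N - T)) ^ x_r          # msb_t(C_l) xor X_r
    return c_l, c_r


def feistel_like_inverse(key: bytes, c_l: int, c_r: int):
    """Recover (X_l, X_r): first undo the XOR, then decrypt under that tweak."""
    x_r = (c_l >> (N - T)) ^ c_r
    x_l = toy_dec(key, x_r, c_l)
    return x_l, x_r
```

The inverse is never needed when computing the hash; it is shown only to make the one-to-one correspondence between input and output blocks explicit.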

High-Level Structure of ZMAC. \(\mathsf {ZMAC}\) consists of a hashing part

$$ \mathsf {ZHASH} [{\widetilde{E}}]:\mathcal{K}\times (\{0,1\}^{n+t})^+\rightarrow \{0,1\}^{n+t} $$

and a finalization part

$$ \mathsf {ZFIN} [{\widetilde{E}}]:\mathcal{K}\times \{0,1\}^{n+t}\rightarrow \{0,1\}^{2n}. $$

Then, \(\mathsf {ZMAC} \) is defined as the composition of \(\mathsf {ZHASH} \) and \(\mathsf {ZFIN} \). When the input-length is not a positive multiple of \((n+t)\) bits, one-zero padding (into \((n+t)\)-bit blocks) is applied first. To separate inputs whose length is a positive multiple of \((n+t)\) bits or not, we use distinct domain separation integers in \(\mathsf {ZFIN} \).

The pseudocode for \(\mathsf {ZHASH}\), \(\mathsf {ZFIN}\), and \(\mathsf {ZMAC}\) is shown in Fig. 1, and Figs. 2 and 3 illustrate \(\mathsf {ZHASH}\) and \(\mathsf {ZFIN}\). Fig. 1 gives a unified specification that covers both cases \(t\le n\) and \(t>n\) (note that the only operation which differs in the two cases is the \(\oplus _t\) operation). We describe \(\mathsf {ZHASH} \) more informally, separately for \(t\le n\) and \(t>n\), as well as \(\mathsf {ZFIN} \), in the following sections.

Fig. 1. Specification of \(\mathsf {ZMAC}\).

3.2 Specification of \(\mathsf {ZHASH} \) for the Case \(t \le n\)

We first define \(\mathsf {ZHASH} [{\widetilde{E}}]\) when \(t\le n\). For simplicity, we assume \(n+t\) is even. Before processing the input, \(\mathsf {ZHASH} [{\widetilde{E}}]\) computes two n-bit initial mask values \(L_{\ell }={\widetilde{E}}^9_K(0^t,0^n)\) and \(L_r={\widetilde{E}}^9_K(0^{t-1}1,0^n)\).

Given input \(X\in (\{0,1\}^{n+t})^+\), \(\mathsf {ZHASH} [{\widetilde{E}}]\) parses X into \((n+t)\)-bit blocks \((X[1],\dots ,X[m])\), parses each block X[i] as \(X_{\ell }[i] = \texttt {msb}_n(X[i])\) and \(X_{r}[i] = \texttt {lsb}_{t}(X[i])\), and computes, for \(i=1\) to m,

$$\begin{aligned} C_{\ell }[i]&= {\widetilde{E}}^8_K(2^{i-1}L_r \oplus _t X_{r}[i],2^{i-1}L_{\ell } \oplus X_{\ell }[i]), \end{aligned}$$
(1)
$$\begin{aligned} C_{r}[i]&= C_{\ell }[i]\oplus _t X_{r}[i] . \end{aligned}$$
(2)

Then \(\mathsf {ZHASH} [{\widetilde{E}}]\) computes two chaining values, \(U\in \{0,1\}^n\) and \(V\in \{0,1\}^t\) defined as

$$\begin{aligned} U&= \bigoplus _{i=1}^m 2^{m-i+1}C_{\ell }[i], \\ V&= \bigoplus _{i=1}^m C_{r}[i]. \end{aligned}$$

The final output is \((U,V)\).

As shown in Fig. 1, the field doublings are computed in an incremental manner. Specifically, \(\mathsf {ZHASH} [{\widetilde{E}}]\) needs one call to \({\widetilde{E}}\) and three \( GF (2^n)\) doublings to process an \((n+t)\)-bit block, plus two pre-processing calls to \({\widetilde{E}}\). Obviously, the calls to \({\widetilde{E}}\) are parallelizable.
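The incremental computation described above can be sketched in Python for \(n=128\) and \(t\le n\). This is an illustration under stated assumptions rather than a reference implementation: `toy_tbc` is a hypothetical, insecure stand-in for \({\widetilde{E}}\) (a permutation of the block for every key, domain integer and tweak), the sizes are example values, and each input block is passed as a pair of integers \((X_{\ell }[i],X_r[i])\).

```python
import hashlib

N, T = 128, 64               # example sizes with t <= n
MASK_N = (1 << N) - 1


def double(a: int) -> int:
    """Doubling in GF(2^128) with x^128 + x^7 + x^2 + x + 1."""
    shifted = (a << 1) & MASK_N
    return shifted ^ 0x87 if a >> (N - 1) else shifted


def xor_t(x: int, y: int) -> int:
    """X xor_t Y for t <= n: msb_t(X) xor Y."""
    return (x >> (N - T)) ^ y


def toy_tbc(key: bytes, i: int, tweak: int, block: int) -> int:
    """Hypothetical stand-in for E^i_K(tweak, block); NOT a secure TBC."""
    material = key + bytes([i]) + tweak.to_bytes(T // 8, "big")
    pad = int.from_bytes(hashlib.sha256(material).digest()[: N // 8], "big")
    return (block + pad) & MASK_N


def zhash(key: bytes, blocks):
    """blocks: list of (X_l, X_r) with X_l an n-bit and X_r a t-bit integer."""
    l_l = toy_tbc(key, 9, 0, 0)          # L_l = E^9_K(0^t, 0^n)
    l_r = toy_tbc(key, 9, 1, 0)          # L_r = E^9_K(0^{t-1}1, 0^n)
    u, v = 0, 0
    for x_l, x_r in blocks:
        c_l = toy_tbc(key, 8, xor_t(l_r, x_r), l_l ^ x_l)   # Eq. (1)
        c_r = xor_t(c_l, x_r)                               # Eq. (2)
        u = double(u ^ c_l)              # after m blocks: sum of 2^{m-i+1} C_l[i]
        v ^= c_r
        l_l, l_r = double(l_l), double(l_r)   # masks for the next block index
    return u, v
```

Each iteration uses one TBC call and three doublings (two for the masks, one for U), in line with the count given above. In a parallel implementation the masks \(2^{i-1}L_{\ell }\) and \(2^{i-1}L_r\) would be derived per block index rather than sequentially.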

Fig. 2. The \(\mathsf {ZHASH}\) hash function.

3.3 Specification of \(\mathsf {ZHASH} \) for the Case \(t > n\)

The hashing scheme \(\mathsf {ZHASH} [{\widetilde{E}}]\) for the case \(t > n\) is defined as follows (the two internal masks \(L_{\ell }\) and \(L_r\) are derived and incremented in the same way as in the case \(t\le n\)).

  • The input X is parsed into \((n+t)\)-bit blocks as in the case \(t\le n\), and each block is further parsed into n, n, and \(t-n\) bit-blocks;

  • The first and second n-bit sub-blocks are processed in the same way as in the case \(t=n\). The third \((t-n)\)-bit sub-block is directly fed to the tweak input of the TBC as the last \((t-n)\) bits of effective tweak;

  • The output consists of two checksums, \(U\in \{0,1\}^n\) and \(V\in \{0,1\}^t\), where \((U,\texttt {msb}_n(V))\) corresponds to the output for the case \(t= n\), and \(\texttt {lsb}_{t-n}(V)\) corresponds to the sum of all third \((t-n)\)-bit sub-blocks.

Hence, the computation of V is just written as the sum of all \(C_r\) blocks in the unified specification of Fig. 1, since the last \((t-n)\) bits of \(C_r[i]\) contain only the last \((t-n)\) bits of the input block X[i].

3.4 Finalization

The finalization function, denoted by \(\mathsf {ZFIN} [{\widetilde{E}}]\), takes the output of \(\mathsf {ZHASH} [{\widetilde{E}}]\), \((U,V)\in \{0,1\}^n\times \{0,1\}^t\), and generates a 2n-bit output. It is defined as

$$ \mathsf {ZFIN} [{\widetilde{E}}_K](i,U,V) = ({\widetilde{E}}_K^{i}(U,V)\oplus {\widetilde{E}}_K^{i+1}(U,V)\,\Vert \,{\widetilde{E}}_K^{i+2}(U,V)\oplus {\widetilde{E}}_K^{i+3}(U,V)), $$

where the first argument i is a non-negative integer used for domain separation. Note that if \(|i-j|\ge 4\), domain separation integers used for TBC calls in \(\mathsf {ZFIN} [{\widetilde{E}}_K](i,\cdot ,\cdot )\) and in \(\mathsf {ZFIN} [{\widetilde{E}}_K](j,\cdot ,\cdot )\) are distinct. We use \(i=0\) when no padding is applied, i.e., when \(M\in (\{0,1\}^{n+t})^+\), and \(i=4\) otherwise.
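The finalization and the choice of domain separation integer can be sketched as follows. The TBC is passed in as a callable `tbc(i, tweak, block)` with the key fixed inside, which is a hypothetical interface introduced for this example. The formula above writes each call as \({\widetilde{E}}_K^{i}(U,V)\); the sketch assumes that the t-bit value V plays the role of the tweak and the n-bit value U the role of the block, so that the sizes match the TBC interface. That reading is an assumption of this example.

```python
from typing import Callable

# Hypothetical interface: tbc(i, tweak, block) -> n-bit integer.
TBC = Callable[[int, int, int], int]


def zfin(tbc: TBC, i: int, u: int, v: int, n: int) -> int:
    """Return the 2n-bit output as an integer (first half in the high bits)."""
    left = tbc(i, v, u) ^ tbc(i + 1, v, u)
    right = tbc(i + 2, v, u) ^ tbc(i + 3, v, u)
    return (left << n) | right


def domain_for(message_bits: int, n: int, t: int) -> int:
    """i = 0 if the message needs no padding, i = 4 otherwise."""
    unpadded = message_bits > 0 and message_bits % (n + t) == 0
    return 0 if unpadded else 4
```

With \(i\in \{0,4\}\), the four calls use domain integers 0 to 3 or 4 to 7, which never overlap with each other or with the integers 8 and 9 used in \(\mathsf {ZHASH}\).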

Fig. 3. The \(\mathsf {ZFIN}\) finalization function.

We remark that \(\mathsf {ZFIN} \) is close but not identical to finalization functions used in previous works [Nai15, LN17]. For example, Naito [Nai15] employed \({\widetilde{E}}_K^{i}(U,V)\oplus {\widetilde{E}}_K^{i+1}(V,U)\) for building a PRF with n-bit outputs. One potential advantage of \(\mathsf {ZFIN} \) over using two independent instances of Naito’s construction is that \(\mathsf {ZFIN} \) can be faster if the algorithm of \({\widetilde{E}}\) can exploit the similarity of the inputs when computing \({\widetilde{E}}_K^{i}(U,V)\) and \({\widetilde{E}}_K^{i+1}(U,V)\).

4 The PRF Security of \(\mathsf {ZMAC}\)

4.1 XT Tweak Extension

Our first step is to recast the use of masks \(2^{i-1}L_{\ell }\) and \(2^{i-1}L_r\) as a way to extend the tweak space of \({\widetilde{E}}\). More specifically, we observe that the “core” construction of \(\mathsf {ZHASH}\) in Eq. (1),

$$\begin{aligned} ((T,i),X) \mapsto \widetilde{E}^8_K(2^{i-1}L_r \oplus _t T,2^{i-1}L_{\ell } \oplus X), \end{aligned}$$
(3)

keyed by \((K,(L_{\ell },L_r))\), is an instantiation of a CPA-secure variant of a tweak extension scheme called XTX proposed in [MI15], which extends the tweak space of \(\widetilde{E}^8\) from \(\mathcal{T}=\{0,1\}^t\) to \(\mathcal{T}_J=\mathcal{T}\times \mathcal{J}\) with \(\mathcal{J}=\{1,\dots ,2^n-1\}\). Following the naming convention for XE and XEX by Rogaway [Rog04], which defines CPA- and CCA-secure TBCs based on a block cipher, we use XT to denote the CPA-secure variant of XTX without output mask.

In order to describe the XT construction, we need the notion of partial AXU hash function introduced by [MI15].

Definition 3

Let \(H:\mathcal{L}\times \mathcal{X}\rightarrow \mathcal{Y}\) be a keyed function with key space \(\mathcal{L}\), domain \(\mathcal{X}\), and range \(\mathcal{Y}=\{0,1\}^n\times \{0,1\}^t\). We say that H is \((n,t,\epsilon )\)-partial almost-XOR-universal (\((n,t,\epsilon )\)-pAXU) if for any \(X\ne X'\), one has

$$\begin{aligned} \max _{\delta \in \{0,1\}^n}\Pr [L\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{L}\,:\, H_L(X)\oplus H_L(X')=(\delta ,0^t)]\le \epsilon . \end{aligned}$$

We now define the \(\textsf {XT} \) tweak extension scheme. Let \(\widetilde{E}:\mathcal{K}\times \mathcal{T}\times \{0,1\}^n \rightarrow \{0,1\}^n\) be a TBC with tweak space \(\mathcal{T}=\{0,1\}^t\) and let \(H:\mathcal{L}\times \mathcal{T}' \rightarrow \mathcal{Y}\) be a keyed function with range \(\mathcal{Y}=\{0,1\}^n \times \{0,1\}^t\). Let \( \textsf {XT} [\widetilde{E},H]\) be the TBC with key space \(\mathcal{K}\times \mathcal{L}\), tweak space \(\mathcal{T}'\), and message space \(\{0,1\}^n\) defined as

$$\begin{aligned} \textsf {XT} [\widetilde{E},H]_{K,L}(T',X) = \widetilde{E}_K(Z_r,Z_{\ell } \oplus X) \text { where } H_L(T') = (Z_{\ell },Z_r). \end{aligned}$$
(4)

The following lemma characterizes the security of \( \textsf {XT} [\widetilde{ \textsf {P} },H]\) where \(\widetilde{E}\) is replaced by a TURP \(\widetilde{ \textsf {P} }\). It is similar to [MI15, Theorem 1] and its proof is deferred to the full version of the paper.

Lemma 1

Let \( \textsf {XT} [\widetilde{ \textsf {P} },H]\) be defined as above, where \(\widetilde{ \textsf {P} }:\mathcal{T}\times \{0,1\}^n \rightarrow \{0,1\}^n\) is a TURP and H is \((n,t,\epsilon )\)-pAXU. Then, for any adversary \(\mathcal{A}\) making at most q queries, one has

$$ \texttt {Adv}^{ \texttt {tprp}}_{ \textsf {XT} [\widetilde{ \textsf {P} },H]}(\mathcal{A}) \le \frac{q^2 \epsilon }{2}. $$

Assume for a moment that \(L_{\ell }={\widetilde{E}}^9_K(0^t,0^n)\) and \(L_r={\widetilde{E}}^9_K(0^{t-1}1,0^n)\) are uniformly random (this will hold once the TBC underlying \(\mathsf {ZMAC}\) has been replaced by a TURP later in the security proof). Consider the function H with key space \(\{0,1\}^n \times \{0,1\}^n\), domain \(\mathcal{T}_J=\mathcal{T}\times \mathcal{J}\) with \(\mathcal{J}=\{1,\dots ,2^n-1\}\), and range \(\{0,1\}^n\times \{0,1\}^t\) defined as

$$\begin{aligned} H_{(L_{\ell },L_r)}(T,i)=(2^{i-1}L_{\ell },2^{i-1}L_r \oplus _t T). \end{aligned}$$
(5)

Then observe that the construction of Eq. (3) is exactly \( \textsf {XT} [\widetilde{E}^8,H]\) with H defined as above. We prove that H is pAXU in the following lemma.

Lemma 2

Let H be defined as in Eq. (5). Then H is \((n,t,1/2^{n+\min \{n,t\}})\)-pAXU.

Proof

Assume first that \(t\le n\). Then, by definition of \(\oplus _t\), one has

$$ H_{(L_{\ell },L_r)}(T,i)=(2^{i-1}L_{\ell },\texttt {msb}_t(2^{i-1}L_r) \oplus T). $$

Hence, we must upper bound

$$ p\mathop {=}\limits ^{\tiny {\text {def}}}\Pr _{(L_{\ell },L_r)}\left[ \Big ((2^{i-1}+2^{j-1})L_{\ell },\texttt {msb}_t((2^{i-1}+2^{j-1})L_r)\oplus T\oplus T'\Big ) = (\delta ,0^t)\right] $$

for any distinct inputs \((T,i),(T',j)\in \mathcal{T}_J\) and any \(\delta \in \{0,1\}^n\).

If \(i=j\), then necessarily \(T\ne T'\), and hence

$$ \texttt {msb}_t((2^{i-1}+2^{j-1})L_r)\oplus T\oplus T' = T\oplus T' \ne 0^t. $$

Thus the probability p is zero.

If \(i\ne j\), then \(2^{i-1}\ne 2^{j-1}\). Therefore, \(2^{i-1}+2^{j-1}\) is a non-zero element over \( GF (2^n)\) and thus

$$\begin{aligned} p&=\Pr _{(L_{\ell },L_r)}[(2^{i-1}+2^{j-1})L_{\ell } = \delta ,\, \texttt {msb}_t((2^{i-1}+2^{j-1})L_r)\oplus T\oplus T' = 0^t] \\&= \Pr _{(L_{\ell },L_r)}[(2^{i-1}+2^{j-1})L_{\ell } = \delta ,\, \texttt {msb}_t((2^{i-1}+2^{j-1})L_r) = T\oplus T' ]\\&= \frac{1}{2^{n}}\cdot \frac{1}{2^{t}} = \frac{1}{2^{n+t}}. \end{aligned}$$

For the case \(t>n\), observe that by definition of \(\oplus _t\),

$$ H_{(L_{\ell },L_r)}(T,i) = (2^{i-1}L_{\ell },(2^{i-1}L_r\,\Vert \,0^{t-n})\oplus T). $$

Hence, we can use the previous analysis for the special case \(t=n\), so that p is at most \(1/2^{2n}\). In all cases, p is at most \(1/2^{n+\min \{n,t\}}\).    \(\square \)
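Lemma 2 can be sanity-checked by exhaustive search on a toy field. The sketch below assumes \(n=3\) and the irreducible polynomial \(x^3+x+1\) (both chosen only to keep the search small), enumerates all keys \((L_{\ell },L_r)\), and returns the largest probability of the event \(H(X)\oplus H(X')=(\delta ,0^t)\) over distinct inputs and all \(\delta \). The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

N = 3          # toy block length n; the field is GF(2^3)
POLY = 0b1011  # x^3 + x + 1, irreducible (toy choice, an assumption)

def gf_mul(a, b):
    """Multiply a and b in GF(2^N) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> N:
            a ^= POLY
        b >>= 1
    return r

def gf_pow2(e):
    """Return the field element 2^e."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, 2)
    return r

def xor_t(x, y, t):
    """x (+)_t y: keep the t msbs of x if t <= N, else right-pad x with zeros."""
    x_t = x >> (N - t) if t <= N else x << (t - N)
    return x_t ^ y

def H(l_left, l_right, tweak, i, t):
    """Eq. (5): H(T, i) = (2^{i-1} L_l, 2^{i-1} L_r (+)_t T)."""
    c = gf_pow2(i - 1)
    return gf_mul(c, l_left), xor_t(gf_mul(c, l_right), tweak, t)

def max_paxu_probability(t):
    """Max over distinct inputs and delta of Pr[H(X) xor H(X') = (delta, 0^t)]."""
    inputs = list(product(range(2 ** t), range(1, 2 ** N)))
    keys = list(product(range(2 ** N), repeat=2))
    # table[k][x] = H_k(x), computed once per key
    table = [[H(l, r, tw, i, t) for (tw, i) in inputs] for (l, r) in keys]
    worst = 0
    for a in range(len(inputs)):
        for b in range(a + 1, len(inputs)):
            counts = {}
            for row in table:
                (u1, v1), (u2, v2) = row[a], row[b]
                if v1 == v2:  # second component of the difference is 0^t
                    d = u1 ^ u2
                    counts[d] = counts.get(d, 0) + 1
            worst = max(worst, max(counts.values(), default=0))
    return Fraction(worst, len(keys))
```

With these toy parameters the search matches the bound \(1/2^{n+\min \{n,t\}}\) for \(t=2\), \(t=3\), and \(t=4\).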

Combining Lemmas 1 and 2, we obtain the following for the construction of Eq. (3) when \(\widetilde{E}^8_K\) is replaced by a TURP.

Lemma 3

Let \( \textsf {XT} [\widetilde{ \textsf {P} },H]\) be defined as in Eq. (4) where \(\widetilde{ \textsf {P} }:\mathcal{T}\times \{0,1\}^n \rightarrow \{0,1\}^n\) is a TURP and H is defined as in Eq. (5). Then, for any adversary making at most q queries,

$$ \texttt {Adv}^{ \texttt {tprp}}_{ \textsf {XT} [\widetilde{ \textsf {P} },H]}(\mathcal{A}) \le \frac{q^2}{2^{n+\min \{n,t\}+1}}. $$

4.2 Collision Probability of \(\mathsf {ZHASH}\)

Let \({\widetilde{E}}':\mathcal{K}'\times \mathcal{T}_J\times \{0,1\}^n \rightarrow \{0,1\}^n\) be a TBC with tweak space \(\mathcal{T}_J=\mathcal{T}\times \mathcal{J}\) where \(\mathcal{T}=\{0,1\}^t\) and \(\mathcal{J}=\{1,\dots , 2^{n}-1\}\) as before. We define \(\mathbb {ZHASH}[{\widetilde{E}}']\) as shown in Fig. 4 and depicted in Fig. 5. Note that, assuming that masking keys \(L_{\ell }\) and \(L_r\) are uniformly random rather than derived through \(\widetilde{E}_K^9\), \(\mathsf {ZHASH} [\widetilde{E}]\) is exactly \(\mathbb {ZHASH}[ \textsf {XT} [\widetilde{E}^8,H]]\), with H defined as in Eq. (5).
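As a reading aid, the following Python sketch reconstructs \(\mathbb {ZHASH}\) from the relations used in the proof of Lemma 4 below, namely \(U=\bigoplus _i 2^{m-i+1}C_{\ell }[i]\), \(V=\bigoplus _i C_r[i]\), and \(C_r[i]=C_{\ell }[i]\oplus _t X_r[i]\), with tweak \((X_r[i],i)\). It is a sketch under assumptions, not the pseudocode of Fig. 4: `ToyTBC` is a lazily sampled random tweakable permutation, \(n=128\) is fixed, and the doubling uses the polynomial \(x^{128}+x^7+x^2+x+1\), which is assumed here.

```python
import random

N = 128                # block length n (toy choice)
MASK = (1 << N) - 1

class ToyTBC:
    """Lazily sampled random tweakable permutation on n-bit blocks (toy stand-in)."""

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = random.Random(seed)
        self.tables = {}  # tweak -> {input block: output block}

    def enc(self, tweak, x):
        table = self.tables.setdefault(tweak, {})
        if x not in table:
            used = set(table.values())
            y = self.rng.getrandbits(self.n)
            while y in used:
                y = self.rng.getrandbits(self.n)
            table[x] = y
        return table[x]

def gf_double(a):
    """Multiply by 2 in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1 (assumed)."""
    a <<= 1
    if a >> N:
        a = (a & MASK) ^ 0x87
    return a

def xor_t(x, y, t):
    """x (+)_t y for an n-bit x and a t-bit y."""
    x_t = x >> (N - t) if t <= N else x << (t - N)
    return x_t ^ y

def zhash(tbc, blocks, t):
    """blocks: list of (X_l, X_r) pairs, X_l an n-bit int and X_r a t-bit int."""
    u, v = 0, 0
    for i, (x_l, x_r) in enumerate(blocks, start=1):
        c_l = tbc.enc((x_r, i), x_l)  # tweak (X_r, i), block X_l
        c_r = xor_t(c_l, x_r, t)      # C_r = C_l (+)_t X_r
        u = gf_double(u ^ c_l)        # U <- 2 (U xor C_l)
        v ^= c_r                      # V <- V xor C_r
    return u, v
```

Because each iteration doubles the accumulator after absorbing \(C_{\ell }[i]\), the i-th block ends up weighted by \(2^{m-i+1}\), which is the form used throughout the case analysis.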

Let \(\widetilde{ \textsf {P} }_J:\mathcal{T}_J\times \{0,1\}^n\rightarrow \{0,1\}^n\) be a TURP. The following lemma plays a central role in our security proof.

Lemma 4

For any \(m,m'\le 2^{\min \{n,(n+t)/2\}}\), we have

$$ \textsf {Coll} _{\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J]}(n+t,m,m') \le \frac{4}{2^{n+\min \{n,t\}}}. $$

Proof

Without loss of generality, we assume \(m\le m'\). Let \(X=(X[1],\dots ,X[m])\) and \(X'=(X'[1],\dots ,X'[m'])\) be two distinct messages of \((n+t)\)-bit blocks. Let \((U,V)=\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J](X)\) and \((U',V')=\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J](X')\) be the outputs. We define \(X_{r}[i]\), \(X_{\ell }[i]\), \(C_{\ell }[i]\), and \(C_{r}[i]\) following Fig. 4 augmented with the loop index i. Let \(\varDelta U=U\oplus U'\), \(\varDelta V = V\oplus V'\), etc. A collision of \(\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J]\) outputs is equivalent to \((\varDelta U,\varDelta V)=(0^n,0^t)\).

We perform a case analysis. We first focus on the case \(t\le n\), and consider four sub-cases.

Fig. 4. Pseudocode for the \(\mathbb {ZHASH}\) construction using \({\widetilde{E}}':\mathcal{K}'\times \mathcal{T}_J\times \{0,1\}^n\rightarrow \{0,1\}^n\) with \(\mathcal{T}_J=\{0,1\}^t\times \{1,2,\dots ,2^n-1\}\).

Fig. 5. The \(\mathbb {ZHASH}\) hash function.

Case 1:

\(m=m'\), \(\exists h\in \{1,\dots ,m\}\), \(X[h] \ne X'[h]\), \(X[i]=X'[i]\) for all \(i\ne h\). Then we have

$$\begin{aligned} \varDelta U&= \bigoplus _{1\le i\le m}2^{m-i+1}\varDelta C_{\ell }[i] = 2^{m-h+1}\varDelta C_{\ell }[h],\\ \varDelta V&= \bigoplus _{1\le j\le m}\varDelta C_{r}[j] = \varDelta C_{r}[h]. \end{aligned}$$

Since the mapping \((X_{\ell }[i],X_{r}[i])\mapsto (C_{\ell }[i],C_{r}[i])\) is a permutation, we have \((C_{\ell }[h],C_{r}[h])\ne (C_{\ell }'[h],C_{r}'[h])\) and thus we have either \(\varDelta C_{\ell }[h]\ne 0^n\) or \(\varDelta C_{r}[h]\ne 0^t\). This implies \(\varDelta U\ne 0^n\) or \(\varDelta V\ne 0^t\).

Case 2:

\(m=m'\), \(\exists h,s\in \{1,\dots ,m\}\), \(h\ne s\), \(X[h] \ne X'[h]\), \(X[s] \ne X'[s]\). Then we have

$$\begin{aligned} \varDelta U&= 2^{m-h+1}\varDelta C_{\ell }[h] \oplus 2^{m-s+1}\varDelta C_{\ell }[s] \oplus \underbrace{\bigoplus _{\begin{array}{c} 1\le i\le m\\ i\ne h,s \end{array}}2^{m-i+1}\varDelta C_{\ell }[i]}_{\varDelta _1}, \\ \varDelta V&= \varDelta C_{r}[h] \oplus \varDelta C_{r}[s] \oplus \underbrace{\bigoplus _{\begin{array}{c} 1\le i\le m\\ i\ne h,s \end{array}}\varDelta C_{r}[i]}_{\varDelta _2}. \end{aligned}$$

Observe that \(\varDelta _1\) and \(\varDelta _2\) are functions of variables of the form \(\widetilde{ \textsf {P} }_J((T,i),X'')\) where \(i\notin \{h,s\}\) and T and \(X''\) are determined by X and \(X'\). In particular, by definition of a TURP, they are independent (as random variables) from the other terms in the two right-hand sides. Hence, letting \(\lambda _h=2^{m-h+1}\) and \(\lambda _s=2^{m-s+1}\), and using that since \(t\le n\), \(C_r[i]=\texttt {msb}_t(C_{\ell }[i])\oplus X_r[i]\), we have

$$\begin{aligned} \left\{ \begin{array}{l} \varDelta U=0^n\\ \varDelta V=0^t \end{array} \right.&\Longleftrightarrow \left\{ \begin{array}{l} {\lambda _h}\varDelta C_{\ell }[h] \oplus {\lambda _s}\varDelta C_{\ell }[s] = \varDelta _1\\ \varDelta C_{r}[h] \oplus \varDelta C_{r}[s] = \varDelta _2 \end{array} \right. \\&\Longleftrightarrow \left\{ \begin{array}{l} {\lambda _h}\varDelta C_{\ell }[h] \oplus {\lambda _s}\varDelta C_{\ell }[s] = \varDelta _1\\ \texttt {msb}_t(\varDelta C_{\ell }[h])\oplus \varDelta X_{r}[h] \oplus \texttt {msb}_t(\varDelta C_{\ell }[s])\oplus \varDelta X_{r}[s] = \varDelta _2 \end{array} \right. \\&\Longleftrightarrow \left\{ \begin{array}{l} {\lambda _h}\varDelta C_{\ell }[h] \oplus {\lambda _s}\varDelta C_{\ell }[s] = \varDelta _1\\ \texttt {msb}_t(\varDelta C_{\ell }[h] \oplus \varDelta C_{\ell }[s]) = \varDelta _2 \oplus \varDelta X_{r}[h] \oplus \varDelta X_{r}[s]. \end{array} \right. \end{aligned}$$

Hence, it follows that

$$\begin{aligned} \Pr \left[ \begin{array}{l} \varDelta U=0^n\\ \varDelta V=0^t \end{array} \right]&\le \max _{\begin{array}{c} \delta _1\in \{0,1\}^n\\ \delta _2\in \{0,1\}^t \end{array}} \Pr \left[ \begin{array}{l} {\lambda _h}\varDelta C_{\ell }[h] \oplus {\lambda _s}\varDelta C_{\ell }[s] = \delta _1\\ \texttt {msb}_t(\varDelta C_{\ell }[h]\oplus \varDelta C_{\ell }[s]) = \delta _2 \end{array} \right] \\&\le \max _{\begin{array}{c} \delta _1\in \{0,1\}^n\\ \delta _2\in \{0,1\}^t \end{array}} \sum _{\begin{array}{c} \delta _3\in \{0,1\}^{n}\\ \texttt {msb}_t(\delta _3)=\delta _2 \end{array}} \Pr \left[ \begin{array}{l} {\lambda _h}\varDelta C_{\ell }[h] \oplus {\lambda _s}\varDelta C_{\ell }[s] = \delta _1\\ \varDelta C_{\ell }[h]\oplus \varDelta C_{\ell }[s] = \delta _3 \end{array} \right] . \end{aligned}$$

Observe that since \(h\ne s\), \(\lambda _h \oplus \lambda _s \ne 0\) and the linear system inside the last probability above has a unique solution for any pair \((\delta _1,\delta _3)\), namely

$$\begin{aligned} \varDelta C_{\ell }[h]&= ({\lambda _s}\delta _3 \oplus \delta _1)/({\lambda _h}\oplus {\lambda _s})\\ \varDelta C_{\ell }[s]&= \delta _3 \oplus ({\lambda _s}\delta _3 \oplus \delta _1)/({\lambda _h}\oplus {\lambda _s}). \end{aligned}$$

Moreover, the random variables \(\varDelta C_{\ell }[h]\) and \(\varDelta C_{\ell }[s]\) are independent (as they involve distinct tweaks) and their probability distributions are uniform over either \(\{0,1\}^n\) or \(\{0,1\}^n\setminus \{0^n\}\), implying that their point probabilities are at most \(1/(2^{n}-1)\). Hence,

$$\begin{aligned} \Pr \left[ \begin{array}{l} \varDelta U=0^n\\ \varDelta V=0^t \end{array} \right]&\le \max _{\begin{array}{c} \delta _1\in \{0,1\}^n\\ \delta _2\in \{0,1\}^t \end{array}} \sum _{\begin{array}{c} \delta _3\in \{0,1\}^{n}\\ \texttt {msb}_t(\delta _3)=\delta _2 \end{array}} \frac{1}{(2^n-1)^2}\\&\le \frac{2^{n-t}}{(2^{n}-1)^2} \le \frac{4\cdot 2^{n-t}}{2^{2n}} \le \frac{4}{2^{n+t}}. \end{aligned}$$

Case 3:

\(m'= m+1\). Then, isolating the terms corresponding to block indices m and \(m+1\), we have

$$\begin{aligned} \varDelta U&= \bigoplus _{1\le i \le m} 2^{m-i+1} C_{\ell }[i] \oplus \bigoplus _{1\le i \le m+1} 2^{m+1-i+1} C'_{\ell }[i]\\&= 2 (C_{\ell }[m] + 2 C'_{\ell }[m] + C'_{\ell }[m+1] \oplus \varDelta _1) \end{aligned}$$

and

$$\begin{aligned} \varDelta V&= \bigoplus _{1\le i\le m} C_r[i] \oplus \bigoplus _{1\le i \le m+1} C'_r[i] \\&= \texttt {msb}_t(C_{\ell }[m] + C'_{\ell }[m] + C'_{\ell }[m+1]) \oplus \varDelta _2, \end{aligned}$$

where \(\varDelta _1\) and \(\varDelta _2\) are independent (as random variables) from \(C_{\ell }[m]\), \(C'_{\ell }[m]\), and \(C'_{\ell }[m+1]\). Hence, exactly as for Case 2, the probability that \(\varDelta U=0^n\) and \(\varDelta V=0^t\) is at most

$$ \max _{\begin{array}{c} \delta _1\in \{0,1\}^n\\ \delta _2\in \{0,1\}^t \end{array}} \sum _{\begin{array}{c} \delta _3\in \{0,1\}^{n}\\ \texttt {msb}_t(\delta _3)=\delta _2 \end{array}} \Pr \left[ \begin{array}{l} C_{\ell }[m] + 2 C'_{\ell }[m] + C'_{\ell }[m+1] = \delta _1\\ C_{\ell }[m] + C'_{\ell }[m] + C'_{\ell }[m+1] = \delta _3 \end{array} \right] . $$

Letting \(Y=C_{\ell }[m]+C'_{\ell }[m+1]\) and \(Z=C'_{\ell }[m]\), the linear system in the probability above becomes

$$ \left\{ \begin{array}{l} Y + 2Z = \delta _1\\ Y+Z = \delta _3, \end{array} \right. $$

which has a unique solution over \( GF (2^n)\) for any pair \((\delta _1,\delta _3)\). Note that Y and Z are uniformly random and independent (since Y involves domain separation integer \(m+1\) but Z does not) and hence, the system is satisfied with probability \(1/2^{2n}\). Therefore,

$$ \Pr \left[ \begin{array}{l} \varDelta U=0^n\\ \varDelta V=0^t \end{array} \right] \le \max _{\begin{array}{c} \delta _1\in \{0,1\}^n\\ \delta _2\in \{0,1\}^t \end{array}} \sum _{\begin{array}{c} \delta _3\in \{0,1\}^{n}\\ \texttt {msb}_t(\delta _3)=\delta _2 \end{array}} \frac{1}{2^{2n}} = \frac{1}{2^{n+t}}. $$

Case 4:

\(m' \ge m+2\). Then, isolating terms corresponding to block indices \(m'\) and \(m'-1\), we have

$$\begin{aligned} \varDelta U&= 2(2C_{\ell }'[m'-1] \oplus C_{\ell }'[m']\oplus \varDelta _1 ),\\ \varDelta V&= \texttt {msb}_t(C_{\ell }'[m'-1] \oplus C_{\ell }'[m']) \oplus \varDelta _2, \end{aligned}$$

where \(\varDelta _1\) and \(\varDelta _2\) are independent of \(C_{\ell }'[m'-1]\) and \(C_{\ell }'[m']\). Moreover, \(C_{\ell }'[m'-1]\) and \(C_{\ell }'[m']\) are independent and uniformly random. Letting \(Y=C_{\ell }'[m']\) and \(Z=C_{\ell }'[m'-1]\), we can apply the same analysis as for Case 3, and therefore, the collision probability is at most \(1/2^{n+t}\).

In the above analysis, the collision probability is bounded by \(4/2^{n+t}\) in all cases, which proves the lemma for the case \(t\le n\).

We next consider the case \(t>n\). We let \(\overline{X}_w[i] = \texttt {lsb}_{t-n}(X[i])\) and \(\overline{X}_r[i] = \texttt {lsb}_n(\texttt {msb}_{2n}(X[i]))\), i.e., the \((n+1)\)-th to 2n-th bits of X[i]. For \(V\in \{0,1\}^{t}\), let \(\overline{V}=\texttt {msb}_n(V)\) and \(\overline{W}=\texttt {lsb}_{t-n}(V)\), thus \(V=(\overline{V}\,\Vert \,\overline{W})\). The corresponding variables are also defined for \(X'\).

We first focus on the case \(m=m'\). When \(\overline{X}_w[i]=\overline{X}'_w[i]\) for all \(1\le i\le m\), the analysis is the same as for the case \(t\le n\), since for each input block i, the tweak given to \(\widetilde{ \textsf {P} }_J\) has the same last \((t-n)\) bits for X and \(X'\). Thus the output collision probability (in particular, that of the first 2n bits of the output \((U,V)\)) is at most \(4/2^{2n}\).

If there exists an index i such that \(\overline{X}_w[i]\ne \overline{X}_w'[i]\) and \(\overline{X}_w[j]= \overline{X}_w'[j]\) for all \(j\ne i\), we have \(\varDelta \overline{W}\ne 0^{t-n}\), that is, the last \((t-n)\) bits of \(\varDelta V\) are non-zero. Hence the collision probability is zero.

If there exist two (or more) distinct indices i, j such that \(\overline{X}_w[i]\ne \overline{X}_w'[i]\) and \(\overline{X}_w[j]\ne \overline{X}_w'[j]\), the analysis is almost the same as in Case 2 of the case \(t\le n\). The collision probability of \((U,V)\) is at most \(1/2^{2n}\).

Finally, we consider the case \(m<m'\). For both \(m'=m+1\) and \(m'\ge m+2\), we can apply the same arguments as in the corresponding cases for \(t\le n\), and the collision probability of \((U,V)\) is at most \(1/2^{2n}\). Summarizing, the collision probability of \((U,V)\) is at most \(4/2^{2n}\).    \(\square \)

We remark that because of Case 1 when \(t\le n\), \(\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J]\) is not almost XOR universal (i.e., the output differential probability is not guaranteed to be small).

4.3 PRF Security of Finalization

We prove that \(\mathsf {ZFIN} \) is a fixed-input-length PRF with n-bit security. The key observation is that, given \(V\in \{0,1\}^t\), \(\mathsf {ZFIN} \) is reduced to a pair of independent instances of the sum of two independent random permutations, also called \({ \textsf {SUM} }2\) by Lucks [Luc00]. More precisely, let \({ \textsf {SUM} }2\) be a function that maps n-bit input to n-bit output, such that \({ \textsf {SUM} }2(X) \mathop {=}\limits ^{\tiny {\text {def}}} \textsf {P} _1(X)\oplus \textsf {P} _2(X)\) for \(X\in \{0,1\}^n\), using two independent n-bit URPs \( \textsf {P} _1\) and \( \textsf {P} _2\).

On input \((U,V)\), each n-bit output of \(\mathsf {ZFIN} \) is equivalent to \({ \textsf {SUM} }2(U)\) for two independent n-bit URPs \( \textsf {P} _1\) and \( \textsf {P} _2\), and the pair of URPs is sampled independently for each \(V\in \{0,1\}^t\) and for each output block, thanks to the domain separation.
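The \({ \textsf {SUM} }2\) building block can be sketched in a few lines of Python. The helper names and the use of a seeded shuffle to sample the two permutations are assumptions made for illustration on a small block length.

```python
import random

def random_permutation(n, rng):
    """Sample a uniformly random permutation of the n-bit values as a lookup table."""
    values = list(range(2 ** n))
    rng.shuffle(values)
    return values

def make_sum2(n, seed=0):
    """SUM2(X) = P1(X) xor P2(X) for two independent n-bit random permutations."""
    rng = random.Random(seed)
    p1 = random_permutation(n, rng)
    p2 = random_permutation(n, rng)
    return lambda x: p1[x] ^ p2[x]
```

One structural property is easy to observe on such a toy instance: the XOR of all \(2^n\) outputs is always \(0^n\), because each permutation contributes the XOR of every n-bit value exactly once. This is consistent with a security bound that becomes void as q approaches \(2^n\).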

\({ \textsf {SUM} }2\) has been actively studied and BBB bounds have been proved [Luc00, BI99]. Among them, Patarin [Pat08, Pat13] has proved that

$$ \texttt {Adv}^{ \texttt {prf}}_{{ \textsf {SUM} }2}(\mathcal{A}) \le O\left( \frac{q}{2^n}\right) , $$

for any adversary \(\mathcal{A}\) using q queries. However, the constant hidden in this bound is not known in the literature. Here, following [PS16], we state the widely accepted conjecture that \({ \textsf {SUM} }2\) is an n-bit secure PRF with a small constant.

Conjecture 1

For any adversary with q queries, \( \texttt {Adv}^{ \texttt {prf}}_{{ \textsf {SUM} }2}(\mathcal{A}) \le C q/2^n\) holds for some small constant \(C>0\).

For \(i\in \{0,4\}\), we let \(\mathsf {ZFIN} _i[\widetilde{E}_K](U,V)=\mathsf {ZFIN} [\widetilde{E}_K](i,U,V)\). Based on Conjecture 1, the following lemma gives the PRF security of \(\mathsf {ZFIN} _i\) in the information-theoretic setting, i.e., when \(\widetilde{E}_K\) is replaced by a TURP \(\widetilde{ \textsf {P} }_I:\mathcal{T}_I\times \{0,1\}^n \rightarrow \{0,1\}^n\).

Lemma 5

Let \(\mathcal{A}\) be an adversary against the PRF-security of \(\mathsf {ZFIN} _i[\widetilde{ \textsf {P} }_I]\) making at most q queries. Then, for \(i\in \{0,4\}\), we have

$$ \texttt {Adv}^{ \texttt {prf}}_{\mathsf {ZFIN} _i[\widetilde{ \textsf {P} }_I]}(\mathcal{A}) \le \sum _{T\in \{0,1\}^t} \frac{2C q_T}{2^n} \le \frac{2C q}{2^n}, $$

where \(q_T\) denotes the number of queries with \(V=T\).

The proof is obtained by the standard hybrid argument and an observation that adaptive choice of \(q_T\)’s does not help. Lemma 5 shows that \(\mathsf {ZFIN} \) is a parallelizable and n-bit secure PRF with \((n+t)\)-bit inputs using a TBC with n-bit blocks and t-bit tweaks.

Alternative Constructions. We could build the finalization function from [CDMS10, Min09]. Coron et al. [CDMS10] proposed a 2n-bit SPRP construction using 3 TBC calls of n-bit block and tweak, and Minematsu [Min09] proposed a 2n-bit SPRP construction using 2 TBC calls with two \( GF (2^n)\) multiplications. Both constructions achieve n-bit security with small constants. As they are also n-bit secure 2n-bit PRFs (via standard PRP-PRF switching), we could use them. However, they are inherently serial; hence, if the input to the MAC is short (say 64 bytes) and a parallel TBC computation unit is available, this choice of finalization would be considerably slower than \(\mathsf {ZFIN} \).

We could also use CENC by Iwata [Iwa06]. In a recent work by Iwata et al. [IMV16], it is shown that \( \textsf {P} (X\,\Vert \,0)\oplus \textsf {P} (X\,\Vert \,1)\) for \(X\in \{0,1\}^{n-1}\), called \(\textsf {XORP}[1]\), achieves n-bit PRF-security with constant 1, by making explicit that this was in fact already proved by Patarin [Pat10]. However, we think the finalization based on this construction would be slightly more complex than ours.

4.4 PRF Security of \(\mathsf {ZMAC}\)

We are now ready to state and prove the security result for \(\mathsf {ZMAC}\).

Theorem 1

Let \(\mathcal{A}\) be an adversary against \(\mathsf {ZMAC} [{\widetilde{E}}]\) making at most q queries of total length (in number of \((n\,+\,t)\)-bit blocks) at most \(\sigma \) and running in time at most \( \texttt {time} \). Then there exists an adversary \(\mathcal{B}\) against \({\widetilde{E}}\) making at most \(\sigma \,+\,4q\,+\,2\) queries and running in time at most \( \texttt {time} +O(\sigma )\) such that

$$ \texttt {Adv}^{ \texttt {prf}}_{\mathsf {ZMAC} [{\widetilde{E}}]}(\mathcal{A}) \le \texttt {Adv}^{ \texttt {tprp}}_{{\widetilde{E}}}(\mathcal{B}) + \frac{2.5\sigma ^2}{2^{n+\min \{n,t\}}} + \frac{4Cq}{2^n}, $$

where the constant \(C>0\) is as specified in Conjecture 1.
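To get a feel for the bound, the two information-theoretic terms of Theorem 1 can be evaluated exactly with rational arithmetic. This is only an arithmetic aid: the function name is illustrative, and the default \(C=1\) is an assumption, since Conjecture 1 only asserts that C is a small constant.

```python
from fractions import Fraction

def zmac_it_bound(n, t, sigma, q, C=1):
    """Return the two information-theoretic terms of Theorem 1 (TPRP term omitted).

    hash_term = 2.5 * sigma^2 / 2^(n + min(n, t))
    fin_term  = 4 * C * q / 2^n      (C = 1 is an assumed value)
    """
    hash_term = Fraction(5 * sigma ** 2, 2 * 2 ** (n + min(n, t)))
    fin_term = Fraction(4 * C * q, 2 ** n)
    return hash_term, fin_term
```

For instance, with \(n=t=128\), \(\sigma =2^{60}\), and \(q=2^{40}\), the hashing term is \(5/2^{137}\) and the finalization term is \(1/2^{86}\) under the assumption \(C=1\), so the finalization term dominates for these parameters.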

Proof

Since \(\mathsf {ZMAC}\) calls the underlying TBC \({\widetilde{E}}\) with a single key K, we can replace \({\widetilde{E}}_K\) by a TURP \(\widetilde{ \textsf {P} }_I:\mathcal{T}_I\times \{0,1\}^n \rightarrow \{0,1\}^n\) and focus on the information-theoretic security of \(\mathsf {ZMAC} [\widetilde{ \textsf {P} }_I]\). Derivation of the computational counterpart is standard.

Let \(G:\mathcal{K}_G\times (\{0,1\}^{n+t})^+ \rightarrow \{0,1\}^{n+t}\) and \(F:\mathcal{K}_F\times \{0,1\}^{n+t} \rightarrow \{0,1\}^{2n}\). Let \( \textsf {CW}3 [G_{K_1},F_{K_2},F_{K_3}]\) be the three-key Carter-Wegman construction with independent keys \((K_1,K_2,K_3)\) as defined by Black and Rogaway [BR05], i.e.,

$$\begin{aligned} \textsf {CW}3 [G_{K_1},F_{K_2},F_{K_3}](M) = {\left\{ \begin{array}{ll} F_{K_2}(G_{K_1}(\texttt {ozp}({M}))) &{} \text { if}~M\in (\{0,1\}^{n+t})^+,\\ F_{K_3}(G_{K_1}(\texttt {ozp}({M}))) &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
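The case distinction in \( \textsf {CW}3 \) can be made concrete with a short Python sketch over bit strings. The padding rule coded in `ozp` (leave a non-empty string whose length is a multiple of the block size untouched, otherwise append a single 1 followed by zeros) is the usual one-zero padding and is assumed here; `g`, `f2`, and `f3` are arbitrary callables standing in for \(G_{K_1}\), \(F_{K_2}\), and \(F_{K_3}\).

```python
def ozp(m, block):
    """One-zero padding (assumed rule) of a bit string to a multiple of `block` bits."""
    if len(m) > 0 and len(m) % block == 0:
        return m
    padded = m + "1"
    return padded + "0" * (-len(padded) % block)

def cw3(g, f2, f3, m, block):
    """Three-key Carter-Wegman: hash the padded message, then pick F by padding case."""
    is_full = len(m) > 0 and len(m) % block == 0  # M lies in ({0,1}^block)^+
    digest = g(ozp(m, block))
    return f2(digest) if is_full else f3(digest)
```

The point of using two finalization functions is that a padded message and an unpadded message with the same padded form, such as `"10"` and `"1010"` for 4-bit blocks, hash to the same value under G yet are finalized by independent functions.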

It is easy to see that \(\mathsf {ZMAC} [\widetilde{ \textsf {P} }_I]\) is an instantiation of \( \textsf {CW}3 \). Indeed,

$$ \mathsf {ZMAC} [\widetilde{ \textsf {P} }_I]= \textsf {CW}3 \big [\mathsf {ZHASH} [\widetilde{ \textsf {P} }_I],\mathsf {ZFIN} _0[\widetilde{ \textsf {P} }_I],\mathsf {ZFIN} _4[\widetilde{ \textsf {P} }_I]\big ], $$

and independence between the three components follows from domain separation of tweaks which implies that for distinct integers \(i,j\in \mathcal{I}\), \(\widetilde{ \textsf {P} }_I^i\) and \(\widetilde{ \textsf {P} }_I^j\) are independent TURPs with tweak space \(\mathcal{T}=\{0,1\}^t\). Besides, as already observed in Sect. 4.2, since the masking keys \(L_{\ell }=\widetilde{ \textsf {P} }_I^9(0^t,0^n)\) and \(L_r=\widetilde{ \textsf {P} }_I^9(0^{t-1}1,0^n)\) are uniformly random, one has

$$ \mathsf {ZHASH} [\widetilde{ \textsf {P} }_I] = \mathbb {ZHASH}\big [\textsf {XT} [\widetilde{ \textsf {P} }_I^8,H]\big ], $$

with H as defined by Eq. (5). Hence, by replacing \(\textsf {XT} [\widetilde{ \textsf {P} }_I^8,H]\) by a TURP \(\widetilde{ \textsf {P} }_J:\mathcal{T}_J\times \{0,1\}^n \rightarrow \{0,1\}^n\) and \(\mathsf {ZFIN} _0\), resp. \(\mathsf {ZFIN} _4\) by independent random functions \( \textsf {R} _0\), resp. \( \textsf {R} _1\) from \(\{0,1\}^{n+t}\) to \(\{0,1\}^n\), we have that there exists an adversary \(\mathcal{B}'\) against \(\textsf {XT} [\widetilde{ \textsf {P} }_I^8,H]\) making at most \(\sigma \) queries and an adversary \(\mathcal{B}''\) against \(\mathsf {ZFIN} _{0/4}[\widetilde{ \textsf {P} }_I]\) making at most q queries such that

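$$\begin{aligned} \texttt {Adv}^{ \texttt {prf}}_{\mathsf {ZMAC} [\widetilde{ \textsf {P} }_I]}(\mathcal{A})&\le \texttt {Adv}^{ \texttt {prf}}_{ \textsf {CW}3 [\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J], \textsf {R} _0, \textsf {R} _1]}(\mathcal{A}) + \texttt {Adv}^{ \texttt {tprp}}_{\textsf {XT} [\widetilde{ \textsf {P} }_I^8,H]}(\mathcal{B}') + 2\,\texttt {Adv}^{ \texttt {prf}}_{\mathsf {ZFIN} _{0/4}[\widetilde{ \textsf {P} }_I]}(\mathcal{B}'') \nonumber \\&\le \texttt {Adv}^{ \texttt {prf}}_{ \textsf {CW}3 [\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J], \textsf {R} _0, \textsf {R} _1]}(\mathcal{A}) + \frac{\sigma ^2}{2^{n+\min \{n,t\}+1}} + \frac{4Cq}{2^n}, \end{aligned}$$
(6)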

where the last inequality follows from Lemmas 3 and 5.

From Lemma 2 of [BR05] and Lemma 4, we have

$$\begin{aligned} \texttt {Adv}^{ \texttt {prf}}_{ \textsf {CW}3 [\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J], \textsf {R} _0, \textsf {R} _1]}(\mathcal{A})&\le \max _{m_1,\dots ,m_q} \sum _{i\ne j} \textsf {Coll} _{\mathbb {ZHASH}[\widetilde{ \textsf {P} }_J]}(n+t,m_i,m_j) \nonumber \\&\le \max _{m_1,\dots ,m_q} \sum _{i\ne j} \frac{4}{2^{n+\min \{n,t\}}} \nonumber \\&\le \frac{2q^2}{2^{n+\min \{n,t\}}}, \end{aligned}$$
(7)

where the maximum is taken over all \(m_1,\dots ,m_q\) such that \(\sum _i m_i=\sigma \). Combining (6) and (7), and using \(q\le \sigma \), we obtain the information-theoretic bound.    \(\square \)

4.5 Other Variants of \(\mathsf {ZMAC}\)

\(\mathsf {ZMAC} \) has a wide range of variants, depending on the required level of security. We briefly discuss some of them.

Eliminating The Input-Length Effect. \(\mathsf {ZMAC} \) ensures security as long as the total number of \((n+t)\)-bit blocks \(\sigma \) across all queries is small compared to \(2^{\min \{n,(n+t)/2\}}\). If one wants to completely remove the effect of the input length as in [Nai15, LN17] (i.e., to get security as long as the number of queries q is small compared to \(2^{\min \{n,(n+t)/2\}}\)), we suggest using \(\mathbb {ZHASH}\). The underlying TBC \({\widetilde{E}}\) needs to have a tweak space of the form \(\{0,1\}^t\times \mathcal{J}\times \mathcal{I}\), where \(\mathcal{J}=\{1,2,\dots ,B\}\) for some \(B>0\) and \(\mathcal{I}\) is a set of domain separation integers. Here, the effective tweak space of \({\widetilde{E}}\) is \(\{0,1\}^t\times \mathcal{J}\) and the effective tweak-length is \(t'=t+\log _2 B\) bits.

For finalization, we can use \(\mathsf {ZFIN} [{\widetilde{E}}]\) with an adequate domain separation. From Lemma 4, the message hashing has a constant collision probability of \(4/2^{n+\min \{n,t\}}\) for both cases of \(t\le n\) and \(t>n\). The security bounds (for both \(t\le n\) and \(t>n\)) are \(O({q^2}/{2^{n+\min \{n,t\}}})\) plus the PRF bound of \(\mathsf {ZFIN} [{\widetilde{E}}]\), thus, security does not degrade with the total input length.

On the downside, since we waste \(\log _2 B\) effective tweak bits to process the input block index, this mode processes only \(n+t\) input bits per TBC call rather than the optimal amount \(n+t'\). This is a trade-off between efficiency and security.

Birthday Security. If we only require security up to the birthday bound, then we could simply use \( \textsf {XT} [{\widetilde{E}}]\) in the same manner as PMAC. That is, the message hashing is mostly the same as \(\mathsf {ZHASH} \), except that we XOR all TBC outputs \(C_{\ell }\) in Fig. 1 to form the final n-bit output. The finalization is done by a single TBC call with adequate domain separation, and hashing and finalization are composed by \( \textsf {CW}3 \).

From Lemma 3 and the security proof for (TBC-based) PMAC1 found in [Rog04], this variant has PRF advantage \(O(\sigma ^2/2^{n+\min \{n,t\}} + q^2/2^n)\), which is slightly better than the “standard” birthday bound \(O(\sigma ^2/2^n)\). Efficiency is optimal since \(n+t\) input bits are processed per TBC call for any \({\widetilde{E}}\) having an effective tweak space of t bits, for any \(t>0\).

5 Application to Authenticated Encryption: \(\mathsf {ZAE}\)

As an application of \(\mathsf {ZMAC}\), we provide \(\mathsf {ZAE}\), an efficient construction of a Deterministic Authenticated Encryption (DAE) scheme [RS06] from a TBC.

Let us briefly recall the syntax and the security definition for a DAE scheme (see [RS06] for details). A DAE scheme \(\mathsf {DAE}\) is a tuple \((\mathcal{K},\mathcal{A}\mathcal{D},\mathcal{M},\mathcal{C},\mathsf {DAE}.\mathsf {Enc},\mathsf {DAE}.\mathsf {Dec})\), where \(\mathcal{K}\), \(\mathcal{A}\mathcal{D}\), \(\mathcal{M}\), and \(\mathcal{C}\) are non-empty sets and \(\mathsf {DAE}.\mathsf {Enc}\) and \(\mathsf {DAE}.\mathsf {Dec}\) are deterministic algorithms. The encryption algorithm \(\mathsf {DAE}.\mathsf {Enc}\) takes as input a key \(K\in \mathcal{K}\), associated data \(AD\in \mathcal{A}\mathcal{D}\), and a plaintext \(M\in \mathcal{M}\), and returns a ciphertext \(C\in \mathcal{C}\). The decryption algorithm \(\mathsf {DAE}.\mathsf {Dec}\) takes as input a key \(K\in \mathcal{K}\), associated data \(AD\in \mathcal{A}\mathcal{D}\), and a ciphertext \(C\in \mathcal{C}\), and returns either a message \(M\in \mathcal{M}\) or the special symbol \(\bot \) indicating that the ciphertext is invalid. We write \(\mathsf {DAE}.\mathsf {Enc}_K(AD,M)\), resp. \(\mathsf {DAE}.\mathsf {Dec}_K(AD,C)\) for \(\mathsf {DAE}.\mathsf {Enc}(K,AD,M)\), resp. \(\mathsf {DAE}.\mathsf {Dec}(K,AD,C)\). As usual, we require that for any tuple \((K,AD,M)\in \mathcal{K}\times \mathcal{A}\mathcal{D}\times \mathcal{M}\), one has

$$ \mathsf {DAE}.\mathsf {Dec}(K,AD,\mathsf {DAE}.\mathsf {Enc}(K,AD,M))=M. $$

The associated data AD is authenticated but not encrypted, and may include a nonce. This is why DAE is sometimes called nonce-misuse resistant authenticated encryption (MRAE): for such a scheme, the repetition of a nonce does not hurt authenticity and only allows the adversary to detect repetitions of inputs \((AD,M)\) to the encryption algorithm.

Definition 4

Let \(\mathsf {DAE}\) be a DAE scheme. The advantage of an adversary \(\mathcal{A}\) in breaking the DAE-security of \(\mathsf {DAE}\) is defined as

$$ \texttt {Adv}^{ \texttt {dae}}_{\mathsf {DAE}}(\mathcal{A}) \mathop {=}\limits ^{\tiny {\text {def}}}\left| \Pr [K\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}: \mathcal{A}^{\mathsf {DAE}.\mathsf {Enc}_K,\mathsf {DAE}.\mathsf {Dec}_K} \Rightarrow 1] - \Pr [\mathcal{A}^{ \$,\bot } \Rightarrow 1]\right| , $$

where oracle \(\$(\cdot ,\cdot )\), on input \((AD,M)\), returns a random bit string of length \(|\mathsf {DAE}.\mathsf {Enc}_K(AD,M)|\) (Footnote 4), and oracle \(\bot (\cdot ,\cdot )\) always returns \(\bot \). The adversary \(\mathcal{A}\) is not allowed to repeat an encryption query or to submit a decryption query \((AD,C)\) if a previous encryption query \((AD,M)\) returned C.

In addition to \(\mathsf {ZMAC}\), our construction will rely on a (random) IV-based encryption (ivE) scheme \(\mathsf {IVE}\). Such a scheme consists of a tuple \((\mathcal{K},\mathcal{I}\mathcal{V},\mathcal{M},\mathcal{C},\mathsf {IVE}.\mathsf {Enc},\mathsf {IVE}.\mathsf {Dec})\), where \(\mathcal{K}\), \(\mathcal{I}\mathcal{V}\), \(\mathcal{M}\), and \(\mathcal{C}\) are non-empty sets and \(\mathsf {IVE}.\mathsf {Enc}\) and \(\mathsf {IVE}.\mathsf {Dec}\) are deterministic algorithms. The encryption algorithm \(\mathsf {IVE}.\mathsf {Enc}\) takes as input a key \(K\in \mathcal{K}\), an initialization value \(IV\in \mathcal{I}\mathcal{V}\), and a plaintext \(M\in \mathcal{M}\), and returns a ciphertext \(C\in \mathcal{C}\). The decryption algorithm \(\mathsf {IVE}.\mathsf {Dec}\) takes as input a key \(K\in \mathcal{K}\), an IV \(IV\in \mathcal{I}\mathcal{V}\), and a ciphertext \(C\in \mathcal{C}\), and returns a message \(M\in \mathcal{M}\). Given \(K\in \mathcal{K}\), we let \(\mathsf {IVE}.\mathsf {Enc}^\$_K\) denote the randomized algorithm which takes as input \(M\in \mathcal{M}\), draws \(IV\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{I}\mathcal{V}\), computes \(C=\mathsf {IVE}.\mathsf {Enc}(K,IV,M)\), and returns \((IV,C)\).

Definition 5

Let \(\mathsf {IVE}\) be an IV-based encryption scheme. The advantage of an adversary \(\mathcal{A}\) in breaking the ivE-security of \(\mathsf {IVE}\) is defined as

$$\begin{aligned} \texttt {Adv}^{ \texttt {ive}}_{\mathsf {IVE}}(\mathcal{A}) \mathop {=}\limits ^{\tiny {\text {def}}}\left| \Pr [K\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\mathcal{K}: \mathcal{A}^{\mathsf {IVE}.\mathsf {Enc}^\$_K} \Rightarrow 1] - \Pr [\mathcal{A}^{\$} \Rightarrow 1]\right| , \end{aligned}$$

where oracle \(\$(\cdot )\), on input \(M\in \mathcal{M}\), returns a random bit string of length \(|\mathsf {IVE}.\mathsf {Enc}^\$_K(M)|\).

For our purposes, we consider the IV-based encryption mode \(\mathsf {IVCTRT} \) proposed in [PS16, Appendix B]. This mode uses a TBC \(\widetilde{E}\) with tweak space \(\mathcal{T}'=\{0,1\}^t \times \mathcal{I}\) and message space \(\{0,1\}^n\), and has 2n-bit IVs. We assume \(10 \in \mathcal{I}\) as all calls to \(\widetilde{E}\) in \(\mathsf {IVCTRT} \) will use domain separation integer 10 which is distinct from all those used in \(\mathsf {ZMAC}\). The encryption \(\mathsf {IVCTRT} [\widetilde{E}_K].\mathsf {Enc}(IV,M)\) of a message M with initialization value IV under key K is defined as follows. The IV and the message are parsed as

$$\begin{aligned} (IV[1],IV[2])&\xleftarrow {\scriptscriptstyle n,n} IV \\ (M[1],\dots ,M[m])&\xleftarrow {\scriptscriptstyle n} M. \end{aligned}$$

Let \(IV'[1]=IV[1] \oplus _t 0^t\), i.e., IV[1] is either padded with zeros up to t bits when \(t>n\) or truncated to t bits when \(t\le n\). Then, with \(X\boxplus Y\) denoting t-bit modular addition, the ciphertext is \(C=(C[1],\dots ,C[m])\) where

$$\begin{aligned} C[i]&= M[i] \oplus {\widetilde{E}}^{10}_K(IV'[1] \boxplus i,IV[2])&\text {for}~i=1,\ldots ,m-1, \\ C[m]&= M[m] \oplus \texttt {msb}_{|M[m]|}({\widetilde{E}}^{10}_K(IV'[1] \boxplus m,IV[2])). \end{aligned}$$
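The counter-in-tweak structure of \(\mathsf {IVCTRT} \) can be sketched as follows. The sketch works on bit strings for readability and uses a lazily sampled random tweakable permutation (`ToyTBC`, an assumption made for illustration) in place of \({\widetilde{E}}_K\); the function name `ivctrt_xor` is hypothetical. Since the mode XORs a keystream, the same routine serves for encryption and decryption.

```python
import random

class ToyTBC:
    """Lazily sampled random tweakable permutation with a domain-separation integer."""

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = random.Random(seed)
        self.tables = {}  # (domain, tweak) -> {input block: output block}

    def enc(self, domain, tweak, x):
        table = self.tables.setdefault((domain, tweak), {})
        if x not in table:
            used = set(table.values())
            y = self.rng.getrandbits(self.n)
            while y in used:
                y = self.rng.getrandbits(self.n)
            table[x] = y
        return table[x]

def ivctrt_xor(tbc, iv, bits, n, t):
    """Encrypt (or, identically, decrypt) the bit string `bits` under a 2n-bit IV."""
    iv1, iv2 = iv >> n, iv & ((1 << n) - 1)              # (IV[1], IV[2])
    base = iv1 >> (n - t) if t <= n else iv1 << (t - n)  # IV'[1] = IV[1] (+)_t 0^t
    out = []
    for start in range(0, len(bits), n):
        i = start // n + 1                               # block index, starting at 1
        chunk = bits[start:start + n]
        tweak = (base + i) % (1 << t)                    # t-bit modular addition
        # Keystream block E^10(IV'[1] + i, IV[2]), truncated to the chunk length (msb).
        ks = format(tbc.enc(10, tweak, iv2), f"0{n}b")[:len(chunk)]
        out.append("".join("1" if a != b else "0" for a, b in zip(chunk, ks)))
    return "".join(out)
```

Note that the block input \(IV[2]\) stays fixed across all calls and only the tweak changes with the counter, so each keystream block comes from a different permutation of the TBC family.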

Our TBC-based BBB-secure DAE mode proposal \(\mathsf {ZAE}\) follows the generic (Footnote 5) SIV construction [RS06], where the PRF is instantiated with \(\mathsf {ZMAC}\) and the IV-based encryption mode is instantiated with \(\mathsf {IVCTRT} \).

Let \(\widetilde{E}\) be a TBC with tweak space \(\mathcal{T}'= \{0,1\}^t \times \mathcal{I}\) where \(\mathcal{I}\supseteq \{0,1,\ldots ,10\}\) and message space \(\{0,1\}^n\). The encryption \(\mathsf {ZAE} [\widetilde{E}_K].\mathsf {Enc}(AD,M)\) of a message M with associated data AD under key K is the pair \(C'=(IV,C)\) where

$$\begin{aligned}&IV = \mathsf {ZMAC} [{\widetilde{E}}_K](\texttt {encode}(AD,M)) \\&C = \mathsf {IVCTRT} [{\widetilde{E}}_K].\mathsf {Enc}(IV,M). \end{aligned}$$

The encode function is an injective mapping which pads AD and M independently using the \(\texttt {ozp}({})\) function, so that the bit lengths of the resulting strings are multiples of \((n+t)\). Then, it concatenates these two strings and appends the n/2-bit representations of the lengths of AD and M (an n-bit representation can naturally be used if more than \(2^{n/2}\) AD and M blocks are possible). The tag (synthetic IV) is 2n bits, which is inevitable for n-bit security of the SIV construction, since a collision of two tags would immediately break the scheme. See Fig. 6 for the pseudocode and Fig. 7 for a graphical representation of \(\mathsf {ZAE}\).
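The SIV composition itself is independent of the concrete PRF and encryption mode, and can be sketched with standard-library stand-ins. In the sketch below, HMAC-SHA256 replaces \(\mathsf {ZMAC}\), a SHAKE-256 keystream replaces \(\mathsf {IVCTRT} \), two separate keys are used instead of a single key with domain separation, and `encode` is a simple length-prefixed injective encoding rather than the one described above. All of these are assumptions made purely to show the data flow.

```python
import hashlib
import hmac

TAG_LEN = 32  # bytes; plays the role of the 2n-bit synthetic IV

def encode(ad, msg):
    """Injective stand-in encoding (length-prefixed); not the encode of the text."""
    return len(ad).to_bytes(8, "big") + ad + len(msg).to_bytes(8, "big") + msg

def prf(key, data):
    """Stand-in PRF (HMAC-SHA256) in place of ZMAC."""
    return hmac.new(key, data, hashlib.sha256).digest()

def keystream_xor(key, iv, data):
    """Stand-in IV-based encryption (SHAKE-256 keystream) in place of IVCTRT."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def siv_encrypt(k_mac, k_enc, ad, msg):
    """C' = (IV, C) with IV = PRF(encode(AD, M)) and C = Enc(IV, M)."""
    iv = prf(k_mac, encode(ad, msg))
    return iv + keystream_xor(k_enc, iv, msg)

def siv_decrypt(k_mac, k_enc, ad, ct):
    """Recover M from C, recompute the tag, and return None if it does not match."""
    iv, body = ct[:TAG_LEN], ct[TAG_LEN:]
    msg = keystream_xor(k_enc, iv, body)
    if not hmac.compare_digest(iv, prf(k_mac, encode(ad, msg))):
        return None
    return msg
```

Encryption is deterministic: the same \((AD,M)\) always yields the same ciphertext, which is exactly the leakage that the DAE notion allows.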

Fig. 6. Pseudocode for the \(\mathsf {ZAE}\) deterministic authenticated encryption scheme. Algorithm \(\mathsf {IVCTRT} [\widetilde{E}_K].\mathsf {Dec}\) is similar to \(\mathsf {IVCTRT} [\widetilde{E}_K].\mathsf {Enc}\) and hence omitted.

The security bound for \(\mathsf {ZAE}\) is given in the following theorem. Here, we let the length of a query (encryption or decryption) be the block length of \(\mathtt {encode}(AD,M)\), where \((IV,C) \xleftarrow {\scriptscriptstyle 2n,|C'|-2n} C'\) and \(M \leftarrow \mathsf {IVCTRT} [\widetilde{E}_K].\mathsf {Dec}(IV,C)\) for a decryption query \((AD,C')\).

Theorem 2

Let \(\widetilde{E}\) be a TBC with tweak space \(\mathcal{T}'= \{0,1\}^t \times \mathcal{I}\) and message space \(\{0,1\}^n\). Let \(\mathcal{A}\) be an adversary attacking \(\mathsf {ZAE} [{\widetilde{E}}]\) making at most q (encryption or decryption) queries, such that the total length of all its queries is at most \(\sigma \) blocks of n bitsFootnote 6, and running in time at most \( \texttt {time} \). Then there exists an adversary \(\mathcal{B}\) against \({\widetilde{E}}\) making at most \(2\sigma + 4q+2\) chosen-plaintext queries and running in time at most \( \texttt {time} +O(\sigma )\) such that

$$\begin{aligned} \texttt {Adv}^{ \texttt {dae}}_{\mathsf {ZAE} [{\widetilde{E}}]}(\mathcal{A})&\le \texttt {Adv}^{ \texttt {tprp}}_{{\widetilde{E}}}(\mathcal{B}) + \frac{3.5\sigma ^2}{2^{n+\min \{n,t\}}} + \frac{4Cq}{2^n} + \frac{q}{2^{2n}}, \end{aligned}$$

where the constant C is from Conjecture 1.
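To get a feeling for the magnitude of the bound, its information-theoretic terms can be evaluated numerically. The sketch below works in the \(\log _2\) domain to avoid underflow and leaves out the \(\texttt {Adv}^{ \texttt {tprp}}\) term. The default value `C=1.0` is a hypothetical placeholder, since the actual constant of Conjecture 1 is not fixed here, and the function name is ours.

```python
import math

def zae_bound_log2(sigma_log2: float, q_log2: float, n: int, t: int,
                   C: float = 1.0) -> float:
    # log2 of  3.5*sigma^2 / 2^(n+min(n,t)) + 4*C*q / 2^n + q / 2^(2n),
    # with sigma = 2^sigma_log2 and q = 2^q_log2.
    # C = 1.0 is a placeholder, not the constant of Conjecture 1.
    terms = [
        math.log2(3.5) + 2 * sigma_log2 - (n + min(n, t)),
        math.log2(4 * C) + q_log2 - n,
        q_log2 - 2 * n,
    ]
    # Sum the three terms stably: factor out the largest one.
    top = max(terms)
    return top + math.log2(sum(2.0 ** (x - top) for x in terms))
```

For instance, with \(n=t=128\), \(\sigma =2^{64}\) and \(q=2^{32}\), the result is dominated by the \(4Cq/2^n\) term under this placeholder choice of C, whereas a birthday-type bound \(\sigma ^2/2^n\) would already be vacuous at \(\sigma =2^{64}\).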

Proof

We prove the information-theoretic security of \(\mathsf {ZAE} [\widetilde{ \textsf {P} }]\) where \(\widetilde{ \textsf {P} }\) is a TURP (the computational counterpart is standard). By Theorem 2 of [RS06], there exists an adversary \(\mathcal{A}'\) attacking \(\mathsf {ZMAC} [\widetilde{ \textsf {P} }]\) and an adversary \(\mathcal{A}''\) attacking \(\mathsf {IVCTRT} [\widetilde{ \textsf {P} }]\), both making at most q queries of total length \(\sigma \), such that

$$\begin{aligned} \texttt {Adv}^{ \texttt {dae}}_{\mathsf {ZAE} [\widetilde{ \textsf {P} }]}(\mathcal{A}) \le \texttt {Adv}^{ \texttt {prf}}_{\mathsf {ZMAC} [\widetilde{ \textsf {P} }]}(\mathcal{A}') + \texttt {Adv}^{ \texttt {ive}}_{\mathsf {IVCTRT} [\widetilde{ \textsf {P} }]}(\mathcal{A}'')+\frac{q}{2^{2n}}. \end{aligned} \tag{8}$$

According to [PS16, Appendix B], we have

$$ \texttt {Adv}^{ \texttt {ive}}_{\mathsf {IVCTRT} [\widetilde{ \textsf {P} }]}(\mathcal{A}'') \le \frac{\sigma ^2}{2^{n+\min \{n,t\}}}. $$

(In more detail, the security bound from [PS16, Appendix B] is \(\sigma ^2/2^{n+t}\) assuming \(IV'[1]\) is uniform in \(\{0,1\}^t\), which is the case here only when \(t\le n\). When \(t>n\), the security bound caps at \(\sigma ^2/2^{2n}\) since only the first n bits of \(IV'[1]\) are random.) The result follows by combining these two equations with Theorem 1. The query complexity of \(\mathcal{B}\) follows from the fact that \(\mathsf {ZAE} \) makes at most 2 TBC calls per n-bit block of input, together with the cost of \(\mathsf {ZFIN} \) and of the mask generation.    \(\square \)

It should be noted that for the encryption part \(\mathsf {IVCTRT} \) there is no specific efficiency benefit in having access to a TBC with a tweak input larger than n bits. On the contrary, for the \(\mathsf {ZMAC} \) part, there is a direct gain in having a large tweak if this is not too costly (say, a slowdown much smaller than a factor of two), since this increases the number of input bits per TBC call. In order to optimize performance, one can thus use a TBC with \(t=n\) for the encryption part, but switch to a TBC with \(t>n\) for the MAC part of the scheme, since a TBC with a large tweak is usually (slightly) slower than a TBC with a small tweak [JNP14d].

Another direction to further increase performance of \(\mathsf {ZAE}\) in practice, without reducing its security, is to use a counter addition on only \(\min \{n,t\}\) bits instead of t bits, i.e. by redefining \(X\boxplus Y\) for \(Y\in \{1,\dots ,2^{\min \{n,t\}}\}\) to denote

$$ \texttt {msb}_{\min \{n,t\}}(X) + Y \bmod 2^{\min \{n,t\}}\,\Vert \,\texttt {lsb}_{t-\min \{n,t\}}(X), $$

that is, addition is performed over the first \(\min \{n,t\}\) bits and the remaining bits are left intact. One could even consider an LFSR-based counter instead of a counter based on modular addition to improve hardware implementations. We have not used these improvements in the \(\mathsf {ZAE}\) specification in order to simplify its description.
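The shortened counter addition defined above can be written in a few lines of Python. In this sketch, t-bit strings are represented as non-negative integers whose most significant bit is the first bit of the string, and the function name `boxplus_short` is ours.

```python
def boxplus_short(x: int, y: int, n: int, t: int) -> int:
    # Add y to the min(n, t) most significant bits of the t-bit value x,
    # modulo 2^min(n, t), and leave the remaining low bits untouched.
    w = min(n, t)
    low_bits = t - w
    high = ((x >> low_bits) + y) % (1 << w)
    low = x & ((1 << low_bits) - 1)
    return (high << low_bits) | low
```

When \(t\le n\), no low bits remain and the function coincides with plain t-bit modular addition. When \(t>n\), the carry chain is limited to n bits, which is the source of the potential speed-up.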

\(\mathsf {ZAE}\) compares very favorably with existing TBC-based MRAE solutions both in terms of efficiency and security. Indeed, it can process \(n+t\) message bits per TBC call for the MAC part, and n bits per TBC call for the encryption part. Other schemes such as SIV  [RS06], SCT  [PS16], or SIVx  [LN17] can only handle n message bits per TBC call in the MAC part. Moreover, \(\mathsf {ZAE}\) is secure beyond the birthday bound and hence provides better security than SIV (only birthday security) or SCT (only birthday security in the nonce-misuse setting) while offering better performance.

We remark that \(\mathsf {ZMAC}\) could also be used to improve OCB-like (more precisely its TBC-based generalization \(\Theta \) CB [KR11]) or SCT-like designs: by changing the PMAC-like part that handles the associated data for \(\mathsf {ZMAC}\), one would fully benefit from the efficiency improvement provided by our design.

Fig. 7. The \(\mathsf {ZAE}\) deterministic authenticated encryption scheme with associated data. Note that the n-bit value IV[1] is mapped to the t-bit value \(IV[1] \oplus _t 0^t\) to obtain the initial t-bit counter.

6 MAC and AE Instances

In this section, we give instantiation examples of \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\). There are many possible ways to build a TBC, but in practice block cipher-based constructions are generally less efficient than ad-hoc TBCs. Since our design leverages heavily the possibilities offered by a large tweak, a candidate such as Threefish  [FLS+10] is not very interesting as it handles only 128 bits of tweak input for a block size of 256/512/1024 bits. The effective efficiency gain would be limited (and Threefish is much slower than AES on current platforms, due to AES-NI instruction sets).

One could also consider using block ciphers with large keys (in comparison to their block size), but as remarked in [JNP14d], it remains unclear if one can generally use the key input of a TBC as tweak input. For example, using AES-256 while allocating half of its key input as tweak is a very bad idea, considering the related-key attacks against AES-256, such as [BKN09].

Recently, Jean et al. [JNP14d] proposed a framework called TWEAKEY and a generic construction STK for building ad-hoc tweakable Substitution-Permutation Network (SPN) ciphers. The authors proposed three TBCs based on the STK framework, Deoxys-BC  [JNP14a], Joltik-BC  [JNP14b], and KIASU-BC  [JNP14c], as part of three candidates for the CAESAR authenticated encryption competition [CAE]. In particular, Deoxys-BC is the TBC used in the Deoxys CAESAR candidate (together with the SCT authenticated encryption mode), selected for the third round of the competition. Later, SKINNY  [BJK+16], a lightweight family of TBCs based on similar ideas, was proposed.

We will study here the performance of \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\) when instantiated with Deoxys-BC and the 128-bit block versions of SKINNY. Note that for a key size of 128 bits, both these ciphers offer versions with 128 or 256 bits of tweak input (respectively Deoxys-BC-256/SKINNY-128-256 and Deoxys-BC-384/SKINNY-128-384). It is interesting to compare the respective number of rounds (and thus efficiencies) of these different versions (see Table 2).

Table 2. Number of rounds of Deoxys-BC-256/Deoxys-BC-384, and SKINNY-128-128/SKINNY-128-256/SKINNY-128-384.

This shows the strength of the \(\mathsf {ZMAC}\) general design: for practical ad-hoc TBC constructions, it seems that doubling the tweak-length of the TBC slows down the primitive by a factor much smaller than 2. Thus, we can expect the efficiency to improve with the tweak-length.

6.1 Handling the Domain Separation of TBC Instances

In \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\), we use several independent TBC instances through domain separation integers. In detail, for \(\mathsf {ZMAC}\), one needs one TBC instance (\({\widetilde{E}}_{K}^9\)) for generating the masking keys \(L_{\ell }\) and \(L_r\), one instance (\({\widetilde{E}}_{K}^8\)) for the hashing part, 4 instances (\({\widetilde{E}}_{K}^0\), \({\widetilde{E}}_{K}^1\), \({\widetilde{E}}_{K}^2\), \({\widetilde{E}}_{K}^3\)) for the finalization function when the message is a positive multiple of \((n+t)\) bits, and 4 instances (\({\widetilde{E}}_{K}^4\), \({\widetilde{E}}_{K}^5\), \({\widetilde{E}}_{K}^6\), \({\widetilde{E}}_{K}^7\)) for the finalization function when the message is not a positive multiple of \((n+t)\) bits. This sums up to 10 instances. Moreover, \(\mathsf {ZAE}\) requires one more instance (\({\widetilde{E}}_{K}^{10}\)) for the encryption part.

For all instances, encoding can be achieved by simply reserving 4 bits of the tweak input of the TBC. This has the advantage of being very simple and elegant, but it also means that in practice the message block size of \(\mathsf {ZMAC}\) will be a little unusual (as the tweak-length is usually a multiple of the block-length).
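As an illustration of this first solution, the following Python sketch packs a domain separation integer into 4 reserved bits of the physical tweak input. The instance indices are those listed above. The constant names, the function name `pack_tweak` and the choice of placing the domain integer in the most significant bits are assumptions made for the example only.

```python
DOMAIN_BITS = 4  # enough for the 11 instances 0..10

# Instance indices as listed in the text.
FIN_FULL = (0, 1, 2, 3)      # finalization, message a positive multiple of n+t bits
FIN_PARTIAL = (4, 5, 6, 7)   # finalization, all other messages
HASH = 8                     # hashing part
MASK_GEN = 9                 # generation of the masks L_l and L_r
ZAE_ENC = 10                 # encryption part of ZAE

def pack_tweak(domain: int, payload: int, tweak_bits: int) -> int:
    # Place the domain integer in the top 4 bits of the physical tweak
    # (the bit placement is an assumption) and the payload below it.
    payload_bits = tweak_bits - DOMAIN_BITS
    if not 0 <= domain < (1 << DOMAIN_BITS):
        raise ValueError("domain does not fit in 4 bits")
    if not 0 <= payload < (1 << payload_bits):
        raise ValueError("payload does not fit in the remaining tweak bits")
    return (domain << payload_bits) | payload
```

With a 256-bit physical tweak, this would leave 252 usable tweak bits, so each \(\mathsf {ZMAC}\) block would carry \(n+252\) message bits, which is the unusual block size mentioned above.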

Another solution is to separate the instances using distinct field multiplications. This allows the message block size of \(\mathsf {ZMAC}\) to be a multiple of the TBC block size. However, the number of distinct multiplications is non-negligible and will render the implementation much more complex.

Finally, a last solution could be to XOR into the state distinct words that depend on the secret key (for example generated just like the masks \(L_{\ell }\) and \(L_r\), but with different plaintext inputs). The advantage is that the implementation is simple and it allows the message block size of \(\mathsf {ZMAC}\) to be a multiple of the TBC block size. However, more precomputations will be needed.

All these solutions represent different possible tradeoffs, and we note that this issue is present for most TBC-based MAC or AE schemes.

Table 3. Estimated efficiencies (in c/B) of various MAC and AE primitives (for (1) long messages and (2) long messages with equally long AD) on an Intel Skylake processor. For (2), the input bytes are the sum of message and AD bytes. NR denotes the nonce-respecting scenario. GCM-SIV is proposed by [GL15]. (\(^{\star }\)) Performances are reported for SIV instantiated with a fully parallelizable PRF (e.g., PMAC), while the specifications from [RS06] use a PRF based on CMAC which has a limited parallelizability.

6.2 Efficiency Comparisons

In this subsection, we report the efficiency estimates of our operating modes \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\), when the TBC is instantiated with Deoxys-BC and SKINNY, while comparing with existing MAC and AE schemes.

We do not perform a comprehensive comparison with schemes combining a (T)BC and a 2n-bit algebraic UHF, such as a 256-bit variant of GMAC  [MV04]. In principle such schemes can achieve n-bit security. However, the additional implementation of an algebraic UHF would require more resources (memory for software and gates for hardware) than pure (T)BC modes, which is not desirable when good performance is needed across multiple devices. Moreover, the existence of weak-key classes for the popular polynomial hash functions, as shown in [HP08, PC15], can be an issue.

We will consider two scenarios: (1) long messages and (2) long messages with equally long associated data (AD). For these two scenarios, the cost of the precomputations or finalizations can be considered negligible (for benchmarking, we used 65536 bytes for long messages or AD). Moreover, we note that in \(\mathsf {ZMAC}\), the two calls for precomputation can be done in parallel, while the calls in the finalization function \(\mathsf {ZFIN} \) can also all be done in parallel. For modern processors, where parallel encryptions (for bitslice implementations) or pipelined encryptions (for implementations using the AES-NI instruction set) are by far the most efficient strategy, having a finalization composed of four parallel encryption calls (like in \(\mathsf {ZMAC}\)) or a single one (like in SCT) will not make a big difference in terms of efficiency.

On an Intel Skylake processor (Intel Core i5-6600), we measure that for long messages AES-128 runs at 0.65 c/B (cycles/Byte), while Deoxys-BC-256 runs at 0.87 c/B, Deoxys-BC-384 runs at 0.99 c/B, SKINNY-128-256 at 4.12 c/B and SKINNY-128-384 at 4.8 c/B. However, these numbers assume that the tweak input of the ciphers is being used as a counter (as in SCT or SIVx). This can make an important difference depending on the TBC considered, especially for ciphers with a heavy key schedule. One can observe [BJK+16] that when the tweak input is considered random (as opposed to being a counter), there is not much efficiency penalty for SKINNY (probably due to the fact that the best SKINNY implementations use a high-parallelism bitslice strategy). For Deoxys-BC, we have implemented a random tweak version and compared it with the case where the tweak is used as a counter. We could observe that in the case of AES-NI implementations a penalty factor on efficiency of 1.4 must be taken into account for Deoxys-BC-256, and a factor of 1.8 for Deoxys-BC-384. We emphasize that these penalties will probably not appear for other types of implementations (table or bitslice implementations).

Taking into account all these considerations, we compare the efficiencies of \(\mathsf {ZMAC}\) and \(\mathsf {ZAE}\) with those of their competitorsFootnote 7 in Table 3. One can see that \(\mathsf {ZMAC}\) is the fastest MAC, while providing n-bit security. Moreover, \(\mathsf {ZAE}\) offers better performance than its misuse-resistant competitors, while providing optimal n-bit security, even in the nonce-misuse scenario.

It is interesting to note that, as foreseen in the previous section, for \(\mathsf {ZAE}\) the maximum speed might be achieved by using a TBC version with a large tweak for the MAC part, and a TBC version with a small tweak for the encryption part (typically Deoxys-BC-384 for the MAC part and Deoxys-BC-256 for the encryption part). This is because \(\mathsf {ZMAC}\) really benefits from using a TBC with a large tweak, while the encryption part is not faster when using a TBC with a large tweak (and a TBC with a large tweak is supposed to be slightly slower).
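The reasoning behind this choice can be checked with a back-of-the-envelope model built from the Deoxys-BC measurements quoted above. The model below is our own simplification and not the method used to produce Table 3: it assumes long messages without AD, ignores precomputation, finalization and the tweak bits reserved for domain separation, applies the AES-NI random-tweak penalty to every \(\mathsf {ZMAC}\) call, and applies the counter-tweak rate to the encryption part.

```python
N = 128  # block length of Deoxys-BC in bits

# c/B with a counter tweak and AES-NI random-tweak penalties (from the text).
COUNTER_CPB = {"Deoxys-BC-256": 0.87, "Deoxys-BC-384": 0.99}
RANDOM_TWEAK_PENALTY = {"Deoxys-BC-256": 1.4, "Deoxys-BC-384": 1.8}
TWEAK_BITS = {"Deoxys-BC-256": 128, "Deoxys-BC-384": 256}

def zmac_cpb(tbc: str) -> float:
    # Each TBC call costs as much as encrypting n bits with a random tweak,
    # but absorbs n + t bits of message (simplified model).
    t = TWEAK_BITS[tbc]
    return COUNTER_CPB[tbc] * RANDOM_TWEAK_PENALTY[tbc] * N / (N + t)

def zae_cpb(mac_tbc: str, enc_tbc: str) -> float:
    # One ZMAC pass plus one counter-mode pass over the message.
    return zmac_cpb(mac_tbc) + COUNTER_CPB[enc_tbc]

if __name__ == "__main__":
    for mac, enc in [("Deoxys-BC-256", "Deoxys-BC-256"),
                     ("Deoxys-BC-384", "Deoxys-BC-384"),
                     ("Deoxys-BC-384", "Deoxys-BC-256")]:
        print(f"MAC={mac}, Enc={enc}: {zae_cpb(mac, enc):.3f} c/B")
```

In this simplified model the mixed configuration (Deoxys-BC-384 for the MAC part, Deoxys-BC-256 for the encryption part) gives the lowest estimate of the three, which is consistent with the observation above.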